Re: [squid-users] Re: ICP and HTCP and StoreID from Nikolai Gorchilov on 2014-02-14 (squid-users)

From: Nikolai Gorchilov <niki_at_x3me.net>
Date: Fri, 14 Feb 2014 13:38:25 +0200

On Fri, Feb 14, 2014 at 7:22 AM, Alex Rousskov
<rousskov_at_measurement-factory.com> wrote:
> On 02/13/2014 06:05 PM, Nikolai Gorchilov wrote:
>> On Thu, Feb 13, 2014 at 10:04 PM, Alex Rousskov wrote:
>>> AFAICT, if Squid always uses URLs for anything
>>> outside internal storage, everything would work correctly and all use
>>> cases will be supported well, without any additional options.
>
>> I believe this optimization covers the most common scenario when using
>> cache peers and StoreID at the same time.
>
> Yes, but I suspect that important scenario is relatively rare. And even
> if it was common, we should not break a protocol and an interface design
> principle to optimize one important use case, especially when that case
> can be optimized using different means.

OK.

> If you want to add an option to use the received ICP reqnum field as a
> public cache key for lookup, you should be allowed to do that IMO. If
> you want to add an option to add Store ID to ICP and HTCP requests, you
> should be allowed to do that too. AFAICT, each will give you the
> performance optimization you want without violating protocols and
> interfaces.

As you also noticed, extending the UDP request with an additional
parameter will bump into packet size limits more often.
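
To make the size concern concrete, here is a rough sketch (not Squid
code) of how an ICP_QUERY grows if a Store ID is appended after the URL.
The 20-byte header and the query payload layout follow RFC 2186; the
extra trailing store_id field and the build_icp_query() helper are
hypothetical and exist only for this illustration.

```python
import struct

ICP_OP_QUERY = 1   # ICP_OP_QUERY opcode from RFC 2186
ICP_VERSION = 2

# opcode, version, message length, request number, options,
# option data, sender host address: 20 bytes in network byte order
ICP_HEADER = struct.Struct("!BBHIII4s")


def build_icp_query(reqnum, url, store_id=None):
    """Build an ICP_QUERY; store_id is a hypothetical extra field."""
    # Query payload: 4-byte requester address, then null-terminated URL
    payload = b"\x00" * 4 + url.encode("ascii") + b"\x00"
    if store_id is not None:
        # Assumed extension, not part of ICP: a second
        # null-terminated string carrying the Store ID
        payload += store_id.encode("ascii") + b"\x00"
    length = ICP_HEADER.size + len(payload)
    header = ICP_HEADER.pack(ICP_OP_QUERY, ICP_VERSION, length,
                             reqnum, 0, 0, b"\x00" * 4)
    return header + payload


url = "http://example.com/"
store_id = "http://cdn.squid.internal/object"
plain = build_icp_query(1, url)
extended = build_icp_query(1, url, store_id)
print(len(plain), len(extended))
```

Every query grows by the length of the Store ID plus one terminator
byte, so long URLs reach the datagram size limit sooner.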

>> There's almost no practical
>> sense to have different cache peers using different StoreID logic.
>> They either use the same rewriter, or use no rewriter at all. Seems
>> like common sense to me.
>
> Sure! Or one of them is using a rewriter and the other one does not use
> it at all (or is not even running Squid software). Or both of them were
> using a rewriter yesterday, but one of them was changed to use no
> rewriter today. Or there is now a load balancer/traffic auditing hop
> that blocks or complains about ICP/HTCP requests with bogus URLs.
>
> Using internal StoreIDs instead of URLs for proxy-to-proxy communication
> introduces too many problems to be a viable general solution. Yes, in a
> tightly controlled cache hierarchy, it is technically possible to throw
> all those considerations away to gain a few extra processing
> milliseconds, but that is just not enough of a reason to support that as
> a general solution. And, again, there may be two ways to save those
> milliseconds without introducing serious problems.

I fully agree with all of the above. My original suggestion was to
keep the current behaviour as an option of the cache_peer directive,
instead of removing it completely.

>> Maybe I'm wrong, but AFAIK Squid never uses "slow" processing methods
>> on incoming ICP/HTCP requests. Passing the incoming ICP/HTCP requested
>> URL via the StoreID will change this design principle.
>
> Lack of async code is not really a design principle and I am guessing
> that HTCP is already async by the very nature of TCP message processing
> (i.e., Squid may read a partial message). It is just that the code never
> needed an async step [badly enough]. However, with both of the solutions
> I am suggesting above, that async step is still not needed!

I'm not sure which solutions you are referring to.

>>> If somebody wants to extend ICP/HTCP to include StoreId in the request
>>> (as an optional additional field), they may do so, but that optional
>>> optimization does not change the overall design principle: StoreId for
>>> the internal storage; URL for everything else.
>>
>> Let's put it another way: if we need correct UDP_HIT/UDP_MISS
>> responses between peers using StoreID, we have to compromise on
>> one of the following design principles:
>> - "Squid always uses URLs for anything outside internal storage"
>> - "Squid never uses slow processing on UDP requests"
>>
>> Please correct me if I'm wrong.
>
> Using ICP reqnum as a cache key or adding StoreID to ICP/HTCP requests
> does not compromise either AFAICT.
>
>
>> What is important for me is to be able to properly answer incoming UDP
>> requests that require StoreID normalization (UDP_HIT/UDP_MISS), and
>> later, when the actual HTTP request comes, to be able to refresh the
>> object if the refresh logic requires it.
>
> Would using ICP reqnum field as a cache key or adding StoreID to
> ICP/HTCP requests work for your use cases? I have not fully checked
> whether the former is possible, but I think it is. The latter is
> possible, but is more difficult to implement (and will bump into UDP
> packet size limits more often?).

Yep. Both will do. I personally prefer the second option - StoreID URL
normalization on the incoming ICP/HTCP request - in order to avoid
packet size bumps as much as possible. Especially if we memcache the
StoreID for later use, when the eventual HTTP request arrives a few
seconds later. Caching should be considered a wish, not a must-have :)

>> Current implementation prevents the refresh.
>
> We know that the current implementation is broken, no questions about
> it! IIRC, the developer responsible for that breakage promised to fix it
> when the code with a known breakage was committed, but even if he does
> not, Amos, I, or others will [eventually].

I got the attention of one of my teammates regarding this.
Unfortunately, he didn't have enough time to comprehend the complete
Squid logic, hence the ugliness of his patch. If you provide me with
high-level design advice for a proper fix, we may try to produce one.

Best,
Niki
Received on Fri Feb 14 2014 - 11:39:17 MST
