On Fri, 13 Feb 1998, Micah Beck wrote:
> I'm interested in modifying Squid's behavior when more than one
> IP address is returned from the DNS, so I thought it good to analyze it first.
> From reading the code, I came up with this characterization:
>
> 1. All returned IP addresses are stored in a list
> 2. The first element of the list is always used
> 3. If the first element of the list is bad, the last element of the
> list is then copied to the first element and the length of the list
> is decreased.
> 4. When the length of the list reaches zero, that's bad.
>
> Can anyone verify that I have this right, or tell me if I've got it wrong?
> I'm planning to experiment with the use of SONAR to choose the closest one.
> Has anyone tried something similar?
Your description of the IP cache is basically correct, with two caveats.
The first address is not always used -- see the "ipcacheCycleAddr()"
function in ipcache.c.
And when the length of the list reaches zero, it's not necessarily bad --
the ipcache entry is simply removed, forcing another DNS lookup the next
time that host is accessed.
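
To make the stock behaviour concrete, here is a rough Python model of it as
characterised in this thread. It is only a sketch, not Squid's code: the class
and method names (AddrList, cycle, remove_bad) are made up for illustration,
and the exact moment at which Squid cycles the current address is an
assumption.

```python
class AddrList:
    """Toy model of one stock ipcache entry (hypothetical names throughout)."""

    def __init__(self, addrs):
        self.addrs = list(addrs)  # every address returned by the DNS
        self.cur = 0              # index of the address handed out next

    def current(self):
        """Address to use for the next connection, or None if the list is empty."""
        return self.addrs[self.cur] if self.addrs else None

    def cycle(self):
        """Advance to the next address, so the first one is not always used."""
        if self.addrs:
            self.cur = (self.cur + 1) % len(self.addrs)

    def remove_bad(self, addr):
        """Overwrite a bad address with the last element and shrink the list.

        Returns True when the list is now empty, meaning the caller should
        drop the entry and let the next request trigger a fresh DNS lookup.
        """
        if addr in self.addrs:
            i = self.addrs.index(addr)
            self.addrs[i] = self.addrs[-1]  # copy the last element over the bad one
            self.addrs.pop()                # length decreases by one
            if self.cur >= len(self.addrs):
                self.cur = 0
        return len(self.addrs) == 0
```

An empty list here is not an error in itself; it just signals that the entry
should be thrown away and the name resolved again.
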
You may want to take a look at the connection-retry patch at
<ftp://ftp.comshare.com/pub/squid>. This patch substantially modifies the
behaviour of the ip cache in the interests of retrying failed connections
to a multi-address host.
The behaviour is roughly this:
1. All returned addresses are stored in a list.
2. Every address is marked "OK" at the outset.
3. All the addresses which are currently marked "OK" are cycled through
round-robin style on each connection request.
4. If a connection to an address fails, that address is marked "BAD" in
the list, and the connection is retried using the next "OK" address in
the list. Repeat until success or all addresses tried.
5. If there are no more "OK" addresses, all the now-BAD addresses are
tried one more time.
6. If a formerly-BAD address now accepts a connection, it is marked "OK"
and used to fetch the request, and the ipcache entry's TTL is reset to
Squid's default IP cache TTL as defined in the conf file.
7. And finally, if none of the BAD addresses can be connected to, an error
is returned to the user, and the ipcache entry's TTL is left as-is.
8. When a cache entry's TTL expires, it is removed from the cache, but not
under any other circumstance.
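
The eight steps above can be sketched in Python roughly as follows. This is a
model of the behaviour as listed, not the patch itself: IpcacheEntry,
try_connect and the DEFAULT_TTL value are all hypothetical, and time is passed
in as a plain number to keep the example self-contained.

```python
DEFAULT_TTL = 3600  # hypothetical stand-in for the conf-file IP cache TTL, in seconds


class IpcacheEntry:
    """Toy model of one ipcache entry under the connection-retry behaviour."""

    def __init__(self, addrs, ttl, now):
        self.addrs = list(addrs)          # step 1: keep every returned address
        self.ok = [True] * len(addrs)     # step 2: everything starts out "OK"
        self.cur = 0                      # round-robin position
        self.expires = now + ttl

    def bad_count(self):
        return self.ok.count(False)

    def expired(self, now):
        # Step 8: TTL expiry is the only reason to remove the entry.
        return now >= self.expires

    def connect(self, try_connect, now):
        """Return the address that accepted a connection, or None on failure.

        try_connect is a caller-supplied function: address -> bool.
        """
        n = len(self.addrs)
        order = [(self.cur + k) % n for k in range(n)]
        # Steps 3 and 4: walk the OK addresses round-robin, marking failures BAD.
        for i in order:
            if not self.ok[i]:
                continue
            if try_connect(self.addrs[i]):
                self.cur = (i + 1) % n
                return self.addrs[i]
            self.ok[i] = False
        # Step 5: no OK addresses are left, so try every BAD one once more.
        for i in order:
            if try_connect(self.addrs[i]):
                # Step 6: mark it OK again and reset the TTL to the default.
                self.ok[i] = True
                self.expires = now + DEFAULT_TTL
                self.cur = (i + 1) % n
                return self.addrs[i]
        # Step 7: report an error to the caller; the TTL is left as-is.
        return None
```

A caller would hand connect() whatever function actually opens the socket and
treat a None result as the error returned to the user.
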
My ipcache hit rate is routinely in the high 99% range. Take a look at
<http://proxy.comshare.com/cgi-bin/cachemgr.cgi>'s IP cache info page
(hostname "proxy.comshare.com", no password needed for that data). Note
the "-BAD" and "-OK" suffixes on each address in a given entry; the
number in parens next to the address count is the number of bad
addresses.
Sites are usually less busy in the morning, so it probably won't be as
interesting to look at as it will be later in the day -- there are currently
no addresses marked bad. (Oops, now there are a few -- adex3.flycast.com has
five bad out of 24.) Plus I had a system crash (got overzealous with
memory allocation, combined with a heavy beating of the proxy from UK
users) this morning at 4:00am which reset the counters, so the hit rate
won't be as impressive as it will be next week.
I'm not sure how valuable SONAR-tracking on multi-IP address sites would
be, however.
The only sites that really use multi-address hostnames extensively are the
large sites running non-UNIX web servers that can't handle heavy traffic
alone, or the financial and other sites that need high-availability. And
for the most part, all the addresses are on the same network or group of
networks with similar or identical paths to reach them.
Take a look at the above-mentioned cachemgr.cgi IP cache info output, and
you'll see what I mean. I tend to doubt that there'd be a
significant-enough difference in performance between the addresses of a
single hostname to make trying to figure it out worthwhile, but it does
sound like an interesting experiment.
-Mike Pelletier.
Received on Fri Feb 13 1998 - 17:58:55 MST