Re: [squid-users] Strange Problem regarding Accept-Encoding and compression / Regex anyone?

From: Stefan Hartmann <hartm_at_odn.de>
Date: Tue, 28 Apr 2009 12:41:45 +0200

Amos Jeffries wrote:
>> Hello,
>>
>> I am running squid as a reverse proxy in front of a web server farm. We
>> are trying to implement content compression, and it breaks from
>> time to time.
>>
>> The www-servers are Windows IIS 5, and the compression is done using an
>> ISAPI filter (no, not the original broken M$ filter from the server).
>>
>> We are using version 2.7.STABLE6 in our setup. The www-servers are all
>> sending a "Vary: Accept-Encoding" header, and the setup is working
>> perfectly in my test scenarios. We have no "broken_vary_encoding"
>> configured, and no ETag in the responses (we are only using "Expires:"
>> headers).
>>
>> We installed the ISAPI filters last week without putting the "Vary:
>> Accept-Encoding" header in place on the www-servers, and blocked the
>> "Accept-Encoding:" and "Vary:" headers at squid while waiting for a
>> maintenance window to activate it. The site worked without any problem.
>>
>> During the last maintenance window, we activated the "Accept-Encoding:"
>> and "Vary:" headers (no longer blocking them in squid) and set up the
>> www-servers to send "Vary: Accept-Encoding" headers, and it works -
>> sometimes, with some browsers.
>>
>> The failures we see are content pages that end after some kB of
>> correct data. For example, the homepage is about 150 kB uncompressed and
>> around 30 kB compressed (this is why we want compression), and the
>> server farm delivered content pages consisting of only the first 18 to
>> 25 kB (uncompressed, sizes vary) of the complete page, with the transfer
>> never completing. This never happened in our test setup.
>>
>> The pages were (as intended) cached by squid, so we had the situation
>> that for example Internet Explorer was working, but Firefox got the
>> short page. And vice versa, sometimes Firefox worked, but IE failed. And
>> sometimes all browsers worked.
>>
>> From tonight's logs:
>> 11:30 pm to 01:00 am: IE pages broken
>> 01:00 am to 09:30 am: all working
>> 09:30 am to 10:15 am: Firefox pages broken
>> 10:15 am to 11:00 am: IE and Firefox pages broken
>>
>> Ok, perhaps the ISAPI filter is faulty in some conditions we did not
>> test, with some browsers or bots or... so we uninstalled the ISAPI
>> filter from all WWW-Servers, but left the "Vary: Accept-Encoding" header
>> in place.
>>
>> Result: The error did not stop! I had to block the "Accept-Encoding:"
>> and "Vary:" in squid to get the site working properly.
>>
>> The next step was to remove the "Vary:" header from the www-servers and
>> stop blocking the "Accept-Encoding:" and "Vary:" headers in squid: the
>> site is working properly.
>>
>> So... are there any issues regarding squid and WWW-Servers sending
>> "Vary: Accept-Encoding" (without actually doing Content-Compression)?
>>
>> When the error occurs, our logs show connections with "short"
>> pages (i.e. 18 kB vs. 150 kB normally), which are obviously aborted after
>> 900 seconds:
>>
>> Mon Apr 27 15:38:29 2009 900301 111.111.111.111 TCP_MISS/200 18806 GET
>> http://real.server.de/ - DEFAULT_PARENT/real.server.de text/html
>> [
>> Accept-Encoding: gzip, deflate
>> User-Agent: Nutscrape/1.0 (CP/M; 8-bit)
>> Host: real.server.de
>> Cookie: WT_SET=id=213.253.......
>> Cache-Control: max-age=259200
>> ]
>>
>> [
>> HTTP/1.0 200 OK
>> Date: Mon, 27 Apr 2009 13:23:29 GMT
>> X-Powered-By: ASP.NET
>> X-AspNet-Version: 2.0.50727
>> Realserver-info: BuildTime: 27.04.2009 15:23:29; TimeSpan:
>> 00:00:02.6719434; CacheTime: 120; Server: WWW31
>> Publisher: Real-Server
>> Expires: Mon, 27 Apr 2009 13:25:29 GMT
>> Content-Type: text/html; charset=iso-8859-1
>> Content-Length: 168830
>> X-Cache: HIT from accel3
>> Connection: close
>> ]
>>
>> Please help!
>>
>
> Since which browser breaks seems to be erratic, I assume that one particular
> client request is causing some bad data to enter the squid cache, which is
> then served to all following clients for a period.
> Look at the requests at the beginning of the time when things break. If
> you can find the exact conditions or client, it will be much easier to
> track through the logs on later occurrences.
>
> It sounds a little bit like:
> http://squidproxy.wordpress.com/2008/04/29/chunked-decoding/
>
> except for a few factors that don't fit:
> IIS 5 is not known for this issue,
> 2.7 has a decoding hack to fix it
> and Vary: seemed to show relevance.
>
> I'd try raising the debug_options levels for request processing a bit and
> see what becomes visible.

Amos,

thanks for the reply. Debugging is somewhat tricky, since the server farm
has to handle lots of traffic (around 200 million content pages per month)
and debugging the real servers would generate a (too) huge amount of
data. And in my test scenario I don't get the error...

I will try to filter the "bad" requests. The idea is to strip the
Accept-Encoding header if it is "crazy", i.e. (all seen live)

Accept-Encoding: FFFF, FFFFFFF
Accept-Encoding: mzip, meflate
Accept-Encoding: identity, deflate, gzip
Accept-Encoding: gzip;q=1.0, deflate;q=0.8, chunked;q=0.6, identity;q=0.4, *;q=0
Accept-Encoding: gzip, deflate, x-gzip, identity; q=0.9
Accept-Encoding: gzip,deflate,bzip2
Accept-Encoding: nnnnndeflate
Accept-Encoding: x-gzip, gzip
Accept-Encoding: gzip,identity
Accept-Encoding: gzip, deflate, compress;q=0.9
Accept-Encoding: gzip,deflate,X.509

and only let pass these two:

Accept-Encoding: gzip,deflate
Accept-Encoding: gzip, deflate

The first one is Firefox, the other is IE. These two match about 80-90%
of all requests, which would be ok.
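
The whitelist idea above can be sketched outside squid, for example in Python. This is only an illustration of the intended behaviour, not squid configuration; the name `filter_accept_encoding` and the sample list are assumptions made for the example.

```python
# Hypothetical helper, not part of squid: keep the Accept-Encoding value only
# if it is exactly one of the two whitelisted forms, otherwise drop it.
ALLOWED = {"gzip,deflate", "gzip, deflate"}


def filter_accept_encoding(value):
    """Return the value unchanged if whitelisted, else None (header stripped)."""
    return value if value in ALLOWED else None


# A few of the header values listed above, plus the two that should pass.
samples = [
    "gzip,deflate",
    "gzip, deflate",
    "mzip, meflate",
    "gzip,deflate,bzip2",
    "x-gzip, gzip",
]

for s in samples:
    verdict = "pass" if filter_accept_encoding(s) is not None else "strip"
    print(repr(s), "->", verdict)
```

Exact string comparison is used here on purpose: anything that is not byte-for-byte one of the two known forms is treated as suspicious.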

So I tried:

acl zipit req_header Accept-Encoding ^gzip,deflate$
acl zipit req_header Accept-Encoding ^gzip, deflate$
[...]
header_access Accept-Encoding allow zipit

but something seems to be wrong with the regexes above: squid lets
through not only "gzip,deflate", as I would expect, but also
"gzip,deflate,xx" and "gzip,xx". "bla" is blocked.

It seems squid lets the header pass if it starts with gzip,
disregarding the rest. Is my regex wrong?
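
One possible explanation, stated as an assumption rather than confirmed squid behaviour: squid.conf splits ACL arguments on whitespace, so the second acl line would be read as two separate patterns, `^gzip,` and `deflate$`, and an ACL matches if any one of its patterns matches. The Python sketch below only simulates that reading with the `re` module (squid itself uses POSIX regexes, and `acl_matches` is a made-up name). Under this assumption it gives the same pass/block results as described above.

```python
import re

# Pattern part of the two acl lines exactly as written in squid.conf.
acl_lines = ["^gzip,deflate$", "^gzip, deflate$"]

# Assumption: each line is split on whitespace into independent patterns,
# so the second line yields '^gzip,' and 'deflate$'.
patterns = [tok for line in acl_lines for tok in line.split()]


def acl_matches(value):
    """Hypothetical stand-in for the ACL: true if any single pattern matches."""
    return any(re.search(p, value) for p in patterns)


for header in ["gzip,deflate", "gzip, deflate", "gzip,deflate,xx", "gzip,xx", "bla"]:
    verdict = "pass" if acl_matches(header) else "block"
    print(f"{header!r:20} -> {verdict}")
```

If that is what happens, a single pattern without literal whitespace, such as `^gzip,[[:space:]]?deflate$`, might avoid the split; this is untested here.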

Regards,
Stefan

-- 
09-f9-11-02-9d-74-e3-5b-d8-41-56-c5-63-56-88-c0
---
OnlineDienst Nordbayern   | http://www.odn.de/    | Internet-Systemhaus
GmbH & Co.KG              | E-Mail: hartm_at_odn.de  | Hosting, Housing
Steinstr. 19              | Tel: 0911 / 933877-0  | Consulting, VoIP
90419 Nuernberg - Germany | Fax: 0911 / 933877-55 | Programmierung
GF Christiane Teichgräber | AG Nürnberg HRA 13304 |

Received on Tue Apr 28 2009 - 10:41:50 MDT
