Re: [squid-users] Caching identical items from a dynamic URL

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Sat, 11 Dec 2010 13:49:23 +1300

On 11/12/10 10:59, Volker-Yoblick, Adam wrote:
> Greetings,
>
> I've got a fairly unique problem that maybe someone can assist with.
>
> I'm sending files to a machine through my cache, but part of the URL
> is dynamic, even if the file is exactly the same. For example, the
> lines in my access.log all look like this:
>
> GET http://1.2.3.4/foo/<GUID>/bar/abc.txt
>
> Where GUID is different for every single deploy, even if the file is
> exactly the same. This is done by creating a virtual directory that
> points to a fixed location, but the name of the virtual directory is
> a GUID, and changes on every run. This system is already in place,
> and cannot be changed.
>
> I have found that the files are NEVER served from the cache when the
> GUID is different, even if the file MD5 is exactly the same. Every
> single fill is a cache miss, every time. (I've verified that I DO get
> cache hits across multiple deploys when the GUID is the same)
>
> I imagine this is because squid is using the full URL to determine
> whether or not the file is cached, either by including it in the MD5
> hash, or using it as the lookup, or something similar.

It is. That is how HTTP works.

You can work around such broken server software internally with
storeurl_rewrite (the storeurl_rewrite_program directive in Squid 2.7),
but this does nothing to reduce the external bandwidth costs added
unnecessarily by your nasty backend.
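
As a rough illustration of the store-URL approach, here is a minimal
sketch of a helper script in Python. It assumes the Squid 2.7 helper
convention of one request per line on stdin with the URL as the first
field, one reply per line on stdout, an empty reply meaning "no change",
and helper concurrency left at 0. The names canonical_store_url and
serve, and the exact pattern, are hypothetical; adjust them to your
real paths.

```python
import re

# Hypothetical pattern: one path segment (the GUID) between /foo/ and /bar/.
GUID_SEGMENT = re.compile(r"^(http://1\.2\.3\.4/foo/)[^/]+/(bar/.+)$")


def canonical_store_url(url):
    """Return the URL with the GUID segment dropped, or None if it does not match."""
    match = GUID_SEGMENT.match(url)
    if match is None:
        return None
    return match.group(1) + match.group(2)


def serve(requests, replies):
    """Answer one helper request per input line until end of input."""
    for line in requests:
        fields = line.split()
        # Assumption: the URL is the first whitespace-separated field.
        url = fields[0] if fields else ""
        rewritten = canonical_store_url(url)
        # Assumption: an empty reply line leaves the store URL unchanged.
        replies.write((rewritten or "") + "\n")
        replies.flush()  # Squid waits for each reply, so do not buffer
```

To try it as a helper you would end the script with a call such as
serve(sys.stdin, sys.stdout) (after importing sys) and point
storeurl_rewrite_program at it. Every GUID variant of a file then maps
to the same store key, so later deploys can hit the cached copy.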

If the client software is capable of handling 30x redirects I recommend
performing one from all those GUID paths back to the actual data URI:

   acl guidBounce urlpath_regex ^/foo/[^/]+/bar/abc\.txt$
   deny_info http://1.2.3.4/foo/bar/abc.txt guidBounce
   http_access deny guidBounce
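
To see which paths that acl catches, here is a small Python check of
the same pattern, written with the dot before "txt" escaped so it only
matches a literal dot. Python's re module is not the regex library
Squid uses, but for a pattern this simple the two should agree. The
name should_redirect and the sample paths are made up for illustration.

```python
import re

# The acl pattern, with the dot before "txt" escaped so it matches literally.
GUID_BOUNCE = re.compile(r"^/foo/[^/]+/bar/abc\.txt$")


def should_redirect(url_path):
    """True when the path has exactly one extra segment between /foo/ and /bar/."""
    return GUID_BOUNCE.search(url_path) is not None


for path in ("/foo/3f2c9a/bar/abc.txt", "/foo/bar/abc.txt", "/foo/a/b/bar/abc.txt"):
    print(path, should_redirect(path))
```

Only the first path matches. The redirect target /foo/bar/abc.txt does
not match, so the bounced request is not denied again and cannot loop.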

Amos

-- 
Please be using
   Current Stable Squid 2.7.STABLE9 or 3.1.9
   Beta testers wanted for 3.2.0.3
Received on Sat Dec 11 2010 - 00:49:28 MST
