Re: thoughts on memory usage...

From: Oskar Pearson <oskar@dont-contact.us>
Date: Thu, 21 Aug 1997 10:29:39 +0200

Hi all.

> > Agreed, MD5 or SHA (SHA is the default/preferred algorithm in the Linux
> > kernel, but I'm not sure if that's just because of cryptographic security,
Anyone got an SHA reference anywhere?

> > patent restrictions, performance or whatever... and we'd be more worried
> > about a combination of performance and collision probability) would be

> Well, looking at the web page, it doesn't look like hash collision is
> really something we'd need to worry about. We're safe by about 20
> orders of magnitude... :) (with 16-byte hashes)
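For a rough sanity check, the birthday bound puts the collision odds for an n-object cache with a b-bit hash at roughly n^2 / 2^(b+1). A quick sketch (the ten-million-object cache size is just an assumed figure):

```python
def collision_probability(n_objects, hash_bits):
    """Birthday-bound approximation: p ~= n^2 / 2^(b+1)."""
    return n_objects ** 2 / 2 ** (hash_bits + 1)

# An assumed cache of ten million objects, 16-byte (128-bit) hashes:
p = collision_probability(10_000_000, 128)
print(p)  # on the order of 1e-25
```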

> > works...). When a new request for the URL is received, it can be decided
> > if the object is out of date or not since you now have the real URL.

> Or you can tag the object with the appropriate rule when you build it?
> i.e. You get a request for foobar.gif, you build the object, store it
> on disk, and then run down the refresh_rules to see which one matches,
> and then say "rule 6 is it for this one!", and voilà!
What if you change your refresh patterns?

As far as I know, the decision whether or not to refresh an
object is made at the time of the 'subsequent request', since we
don't do garbage collection, and instead use LRU to expire objects.
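A minimal sketch of that request-time decision, assuming a made-up rule table in the spirit of refresh_pattern (the rule list and ages below are invented for illustration):

```python
import re
import time

# Invented rules in the spirit of refresh_pattern: (regex, max_age_seconds)
REFRESH_RULES = [
    (re.compile(r"\.gif$"), 3600),
    (re.compile(r"."), 600),  # catch-all default
]

def is_fresh(url, stored_at, now):
    """Decide freshness at request time, so editing the rules takes
    effect immediately for every cached object (no stale per-object tags)."""
    age = now - stored_at
    for pattern, max_age in REFRESH_RULES:
        if pattern.search(url):
            return age < max_age
    return False

print(is_fresh("http://example.com/foobar.gif", time.time() - 100, time.time()))  # True
```

Because matching happens on every request, changing the patterns needs no walk over the stored objects.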

Do the other expiration policies that people have been playing with
require the URL? If so, we could combine some of this stuff...

Obvious suggestion:
Keep the cache/log file with the URLs, but also keep another log
in binary format that squid can use to quickly locate objects. Expiration
runs from the 'original' log.
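A sketch of what a fixed-width binary index record might look like (the field layout here is purely hypothetical, not squid's actual swap log format):

```python
import struct

# Hypothetical fixed-width record: 16-byte MD5 digest, file number,
# and swap timestamp. The layout is illustrative, not squid's format.
RECORD = struct.Struct("<16sIQ")  # 16 + 4 + 8 = 28 bytes

def pack_entry(digest, fileno, timestamp):
    return RECORD.pack(digest, fileno, timestamp)

def unpack_entry(raw):
    return RECORD.unpack(raw)

entry = pack_entry(b"\x00" * 16, 42, 872153379)
print(len(entry))  # 28
```

Fixed-width records mean squid can seek straight to entry i at offset i * 28, while the URL log stays human-readable for expiration runs.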

> Actually I think I can see all sorts of wild advantages to that....
Ok - I give up.. :)

> > The idea of storing extra metadata in the objects in the cache is
> > interesting (putting the URL at the beginning). Allowing for a

> > so nice because of the performance of ICP queries (or maybe just give a
> > false "yes" and then return a tcp denied and fix up squid in a way that
> In for a penny, in for a pound. If you trust MD5, why bother sending
> the entire URL over in the ICP? That's just a waste of bandwidth and
> CPU time. Just send over the MD5 signature.... half a :)

I think that we can pretty much trust MD5. It also has the advantage
of being an open standard - it's an RFC (RFC 1321).
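The idea in miniature: the ICP key would just be the 16-byte digest, however long the URL is (this sketches the digest computation only, not the actual ICP wire format):

```python
import hashlib

def icp_key(url):
    """16-byte MD5 digest of the URL, as a fixed-size ICP request key."""
    return hashlib.md5(url.encode("ascii")).digest()

key = icp_key("http://example.com/foobar.gif")
print(len(key))  # always 16, however long the URL is
```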

> > * the URLs could be kept on disk as the first line of the cache swap file.
> > it would nuke all existing caches, but, this would only be in the
> > upgrade to 1.2 for most users, and if they are patient enough they
> > *could* run a script... this would then mean that people wouldn't lose

> Still not sure why you need the URLs that much. Why not write them to
> a separate log? If reindexing is an oddity, then you're not really
> fussed about the speed....

Since we are going to have a pre-computed hash, are we going to change
the disk-layout, Duane? Why compute twice, and all that.

I think the following:
Trust MD5 for ICP queries, but keep the URL (and download time) in the file
(mainly to support different expiry systems, and to allow you to
save your cache contents if you lose the index), and add support to ICP
for MD5 hashes as the request string. If we are changing the structure
of the cache directory anyway, so that we don't have to do something
'computationally expensive' (calculate MD5 and then also calculate the
current hash we use to find the path to the file), we don't have
to worry about the cache layout changing, since it will.
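A sketch of deriving the on-disk path straight from the MD5 digest, so one hash serves both ICP and file placement (the two-level directory split mimics squid's L1/L2 layout, but the exact byte-to-directory mapping below is invented):

```python
import hashlib

def swap_path(url, l1=16, l2=256):
    """Derive the on-disk path straight from the MD5 digest; the
    byte-to-directory mapping here is invented for illustration."""
    digest = hashlib.md5(url.encode("ascii")).digest()
    d1 = digest[0] % l1  # first-level directory
    d2 = digest[1] % l2  # second-level directory
    return "%02X/%02X/%s" % (d1, d2, digest.hex().upper())

print(swap_path("http://example.com/foobar.gif"))
```

One digest per URL, computed once, then reused for the ICP key and the file location.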

My question is rather:
Do we want to calculate the MD5 hash over only the URL, or do we want
to include the headers in the MD5'd contents? (This would allow us to
cache pages with cookies, for example.)
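The difference in miniature: hashing the URL alone versus folding selected headers into the digest (which headers to include is exactly the open question; the Cookie example is illustrative):

```python
import hashlib

def cache_key(url, vary_headers=None):
    """Hash the URL alone, or the URL plus selected request headers,
    so header-dependent responses (e.g. with cookies) get distinct keys."""
    h = hashlib.md5(url.encode("ascii"))
    for name in sorted(vary_headers or {}):
        h.update(("\n%s: %s" % (name.lower(), vary_headers[name])).encode("ascii"))
    return h.digest()

plain = cache_key("http://example.com/page")
with_cookie = cache_key("http://example.com/page", {"Cookie": "id=7"})
print(plain != with_cookie)  # True: the cookie yields a separate entry
```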

There are various 'political' advantages to not doing this now... most
of them hang around allowing the internet-drafts for ICP to
actually become standards, and make the world a better place. If we make
drastic changes to ICP now, this probably won't happen, and if we
bring out a new version of ICP very shortly after making the current
version a standard, we cause more problems. The good standards (NNTP,
FTP, HTTP, SMTP) haven't changed... almost ever. Of course, squid has always
been far ahead of the other cache software, and for the most part
hasn't been able to talk to them except through the
'lowest common denominator', HTTP.

Oskar
