> > If you want to search on URL regexes (e.g. blocking partial sites), that's
> > indeed a problem.
>
> Well, that's a problem no matter how you hash the list, but it isn't too
> difficult to take care of. Assuming you don't mind ugly data structures,
> that is...
The most efficient way to deal with this is to restrict wildcards to
a single directory or leaf name, i.e., don't allow them to match "/".
Then anchor your searches on the components of a path that are fixed.
Maybe allow a special notation (like /.../) to match multiple levels
in a URL. This should cut down the search space drastically,
especially if you use a hash table for the fixed components. That is,
don't check every pattern string against the URL; check every part of
the URL against all the strings at once in the hash table. Only when
that fails, check the patterns containing only wildcards and no fixed
components against each segment of the URL. You'd probably also want
a special-case hash table for extensions, as that's a cheap win for
little extra effort.
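
To make that concrete, here's a rough sketch of the scheme in Python,
operating on the path part of the URL only. This is not Squid code; the
pattern syntax, the "..." notation, and every name below are my own
assumptions.

    from fnmatch import fnmatchcase

    # Sketch of the anchored lookup: patterns are split on "/", "*" stays
    # inside one component, "..." may span any number of components.
    # Patterns with a fixed component are hashed on it, "*.ext" patterns
    # get their own extension table, and everything else falls into the
    # slow wildcard-only list.

    class UrlBlocklist:
        def __init__(self, patterns):
            self.by_fixed = {}   # fixed component -> patterns anchored on it
            self.by_ext = {}     # extension -> "*.ext"-style leaf patterns
            self.wild_only = []  # patterns with no fixed component at all
            for pat in patterns:
                segs = pat.strip("/").split("/")
                if len(segs) == 1 and segs[0].startswith("*."):
                    self.by_ext.setdefault(segs[0][2:], []).append(segs[0])
                    continue
                fixed = [s for s in segs if s != "..." and "*" not in s]
                if fixed:
                    self.by_fixed.setdefault(fixed[0], []).append(segs)
                else:
                    self.wild_only.append(segs)

        def blocked(self, path):
            segs = path.strip("/").split("/")
            leaf = segs[-1]
            # 1. extension table: one hash lookup on the leaf's extension
            if "." in leaf:
                ext = leaf.rsplit(".", 1)[1]
                if any(fnmatchcase(leaf, p) for p in self.by_ext.get(ext, [])):
                    return True
            # 2. look every URL component up in the hash table at once,
            #    instead of testing every pattern against the URL
            cands = [p for s in segs for p in self.by_fixed.get(s, [])]
            if any(match_path(p, segs) for p in cands):
                return True
            # 3. only when that fails, try the purely wildcarded patterns
            return any(match_path(p, segs) for p in self.wild_only)

    def match_path(pat, url):
        # Match pattern components against path components from the left;
        # an exhausted pattern matches any remainder of the path.
        if not pat:
            return True
        if pat[0] == "...":
            return any(match_path(pat[1:], url[i:]) for i in range(len(url) + 1))
        return bool(url) and fnmatchcase(url[0], pat[0]) and match_path(pat[1:], url[1:])

Used something like this (again, made-up patterns):

    blocklist = UrlBlocklist(["ads/.../banner*.gif", "*.exe", ".../counter.cgi"])
    blocklist.blocked("ads/2024/banner3.gif")   # True, anchored on "ads"
    blocklist.blocked("pub/setup.exe")          # True, via the extension table
    blocklist.blocked("docs/index.html")        # False

Host names could be handled the same way, with the dotted components of
the host going into the same fixed-component table.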
G
Received on Fri May 09 1997 - 22:56:59 MDT