This is a discussion on Re: Benchmarks - Theory ; On Nov 7, 10:54 am, Jerry Coffin <jcof...@taeus.com> wrote: > In article <87zlkbph9z....@nonospaz.fatphil.org>, > thefatphil_demun...@yahoo.co.uk says... > > [ ... hash table operations ] > > > The insert can be quite a complicated procedure. It might be > > amortized ...
On Nov 7, 10:54 am, Jerry Coffin <jcof...@taeus.com> wrote:
> In article <87zlkbph9z....@nonospaz.fatphil.org>,
> thefatphil_demun...@yahoo.co.uk says...
>
> [ ... hash table operations ]
>
> > The insert can be quite a complicated procedure. It might be
> > amortized O(1), but worst-case, it's O(epic fail).
>
> That depends on how the hash table is designed. It's entirely possible
> (for one example) to create a hash table that uses balanced trees for
> collision resolution. This gives a worst case of O(log N), which isn't
> really all that awful at all.
I like this scheme:
Say that you want a hash table of 2^20th elements...
Create a hash table of 2^20th elements,
Create a shadow hash table of 2^16th elements
Create a shadow hash table of 2^12th elements
Create a shadow hash table of 2^8th elements, each of which is a
skiplist holding the data type.
You form your 32 bit hash using some nice algorithm like Bob Jenkin's
hash or Goulburn hash.
You mask off the lower 20 bits and this is the address for the large
table.
If there is a collision, you mask off 4 more bits and use the second
largest table.
If there is a collision, you mask off 4 more bits and use the third
largest table.
If there is a collision you take the low byte of the hash and insert
into that skiplist.
If you get a lot more data than you expect, you construct a new larger
table of 2^24th elements and put it in the front as the next new
layer.
Of course, if you want, you can rehash the data each time, but I think
it is better to simply mask. That way, if you suddenly start growing
your skiplists and the first level tables are lightly populated, then
you can detect what is obviously a denial of service attack.