Objectmix
Tags Register Mark Forums Read

Benchmarks : c++

This is a discussion on Benchmarks within the c++ forums in Programming Languages category; Richard Harter wrote: > .... snip ... > > Just as a note, the common claim that hash table are O(1) per > access whereas search trees are O(log n) per access is > misleading, and is arrived at by an apples and oranges > comparison. The number of probes in a hash table is O(1) whereas > the number of probes in a search tree is O(log n). However the > cost of computing the hash code for independent keys is O(m) > where m is the average key length, which is necessarily greater > than log2(n). In comparison ...



c++ comp.lang.c++ usenet archive

Reply

 

LinkBack Thread Tools
  #21  
Old 11-07-2008, 03:56 AM
Junior Member
 
Join Date: Nov 2009
Posts: 0
Application Development is on a distinguished road
Default Re: Benchmarks

Richard Harter wrote:
>

.... snip ...
>
> Just as a note, the common claim that hash table are O(1) per
> access whereas search trees are O(log n) per access is
> misleading, and is arrived at by an apples and oranges
> comparison. The number of probes in a hash table is O(1) whereas
> the number of probes in a search tree is O(log n). However the
> cost of computing the hash code for independent keys is O(m)
> where m is the average key length, which is necessarily greater
> than log2(n). In comparison based trees the cost of the
> comparison is O(m), i.e., the probe cost is O(m). In radix based
> trees the probe cost is O(1). If O(m) = O(n) the execution costs
> per probe (ignoring access issues) are:
>
> hash table - O(log n)
> comparison tree - O((log n)^2)
> radix trees - O(log n)
>
> That is, hash tables and radix trees have the same order of
> performance with comparison trees a distant third.


I think you might be surprised. Try the implementation of
wdfreq.c, which is a part of my hashlib distribution, as a usage
demonstration. Diddle it to use your comparison tree methods, and
I think you will be shocked. For example, processing n869.txt, a
1.2 Meg text file, on a 450 Mhz machine, takes 0.88 seconds,
including loading the program, and shows the following statistics.
All i/o is to disk.

143209 words, 3910 entries, 227358 probes, 77381 misses

<http://cbfalconer.home.att.net/download/hashlib.zip>

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.
Reply With Quote
  #22  
Old 11-07-2008, 04:20 AM
Junior Member
 
Join Date: Nov 2009
Posts: 0
Application Development is on a distinguished road
Default Re: Benchmarks

Juha Nieminen <nospam@thanks.invalid> writes:
> user923005 wrote:
>> The hash table solution is O(n).

>
> You would have hard time proving that.


He shouldn't. It should be pretty clear. No hash table needs
to examine each element more than once.

Phil
--
We must respect the other fellow's religion, but only in the sense and to the
extent that we respect his theory that his wife is beautiful and his children
smart. -- Henry Louis Mencken (1880-1956), American editor and critic
Reply With Quote
  #23  
Old 11-07-2008, 04:25 AM
Junior Member
 
Join Date: Nov 2009
Posts: 0
Application Development is on a distinguished road
Default Re: Benchmarks

Nick Keighley wrote:
> CAREFUL ALGORITH SELECTION CAN CREAM MICRO-OPTIMISATION


That's so true. However, we shouldn't completely dismiss
"micro-optimization" and "hacker-optimization" once we have come
up with the fastest possible overall algorithm. Even though two
implementations might both by eg. O(n lg n) with a rather small
constant factor, one of them might still be 10 times faster than
the other if it performs clever low-level tricks inside the
innermost loop (which might be run eg. millions of times).

Of course when dealing with hacker optimization, there's always
a point where it's too detrimental to the readability and
maintainability of the code in relation to the speed benefit it
gives. If the code becomes significantly more obfuscated while
giving a speed boost of 1%, it might not be worth the trouble.
Personally I prefer to write 100 lines of clear, readable and
maintainable code, even if it's 1% slower than 1000 lines of code
worthy of the IOCCC.
Reply With Quote
  #24  
Old 11-07-2008, 04:27 AM
Junior Member
 
Join Date: Nov 2009
Posts: 0
Application Development is on a distinguished road
Default Re: Benchmarks

Phil Carmody <thefatphil_demunged@yahoo.co.uk> writes:
> Juha Nieminen <nospam@thanks.invalid> writes:
>> user923005 wrote:
>>> The hash table solution is O(n).

>>
>> You would have hard time proving that.

>
> He shouldn't. It should be pretty clear. No hash table needs
> to examine each element more than once.


Oooops, suffering from Falconeritis - but you did over-snip rather.
The whole task is indeed o(n^2), but omega(n).

Phil
--
We must respect the other fellow's religion, but only in the sense and to the
extent that we respect his theory that his wife is beautiful and his children
smart. -- Henry Louis Mencken (1880-1956), American editor and critic
Reply With Quote
  #25  
Old 11-07-2008, 04:32 AM
Junior Member
 
Join Date: Nov 2009
Posts: 0
Application Development is on a distinguished road
Default Re: Benchmarks

Juha Nieminen wrote:
> [...]
> Personally I prefer to write 100 lines of clear, readable and
> maintainable code, even if it's 1% slower than 1000 lines of code
> worthy of the IOCCC.


It's always easier to make a bug-free program fast than to
bug-fix a fast program.

Schobi
Reply With Quote
  #26  
Old 11-07-2008, 04:41 AM
Junior Member
 
Join Date: Nov 2009
Posts: 0
Application Development is on a distinguished road
Default Re: Benchmarks

user923005 wrote:
> Cuckoo hashing has guaranteed O(1) lookup and delete and amortized
> O(1) insert.


Wikipedia says: "Lookup requires inspection of just two locations in
the hash table, which takes constant time in the worst case".

I don't understand how that is possible. Suppose an element is
inserted into the table. Then later a new element displaces this first
element to a second location. Then later another new element, to be put
in this second location, displaces this first element to a third
location. How exactly do you find this first element anymore? It's not
in either of the two alternative locations defined by the two hashing
functions for that element.
Reply With Quote
  #27  
Old 11-07-2008, 04:56 AM
Junior Member
 
Join Date: Nov 2009
Posts: 0
Application Development is on a distinguished road
Default Re: Benchmarks

On Nov 7, 8:00 am, Jerry Coffin <jcof...@taeus.com> wrote:
> In article <49135fbe.322673...@news.sbtc.net>, c...@tiac.net says...


> [ ... ]


> > Just as a note, the common claim that hash table are O(1)
> > per access whereas search trees are O(log n) per access is
> > misleading, and is arrived at by an apples and oranges
> > comparison. The number of probes in a hash table is O(1)
> > whereas the number of probes in a search tree is O(log n).
> > However the cost of computing the hash code for independent
> > keys is O(m) where m is the average key length, which is
> > necessarily greater than log2(n). In comparison based trees
> > the cost of the comparison is O(m), i.e., the probe cost is
> > O(m). In radix based trees the probe cost is O(1). If O(m)
> > = O(n) the execution costs per probe (ignoring access
> > issues) are:


> Keeping in mind, however, that m and n will usually be quite a
> bit difference, so any similarity between O(m) and O(n) is
> purely coincidental. In reality, the performance of hash
> tables tends to be rather different in general than the
> performance of tree-based systems (radix or comparison).


> > hash table - O(log n)
> > comparison tree - O((log n)^2)
> > radix trees - O(log n)


> > That is, hash tables and radix trees have the same order of
> > performance with comparison trees a distant third.


Funny that no one has mentionned it, but the performance of a
hash table is very, very dependent on the quality of the hashing
function. With a perfect hashing function, the access time is
O(1), but with a poor hashing function, it can be significantly
worse---even to the point of O(n). Unless you're sure that you
can define a good hashing function, you're probably better off
with a balanced tree. (The radix tree is an interesting
suggestion, often ignored. Given the sparseness of the entries,
it may result in a significant lack of locality and a too many
cache misses. But given typical word length, this isn't
certain, and it has other advantages, like not requiring the
word to be actually assembled in memory.)

> Note, however, that there are factors that don't show up in a
> big-O comparison that can still be quite substantial,
> especially for collectiosn of reasonble sizes. For one obvious
> one, consider the common case (like this one) where the keys
> are strings. Hashing the string requires looking at the whole
> string, but comparing two strings will often look at _far_
> fewer -- for the first few probes, it'll typically only
> involve looking at one or two characters.


> From a theoretical viewpoint, this effect is negligible -- but
> given the number of words in a typical vocabulary, it can be
> quite substantial.


I've done some actual measures (some time back), see
http://kanze.james.neuf.fr/code/Docs/html/Hashcode.html. In
this case, I only measured access, not insertion, but for the
list of words in the ispell French dictionary, I found that a
hash table with a good hashing function (FNV or my Mersenne
prime algorithm) was better than three times faster; using a
list of actual program symbols, the difference was still better
than 2.5 (the French word list contained 94883 different words,
the program symbols 15047).

I've since added some additional hashing algorithms, and tested
on differernt machines (which does make a difference), but I've
not gotten the results written up. I think I'll add an
implementation using a radix tree, just for good measure.

--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Reply With Quote
  #28  
Old 11-07-2008, 05:03 AM
Junior Member
 
Join Date: Nov 2009
Posts: 0
Application Development is on a distinguished road
Default Re: Benchmarks

On Nov 7, 10:25 am, Juha Nieminen <nos...@thanks.invalid> wrote:
> Nick Keighley wrote:
> > CAREFUL ALGORITH SELECTION CAN CREAM MICRO-OPTIMISATION


> That's so true. However, we shouldn't completely dismiss
> "micro-optimization" and "hacker-optimization" once we have
> come up with the fastest possible overall algorithm.


Yes, but this should only be undertaken when necessary, under
the control of a profiler---what optimizes on one platform may
slow down on another. And don't forget that a good compiler
will do many of these optimizations for you. A long time ago, I
tried replacing "h = 127 * h + *p" (in a hashing algorithm) with
"h = (h << 7 - h) + *p", knowing that my hardware didn't have
hardware multiply. The results were slower than the original;
the compiler also converted the multiplication into a shift and
subtract, and since it knew the final purpose in those
operations, was actually able to save one machine instruction
over an explicit shift and subtract.

> Even though two implementations might both by eg. O(n lg n)
> with a rather small constant factor, one of them might still
> be 10 times faster than the other if it performs clever
> low-level tricks inside the innermost loop (which might be run
> eg. millions of times).


I've never seen a factor of 10 here. Even a factor of 2 is
exceptional (and usually indicative of a very poor optimizer in
the compiler).

--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Reply With Quote
  #29  
Old 11-07-2008, 05:06 AM
Junior Member
 
Join Date: Nov 2009
Posts: 0
Application Development is on a distinguished road
Default Re: Benchmarks

On Nov 7, 4:51*am, c...@tiac.net (Richard Harter) wrote:
> Just as a note, the common claim that hash table are O(1) per
> access whereas search trees are O(log n) per access is
> misleading, ...computing the hash code ...
> is necessarily greater
> than log2(n)....
> hash table * * *- O(log n)


Note that, for sufficiently pedantic O(.)
estimates, many of the usual O(.) measures are wrong.
A single arithmetic op may be considered to require
O(log N) time since log N bits are needed to represent
an index on a set of size N.

Returning to OP's problem, I wonder: are Compact
Hash Trees are in wide use? If there's interest
I'll post my implementation.

James Dow Allen
Reply With Quote
  #30  
Old 11-07-2008, 05:51 AM
Junior Member
 
Join Date: Nov 2009
Posts: 0
Application Development is on a distinguished road
Default Re: Benchmarks

James Kanze wrote:
>> Even though two implementations might both by eg. O(n lg n)
>> with a rather small constant factor, one of them might still
>> be 10 times faster than the other if it performs clever
>> low-level tricks inside the innermost loop (which might be run
>> eg. millions of times).

>
> I've never seen a factor of 10 here. Even a factor of 2 is
> exceptional (and usually indicative of a very poor optimizer in
> the compiler).


There's just so much a compiler can do. There are cases where only the
programmer can understand what's going on (rather than the compiler) and
be able to optimize something.

For example, comparing two strings for inequality is O(1) if the
maximum size of the string is fixed (eg. you know that no string has
more than 30 characters), but still much slower than comparing two
integers. If you can somehow change the string comparison to an integer
comparison, and these comparisons are performed millions of times in the
innermost loop of your code, the program may speed up considerably. This
is a kind of optimization which is completely out or the realm of
compiler optimizations.
Reply With Quote
Reply

Thread Tools



All times are GMT -5. The time now is 01:58 AM.