Benchmarks : c++
This is a discussion on Benchmarks within the c++ forums in Programming Languages category; Richard Harter wrote: > .... snip ... > > Just as a note, the common claim that hash table are O(1) per > access whereas search trees are O(log n) per access is > misleading, and is arrived at by an apples and oranges > comparison. The number of probes in a hash table is O(1) whereas > the number of probes in a search tree is O(log n). However the > cost of computing the hash code for independent keys is O(m) > where m is the average key length, which is necessarily greater > than log2(n). In comparison ...
| c++ comp.lang.c++ usenet archive |
![]() |
| | LinkBack | Thread Tools |
|
#21
| |||
| |||
| > .... snip ... > > Just as a note, the common claim that hash table are O(1) per > access whereas search trees are O(log n) per access is > misleading, and is arrived at by an apples and oranges > comparison. The number of probes in a hash table is O(1) whereas > the number of probes in a search tree is O(log n). However the > cost of computing the hash code for independent keys is O(m) > where m is the average key length, which is necessarily greater > than log2(n). In comparison based trees the cost of the > comparison is O(m), i.e., the probe cost is O(m). In radix based > trees the probe cost is O(1). If O(m) = O(n) the execution costs > per probe (ignoring access issues) are: > > hash table - O(log n) > comparison tree - O((log n)^2) > radix trees - O(log n) > > That is, hash tables and radix trees have the same order of > performance with comparison trees a distant third. I think you might be surprised. Try the implementation of wdfreq.c, which is a part of my hashlib distribution, as a usage demonstration. Diddle it to use your comparison tree methods, and I think you will be shocked. For example, processing n869.txt, a 1.2 Meg text file, on a 450 Mhz machine, takes 0.88 seconds, including loading the program, and shows the following statistics. All i/o is to disk. 143209 words, 3910 entries, 227358 probes, 77381 misses <http://cbfalconer.home.att.net/download/hashlib.zip> -- [mail]: Chuck F (cbfalconer at maineline dot net) [page]: <http://cbfalconer.home.att.net> Try the download section. |
|
#22
| |||
| |||
| Juha Nieminen <nospam@thanks.invalid> writes: > user923005 wrote: >> The hash table solution is O(n). > > You would have hard time proving that. He shouldn't. It should be pretty clear. No hash table needs to examine each element more than once. Phil -- We must respect the other fellow's religion, but only in the sense and to the extent that we respect his theory that his wife is beautiful and his children smart. -- Henry Louis Mencken (1880-1956), American editor and critic |
|
#23
| |||
| |||
| Nick Keighley wrote: > CAREFUL ALGORITH SELECTION CAN CREAM MICRO-OPTIMISATION That's so true. However, we shouldn't completely dismiss "micro-optimization" and "hacker-optimization" once we have come up with the fastest possible overall algorithm. Even though two implementations might both by eg. O(n lg n) with a rather small constant factor, one of them might still be 10 times faster than the other if it performs clever low-level tricks inside the innermost loop (which might be run eg. millions of times). Of course when dealing with hacker optimization, there's always a point where it's too detrimental to the readability and maintainability of the code in relation to the speed benefit it gives. If the code becomes significantly more obfuscated while giving a speed boost of 1%, it might not be worth the trouble. Personally I prefer to write 100 lines of clear, readable and maintainable code, even if it's 1% slower than 1000 lines of code worthy of the IOCCC. |
|
#24
| |||
| |||
| Phil Carmody <thefatphil_demunged@yahoo.co.uk> writes: > Juha Nieminen <nospam@thanks.invalid> writes: >> user923005 wrote: >>> The hash table solution is O(n). >> >> You would have hard time proving that. > > He shouldn't. It should be pretty clear. No hash table needs > to examine each element more than once. Oooops, suffering from Falconeritis - but you did over-snip rather. The whole task is indeed o(n^2), but omega(n). Phil -- We must respect the other fellow's religion, but only in the sense and to the extent that we respect his theory that his wife is beautiful and his children smart. -- Henry Louis Mencken (1880-1956), American editor and critic |
|
#25
| |||
| |||
| Juha Nieminen wrote: > [...] > Personally I prefer to write 100 lines of clear, readable and > maintainable code, even if it's 1% slower than 1000 lines of code > worthy of the IOCCC. It's always easier to make a bug-free program fast than to bug-fix a fast program. ![]() Schobi |
|
#26
| |||
| |||
| user923005 wrote: > Cuckoo hashing has guaranteed O(1) lookup and delete and amortized > O(1) insert. Wikipedia says: "Lookup requires inspection of just two locations in the hash table, which takes constant time in the worst case". I don't understand how that is possible. Suppose an element is inserted into the table. Then later a new element displaces this first element to a second location. Then later another new element, to be put in this second location, displaces this first element to a third location. How exactly do you find this first element anymore? It's not in either of the two alternative locations defined by the two hashing functions for that element. |
|
#27
| |||
| |||
| On Nov 7, 8:00 am, Jerry Coffin <jcof...@taeus.com> wrote: > In article <49135fbe.322673...@news.sbtc.net>, c...@tiac.net says... > [ ... ] > > Just as a note, the common claim that hash table are O(1) > > per access whereas search trees are O(log n) per access is > > misleading, and is arrived at by an apples and oranges > > comparison. The number of probes in a hash table is O(1) > > whereas the number of probes in a search tree is O(log n). > > However the cost of computing the hash code for independent > > keys is O(m) where m is the average key length, which is > > necessarily greater than log2(n). In comparison based trees > > the cost of the comparison is O(m), i.e., the probe cost is > > O(m). In radix based trees the probe cost is O(1). If O(m) > > = O(n) the execution costs per probe (ignoring access > > issues) are: > Keeping in mind, however, that m and n will usually be quite a > bit difference, so any similarity between O(m) and O(n) is > purely coincidental. In reality, the performance of hash > tables tends to be rather different in general than the > performance of tree-based systems (radix or comparison). > > hash table - O(log n) > > comparison tree - O((log n)^2) > > radix trees - O(log n) > > That is, hash tables and radix trees have the same order of > > performance with comparison trees a distant third. Funny that no one has mentionned it, but the performance of a hash table is very, very dependent on the quality of the hashing function. With a perfect hashing function, the access time is O(1), but with a poor hashing function, it can be significantly worse---even to the point of O(n). Unless you're sure that you can define a good hashing function, you're probably better off with a balanced tree. (The radix tree is an interesting suggestion, often ignored. Given the sparseness of the entries, it may result in a significant lack of locality and a too many cache misses. But given typical word length, this isn't certain, and it has other advantages, like not requiring the word to be actually assembled in memory.) > Note, however, that there are factors that don't show up in a > big-O comparison that can still be quite substantial, > especially for collectiosn of reasonble sizes. For one obvious > one, consider the common case (like this one) where the keys > are strings. Hashing the string requires looking at the whole > string, but comparing two strings will often look at _far_ > fewer -- for the first few probes, it'll typically only > involve looking at one or two characters. > From a theoretical viewpoint, this effect is negligible -- but > given the number of words in a typical vocabulary, it can be > quite substantial. I've done some actual measures (some time back), see http://kanze.james.neuf.fr/code/Docs/html/Hashcode.html. In this case, I only measured access, not insertion, but for the list of words in the ispell French dictionary, I found that a hash table with a good hashing function (FNV or my Mersenne prime algorithm) was better than three times faster; using a list of actual program symbols, the difference was still better than 2.5 (the French word list contained 94883 different words, the program symbols 15047). I've since added some additional hashing algorithms, and tested on differernt machines (which does make a difference), but I've not gotten the results written up. I think I'll add an implementation using a radix tree, just for good measure. -- James Kanze (GABI Software) email:james.kanze@gmail.com Conseils en informatique orientée objet/ Beratung in objektorientierter Datenverarbeitung 9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34 |
|
#28
| |||
| |||
| On Nov 7, 10:25 am, Juha Nieminen <nos...@thanks.invalid> wrote: > Nick Keighley wrote: > > CAREFUL ALGORITH SELECTION CAN CREAM MICRO-OPTIMISATION > That's so true. However, we shouldn't completely dismiss > "micro-optimization" and "hacker-optimization" once we have > come up with the fastest possible overall algorithm. Yes, but this should only be undertaken when necessary, under the control of a profiler---what optimizes on one platform may slow down on another. And don't forget that a good compiler will do many of these optimizations for you. A long time ago, I tried replacing "h = 127 * h + *p" (in a hashing algorithm) with "h = (h << 7 - h) + *p", knowing that my hardware didn't have hardware multiply. The results were slower than the original; the compiler also converted the multiplication into a shift and subtract, and since it knew the final purpose in those operations, was actually able to save one machine instruction over an explicit shift and subtract. > Even though two implementations might both by eg. O(n lg n) > with a rather small constant factor, one of them might still > be 10 times faster than the other if it performs clever > low-level tricks inside the innermost loop (which might be run > eg. millions of times). I've never seen a factor of 10 here. Even a factor of 2 is exceptional (and usually indicative of a very poor optimizer in the compiler). -- James Kanze (GABI Software) email:james.kanze@gmail.com Conseils en informatique orientée objet/ Beratung in objektorientierter Datenverarbeitung 9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34 |
|
#29
| |||
| |||
| On Nov 7, 4:51*am, c...@tiac.net (Richard Harter) wrote: > Just as a note, the common claim that hash table are O(1) per > access whereas search trees are O(log n) per access is > misleading, ...computing the hash code ... > is necessarily greater > than log2(n).... > hash table * * *- O(log n) Note that, for sufficiently pedantic O(.) estimates, many of the usual O(.) measures are wrong. A single arithmetic op may be considered to require O(log N) time since log N bits are needed to represent an index on a set of size N. Returning to OP's problem, I wonder: are Compact Hash Trees are in wide use? If there's interest I'll post my implementation. James Dow Allen |
|
#30
| |||
| |||
| James Kanze wrote: >> Even though two implementations might both by eg. O(n lg n) >> with a rather small constant factor, one of them might still >> be 10 times faster than the other if it performs clever >> low-level tricks inside the innermost loop (which might be run >> eg. millions of times). > > I've never seen a factor of 10 here. Even a factor of 2 is > exceptional (and usually indicative of a very poor optimizer in > the compiler). There's just so much a compiler can do. There are cases where only the programmer can understand what's going on (rather than the compiler) and be able to optimize something. For example, comparing two strings for inequality is O(1) if the maximum size of the string is fixed (eg. you know that no string has more than 30 characters), but still much slower than comparing two integers. If you can somehow change the string comparison to an integer comparison, and these comparisons are performed millions of times in the innermost loop of your code, the program may speed up considerably. This is a kind of optimization which is completely out or the realm of compiler optimizations. |




