| Register | FAQ | Calendar | Search | Today's Posts | Mark Forums Read |
|
#1
| |||
| |||
| Hi All, It is common knowledge that set-like containers in STL are implemented using red-black trees. However, Ben Pfaff showed of Stanford showed in his paper: http://www.stanford.edu/~blp/papers/libavl.pdf ....that, not suprisingly, red-black implementations, while good overall, are not optimal for all situations. Sometimes an AVL implementation will perform better [input comes in sorted, later random access], and sometimes regular BST is best [input comes in randomly]. Faced with implementing your own set-like container, would you: 1. Do what STL did: Pick one, ignore the rest, and get on with it. 2. Create compile-time framework to choose among implementations according to anticipated use. 3. Create run-time framework to choose among implementations, according to anticipated use. I recall somewhere on the WWW an author following #3, but cannot find the code. The mechanism I use now to get at various possible implementations is embarrassing: I prefix Set<> with a namespace for the implementation: AVL::Set<>. I do have abstract base for all of them, which I use when only an interface is needed. Yes, I know this is yuck. I would rather just write Set<>, but then I would have to commit to a specific implementation unless I use schemes #2 or #3. What would you do? -Le Chaud Lapin- -- [ See http://www.gotw.ca/resources/clcm.htm for info about ] [ comp.lang.c++.moderated. First time posters: Do this! ] |
|
#2
| |||
| |||
| On 31 ago, 05:55, Le Chaud Lapin <jaibudu...@gmail.com> wrote: > The mechanism I use now to get at various possible implementations is > embarrassing: I prefix Set<> with a namespace for the implementation: > AVL::Set<>. I do have abstract base for all of them, which I use when > only an interface is needed. Yes, I know this is yuck. I would rather > just write Set<>, but then I would have to commit to a specific > implementation unless I use schemes #2 or #3. > > What would you do? Maybe use a "set policy" with a default value? template <typename T, typename Policy = DefaultSet> class set; -- P. -- [ See http://www.gotw.ca/resources/clcm.htm for info about ] [ comp.lang.c++.moderated. First time posters: Do this! ] |
|
#3
| |||
| |||
| On 31 Aug, 10:55, Le Chaud Lapin <jaibudu...@gmail.com> wrote: > Hi All, > > It is common knowledge that set-like containers in STL are implemented > using red-black trees. However, Ben Pfaff showed of Stanford showed > in his paper: > > http://www.stanford.edu/~blp/papers/libavl.pdf > > ...that, not suprisingly, red-black implementations, while good > overall, are not optimal for all situations. Sometimes an AVL > implementation will perform better [input comes in sorted, later > random access], and sometimes regular BST is best [input comes in > randomly]. .... > What would you do? The line of arguments I prefer to use in my day job is 1) Start out with what 'everybody else' do 2) Check out how well the 'standard' way works 3) Test alternatives 4) If you can demonstrate that some 'non-standard' method *both* works better *and* that the reasons why are properties of the problem scope (as opposed to some coincidense in your particular test run), use the alternative way. Which is basically a roundabout way to say that 'use the usual/standard/conventional/accepted stuff unless you have specific, justifyable reasons not to.' I prefer to stick with the conservative standard stuff because it becomes a lot easier to identify the exceptional cases when the baseline is fixed. If everybody uses individual preferences when doing standard stuff it becomes difficult to recognize what fluctuations are caused by inherent properties of the individual problem and what are caused by the operator's (or in this case, programmer's) personal methods of work. Rune -- [ See http://www.gotw.ca/resources/clcm.htm for info about ] [ comp.lang.c++.moderated. First time posters: Do this! ] |
|
#4
| |||
| |||
| On Aug 31, 4:55 am, Le Chaud Lapin <jaibudu...@gmail.com> wrote: > 1. Do what STL did: Pick one, ignore the rest, and get on with it. This is not what "STL" did. In fact, the original STL libraries had no containers at all. When the STL was adapted as a C++ standard, some containers were added. These are the out of the box containers you get for free. These are the C++ standard library containers, which adhere to STL design guidelines. Of course, the idea is to use these as a default, and if you need something better, to design your own container that also adheres to STL guidelines. > 2. Create compile-time framework to choose among implementations > according to anticipated use. This is pretty easy. If the BST, splay, etc sets that the same identical interface as std::set, well, you could just switch between the implementations via a typedef. This case is what the STL was designed for. > 3. Create run-time framework to choose among implementations, > according to anticipated use. This is a LOT of work. The runtime would have to choose amongst implementations before they are actually used. SO the first run it would pick one, and log statitics. Then on the second try, look at those statistics and guess again. It is similar to how Java does runtime profiling. And that was a lot of work to get right. Lance -- [ See http://www.gotw.ca/resources/clcm.htm for info about ] [ comp.lang.c++.moderated. First time posters: Do this! ] |
|
#5
| |||
| |||
| On 2008-08-31 10:55, Le Chaud Lapin wrote: > Hi All, > > It is common knowledge that set-like containers in STL are implemented > using red-black trees. However, Ben Pfaff showed of Stanford showed > in his paper: > > http://www.stanford.edu/~blp/papers/libavl.pdf > > ...that, not suprisingly, red-black implementations, while good > overall, are not optimal for all situations. Sometimes an AVL > implementation will perform better [input comes in sorted, later > random access], and sometimes regular BST is best [input comes in > randomly]. > > Faced with implementing your own set-like container, would you: > > 1. Do what STL did: Pick one, ignore the rest, and get on with it. > 2. Create compile-time framework to choose among implementations > according to anticipated use. > 3. Create run-time framework to choose among implementations, > according to anticipated use. > > I recall somewhere on the WWW an author following #3, but cannot find > the code. > > The mechanism I use now to get at various possible implementations is > embarrassing: I prefix Set<> with a namespace for the implementation: > AVL::Set<>. I do have abstract base for all of them, which I use when > only an interface is needed. Yes, I know this is yuck. I would rather > just write Set<>, but then I would have to commit to a specific > implementation unless I use schemes #2 or #3. I have not studies the subject in great detail so the following might not be entirely correct, and there might be ways to solve some of the issues I see with more advanced template programming or using C++0x features. I can see advantages of all three options, #1 has the advantage of being simple and easy to use for less knowledgeable users. On the other hand the same effect be be achieved by using default arguments in #2 and #3. #2 would be a small extension of the approach taken by STL, where the tree-implementation is an additional template-parameter. The drawback of this would be that Set<int, AVLTree> and Set<int, RBTree> would be different types and you would either have to parameterise functions that can expect to handle both types. Or you could use a common interface which they all inherit from, but then you will have problems using the container with anything except references/pointers. #3 is nice because you can create a Tree-interface which your RB, AVL, BST, etc. implements and then use something similar to PIMPL. I'm not sure that there's a way to make creating such a container look nice unless you want the default implementation since it will probably involve passing a pointer to the tree-object, but there might be a way to work around that. -- Erik Wikström [ See http://www.gotw.ca/resources/clcm.htm for info about ] [ comp.lang.c++.moderated. First time posters: Do this! ] |
|
#6
| |||
| |||
| Le Chaud Lapin wrote: > Hi All, > > It is common knowledge that set-like containers in STL are implemented > using red-black trees. However, Ben Pfaff showed of Stanford showed > in his paper: > > http://www.stanford.edu/~blp/papers/libavl.pdf > > ...that, not suprisingly, red-black implementations, while good > overall, are not optimal for all situations. Sometimes an AVL > implementation will perform better [input comes in sorted, later > random access], and sometimes regular BST is best [input comes in > randomly]. > > Faced with implementing your own set-like container, would you: > > 1. Do what STL did: Pick one, ignore the rest, and get on with it. > 2. Create compile-time framework to choose among implementations > according to anticipated use. > 3. Create run-time framework to choose among implementations, > according to anticipated use. > > I recall somewhere on the WWW an author following #3, but cannot find > the code. > > The mechanism I use now to get at various possible implementations is > embarrassing: I prefix Set<> with a namespace for the implementation: > AVL::Set<>. I do have abstract base for all of them, which I use when > only an interface is needed. Yes, I know this is yuck. I would rather > just write Set<>, but then I would have to commit to a specific > implementation unless I use schemes #2 or #3. > > What would you do? > > -Le Chaud Lapin- > This depends on the situation. If I were writing an application where there was only one, well-known usage of the container, then I would specify the type of container required at compile-time. If I were writing a general-purpose container like std::set, however, I would make it "adaptive". That is, it would initially use the implementation which is considered to be best, in general, but it would keep track of its usage and possibly change the underlying implementation at runtime, if an alternative implementation were better. I think the scheme that you have right now is good, since it allows the user to choose a specific implementation, depending on their needs. The only improvement I would make would be to create Adaptive::Set, and to make Adaptive::Set the default implementation of Set. - Michael Safyan -- [ See http://www.gotw.ca/resources/clcm.htm for info about ] [ comp.lang.c++.moderated. First time posters: Do this! ] |
|
#7
| |||
| |||
| On Aug 31, 5:55 pm, Le Chaud Lapin <jaibudu...@gmail.com> wrote: > Faced with implementing your own set-like container, would you: > > 1. Do what STL did: Pick one, ignore the rest, and get on with it. > 2. Create compile-time framework to choose among implementations > according to anticipated use. > 3. Create run-time framework to choose among implementations, > according to anticipated use. I'm sure most of us are generally happy with 1, myself included. If you do need to create multiple implementations, and you're planning on passing them around in very large C++ systems, then you run into some practical issues, e.g.: - what would have been void fn(std::set<T>& x) needs to become one of: - template <class Set> void fn(Set& x) - fast but bloated, especially for fn(Set1&, Set2&, Set3&...) - compilation dependency / set implementation must be in header OR - void fn(Abstract_Set<T>& x) - virtual dispatch - not bloated - sometimes slow (out of line functions can be an order of magnitude slower) - can put any common functions/members (maybe "size") directly in base - one interesting technique is on-demand hand-off from concrete to virtual dispatch: class Set__X { public: class Accessor : public Abstract_Set { public: Accessor(Set& s) : s_(s) { } ... insert(...) { return s_.insert(...); } ... private: Set& s_; } operator Accessor() { return Accessor(*this); } ... }; OR - targeted overloading void fn(Set1& x) void fn(Set2& x) void fn(Set3& x) ... for some smaller number of Sets that span the storage layouts (but not necessarily all the behavioural policies) - automatic conversion is easier when the sets are implemented as alternative instantiations of one class OR - void fn(Discriminated_Set& x) - each concrete set derive from struct Discriminated_Set { enum Type { BST, AVL, RB, Splay }; Type type_; }; - manual run-time "switch (x.type_())" between implementations - fn needs to be written with knowledge of all possible implementations - lots of manual work to implement / maintain - error prone - may be faster than virtual dispatch for small inline-able functions - can be selectively combined with virtual dispatch OR - on-the-fly data conversion class Set__X { ... Set__X(const Set__Y&); ... } - Set__X fn(const Set__X& in); - conversion cost dominant unless fn performs a lot of operations! OR - whatever's escaped my off-the-top-of-my-head-on-a-Monday-morning list As noted, making the templates instantiations of one template helps to control and automate the implementation of some of these approaches. You need a problematically-long-running major app. very tightly bound to the set's performance to make any of this worthwhile. Run-time adaptive containers not only need time to select BST/AVL/ Splay/RB, but also need per-function-call function-pointer/switch between implementations. This may be more overhead than any subsequent performance improvement is worth. Manually timing performance and recompiling is much simpler and adequate in most cases. Cheers, Tony -- [ See http://www.gotw.ca/resources/clcm.htm for info about ] [ comp.lang.c++.moderated. First time posters: Do this! ] |
|
#8
| |||
| |||
| On Aug 31, 4:55 am, Le Chaud Lapin <jaibudu...@gmail.com> wrote: > > ...that, not suprisingly, red-black implementations, while good > overall, are not optimal for all situations. Sometimes an AVL > implementation will perform better [input comes in sorted, later > random access], and sometimes regular BST is best [input comes in > randomly]. > I think the correct way to handle the sorted input case is to create a separate function for it, since you can create an RB-Tree from a sorted sequence in linear time, which is almost certainly faster than an AVL Tree. As for the "it comes in pretty much randomly" case, it seems more like a fundamental program design decision whether you need real-time O(log n), and thus will use an RB-tree that may perform worse, or amortized O(log n) is sufficient, and using a possibly-faster Splay tree is acceptable. So I think 1 is my choice, with the caveat that additional per- complexity classes exist, and can be used by changing typedefs (which is, in a way, the same thing as 2, I suppose). With TR1 adding an unordered_set, having an amortized_set (based on a splay tree) as well seems reasonable. Perhaps a linear_set (based on a sorted vector) as well, where you can swap in an (assumed unsorted) vector, for probably the best speed in the "insertions only, then lookups only" case. ~ Scott -- [ See http://www.gotw.ca/resources/clcm.htm for info about ] [ comp.lang.c++.moderated. First time posters: Do this! ] |
|
#9
| |||
| |||
| > What would you do? > > -Le Chaud Lapin- Perhaps the best answer -- find someone who has already implemented it, an use their code http://gcc.gnu.org/onlinedocs/libstd...ontainers.html Lance -- [ See http://www.gotw.ca/resources/clcm.htm for info about ] [ comp.lang.c++.moderated. First time posters: Do this! ] |
|
#10
| |||
| |||
| Le Chaud Lapin wrote: [...] > Faced with implementing your own set-like container, would you: > > 1. Do what STL did: Pick one, ignore the rest, and get on with it. > 2. Create compile-time framework to choose among implementations > according to anticipated use. > 3. Create run-time framework to choose among implementations, > according to anticipated use. 4. Use std::vector, optionally sorting it. IO for filling it is always far slower than container operations and the number of elements is typically so small that while tree-based containers do scale better, the small vector still performs better, both in terms of speed and memory use. Using a binary search (not std::binary_search) on a sorted vector is actually faster than a set because of the CPU cache footprint and locality of reference. BTW: I'm not actually sure what you mean with version 2, do you mean you determine the best type during build (i.e. via a test that runs first and then sets a macro for the real build) or that you hard-code it manually? Anyway, if you actually do have a case where you have many objects and care a lot for speed, you need to set up a test that allows quantifying the differences. I personally would do that externally and then select the proper container at compile time, so one of the two interpretations of method 2 above. Uli -- Sator Laser GmbH Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932 [ See http://www.gotw.ca/resources/clcm.htm for info about ] [ comp.lang.c++.moderated. First time posters: Do this! ] |
![]() |
| Thread Tools | |
| Display Modes | |
In an effort to better serve ads to our visitors, cookies are used on objectmix.com. For more information, check out our Privacy Policy.