"Hi. I'm A Set, Implemented As {BST | AVL | Splay | Red-Black}."

This is a discussion on "Hi. I'm A Set, Implemented As {BST | AVL | Splay | Red-Black}." within the c++ forums in Programming Languages category; Hi All, It is common knowledge that set-like containers in STL are implemented using red-black trees. However, Ben Pfaff showed of Stanford showed in his paper: http://www.stanford.edu/~blp/papers/libavl.pdf ....that, not suprisingly, red-black implementations, while good overall, are not optimal for all situations. Sometimes an AVL implementation will perform better [input comes in sorted, later random access], and sometimes regular BST is best [input comes in randomly]. Faced with implementing your own set-like container, would you: 1. Do what STL did: Pick one, ignore the rest, and get on with it. 2. Create compile-time framework to choose among implementations according to anticipated ...

Go Back   Application Development Forum > Programming Languages > c++

Object Mix

Register FAQ Calendar Search Today's Posts Mark Forums Read
  #1  
Old 08-31-2008, 04:55 AM
Le Chaud Lapin
Guest
 
Default "Hi. I'm A Set, Implemented As {BST | AVL | Splay | Red-Black}."

Hi All,

It is common knowledge that set-like containers in STL are implemented
using red-black trees. However, Ben Pfaff showed of Stanford showed
in his paper:

http://www.stanford.edu/~blp/papers/libavl.pdf

....that, not suprisingly, red-black implementations, while good
overall, are not optimal for all situations. Sometimes an AVL
implementation will perform better [input comes in sorted, later
random access], and sometimes regular BST is best [input comes in
randomly].

Faced with implementing your own set-like container, would you:

1. Do what STL did: Pick one, ignore the rest, and get on with it.
2. Create compile-time framework to choose among implementations
according to anticipated use.
3. Create run-time framework to choose among implementations,
according to anticipated use.

I recall somewhere on the WWW an author following #3, but cannot find
the code.

The mechanism I use now to get at various possible implementations is
embarrassing: I prefix Set<> with a namespace for the implementation:
AVL::Set<>. I do have abstract base for all of them, which I use when
only an interface is needed. Yes, I know this is yuck. I would rather
just write Set<>, but then I would have to commit to a specific
implementation unless I use schemes #2 or #3.

What would you do?

-Le Chaud Lapin-

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Reply With Quote
  #2  
Old 08-31-2008, 06:53 PM
Pedro Lamarão
Guest
 
Default Re: "Hi. I'm A Set, Implemented As {BST | AVL | Splay | Red-Black}."

On 31 ago, 05:55, Le Chaud Lapin <jaibudu...@gmail.com> wrote:

> The mechanism I use now to get at various possible implementations is
> embarrassing: I prefix Set<> with a namespace for the implementation:
> AVL::Set<>. I do have abstract base for all of them, which I use when
> only an interface is needed. Yes, I know this is yuck. I would rather
> just write Set<>, but then I would have to commit to a specific
> implementation unless I use schemes #2 or #3.
>
> What would you do?


Maybe use a "set policy" with a default value?

template <typename T, typename Policy = DefaultSet>
class set;

--
P.


--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Reply With Quote
  #3  
Old 08-31-2008, 06:54 PM
Rune Allnor
Guest
 
Default Re: "Hi. I'm A Set, Implemented As {BST | AVL | Splay | Red-Black}."

On 31 Aug, 10:55, Le Chaud Lapin <jaibudu...@gmail.com> wrote:
> Hi All,
>
> It is common knowledge that set-like containers in STL are implemented
> using red-black trees. However, Ben Pfaff showed of Stanford showed
> in his paper:
>
> http://www.stanford.edu/~blp/papers/libavl.pdf
>
> ...that, not suprisingly, red-black implementations, while good
> overall, are not optimal for all situations. Sometimes an AVL
> implementation will perform better [input comes in sorted, later
> random access], and sometimes regular BST is best [input comes in
> randomly].

....
> What would you do?


The line of arguments I prefer to use in my day job is

1) Start out with what 'everybody else' do
2) Check out how well the 'standard' way works
3) Test alternatives
4) If you can demonstrate that some 'non-standard'
method *both* works better *and* that the reasons
why are properties of the problem scope (as opposed
to some coincidense in your particular test run),
use the alternative way.

Which is basically a roundabout way to say that 'use the
usual/standard/conventional/accepted stuff unless you have
specific, justifyable reasons not to.'

I prefer to stick with the conservative standard stuff
because it becomes a lot easier to identify the exceptional
cases when the baseline is fixed. If everybody uses
individual preferences when doing standard stuff it
becomes difficult to recognize what fluctuations are
caused by inherent properties of the individual problem
and what are caused by the operator's (or in this case,
programmer's) personal methods of work.

Rune


--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Reply With Quote
  #4  
Old 08-31-2008, 06:57 PM
Lance Diduck
Guest
 
Default Re: "Hi. I'm A Set, Implemented As {BST | AVL | Splay | Red-Black}."

On Aug 31, 4:55 am, Le Chaud Lapin <jaibudu...@gmail.com> wrote:


> 1. Do what STL did: Pick one, ignore the rest, and get on with it.

This is not what "STL" did. In fact, the original STL libraries had no
containers at all. When the STL was adapted as a C++ standard, some
containers were added. These are the out of the box containers you get
for free. These are the C++ standard library containers, which adhere
to STL design guidelines.
Of course, the idea is to use these as a default, and if you need
something better, to design your own container that also adheres to
STL guidelines.

> 2. Create compile-time framework to choose among implementations
> according to anticipated use.

This is pretty easy. If the BST, splay, etc sets that the same
identical interface as std::set, well, you could just switch between
the implementations via a typedef. This case is what the STL was
designed for.

> 3. Create run-time framework to choose among implementations,
> according to anticipated use.

This is a LOT of work. The runtime would have to choose amongst
implementations before they are actually used. SO the first run it
would pick one, and log statitics. Then on the second try, look at
those statistics and guess again. It is similar to how Java does
runtime profiling. And that was a lot of work to get right.

Lance


--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Reply With Quote
  #5  
Old 08-31-2008, 06:58 PM
Erik Wikström
Guest
 
Default Re: "Hi. I'm A Set, Implemented As {BST | AVL | Splay | Red-Black}."

On 2008-08-31 10:55, Le Chaud Lapin wrote:
> Hi All,
>
> It is common knowledge that set-like containers in STL are implemented
> using red-black trees. However, Ben Pfaff showed of Stanford showed
> in his paper:
>
> http://www.stanford.edu/~blp/papers/libavl.pdf
>
> ...that, not suprisingly, red-black implementations, while good
> overall, are not optimal for all situations. Sometimes an AVL
> implementation will perform better [input comes in sorted, later
> random access], and sometimes regular BST is best [input comes in
> randomly].
>
> Faced with implementing your own set-like container, would you:
>
> 1. Do what STL did: Pick one, ignore the rest, and get on with it.
> 2. Create compile-time framework to choose among implementations
> according to anticipated use.
> 3. Create run-time framework to choose among implementations,
> according to anticipated use.
>
> I recall somewhere on the WWW an author following #3, but cannot find
> the code.
>
> The mechanism I use now to get at various possible implementations is
> embarrassing: I prefix Set<> with a namespace for the implementation:
> AVL::Set<>. I do have abstract base for all of them, which I use when
> only an interface is needed. Yes, I know this is yuck. I would rather
> just write Set<>, but then I would have to commit to a specific
> implementation unless I use schemes #2 or #3.


I have not studies the subject in great detail so the following might
not be entirely correct, and there might be ways to solve some of the
issues I see with more advanced template programming or using C++0x
features.

I can see advantages of all three options, #1 has the advantage of being
simple and easy to use for less knowledgeable users. On the other hand
the same effect be be achieved by using default arguments in #2 and #3.

#2 would be a small extension of the approach taken by STL, where the
tree-implementation is an additional template-parameter. The drawback of
this would be that Set<int, AVLTree> and Set<int, RBTree> would be
different types and you would either have to parameterise functions that
can expect to handle both types. Or you could use a common interface
which they all inherit from, but then you will have problems using the
container with anything except references/pointers.

#3 is nice because you can create a Tree-interface which your RB, AVL,
BST, etc. implements and then use something similar to PIMPL. I'm not
sure that there's a way to make creating such a container look nice
unless you want the default implementation since it will probably
involve passing a pointer to the tree-object, but there might be a way
to work around that.

--
Erik Wikström


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Reply With Quote
  #6  
Old 08-31-2008, 06:59 PM
Michael Aaron Safyan
Guest
 
Default Re: "Hi. I'm A Set, Implemented As {BST | AVL | Splay | Red-Black}."

Le Chaud Lapin wrote:
> Hi All,
>
> It is common knowledge that set-like containers in STL are implemented
> using red-black trees. However, Ben Pfaff showed of Stanford showed
> in his paper:
>
> http://www.stanford.edu/~blp/papers/libavl.pdf
>
> ...that, not suprisingly, red-black implementations, while good
> overall, are not optimal for all situations. Sometimes an AVL
> implementation will perform better [input comes in sorted, later
> random access], and sometimes regular BST is best [input comes in
> randomly].
>
> Faced with implementing your own set-like container, would you:
>
> 1. Do what STL did: Pick one, ignore the rest, and get on with it.
> 2. Create compile-time framework to choose among implementations
> according to anticipated use.
> 3. Create run-time framework to choose among implementations,
> according to anticipated use.
>
> I recall somewhere on the WWW an author following #3, but cannot find
> the code.
>
> The mechanism I use now to get at various possible implementations is
> embarrassing: I prefix Set<> with a namespace for the implementation:
> AVL::Set<>. I do have abstract base for all of them, which I use when
> only an interface is needed. Yes, I know this is yuck. I would rather
> just write Set<>, but then I would have to commit to a specific
> implementation unless I use schemes #2 or #3.
>
> What would you do?
>
> -Le Chaud Lapin-
>



This depends on the situation. If I were writing an application where
there was only one, well-known usage of the container, then I would
specify the type of container required at compile-time. If I were
writing a general-purpose container like std::set, however, I would make
it "adaptive". That is, it would initially use the implementation which
is considered to be best, in general, but it would keep track of its
usage and possibly change the underlying implementation at runtime, if
an alternative implementation were better.

I think the scheme that you have right now is good, since it allows the
user to choose a specific implementation, depending on their needs. The
only improvement I would make would be to create Adaptive::Set, and to
make Adaptive::Set the default implementation of Set.

- Michael Safyan

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Reply With Quote
  #7  
Old 09-01-2008, 03:05 AM
Tony Delroy
Guest
 
Default Re: "Hi. I'm A Set, Implemented As {BST | AVL | Splay | Red-Black}."

On Aug 31, 5:55 pm, Le Chaud Lapin <jaibudu...@gmail.com> wrote:
> Faced with implementing your own set-like container, would you:
>
> 1. Do what STL did: Pick one, ignore the rest, and get on with it.
> 2. Create compile-time framework to choose among implementations
> according to anticipated use.
> 3. Create run-time framework to choose among implementations,
> according to anticipated use.


I'm sure most of us are generally happy with 1, myself included. If
you do need to create multiple implementations, and you're planning on
passing them around in very large C++ systems, then you run into some
practical issues, e.g.:

- what would have been void fn(std::set<T>& x) needs to become one of:
- template <class Set> void fn(Set& x)
- fast but bloated, especially for fn(Set1&, Set2&, Set3&...)
- compilation dependency / set implementation must be in header
OR
- void fn(Abstract_Set<T>& x)
- virtual dispatch
- not bloated
- sometimes slow (out of line functions can be an order of
magnitude slower)
- can put any common functions/members (maybe "size") directly
in base
- one interesting technique is on-demand hand-off from concrete to
virtual dispatch:

class Set__X {
public:
class Accessor : public Abstract_Set {
public:
Accessor(Set& s) : s_(s) { }
... insert(...) { return s_.insert(...); }
...
private:
Set& s_;
}
operator Accessor() { return Accessor(*this); }
...
};
OR
- targeted overloading
void fn(Set1& x)
void fn(Set2& x)
void fn(Set3& x)
...
for some smaller number of Sets that span the storage layouts
(but not necessarily all the behavioural policies)
- automatic conversion is easier when the sets are implemented
as alternative instantiations of one class
OR
- void fn(Discriminated_Set& x)
- each concrete set derive from
struct Discriminated_Set {
enum Type { BST, AVL, RB, Splay };
Type type_;
};
- manual run-time "switch (x.type_())" between implementations
- fn needs to be written with knowledge of all possible
implementations
- lots of manual work to implement / maintain
- error prone
- may be faster than virtual dispatch for small inline-able
functions
- can be selectively combined with virtual dispatch
OR
- on-the-fly data conversion
class Set__X { ... Set__X(const Set__Y&); ... }
- Set__X fn(const Set__X& in);
- conversion cost dominant unless fn performs a lot of operations!
OR
- whatever's escaped my off-the-top-of-my-head-on-a-Monday-morning
list

As noted, making the templates instantiations of one template helps to
control and automate the implementation of some of these approaches.

You need a problematically-long-running major app. very tightly bound
to the set's performance to make any of this worthwhile.

Run-time adaptive containers not only need time to select BST/AVL/
Splay/RB, but also need per-function-call function-pointer/switch
between implementations. This may be more overhead than any
subsequent performance improvement is worth. Manually timing
performance and recompiling is much simpler and adequate in most
cases.

Cheers,

Tony

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Reply With Quote
  #8  
Old 09-01-2008, 03:49 PM
Scott McMurray
Guest
 
Default Re: "Hi. I'm A Set, Implemented As {BST | AVL | Splay | Red-Black}."

On Aug 31, 4:55 am, Le Chaud Lapin <jaibudu...@gmail.com> wrote:
>
> ...that, not suprisingly, red-black implementations, while good
> overall, are not optimal for all situations. Sometimes an AVL
> implementation will perform better [input comes in sorted, later
> random access], and sometimes regular BST is best [input comes in
> randomly].
>


I think the correct way to handle the sorted input case is to create a
separate function for it, since you can create an RB-Tree from a
sorted sequence in linear time, which is almost certainly faster than
an AVL Tree.

As for the "it comes in pretty much randomly" case, it seems more like
a fundamental program design decision whether you need real-time O(log
n), and thus will use an RB-tree that may perform worse, or amortized
O(log n) is sufficient, and using a possibly-faster Splay tree is
acceptable.

So I think 1 is my choice, with the caveat that additional per-
complexity classes exist, and can be used by changing typedefs (which
is, in a way, the same thing as 2, I suppose). With TR1 adding an
unordered_set, having an amortized_set (based on a splay tree) as well
seems reasonable. Perhaps a linear_set (based on a sorted vector) as
well, where you can swap in an (assumed unsorted) vector, for probably
the best speed in the "insertions only, then lookups only" case.

~ Scott

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Reply With Quote
  #9  
Old 09-02-2008, 03:52 PM
Lance Diduck
Guest
 
Default Re: "Hi. I'm A Set, Implemented As {BST | AVL | Splay | Red-Black}."

> What would you do?
>
> -Le Chaud Lapin-

Perhaps the best answer -- find someone who has already implemented
it, an use their code
http://gcc.gnu.org/onlinedocs/libstd...ontainers.html

Lance

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Reply With Quote
  #10  
Old 09-03-2008, 04:58 PM
Ulrich Eckhardt
Guest
 
Default Re: "Hi. I'm A Set, Implemented As {BST | AVL | Splay | Red-Black}."

Le Chaud Lapin wrote:
[...]
> Faced with implementing your own set-like container, would you:
>
> 1. Do what STL did: Pick one, ignore the rest, and get on with it.
> 2. Create compile-time framework to choose among implementations
> according to anticipated use.
> 3. Create run-time framework to choose among implementations,
> according to anticipated use.


4. Use std::vector, optionally sorting it. IO for filling it is always far
slower than container operations and the number of elements is typically so
small that while tree-based containers do scale better, the small vector
still performs better, both in terms of speed and memory use. Using a
binary search (not std::binary_search) on a sorted vector is actually
faster than a set because of the CPU cache footprint and locality of
reference.

BTW: I'm not actually sure what you mean with version 2, do you mean you
determine the best type during build (i.e. via a test that runs first and
then sets a macro for the real build) or that you hard-code it manually?

Anyway, if you actually do have a case where you have many objects and care
a lot for speed, you need to set up a test that allows quantifying the
differences. I personally would do that externally and then select the
proper container at compile time, so one of the two interpretations of
method 2 above.

Uli

--
Sator Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Reply With Quote
Reply


Thread Tools
Display Modes


All times are GMT -5. The time now is 07:23 AM.


Powered by vBulletin® Version 3.7.2
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
vB Ad Management by =RedTyger=

In an effort to better serve ads to our visitors, cookies are used on objectmix.com. For more information, check out our Privacy Policy.