Objectmix
Tags Register Mark Forums Read

Benchmarks : c++

This is a discussion on Benchmarks within the c++ forums in Programming Languages category; The task: Write a program that reads a set of words from standard input and prints the number of distinct words. I came across a website that listed a few programs to accomplish this task: http://unthought.net/c++/c_vs_c++.html (ignore all the language flaming :-), and thought that all of them did unnecessary operations, so I wrote my own. But for some reason, my version turned out slower that ALL of the versions in the website, even though it seems to perform less operations (yes, I benchmarked them on my own computer). According to the website, the slowest version is: #include <set> #include ...


Object Mix > Programming Languages > c++ > Benchmarks

c++ comp.lang.c++ usenet archive

Reply

 

LinkBack Thread Tools
  #1  
Old 11-06-2008, 10:53 AM
Junior Member
 
Join Date: Nov 2009
Posts: 0
Application Development is on a distinguished road
Default Benchmarks

The task: Write a program that reads a set of words from standard
input and prints the number of distinct words.

I came across a website that listed a few programs to accomplish this
task: http://unthought.net/c++/c_vs_c++.html (ignore all the language
flaming :-), and thought that all of them did unnecessary operations,
so I wrote my own. But for some reason, my version turned out slower
that ALL of the versions in the website, even though it seems to
perform less operations (yes, I benchmarked them on my own computer).

According to the website, the slowest version is:

#include <set>
#include <string>
#include <iostream>

int main(int argc, char **argv)
{
// Declare and Initialize some variables
std::string word;
std::set<std::string> wordcount;
// Read words and insert in rb-tree
while (std::cin >> word) wordcount.insert(word);
// Print the result
std::cout << "Words: " << wordcount.size() << std::endl;
return 0;
}

My version is about 12 times slower than that. It uses lower-level
constructs. Here it is:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

struct SetNode
{
char *word;
struct SetNode *next;
};

// An unorderd set of words
//
static struct SetNode *gSet = 0;
static int gSetSize = 0;

#define kInitWordSize 32

// Returns a word read from stdin. The returned pointer must be
// deallocated with free().
//
static char *
ReadOneWord(void)
{
int ch = getchar();

while (ch != EOF && isspace(ch))
ch = getchar();
if (ch == EOF)
return 0;

char *word = (char *) malloc(kInitWordSize);
if (!word)
return 0;

int size = kInitWordSize;
int i = 0;

while (ch != EOF && !isspace(ch)) {
if (i >= size) {
size *= 2;

char *newWord = (char *) realloc(word, size);
if (!newWord) {
free(word);
return 0;
}
word = newWord;
}

word[i++] = ch;
ch = getchar();
}

if (i >= size) {
size *= 2;

char *newWord = (char *) realloc(word, size);
if (!newWord) {
free(word);
return 0;
}
word = newWord;
}
word[i] = '\0';

return word;
}

// Inserts a word into the set if it isn't in the set.
// The passed string is expected to have been allocated with
// a memory allocation function, and it should be considered
// lost after passed to this function.
//
static void
InsertWord(char *aWord)
{
struct SetNode *node;

for (node = gSet; node; node = node->next) {
if (strcmp(node->word, aWord) == 0) {
free(aWord);
return;
}
}

node = (struct SetNode *) malloc(sizeof(struct SetNode));
if (!node) {
free(aWord);
return;
}

node->word = aWord;
node->next = gSet;
gSet = node;
++gSetSize;
}

static void
DeleteSet(void)
{
struct SetNode *node = gSet;
struct SetNode *temp;

while (node) {
temp = node;
node = node->next;
free(temp->word);
free(temp);
}

gSet = 0;
gSetSize = 0;
}

int
main(void)
{
char *word;

while ((word = ReadOneWord()))
InsertWord(word);

printf("Words: %d\n", gSetSize);

// Skip cleanup for now...
//DeleteSet();
}

Any ideas as to what causes the big slowdown?

Sebastian

  #2  
Old 11-06-2008, 11:13 AM
Junior Member
 
Join Date: Nov 2009
Posts: 0
Application Development is on a distinguished road
Default Re: Benchmarks

s0suk3@gmail.com wrote:

> The task: Write a program that reads a set of words from standard
> input and prints the number of distinct words.
>
> I came across a website that listed a few programs to accomplish this
> task: http://unthought.net/c++/c_vs_c++.html (ignore all the language
> flaming :-), and thought that all of them did unnecessary operations,
> so I wrote my own. But for some reason, my version turned out slower
> that ALL of the versions in the website, even though it seems to
> perform less operations (yes, I benchmarked them on my own computer).
>
> According to the website, the slowest version is:
>
> #include <set>
> #include <string>
> #include <iostream>
>
> int main(int argc, char **argv)
> {
> // Declare and Initialize some variables
> std::string word;
> std::set<std::string> wordcount;
> // Read words and insert in rb-tree
> while (std::cin >> word) wordcount.insert(word);
> // Print the result
> std::cout << "Words: " << wordcount.size() << std::endl;
> return 0;
> }
>
> My version is about 12 times slower than that. It uses lower-level
> constructs. Here it is:
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <ctype.h>
>
> struct SetNode
> {
> char *word;
> struct SetNode *next;
> };


This is a linear list.


> // An unorderd set of words
> //
> static struct SetNode *gSet = 0;
> static int gSetSize = 0;
>
> #define kInitWordSize 32
>
> // Returns a word read from stdin. The returned pointer must be
> // deallocated with free().
> //
> static char *
> ReadOneWord(void)
> {
> int ch = getchar();
>
> while (ch != EOF && isspace(ch))
> ch = getchar();
> if (ch == EOF)
> return 0;
>
> char *word = (char *) malloc(kInitWordSize);
> if (!word)
> return 0;
>
> int size = kInitWordSize;
> int i = 0;
>
> while (ch != EOF && !isspace(ch)) {
> if (i >= size) {
> size *= 2;
>
> char *newWord = (char *) realloc(word, size);
> if (!newWord) {
> free(word);
> return 0;
> }
> word = newWord;
> }
>
> word[i++] = ch;
> ch = getchar();
> }
>
> if (i >= size) {
> size *= 2;
>
> char *newWord = (char *) realloc(word, size);
> if (!newWord) {
> free(word);
> return 0;
> }
> word = newWord;
> }
> word[i] = '\0';
>
> return word;
> }
>
> // Inserts a word into the set if it isn't in the set.
> // The passed string is expected to have been allocated with
> // a memory allocation function, and it should be considered
> // lost after passed to this function.
> //
> static void
> InsertWord(char *aWord)
> {
> struct SetNode *node;
>
> for (node = gSet; node; node = node->next) {
> if (strcmp(node->word, aWord) == 0) {
> free(aWord);
> return;
> }
> }


Here, you do a linear search.

std::set<> maintains a (balanced) tree internally and therefore does fewer
comparisons per word (logarithmic vs. linear).


> node = (struct SetNode *) malloc(sizeof(struct SetNode));
> if (!node) {
> free(aWord);
> return;
> }
>
> node->word = aWord;
> node->next = gSet;
> gSet = node;
> ++gSetSize;
> }
>
> static void
> DeleteSet(void)
> {
> struct SetNode *node = gSet;
> struct SetNode *temp;
>
> while (node) {
> temp = node;
> node = node->next;
> free(temp->word);
> free(temp);
> }
>
> gSet = 0;
> gSetSize = 0;
> }
>
> int
> main(void)
> {
> char *word;
>
> while ((word = ReadOneWord()))
> InsertWord(word);
>
> printf("Words: %d\n", gSetSize);
>
> // Skip cleanup for now...
> //DeleteSet();
> }
>
> Any ideas as to what causes the big slowdown?


Choice of a sub-optimal data structure.


Best

Kai-Uwe Bux
  #3  
Old 11-06-2008, 11:21 AM
Junior Member
 
Join Date: Nov 2009
Posts: 0
Application Development is on a distinguished road
Default Re: Benchmarks

s0suk3@gmail.com writes:
> The task: Write a program that reads a set of words from standard
> input and prints the number of distinct words.
>
> I came across a website that listed a few programs to accomplish this
> task: http://unthought.net/c++/c_vs_c++.html (ignore all the language
> flaming :-), and thought that all of them did unnecessary operations,
> so I wrote my own. But for some reason, my version turned out slower
> that ALL of the versions in the website, even though it seems to
> perform less operations (yes, I benchmarked them on my own computer).
>
> According to the website, the slowest version is:
>
> #include <set>
> #include <string>
> #include <iostream>
>
> int main(int argc, char **argv)
> {
> // Declare and Initialize some variables
> std::string word;
> std::set<std::string> wordcount;
> // Read words and insert in rb-tree
> while (std::cin >> word) wordcount.insert(word);
> // Print the result
> std::cout << "Words: " << wordcount.size() << std::endl;
> return 0;
> }
>
> My version is about 12 times slower than that. It uses lower-level
> constructs. Here it is:
>

[snip]
> // Inserts a word into the set if it isn't in the set.
> // The passed string is expected to have been allocated with
> // a memory allocation function, and it should be considered
> // lost after passed to this function.
> //
> static void
> InsertWord(char *aWord)
> {
> struct SetNode *node;
>
> for (node = gSet; node; node = node->next) {
> if (strcmp(node->word, aWord) == 0) {
> free(aWord);
> return;
> }
> }


You represent your set of words as a linked list. You compare each
new word to every word already in the set. The C++ solution uses a
std::set which, if I recall correctly, can do searches and insertions
in O(n log n).

If you re-write this to use a balanced binary tree, such as an AVL
tree, you should get performance similar to the C++ version.

> node = (struct SetNode *) malloc(sizeof(struct SetNode));


Not incorrect, but
node = malloc(sizeof *node);
would be better.

> if (!node) {
> free(aWord);
> return;
> }


And if malloc fails, you quietly return without doing anything to
handle the error or report it to the user.

[...]

--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
  #4  
Old 11-06-2008, 11:22 AM
Junior Member
 
Join Date: Nov 2009
Posts: 0
Application Development is on a distinguished road
Default Re: Benchmarks

On Thu, 06 Nov 2008 07:53:18 -0800, s0suk3 wrote:

> The task: Write a program that reads a set of words from standard input
> and prints the number of distinct words.
>
> I came across a website that listed a few programs to accomplish this
> task: http://unthought.net/c++/c_vs_c++.html (ignore all the language
> flaming :-), and thought that all of them did unnecessary operations, so
> I wrote my own. But for some reason, my version turned out slower that
> ALL of the versions in the website, even though it seems to perform less
> operations (yes, I benchmarked them on my own computer).
>
> According to the website, the slowest version is:
>
> #include <set>
> #include <string>
> #include <iostream>
>
> int main(int argc, char **argv)
> {
> // Declare and Initialize some variables std::string word;
> std::set<std::string> wordcount;
> // Read words and insert in rb-tree
> while (std::cin >> word) wordcount.insert(word); // Print the
> result
> std::cout << "Words: " << wordcount.size() << std::endl; return
> 0;
> }
>
> My version is about 12 times slower than that. It uses lower-level
> constructs. Here it is:
>

[snip]

So, since your version uses "lower-level constructs" you assume
it would be automatically faster?

--
OU
Remember 18th of June 2008, Democracy died that afternoon.
http://frapedia.se/wiki/Information_in_English
  #5  
Old 11-06-2008, 11:23 AM
Junior Member
 
Join Date: Nov 2009
Posts: 0
Application Development is on a distinguished road
Default Re: Benchmarks

On Nov 6, 11:53 pm, s0s...@gmail.com wrote:
> The task: Write a program that reads a set of words from standard
> input and prints the number of distinct words.
>
> I came across a website that listed a few programs to accomplish this
> task:http://unthought.net/c++/c_vs_c++.html(ignore all the language
> flaming :-), and thought that all of them did unnecessary operations,
> so I wrote my own. But for some reason, my version turned out slower
> that ALL of the versions in the website, even though it seems to
> perform less operations (yes, I benchmarked them on my own computer).
>
> According to the website, the slowest version is:
>
> #include <set>
> #include <string>
> #include <iostream>
>
> int main(int argc, char **argv)
> {
>         // Declare and Initialize some variables
>         std::string word;
>         std::set<std::string> wordcount;
>         // Read words and insert in rb-tree
>         while (std::cin >> word) wordcount.insert(word);
>         // Print the result
>         std::cout << "Words: " << wordcount.size() << std::endl;
>         return 0;
>
> }
>
> My version is about 12 times slower than that. It uses lower-level
> constructs. Here it is:
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <ctype.h>
>
> struct SetNode
> {
>     char *word;
>     struct SetNode *next;
>
> };
>
> // An unorderd set of words
> //
> static struct SetNode *gSet = 0;
> static int gSetSize = 0;
>
> #define kInitWordSize 32
>
> // Returns a word read from stdin. The returned pointer must be
> // deallocated with free().
> //
> static char *
> ReadOneWord(void)
> {
>     int ch = getchar();
>
>     while (ch != EOF && isspace(ch))
>         ch = getchar();
>     if (ch == EOF)
>         return 0;
>
>     char *word = (char *) malloc(kInitWordSize);
>     if (!word)
>         return 0;
>
>     int size = kInitWordSize;
>     int i = 0;
>
>     while (ch != EOF && !isspace(ch)) {
>         if (i >= size) {
>             size *= 2;
>
>             char *newWord = (char *) realloc(word, size);
>             if (!newWord) {
>                 free(word);
>                 return 0;
>             }
>             word = newWord;
>         }
>
>         word[i++] = ch;
>         ch = getchar();
>     }
>
>     if (i >= size) {
>         size *= 2;
>
>         char *newWord = (char *) realloc(word, size);
>         if (!newWord) {
>             free(word);
>             return 0;
>         }
>         word = newWord;
>     }
>     word[i] = '\0';
>
>     return word;
>
> }
>
> // Inserts a word into the set if it isn't in the set.
> // The passed string is expected to have been allocated with
> // a memory allocation function, and it should be considered
> // lost after passed to this function.
> //
> static void
> InsertWord(char *aWord)
> {
>     struct SetNode *node;
>
>     for (node = gSet; node; node = node->next) {
>         if (strcmp(node->word, aWord) == 0) {
>             free(aWord);
>             return;
>         }
>     }
>
>     node = (struct SetNode *) malloc(sizeof(struct SetNode));
>     if (!node) {
>         free(aWord);
>         return;
>     }
>
>     node->word = aWord;
>     node->next = gSet;
>     gSet = node;
>     ++gSetSize;
>
> }
>
> static void
> DeleteSet(void)
> {
>     struct SetNode *node = gSet;
>     struct SetNode *temp;
>
>     while (node) {
>         temp = node;
>         node = node->next;
>         free(temp->word);
>         free(temp);
>     }
>
>     gSet = 0;
>     gSetSize = 0;
>
> }
>
> int
> main(void)
> {
>     char *word;
>
>     while ((word = ReadOneWord()))
>         InsertWord(word);
>
>     printf("Words: %d\n", gSetSize);
>
>     // Skip cleanup for now...
>     //DeleteSet();
>
> }
>
> Any ideas as to what causes the big slowdown?
>
> Sebastian


Noticed that you've implemented your own mechanism of scanning words
from standard input and insert a new elements in your "sets". I don't
know why you implement it by yourself. Are you clear the principle of
the class cin/cout and set? Are you sure that your own function have a
better performance to the standard one?
I'm interested with how to test your application performance. Can you
tell me? I suggest you compare the performance difference between your
function between the standard one step by step.

Thanks
  #6  
Old 11-06-2008, 12:27 PM
Junior Member
 
Join Date: Nov 2009
Posts: 0
Application Development is on a distinguished road
Default Re: Benchmarks

Keith Thompson wrote:
> You represent your set of words as a linked list. You compare each
> new word to every word already in the set. The C++ solution uses a
> std::set which, if I recall correctly, can do searches and insertions
> in O(n log n).


That would be worse than linear-time, and is of course false.

You meant: O(lg n).
  #7  
Old 11-06-2008, 12:30 PM
Junior Member
 
Join Date: Nov 2009
Posts: 0
Application Development is on a distinguished road
Default Re: Benchmarks

s0suk3@gmail.com wrote:
> Any ideas as to what causes the big slowdown?


Why do you expect that searching a linked list could be even close to
the speed of searching a balanced binary tree?
  #8  
Old 11-06-2008, 01:00 PM
Junior Member
 
Join Date: Nov 2009
Posts: 0
Application Development is on a distinguished road
Default Re: Benchmarks

On Nov 6, 11:22 am, Obnoxious User <OU@127.0.0.1> wrote:
> On Thu, 06 Nov 2008 07:53:18 -0800, s0suk3 wrote:
> > The task: Write a program that reads a set of words from standard input
> > and prints the number of distinct words.

>
> > I came across a website that listed a few programs to accomplish this
> > task:http://unthought.net/c++/c_vs_c++.html(ignore all the language
> > flaming :-), and thought that all of them did unnecessary operations, so
> > I wrote my own. But for some reason, my version turned out slower that
> > ALL of the versions in the website, even though it seems to perform less
> > operations (yes, I benchmarked them on my own computer).

>
> > According to the website, the slowest version is:

>
> > #include <set>
> > #include <string>
> > #include <iostream>

>
> > int main(int argc, char **argv)
> > {
> >         // Declare and Initialize some variables std::string word;
> >         std::set<std::string> wordcount;
> >         // Read words and insert in rb-tree
> >         while (std::cin >> word) wordcount.insert(word); // Print the
> >         result
> >         std::cout << "Words: " << wordcount.size() << std::endl; return
> >         0;
> > }

>
> > My version is about 12 times slower than that. It uses lower-level
> > constructs. Here it is:

>
> [snip]
>
> So, since your version uses "lower-level constructs" you assume
> it would be automatically faster?


Well, in this case it did seem to have a couple of advantages, like
for example, InsertWord() is given a pointer directly to the malloc()-
allocated string, which it simply copies into the .word member of the
struct; it doesn't need to allocate new memory and copy the string
from one place to the other, whereas std::set::insert() does need to
do make a copy.

Also, I'm not sure, but is the set's destructor invoked when main()
goes out of scope (causing memory cleanup)? (Currently the other
version has the DeleteSet() call commented-out.)

But, as others have pointed out, it's clear that the reason for the
difference in performance is the searching mechanism.

Sebastian

  #9  
Old 11-06-2008, 01:14 PM
Junior Member
 
Join Date: Nov 2009
Posts: 0
Application Development is on a distinguished road
Default Re: Benchmarks

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

s0suk3@gmail.com wrote:
> Well, in this case it did seem to have a couple of advantages, like
> for example, InsertWord() is given a pointer directly to the malloc()-
> allocated string, which it simply copies into the .word member of the
> struct; it doesn't need to allocate new memory and copy the string
> from one place to the other, whereas std::set::insert() does need to
> do make a copy.
>
> Also, I'm not sure, but is the set's destructor invoked when main()
> goes out of scope (causing memory cleanup)? (Currently the other
> version has the DeleteSet() call commented-out.)
>
> But, as others have pointed out, it's clear that the reason for the
> difference in performance is the searching mechanism.


Indeed it is. All that low-level stuff is faster, but it means nothing
when the problem is an algorithm and/or used data structure.

Pawel Dziepak
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iEYEARECAAYFAkkTNCAACgkQPFW+cUiIHNo9IQCgr6NC76/yXnouBhUYLn3fx3Rn
2HQAn3h19yS62OZqMv+jo0wSsr6M576O
=WTrr
-----END PGP SIGNATURE-----
  #10  
Old 11-06-2008, 01:18 PM
Junior Member
 
Join Date: Nov 2009
Posts: 0
Application Development is on a distinguished road
Default Re: Benchmarks

Juha Nieminen <nospam@thanks.invalid> writes:
> Keith Thompson wrote:
>> You represent your set of words as a linked list. You compare each
>> new word to every word already in the set. The C++ solution uses a
>> std::set which, if I recall correctly, can do searches and insertions
>> in O(n log n).

>
> That would be worse than linear-time, and is of course false.
>
> You meant: O(lg n).


Oops, you're correct of course; thanks for the correction.

Actually I meant O(log n), but it's the same thing.

--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Reply

Thread Tools



All times are GMT -5. The time now is 08:53 AM.

Managed by Infnx Pvt Ltd.