Interesting performance quirk. - ADA
-
Re: Interesting performance quirk.
Alex R. Mosteo wrote:
> I'd also recommend valgrind with, e.g., kcachegrind to visualize the results. It might be easier to get into than gprof, and doesn't require special parameters to build the executable.
Thanks for the suggestion. I've never used valgrind before, but it is
actually on my to-do list to figure it out in connection with an
unrelated project (that actually pertains to my day job).
Peter
-
Re: Interesting performance quirk.
Colin Paul Gloster wrote:
> Earlier this year I had used QEMU on Windows (possibly not Windows XP)
> to have a GNU/Linux distribution (possibly RedHat) emulated. I ran a
> Bourne shell script or a Bourne Again Shell script in the emulated
> system which made thousands of fairly short I/O transactions. The
> emulated system including its pretend harddisk were kept small enough
> (no more than a few hundred megabytes) to be kept solely in the real
> physical primary memory instead of relying on virtual memory.
>
> It was faster than running the same script on Cygwin on the same
> machine.
That's interesting. I think it's probably conventional wisdom that doing
I/O in a VM would be slower than outside the virtual machine. I'm sure
that's true in many cases, although the situation you described shows
that it's not always true.
My program doesn't do any I/O during its main loop. Also the memory
block I work over is only 1 MB long so I don't think paging would be an
issue (there is no disk activity when I run it). In some respects the
program is ideal for performance analysis in that there are relatively
few complicating factors involved. In fact, that was my intention when I
wrote it.
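For readers who want to reproduce this kind of measurement, here is a minimal sketch of a throughput benchmark over a 1 MB in-memory block. It is written in Python rather than Ada and is not the benchmark discussed in this thread. The names measure_throughput and xor_pass are hypothetical, and the XOR loop is only a stand-in workload, not Blowfish. It reads a monotonic clock (time.perf_counter) so that wall-clock adjustments cannot distort the interval.

```python
import time


def measure_throughput(transform, block_size=1024 * 1024, passes=20):
    """Return throughput in MB/s for `passes` applications of `transform`.

    `transform` is any callable that processes a bytearray in place.
    No I/O happens inside the timed loop.
    """
    block = bytearray(block_size)      # the in-memory working block
    start = time.perf_counter()        # monotonic, high-resolution clock
    for _ in range(passes):
        transform(block)
    elapsed = time.perf_counter() - start
    megabytes = block_size * passes / (1024 * 1024)
    return megabytes / elapsed


def xor_pass(block):
    """Hypothetical stand-in workload: XOR every byte with a fixed key."""
    key = 0x5A
    block[:] = bytes(b ^ key for b in block)
```

Calling measure_throughput(xor_pass) on each system gives one MB/s figure per run. The absolute numbers mean little for an interpreted stand-in like this; the point is only the shape of the measurement.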
One complicating issue that remains is the behavior of the memory cache.
I wonder if one of the programs is missing the cache more than the
other. I'm not clear on why it would do that, however. The same hardware
is being used after all. Perhaps the Windows compiler has organized the
executable in some cache-unfriendly way.
Peter
-
Re: Interesting performance quirk.
On Thu, 30 Oct 2008, Peter C. Chapin wrote:
|------------------------------------------------------------------------|
|"Colin Paul Gloster wrote: |
| |
|> Earlier this year I had used QEMU on Windows (possibly not Windows XP)|
|> to have a GNU/Linux distribution (possibly RedHat) emulated. I ran a |
|> Bourne shell script or a Bourne Again Shell script in the emulated |
|> system which made thousands of fairly short I/O transactions. The |
|> emulated system including its pretend harddisk were kept small enough |
|> (no more than a few hundred megabytes) to be kept solely in the real |
|> physical primary memory instead of relying on virtual memory. |
|> |
|> It was faster than running the same script on Cygwin on the same |
|> machine. |
| |
|That's interesting. I think it's probably conventional wisdom that doing|
|I/O in a VM would be slower than outside the virtual machine. I'm sure |
|that's true in many cases, although the situation you described shows |
|that it's not always true." |
|------------------------------------------------------------------------|
For clarity, I explain that the I/O of the virtual machine which I
referred to was merely I/O to its emulated filesystems, all of which
together plus the emulated memory were small enough to fit into the
genuine physical memory of the host operating system.
|------------------------------------------------------------------------|
|"My program [..] the memory |
|block I work over is only 1 MB long so I don't think paging would be an |
|issue (there is no disk activity when I run it). [..] |
| |
|[..]" |
|------------------------------------------------------------------------|
Your program's memory block's size might be of the order of one
megabyte, but I do not know whether the emulated filesystems which you
used were also small enough to fit into emulated memory. However, this
does not explain why one program you have tried has been sped up by
emulation whereas another has not been sped up.
-
Re: Interesting performance quirk.
On Oct 31, 9:41 am, Colin Paul Gloster <Colin_Paul_Glos...@ACM.org>
wrote:
> On Thu, 30 Oct 2008, Peter C. Chapin wrote:
>
> |------------------------------------------------------------------------|
> |"Colin Paul Gloster wrote: |
> | |
> |> Earlier this year I had used QEMU on Windows (possibly not Windows XP)|
> |> to have a GNU/Linux distribution (possibly RedHat) emulated. I ran a |
> |> Bourne shell script or a Bourne Again Shell script in the emulated |
> |> system which made thousands of fairly short I/O transactions. The |
> |> emulated system including its pretend harddisk were kept small enough |
> |> (no more than a few hundred megabytes) to be kept solely in the real |
> |> physical primary memory instead of relying on virtual memory. |
> |> |
> |> It was faster than running the same script on Cygwin on the same |
> |> machine. |
> | |
> |That's interesting. I think it's probably conventional wisdom that doing|
> |I/O in a VM would be slower than outside the virtual machine. I'm sure |
> |that's true in many cases, although the situation you described shows |
> |that it's not always true." |
> |------------------------------------------------------------------------|
>
> For clarity, I explain that the I/O of the virtual machine which I
> referred to was merely I/O to its emulated filesystems, all of which
> together plus the emulated memory were small enough to fit into the
> genuine physical memory of the host operating system.
>
> |------------------------------------------------------------------------|
> |"My program [..] the memory |
> |block I work over is only 1 MB long so I don't think paging would be an |
> |issue (there is no disk activity when I run it). [..] |
> | |
> |[..]" |
> |------------------------------------------------------------------------|
>
> Your program's memory block's size might be of the order of one
> megabyte, but I do not know whether the emulated filesystems which you
> used were also small enough to fit into emulated memory. However, this
> does not explain why one program you have tried has been sped up by
> emulation whereas another has not been sped up.
Exactly right. The most obvious explanation is that system-dependent
code or build conventions have led to some important difference in the
run-time support. Detailed profiling is probably the only way to
figure out where. FWIW, I remember a similar situation that finally
turned out to be explained by compilation with the Windows
multithreaded debugging libraries. When we switched to production,
single-threaded libraries, the differences vanished or went in favor
of Windows.
-
Re: Interesting performance quirk.
"Peter C. Chapin" <pcc482719@gmail.com> wrote in message
news:4903c066$0$28676$4d3efbfe@news.sover.net...
....
> Now the interesting part. My main development system is a Windows XP
> laptop. On this system my "optimized" Blowfish benchmark encrypts or
> decrypts at about 11 MB/s (curiously decryption is a little faster than
> encryption, which seems odd). It also happens that I have OpenSUSE 10.2
> Linux running on the same box in a VMware virtual machine. In that
> environment my benchmark encrypts or decrypts at fully 27 MB/s. It's
> over twice as fast! I'm using GNAT GPL 2008 in both cases with the same
> compiler options and exactly the same source code. I'm even using the
> same basic hardware although, as I said, one of my systems---the faster
> one---is a virtual machine.
>
> Should I be surprised at this performance difference? I wasn't expecting
> it. Note that I'm using Ada.Calendar.Clock to track execution time. At
> first I wondered if the virtual machine's notion of time was distorted
> in some way but, no... the program is definitely faster in the VM (it
> runs long enough so that the difference in speed is easily perceptible
> by a human).
I can't answer whether you should be surprised, but I'm not. My experience
is that modern CPU chips have performance characteristics that seem random
and depend on things that no one has any control over.
My most recent example was a hobby program, much like yours. I was
surprised to see that fixing a memory management flaw caused the program to
run twice as fast. That temporarily caused rejoicing, until improving the
behavior of a non critical piece of the program caused the program to slow
by 50%! (This effect showed up on several Windows OSes on different Intel
processors. But not on the old Pentium IIIs.) Experimenting, I discovered
that I could change code in units totally unrelated to the "hot" areas of
the program and cause vast changes in the performance of the inner loops.
I of course verified that the generated code really was unchanged (it was).
I went as far as reading the latest Intel literature on these topics (and
it is huge). I thought that the effect might have had something to do with
the alignment of the innermost loops, but adding options to control that to
Janus/Ada didn't help much (it did get rid of the slowest versions, but the
performance still could vary wildly, about 30% if I remember correctly).
Having wasted most of a nice weekend messing with this (and having no
customer requirements at the time), I finally gave up and just twiddled with
some unrelated code until the program ran fast.
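When chasing swings like these, it can help to check first whether a difference is larger than ordinary run-to-run noise. The following is a minimal sketch, in Python and with hypothetical names (time_runs, relative_spread), that times a workload several times and reports the spread relative to the median. It does not explain the effect described above; it only makes the comparison less anecdotal.

```python
import statistics
import time


def time_runs(workload, runs=7):
    """Time `workload` (a zero-argument callable) `runs` times; return seconds per run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    return timings


def relative_spread(timings):
    """Return (slowest - fastest) / median for a list of timings."""
    return (max(timings) - min(timings)) / statistics.median(timings)
```

If two builds differ by about 30% while each build's own relative spread is only a few percent, the difference is probably real and not measurement noise.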
So I don't quite know what is going on. I suspect it is related in some way
to alignment, but it might be necessary for some code to be page aligned for
maximum performance (and that is way too expensive to use within loops and
other code that is going to be executed - you have to fill the empty space
with no-ops, and executing them takes some time. Intel actually recommends
no-op sequences to use to fill space in order to minimize time - yuck).
So it is possible that the performance difference has everything to do with
unrelated parts of your program (such as the I/O libraries), which are going
to be different for the two OSes. And nothing to do with your Ada code or
anything that your compiler has control over.
Randy.
-
Re: Interesting performance quirk.
Randy Brukardt wrote:
> So it is possible that the performance difference has everything to do with
> unrelated parts of your program (such as the I/O libraries), which are going
> to be different for the two OSes. And nothing to do with your Ada code or
> anything that your compiler has control over.
Perhaps I'm lucky that I will be able to retire in another decade or so.
Peter