Interesting performance quirk. - ADA

  1. Re: Interesting performance quirk.

    Alex R. Mosteo wrote:

    > I'd also recommend valgrind with, e.g., kcachegrind to visualize the results. It might be easier to get into than gprof, and doesn't require special parameters to build the executable.


    Thanks for the suggestion. I've never used valgrind before, but it is
    actually on my to-do list to figure it out in connection with an
    unrelated project (that actually pertains to my day job).

    Peter

  2. Re: Interesting performance quirk.

    Colin Paul Gloster wrote:

    > Earlier this year I had used QEMU on Windows (possibly not Windows XP)
    > to have a GNU/Linux distribution (possibly RedHat) emulated. I ran a
    > Bourne shell script or a Bourne Again Shell script in the emulated
    > system which made thousands of fairly short I/O transactions. The
    > emulated system including its pretend harddisk were kept small enough
    > (no more than a few hundred megabytes) to be kept solely in the real
    > physical primary memory instead of relying on virtual memory.
    >
    > It was faster than running the same script on Cygwin on the same
    > machine.


    That's interesting. I think it's probably conventional wisdom that doing
    I/O in a VM would be slower than outside the virtual machine. I'm sure
    that's true in many cases, although the situation you described shows
    that it's not always true.

    My program doesn't do any I/O during its main loop. Also the memory
    block I work over is only 1 MB long so I don't think paging would be an
    issue (there is no disk activity when I run it). In some respects the
    program is ideal for performance analysis in that there are relatively
    few complicating factors involved. In fact, that was my intention when I
    wrote it.

    One complicating issue that remains is the behavior of the memory cache.
    I wonder if one of the programs is missing the cache more than the
    other. I'm not clear on why it would do that, however. The same hardware
    is being used after all. Perhaps the Windows compiler has organized the
    executable in some cache-unfriendly way.

    Peter

  3. Re: Interesting performance quirk.

    On Thu, 30 Oct 2008, Peter C. Chapin wrote:

    > Colin Paul Gloster wrote:
    >
    > > Earlier this year I had used QEMU on Windows (possibly not Windows XP)
    > > to have a GNU/Linux distribution (possibly RedHat) emulated. I ran a
    > > Bourne shell script or a Bourne Again Shell script in the emulated
    > > system which made thousands of fairly short I/O transactions. The
    > > emulated system including its pretend harddisk were kept small enough
    > > (no more than a few hundred megabytes) to be kept solely in the real
    > > physical primary memory instead of relying on virtual memory.
    > >
    > > It was faster than running the same script on Cygwin on the same
    > > machine.
    >
    > That's interesting. I think it's probably conventional wisdom that doing
    > I/O in a VM would be slower than outside the virtual machine. I'm sure
    > that's true in many cases, although the situation you described shows
    > that it's not always true.

    For clarity, I explain that the I/O of the virtual machine which I
    referred to was merely I/O to its emulated filesystems, all of which
    together plus the emulated memory were small enough to fit into the
    genuine physical memory of the host operating system.

    > My program [..] the memory
    > block I work over is only 1 MB long so I don't think paging would be an
    > issue (there is no disk activity when I run it). [..]

    Your program's memory block's size might be of the order of one
    megabyte, but I do not know whether the emulated filesystems which you
    used were also small enough to fit into emulated memory. However, this
    does not explain why one program you have tried has been sped up by
    emulation whereas another has not been sped up.

  4. Re: Interesting performance quirk.

    On Oct 31, 9:41 am, Colin Paul Gloster <Colin_Paul_Glos...@ACM.org>
    wrote:
    > Your program's memory block's size might be of the order of one
    > megabyte, but I do not know whether the emulated filesystems which you
    > used were also small enough to fit into emulated memory. However, this
    > does not explain why one program you have tried has been sped up by
    > emulation whereas another has not been sped up.


    Exactly right. The most obvious explanation is that system-dependent
    code or build conventions have led to some important difference in the
    run-time support. Detailed profiling is probably the only way to
    figure out where. FWIW, I remember a similar situation that finally
    turned out to be explained by compilation with the Windows
    multithreaded debugging libraries. When we switched to production,
    single-threaded libraries, the differences vanished or went in favor
    of Windows.

  5. Re: Interesting performance quirk.

    "Peter C. Chapin" <pcc482719@gmail.com> wrote in message
    news:4903c066$0$28676$4d3efbfe@news.sover.net...
    ....
    > Now the interesting part. My main development system is a Windows XP
    > laptop. On this system my "optimized" Blowfish benchmark encrypts or
    > decrypts at about 11 MB/s (curiously decryption is a little faster than
    > encryption, which seems odd). It also happens that I have OpenSUSE 10.2
    > Linux running on the same box in a VMware virtual machine. In that
    > environment my benchmark encrypts or decrypts at fully 27 MB/s. It's
    > over twice as fast! I'm using GNAT GPL 2008 in both cases with the same
    > compiler options and exactly the same source code. I'm even using the
    > same basic hardware although, as I said, one of my systems---the faster
    > one---is a virtual machine.
    >
    > Should I be surprised at this performance difference? I wasn't expecting
    > it. Note that I'm using Ada.Calendar.Clock to track execution time. At
    > first I wondered if the virtual machine's notion of time was distorted
    > in some way but, no... the program is definitely faster in the VM (it
    > runs long enough so that the difference in speed is easily perceptible
    > by a human).


    I can't answer whether you should be surprised, but I'm not. My experience
    is that modern CPU chips have performance characteristics that seem random
    and depend on things that no one has any control over.

    My most recent example was a hobby program, much like yours. I was
    surprised to see that fixing a memory management flaw caused the program to
    run twice as fast. That temporarily caused rejoicing, until improving the
    behavior of a non-critical piece of the program caused the program to slow
    by 50%! (This effect showed up on several Windows OSes on different Intel
    processors. But not on the old Pentium IIIs.) Experimenting, I discovered
    that I could change code in units totally unrelated to the "hot" areas of
    the program and cause vast changes in the performance of the inner loops.

    I of course verified that the generated code really was unchanged (it was).

    I went as far as reading the latest Intel literature on these topics (and
    it is huge). I thought that the effect might have had something to do with
    the alignment of the innermost loops, but adding options to control that to
    Janus/Ada didn't help much (it did get rid of the slowest versions, but the
    performance still could vary wildly, about 30% if I remember correctly).
    Having wasted most of a nice weekend messing with this (and having no
    customer requirements at the time), I finally gave up and just twiddled with
    some unrelated code until the program ran fast.

    So I don't quite know what is going on. I suspect it is related in some way
    to alignment, but it might be necessary for some code to be page aligned for
    maximum performance (and that is way too expensive to use within loops and
    other code that is going to be executed - you have to fill the empty space
    with no-ops, and executing them takes some time. Intel actually recommends
    no-op sequences to use to fill space in order to minimize time - yuck).

    So it is possible that the performance difference has everything to do with
    unrelated parts of your program (such as the I/O libraries), which are going
    to be different for the two OSes. And nothing to do with your Ada code or
    anything that your compiler has control over.

    Randy.
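
The measurement discussed in this exchange (timing repeated passes over a fixed 1 MB block and reporting MB/s) can be sketched as below. Because the posts report run-to-run swings of 30% or more, the sketch repeats the measurement and reports the spread instead of a single figure. This is only an illustration in Python, not the original Ada benchmark: the XOR workload is a stand-in for Blowfish, and every name here (`xor_workload`, `measure`, `throughput_mb_per_s`, `BLOCK_SIZE`) is hypothetical.

```python
import statistics
import time

BLOCK_SIZE = 1024 * 1024  # 1 MB working block, the size mentioned in the thread

# Stand-in workload, NOT Blowfish: XOR every byte with a fixed key (assumption).
_XOR_TABLE = bytes(b ^ 0x5A for b in range(256))


def xor_workload(block: bytearray) -> None:
    """Transform the block in place; applying it twice restores the input."""
    block[:] = block.translate(_XOR_TABLE)


def throughput_mb_per_s(n_bytes: int, seconds: float) -> float:
    """Convert a byte count and an elapsed time into MB/s (1 MB = 2**20 bytes)."""
    return n_bytes / (1024 * 1024) / seconds


def measure(workload, block, passes=50, repeats=7):
    """Time `passes` runs of `workload` over `block`, `repeats` times over.

    Returns one MB/s figure per repeat so the spread is visible.
    """
    rates = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        for _ in range(passes):
            workload(block)
        elapsed = time.perf_counter() - start
        rates.append(throughput_mb_per_s(passes * len(block), elapsed))
    return rates


if __name__ == "__main__":
    rates = measure(xor_workload, bytearray(BLOCK_SIZE))
    print(f"min {min(rates):.1f}  median {statistics.median(rates):.1f}  "
          f"max {max(rates):.1f} MB/s")
```

Running the same script on both systems and comparing the medians, along with how far min and max sit from them, gives a rough sense of whether a difference like 11 MB/s versus 27 MB/s is stable or partly noise. It does not say where the time goes; that still needs a profiler.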



  6. Re: Interesting performance quirk.

    Randy Brukardt wrote:

    > So it is possible that the performance difference has everything to do with
    > unrelated parts of your program (such as the I/O libraries), which are going
    > to be different for the two OSes. And nothing to do with your Ada code or
    > anything that your compiler has control over.


    Perhaps I'm lucky that I will be able to retire in another decade or so.

    Peter
