What's faster? - ASM x86 ASM 370


  1. What's faster?

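    octMove:
        movdqa  xmm1, [edi]      ; load 16 aligned bytes
        ; Blah, blah, blah (work on xmm1)
        movdqa  [edi], xmm1      ; store them back
        add     edi, 16          ; advance to the next 16-byte block
        loop    octMove          ; ECX -= 1, repeat while ECX != 0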


    or


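    octMove:
        movdqa  xmm1, [edi]      ; load 16 aligned bytes
        add     edi, 16          ; advance the pointer early
        ; Blah, blah, blah (work on xmm1)
        movdqa  [edi-16], xmm1   ; store back to the block just loaded
        loop    octMove          ; ECX -= 1, repeat while ECX != 0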


    I know there are latency issues with using an address register just
    after setting it. Does the loop soak that up?

    Thanks!

    -- Rich Fife --
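Functionally, the two orderings are interchangeable: the `[edi-16]` displacement just compensates for the pointer having been bumped before the store. The minimal C model below shows that both touch the same blocks in the same way. It says nothing about timing. `block16`, `work()` (a hypothetical stand-in for the elided "Blah, blah, blah" body) and the function names are assumptions, and `size_t ecx` stands in for the ECX counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for one XMM register's worth of data (16 bytes). */
typedef struct { uint8_t b[16]; } block16;

/* Hypothetical loop body: adds 1 to every byte of the block. */
static void work(block16 *x) {
    for (int i = 0; i < 16; i++)
        x->b[i] = (uint8_t)(x->b[i] + 1);
}

/* First form: load, work, store to [edi], then advance edi. */
static void store_then_advance(block16 *edi, size_t ecx) {
    assert(ecx > 0);            /* LOOP with ECX == 0 would wrap around */
    do {
        block16 xmm1 = *edi;    /* movdqa xmm1, [edi] */
        work(&xmm1);
        *edi = xmm1;            /* movdqa [edi], xmm1 */
        edi += 1;               /* add edi, 16 (one 16-byte block) */
    } while (--ecx != 0);       /* loop octMove */
}

/* Second form: load, advance edi early, work, store to [edi-16]. */
static void advance_then_store(block16 *edi, size_t ecx) {
    assert(ecx > 0);
    do {
        block16 xmm1 = *edi;    /* movdqa xmm1, [edi] */
        edi += 1;               /* add edi, 16 */
        work(&xmm1);
        edi[-1] = xmm1;         /* movdqa [edi-16], xmm1 */
    } while (--ecx != 0);       /* loop octMove */
}
```

Because the displacement is encoded in the instruction itself, the second form adds no work. The only question is the scheduling one Rich asks about.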


  2. Re: What's faster?

    Rich Fife <spamtrap@crayne.org> wrote in part:
    > octMove:
    > movdqa xmm1, [edi]
    >
    > Blah, blah, blah
    >
    > movdqa [edi], xmm1
    > add edi, 16
    >
    > loop octMove



    Not using `loop` is faster: It has been deliberately
    slowed down to prevent MS-Win95 from crashing.

    Otherwise, it should make no difference with register renaming.

    -- Robert
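Speed aside, any replacement for `loop` has to keep its semantics. ECX is decremented before it is tested, and the branch is taken while the result is nonzero. `sub ecx, 1` / `jnz` and `dec ecx` / `jnz` give the same trip count, including the edge case where ECX starts at 0 and the count wraps to 2^32 passes. This is why such loops are usually guarded with `jecxz` or `test ecx, ecx` / `jz`. Below is a minimal C model of the trip count. The function names are my own.

```c
#include <assert.h>
#include <stdint.h>

/* Model of LOOP (or sub ecx,1 / jnz): the body runs, ECX is decremented
   (wrapping modulo 2^32), and the branch is taken while ECX != 0. */
static uint64_t trips_by_simulation(uint32_t ecx) {
    uint64_t trips = 0;
    do {
        trips++;        /* one pass through the loop body */
        ecx--;          /* the decrement happens before the test */
    } while (ecx != 0);
    return trips;
}

/* Closed form: ECX = n runs n times, except that ECX = 0 wraps to
   0xFFFFFFFF on the first decrement and runs 2^32 times. */
static uint64_t trips_closed_form(uint32_t ecx) {
    return ecx != 0 ? (uint64_t)ecx : (uint64_t)1 << 32;
}
```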


  3. Re: What's faster?

    Robert Redelmeier wrote:
    > Not using `loop` is faster: It has been deliberately
    > slowed down to prevent MS-Win95 from crashing.


    Kidding, or serious?


  4. Re: What's faster?

    Jim Leonard wrote:
    > Robert Redelmeier wrote:
    >> Not using `loop` is faster: It has been deliberately
    >> slowed down to prevent MS-Win95 from crashing.

    >
    > Kidding, or serious?
    >


    That was my reaction, too. Until I benchmarked.


  5. Re: What's faster?

    Jeffrey Schwab wrote:
    > Jim Leonard wrote:
    > > Robert Redelmeier wrote:
    > >> Not using `loop` is faster: It has been deliberately
    > >> slowed down to prevent MS-Win95 from crashing.

    > >
    > > Kidding, or serious?

    >
    > That was my reaction, too. Until I benchmarked.


    That's not what I meant. I knew it was slower; what I was questioning
    is that AMD specifically degraded an instruction to get around a bug in
    someone else's software. I call BS. I understand that CPU design was
    influenced by the market (e.g., the Pentium Pro was better than the
    Pentium for 32-bit operations but worse for 16-bit operations), but
    this is the first time I've heard of intentionally degrading
    performance to get around something that would be a trivial patch in
    someone else's software...


  6. Re: What's faster?

    Jim Leonard <spamtrap@crayne.org> wrote in part:
    > That's not what I meant. I knew it was slower; what I was
    > questioning is that AMD specifically degraded an instruction
    > to get around a bug in someone else's software.


    How else do you explain going from 2 clocks (K6) to 8 (K7)
    when everything else speeds up?

    > I call BS.


    Google through AMD.

    > I understand that CPU design was influenced by the market (e.g., the
    > Pentium Pro was better than the Pentium for 32-bit operations but
    > worse for 16-bit operations), but this is the first time
    > I've heard of intentionally degrading performance to get around
    > something that would be a trivial patch in someone else's software


    Nothing is trivial in MS-Windows. IIRC, MS-Win95 couldn't
    be installed and wouldn't boot on the machines. Had it booted,
    then it could have been patched.

    AFAIK, Intel also slowed their `loop` instruction 'cuz they
    didn't want to get bit. I think from 4 clocks to 6 on the
    Pentium!!! The Pentium 4 may be too recent to need the slowdown.

    This was a "bug" that cost AMD sales.

    -- Robert
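Here is a hypothetical illustration of the mechanism Robert describes. It is not the actual Windows 95 code. A driver might calibrate a delay by timing a fixed number of loop passes and then dividing by the elapsed timer ticks. Such a driver faults once the loop is fast enough to finish within a single tick, and slowing the loop instruction pushes the run back over one tick. The numbers are assumptions for illustration only: a 300 MHz CPU, a 55 ms tick, and 2 versus 8 cycles per pass, echoing the K6/K7 figures above. The function names are also hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Whole timer ticks that elapse while `iterations` passes of a delay loop
   run, given cycles per pass and CPU cycles per timer tick. Simplified:
   the result is truncated, ignoring where in a tick the loop started. */
static uint64_t ticks_elapsed(uint64_t iterations, uint64_t cycles_per_iter,
                              uint64_t cycles_per_tick) {
    return iterations * cycles_per_iter / cycles_per_tick;
}

/* Hypothetical calibration step: loop passes per timer tick.
   Returns 0 instead of dividing when no tick elapsed; a raw x86 DIV
   by zero would raise a divide-error exception (#DE) at this point. */
static int iterations_per_tick(uint64_t iterations, uint64_t ticks,
                               uint64_t *out) {
    if (ticks == 0)
        return 0;           /* loop finished inside a single tick */
    *out = iterations / ticks;
    return 1;
}
```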


  7. Re: What's faster?


    Jim Leonard wrote:
    > but this
    > is the first time I've heard of intentionally degrading performance to
    > get around something that would be a trivial patch in someone else's
    > software...


    As I pointed out earlier, this certainly wouldn't have been the first
    time. When the 486 or Pentium came along and instructions started
    operating at 1 CPI, Intel forced the NOP instruction to take 3 cycles
    because there was a considerable body of (DOS) code that computed CPU
    clock frequency by counting the number of NOPs executed between timer
    ticks (55ms). Such software (especially games) started behaving really
    weird when the NOP was changed to one clock cycle. Fortunately, the NOP
    was eventually fixed once people wised up and realized that they could
    no longer depend on software timing loops in their code (not to
    mention, having an OS API in Windows to provide this information was a
    big help, too).

    Cheers,
    Randy Hyde
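Below is a back-of-the-envelope model of the calibration Randy describes. It assumes the standard PC timer tick: the 8253/8254 PIT runs at about 1.193182 MHz and the BIOS divides that by 65536, giving roughly 54.9 ms. Loop overhead is ignored, and the function names are hypothetical. The estimate scales with the assumed cycles per NOP, so a program calibrated for 3-cycle NOPs reports three times the real clock speed on a CPU that executes them in one cycle.

```c
#include <assert.h>
#include <math.h>

/* One BIOS timer tick: PIT input clock divided by 65536 (~54.9 ms). */
#define TICK_SECONDS (65536.0 / 1193182.0)

/* Estimated clock speed in MHz from the NOPs counted in one tick,
   assuming each NOP costs `assumed_cycles_per_nop` cycles. */
static double estimate_mhz(double nops_per_tick, double assumed_cycles_per_nop) {
    return nops_per_tick * assumed_cycles_per_nop / TICK_SECONDS / 1e6;
}

/* NOPs a CPU running at `mhz` actually executes in one tick when each
   NOP really takes `cycles_per_nop` cycles (loop overhead ignored). */
static double nops_in_tick(double mhz, double cycles_per_nop) {
    return mhz * 1e6 * TICK_SECONDS / cycles_per_nop;
}
```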


  8. Re: What's faster?


    Robert Redelmeier wrote:
    > AFAIK, Intel also slowed their `loop` instruction 'cuz they
    > didn't want to get bit. I think from 4 clocks to 6 on the
    > Pentium!!! The Pentium 4 may be too recent to need the slowdown.


    Wasn't the Pentium the first processor where we started hearing about
    the 'RISC core'? In any case, if LOOP was going to break software
    that depended on timing loops, so would a lot of other
    instructions. I suspect you're misinterpreting the reason why this
    instruction was not faster than the corresponding discrete
    instructions. The bottom line is that Intel concentrated on speeding up
    a core set of instructions, and because you could easily synthesize
    LOOP using instructions already in the core, and because so few
    compilers made effective use of the LOOP instruction, Intel chose to
    take the easy route and leave LOOP in microcode.

    BTW, it's also apparent that INC and DEC are quickly fading from the
    scene. Intel is recommending that programmers use ADD and SUB
    instead. Of course, AMD led the charge with their AMD64 chips by
    reusing the one-byte INC and DEC opcodes (as REX prefixes) for
    64-bit access. So I wouldn't be surprised to find that INC and DEC
    get slowed down in future chips.
    I.e.,

    sub( 1, ecx );
    jnz loopLbl;

    becomes the replacement for

    loop loopLbl;

    rather than

    dec( ecx );
    jnz loopLbl;

    Cheers,
    Randy Hyde
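The three sequences end with the same ECX and the same branch decision, but they update different flags. This matters for code that carries CF across iterations, such as multi-precision ADC loops. Per the Intel manuals:

* DEC leaves CF unchanged.
* SUB sets CF on borrow.
* LOOP modifies no flags.

Below is a minimal C model of just ZF and CF. The struct and function names are my own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t ecx;
    bool zf, cf;   /* only the two flags that differ here are modelled */
} cpu;

/* dec ecx: ZF reflects the result; CF is left unchanged. */
static void dec_ecx(cpu *c) {
    c->ecx -= 1;
    c->zf = (c->ecx == 0);
}

/* sub ecx, 1: ZF reflects the result; CF is set on borrow (ECX was 0). */
static void sub_ecx_1(cpu *c) {
    c->cf = (c->ecx == 0);
    c->ecx -= 1;
    c->zf = (c->ecx == 0);
}

/* loop: decrements ECX and tests it, but touches no flags.
   Returns true if the branch would be taken. */
static bool loop_ecx(cpu *c) {
    c->ecx -= 1;
    return c->ecx != 0;
}
```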


  9. Re: [Clax86list] Re: What's faster?

    On 2 May 2006 15:44:21 -0700
    "randyhyde@earthlink.net" <spamtrap@crayne.org> wrote:

    :sub( 1, ecx );
    :jnz loopLbl;
    :
    :becomes the replacement for
    :
    :loop loopLbl;
    :
    :rather than
    :
    :dec( ecx );
    :jnz loopLbl;

    Unfortunately, such a replacement would break a lot of existing code.

    -- Chuck


  10. Re: What's faster?

    randyhyde@earthlink.net wrote:
    > As I pointed out earlier, this certainly wouldn't have been the first
    > time. When the 486 or Pentium came along and instructions started
    > operating at 1 CPI, Intel forced the NOP instruction to take 3 cycles
    > because there was a considerable body of (DOS) code that computed CPU
    > clock frequency by counting the number of NOPs executed between timer
    > ticks (55ms). Such software (especially games) started behaving really


    This was happening LONG before the 486. I remember games that broke on my
    7.16MHz 8086 because the prefetch queue was 6 bytes instead of 4, and
    that was enough. So, again, I can't believe this is the reason for
    making NOP take longer.

    Wasn't NOP just an alias for xchg ax,ax? Did that get slower too?
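On the encoding question: yes, the one-byte NOP is the `xchg (e)ax, (e)ax` encoding. The Intel manuals describe NOP as an alias mnemonic for that instruction. The short accumulator form of XCHG is the single byte 0x90 plus the register number, so exchanging AX with itself lands on 0x90. (In 64-bit mode 0x90 is special-cased as a true NOP, so it does not zero the upper half of RAX the way `xchg eax, eax` otherwise would.) Below is a small sketch of the encoding rule. The enum and function names are my own.

```c
#include <assert.h>
#include <stdint.h>

/* Register numbers as used in x86 opcode/ModRM encodings. */
enum reg { AX, CX, DX, BX, SP, BP, SI, DI };

/* Short form of "xchg (e)ax, reg": a single opcode byte, 0x90 + reg. */
static uint8_t xchg_ax_opcode(enum reg r) {
    return (uint8_t)(0x90 + r);
}
```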

