Hex to ascii - ASM x86 ASM 370


Hex to ascii

  1. Re: Hex to ascii


    Terje Mathisen wrote:
    > Jean-François Michaud wrote:
    > > Terje Mathisen wrote:
    > >> In 32-bit mode the difference is even larger, the code AMD "borrowed"
    > >> converts a full 32-bit unsigned integer to 10 decimal digits in less
    > >> time than the cpu needs for a single DIV opcode! :-)
    > >>
    > >> Terje

    > >
    > > How about a lookup table?

    >
    > Lookup tables are indeed good here, in fact they should be the obvious
    > choice. :-)
    >
    > However, when working with more digits and/or parallel operations,
    > straight inline SIMD code tends to beat lookup tables due to the load
    > stage bottleneck.


    Hmm, possibly, but that's assuming a lot ;-). Most processors out there
    don't have the SIMD instruction set/registers.

    Regards
    Jean-Francois Michaud


  2. Re: Hex to ascii


    Jean-François Michaud wrote:
    > > However, when working with more digits and/or parallel operations,
    > > straight inline SIMD code tends to beat lookup tables due to the load
    > > stage bottleneck.

    >
    > Hmm, possibly, but that's assuming a lot ;-). Most processors out there
    > don't have the SIMD instruction set/registers.



    Actually, most do. But there are just enough that don't to make it
    dangerous to use them if you have a universal application.

    The experiments I've run lately show a 512-byte lookup table (which
    translates one byte/two digits at a time) to be the fastest, but I've
    not done careful analysis to make sure the cache isn't getting polluted
    by having such a large table. Bertrand posted an SSE version in the
    "faster hex to buffer routine" thread. And I've provided a couple of
    other solutions.
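
    In C, that kind of 512-byte table looks roughly like the sketch below (a
    minimal illustration of the idea, not the code Bertrand or Randy posted):

    #include <stdint.h>
    #include <string.h>

    /* 512-byte table: entry i holds the two hex digits of byte value i. */
    static uint16_t hex_pairs[256];

    static void init_hex_pairs(void)
    {
        static const char digits[] = "0123456789ABCDEF";
        for (int i = 0; i < 256; i++) {
            /* high digit first, so a two-byte copy writes them in print order */
            unsigned char pair[2] = { digits[i >> 4], digits[i & 0x0F] };
            memcpy(&hex_pairs[i], pair, 2);
        }
    }

    /* len input bytes -> 2*len hex characters (not NUL-terminated). */
    static void bytes_to_hex(char *dst, const unsigned char *src, size_t len)
    {
        for (size_t i = 0; i < len; i++)
            memcpy(dst + 2 * i, &hex_pairs[src[i]], 2);
    }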

    Note to Terje -- I tried the MUL trick with 32-bit integers (converted
    to a decimal string) and wound up having to do a *huge* multiplication
    to overcome the loss of precision. For two characters (one byte) you
    can get away with this, but not for 32-bit integers. I'd be interested
    in seeing a 32-bit conversion to decimal integer string that doesn't
    involve at least three MUL instructions. I found out that a repeated
    subtraction for each digit turned out to be the fastest solution I
    could come up with.
    Cheers,
    Randy Hyde


  3. Re: Hex to ascii

    Jean-François Michaud wrote:
    > Terje Mathisen wrote:
    >> However, when working with more digits and/or parallel operations,
    >> straight inline SIMD code tends to beat lookup tables due to the load
    >> stage bottleneck.

    >
    > Hmm, possibly, but that's assuming a lot ;-). Most processors out there
    > don't have the SIMD instruction set/registers.


    I'm not advocating SIMD opcodes, as in MMX/SSE, but working with
    multiple values simultaneously in a register that's wider than the basic
    data item. :-)

    Terje

    --
    - <Terje.Mathisen@hda.hydro.com>
    "almost all programming can be viewed as an exercise in caching"


  4. Re: Hex to ascii

    randyhyde@earthlink.net wrote:
    > Note to Terje -- I tried the MUL trick with 32-bit integers (converted
    > to a decimal string) and wound up having to do a *huge* multiplication
    > to overcome the loss of precision. For two characters (one byte) you
    > can get away with this, but not for 32-bit integers. I'd be interested
    > in seeing a 32-bit conversion to decimal integer string that doesn't
    > involve at least three MUL instructions. I found out that a repeated
    > subtraction for each digit turned out to be the fastest solution I
    > could come up with.


    Huh?

    10 digits, each averaging 5 subtractions/branches, and you run faster
    than the 30-50 cycles of a 3-MUL direct conversion?

    Terje
    --
    - <Terje.Mathisen@hda.hydro.com>
    "almost all programming can be viewed as an exercise in caching"


  5. Re: Hex to ascii


    randyhyde@earthlink.net wrote:
    > Jean-François Michaud wrote:
    > > > However, when working with more digits and/or parallel operations,
    > > > straight inline SIMD code tends to beat lookup tables due to the load
    > > > stage bottleneck.

    > >
    > > Hmm, possibly, but that's assuming a lot ;-). Most processors out there
    > > don't have the SIMD instruction set/registers.

    >
    >
    > Actually, most do. But there are just enough that don't to make it
    > dangerous to use them if you have a universal application.


    Quite interesting. It seems we don't live on the same planet then.
    90% of the processors out there don't have those facilities. If you are
    talking about desktops then yes, I would agree, but most computers are
    not desktop computers.

    [SNIP]

    Regards
    Jean-Francois Michaud


  6. Re: Hex to ascii


    Jean-François Michaud wrote:

    > > > Hmm, possibly, but that's assuming a lot ;-). Most processors out there
    > > > don't have the SIMD instruction set/registers.

    > >
    > >
    > > Actually, most do. But there are just enough that don't to make it
    > > dangerous to use them if you have a universal application.

    >
    > Quite interesting. It seems we don't live on the same planet then.
    > 90% of the processors out there don't have those facilities. If you are
    > talking about desktops then yes, I would agree, but most computers are
    > not desktop computers.


    This is the comp.lang.asm.x86 newsgroup, so we immediately eliminate
    from consideration non-x86 processors. While x86 processors *are* used
    in embedded applications that don't support SIMD instructions, I still
    argue that the majority of x86s in use do.

    Your argument would have more power in ALA, but here one has to assume
    that we're talking x86.
    Cheers,
    Randy Hyde


  7. Re: Hex to ascii


    Terje Mathisen wrote:
    >
    > Huh?
    >
    > 10 digits, each averaging 5 subtractions/branches, and you run faster
    > than the 30-50 cycles of a 3-MUL direct conversion?


    Well, if it were only three MUL instructions I'd probably be okay with
    that. But it took more. And I still had accuracy problems that I had to
    hand-tweak for each digit range (which added several more instructions
    on top of the MULs). Again, I'd like to see a good generic MUL solution
    that works for 32 bits. Maybe my code was defective, I don't know. But
    it took a *lot* of work to get it working correctly for all 4 billion
    combinations. I wish now that I'd kept the code around instead of
    throwing it away once I found the repeated subtraction was faster, so
    that I could post it here and find out what was wrong with the approach.

    But if anyone has a 32-bit integer to decimal string conversion routine
    that uses MUL for the division operation, I'd love to see it. It's not
    like I'm happy with the repeated subtraction approach, just that the
    repeated subtraction approach is the fastest I've created so far...
    Cheers,
    Randy Hyde
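
    The repeated-subtraction technique being discussed reads roughly like
    this in C (a sketch of the general idea, not Randy's routine); it always
    emits ten digits, leading zeros included:

    #include <stdint.h>

    /* For each power of ten, subtract it until the value drops below it;
       the number of subtractions is the digit. */
    static char *u32_to_dec(char *buf, uint32_t n)
    {
        static const uint32_t pow10[10] = {
            1000000000u, 100000000u, 10000000u, 1000000u, 100000u,
            10000u, 1000u, 100u, 10u, 1u
        };

        for (int i = 0; i < 10; i++) {
            char d = '0';
            while (n >= pow10[i]) {     /* at most 9 subtractions per digit */
                n -= pow10[i];
                d++;
            }
            buf[i] = d;
        }
        buf[10] = '\0';
        return buf;
    }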


  8. Re: Hex to ascii

    randyhyde@earthlink.net wrote:
    > Terje Mathisen wrote:
    >> Huh?

    > Well, if it were only three MUL instructions I'd probably be okay with
    > that. But it took more. And I still had accuracy problems that I had to


    Aha!

    You used reciprocal MUL as a simple substitution for DIV, and you
    > probably didn't do the full error analysis needed to prove how to get
    exact results either?

    Anyway, my algorithm is totally different: It splits the 32-bit binary
    number into two 5-digit (decimal) halves, using a reciprocal
    multiplication by 1/100000.

    The next step is to scale both of these numbers so as to make them a
    binary/decimal scaled fraction, with the (scaled) decimal point after
    the first digit.

    At this point one digit from each half can be retrieved per step by
    masking away the top digit (after storing it), multiplying the
    fraction by 5 (i.e. 10/2), and repeating the process with the implied
    decimal point one bit further down.

    > But if anyone has a 32-bit integer to decimal string conversion routine
    > that uses MUL for the division operation, I'd love to see it. It's not
    > like I'm happy with the repeated subtraction approach, just that the
    > repeated subtraction approach is the fastest I've created so far...


    Since I invented it, I'll post my original version. As I've stated
    previously, you can see AMD's minor tweaks to it in their optimization
    manual.

    The version below uses 3 MUL operations and a single IMUL. The rest is
    mostly LEA, SUB, AND, ADD and SHR by constant.

    char *utoa(char *buf, unsigned long n)
    {
        /* The two 5-digit halves are converted in parallel: ECX/EDX carry
           the high half and its digits, EBX/EAX the low half.  Instructions
           for the two halves are interleaved in pairs. */
        __asm {
            mov  ebx,[n]
            mov  eax,2814749767      // ~ 2^48 / 1e5

            mul  ebx                 // reciprocal (2^48 / 1e5) MUL

            shr  ebx,1
            xor  ecx,ecx

            mov  edi,[buf]
            add  eax,ebx             // add n/2 as a rounding correction

            adc  ecx,edx
            mov  eax,100000

            shr  ecx,16              // ECX = high part = n / 100000
            mov  ebx,[n]

            imul eax,ecx             // high part * 100k

            sub  ebx,eax             // EBX = low part = n mod 100000
            mov  eax,429497          // ~ 2^32 / 1e4

            mul  ecx                 // EDX = 1st digit of high part, EAX = fraction

            mov  ecx,eax
            add  dl,'0'

            mov  eax,429497
            mov  [edi],dl            // digit 1

            mul  ebx                 // EDX = 1st digit of low part, EAX = fraction

            mov  ebx,eax
            add  ecx,7               // +7 before >>3: round the fraction

            shr  ecx,3               //  up into 29 bits
            add  dl,'0'

            mov  [edi+5],dl          // digit 6
            add  ebx,7

            shr  ebx,3
            lea  ecx,[ecx+ecx*4]     // fraction * 5; with the implied point one bit lower, * 10

            mov  edx,ecx
            and  ecx,0fffffffh       // keep 28-bit fraction

            shr  edx,28              // next digit of high part
            lea  ebx,[ebx+ebx*4]

            add  dl,'0'
            mov  eax,ebx

            shr  eax,28              // next digit of low part
            mov  [edi+1],dl          // digit 2

            and  ebx,0fffffffh
            add  al,'0'

            mov  [edi+6],al          // digit 7
            lea  ecx,[ecx+ecx*4]

            lea  ebx,[ebx+ebx*4]
            mov  edx,ecx

            mov  eax,ebx
            and  ecx,07ffffffh       // keep 27-bit fraction

            shr  edx,27
            and  ebx,07ffffffh

            shr  eax,27
            add  dl,'0'

            add  al,'0'
            mov  [edi+2],dl          // digit 3

            mov  [edi+7],al          // digit 8
            lea  ecx,[ecx+ecx*4]

            lea  ebx,[ebx+ebx*4]
            mov  edx,ecx

            mov  eax,ebx
            and  ecx,03ffffffh       // keep 26-bit fraction

            shr  edx,26
            and  ebx,03ffffffh

            shr  eax,26
            add  dl,'0'

            add  al,'0'
            mov  [edi+3],dl          // digit 4

            mov  [edi+8],al          // digit 9
            lea  ecx,[ecx+ecx*4]

            shr  ecx,25
            lea  ebx,[ebx+ebx*4]

            shr  ebx,25
            add  cl,'0'

            add  bl,'0'
            mov  [edi+10],ah         // AH == 0 here, so this stores the terminating NUL

            mov  [edi+4],cl          // digit 5
            mov  [edi+9],bl          // digit 10
        }

        return buf;
    }

    Terje

    --
    - <Terje.Mathisen@hda.hydro.com>
    "almost all programming can be viewed as an exercise in caching"


  9. Re: Hex to ascii


    Terje Mathisen wrote:
    > randyhyde@earthlink.net wrote:
    > > Terje Mathisen wrote:
    > >> Huh?

    > > Well, if it were only three MUL instructions I'd probably be okay with
    > > that. But it took more. And I still had accuracy problems that I had to

    >
    > Aha!
    >
    > You used reciprocal MUL as a simple substitution for DIV, and you
    > probably didn't do the full error analysis needed to prove how to get
    > exact results either?


    Oh, I did the analysis and found that in the 10th digit I was getting
    only 3.25 bits of precision. That's why I added a ton of extra code and
    did a 64-bit multiply to overcome the accuracy problems.

    >
    > Anyway, my algorithm is totally different: It splits the 32-bit binary
    > number into two 5-digit (decimal) halves, using a reciprocal
    > multiplication by 1/100000.


    Okay, I'll try that and see how it works.
    Cheers,
    Randy Hyde


  10. Re: Hex to ascii

    randyhyde@earthlink.net <spamtrap@crayne.org> wrote in part:
    > But if anyone has a 32-bit integer to decimal string
    > conversion routine that uses MUL for the division operation,
    > I'd love to see it. It's not like I'm happy with the repeated
    > subtraction approach, just that the repeated subtraction
    > approach is the fastest I've created so far... Cheers,


    Why not let the firmware do the hard work: FILD / FBSTP?

    -- Robert
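
    A rough sketch of that suggestion, assuming MSVC-style inline asm like
    the routine Terje posted: let the FPU convert to packed BCD with
    FILD/FBSTP, then unpack the BCD digits to ASCII.

    static char *utoa_fpu(char *buf, unsigned long n)
    {
        __int64 wide = n;          /* widen so FILD sees a non-negative value */
        unsigned char bcd[10];     /* FBSTP writes 18 packed BCD digits + sign */

        __asm {
            fild  qword ptr [wide]  // integer -> FPU stack
            fbstp tbyte ptr [bcd]   // FPU stack -> packed BCD, pops ST(0)
        }

        /* bcd[0] holds the two least significant digits, low nibble first */
        for (int i = 0; i < 10; i++) {
            unsigned char b = bcd[i / 2];
            buf[9 - i] = (char)('0' + ((i & 1) ? (b >> 4) : (b & 0x0F)));
        }
        buf[10] = '\0';
        return buf;
    }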

