Hex to ascii  ASM x86 ASM 370
This is a discussion on Hex to ascii  ASM x86 ASM 370 ; Terje Mathisen wrote:
> JeanFrançois Michaud wrote:
> > Terje Mathisen wrote:
> >> In 32bit mode the difference is even larger, the code AMD "borrowed"
> >> converts a full 32bit unsigned integer to 10 decimal digits in less
...

Re: Hex to ascii
Terje Mathisen wrote:
> JeanFrançois Michaud wrote:
> > Terje Mathisen wrote:
> >> In 32bit mode the difference is even larger, the code AMD "borrowed"
> >> converts a full 32bit unsigned integer to 10 decimal digits in less
> >> time than the cpu needs for a single DIV opcode! :)
> >>
> >> Terje
> >
> > How about a lookup table?
>
> Lookup tables are indeed good here, in fact they should be the obvious
> choice. :)
>
> However, when working with more digits and/or parallel operations,
> straight inline SIMD code tends to beat lookup tables due to the load
> stage bottleneck.
Hmm possibly but thats assuming alot ;). Most processors out there
don't have the SIMD instruction set/registers.
Regards
JeanFrancois Michaud

Re: Hex to ascii
JeanFrançois Michaud wrote:
> > However, when working with more digits and/or parallel operations,
> > straight inline SIMD code tends to beat lookup tables due to the load
> > stage bottleneck.
>
> Hmm possibly but thats assuming alot ;). Most processors out there
> don't have the SIMD instruction set/registers.
Actually, most do. But there are just enough that don't in order to
make it dangerous to use them if you have a universal application.
The experiments I've run lately show a 512byte lookup table (which
translates one byte/two digits at a time) to be the fastest, but I've
not done careful ****ysis to make sure the cache isn't getting polluted
by having such a large table. Bertrand posted an SSE version in the
"faster hex to buffer routine" thread. And I've provided a couple of
other solutions.
Note to Terje  I tried the MUL trick with 32bit integers (converted
to a decimal string) and wound up having to do a *huge* multiplication
to overcome the loss of precision. For two characters (one byte) you
can get away with this, but not for 32bit integers. I'd be interested
in seeing a 32bit conversion to decimal integer string that doesn't
involve at least three MUL instructions. I found out that a repeated
subtraction for each digit turned out to be the fastest solution I
could come up with.
Cheers,
Randy Hyde

Re: Hex to ascii
JeanFrançois Michaud wrote:
> Terje Mathisen wrote:
>> However, when working with more digits and/or parallel operations,
>> straight inline SIMD code tends to beat lookup tables due to the load
>> stage bottleneck.
>
> Hmm possibly but thats assuming alot ;). Most processors out there
> don't have the SIMD instruction set/registers.
I'm not advocating SIMD opcodes, as in MMX/SSE, but working with
multiple values simultaneously in a register that's wider than the basic
data item. :)
Terje

 <Terje.Mathisen@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"

Re: Hex to ascii
randyhyde@earthlink.net wrote:
> Note to Terje  I tried the MUL trick with 32bit integers (converted
> to a decimal string) and wound up having to do a *huge* multiplication
> to overcome the loss of precision. For two characters (one byte) you
> can get away with this, but not for 32bit integers. I'd be interested
> in seeing a 32bit conversion to decimal integer string that doesn't
> involve at least three MUL instructions. I found out that a repeated
> subtraction for each digit turned out to be the fastest solution I
> could come up with.
Huh?
10 digits, each averaging 5 subtractions/branches, and you run faster
than the 3050 cycles of a 3MUL direct conversion?
Terje

 <Terje.Mathisen@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"

Re: Hex to ascii
randyhyde@earthlink.net wrote:
> JeanFrançois Michaud wrote:
> > > However, when working with more digits and/or parallel operations,
> > > straight inline SIMD code tends to beat lookup tables due to the load
> > > stage bottleneck.
> >
> > Hmm possibly but thats assuming alot ;). Most processors out there
> > don't have the SIMD instruction set/registers.
>
>
> Actually, most do. But there are just enough that don't in order to
> make it dangerous to use them if you have a universal application.
Quite interresting. It seems we don't live on the same planet then .
90% of the processors out there don't have those facilities. If you are
talking about desktop then yes I would agree, but most computers are
not desktop computers.
[SNIP]
Regards
JeanFrancois Michaud

Re: Hex to ascii
JeanFrançois Michaud wrote:
> > > Hmm possibly but thats assuming alot ;). Most processors out there
> > > don't have the SIMD instruction set/registers.
> >
> >
> > Actually, most do. But there are just enough that don't in order to
> > make it dangerous to use them if you have a universal application.
>
> Quite interresting. It seems we don't live on the same planet then .
> 90% of the processors out there don't have those facilities. If you are
> talking about desktop then yes I would agree, but most computers are
> not desktop computers.
This is the comp.lang.asm.x86 newsgroup, so we immediately eliminate
from consideration nonx86 processors. While x86 processors *are* used
in embedded applications that don't support SIMD instructions, I still
argue that the majority of x86s in use, do.
Your argument would have more power in ALA, but here one has to assume
that we're talking x86.
Cheers,
Randy Hyde

Re: Hex to ascii
Terje Mathisen wrote:
>
> Huh?
>
> 10 digits, each averaging 5 subtractions/branches, and you run faster
> than the 3050 cycles of a 3MUL direct conversion?
Well, if it were only three MUL instructions I'd probably be okay with
that. But it took more. And I still had accuracy problems that I had to
handtweak for each digit range (and that resulted in several
additional instructions in addition to the MUL). Again, I'd like to see
a good generic MUL solution that works for 32 bits. Maybe my code was
defective, I don't know. But it took a *lot* of work to get it working
correctly for all 4 billion combinations. I wish, now, that I'd kept
the code around rather than throwing it away after determining that the
repeated subtraction worked faster so I could post it here and find out
what was wrong with the approach.
But if anyone has a 32bit integer to decimal string conversion routine
that uses MUL for the division operation, I'd love to see it. It's not
like I'm happy with the repeated subtraction approach, just that the
repeated subtraction approach is the fastest I've created so far...
Cheers,
Randy Hyde

Re: Hex to ascii
randyhyde@earthlink.net wrote:
> Terje Mathisen wrote:
>> Huh?
> Well, if it were only three MUL instructions I'd probably be okay with
> that. But it took more. And I still had accuracy problems that I had to
Aha!
You used reciprocal MUL as a simple substitution for DIV, and you
probably didn't do the full error ****ysis needed to prove how to get
exact results either?
Anyway, my algorithm is totally different: It splits the 32bit binary
number into two 5digit (decimal) halfs, using a reciprocal
multiplication by 1/100000.
The next step is to scale both of these numbers so as to make them a
binary/decimal scaled fraction, with the (scaled) decimal point after
the first digit.
At this point each pair of digits (one from each half) can be retrieved
by masking away the top digit After storing it), multiplying the
fraction by 5 (i.e. 10/2), and repeat the process, with the implied
decimal point one bit further down.
> But if anyone has a 32bit integer to decimal string conversion routine
> that uses MUL for the division operation, I'd love to see it. It's not
> like I'm happy with the repeated subtraction approach, just that the
> repeated subtraction approach is the fastest I've created so far...
Since I invented it, I'll post my original version. As I've stated
previously, you can see AMD's minor tweaks to it in their optimization
manual.
The version below uses 3 MUL operations and a single IMUL. The rest is
mostly LEA, SUB, AND, ADD and SHR by constant.
char *utoa(char *buf, unsigned long n)
{
__asm {
mov ebx,[n]
mov eax,2814749767
mul ebx // Reciprocal (2^n / 1e5) MUL
shr ebx,1
xor ecx,ecx
mov edi,[buf]
add eax,ebx
adc ecx,edx
mov eax,100000
shr ecx,16 // ECX = high part
mov ebx,[n]
imul eax,ecx // High part * 100k
sub ebx,eax // Low part
mov eax,429497
mul ecx
mov ecx,eax
add dl,'0'
mov eax,429497
mov [edi],dl
mul ebx
mov ebx,eax
add ecx,7
shr ecx,3
add dl,'0'
mov [edi+5],dl
add ebx,7
shr ebx,3
lea ecx,[ecx+ecx*4]
mov edx,ecx
and ecx,0fffffffh
shr edx,28
lea ebx,[ebx+ebx*4]
add dl,'0'
mov eax,ebx
shr eax,28
mov [edi+1],dl
and ebx,0fffffffh
add al,'0'
mov [edi+6],al
lea ecx,[ecx+ecx*4]
lea ebx,[ebx+ebx*4]
mov edx,ecx
mov eax,ebx
and ecx,07ffffffh
shr edx,27
and ebx,07ffffffh
shr eax,27
add dl,'0'
add al,'0'
mov [edi+2],dl
mov [edi+7],al
lea ecx,[ecx+ecx*4]
lea ebx,[ebx+ebx*4]
mov edx,ecx
mov eax,ebx
and ecx,03ffffffh
shr edx,26
and ebx,03ffffffh
shr eax,26
add dl,'0'
add al,'0'
mov [edi+3],dl
mov [edi+8],al
lea ecx,[ecx+ecx*4]
shr ecx,25
lea ebx,[ebx+ebx*4]
shr ebx,25
add cl,'0'
add bl,'0'
mov [edi+10],ah
mov [edi+4],cl
mov [edi+9],bl
}
return buf;
}
Terje

 <Terje.Mathisen@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"

Re: Hex to ascii
Terje Mathisen wrote:
> randyhyde@earthlink.net wrote:
> > Terje Mathisen wrote:
> >> Huh?
> > Well, if it were only three MUL instructions I'd probably be okay with
> > that. But it took more. And I still had accuracy problems that I had to
>
> Aha!
>
> You used reciprocal MUL as a simple substitution for DIV, and you
> probably didn't do the full error ****ysis needed to prove how to get
> exact results either?
Oh, I did the ****ysis and found that in the 10th digit I was getting
only 3.25 bits of precision. That's why I added a ton of extra code and
did a 64bit multiply to overcome the accuracy problems.
>
> Anyway, my algorithm is totally different: It splits the 32bit binary
> number into two 5digit (decimal) halfs, using a reciprocal
> multiplication by 1/100000.'
Okay, I'll try that and see how it works.
Cheers,
Randy Hyde

Re: Hex to ascii
randyhyde@earthlink.net <spamtrap@crayne.org> wrote in part:
> But if anyone has a 32bit integer to decimal string
> conversion routine that uses MUL for the division operation,
> I'd love to see it. It's not like I'm happy with the repeated
> subtraction approach, just that the repeated subtraction
> approach is the fastest I've created so far... Cheers,
Why not let the firmware do the hard work: FILD / FBSTP ?
 Robert
Similar Threads

By Application Development in forum Python
Replies: 4
Last Post: 07072007, 07:36 AM

By Application Development in forum Editors
Replies: 6
Last Post: 02232007, 11:42 PM

By Application Development in forum Perl
Replies: 0
Last Post: 10122003, 08:06 PM

By Application Development in forum Perl
Replies: 4
Last Post: 10122003, 08:06 PM