Jean-François Michaud wrote:

> > However, when working with more digits and/or parallel operations,

> > straight inline SIMD code tends to beat lookup tables due to the load

> > stage bottleneck.

>

> Hmm, possibly, but that's assuming a lot ;-). Most processors out there

> don't have the SIMD instruction set/registers.

Actually, most do. But just enough of them don't that it is dangerous to

rely on SIMD if your application has to run everywhere.

The experiments I've run lately show a 512-byte lookup table (which

translates one byte/two digits at a time) to be the fastest, but I've

not done careful analysis to make sure the cache isn't getting polluted

by having such a large table. Bertrand posted an SSE version in the

"faster hex to buffer routine" thread. And I've provided a couple of

other solutions.
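
For readers who have not seen the table approach, here is a rough C sketch of a 512-byte lookup table (256 entries of two characters each) that emits both hex digits of a byte with one table access. It is an illustration only, not the code that was benchmarked; the names hex_pairs, init_hex_pairs and bytes_to_hex are made up for this example, and it assumes uppercase hex output.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 256 entries x 2 characters = 512 bytes.
   (Hypothetical layout; the original table is not shown in the post.) */
static char hex_pairs[256][2];

/* Fill the table once at startup: entry i holds the two hex digits of i. */
void init_hex_pairs(void)
{
    static const char digits[] = "0123456789ABCDEF";
    for (int i = 0; i < 256; i++) {
        hex_pairs[i][0] = digits[i >> 4];   /* high nibble */
        hex_pairs[i][1] = digits[i & 15];   /* low nibble  */
    }
}

/* Convert len bytes to 2*len hex characters plus a terminating NUL.
   out must have room for 2*len + 1 bytes. */
void bytes_to_hex(const uint8_t *src, size_t len, char *out)
{
    for (size_t i = 0; i < len; i++) {
        /* One table lookup yields both digits of the byte. */
        memcpy(out + 2 * i, hex_pairs[src[i]], 2);
    }
    out[2 * len] = '\0';
}
```

Call init_hex_pairs() once, then bytes_to_hex(data, n, buf) for each conversion. Assuming 64-byte cache lines, the table spans eight lines, which is the footprint behind the cache pollution concern.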

Note to Terje -- I tried the MUL trick with 32-bit integers (converted

to a decimal string) and wound up having to do a *huge* multiplication

to overcome the loss of precision. For two characters (one byte) you

can get away with this, but not for 32-bit integers. I'd be interested

in seeing a 32-bit conversion to decimal integer string that doesn't

involve at least three MUL instructions. Repeated subtraction for each

digit turned out to be the fastest solution I could come up with.
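
The repeated-subtraction idea can be sketched in C roughly as follows. This is only an illustration under my own assumptions (leading zeros suppressed, a table of powers of ten, the made-up name u32_to_dec); it is not the routine that was timed, which would have been written in assembly.

```c
#include <stdint.h>

/* Convert a 32-bit unsigned value to a decimal string without MUL or DIV.
   buf must hold at least 11 bytes (10 digits plus NUL).
   Returns the number of digits written. */
int u32_to_dec(uint32_t value, char *buf)
{
    static const uint32_t pow10[10] = {
        1000000000u, 100000000u, 10000000u, 1000000u, 100000u,
        10000u, 1000u, 100u, 10u, 1u
    };
    int len = 0;

    for (int i = 0; i < 10; i++) {
        char digit = '0';

        /* Count how many times this power of ten fits: at most 9 passes
           (at most 4 for the leading 10^9 place). */
        while (value >= pow10[i]) {
            value -= pow10[i];
            digit++;
        }

        /* Skip leading zeros, but always emit the ones digit. */
        if (digit != '0' || len > 0 || i == 9)
            buf[len++] = digit;
    }
    buf[len] = '\0';
    return len;
}
```

Each digit costs at most nine subtract-and-compare steps, so the worst case is under a hundred simple operations and no multiply or divide at all.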

Cheers,

Randy Hyde