FDIV and QueryPerformanceFrequency with Visual C++


#1  08-07-2008, 12:20 PM
rmaudsley

I've seen Microsoft's Visual C++ (2005/2008) compiler generate an
FDIVP instruction from this code (see attached) despite enabling the
SSE/SSE2 instruction set.

The compiler generates very little FPU code (mainly FLD/FSTP in other
parts of the code), preferring SSE scalar instructions, but it seems
the compiler has hardcoded logic to use FDIV in this situation.

; 126 : QueryPerformanceFrequency( &frequency );

lea eax, DWORD PTR _frequency$[esp+2576]
push eax
call DWORD PTR __imp__QueryPerformanceFrequency@4

; 127 :
; 128 : QueryPerformanceCounter( &base );

mov esi, DWORD PTR __imp__QueryPerformanceCounter@4
lea ecx, DWORD PTR _base$[esp+2576]
push ecx
call esi

; 129 : QueryPerformanceCounter( &elapsed );

lea edx, DWORD PTR _elapsed$[esp+2576]
push edx
call esi

; 130 :
; 131 : const float delta = float(elapsed.LowPart - base.LowPart);

mov eax, DWORD PTR _elapsed$[esp+2576]
sub eax, DWORD PTR _base$[esp+2576]

; 132 : display.m_lastDelta[0] = delta / float(frequency.LowPart);

mov DWORD PTR tv1109[esp+2576], eax
fild DWORD PTR tv1109[esp+2576]
test eax, eax
jge SHORT $LN184@main
fadd DWORD PTR __real@4f800000
$LN184@main:
mov ecx, DWORD PTR _frequency$[esp+2576]
fild DWORD PTR _frequency$[esp+2576]
test ecx, ecx
jge SHORT $LN185@main
fadd DWORD PTR __real@4f800000
$LN185@main:
fdivp ST(1), ST(0) ; x87 division

; 133 :
; 134 : display.m_lastDelta[1] = display.m_lastDelta[3] / display.m_lastDelta[2];

movss xmm0, DWORD PTR _display$[esp+2588]
divss xmm0, DWORD PTR _display$[esp+2584] ; SSE division
movss DWORD PTR _display$[esp+2580], xmm0 ; store result of SSE division
fstp DWORD PTR _display$[esp+2576] ; store result of x87 division

Is x87 FDIV assumed to be more accurate than SSE DIVSS?

What if I were using MMX in my code? Shouldn't the compiler also
generate an EMMS instruction here?

- Richard Maudsley
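
One possible reading of the fild / test / jge / fadd sequence in the listing: LowPart is an unsigned 32-bit DWORD, FILD only loads signed integers, and __real@4f800000 is the single-precision encoding of 2^32. The generated code therefore appears to load the bits as signed and add 2^32 when the signed view was negative. Whether this unsigned conversion is the reason the compiler chose x87 here is an assumption; the thread does not confirm it. The function name below is hypothetical, and the sketch uses double as the wide intermediate in place of the x87 register format.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Sketch of the fix-up the listing seems to perform for unsigned -> float:
// reinterpret the 32 bits as signed (what FILD DWORD sees), then add 2^32
// if that signed value is negative (the FADD of __real@4f800000).
float unsigned_to_float_via_signed(std::uint32_t u) {
    std::int32_t s;
    std::memcpy(&s, &u, sizeof s);        // same bits, signed view
    double wide = static_cast<double>(s); // exact: every int32 fits in a double
    if (s < 0) {
        wide += 4294967296.0;             // 2^32, still exact in a double
    }
    return static_cast<float>(wide);      // single rounding, like FSTP DWORD
}
```

For inputs below 2^31 the branch is skipped and the result is the ordinary signed conversion; for larger inputs the correction restores the unsigned value before the final rounding to single precision.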

#2  08-07-2008, 12:47 PM
Hendrik van der Heijden

rmaudsley wrote:
> I've seen Microsoft's Visual C++ (2005/2008) compiler generate an
> FDIVP instruction from this code (see attached) despite enabling the
> SSE/SSE2 instruction set.


Have you tried different settings for "Floating Point Model"
and "Floating Point Exceptions"?
Perhaps this is because of division-by-zero handling.


Hendrik

#3  08-07-2008, 02:10 PM
Harold Aptroot

"rmaudsley" <spamtrap@crayne.org> wrote in message
news:9d2f6138-5514-4607-a017-4f69c433df0b@m73g2000hsh.googlegroups.com...
> Is x87 FDIV assumed to be more accurate than SSE DIVSS?


It is, right? DIVSS only divides two 32-bit floats, whereas FDIV can divide by
a 32-bit or 64-bit memory operand, and the Intel manuals suggest it can divide
by an 80-bit number (only if it's on the FP stack), though they aren't very
clear about the precision.
I think it could have used DIVSD without much loss of precision.

> What if I were using MMX in my code? Shouldn't the compiler also
> generate an EMMS instruction here?


IIRC the MMX convention was that MMX code does the EMMS itself, but I'm not
very sure about it

#4  09-05-2008, 10:20 AM
rmaudsley

On Aug 7, 7:10 pm, "Harold Aptroot" <spamt...@crayne.org> wrote:
> > Is x87 FDIV assumed to be more accurate than SSE DIVSS?
>
> It is, right? DIVSS only divides two 32-bit floats, whereas FDIV can divide by
> a 32-bit or 64-bit memory operand, and the Intel manuals suggest it can divide
> by an 80-bit number (only if it's on the FP stack), though they aren't very
> clear about the precision.
> I think it could have used DIVSD without much loss of precision.
>
> > What if I were using MMX in my code? Shouldn't the compiler also
> > generate an EMMS instruction here?
>
> IIRC the MMX convention was that MMX code does the EMMS itself, but I'm not
> very sure about it

Hi again,

I decided to ditch MMX because Visual C++ doesn't support it when
targeting x64 (apparently you get 8 additional SSE registers instead).
My main concern was not knowing where to place the EMMS instruction.
"After each MMX code block" advice is fairly useless if the compiler
is going to secretly generate x87 instructions out of your control.

I can't see how the fdiv choice was for precision reasons. The fdiv is
only a single operation, which means that even if the FPU had been set
to use 80-bit precision internally, the result would be rounded to
single precision when stored in memory, and there would be no
difference from using single-precision divss in the first place.
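
A minimal sketch of how this claim could be checked, assuming IEEE 754 arithmetic without fast-math style options: divide two floats directly, divide them again in a wider format, round that quotient to single, and compare. The function names are hypothetical, and double stands in for the x87 register format so the example stays portable; it checks a fixed set of normal-range operands rather than proving the general case.

```cpp
#include <cassert>
#include <cstdint>

// Divide in single precision directly (the DIVSS case).
float divide_single(float a, float b) {
    return a / b;
}

// Divide in a wider format, then round once to single on the way out
// (stand-in for FDIV followed by FSTP DWORD; double is an assumption here).
float divide_wide_then_round(float a, float b) {
    const double quotient = static_cast<double>(a) / static_cast<double>(b);
    return static_cast<float>(quotient);
}

// Count operand pairs for which the two approaches disagree, using a small
// deterministic generator so the run is repeatable.
int count_mismatches(int trials) {
    std::uint32_t state = 12345u;
    int mismatches = 0;
    for (int i = 0; i < trials; ++i) {
        state = state * 1664525u + 1013904223u;
        const float a = static_cast<float>(state >> 8) + 1.0f;
        state = state * 1664525u + 1013904223u;
        const float b = static_cast<float>(state >> 8) + 1.0f;
        if (divide_single(a, b) != divide_wide_then_round(a, b)) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

Calling count_mismatches with a large trial count and finding no disagreements is consistent with the argument above, since a lone division of two single-precision operands leaves no intermediate result for the extra bits to improve.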

I think the extra precision of the x87 FPU (which is set in the x87
control word) only makes a difference if the intermediate results of a
calculation are kept in this extended precision format throughout the
computation. I tested this recently. By setting the FPU control word
to single precision, I found that my calculations were identical to
the SSE single-precision instructions, even when using the "double"
data type.
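
For reference, a hedged sketch of one way the control-word change might be made from Visual C++ on 32-bit x86, using _controlfp_s with _PC_24 and _MCW_PC from <float.h>. The wrapper name is hypothetical, and on any other compiler or target the sketch deliberately does nothing. Note that the precision-control field only narrows the significand; the exponent range of values held in x87 registers stays extended, so overflow and underflow behaviour can still differ from SSE.

```cpp
#include <cassert>
#include <float.h>  // _controlfp_s, _PC_24, _MCW_PC when building with MSVC

// Try to set the x87 precision-control field to 24-bit significands.
// Returns true only if the request was issued and reported success;
// on non-MSVC compilers or non-x86 targets it returns false unchanged.
bool try_set_x87_single_precision() {
#if defined(_MSC_VER) && defined(_M_IX86)
    unsigned int current = 0;
    return _controlfp_s(&current, _PC_24, _MCW_PC) == 0;
#else
    return false;
#endif
}
```

A caller would typically invoke this once at startup and restore the previous setting before handing control to code that expects the default precision.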

Regards,

- Richard Maudsley

#5  09-09-2008, 05:39 PM
Phil Carmody

rmaudsley <spamtrap@crayne.org> writes:
> I think the extra precision of the x87 FPU (which is set in the x87
> control word) only makes a difference if the intermediate results of a
> calculation are kept in this extended precision format throughout the
> computation.


You mean it's only useful if you use it consistently? Are you
surprised by that?

> I tested this recently. By setting the FPU control word
> to single precision, I found that my calculations were identical to
> the SSE single-precision instructions, even when using the "double"
> data type.


You probably lied to your compiler, or violated some constraint
that the compiler required you to follow. All bets are off in that
case, and you proved nothing about any computations, only that
you can mislead your compiler.

Phil

#6  09-09-2008, 06:44 PM
Rod Pemberton

"Phil Carmody" <thefatphil_demunged@yahoo.co.uk> wrote in message
news:87r67t80eg.fsf@nonospaz.fatphil.org...
> rmaudsley <spamtrap@crayne.org> writes:
> > I think the extra precision of the x87 FPU (which is set in the x87
> > control word) only makes a difference if the intermediate results of a
> > calculation are kept in this extended precision format throughout the
> > computation.

>
> You mean it's only useful if you use it consistently? Are you
> surprised by that?


Did his questioning of the compiler's choice to use the lower precision
'fdiv' instruction in the prior snipped paragraph have zero relevance on the
following paragraph of his which you quoted? You responded as if so...


RP
