#1
I've seen Microsoft's Visual C++ (2005/2008) compiler generate an FDIVP instruction from this code (see attached) despite enabling the SSE/SSE2 instruction set.

The compiler generates very little FPU code (mainly FLD/FSTP in other parts of the code), preferring SSE scalar instructions, but it seems the compiler has hardcoded logic to use FDIV in this situation.

; 126  : QueryPerformanceFrequency( &frequency );

    lea     eax, DWORD PTR _frequency$[esp+2576]
    push    eax
    call    DWORD PTR __imp__QueryPerformanceFrequency@4

; 127  :
; 128  : QueryPerformanceCounter( &base );

    mov     esi, DWORD PTR __imp__QueryPerformanceCounter@4
    lea     ecx, DWORD PTR _base$[esp+2576]
    push    ecx
    call    esi

; 129  : QueryPerformanceCounter( &elapsed );

    lea     edx, DWORD PTR _elapsed$[esp+2576]
    push    edx
    call    esi

; 130  :
; 131  : const float delta = float(elapsed.LowPart - base.LowPart);

    mov     eax, DWORD PTR _elapsed$[esp+2576]
    sub     eax, DWORD PTR _base$[esp+2576]

; 132  : display.m_lastDelta[0] = delta / float(frequency.LowPart);

    mov     DWORD PTR tv1109[esp+2576], eax
    fild    DWORD PTR tv1109[esp+2576]
    test    eax, eax
    jge     SHORT $LN184@main
    fadd    DWORD PTR __real@4f800000
$LN184@main:
    mov     ecx, DWORD PTR _frequency$[esp+2576]
    fild    DWORD PTR _frequency$[esp+2576]
    test    ecx, ecx
    jge     SHORT $LN185@main
    fadd    DWORD PTR __real@4f800000
$LN185@main:
    fdivp   ST(1), ST(0)                            ; x87 division

; 133  :
; 134  : display.m_lastDelta[1] = display.m_lastDelta[3] / display.m_lastDelta[2];

    movss   xmm0, DWORD PTR _display$[esp+2588]
    divss   xmm0, DWORD PTR _display$[esp+2584]     ; SSE division
    movss   DWORD PTR _display$[esp+2580], xmm0     ; store result of SSE division
    fstp    DWORD PTR _display$[esp+2576]           ; store result of x87 division

Is x87 FDIV assumed to be more accurate than SSE DIVSS?

What if I were using MMX in my code? Shouldn't the compiler also generate an EMMS instruction here?

- Richard Maudsley
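The `fild` / `test` / `jge` / `fadd` sequence in the listing reads like an unsigned-to-float conversion: `fild` only loads signed integers, so a value with the top bit set comes out 2^32 too small and is corrected by adding the constant `__real@4f800000`. The following is a minimal C++ sketch of that reading, not the compiler's actual logic. The function names are made up for illustration, and `double` stands in for the x87 register, which is an assumption. In the listing the rounding to float happens later, at the `fstp` after the division, whereas this sketch rounds right after the conversion.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Decode an IEEE-754 single-precision bit pattern into a float.
float float_from_bits(std::uint32_t bits) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Hypothetical helper mirroring the fild/test/jge/fadd pattern.
float u32_to_float_like_listing(std::uint32_t u) {
    std::int32_t s;
    std::memcpy(&s, &u, sizeof s);        // same bits, seen as signed (what fild loads)
    double st0 = static_cast<double>(s);  // stand-in for fild: exact for any 32-bit int
    if (s < 0) {
        // stand-in for: fadd DWORD PTR __real@4f800000 (the constant decodes to 2^32)
        st0 += static_cast<double>(float_from_bits(0x4f800000u));
    }
    assert(st0 >= 0.0);                   // after the correction the value is never negative
    return static_cast<float>(st0);       // a single rounding to float
}
```

The bit pattern 0x4f800000 has a biased exponent of 159 and a zero mantissa, which is 2^(159-127) = 2^32, so the correction restores exactly the amount lost by treating the top bit as a sign bit.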
#2
rmaudsley wrote:
> I've seen Microsoft's Visual C++ (2005/2008) compiler generate an
> FDIVP instruction from this code (see attached) despite enabling the
> SSE/SSE2 instruction set.

Have you tried different settings for "Floating Point Model" and "Floating Point Exceptions"? Perhaps this is because of division-by-zero handling.

Hendrik
#3
"rmaudsley" <spamtrap@crayne.org> wrote in message news:9d2f6138-5514-4607-a017-4f69c433df0b@m73g2000hsh.googlegroups.com...
> I've seen Microsoft's Visual C++ (2005/2008) compiler generate an
> FDIVP instruction from this code (see attached) despite enabling the
> SSE/SSE2 instruction set.
>
> [...]
>
> Is x87 FDIV assumed to be more accurate than SSE DIVSS?

It is, right? DIVSS only divides two 32-bit floats, whereas FDIV can divide by a 32-bit or 64-bit operand, and the Intel manuals suggest it can divide by an 80-bit number (only if it's on the FP stack), but they aren't very clear about the precision. I think it could have used DIVSD without much loss of precision.

> What if I were using MMX in my code? Shouldn't the compiler also
> generate an EMMS instruction here?

IIRC the MMX convention was that MMX code does the EMMS itself, but I'm not very sure about it.
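The DIVSD suggestion above can be illustrated at the C++ level rather than in assembly. This is only a sketch: `ratio_float` and `ratio_double` are hypothetical names, and which instructions a compiler actually emits for them depends on the target and flags. Every 32-bit integer is exactly representable in a double, so the double path loses nothing in the conversions, while the float path may already round inputs larger than 2^24.

```cpp
#include <cassert>
#include <cstdint>

// Convert each counter to float first, then divide in single precision
// (the kind of code a DIVSS-based sequence would correspond to).
float ratio_float(std::uint32_t delta, std::uint32_t frequency) {
    assert(frequency != 0);
    const float d = static_cast<float>(delta);      // may round if delta > 2^24
    const float f = static_cast<float>(frequency);  // may round if frequency > 2^24
    return d / f;
}

// Convert to double (always exact for 32-bit integers), divide in double
// precision, and round to float once at the end (the DIVSD-style variant).
float ratio_double(std::uint32_t delta, std::uint32_t frequency) {
    assert(frequency != 0);
    const double q = static_cast<double>(delta) / static_cast<double>(frequency);
    return static_cast<float>(q);
}
```

For inputs below 2^24 both conversions are exact and the two functions should agree. Any difference between them comes from rounding the inputs to float, not from the division itself.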
#4
On Aug 7, 7:10 pm, "Harold Aptroot" <spamt...@crayne.org> wrote:
> "rmaudsley" <spamt...@crayne.org> wrote in message
> news:9d2f6138-5514-4607-a017-4f69c433df0b@m73g2000hsh.googlegroups.com...
>
> > [...]
> >
> > Is x87 FDIV assumed to be more accurate than SSE DIVSS?
>
> It is, right? DIVSS only divides 2 32bit floats, whereas FDIV can divide by
> a 32bit, 64bit and the Intel manuals suggest it can divide by a 80bit number
> (if it's on the FP stack only) but isn't very clear about the precision..
> I think it could have used DIVSD without much loss of precision
>
> > What if I were using MMX in my code? Shouldn't the compiler also
> > generate an EMMS instruction here?
>
> IIRC the MMX convention was that MMX code does the EMMS itself, but I'm not
> very sure about it

Hi again,

I decided to ditch MMX because it's not supported on x64 (apparently you get 8 additional SSE registers instead). My main concern was not knowing where to place the EMMS instruction. "After each MMX code block" is fairly useless advice if the compiler is going to secretly generate x87 instructions out of your control.

I can't see how the fdiv choice was made for precision reasons. The fdiv is only a single operation, which means that even if the FPU had been set to use 80-bit precision internally, the result would be rounded to single precision when stored to memory, and there would be no difference from using single-precision divss in the first place.

I think the extra precision of the x87 FPU (which is set in the x87 control word) only makes a difference if the intermediate results of a calculation are kept in this extended-precision format throughout the computation. I tested this recently. By setting the FPU control word to single precision, I found that my calculations were identical to the SSE single-precision instructions, even when using the "double" data type.

Regards,
- Richard Maudsley
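The argument about intermediates can be sketched in portable C++, with `double` standing in for the wider x87 format. That stand-in is an assumption: the real x87 registers are 80 bits wide and their effective precision depends on the control word. All function names here are hypothetical, and `volatile` is used only to force each intermediate to be stored, and therefore rounded, as a float.

```cpp
#include <cassert>

// Round to float after every step, as SSE scalar single-precision code does.
float sum_rounded_each_step(float a, float b, float c) {
    volatile float t = a + b;  // the store rounds the intermediate to float
    volatile float r = t + c;
    return r;
}

// Keep the intermediate in a wider format and round to float only once.
float sum_rounded_once(float a, float b, float c) {
    const double t = static_cast<double>(a) + static_cast<double>(b);
    return static_cast<float>(t + static_cast<double>(c));
}

// A single division in single precision.
float div_single(float a, float b) {
    assert(b != 0.0f);
    volatile float q = a / b;
    return q;
}

// A single division in the wider format, rounded to float when stored.
float div_rounded_once(float a, float b) {
    assert(b != 0.0f);
    return static_cast<float>(static_cast<double>(a) / static_cast<double>(b));
}
```

Under the default round-to-nearest mode, calling the two sum functions with a = 16777216.0f (2^24) and b = c = 1.0f gives 16777216 when rounding at each step and 16777218 when rounding once, so a chain of operations can differ. The two division functions, by contrast, should return the same value for the same inputs, which matches the point that a lone fdiv followed by a single-precision store gains nothing over divss.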
#5
rmaudsley <spamtrap@crayne.org> writes:
> I think the extra precision of the x87 FPU (which is set in the x87
> control word) only makes a difference if the intermediate results of a
> calculation are kept in this extended precision format throughout the
> computation.

You mean it's only useful if you use it consistently? Are you surprised by that?

> I tested this recently. By setting the FPU control word
> to single precision, I found that my calculations were identical to
> the SSE single-precision instructions, even when using the "double"
> data type.

You probably lied to your compiler, or violated some constraint that the compiler required you to follow. All bets are off in that case, and you proved nothing about any computations, only that you can mislead your compiler.

Phil
--
The fact that a believer is happier than a sceptic is no more to the point than the fact that a drunken man is happier than a sober one. The happiness of credulity is a cheap and dangerous quality.
-- George Bernard Shaw (1856-1950), Preface to Androcles and the Lion
#6
"Phil Carmody" <thefatphil_demunged@yahoo.co.uk> wrote in message news:87r67t80eg.fsf@nonospaz.fatphil.org...
> rmaudsley <spamtrap@crayne.org> writes:
> > I think the extra precision of the x87 FPU (which is set in the x87
> > control word) only makes a difference if the intermediate results of a
> > calculation are kept in this extended precision format throughout the
> > computation.
>
> You mean it's only useful if you use it consistently? Are you
> surprised by that?

Did his questioning of the compiler's choice to use the lower precision 'fdiv' instruction, in the prior paragraph that you snipped, have zero relevance to the following paragraph of his that you quoted? You responded as if so...

RP