#171
<meta.x.gdb@gmail.com> wrote in message news:1187063898.683782.256370@e9g2000prf.googlegroups.com...

> __asm__ __volatile__("rdtsc" : "=a"(ret0[0]), "=d"(ret0[1]));

Oh yes, it's much more fashionable to post obfuscated monstrosities such as this than to post clear assembly language code or to look in documentation such as instruction_tables.pdf or 248966.pdf.

Take this kind of nonsense to comp.lang.c where it belongs.

-- 
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end
#172
meta.x.gdb@gmail.com wrote:

(snip)

> This compiles and runs fine with both Intel and GNU compilers 3.3,
> 4.0, etc.
> when I compile this and execute under Cygwin (running on Windows XP)
> and an AMD 4200+ I get
> ./a.exe
> ticks per rdtsc 6
> which isn't 1 or 2, but I can live with 6 clock ticks to process a
> seldom called op.

In all the times I have used RDTSC I don't know that I ever tried to time RDTSC itself.

> if I compile and run this under Mac OS X (new Apple MacBookPro) Intel
> Core 2 I get 65 ?!?!

For IA-32 I use it as a function returning (long long), which conveniently has the result already in the right registers. For x86-64, the result is still in EDX:EAX, and so isn't in the appropriate 64-bit register. That won't matter the way you use it, though.

> if I compile and run this on Suse Linux on a Xeon processor, I get
> 85 ?!?! (Intel or GNU compiler)
> I'm not even putting in serializing. does that look right to
> anyone ? Can anyone verify they get the same results on their x86
> machines ? This seems to me to be a HUGE amount of overhead for just
> a simple instruction.

I believe I have heard that it does serialization, or at least can do serialization. I never worried about that, though.

-- glen
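The original poster's test program is not reproduced in this thread, so the following is only a hypothetical reconstruction of a "ticks per rdtsc" measurement. It assumes GCC or Clang on x86, where `<x86intrin.h>` provides the `__rdtsc()` intrinsic. The idea is to time a long run of back-to-back reads and divide the elapsed count by the number of reads. The helper name `ticks_per_rdtsc` is made up for illustration.

```c
#include <stdint.h>
#include <x86intrin.h>   /* GCC/Clang on x86: provides __rdtsc() */

/* Hypothetical helper: average TSC ticks per RDTSC, measured by timing
   n back-to-back reads. Loop overhead is included, so the result
   slightly overstates the cost of a single RDTSC. */
static double ticks_per_rdtsc(unsigned n)
{
    uint64_t start = __rdtsc();
    for (unsigned i = 0; i < n; i++)
        (void)__rdtsc();          /* result discarded; the read still executes */
    uint64_t end = __rdtsc();
    return (double)(end - start) / (double)n;
}
```

Something like `printf("ticks per rdtsc %.1f\n", ticks_per_rdtsc(1000000));` would produce output comparable to the figures quoted above. Keep in mind that on processors with an invariant TSC, the counter advances at a constant reference rate rather than once per core clock. The "ticks" reported may therefore not equal core cycles.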
#173
James Van Buskirk said:

> <meta.x.gdb@gmail.com> wrote in message
> news:1187063898.683782.256370@e9g2000prf.googlegroups.com...
>
>> __asm__ __volatile__("rdtsc" : "=a"(ret0[0]), "=d"(ret0[1]));
>
> Oh yes, it's much more fashionable to post obfuscated monstrosities
> such as this than to post clear assembly language code or to look
> in documentation such as instruction_tables.pdf or 248966.pdf.
>
> Take this kind of nonsense to comp.lang.c where it belongs.

Please don't. This kind of nonsense has no place in comp.lang.c either. I'm not quite sure where it does belong, but comp.lang.c will chew it up and spit out the bits.

-- 
Richard Heathfield <http://www.cpax.org.uk>
Email: -www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
#174
That's not C, that's assembly.

People here brought up RDTSC as a technique for performance measurement. I'm pointing out that there is a serious problem with using RDTSC for measurement, in that the op can be incredibly slow. Using "=A" versus "=a" "=d" doesn't change the answer.

I have had independent confirmation that Intel P4 chips take about 80 clock cycles and the P3 takes about 30; the Athlon 64 takes about 6 cycles. The mftbr instruction on Power architectures takes about 2 clock cycles.

RDTSC is not serializing, but the form of the loop I created handles this issue.

I talked last night with someone at Intel who acknowledged that this could all very well be correct; the RDTSC instruction doesn't have a very high priority at Intel. Another possibility is that the RDTSC instruction on Intel *is* serializing... which would explain all the waiting, since it has to completely drain the pipelines for each call, and that can take a lot of cycles on a P4.

Brian Van Straalen

On Aug 14, 12:04 am, "James Van Buskirk" <not_va...@comcast.net> wrote:
> <meta.x....@gmail.com> wrote in message
> news:1187063898.683782.256370@e9g2000prf.googlegroups.com...
>
> > __asm__ __volatile__("rdtsc" : "=a"(ret0[0]), "=d"(ret0[1]));
>
> Oh yes, it's much more fashionable to post obfuscated monstrosities
> such as this than to post clear assembly language code or to look
> in documentation such as instruction_tables.pdf or 248966.pdf.
>
> Take this kind of nonsense to comp.lang.c where it belongs.
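Brian's loop itself isn't shown. As an alternative sketch, not his method, here is the ordering recipe that Intel's Software Developer's Manual documents for RDTSC. LFENCE placed immediately before RDTSC keeps the read from executing until all earlier instructions have executed. LFENCE placed immediately after keeps later instructions from starting before the read. The sketch assumes GCC or Clang on x86-64, where SSE2 (and therefore `_mm_lfence()`) is always available. On AMD processors, the ordering behaviour of LFENCE has differed between generations, so treat this as Intel-oriented. The name `read_tsc_fenced` is hypothetical.

```c
#include <stdint.h>
#include <x86intrin.h>   /* GCC/Clang: __rdtsc(), _mm_lfence() */

/* Read the TSC with LFENCE on both sides:
   - the leading fence stops RDTSC from executing until all earlier
     instructions have executed;
   - the trailing fence stops later instructions from starting before
     the counter has been read.
   Assumes x86-64 (SSE2, and hence LFENCE, is part of the baseline). */
static inline uint64_t read_tsc_fenced(void)
{
    _mm_lfence();
    uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
}
```

Use it at both ends of the region being timed. The fences make each read more expensive, so this is for bracketing code, not for measuring the cost of RDTSC itself. Newer processors also provide RDTSCP, which waits for earlier instructions to execute but does not stop later ones from starting early.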
#175
Steve Underwood wrote:

(snip)

> It might be a little tidier to use:
>
> static __inline__ unsigned long long int rdtsc(void)
> {
>     unsigned long long int x;
>
>     __asm__ volatile ("rdtsc \n\t" : "=A" (x));
>
>     return x;
> }

I use:

        .file   "rdtsc.c"
        .text
        .globl  rdtsc
        .type   rdtsc, @function
rdtsc:
        rdtsc
        ret
        .size   rdtsc, .-rdtsc

which I get by compiling something like:

    long long rdtsc() { return 1;}

with -S, then editing the generated .s file to remove everything that isn't needed, so that the only executable instructions are the rdtsc and ret. This is for IA-32; I have a different one for x86-64.

For processors with CALL/RET branch prediction logic it should be pretty fast.

-- glen
#176
glen herrmannsfeldt wrote:
> Steve Underwood wrote:
>
> (snip)
>
>> It might be a little tidier to use:
>>
>> static __inline__ unsigned long long int rdtsc(void)
>> {
>>     unsigned long long int x;
>>
>>     __asm__ volatile ("rdtsc \n\t" : "=A" (x));
>>
>>     return x;
>> }
>
> I use:
>
>         .file   "rdtsc.c"
>         .text
>         .globl  rdtsc
>         .type   rdtsc, @function
> rdtsc:
>         rdtsc
>         ret
>         .size   rdtsc, .-rdtsc
>
> (snip)
>
> For processors with CALL/RET branch prediction logic it
> should be pretty fast.

That's nasty. What I showed has no call/return overhead at all. It compiles to a single inline instruction.

Steve
#177
Steve Underwood wrote:

(snip)

>>> static __inline__ unsigned long long int rdtsc(void)
>>> {
>>>     unsigned long long int x;
>>>     __asm__ volatile ("rdtsc \n\t" : "=A" (x));
>>>     return x;
>>> }

>> I use:

>>         .file   "rdtsc.c"
>>         .text
>>         .globl  rdtsc
>>         .type   rdtsc, @function
>> rdtsc:
>>         rdtsc
>>         ret
>>         .size   rdtsc, .-rdtsc

(snip)

> That's nasty. What I showed has no call/return overhead at all. It
> compiles to a single in line instruction.

Well, that assumes that you want the result in EDX:EAX; otherwise it will take some instructions to get them into the right place. I would not be surprised to see some push/pop while the compiler gets some other values out of the appropriate registers.

I have used mine for calls from Java (through JNI) and Fortran, which may not accept inline assembly.

-- glen
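For callers in languages without inline assembly, one alternative to hand-editing a generated .s file is a plain external C function built around the compiler intrinsic. The sketch below assumes GCC or Clang on x86. The function name `tsc_now` and the Fortran 2003 `bind(c)` interface in the comment are hypothetical illustrations, not glen's code. A JNI wrapper would call the same function.

```c
#include <stdint.h>
#include <x86intrin.h>   /* GCC/Clang on x86: __rdtsc() */

/* Plain external C function wrapping RDTSC, so that callers in other
   languages only need to call an ordinary C function.

   A matching Fortran 2003 interface might look like this (hypothetical;
   Fortran has no unsigned integers, so the value arrives as a signed
   64-bit integer with the same bit pattern):

     interface
       function tsc_now() bind(c, name='tsc_now')
         use, intrinsic :: iso_c_binding, only: c_int64_t
         integer(c_int64_t) :: tsc_now
       end function tsc_now
     end interface
*/
uint64_t tsc_now(void)
{
    return __rdtsc();
}
```

Compiled with optimization, this typically becomes little more than RDTSC, the combination of the two halves, and RET. That is essentially the hand-edited routine, with the compiler handling the 32-bit versus 64-bit return conventions.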
#178
glen herrmannsfeldt wrote:
> Steve Underwood wrote:
>
> (snip)
>
>> That's nasty. What I showed has no call/return overhead at all. It
>> compiles to a single in line instruction.
>
> Well, that assumes that you want the result in EDX:EAX, otherwise
> it will take some instructions to get them in the right place.
> I would not be surprised to see some push/pop while the compiler
> gets some other values out of the appropriate registers.

The same is true with your code. You just have the call and return overheads in addition to that. For 32-bit code, things happen to be exactly where you want them, most of the time, to be treated as a uint64_t value. With x86_64 code, you have the annoyance of needing to cook up the 64-bit value from two 32-bit pieces. It's odd they did not make rdtsc work in a natural way, feeding the 64-bit A register, in the x86_64 instruction set.

> I have used mine for calls from Java (through JNI) and Fortran,
> which may not accept inline assembly.

Fair enough.

Steve
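Cooking up the 64-bit value is a single shift-and-OR. The sketch below assumes GCC-style inline assembly, and the helper names are illustrative. It requests the two halves with the separate "=a" and "=d" constraints, which works on both IA-32 and x86-64. This matters for Steve's "=A" version above. GCC's documentation for the "A" constraint uses exactly that rdtsc example and notes that it is not correct on x86-64: there, a 64-bit operand is allocated to either RAX or RDX alone rather than to the EDX:EAX pair.

```c
#include <stdint.h>

/* Combine the EDX (high) and EAX (low) halves that RDTSC produces. */
static inline uint64_t tsc_from_halves(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

#if defined(__x86_64__) || defined(__i386__)
/* Correct on both IA-32 and x86-64: each half gets its own register
   constraint, and the halves are combined in C. In 64-bit mode RDTSC
   still writes EDX:EAX and clears the upper halves of RAX and RDX. */
static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return tsc_from_halves(lo, hi);
}
#endif
```

On IA-32 the compiler can usually leave the result in EDX:EAX without extra moves. On x86-64 it emits a shift and an OR (or equivalent), which is the unavoidable cost Steve describes.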
#179
On Sun, 29 Jul 2007 00:45:26 -0400, Jerry Avins <jya@ieee.org> wrote:

> Wade Ward wrote:
> > "Jerry Avins" <jya@ieee.org> wrote in message
<snip>
> >> Sure, but why? When a compiler provides for including the programmer's
> >> assembly code in the object code, how can an assembler be faster?
> > I'm only partially familiar with assembly code. I have never laid eyes on
> > anything like a standard for assembly. Is there one?
>
> C has standard constructs for including assembly instructions in the
> generated object code. You need to observe register conventions when you
> use it. It is also easy to call an assembly routine as a subroutine or
> function. The interface is in the standard.

Not de jure standard (= in ISO 9899), nor even de facto standard (= everyone does the same thing). Fairly common (= most C compilers do it), yes.

C++ does reserve the keyword 'asm' for this purpose (Standard C doesn't even do that) and specifies that the asm instruction(s)/info are represented as a string, but their contents are up to the implementation, probably the assembler used.

Neither standard directly specifies calling to other languages except C++ calling C, but both usually can be, and typically are, implemented with simple, obvious calling conventions which can easily be handled in assembly, assuming you can find the documentation.

FWIW, Ada (re)uses the named-aggregate syntax it has for several other things, in an implementation-dependent way, in a standard place in its (rather extensive) naming hierarchy. And it does have a standard syntax to specify (partially implementation-dependent) calling conventions.

- formerly david.thompson1 || achar(64) || worldnet.att.net
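Because inline assembly is implementation-specific, one common way to keep the rest of a program in standard C is to confine the non-standard part to a single preprocessor-guarded function. (With GCC, even the `asm` keyword is a GNU extension; under strict `-std=` options only the `__asm__` spelling, as used in the posts above, is accepted.) The sketch below uses compiler intrinsics rather than inline assembly: `__rdtsc()` from `<intrin.h>` on MSVC, or from `<x86intrin.h>` on GCC/Clang. On other targets it falls back to C11 `timespec_get`. The function name `fast_ticks` is hypothetical. Note that the fallback returns wall-clock nanoseconds, not CPU ticks.

```c
#include <stdint.h>
#include <time.h>

/* Choose a tick source per compiler/target; only this block is
   implementation-specific. */
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#include <intrin.h>
static uint64_t fast_ticks(void) { return __rdtsc(); }
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <x86intrin.h>           /* also used by Clang, which defines __GNUC__ */
static uint64_t fast_ticks(void) { return __rdtsc(); }
#else
/* Standard C11 fallback: wall-clock nanoseconds, not CPU ticks. */
static uint64_t fast_ticks(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
#endif
```

Callers only ever see `fast_ticks()`. Porting to a new compiler therefore means adding one branch rather than touching every timing site.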
#180
On Jul 27, 12:19 am, lunamoonm...@gmail.com wrote:

> C/C++ speed optimization bible/resources/pointers needed!
> [snip]

Look at C/C++ Program Perfometer:
http://sourceforge.net/projects/cpp-perfometer/

The C/C++ Program Perfometer is an open source tool which enables the programmer to measure the comparative performance of a C/C++ program, or of separate pieces of code, by one of several metrics: e.g., time, memory, or metrics defined by the programmer.

-- 
Alex Vinokur
email: alex DOT vinokur AT gmail DOT com
http://mathforum.org/library/view/10978.html
http://sourceforge.net/users/alexvn