|
#101
| |||
| |||
| Chip Eastham wrote: [snip] > I think a common strategy is to push the contents > of some or all registers onto the stack as the > very first step in an inline assembly snippet. > One then pops these values back into the same > registers when the inline assembly code ends. In general it always has been good practice to save the registers on the stack, either when inline code is invoked or when externally compiled asm routines try to communicate with the compiler. The Mac (back in 1984-88 or thereabouts) had a cute way to execute inline code: You loaded the required raw opcodes inside an array and then the interpreter (back then I was working with an interpreted version of Pascal) would do that automatically: It would save all registers except certain ones, save the return address and then jump onto the array code itself(!) When the processor encountered a JSR or RTS inside the array, it popped the return address and the registers and returned to the user. Pretty nifty. > regards, chip -- I.N. Galidakis --- http://ioannis.virtualcomposer2000.com/ |
|
#102
| |||
| |||
| Herman Rubin wrote: > glen herrmannsfeldt wrote: (snip) >>Note that Alpha also doesn't supply a divide instruction, suggesting >>that it can be done by subroutine call. > Or by inlining? Subroutine calls have become the slowest > instructions on many, if not most, machines. This used to > be the opposite, but hardware has made this almost necessarily > the case. The Alpha architecture manual suggests a subroutine. I don't know what it actually looks like, but it does seem that it could be done inline. -- glen |
|
#103
| |||
| |||
| hrubin@odds.stat.purdue.edu (Herman Rubin) writes: > In article <cLCdnYLVPuwgii3bnZ2dnUVZ_gqdnZ2d@comcast.com>, > glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote: .... > >Note that Alpha also doesn't supply a divide instruction, suggesting > >that it can be done by subroutine call. > > Or by inlining? Subroutine calls have become the slowest > instructions on many, if not most, machines. This used to > be the opposite, but hardware has made this almost necessarily > the case. Yes, inlining might be possible too. In particular for divisions by constants (usually a combination of shift-adds to synthesise, or a real multiply by reciprocal scheme). In recent times, however, call instructions generated by things like C compilers have been 100% predictable branches, and their returns have been 100% predictable too. Some processors have even implemented a known-return-address stack which would help the branch prediction unit know to expect such control flows. I'm not saying call/ret is free, but it's probably less expensive than an unpredicted conditional branch. C++ and VFPs aren't necessarily as predictable though (but can often be faster than the alternatives). Phil -- Dear aunt, let's set so double the killer delete select all. -- Microsoft voice recognition live demonstration |
|
#104
| |||
| |||
| glen herrmannsfeldt <gah@ugcs.caltech.edu> writes: > Herman Rubin wrote: > > glen herrmannsfeldt wrote: > > (snip) > > >>Note that Alpha also doesn't supply a divide instruction, suggesting > >>that it can be done by subroutine call. > > > Or by inlining? Subroutine calls have become the slowest > > instructions on many, if not most, machines. This used to > > be the opposite, but hardware has made this almost necessarily > > the case. > > The Alpha architecture manual suggests a subroutine. > I don't know what it actually looks like, but it does > seem that it could be done inline. I've seen division by 3 as an inline on my alpha. However, I don't know if that counts as a useful data point, as division by a constant is not a binary function, unlike division. Phil -- Dear aunt, let's set so double the killer delete select all. -- Microsoft voice recognition live demonstration |
|
#105
| |||
| |||
| Androcles wrote: > "William Hughes" <wpihughes@hotmail.com> wrote in message > news:1185831281.136214.163410@q75g2000hsh.googlegr oups.com... > : On Jul 30, 4:58 pm, hru...@odds.stat.purdue.edu (Herman Rubin) wrote: > : > In article <1185822779.498334.240...@r34g2000hsd.googlegroups .com>, > : > William Hughes <wpihug...@hotmail.com> wrote: > : > > : > > : > > : > >On Jul 30, 2:02 pm, hru...@odds.stat.purdue.edu (Herman Rubin) wrote: > : > >> In article <1185722378.498383.20...@d55g2000hsg.googlegroups. com>, > : > >> William Hughes <wpihug...@hotmail.com> wrote: > : > >> >On Jul 29, 10:19 am, Randy Yates <ya...@ieee.org> wrote: > : > >> >> Phil Carmody <thefatphil_demun...@yahoo.co.uk> writes: > : > >> >> > [...] > : > >> >> > But of course that's the programmer taking control. > : > >> >> To some extent it is, but if you need to "take control," and > : > >> >> you must delve into architecture specifics and write non-portable > : > >> >> C code (at least C code that won't run _fast_ on multiple > platforms), > : > >> >A far cry from non-portable code. > : > >> Consider the badly designed C function frexp. If executed > : > >> as a function call AT ALL, the function call is likely to > : > >> be more costly in time and space than inlining it using > : > >> assembler instructions, on whatever machine it is to be > : > >> executed. It also should not use the C language; this is > : > >> one clear place where an adequate HLL would have to have > : > >> more than one argument to the left of the replacement call. > : > >I fail to see your point. When the compiler sees the > : > >function frexp it can emit the appropriate machine > : > >code without a function call. > : > >Compilers can always use compiler magic on > : > >a library fuction. [Sure I can force the compiler to use > : > >a function call and this will slow things down. > : > >So what? 
There are lots of stupid things I can do to > : > >defeat optimization] > : > > : > The form of frexp was b = frexp(a, &m), which > : > meant that, unless a = 0, .5 <= |b| < 1, and > : > a = 2^m*b. The form of this statement forces > : > m into memory. > : > : No, as I point out the fact that it looks like > : m has an address, does not mean that m cannot > : be a register. > : > : > To avoid this, one would have > : > to write > : > > : > b, m = frexp(a) > : > > : > >Or is your point that the compiler cannot store the > : > >exponent in a register. This is simply false. > : > >The abstract machine cannot store the exponent in a register > : > >but the abtract machine does not have the concept of register. > : > > : > Alas, the abstract machine needs the concept of a register. > : > : Nonsense. We obviously do not share the same concept of > : abstract machine. > : > : > > : > >The compiler can use a register for any variable > : > >it wants. (If the address of this variable is taken > : > >then the compiler has to be able to determine that > : > >the address is not actually needed, trivial here). > : > > : > Not if the address of the variable is explicitly given. > : > > : > : If the address of the variable is explicitly given then > : things are not quite as trivial, but the compiler can > : still determine that the address of the variable is only > : used to determine where to store the variable. > : > : > True you > : > > : > >cannot use a "register" qualifier for the variable > : > >used to store the exponent, but in theory we are > : > >allowed to assume that the compiler is as good or better > : > >than a good assembly programmer > : > >at chosing which variable should go into registers > : > >(the fact that this is true in practice is only icing > : > >on the cake) > : > > : > It is true in practice? Not to my knowledge. > : > : Not to my direct knowledge. It is frequently asserted. 
> : In any case, whether it is true or not in practice is > : beside the point. The point is that it can be considered > : to be true in theory. > : > : > > : > >> Another bad example is the switch operator. It requires > : > >> an argument for the switch process, and also requires a > : > >> return to the switch operator even if one knows where to > : > >> go. In this case, the "argument" should be kept in the > : > >> location ONLY; this removes an argument assignment, using > : > >> a multibranch test on argument, as well as a simple > : > >> transfer when a simple transfer would do as well. This > : > >> CAN be avoided in C by a liberal use of goto's, which > : > >> the HLL advocates demean. > : > >There is nothing to stop the compiler from recognizing > : > >when a switch construct can be simplified in the way > : > >you describe and doing exactly what you describe. > : > >Again you seem to think that if there is an assignment > : > >in the abstract machine, the compiler must use > : > >and assignment. This is nonsense. > : > > : > I doubt that the compiler will be able to recognize that > : > the switch variable is ONLY going to be used in the > : > switch statement, nor will it recognize that a setup > : > like that may call for additional entries from outside > : > the subroutine. > : > : I do not know about existing compilers. There > : is certainly no theoretical barrier > > <syntax error 233> Statement terminator expect. > Intelligent compiler recovered from error. > > > : > I have not noticed additional entries > : > to a function or subroutine in any HLL since early > : > versions of Fortran, and few assemblers even have them. > : > One problem is the return statement. > : > : Explicit entry points are not needed. If the compiler > : can determine that such entry points are useful and > : it can create them. > > <syntax error 234> Unexpected statement terminator. "then" statement > expected after "if" statement followed by "and" statement. 
> Compilation terminated, unrecoverable error. > > If you can't write normal English syntax how the fuck do you expect a > compiler to translate and assemble your code? I suspect that he doesn't expect any compiler to translate and assemble his English text. |
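For reference, the frexp contract argued over in the quoted exchange (b = frexp(a, &m), with .5 <= |b| < 1 and a = 2^m*b unless a = 0) can be written down as a short C sketch. The wrapper below is only an illustration of the "b, m = frexp(a)" shape that was asked for: the names `frexp_pair` and `split_float` are made up here and are not part of any library; only `frexp` itself is the standard `<math.h>` call. Whether a given compiler keeps the exponent in a register is up to that compiler and is not shown by this code.

```c
#include <math.h>

/* Hypothetical pair type (not from any library): both results are
   returned by value, so the caller never takes the address of one
   of its own variables. */
typedef struct {
    double frac; /* b: 0 if a == 0, otherwise 0.5 <= |b| < 1 */
    int exp;     /* m: chosen so that a == b * 2^m            */
} frexp_pair;

/* Hypothetical wrapper around the standard frexp(). */
frexp_pair split_float(double a)
{
    frexp_pair r;
    r.frac = frexp(a, &r.exp); /* standard C library call */
    return r;
}
```

Calling `split_float(8.0)` gives a fraction of 0.5 and an exponent of 4, since 8 = 0.5 * 2^4.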
|
#106
| |||
| |||
| Herman Rubin wrote: > Another bad example is the switch operator. It requires > an argument for the switch process, and also requires a > return to the switch operator even if one knows where to > go. In this case, the "argument" should be kept in the > location ONLY; this removes an argument assignment, using > a multibranch test on argument, as well as a simple > transfer when a simple transfer would do as well. The argument of a switch operator in C is naturally scoped and therefore known to the compiler. Most current implementations use data flow analysis to select one of several code generation sequences. Multibranch is the alternative if other possibilities have failed to be used for some reason. w.. |
|
#107
| |||
| |||
| Phil Carmody wrote: (snip) > I've seen division by 3 as an inline on my alpha. > However, I don't know if that counts as a useful data point, > as division by a constant is not a binary function, > unlike division. Yes, the Alpha manual suggests that integer division by constants can be done with multiply. I believe that is true for most, but not all, integer divisors. All HLLs that I know of, though, allow division by a variable, so the compiler must be able to do that. That is what they suggest doing by subroutine call. -- glen |
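The "division by a constant done with multiply" idea mentioned above can be sketched in C for the divide-by-3 case. This is a generic illustration for unsigned 32-bit values, not the sequence from the Alpha manual or from any particular compiler; the function name `div3_u32` is invented for the example. The constant 0xAAAAAAAB is ceil(2^33 / 3), so multiplying by it and shifting right by 33 gives the same result as x / 3 for every 32-bit unsigned x.

```c
#include <stdint.h>

/* Hypothetical helper: unsigned 32-bit division by 3 without a
   divide instruction.  0xAAAAAAAB == (2^33 + 1) / 3, so the 64-bit
   product shifted right by 33 equals floor(x / 3) for all 32-bit x. */
uint32_t div3_u32(uint32_t x)
{
    uint64_t product = (uint64_t)x * UINT64_C(0xAAAAAAAB);
    return (uint32_t)(product >> 33);
}
```

Division by a variable has no such fixed constant, which is why that case is the one left to a general routine.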
|
#108
| |||
| |||
| On Wed, 01 Aug 2007 14:33:41 -0400, Herman Rubin wrote: > I wrote: >>None of that really affects how code is written, much, though. What makes >>you write "parallel" code is the multiple cores or threads that are now >>becoming popular. > > It SHOULD affect how code is written, as many programs which are > efficient on single-stream instructions, or even when all of the > streams which a compiler with serially issued instructions can be > efficiently used, will do very poorly on an SIMD machine. Sorry, I wasn't clear on that topic (I slipped in the bit about SIMD in a later edit). There's not a whole lot that one can do to make use of the multiple parallel scalar function units in the processors I listed. The compiler and processor's schedulers will pack the issue slots as densely as they can, given operand dependencies. (So it's a good idea, now, to code with as few unnecessary dependencies as possible.) SIMD is a different issue, though. Some compilers can get some use out of them by analysing scalar code, but most of the tight SIMD code that I've seen is written explicitly in SIMD intrinsic functions. So you're right: you have to think and code differently to make use of the parallelism in SIMD, and that's a *different* kind of thinking and coding than is necessary to make use of the asynchronous parallelism that one gets from multiple cores or threads. Cheers, -- Andrew |
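The remark about coding with as few unnecessary dependencies as possible can be illustrated with a small C sketch. Both functions below are invented for this example. The first sums an array through a single accumulator, so every add waits on the previous one; the second keeps four independent partial sums that a scheduler is free to overlap. Whether this is actually faster depends on the processor and compiler, and because floating-point addition is not associative the two versions can differ in the last bits for general input.

```c
#include <stddef.h>

/* One accumulator: each addition depends on the one before it. */
double sum_chained(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Four independent accumulators: the four adds in the loop body
   do not depend on each other, only on the previous iteration. */
double sum_unrolled4(const double *a, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++) /* leftover elements when n is not a multiple of 4 */
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}
```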
|
#109
| |||
| |||
| (snip, someone wrote) >>Well, this is not optimized! Never can be, because C can't restructure >>the "data[]" array to put the real and imaginary parts into individual, >>continuous arrays. (snip) >>Note that this is absolutely irrefutable. The C language (and every other >>computer language that I know of) is not capable of optimizing this BECAUSE >>IT DOESN'T HAVE CONTROL OVER DATA ALIGNMENT AND STRUCTURE. See: http://gcc.gnu.org/ml/gcc-bugs/2007-03/msg01324.html Some C compilers do change an array of structs to struct of arrays, or the other way around. -- glen |
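What "array of structs to struct of arrays" means for the complex-data case can be shown by doing the transformation by hand in C. This is a manual sketch, not what any compiler emits; the type `cpx_aos` and the function `aos_to_soa` are names made up for the example. The interleaved layout stores re, im, re, im, ...; the split layout puts all real parts in one contiguous array and all imaginary parts in another.

```c
#include <stddef.h>

/* Interleaved layout: one struct per complex sample. */
typedef struct {
    float re;
    float im;
} cpx_aos;

/* Hypothetical helper: copy n interleaved samples into two
   separate contiguous arrays (caller provides the storage). */
void aos_to_soa(const cpx_aos *in, size_t n, float *re, float *im)
{
    for (size_t i = 0; i < n; i++) {
        re[i] = in[i].re;
        im[i] = in[i].im;
    }
}
```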
|
#110
| |||
| |||
| glen herrmannsfeldt <gah@ugcs.caltech.edu> writes: > (snip, someone wrote) >>>Well, this is not optimized! Never can be, because C can't restructure >>>the "data[]" array to put the real and imaginary parts into individual, >>>continuous arrays. > (snip) > >>>Note that this is absolutely irrefutable. The C language (and every other >>>computer language that I know of) is not capable of optimizing this BECAUSE >>>IT DOESN'T HAVE CONTROL OVER DATA ALIGNMENT AND STRUCTURE. > > See: http://gcc.gnu.org/ml/gcc-bugs/2007-03/msg01324.html > > Some C compilers do change an array of structs to struct of arrays, > or the other way around. I wonder how you're supposed to debug something like that?! -- % Randy Yates % "My Shangri-la has gone away, fading like %% Fuquay-Varina, NC % the Beatles on 'Hey Jude'" %%% 919-577-9882 % %%%% <yates@ieee.org> % 'Shangri-La', *A New World Record*, ELO http://home.earthlink.net/~yatescr |