|
#11
| |||
| |||
| On Jul 28, 4:50 pm, Jerry Avins <j...@ieee.org> wrote: > Until processors routinely execute and n instructions in parallel, all > code will be executed sequentially. You might as well write it that way. and if instructions have conditionals in them, those would have to be executed sequentially, despite the brawn, no? r b-j |
|
#12
| |||
| |||
| Jerry Avins wrote: ... > Until processors routinely execute and n instructions in parallel ... Until processors routinely execute any n instructions in parallel ... ("any", not "and") Jerry -- Engineering is the art of making what you want from things you can get. |
|
#13
| |||
| |||
| Logan Shaw wrote: > Jerry Avins wrote: >> Until processors routinely execute any n instructions in parallel, all >> code will be executed sequentially. You might as well write it that way. > > n had been 1 for decades. For most new hardware, it's now 2, and it's > already starting to change to 4. But mostly it's not any 2, only certain built-in combinations. And a stall in one of the execution units holds them all up. We'll think of the present efforts as quite primitive if those deficiencies are ever ironed out. Jerry -- Engineering is the art of making what you want from things you can get. |
|
#14
| |||
| |||
| On 28 Jul, 16:18, Randy Yates <ya...@ieee.org> wrote: > There may come a day when a compiled language can consistently beat a > good assembly programmer, but that language will never be C. It will > be a new language that has a radically new, high level way of specifying the > algorithm to the compiler, and that has complete control over the generated > code, including data structures. > > To make my point, take a simple example. > > Consider a fictitious processor that has an architecture with an indirect > addressing mode. That is, the address can be loaded in a register, and > that register be used to generate the address for data used in other instructions. > > Now let's also posit that the processor can post-increment-by-one the > address registers used for indirect mode for free, but post-increment > by-two (or other higher values) is not free - i.e., it cost more > cycles. > > Now let's say the algorithm to be coded is to sum the real and imaginary > parts of a complex array. > > Our programmer, being ignorant of the processor architecture details, > writes the following code: > > typedef struct > { > int16_t real; > int16_t imag; > > } COMPLEX_T; .... > Well, this is not optimized! Never can be, because C can't restructure > the "data[]" array to put the real and imaginary parts into individual, > continuous arrays. Why do you claim that keeping the real and imaginary parts in separate arrays is optimal? What about cache misses? Actually, lots of programmers (me among them) prefer to remain ignorant about processor details. That being said, I did take a basic course in x86 assembly programming some 15 years ago, so I know the fundamentals of how a CPU and a computer work. > Also note that if the programmer were to realize this is a problem and > manually reorganize the data structure, then the resulting code > becomes SPECIFIC TO THE PROCESSOR. Targeting the code for another > processor using this new data structure may not be optimal. Of course. 
That's the whole point of C, isn't it? A programming tool that is standardized to make it portable, AND generates "sufficiently" efficient code? It's the programmer's choice of where he wants to spend the effort: - A portable C code which compiles on all standard-compliant platforms, but might introduce, say, 50 - 80% run-time overhead - A near-optimal assembly code which has to be hand-crafted for all relevant platforms. Everybody knows about these tradeoffs; it's a matter of resources and economy which approach to choose. That same trade-off, only at a higher level, is the reason why matlab is so popular. It takes no time to get some code up and working, code which works everywhere matlab works, at the cost of appalling run time. > Also note that requiring the programmer to know the processor architecture > defeats, at least in part, the point of a high-level-language since time > is spent learning things that ideally would be handled by the compiler. > > Note that this is absolutely irrefutable. The C language (and every other > computer language that I know of) is not capable of optimizing this BECAUSE > IT DOESN'T HAVE CONTROL OVER DATA ALIGNMENT AND STRUCTURE. It's the job of the programmer to keep track of those sorts of things, isn't it? The alternative is to implement standard libraries which take care of standard functionality, which is the way C and C++ handle things. That introduces some compiler and run-time overhead which is likely too expensive if the alternative is to program assembly. You might want to have a look at C++ template metaprogramming. I am just learning that stuff myself, so I don't want to get into too many details, but it turns out that it is possible to write C++ code that compiles to near-optimal code which ought to be able to approach the efficiency of hand-crafted assembly code in certain applications. What little I have seen of C++ metaprogramming blows the socks off anything coded in plain C. 
I've had too little time to learn how to exploit it, though. Rune |
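For readers following the layout argument, here is a minimal sketch of the two data layouts under discussion. `COMPLEX_T` is taken from the quoted example; the accumulator type `COMPLEX_SUM_T` and the function names `sum_interleaved` and `sum_split` are assumptions made for illustration, since the original listing is elided. Nothing here measures speed; it only shows that both layouts compute the same result and that switching between them is a source-level change the programmer has to make.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Interleaved layout, as in the quoted example: re0, im0, re1, im1, ... */
typedef struct {
    int16_t real;
    int16_t imag;
} COMPLEX_T;

/* Hypothetical wider accumulator; the sum type in the original is elided. */
typedef struct {
    int32_t real;
    int32_t imag;
} COMPLEX_SUM_T;

/* Sum over an array of structs (the layout the quoted code uses). */
COMPLEX_SUM_T sum_interleaved(const COMPLEX_T *data, size_t n)
{
    COMPLEX_SUM_T sum = {0, 0};
    assert(data != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        sum.real += data[i].real;
        sum.imag += data[i].imag;
    }
    return sum;
}

/* Sum over two separate arrays (the layout Randy argues for). */
COMPLEX_SUM_T sum_split(const int16_t *re, const int16_t *im, size_t n)
{
    COMPLEX_SUM_T sum = {0, 0};
    assert((re != NULL && im != NULL) || n == 0);
    for (size_t i = 0; i < n; i++) {
        sum.real += re[i];
        sum.imag += im[i];
    }
    return sum;
}
```

Which layout is faster depends on the target: the split form walks two streams with unit stride, while the interleaved form touches one stream and keeps each sample's two parts in the same cache line, which is the cache question raised above.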
|
#15
| |||
| |||
| Randy Yates <yates@ieee.org> writes: > Now let's also posit that the processor can post-increment-by-one the > address registers used for indirect mode for free, but post-increment > by-two (or other higher values) is not free - i.e., it cost more > cycles. .... > for (n = 0; n < N; n++) > { > sum.real += data[n].real; > sum.imag += data[n].imag; > } > > return sum; > } > > Well, this is not optimized! Never can be, because C can't restructure > the "data[]" array to put the real and imaginary parts into individual, > continuous arrays. > > Also note that if the programmer were to realize this is a problem and > manually reorganize the data structure, then the resulting code > becomes SPECIFIC TO THE PROCESSOR. Targeting the code for another > processor using this new data structure may not be optimal. That's the whole point of a compiler. You write portably, the compiler maps it to your chosen target for you. > Also note that requiring the programmer to know the processor architecture > defeats, at least in part, the point of a high-level-language since time > is spent learning things that ideally would be handled by the compiler. > > Note that this is absolutely irrefutable. The C language (and every other > computer language that I know of) is not capable of optimizing this BECAUSE > IT DOESN'T HAVE CONTROL OVER DATA ALIGNMENT AND STRUCTURE. So you can't see a compiler using 2 post-increments in that loop then? And if that code is critical, then it's certainly easier to code 2 versions of the module in a high level language than it is several dozen versions in architecture-specific assembly. I usually write two versions of all my speed critical C code. One for register-rich architectures, and another for the register-starved toy processors which alas dominate the market. But of course that's the programmer taking control. It appears that that concept offends you when it comes to high level languages. Phil -- "Home taping is killing big business profits. 
We left this side blank so you can help." -- Dead Kennedys, written upon the B-side of tapes of /In God We Trust, Inc./. |
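The "2 post-increments" remark can be made concrete with a short sketch. It assumes the samples are stored as a flat interleaved `int16_t` buffer (re0, im0, re1, im1, ...), which is a simplification of the quoted struct array; the function name `sum_walk` and its output parameters are hypothetical. A single pointer is stepped by one element twice per iteration, which is the access pattern the fictitious processor is said to get for free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Walk an interleaved buffer of n complex samples (2*n int16_t values)
 * with one pointer and two unit post-increments per iteration.
 * sum_re and sum_im receive the accumulated real and imaginary parts.
 */
void sum_walk(const int16_t *p, size_t n, int32_t *sum_re, int32_t *sum_im)
{
    int32_t re = 0;
    int32_t im = 0;

    assert(sum_re != NULL && sum_im != NULL);
    assert(p != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        re += *p++;   /* real part, then step by one element */
        im += *p++;   /* imaginary part, then step by one element */
    }

    *sum_re = re;
    *sum_im = im;
}
```

Whether a given compiler derives this form from the indexed loop on its own is target-dependent; the sketch only shows that the interleaved layout does not by itself force a stride-two increment.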
|
#16
| |||
| |||
| Randy Yates wrote: > There may come a day when a compiled language can consistently beat a > good assembly programmer, but that language will never be C. It will > be a new language that has a radically new, high level way of specifying the > algorithm to the compiler, and that has complete control over the generated > code, including data structures. I agree: a functional language, in which you describe a requirement and the compiler translates that requirement to the target architecture, will likely be the one that beats asm under all implementation conditions. C is an implementation language. It implements what is described as efficiently as possible on the target processor, but as a language it does not do the kind of algorithm rewriting suggested in your example. > Consider a fictitious processor that has an architecture with an indirect > addressing mode. That is, the address can be loaded in a register, and > that register be used to generate the address for data used in other instructions. > > Now let's also posit that the processor can post-increment-by-one the > address registers used for indirect mode for free, but post-increment > by-two (or other higher values) is not free - i.e., it cost more > cycles. The better C compilers have become very good at accounting for cycle differences between different instructions and using that information to select implementation tactics. The simplest example is choosing between a compiled stack and stack frames for individual functions on some processors, depending on whether a function needs to be implemented as re-entrant or not. (This saves a RAM fetch on many processors.) This choice is made with static analysis of the application by the compiler. Regards Walter Banks -- Byte Craft Limited Tel. (519) 888-6911 http://www.bytecraft.com walter@bytecraft.com |
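As a rough analogy for the compiled-stack point, the sketch below contrasts a local held at a fixed address with one held in a stack frame. It is an illustration in portable C, not how any particular compiler implements a compiled stack: `static` is used here only to mimic fixed-address allocation, and the function names `last_static` and `last_auto` are hypothetical.

```c
#include <assert.h>

/*
 * Fixed-address local, mimicking a compiled stack: every activation
 * shares the same storage, so a nested call overwrites the caller's value.
 */
unsigned last_static(unsigned n)
{
    static unsigned local;      /* one copy for all activations */
    local = n;
    if (n > 0) {
        last_static(n - 1);     /* inner calls clobber 'local' */
    }
    return local;               /* 0, because the innermost call wrote 0 */
}

/*
 * Stack-frame local: each activation has its own copy, so the function
 * stays correct when re-entered.
 */
unsigned last_auto(unsigned n)
{
    unsigned local = n;         /* private to this activation */
    if (n > 0) {
        last_auto(n - 1);
    }
    return local;               /* still n */
}
```

This is why the compiler needs whole-program analysis before choosing the cheaper fixed-address form: it is only safe for functions that are never re-entered, whether through recursion or from an interrupt.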
|
#17
| |||
| |||
| On Sun, 29 Jul 2007 00:39:24 -0400, Jerry Avins wrote: > Logan Shaw wrote: >> Jerry Avins wrote: >>> Until processors routinely execute any n instructions in parallel, all >>> code will be executed sequentially. You might as well write it that way. >> >> n had been 1 for decades. For most new hardware, it's now 2, and it's >> already starting to change to 4. > But mostly it's not any 2, only certain built-in combinations. And a > stall in one of the execution units holds them all up. We'll think of > the present efforts as quite primitive if those deficiencies are ever > ironed out. I think the reference was to multicore processors, not to units within a single core. Each core functions as an independent processor, except that they share a common cache. There are no restrictions on which instructions can be executed in parallel. -- Dave Seaman Oral Arguments in Mumia Abu-Jamal Case heard May 17 U.S. Court of Appeals, Third Circuit <http://www.abu-jamal-news.com/> |
|
#18
| |||
| |||
| On Sun, 29 Jul 2007 02:51:16 -0000, robert bristow-johnson wrote: > On Jul 28, 4:50 pm, Jerry Avins <j...@ieee.org> wrote: >> Until processors routinely execute and n instructions in parallel, all >> code will be executed sequentially. You might as well write it that way. > and if instructions have conditionals in them, those would have to be > executed sequentially, despite the brawn, no? Not exactly. Modern processors perform branch prediction and can conditionally execute instructions in anticipation of a certain branch being taken. -- Dave Seaman Oral Arguments in Mumia Abu-Jamal Case heard May 17 U.S. Court of Appeals, Third Circuit <http://www.abu-jamal-news.com/> |
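To make the conditional-execution question concrete, here is a small sketch of the same computation written with and without a data-dependent branch. The function names `count_above_branch` and `count_above_branchless` are hypothetical. In the first form the processor has to predict the `if` on every element; in the second the comparison result (0 or 1 in C) is simply added, so there is no data-dependent branch in the source, although the compiler is free to generate either form from either source.

```c
#include <assert.h>
#include <stddef.h>

/* Count elements greater than t using a conditional branch. */
size_t count_above_branch(const int *x, size_t n, int t)
{
    size_t count = 0;
    assert(x != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (x[i] > t) {         /* outcome must be predicted per element */
            count++;
        }
    }
    return count;
}

/* Same count, using the 0-or-1 value of the comparison directly. */
size_t count_above_branchless(const int *x, size_t n, int t)
{
    size_t count = 0;
    assert(x != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        count += (size_t)(x[i] > t);   /* comparison yields 0 or 1 */
    }
    return count;
}
```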
|
#19
| |||
| |||
| Phil Carmody <thefatphil_demunged@yahoo.co.uk> writes: > [...] > But of course that's the programmer taking control. To some extent it is, but if you need to "take control," and you must delve into architecture specifics and write non-portable C code (at least C code that won't run _fast_ on multiple platforms), then assembly provides the ultimate "control." > It appears that that concept offends you when it comes to high level > languages. I wouldn't use the word "offend" - I consider it an abuse of the high level language. Hey People, this is ludicrous. All these arguments treat assembly as if it were the ultimate evil. It ain't that bad, folks. Honest. -- % Randy Yates % "Midnight, on the water... %% Fuquay-Varina, NC % I saw... the ocean's daughter." %%% 919-577-9882 % 'Can't Get It Out Of My Head' %%%% <yates@ieee.org> % *El Dorado*, Electric Light Orchestra http://home.earthlink.net/~yatescr |
|
#20
| |||
| |||
| Brad Griffis <bradgriffis@hotmail.com> writes: > Randy Yates wrote: >> Also note that if the programmer were to realize this is a problem and >> manually reorganize the data structure, then the resulting code >> becomes SPECIFIC TO THE PROCESSOR. Targeting the code for another >> processor using this new data structure may not be optimal. >> Also note that requiring the programmer to know the processor >> architecture >> defeats, at least in part, the point of a high-level-language since time >> is spent learning things that ideally would be handled by the compiler. >> Note that this is absolutely irrefutable. The C language (and every >> other >> computer language that I know of) is not capable of optimizing this BECAUSE >> IT DOESN'T HAVE CONTROL OVER DATA ALIGNMENT AND STRUCTURE. > > Randy, > > I think that this debate can actually be thought of in a slightly more > general way. That is, rather than being the "C vs assembly debate" I > think you could think of this as the "Optimization vs Portability > debate". This would encompass not only C vs assembly, but also things > like drivers vs "integrated" code or OS vs simple scheduling (ISRs). > > In general you can get more optimized code by keeping things simpler. > I think that the simplest and most optimized you can get is keeping > everything in assembly and handling everything through interrupts and > a background loop. However, as complexity continues to grow we need > to have things like C code, OSes, drivers, etc. or else we'd be > forever re-coding the same things. > > I think that it is not an easy task to maintain a healthy equilibrium > between portability and optimization. You get one extreme where even > the simplest of tasks go through 10 layers of software just to read a > register. On the other extreme you might have assembly code that > needs to be completely rewritten in order to move to a different > processor. 
I think maintaining this equilibrium is one of the toughest > challenges facing programmers, particularly in the embedded world > where cycles and memory are at a premium. Hi Brad, I agree with you. What I don't understand is why, once the decision has been made that portability/readability/maintainability is to be secondary to performance, assembly is a bad choice. As I just wrote in another thread, it isn't that awful. Is it? I mean, folks here argue as if it takes MAN-YEARS to write a single assembly routine. In any case, I think you bring out a very good point. -- % Randy Yates % "So now it's getting late, %% Fuquay-Varina, NC % and those who hesitate %%% 919-577-9882 % got no one..." %%%% <yates@ieee.org> % 'Waterfall', *Face The Music*, ELO http://home.earthlink.net/~yatescr |