Thoughts on the C/Assembly Debate

This is a discussion on Thoughts on the C/Assembly Debate within the Other Technologies forums in category; There may come a day when a compiled language can consistently beat a good assembly programmer, but that language will never be C. It will be a new language that has a radically new, high level way of specifying the algorithm to the compiler, and that has complete control over the generated code, including data structures. To make my point, take a simple example. Consider a fictitious processor that has an architecture with an indirect addressing mode. That is, the address can be loaded in a register, and that register be used to generate the address for data used in ...

Go Back   Application Development Forum > Other Technologies

Object Mix

Register FAQ Calendar Search Today's Posts Mark Forums Read
  #1  
Old 07-28-2007, 10:18 AM
Randy Yates
Guest
 
Default Thoughts on the C/Assembly Debate

There may come a day when a compiled language can consistently beat a
good assembly programmer, but that language will never be C. It will
be a new language that has a radically new, high level way of specifying the
algorithm to the compiler, and that has complete control over the generated
code, including data structures.

To make my point, take a simple example.

Consider a fictitious processor that has an architecture with an indirect
addressing mode. That is, the address can be loaded in a register, and
that register be used to generate the address for data used in other instructions.

Now let's also posit that the processor can post-increment-by-one the
address registers used for indirect mode for free, but post-increment
by-two (or other higher values) is not free - i.e., it cost more
cycles.

Now let's say the algorithm to be coded is to sum the real and imaginary
parts of a complex array.

Our programmer, being ignorant of the processor architecture details,
writes the following code:

typedef struct
{
int16_t real;
int16_t imag;
} COMPLEX_T;

#define N 1024;

COMPLEX_T data[N];

COMPLEX_T MyAlgorithm(void)
{
uint32_t n;
COMPLEX_T sum;

sum.real = 0;
sum.imag = 0;
for (n = 0; n < N; n++)
{
sum.real += data[n].real;
sum.imag += data[n].imag;
}

return sum;
}

Well, this is not optimized! Never can be, because C can't restructure
the "data[]" array to put the real and imaginary parts into individual,
continuous arrays.

Also note that if the programmer were to realize this is a problem and
manually reorganize the data structure, then the resulting code
becomes SPECIFIC TO THE PROCESSOR. Targeting the code for another
processor using this new data structure may not be optimal.

Also note that requiring the programmer to know the processor architecture
defeats, at least in part, the point of a high-level-language since time
is spent learning things that ideally would be handled by the compiler.

Note that this is absolutely irrefutable. The C language (and every other
computer language that I know of) is not capable of optimizing this BECAUSE
IT DOESN'T HAVE CONTROL OVER DATA ALIGNMENT AND STRUCTURE.
--
% Randy Yates % "She tells me that she likes me very much,
%% Fuquay-Varina, NC % but when I try to touch, she makes it
%%% 919-577-9882 % all too clear."
%%%% <yates@ieee.org> % 'Yours Truly, 2095', *Time*, ELO
http://home.earthlink.net/~yatescr
Reply With Quote
  #2  
Old 07-28-2007, 11:12 AM
Guest
 
Default Re: Thoughts on the C/Assembly Debate

Check Intel's marketing for its C/C++ compiler.
I'm sure in there somewhere is how it detects
SOA/AOS requirements and reorders data, and
vectors it all using SSE2/3/4 for best effect.
I'm almost as sure that MS/VS will do the same.
All this even without #pragma hints. Since so
few will ever know asm well enough to beat a
current compiler, guru code (potentially) being
better than compiler code is pretty much moot.
Besides, this is more about knowing how to order
data, not a compiler/assembler thing. Had to yawn

RY- [Sat, 28 Jul 2007 10:18:04 -0400]:
>Now let's say the algorithm to be coded is to sum the real and imaginary
>parts of a complex array.
>typedef struct>{> int16_t real;> int16_t imag;>} COMPLEX_T;
>#define N 1024;
>COMPLEX_T data[N];
>Well, this is not optimized! Never can be, because C can't restructure
>the "data[]" array to put the real and imaginary parts into individual,
>continuous arrays.


--
40th Floor - Software @ http://40th.com/
iplay.40th.com iPlay advanced audio player
Reply With Quote
  #3  
Old 07-28-2007, 11:16 AM
Brad Griffis
Guest
 
Default Re: Thoughts on the C/Assembly Debate

Randy Yates wrote:
> Also note that if the programmer were to realize this is a problem and
> manually reorganize the data structure, then the resulting code
> becomes SPECIFIC TO THE PROCESSOR. Targeting the code for another
> processor using this new data structure may not be optimal.
>
> Also note that requiring the programmer to know the processor architecture
> defeats, at least in part, the point of a high-level-language since time
> is spent learning things that ideally would be handled by the compiler.
>
> Note that this is absolutely irrefutable. The C language (and every other
> computer language that I know of) is not capable of optimizing this BECAUSE
> IT DOESN'T HAVE CONTROL OVER DATA ALIGNMENT AND STRUCTURE.


Randy,

I think that this debate can actually be thought of in a slightly more
general way. That is, rather than being the "C vs assembly debate" I
think you could think of this as the "Optimization vs Portability
debate". This would encompass not only C vs assembly, but also things
like drivers vs "integrated" code or OS vs simple scheduling (ISRs).

In general you can get more optimized code by keeping things simpler. I
think that the simplest and most optimized you can get is keeping
everything in assembly and handling everything through interrupts and a
background loop. However, as complexity continues to grow we need to
have things like C code, OSes, drivers, etc. or else we'd be forever
re-coding the same things.

I think that it is not an easy task to maintain a healthy equilibrium
between portability and optimization. You get one extreme where even
the simplest of tasks go through 10 layers of software just to read a
register. On the other extreme you might have assembly code that needs
to be completely rewritten in order to move to a different procesor. I
think maintaining this equilibrium is one of the toughest challenges
facing programmers, particularly in the embedded world where cycles and
memory are at a premium.

Brad
Reply With Quote
  #4  
Old 07-28-2007, 11:56 AM
Stefan Reuther
Guest
 
Default Re: Thoughts on the C/Assembly Debate

Hi there,

[excessive Newsgroups line shortened]

Randy Yates wrote:
> Now let's also posit that the processor can post-increment-by-one the
> address registers used for indirect mode for free, but post-increment
> by-two (or other higher values) is not free - i.e., it cost more
> cycles.

[snip]
> Our programmer, being ignorant of the processor architecture details,
> writes the following code:

[snip]
> Well, this is not optimized! Never can be, because C can't restructure
> the "data[]" array to put the real and imaginary parts into individual,
> continuous arrays.


But the programmer can (and, if that's the bottleneck, will) do that.

With C, he still has the advantage that he can take the final code and
run it unmodified on his fast, well-equipped PC. So he can make huge
testsuites of vectors to sum up, and let them run automatically and
quickly. I'm doing that all day long.

Sure, I could use a simulator for my target code. Ours does a mere 10k
instructions per second. But I've got other things to do than to wait
quarter of an hour until it finished clearing the .bss, let alone
decoded my 3 MB MPEG file.

> Also note that if the programmer were to realize this is a problem and
> manually reorganize the data structure, then the resulting code
> becomes SPECIFIC TO THE PROCESSOR. Targeting the code for another
> processor using this new data structure may not be optimal.


It may not be optimal for the new processor, but at least it's correct
in the first place. If it's not fast enough, you can still (or: again)
modify it for the new one. But if it is fast enough, you don't have to
touch it. With assembly, you'd have to rewrite and test it from scratch.

De facto, most code is a little processor-specific anyway - assumptions
about endianness, sizeof(int), or even small things such that a
floating-point multiply is faster than a divide, or a 'short' variable
is faster (chip with 16-bit bus) / slower (x86) than an 'int' variable.

> Also note that requiring the programmer to know the processor architecture
> defeats, at least in part, the point of a high-level-language since time
> is spent learning things that ideally would be handled by the compiler.


Actually, every high-level language programmer *should* know some basics
about the architecture he's targeting.


Stefan

Reply With Quote
  #5  
Old 07-28-2007, 12:27 PM
Logan Shaw
Guest
 
Default Re: Thoughts on the C/Assembly Debate

Randy Yates wrote:
> Now let's also posit that the processor can post-increment-by-one the
> address registers used for indirect mode for free, but post-increment
> by-two (or other higher values) is not free - i.e., it cost more
> cycles.


> typedef struct
> {
> int16_t real;
> int16_t imag;
> } COMPLEX_T;


> COMPLEX_T MyAlgorithm(void)
> {
> uint32_t n;
> COMPLEX_T sum;
>
> sum.real = 0;
> sum.imag = 0;
> for (n = 0; n < N; n++)
> {
> sum.real += data[n].real;
> sum.imag += data[n].imag;
> }
>
> return sum;
> }
>
> Well, this is not optimized! Never can be, because C can't restructure
> the "data[]" array to put the real and imaginary parts into individual,
> continuous arrays.


I think a clever C compiler could optimize the code to use your cheap
increment-by-one instruction. Since n is a local variable with no
pointers to it, and since it is never written inside the loop, the
compilers is free to notice that you are really just accessing memory
in a totally sequential way and transform your loop to this equivalent:

int16_t *p;
p = &( data[0].real );
two_N = 2 * N;
n = 0;
while (n < two_N) {
sum.real += p[n++];
sum.imag += p[n++];
}

Note that it would be pretty bad coding style for a human programmer to
write this in C, but since the compiler is in control of the implementation
details (like the way structs are padded and aligned) it's perfectly fine
for it to make transformations like this.

The only negative I can see with this is that it requires one extra
register than the version that would use parallel arrays (because you
have to keep two continuous running totals instead of one). That
should not be a big limitation in most cases.

I'm not saying compilers actually do this (although they may). I'm just
disagreeing with your claim that the generated code cannot ever be
optimized to use the post-increment operator. C doesn't *need* to
restructure the array to optimize the code!

Also, as someone else pointed out, you are blurring the differences between
architecture-specific code and portable code with the differences between
C and assembly. It is possible (and sometimes beneficial) to write C code
with a specific platform's performance in mind. (And if you want, you can
even write that code so that it will still run on other architectures.)

In the end, what it comes down to is compromises between programmers' time
and performance of the program. Since assembly gives you absolute control
over the instructions that are run, you have more flexibility to squeeze
out a tiny bit of extra performance in cases where the compiler doesn't do
the right thing. Compilers these days to the right thing a LOT of the time,
so this would be rare, but I can't deny that it must happen now and then
that the compiler generates code that a very knowledgeable person could
improve upon. The question is whether it's worth the time necessary to
do it. You generally have to have a very serious need for performance for
it to be worth it.

- Logan
Reply With Quote
  #6  
Old 07-28-2007, 12:47 PM
Tim Prince
Guest
 
Default Re: Thoughts on the C/Assembly Debate

hel@40th.com wrote:
> Check Intel's marketing for its C/C++ compiler.
> I'm sure in there somewhere is how it detects
> SOA/AOS requirements and reorders data, and
> vectors it all using SSE2/3/4 for best effect.
> I'm almost as sure that MS/VS will do the same.

If you checked the marketing, you might have seen that Intel C++
implements complex data type auto-vectorization for SSE3, but requires
C99 syntax for that. Since you are cross-posting to c.l.f, you might
consider that some people would prefer to use Fortran rather than less
optimal asm hacks. Sorry if you are put off by flashbacks to the days
when people called Fortran functions from COBOL rather than do asm.
Beats me why people would rather use asm than embed C99 in C++ or C89.
Don't know whether you consider the <?intrin.h> stuff to be assembly.
Reply With Quote
  #7  
Old 07-28-2007, 04:25 PM
gyansorova@gmail.com
Guest
 
Default Re: Thoughts on the C/Assembly Debate

On Jul 29, 2:18 am, Randy Yates <ya...@ieee.org> wrote:
> There may come a day when a compiled language can consistently beat a
> good assembly programmer, but that language will never be C. It will
> be a new language that has a radically new, high level way of specifying the
> algorithm to the compiler, and that has complete control over the generated
> code, including data structures.
>
> To make my point, take a simple example.
>
> Consider a fictitious processor that has an architecture with an indirect
> addressing mode. That is, the address can be loaded in a register, and
> that register be used to generate the address for data used in other instructions.
>
> Now let's also posit that the processor can post-increment-by-one the
> address registers used for indirect mode for free, but post-increment
> by-two (or other higher values) is not free - i.e., it cost more
> cycles.
>
> Now let's say the algorithm to be coded is to sum the real and imaginary
> parts of a complex array.
>
> Our programmer, being ignorant of the processor architecture details,
> writes the following code:
>
> typedef struct
> {
> int16_t real;
> int16_t imag;
>
> } COMPLEX_T;
>
> #define N 1024;
>
> COMPLEX_T data[N];
>
> COMPLEX_T MyAlgorithm(void)
> {
> uint32_t n;
> COMPLEX_T sum;
>
> sum.real = 0;
> sum.imag = 0;
> for (n = 0; n < N; n++)
> {
> sum.real += data[n].real;
> sum.imag += data[n].imag;
> }
>
> return sum;
>
> }
>
> Well, this is not optimized! Never can be, because C can't restructure
> the "data[]" array to put the real and imaginary parts into individual,
> continuous arrays.
>
> Also note that if the programmer were to realize this is a problem and
> manually reorganize the data structure, then the resulting code
> becomes SPECIFIC TO THE PROCESSOR. Targeting the code for another
> processor using this new data structure may not be optimal.
>
> Also note that requiring the programmer to know the processor architecture
> defeats, at least in part, the point of a high-level-language since time
> is spent learning things that ideally would be handled by the compiler.
>
> Note that this is absolutely irrefutable. The C language (and every other
> computer language that I know of) is not capable of optimizing this BECAUSE
> IT DOESN'T HAVE CONTROL OVER DATA ALIGNMENT AND STRUCTURE.
> --
> % Randy Yates % "She tells me that she likes me very much,
> %% Fuquay-Varina, NC % but when I try to touch, she makes it
> %%% 919-577-9882 % all too clear."
> %%%% <ya...@ieee.org> % 'Yours Truly, 2095', *Time*, ELO http://home.earthlink.net/~yatescr


I have a feeling that data-flow is the answer. Check out the latest
LabView. You must think differently.Sequencial code will die.

Reply With Quote
  #8  
Old 07-28-2007, 04:50 PM
Jerry Avins
Guest
 
Default Re: Thoughts on the C/Assembly Debate

gyansorova@gmail.com wrote:
> On Jul 29, 2:18 am, Randy Yates <ya...@ieee.org> wrote:
>> There may come a day when a compiled language can consistently beat a
>> good assembly programmer, but that language will never be C. It will
>> be a new language that has a radically new, high level way of specifying the
>> algorithm to the compiler, and that has complete control over the generated
>> code, including data structures.
>>
>> To make my point, take a simple example.
>>
>> Consider a fictitious processor that has an architecture with an indirect
>> addressing mode. That is, the address can be loaded in a register, and
>> that register be used to generate the address for data used in other instructions.
>>
>> Now let's also posit that the processor can post-increment-by-one the
>> address registers used for indirect mode for free, but post-increment
>> by-two (or other higher values) is not free - i.e., it cost more
>> cycles.
>>
>> Now let's say the algorithm to be coded is to sum the real and imaginary
>> parts of a complex array.
>>
>> Our programmer, being ignorant of the processor architecture details,
>> writes the following code:
>>
>> typedef struct
>> {
>> int16_t real;
>> int16_t imag;
>>
>> } COMPLEX_T;
>>
>> #define N 1024;
>>
>> COMPLEX_T data[N];
>>
>> COMPLEX_T MyAlgorithm(void)
>> {
>> uint32_t n;
>> COMPLEX_T sum;
>>
>> sum.real = 0;
>> sum.imag = 0;
>> for (n = 0; n < N; n++)
>> {
>> sum.real += data[n].real;
>> sum.imag += data[n].imag;
>> }
>>
>> return sum;
>>
>> }
>>
>> Well, this is not optimized! Never can be, because C can't restructure
>> the "data[]" array to put the real and imaginary parts into individual,
>> continuous arrays.
>>
>> Also note that if the programmer were to realize this is a problem and
>> manually reorganize the data structure, then the resulting code
>> becomes SPECIFIC TO THE PROCESSOR. Targeting the code for another
>> processor using this new data structure may not be optimal.
>>
>> Also note that requiring the programmer to know the processor architecture
>> defeats, at least in part, the point of a high-level-language since time
>> is spent learning things that ideally would be handled by the compiler.
>>
>> Note that this is absolutely irrefutable. The C language (and every other
>> computer language that I know of) is not capable of optimizing this BECAUSE
>> IT DOESN'T HAVE CONTROL OVER DATA ALIGNMENT AND STRUCTURE.
>> --
>> % Randy Yates % "She tells me that she likes me very much,
>> %% Fuquay-Varina, NC % but when I try to touch, she makes it
>> %%% 919-577-9882 % all too clear."
>> %%%% <ya...@ieee.org> % 'Yours Truly, 2095', *Time*, ELO http://home.earthlink.net/~yatescr

>
> I have a feeling that data-flow is the answer. Check out the latest
> LabView. You must think differently.Sequencial code will die.


Until processors routinely execute and n instructions in parallel, all
code will be executed sequentially. You might as well write it that way.

Jerry
--
Engineering is the art of making what you want from things you can get.
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
Reply With Quote
  #9  
Old 07-28-2007, 05:51 PM
christian.bau
Guest
 
Default Re: Thoughts on the C/Assembly Debate

On Jul 28, 3:18 pm, Randy Yates <ya...@ieee.org> wrote:

> There may come a day when a compiled language can consistently beat a
> good assembly programmer, but that language will never be C. It will
> be a new language that has a radically new, high level way of specifying the
> algorithm to the compiler, and that has complete control over the generated
> code, including data structures.


One mistake that people often make is trying to compare for example
the efficiency of C code vs. assembler code for the same problem. This
is not the right comparison. You have to compare C vs. assembler code,
assuming that two equally competent programmers spend the same amount
of time writing the code.

I can write assembler code that is faster than equivalent C code. If
it takes me one day to write the C code, it takes me a week to write
the assembler code. In that week, I could have written much faster C
code. In two months, I could write equivalent assembler code that is
faster. In the same two months, I could write faster C code.


> To make my point, take a simple example.
>
> Consider a fictitious processor that has an architecture with an indirect
> addressing mode. That is, the address can be loaded in a register, and
> that register be used to generate the address for data used in other instructions.
>
> Now let's also posit that the processor can post-increment-by-one the
> address registers used for indirect mode for free, but post-increment
> by-two (or other higher values) is not free - i.e., it cost more
> cycles.
>
> Now let's say the algorithm to be coded is to sum the real and imaginary
> parts of a complex array.
>
> Our programmer, being ignorant of the processor architecture details,
> writes the following code:
>
> typedef struct
> {
> int16_t real;
> int16_t imag;
>
> } COMPLEX_T;
>
> #define N 1024;
>
> COMPLEX_T data[N];
>
> COMPLEX_T MyAlgorithm(void)
> {
> uint32_t n;
> COMPLEX_T sum;
>
> sum.real = 0;
> sum.imag = 0;
> for (n = 0; n < N; n++)
> {
> sum.real += data[n].real;
> sum.imag += data[n].imag;
> }
>
> return sum;
>
> }
>
> Well, this is not optimized! Never can be, because C can't restructure
> the "data[]" array to put the real and imaginary parts into individual,
> continuous arrays.


You're on a loser here. A decent compiler will easily figure out that
data [n].real, data [n].imag, data [n+1].real etc. are all at
consecutive addresses. So it transforms this to

int16_t *p = &data[n].real;
for (n = 0; n < N; n++)
{
sum.real += *p++;
sum.imag += *p++;
}



Reply With Quote
  #10  
Old 07-28-2007, 10:36 PM
Logan Shaw
Guest
 
Default Re: Thoughts on the C/Assembly Debate

Jerry Avins wrote:
> Until processors routinely execute and n instructions in parallel, all
> code will be executed sequentially. You might as well write it that way.


n had been 1 for decades. For most new hardware, it's now 2, and it's
already starting to change to 4.

- Logan
Reply With Quote
Reply


Thread Tools
Display Modes


All times are GMT -5. The time now is 12:45 AM.


Powered by vBulletin® Version 3.7.2
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
vB Ad Management by =RedTyger=

In an effort to better serve ads to our visitors, cookies are used on objectmix.com. For more information, check out our Privacy Policy.