|
#21
| |||
| |||
| On Fri, 8 Aug 2008 10:23:19 -0400, Jonah Thomas <jethomas5@gmail.com> wrote: >Back in the old days, we used to claim that ITC was smaller, that this >was one of the advantages of Forth. > >Were we wrong then, or does your result come because the processors have >changed? Both. Modern CPUs tend to be much more compiler friendly. On older CPUs, e.g. the 8051, there were very few 16 bit operations and calls cost three bytes rather than two. On more recent 8-bit CPUs, e.g. the 9S12, there are more 16 bit operations, and you can probably call an MSP430 a 16 bit CPU. However, with a few notable exceptions, compiling Forth to native code was not a well-known subject. Now, we're applying what we've learnt from VFX on 32 bit CPUs to smaller CPUs. Even an 8051 benefits from a carefully set up code generator. The downside of NCC is that getting a good one right is quite a big job, whereas getting a DTC or ITC Forth up and running is a quick and easy job. Another driver for NCC is that the jobs we're being asked to do are simply more demanding. Even on a 60MHz ARM, I simply would not consider writing a USB stack in a DTC Forth, whereas with VFX Forth it was just another job. Stephen -- Stephen Pelc, stephenXXX@mpeforth.com MicroProcessor Engineering Ltd - More Real, Less Time 133 Hill Lane, Southampton SO15 5AF, England tel: +44 (0)23 8063 1441, fax: +44 (0)23 8033 9691 web: http://www.mpeforth.com - free VFX Forth downloads |
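Editor's sketch: the size trade-off between threaded code and subroutine calls on a small CPU can be put into a rough model. This is only an illustration, not a measurement. The byte counts below (2-byte cells, 3-byte calls, 1-byte return, 2 bytes per inlined word) and the function names are assumptions chosen for the example; real figures depend on the CPU and the compiler.

```python
# Rough size model for the body of one colon definition.
# Every byte count here is an assumption for illustration only.
CELL_BYTES = 2   # assumed 16-bit threaded-code cell
CALL_BYTES = 3   # assumed 3-byte call, as on an 8051-class CPU
RET_BYTES = 1    # assumed size of the return instruction


def itc_body_bytes(n_words: int) -> int:
    """Code field, one cell per referenced word, and the EXIT cell."""
    return CELL_BYTES * (n_words + 2)


def stc_body_bytes(n_words: int, inlined: int = 0, inline_bytes: int = 2) -> int:
    """One call per non-inlined word, a short sequence per inlined word, one return."""
    calls = n_words - inlined
    return calls * CALL_BYTES + inlined * inline_bytes + RET_BYTES


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, itc_body_bytes(n), stc_body_bytes(n), stc_body_bytes(n, inlined=n // 2))
```

With these assumed numbers, plain calls are larger than threaded cells, and the gap closes only as more words are inlined into short sequences. Whether that happens in practice is exactly what the benchmarks discussed in the thread are meant to settle.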
|
#22
| |||
| |||
| Andrew Haley <andrew29@littlepinkcloud.invalid> writes: >Bernd Paysan <bernd.paysan@gmx.de> wrote: >> Processors have changed, and ITC on 32 bit processors already uses 4 >> bytes per instruction - on 64 bits, native code size doesn't change >> much, but ITC doubles size again. > >Well, hold on a minute. On a 64-bit processor you can use 32-bit >addressing for code, as long as you have less than 4 gigathings of >code. This is a common optimization used by many programming >languages. It's the default for gcc on AMD-64, for example. > >It is not common to use 16-bit code addressing on a 32-bit processor >because no-one wants to be limited to 64 kilothings of code, but the >same reasoning doesn't apply to 64-bit processors. There's no reason >at all to use 64-bit threading on a 64-bit processor. There are a number of reasons: * Nothing at all guarantees that the code is all in the lower 4G of the address space (and on at least one platform it isn't), and the gcc maintainers and others have a tendency to break the gcc behaviour that we rely on; e.g., we used to rely on the code being in the lower 32M on PowerPC, and there is no reason for it not to be there, and it used to be there, and then one day it just was no longer there. I then wrote a linker script to put it there, but that stopped working a little later (and on reporting this as a bug I learned that one should not use linker scripts or somesuch). * It's just simpler to have a uniform cell size that also covers the threaded code, especially since we need to do this for other 64-bit platforms anyway. * It would buy very little to support 32-bit threaded code on 64-bit platforms. Threaded-code size does not consume much memory there (compared to what's available), and it also does not cause many cache misses. BTW, I just looked at the switch tables generated by gcc, and they use 64-bit entries on AMD64, while according to you they could use 32-bit entries. 
If, as you say, there's no reason to use 64-bit entries, why do they use them? - anton -- M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html New standard: http://www.forth200x.org/forth200x.html EuroForth 2008: http://www.euroforth.org/ef08.html |
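Editor's sketch: the disagreement above comes down to how wide each threaded-code cell has to be. The following is a minimal illustration, assuming little-endian cells and made-up addresses; `pack_thread` and `fits_in_32_bits` are hypothetical helper names, not part of any Forth system.

```python
import struct


def pack_thread(addresses, cell_format):
    """Pack addresses as little-endian cells: 'I' is 32-bit, 'Q' is 64-bit."""
    return struct.pack("<" + cell_format * len(addresses), *addresses)


def fits_in_32_bits(addresses):
    """True only if every address lies below 4G."""
    return all(0 <= a < 2**32 for a in addresses)


if __name__ == "__main__":
    low = [0x00401000, 0x00401040, 0x004010A0]   # hypothetical addresses below 4G
    high = [0x7F0000001000]                      # hypothetical address above 4G
    print(len(pack_thread(low, "I")), len(pack_thread(low, "Q")))
    print(fits_in_32_bits(low), fits_in_32_bits(high))
```

Packing with 32-bit cells halves the thread size, but `struct.pack` raises `struct.error` as soon as one address needs more than 32 bits. That is the guarantee about code placement that the two posters are arguing over.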
|
#23
| |||
| |||
| Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote: > Andrew Haley <andrew29@littlepinkcloud.invalid> writes: > >Bernd Paysan <bernd.paysan@gmx.de> wrote: > >> Processors have changed, and ITC on 32 bit processors already uses 4 > >> bytes per instruction - on 64 bits, native code size doesn't change > >> much, but ITC doubles size again. > > > >Well, hold on a minute. On a 64-bit processor you can use 32-bit > >addressing for code, as long as you have less than 4 gigathings of > >code. This is a common optimization used by many programming > >languages. It's the default for gcc on AMD-64, for example. > > > >It is not common to use 16-bit code addressing on a 32-bit > >processor because no-one wants to be limited to 64 kilothings of > >code, but the same reasoning doesn't apply to 64-bit processors. > >There's no reason at all to use 64-bit threading on a 64-bit > >processor. > There are a number of reasons: > * Nothing at all guarantees that the code is all in the lower 4G of > the address space (and on at least one platform it isn't), and the > gcc maintainers and others > have a tendency to break the gcc behaviour that we rely on; Of course, this isn't something gcc maintainers have any control over. I don't know who "others" may be! > e.g., we used to rely on the code being in the lower 32M on PowerPC, > and there is no reason for it not to be there, and it used to be > there, and then one day it just was no longer there. I then did a > linker script to put it there, but that stopped working a little > later (and on reporting this as a bug I learned that one should not > use linker scripts or somesuch). > * It's just simpler to have a uniform cell size that also covers the > threaded code, especially since we need to do this for other 64-bit > platforms anyway. > * It would buy very little to support 32-bit threaded code on 64-bit > platforms. 
Threaded-code size does not consume much memory there > (compared to what's available), and it also does not cause many > cache misses. Fair enough. The last two, which are more or less "I can't be bothered to change it, and it doesn't matter anyway" are rather weak, but OK, there may be some legitimate reasons. The claim was that ITC doubles size from 32-bit to 64-bit processors. It doesn't need to be that way: you might choose to do it that way, and on some operating systems you might even be forced to do it that way, but it ain't necessarily so. > BTW, I just looked at the switch tables generated by gcc, and they > use 64-bit entries on AMD64, while according to you they could use > 32-bit entries. If, as you say, there's no reason to use 64-bit > entries, why do they use them? I don't know, but: It may be a bug. The compiler may always generate switch tables as arrays of pointers, so perhaps it's a side-effect of using generic code to generate them. Maybe a suitable 32-bit reloc type doesn't exist for AMD-64, but I doubt that. Maybe a simple jump indirect instruction is used, and that instruction always uses a 64-bit pointer in memory. .... etc. Andrew. |
|
#24
| |||
| |||
| On 8 Aug, 13:42, stephen...@mpeforth.com (Stephen Pelc) wrote: > On Thu, 7 Aug 2008 15:16:39 -0700 (PDT), jacko <jackokr...@gmail.com> > wrote: > > >Well as cache operates somewhat faster than memory, there comes a > >point where DTC will beat STC even though you'd initially think > >otherwise. cache thrashing can be a major bottleneck. I prefer DTC and > >ITC for this reason. Optimizing low level code can provide benefits, > >but often at a space cost, (more cache thrash anyone?). > > On conventional CPUs at least, the benchmark figures say you're > wrong by a factor of 10:1 and more. There are good and bad cache > implementations, and good and bad solutions for cache problems. Yeah, probably large code pointers were used, mainly full of some constant in the upper 16 bits. > STC compilers inline and optimise, so low-level words simply don't > include many CALL/RET pairs. For size comparison, we converted a > 256k byte (of binary) app from ITC to STC on the same hardware > and the STC version was about 2% smaller. Others have reported > similar results. Raw STC code may certainly suffer cache issues, > but even simple optimisations remove the problems. Fully optimising > compilers (let's call their output NCC for native compiled code) > produce code that is of similar size to DTC or ITC. Low level words should generate at most 1 CALL. There is very little reason not to use 16 bit addressing; a simple process of inserting 3*1/2 cells (on 32 bit) for a long jump would reduce most DTC pointers to half a cell. (This assumes people are clever enough to place small kernels every so often in memory, or use the vocabulary system to split code into convenient blocks). > On silicon stack machines, Chuck Moore certainly doesn't agree > with you. Some Forth-machine FPGA implementations have reached > several hundreds of MIPs, but I'm not free to say more. Yes, and they probably don't use the lowest speed grade and have pockets bigger than their heads. 
> Stephen > > -- > Stephen Pelc, stephen...@mpeforth.com > MicroProcessor Engineering Ltd - More Real, Less Time > 133 Hill Lane, Southampton SO15 5AF, England > tel: +44 (0)23 8063 1441, fax: +44 (0)23 8033 9691 > web:http://www.mpeforth.com- free VFX Forth downloads |
|
#25
| |||
| |||
| On Mon, 11 Aug 2008 06:42:33 -0700 (PDT), jacko <jackokring@gmail.com> wrote: >On 8 Aug, 13:42, stephen...@mpeforth.com (Stephen Pelc) wrote: >Yeah, probably large code pointers were used, mainly full of some constant in >the upper 16 bits. >Low level words should generate at most 1 CALL. There is very little >reason not to use 16 bit addressing When we started serious NCC development, we very quickly learned to trust measured results over opinion. That's why we developed and published a simple set of integer benchmarks http://www.mpeforth.com/arena/benchmrk.fth http://www.mpeforth.com/arena/xbench32.fth You are welcome to publish your results. >> On silicon stack machines, Chuck Moore certainly doesn't agree >> with you. Some Forth-machine FPGA implementations have reached >> several hundreds of MIPs, but I'm not free to say more. > >Yes, and they probably don't use the lowest speed grade and have >pockets bigger than their heads. If you come to EuroForth 2008 in Vienna, you can talk to people who have shipped silicon and used silicon stack machines for real applications. Stephen -- Stephen Pelc, stephenXXX@mpeforth.com MicroProcessor Engineering Ltd - More Real, Less Time 133 Hill Lane, Southampton SO15 5AF, England tel: +44 (0)23 8063 1441, fax: +44 (0)23 8033 9691 web: http://www.mpeforth.com - free VFX Forth downloads |
|
#26
| |||
| |||
| Andrew Haley <andrew29@littlepinkcloud.invalid> writes: >Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote: >> Andrew Haley <andrew29@littlepinkcloud.invalid> writes: >> >It is not common to use 16-bit code addressing on a 32-bit >> >processor because no-one wants to be limited to 64 kilothings of >> >code, but the same reasoning doesn't apply to 64-bit processors. >> >There's no reason at all to use 64-bit threading on a 64-bit >> >processor. > >> There are a number of reasons: > >> * Nothing at all guarantees that the code is all in the lower 4G of >> the address space (and on at least one platform it isn't), and the >> gcc maintainers and others >> have a tendency to break the gcc behaviour that we rely on; > >Of course, this isn't something gcc maintainers have any control over. The gcc maintainers do not have any control over gcc behaviour? >I don't know who "others" may be! In the example below, I guess it was the binutils maintainers. >> e.g., we used to rely on the code being in the lower 32M on PowerPC, >> and there is no reason for it not to be there, and it used to be >> there, and then one day it just was no longer there. I then did a >> linker script to put it there, but that stopped working a little >> later (and on reporting this as a bug I learned that one should not >> use linker scripts or somesuch). I now remember the story better: at first I tried to correct the new text placement with the linker option -Ttext, which did not work as documented; I reported that as a bug, and was told that this should not work with ELF files, and that I should write a linker script, which I then did, and for some time it worked. Eventually we switched to hybrid direct/indirect-threaded code, which made that code placement unnecessary, so we retired the linker script. >The claim was that ITC doubles size from 32-bit to 64-bit processors. 
>It doesn't need to be that way: you might choose to do it that way, >and on some operating systems you might even be forced to do it that >way, but it ain't necessarily so. Hmm, thinking again about it, with ITC it's necessarily so. With ITC, the addresses you put in the threaded code are not code addresses, but code-field addresses, i.e., general dictionary addresses. If you restrict these to 32 bits, you restrict the dictionary to the lower 4G. What kind of 64-bit system would that be? >> BTW, I just looked at the switch tables generated by gcc, and they >> use 64-bit entries on AMD64, while according to you they could use >> 32-bit entries. If, as you say, there's no reason to use 64-bit >> entries, why do they use them? .... >Maybe a simple jump indirect instruction is used, and that instruction >always uses a 64-bit pointer in memory. That's a good explanation. Let's see: jmp *.L11(,%rax,8) Yes, that's a good reason, but I'm not sure if the benefit of using a single instruction instead of two is worth the higher number of cache misses from the larger switch table. Especially since the compiler then chooses to add another instruction that just slows things down: ja .L25 mov %eax, %eax jmp *.L11(,%rax,8) I guess the MOV is there to get enough distance between the two jumps. But does it have to use %eax (on which the next instruction depends)? And if we want an instruction in between, we could split the JMP and use a 32-bit switch table: movl .L11(,%rax,4),%eax jmp *%rax - anton -- M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html New standard: http://www.forth200x.org/forth200x.html EuroForth 2008: http://www.euroforth.org/ef08.html |
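Editor's sketch: the cache argument about switch tables can be made concrete with a little arithmetic. The code below is an illustration only; it assumes 64-byte cache lines and a table that starts on a line boundary, and the function names are hypothetical.

```python
def table_bytes(entries: int, entry_bytes: int) -> int:
    """Total size of a jump table with fixed-width entries."""
    return entries * entry_bytes


def cache_lines_spanned(entries: int, entry_bytes: int, line_bytes: int = 64) -> int:
    """Cache lines covered by the table, assuming it starts on a line boundary."""
    total = table_bytes(entries, entry_bytes)
    return -(-total // line_bytes)   # ceiling division


if __name__ == "__main__":
    for entries in (16, 100, 256):
        print(entries,
              cache_lines_spanned(entries, 8),   # 64-bit entries
              cache_lines_spanned(entries, 4))   # 32-bit entries
```

Under these assumptions a table of 32-bit entries touches about half as many cache lines as the same table with 64-bit entries, at the cost of one extra load instruction before the indirect jump. Which side wins depends on the table size and on how hot the dispatch is, so treat this as a way to frame the question rather than an answer.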
|
#27
| |||
| |||
| Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote: > Andrew Haley <andrew29@littlepinkcloud.invalid> writes: > >Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote: > >> Andrew Haley <andrew29@littlepinkcloud.invalid> writes: > >> >It is not common to use 16-bit code addressing on a 32-bit > >> >processor because no-one wants to be limited to 64 kilothings of > >> >code, but the same reasoning doesn't apply to 64-bit processors. > >> >There's no reason at all to use 64-bit threading on a 64-bit > >> >processor. > > > >> There are a number of reasons: > > > >> * Nothing at all guarantees that the code is all in the lower 4G > >> of the address space (and on at least one platform it isn't), and > >> the gcc maintainers and others have a tendency to break the gcc > >> behaviour that we rely on; > > > >Of course, this isn't something gcc maintainers have any control over. > The gcc maintainers do not have any control over gcc behaviour? This isn't gcc behaviour. > >I don't know who "others" may be! > In the example below, I guess it was the binutils maintainers. Perhaps, but the address at which a program is loaded probably isn't controlled by the binutils maintainers either. > Hmm, thinking again about it, with ITC it's necessarily so. With ITC, > the addresses you put in the threaded code are not code addresses, but > code-field addresses, i.e., general dictionary addresses. If you > restrict these to 32 bits, you restrict the dictionary to the lower > 4G. What kind of 64-bit system would that be? Obviously, that'd be a 64-bit system with code restricted to the lower 4G. The non-code part of the dictionary could be anywhere. obviously. > >> BTW, I just looked at the switch tables generated by gcc, and they > >> use 64-bit entries on AMD64, while according to you they could use > >> 32-bit entries. If, as you say, there's no reason to use 64-bit > >> entries, why do they use them? > ... 
> >Maybe a simple jump indirect instruction is used, and that instruction > >always uses a 64-bit pointer in memory. > That's a good explanation. Let's see: > jmp *.L11(,%rax,8) > Yes, that's a good reason, but I'm not sure if the benefit of using > a single instruction instead of two is worth the higher number of > cache misses from the larger switch table. Especially since the > compiler then chooses to add another instruction that just slows > things down: > ja .L25 > mov %eax, %eax > jmp *.L11(,%rax,8) > I guess the MOV is there to get enough distance between the two > jumps. But does it have to use %eax (on which the next instruction > depends)? And if we want an instruction in between, we could split > the JMP and use a 32-bit switch table: > movl .L11(,%rax,4),%eax > jmp *%rax Yes, it looks like a 32-bit switch table would be better. Andrew. |
|
#28
| |||
| |||
| Andrew Haley <andrew29@littlepinkcloud.invalid> writes: >Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote: >> Andrew Haley <andrew29@littlepinkcloud.invalid> writes: >> >Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote: >> >> Andrew Haley <andrew29@littlepinkcloud.invalid> writes: >> >> >There's no reason at all to use 64-bit threading on a 64-bit >> >> >processor. >> > >> >> There are a number of reasons: >> > > >> >> * Nothing at all guarantees that the code is all in the lower 4G >> >> of the address space (and on at least one platform it isn't), and >> >> the gcc maintainers and others have a tendency to break the gcc >> >> behaviour that we rely on; >> > >> >Of course, this isn't something gcc maintainers have any control over. > >> The gcc maintainers do not have any control over gcc behaviour? > >This isn't gcc behaviour. If gcc introduces crossjumps and pointlessly reorders the basic blocks, that's not gcc behaviour? >> >I don't know who "others" may be! > >> In the example below, I guess it was the binutils maintainers. > >Perhaps, but the address at which a program is loaded probably isn't >controlled by the binutils maintainers either. So, you are saying that nobody is responsible for code placement, nobody gives us a guarantee for it, but we should rely on it being in the lower 4G. Hmm, OTOH, would these unknown people be worse than the gcc maintainers? Probably not. >> Hmm, thinking again about it, with ITC it's necessarily so. With ITC, >> the addresses you put in the threaded code are not code addresses, but >> code-field addresses, i.e., general dictionary addresses. If you >> restrict these to 32 bits, you restrict the dictionary to the lower >> 4G. What kind of 64-bit system would that be? > >Obviously, that'd be a 64-bit system with code restricted to the lower >4G. The non-code part of the dictionary could be anywhere. obviously. 
With the unusual definition of "code" that includes every word header, including those of e.g., constants and CREATEd words. - anton -- M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html New standard: http://www.forth200x.org/forth200x.html EuroForth 2008: http://www.euroforth.org/ef08.html |
|
#29
| |||
| |||
| Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote: > Andrew Haley <andrew29@littlepinkcloud.invalid> writes: > >Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote: > >> Andrew Haley <andrew29@littlepinkcloud.invalid> writes: > >> >Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote: > >> >> Andrew Haley <andrew29@littlepinkcloud.invalid> writes: > >> >> >There's no reason at all to use 64-bit threading on a 64-bit > >> >> >processor. > >> > > >> >> There are a number of reasons: > >> > > > > >> >> * Nothing at all guarantees that the code is all in the lower 4G > >> >> of the address space (and on at least one platform it isn't), and > >> >> the gcc maintainers and others have a tendency to break the gcc > >> >> behaviour that we rely on; > >> > > >> >Of course, this isn't something gcc maintainers have any control over. > > > >> The gcc maintainers do not have any control over gcc behaviour? > > > >This isn't gcc behaviour. > If gcc introduces crossjumps and pointlessly reorders the basic > blocks, that's not gcc behaviour? What on Earth does reordering basic blocks have to do with any of this? > >> >I don't know who "others" may be! > > > >> In the example below, I guess it was the binutils maintainers. > > > >Perhaps, but the address at which a program is loaded probably isn't > >controlled by the binutils maintainers either. > So, you are saying that nobody is responsible for code placement, I don't know. Maybe the kernel or libc, maybe both. This sort of thing is usually worked out by negotiation between the teams. > nobody gives us a guarantee for it, but we should rely on it being > in the lower 4G. This one is guaranteed by the ABI. The "small" x86_64 model depends on it. > Hmm, OTOH, would these unknown people be worse than the gcc > maintainers? Probably not. > >> Hmm, thinking again about it, with ITC it's necessarily so. 
With ITC, > >> the addresses you put in the threaded code are not code addresses, but > >> code-field addresses, i.e., general dictionary addresses. If you > >> restrict these to 32 bits, you restrict the dictionary to the lower > >> 4G. What kind of 64-bit system would that be? > > > >Obviously, that'd be a 64-bit system with code restricted to the lower > >4G. The non-code part of the dictionary could be anywhere. obviously. > With the unusual definition of "code" that includes every word header, > including those of e.g., constants and CREATEd words. I don't see why the whole header must be there. Just the code field, surely. Andrew. |
|
#30
| |||
| |||
| Andrew Haley <andrew29@littlepinkcloud.invalid> writes: >Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote: >> Andrew Haley <andrew29@littlepinkcloud.invalid> writes: >> >Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote: >> >> Andrew Haley <andrew29@littlepinkcloud.invalid> writes: >> >> >Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote: >> >> >> Andrew Haley <andrew29@littlepinkcloud.invalid> writes: >> >> >> >There's no reason at all to use 64-bit threading on a 64-bit >> >> >> >processor. >> >> > >> >> >> There are a number of reasons: >> >> > >> > >> >> >> * Nothing at all guarantees that the code is all in the lower 4G >> >> >> of the address space (and on at least one platform it isn't), and >> >> >> the gcc maintainers and others have a tendency to break the gcc >> >> >> behaviour that we rely on; >> >> > >> >> >Of course, this isn't something gcc maintainers have any control over. >> > >> >> The gcc maintainers do not have any control over gcc behaviour? >> > >> >This isn't gcc behaviour. > >> If gcc introduces crossjumps and pointlessly reorders the basic >> blocks, that's not gcc behaviour? > >What on Earth does reordering basic blocks have to do with any of >this? It's one of the gcc behaviours that we relied on and that was broken by the gcc maintainers. >> So, you are saying that nobody is responsible for code placement, > >I don't know. Maybe the kernel or libc, maybe both. In my experience the code is placed where the binary says it should be placed, and the binary is produced by the linker. >> nobody gives us a guarantee for it, but we should rely on it being >> in the lower 4G. > >This one is guaranteed by the ABI. The "small" x86_64 model depends >on it. And if we decided to rely on it, the next version of gcc would no longer support the small model, and would place the code beyond 4GB in order to show the moral superiority of the standards-loving gcc maintainers. 
>> >Obviously, that'd be a 64-bit system with code restricted to the lower >> >4G. The non-code part of the dictionary could be anywhere. obviously. > >> With the unusual definition of "code" that includes every word header, >> including those of e.g., constants and CREATEd words. > >I don't see why the whole header must be there. Just the code field, >surely. Sure. In traditional Forth systems for large systems the code field is very close to the other header fields, and that's right before the parameter field. In such a system, 32-bit threaded code just means limiting the dictionary to 4G. - anton -- M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html New standard: http://www.forth200x.org/forth200x.html EuroForth 2008: http://www.euroforth.org/ef08.html |
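Editor's sketch: the point about code fields can be illustrated with a toy indirect-threaded dictionary. This is not how any particular Forth system is built. The class `ToyITC`, the handler ids and the `base` parameter are all hypothetical, and Python functions stand in for machine code. Every cell is stored as 32 bits, and each word's code field sits directly in front of its parameter field, as in the traditional layout described above.

```python
import struct

CELL = 4                  # bytes per cell in this sketch (32-bit threaded code)
DOCOL, DOCON = 1, 2       # hypothetical handler ids standing in for machine-code addresses
EXIT_ID, ADD_ID = 3, 4


class ToyITC:
    """Toy indirect-threaded dictionary in which every cell is stored as 32 bits."""

    def __init__(self, base=0):
        self.base = base              # assumed load address of the dictionary
        self.mem = bytearray()
        self.stack, self.rstack = [], []
        self.ip = None
        self.handlers = {DOCOL: self._docol, DOCON: self._docon,
                         EXIT_ID: self._exit, ADD_ID: self._add}

    def here(self):
        return self.base + len(self.mem)

    def comma(self, value):
        # struct.error here means the value does not fit in a 32-bit cell
        self.mem += struct.pack("<I", value)

    def fetch(self, addr):
        return struct.unpack_from("<I", self.mem, addr - self.base)[0]

    def header(self, handler_id):
        cfa = self.here()             # code-field address: what threaded code refers to
        self.comma(handler_id)
        return cfa

    def constant(self, value):
        cfa = self.header(DOCON)
        self.comma(value)             # parameter field follows the code field
        return cfa

    def colon(self, cfas):
        cfa = self.header(DOCOL)
        for target in cfas:
            self.comma(target)        # each cell is another word's code-field address
        return cfa

    def _docol(self, cfa):
        self.rstack.append(self.ip)
        self.ip = cfa + CELL

    def _docon(self, cfa):
        self.stack.append(self.fetch(cfa + CELL))

    def _exit(self, cfa):
        self.ip = self.rstack.pop()

    def _add(self, cfa):
        b, a = self.stack.pop(), self.stack.pop()
        self.stack.append(a + b)

    def run_word(self, cfa):
        self.handlers[self.fetch(cfa)](cfa)   # the extra indirection that makes this ITC

    def execute(self, cfa):
        self.ip = None
        self.run_word(cfa)
        while self.ip is not None:
            target = self.fetch(self.ip)
            self.ip += CELL
            self.run_word(target)


def build_five(vm):
    """Define EXIT, +, two constants, and a colon word that adds them."""
    exit_ = vm.header(EXIT_ID)
    add = vm.header(ADD_ID)
    two, three = vm.constant(2), vm.constant(3)
    return vm.colon([two, three, add, exit_])
```

With `ToyITC()` the colon word built by `build_five` runs and leaves 5 on the stack. With `ToyITC(base=1 << 32)` the same definition fails in `comma`, because the code-field addresses of the constants no longer fit in a 32-bit cell. In this layout the constants' headers live in the same dictionary as everything else, so 32-bit threaded cells do end up limiting where the whole dictionary can be placed, unless code fields are moved to a separate low region.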