#11
DavidM <nos...@nowhere.com> wrote:
> What I am interested in is: when is it better to sweat over the coding of
> forth words, squeezing every last ounce of speed out of them at the very
> likely cost of readability/maintainability, as opposed to just taking the
> time critical stuff and coding it as C and/or assembler primitives?

Ideally, if you're going to spend that kind of effort, it's better to sweat about code before you settle on a design. That way your code will clearly and readably reflect a high-performance design, rather than being built for one design, then partially refactored into a different one, then patched with a bunch of changes that were never refactored, and then...

I recall an essay in which Chuck Moore was quoted as saying that he would write many different versions of each part of his programs from scratch, finding the one that fit the problem best. I think it was "Thoughtful Programming", but I can't find the specific passage in any essay, so I may be wrong.

> Dave

-Wm
|
#12
Bernd Paysan wrote:
> You don't see the forest for the trees. When we argue here, we are quite
> often concerned about speed. You then step in and say "there's no
> requirement to be performant here". That's sloppy thinking - there's no
> requirement for performance here and there, so you implement CPU hogs here
> and there and then wonder why your application is dog slow.

Speak for yourself. I rarely have any problems related to the performance of my code because I spend the time up-front to research and choose algorithms and data structures that are appropriate. And when I do find something I wrote is slower than expected or desired, I don't "wonder" anything; I break out the tools. I measure with a profiler. I count graticule lines on an oscilloscope.

This approach, up-front design work coupled with objective measurement, is superior to the mindless "I must optimize every primitive" mindset because it focuses on the system. Spending your time optimizing a primitive that takes up a tiny fraction of a system's run-time makes no sense. But you only know what matters by thinking and measuring.

> You forget that a fast, low-memory solution is often straight-forward and
> clean design, too. It's easy to maintain, as well. That's the goal. If your
> performance improvements are a burden for a small, clean implementation,
> forget it.

I enjoy your canned knee-jerk response here. It sounds well-practiced. Too bad it doesn't apply to what I wrote.

I have no problem with the ideal that people should write "fast, low-memory solutions." My problem is that there are too many programmers who reflexively seek out those solutions without carefully considering the requirements. And then they are completely surprised later when their "fast, low-memory solution" fails to meet performance expectations.

A programmer may choose a more sophisticated algorithm because they don't know what they are doing. But they may also choose it because they understand the performance requirements and know the simpler routine can't meet them.

But really, most of what I was addressing in my reply was the notion that systems built on virtual machines are slower than native code. And sometimes, they are. But as I wrote, the slowest VM can out-perform the fastest native code if its algorithms are superior. Those who don't understand how this can be probably also can't understand how, in the early days of Forth, the relatively slow implementation mechanisms could still beat native code.

> After you have done that - implement something that's already sane in terms
> of space, performance, and lines of code, you can start measuring things.
> You might be surprised that something that looked sane isn't, but in
> general, it makes things much easier. After all, you have only to hunt
> those bottlenecks that are still there despite the preparation, and the
> design is lean and clean.

Thanks for the generic lecture.
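To make the "slow VM, better algorithm" argument concrete, here is a rough cost model, not a benchmark. It counts comparisons for a linear search and a binary search over a sorted list, then charges the binary search a hypothetical 10x per-step interpretation overhead. Both the overhead factor and the choice of search algorithms are assumptions made only for illustration.

```python
def linear_search_steps(items, target):
    """Count comparisons made by a straightforward linear scan."""
    steps = 0
    for x in items:
        steps += 1
        if x == target:
            break
    return steps


def binary_search_steps(items, target):
    """Count probes made by binary search over a sorted list."""
    lo, hi, steps = 0, len(items) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if items[mid] == target:
            break
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps


# Hypothetical cost per step: native code = 1 unit, a slow VM = 10 units.
NATIVE_COST, VM_COST = 1, 10

for n in (16, 1_000, 1_000_000):
    data = list(range(n))
    target = n - 1  # worst case for the linear scan
    native_linear = linear_search_steps(data, target) * NATIVE_COST
    vm_binary = binary_search_steps(data, target) * VM_COST
    print(f"n={n}: native linear = {native_linear}, VM binary = {vm_binary}")
```

With these assumed costs, the native linear scan wins at n=16 (16 units against 50), but the interpreted binary search wins from n=1000 on (1000 units against 100). Where the crossover falls depends entirely on the overhead factor and the data, which is one more reason to measure rather than guess.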
|
#13
On Aug 5, 6:56 pm, John Passaniti <n...@JapanIsShinto.com> wrote:
> [...] And when I
> do find something I wrote is slower than expected or desired, I don't
> "wonder" anything; I break out the tools. I measure with a profiler. I
> count graticule lines on an oscilloscope.
> [remainder of quoted post snipped]

What speed profiling tools do people use? What's a good one to use with SwiftForth?
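The thread doesn't say which profiling tools SwiftForth provides, so this is only a generic illustration and not a SwiftForth feature. One simple way to profile a threaded-code system is to have the inner interpreter count how often each word executes. The toy VM, its primitive names, and the `run` function below are all hypothetical.

```python
from collections import Counter

# Hypothetical Forth-like primitives; a Python list plays the data stack.
PRIMS = {
    "1":   lambda s: s.append(1),
    "dup": lambda s: s.append(s[-1]),
    "+":   lambda s: s.append(s.pop() + s.pop()),
    "*":   lambda s: s.append(s.pop() * s.pop()),
}


def run(program, prims=PRIMS):
    """Toy inner interpreter that counts every dispatch (a crude execution profile)."""
    stack, counts = [], Counter()
    for word in program:
        counts[word] += 1      # profiling hook: one increment per executed word
        prims[word](stack)
    return stack, counts


stack, counts = run(["1", "dup", "+", "dup", "*"] * 3)
print(stack)                  # [4, 4, 4]
print(counts.most_common(1))  # [('dup', 6)]
```

A real profiler would time or sample rather than just count, but even raw execution counts point at the words worth optimizing, which fits the measure-first approach argued for in #12.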
|
#14
On 3 Aug, 23:45, DavidM <nos...@nowhere.com> wrote:
> As we all know, all VMs regardless of the language can impose huge run-time
> performance penalties, compared to native coding in native compiled
> languages.
>
> An STC-based Forth with optimal hand-coded assembler primitives can evade
> this cost to a large degree, so I won't be talking about that here. I'm
> thinking more of DTC, ITC and TTC-based forths.
>
> What I am interested in is: when is it better to sweat over the coding of
> forth words, squeezing every last ounce of speed out of them at the very
> likely cost of readability/maintainability, as opposed to just taking the
> time critical stuff and coding it as C and/or assembler primitives?
>
> Cheers
> Dave

Well, as cache operates somewhat faster than memory, there comes a point where DTC will beat STC even though you'd initially think otherwise. Cache thrashing can be a major bottleneck, and I prefer DTC and ITC for this reason. Optimizing low-level code can provide benefits, but often at a space cost (more cache thrashing, anyone?).

To this end I decided http://nibz.googlecode.com should use DTC. ITC seemed to add an extra level of indirection with no apparent space reduction. Optimizing individual primitives will not gain much; combining frequent pairs of primitives into single words holds more potential.

cheers
jacko
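The idea of turning frequent primitive pairs into single words (often called superinstructions) can be sketched as a two-step pass over threaded code: count adjacent pairs, then replace the most common pair with one fused word. The word names and the `_`-joined naming of fused words are assumptions for illustration; a real system would also have to supply machine code for each fused word.

```python
from collections import Counter


def pair_counts(code):
    """Count adjacent (word, next_word) pairs in a threaded-code sequence."""
    return Counter(zip(code, code[1:]))


def fuse(code, pair):
    """Replace each non-overlapping occurrence of pair with one fused word."""
    fused_name = f"{pair[0]}_{pair[1]}"   # hypothetical naming scheme
    out, i = [], 0
    while i < len(code):
        if i + 1 < len(code) and (code[i], code[i + 1]) == pair:
            out.append(fused_name)
            i += 2
        else:
            out.append(code[i])
            i += 1
    return out


code = ["@", "+", "dup", "@", "+", "swap", "@", "+"]
best, n = pair_counts(code).most_common(1)[0]
print(best, n)            # ('@', '+') 3
print(fuse(code, best))   # ['@_+', 'dup', '@_+', 'swap', '@_+']
```

Here eight dispatches shrink to five. Whether that pays off depends on how often the fused sequence actually runs, which is the kind of thing execution counts reveal.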
|
#15
On Thu, 7 Aug 2008 15:16:39 -0700 (PDT), jacko <jackokring@gmail.com> wrote:

>Well as cache operates somewhat faster than memory, there comes a
>point where DTC will beat STC even though you'd initially think
>otherwise. Cache thrashing can be a major bottleneck. I prefer DTC and
>ITC for this reason. Optimizing low level code can provide benefits,
>but often at a space cost (more cache thrash anyone?).

On conventional CPUs at least, the benchmark figures say you're wrong by a factor of 10:1 and more. There are good and bad cache implementations, and good and bad solutions for cache problems.

STC compilers inline and optimise, so low-level words simply don't include many CALL/RET pairs. For size comparison, we converted a 256 KB (binary size) app from ITC to STC on the same hardware, and the STC version was about 2% smaller. Others have reported similar results.

Raw STC code may certainly suffer cache issues, but even simple optimisations remove the problems. Fully optimising compilers (let's call their output NCC, for native compiled code) produce code that is of similar size to DTC or ITC.

On silicon stack machines, Chuck Moore certainly doesn't agree with you. Some Forth-machine FPGA implementations have reached several hundred MIPS, but I'm not free to say more.

Stephen

-- 
Stephen Pelc, stephenXXX@mpeforth.com
MicroProcessor Engineering Ltd - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)23 8063 1441, fax: +44 (0)23 8033 9691
web: http://www.mpeforth.com - free VFX Forth downloads
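Stephen's point that inlining removes most CALL/RET pairs can be illustrated with a toy model: each colon definition is a list of words, primitives are always inlined, and any definition at or below a size threshold is expanded in place instead of being called. The dictionary contents and the threshold of 3 are assumptions; real optimising compilers use more elaborate heuristics.

```python
PRIMITIVES = {"dup", "*", "+", "swap", "drop"}

# Hypothetical colon definitions: name -> body (list of word names).
DEFS = {
    "square":   ["dup", "*"],
    "sum-sq":   ["square", "swap", "square", "+"],
    "big-word": ["sum-sq", "dup", "+", "dup", "+", "drop"],
}


def compile_stc(name, threshold=3):
    """Inline primitives and small definitions; keep CALLs only to larger ones."""
    out = []
    for word in DEFS[name]:
        if word in PRIMITIVES:
            out.append(word)                          # primitive code copied inline
        elif len(DEFS[word]) <= threshold:
            out.extend(compile_stc(word, threshold))  # small definition expanded in place
        else:
            out.append(("call", word))                # larger definition stays a CALL
    return out


def count_calls(body):
    """Number of CALL instructions left in a compiled body."""
    return sum(1 for w in body if isinstance(w, tuple))


print(compile_stc("sum-sq"))    # ['dup', '*', 'swap', 'dup', '*', '+']
print(compile_stc("big-word"))  # [('call', 'sum-sq'), 'dup', '+', 'dup', '+', 'drop']
```

A real compiler would then optimise across the inlined sequence (for example, keeping the top of stack in a register), which this model does not attempt.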
|
#16
stephenXXX@mpeforth.com (Stephen Pelc) wrote:
> STC compilers inline and optimise, so low-level words simply don't
> include many CALL/RET pairs. For size comparison, we converted a
> 256k byte (of binary) app from ITC to STC on the same hardware
> and the STC version was about 2% smaller. Others have reported
> similar results.

Back in the old days, we used to claim that ITC was smaller, that this was one of the advantages of Forth.

Were we wrong then, or does your result come because the processors have changed?
|
#17
Jonah Thomas <jethomas5@gmail.com> writes:
>stephenXXX@mpeforth.com (Stephen Pelc) wrote:
>
>> STC compilers inline and optimise, so low-level words simply don't
>> include many CALL/RET pairs. For size comparison, we converted a
>> 256k byte (of binary) app from ITC to STC on the same hardware
>> and the STC version was about 2% smaller. Others have reported
>> similar results.
>
>Back in the old days, we used to claim that ITC was smaller, that this
>was one of the advantages of Forth.
>
>Were we wrong then, or does your result come because the processors have
>changed?

32-bit ITC is twice as big as 16-bit ITC, and 64-bit ITC adds another factor of two. If native code for ARM or i386 is smaller than 32-bit ITC, that does not mean that native code for the 8085 is smaller than 16-bit ITC, even if it was compiled with something like VFX.

- anton
-- 
M. Anton Ertl  http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
     New standard: http://www.forth200x.org/forth200x.html
   EuroForth 2008: http://www.euroforth.org/ef08.html
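Anton's scaling argument is simple arithmetic: a threaded definition stores one cell-sized reference per word, so its body grows linearly with cell width. This sketch assumes a hypothetical definition of 10 references and ignores the header and code field.

```python
def itc_body_bytes(n_refs, cell_bits):
    """Bytes for the threaded body of a colon definition: one cell per reference."""
    return n_refs * (cell_bits // 8)


for bits in (16, 32, 64):
    print(bits, itc_body_bytes(10, bits))
# 16 20
# 32 40
# 64 80
```

Native instruction encodings do not double with each step in word size, which is why the size comparison between ITC and native code can flip between 16-bit and 32- or 64-bit targets.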
|
#18
Jonah Thomas wrote:
> Back in the old days, we used to claim that ITC was smaller, that this
> was one of the advantages of Forth.

Back in the old days, we ran ITC on 8-bit processors, where a 16-bit add was an operation that consisted of at least two loads (two bytes each), two adds (two bytes each) and two stores (also two bytes each), all going through the single accumulator. ITC gave you the same operation in 2 bytes instead of 12; no wonder ITC was smaller.

> Were we wrong then, or does your result come because the processors have
> changed?

Processors have changed, and ITC on 32-bit processors already uses 4 bytes per instruction. On 64 bits, native code size doesn't change much, but ITC doubles in size again.

-- 
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/
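Bernd's byte counts, written out as a worked sum. The figure of 2 bytes per instruction is his; exact encodings vary between 8-bit CPUs.

```latex
\begin{aligned}
\text{native 16-bit add} &= (2_{\text{loads}} + 2_{\text{adds}} + 2_{\text{stores}}) \times 2\ \text{bytes} = 12\ \text{bytes} \\
\text{16-bit ITC reference to } \texttt{+} &= 1\ \text{cell} = 2\ \text{bytes} \\
\text{size ratio} &= 12 / 2 = 6
\end{aligned}
```

The same single ITC reference costs 4 bytes with 32-bit cells and 8 bytes with 64-bit cells, while the native side is no longer the 8-bit instruction sequence counted here.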
|
#19
Bernd Paysan <bernd.paysan@gmx.de> wrote:
> Jonah Thomas wrote:
> > Back in the old days, we used to claim that ITC was smaller, that this
> > was one of the advantages of Forth.

> Back in the old days, we had run ITC on 8 bit processors, where a 16 bit add
> was an operation that consisted of at least two loads (two bytes each), two
> adds (two bytes each) and two stores (also two bytes each), all going
> through the single accumulator. ITC gave you the same operation in 2 bytes
> instead of 12, no wonder ITC was smaller.

> > Were we wrong then, or does your result come because the processors have
> > changed?

> Processors have changed, and ITC on 32 bit processors already uses 4
> bytes per instruction - on 64 bits, native code size doesn't change
> much, but ITC doubles size again.

Well, hold on a minute. On a 64-bit processor you can use 32-bit addressing for code, as long as you have less than 4 gigathings of code. This is a common optimization used by many programming languages. It's the default for gcc on AMD64, for example.

It is not common to use 16-bit code addressing on a 32-bit processor because no one wants to be limited to 64 kilothings of code, but the same reasoning doesn't apply to 64-bit processors. There's no reason at all to use 64-bit threading on a 64-bit processor.

Andrew.
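Andrew's suggestion can be sketched by storing each reference in the threaded code as a 32-bit offset from a code base address instead of a full 64-bit address, which halves the size of the threaded body. The base address and word addresses are made-up values, and `next_xt` is a hypothetical helper; the sketch shows only the encoding and decoding, not a working inner interpreter.

```python
import struct

CODE_BASE = 0x7F00_0000_0000  # hypothetical start of the code area
xts = [CODE_BASE + off for off in (0x10, 0x48, 0x1F0, 0x10)]  # hypothetical word addresses


def thread64(addrs):
    """Pack full 64-bit addresses, little-endian: 8 bytes per reference."""
    return struct.pack(f"<{len(addrs)}Q", *addrs)


def thread32(addrs, base):
    """Pack 32-bit offsets from base: 4 bytes per reference."""
    offsets = [a - base for a in addrs]
    if not all(0 <= o < 2**32 for o in offsets):
        raise ValueError("code area does not fit in 4 GiB above base")
    return struct.pack(f"<{len(offsets)}I", *offsets)


def next_xt(threaded32, i, base):
    """Decode reference i back to a full address, as a NEXT routine would."""
    (off,) = struct.unpack_from("<I", threaded32, 4 * i)
    return base + off


t64, t32 = thread64(xts), thread32(xts, CODE_BASE)
print(len(t64), len(t32))  # 32 16
```

Decoding costs one extra add per dispatch, traded against half the threaded-code footprint.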
|
#20
Anton Ertl wrote:
> Jonah Thomas <jethomas5@gmail.com> writes:
>> stephenXXX@mpeforth.com (Stephen Pelc) wrote:
>>
>>> STC compilers inline and optimise, so low-level words simply don't
>>> include many CALL/RET pairs. For size comparison, we converted a
>>> 256k byte (of binary) app from ITC to STC on the same hardware
>>> and the STC version was about 2% smaller. Others have reported
>>> similar results.
>> Back in the old days, we used to claim that ITC was smaller, that this
>> was one of the advantages of Forth.
>>
>> Were we wrong then, or does your result come because the processors have
>> changed?
>
> 32-bit ITC is twice as big as 16-bit ITC, and 64-bit ITC adds another
> factor of two. If native code for ARM or i386 is smaller than 32-bit
> ITC, that does not mean that native code for the 8085 is smaller than
> 16-bit ITC, even if it was compiled with something like VFX.
>
> - anton

Yes. In the '70s, you could fit a 16-bit ITC xt in one cell, while a CALL took at least 3 bytes, so ITC seemed like a win, particularly since the implementation itself (which was resident) remained small and simple. With 32-bit processors, you could fit a CALL in a cell for a very large amount of memory. When we switched over in the '90s, we were surprised at how much the programs shrank: on a 32-bit processor we got a reduction of almost 20% in one fairly complex app.

On the small micros, using interactive cross-compilers instead of a resident Forth means the targets don't have to bear the cost of heads, compiler, etc., and you can use fairly sophisticated compiler strategies to deliver programs that are both very small and very fast. That wasn't really an option in the '70s.

Cheers,
Elizabeth

-- 
==================================================
Elizabeth D. Rather   (US & Canada)   800-55-FORTH
FORTH Inc.                         +1 310.999.6784
5959 West Century Blvd. Suite 700
Los Angeles, CA 90045
http://www.forth.com

"Forth-based products and Services for real-time applications
since 1973."
==================================================
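The cross-compiler arrangement Elizabeth describes can be sketched as follows: the host keeps the dictionary (names and addresses), and only compiled code goes into the target image. The 8080 CALL (0xCD plus a 16-bit address, 3 bytes), RET (0xC9) and INX H (0x23) opcodes are real; the `CrossCompiler` class, the origin address, and keeping the top of stack in HL are assumptions of this sketch.

```python
class CrossCompiler:
    """Toy host-side compiler: the dictionary stays on the host, only code goes to the target."""

    CALL, RET = 0xCD, 0xC9  # 8080 CALL addr16 and RET opcodes

    def __init__(self, origin=0x0100):
        self.origin = origin
        self.image = bytearray()  # what gets downloaded to the target
        self.heads = {}           # host only: name -> target address

    def here(self):
        return self.origin + len(self.image)

    def code(self, name, machine_code):
        """Define a primitive from raw machine-code bytes (ending in RET)."""
        self.heads[name] = self.here()
        self.image += bytes(machine_code)

    def colon(self, name, words):
        """Compile a definition as subroutine calls followed by RET."""
        self.heads[name] = self.here()
        for w in words:
            addr = self.heads[w]  # name lookup happens on the host only
            self.image += bytes([self.CALL, addr & 0xFF, addr >> 8])
        self.image.append(self.RET)


cc = CrossCompiler()
cc.code("1+", [0x23, 0xC9])   # INX H; RET  (assumes top of stack lives in HL)
cc.colon("2+", ["1+", "1+"])  # CALL 1+ ; CALL 1+ ; RET
print(len(cc.image), hex(cc.heads["2+"]))  # 9 0x102
```

The names in `heads` never reach the target image, which is the saving she describes; each call costs 3 bytes here, against 2 bytes for a 16-bit ITC reference.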