#1
As we all know, all VMs regardless of the language can impose huge run-time performance penalties, compared to native coding in native compiled languages.

An STC-based Forth with optimal hand-coded assembler primitives can evade this cost to a large degree, so I won't be talking about that here. I'm thinking more of DTC, ITC and TTC-based forths.

What I am interested in is: when is it better to sweat over the coding of forth words, squeezing every last ounce of speed out of them at the very likely cost of readability/maintainability, as opposed to just taking the time critical stuff and coding it as C and/or assembler primitives?

Cheers
Dave
#2
On Aug 3, 6:45 pm, DavidM <nos...@nowhere.com> wrote:
> As we all know, all VMs regardless of the language can impose huge
> run-time performance penalties, compared to native coding in native
> compiled languages.
>
> An STC-based Forth with optimal hand-coded assembler primitives can evade
> this cost to a large degree, so I won't be talking about that here. I'm
> thinking more of DTC, ITC and TTC-based forths.
>
> What I am interested in is: when is it better to sweat over the coding of
> forth words, squeezing every last ounce of speed out of them at the very
> likely cost of readability/maintainability, as opposed to just taking the
> time critical stuff and coding it as C and/or assembler primitives?
>
> Cheers
> Dave

I'd say ... you shouldn't sweat over Forth code at all. Do what's necessary and no more. You want to finish your application (if you're writing one), not torture yourself. I'd choose readability and ease of coding over speed. At the same time I like speed too, so I use fast native external libraries and an STC Forth.

Many times I've read about determining the most often executed routines and optimizing those into assembly. I have a handful in my project that are candidates for this, but I'm waiting until I need more speed, not anticipating it. Or when I'm bored and just want to code SOMETHING. Which I've done, but I usually feel an odd sense of having wasted my time after. So it goes, a lot of people say not to optimize prematurely. Oops.

On the bright side, algorithm redesign can speed things up. I've reduced routines to 25% of their original size just through a redesign of an algorithm that was meant to simplify things, and I wasn't even looking for speed.

Hope that helps.

Roger
#3
DavidM <nospam@nowhere.com> wrote:

> What I am interested in is: when is it better to sweat over the coding
> of forth words, squeezing every last ounce of speed out of them at the
> very likely cost of readability/maintainability, as opposed to just
> taking the time critical stuff and coding it as C and/or assembler
> primitives?

If your code is already fast enough running on your particular Forth on your particular hardware, then you don't need to do either one, you're done.

So, you have your Forth code that works but it isn't fast enough. Step back and notice whether you see some other method that would run faster. You can try out new methods faster in Forth. See which one looks like it's actually doing the least work. If you find a better algorithm and it's fast enough, then you're done.

OK, so your best method is still too slow. Look at it carefully for ways to speed it up with better Forth. Don't do things that make it unreadable. They aren't worth it. Something will go wrong and it will be extra trouble to fix it. But you might as well look in case you've missed something that would speed it up. You can do some profiling to see where the slow stuff is; don't spend much effort on the stuff that can't help. If you see something that lets you speed up a critical inner loop, maybe it will be fast enough. If so then you're done.

If it's still too slow then by this time you know where the slow spots are. Look for something that it pays to do in C or assembler. If you're already using a good Forth optimiser that produces native code, you might only expect to speed it up 3-4 times. Maybe less, depending. If you need more speedup than you have any right to hope for, now is the time to either go back and look again for a better algorithm, or else look for faster hardware. Or you could try doing it in C or assembly just in case. If it's fast enough at this point then you're done.

If you've already coded one bottleneck and it didn't help enough, you probably have some idea how much speedup you can hope for from the second bottleneck. Guess whether you can make it fast enough by assembly coding. If it doesn't look plausible, your best choices are to find a faster algorithm or get faster hardware.

Look at the things that gobble up the time and imagine ways to get your result without doing them. Like, one time the slow part was to compute successive integer square roots inside an inner loop. The solution was to not compute square roots, but instead compute successive squares. The square root stays the same for x iterations until you get to the new square. And if you know n and n^2, then (n+1)^2 = n^2 + 2n + 1. Very fast.

Making your code unreadable is a mug's game. You lose the advantages of Forth for a moderate speed improvement.

Rewriting your code in C or assembler can be a mug's game. You lose the advantages of Forth for a moderate speed improvement. Do it if you need to, and if you think it will work well enough.

Forth is good for prototyping alternate methods. That's worth a try. The problem with that approach is that you can't tell ahead of time what you'll find, and there's a chance you'll put significant time into it and not find anything. So alternate with other approaches. If you have a manager who pays close attention to how you use your time, coding things in C will look like a valid exercise. So when you spend half your time doing that, at worst it will look like you're half as fast coding in C as you really are. If you spend all your time looking for better methods and you don't find them, then it might appear that you've just been goofing off.

Writing code that's at the very edge of what your processor can do is also a mug's game. You put a lot of effort into getting things barely fast enough, and pretty soon the specs will change and demand more. The first time you have to be real smart to get your code fast enough is a big warning that you need a faster processor. They can pay you the big bucks to do smarter and smarter tricks, and then they'll have to switch to a faster processor anyway. All the time you spent writing optimised assembly code for the old processor is wasted. (But the C code can be salvaged provided the C compiler for the new chip is 100% compatible with that for the old one.) The Forth code ought to run but the speed tradeoffs may be different. If you spent a lot of effort writing just exactly the sequence of Forth that was fastest, and now SWAP is seven times as fast as it used to be but ROT is only twice as fast, any effort you spent on fast stack juggling is wasted -- even if speed is still an issue on the new processor.

[minor rant on]

Forth is good for making code size small. You can put some effort into byte code, you can compress source code and decompress it a line at a time and interpret it, there are lots of ways to make your code very small if you don't need it to be fast. It can be pretty cheap to produce small code.

Forth is one of the best scripting languages for making code fast. Good Forth programmers can produce relatively fast code cheaper than faster code written in C.

Forth is one of the best choices for making code that's fast and small both. But it isn't cheap to do that.

This is not a good market niche for Forth, even though it's a niche that Forth is good for. If somebody is using an obsolescent processor that needs to do more than it can do, if they want to cram more functionality in limited space and limited speed than anybody can reasonably expect, maybe you can do it for them with Forth. Then they bite the bullet and switch processors, and their costs for getting you to do all that great stuff must be completely amortized right then. Very likely they'll decide it wasn't worth it, they should have switched earlier. Next time, or the time after, they do switch earlier. You get bragging rights for doing this superhuman work but repeat sales don't happen as often as you'd like.

I think it would be much better to develop the reputation for delivering more than expected, if you can do that. If you can give a competitive bid, and then meet the specs long before deadline, and ask "What else would you like us to do?"....

Getting the obsolete processor to deliver a little bit longer is not quite beating a dead horse. Delivering code that's small and reasonably fast and *correct*, before deadline, and then offering something extra -- there ought to be a big market for that. If you can deliver.

[rant end]
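The successive-squares trick described in the post above (avoid computing integer square roots in an inner loop by tracking the next perfect square instead) can be sketched roughly as follows. This is only an illustration in Python, not the poster's original Forth; the function name `isqrt_sequence` and the loop bounds are assumptions made for the example.

```python
import math


def isqrt_sequence(limit):
    """Yield (x, floor(sqrt(x))) for x in 0 .. limit-1.

    No square root is computed inside the loop. The current root stays
    the same until x reaches the next perfect square, and that square is
    updated with (n+1)^2 = n^2 + 2n + 1.
    """
    root = 0          # floor(sqrt(x)) for the current x
    next_square = 1   # (root + 1)^2, the next x at which root changes
    for x in range(limit):
        if x == next_square:
            root += 1
            # next_square currently equals root^2; step it to (root + 1)^2.
            next_square += 2 * root + 1
        yield x, root


if __name__ == "__main__":
    # Cross-check against the library integer square root.
    for x, r in isqrt_sequence(50):
        assert r == math.isqrt(x)
    print(list(isqrt_sequence(10)))
```

Each iteration costs one comparison, and an addition only when a new square is reached, which is the kind of "do less work" redesign the post argues for ahead of any rewrite in C or assembler.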
#4
DavidM wrote:
> As we all know, all VMs regardless of the language can impose huge
> run-time performance penalties, compared to native coding in native
> compiled languages.
>
> An STC-based Forth with optimal hand-coded assembler primitives can evade
> this cost to a large degree, so I won't be talking about that here. I'm
> thinking more of DTC, ITC and TTC-based forths.
>
> What I am interested in is: when is it better to sweat over the coding of
> forth words, squeezing every last ounce of speed out of them at the very
> likely cost of readability/maintainability, as opposed to just taking the
> time critical stuff and coding it as C and/or assembler primitives?
>
> Cheers
> Dave

A story I've told here before is applicable (sorry if you've already heard it): FORTH, Inc. was asked to recode a baggage handling system for American Airlines. The original program was all assembler, and too expensive to maintain. We were required to reproduce the user interface and basic bag handling procedures, but could do whatever else seemed appropriate. Our program was written entirely in polyFORTH (ITC), running native on an LSI-11 (yeah, it was quite a few years ago). When it was sufficiently complete to run some timing tests, everyone was astonished: our system could handle 25% more bags/minute than the previous one. polyFORTH was obviously not faster than pure assembler; the point was that our internal design was far more efficient than its predecessor.

The overall design of an application is a much stronger determinant of performance than language (any language). Sweating bullets over language benchmarks is a largely meaningless exercise.

As others have said, the important thing is to get your program running in the most straightforward way possible. In designing your program, be cognizant of the potentially time-critical parts, and try to come up with a clean design implemented in clean, readable code. When your program is running correctly, you can do timing studies and it should be clear what sections, if any, need some kind of optimization.

Modern Forths running on modern hardware are fast enough for the vast majority of applications. It's not worth sweating until you have a working program and can establish that you have a timing problem (and where it is). Then you can focus on that bottleneck.

Cheers,
Elizabeth

--
==================================================
Elizabeth D. Rather (US & Canada) 800-55-FORTH
FORTH Inc. +1 310.999.6784
5959 West Century Blvd. Suite 700
Los Angeles, CA 90045
http://www.forth.com

"Forth-based products and Services for real-time
applications since 1973."
==================================================
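The advice above (get the program working, then run timing studies to find where the time actually goes) might look something like this in practice. This is a minimal sketch using Python's standard `cProfile` and `pstats` modules as a stand-in for whatever timing tools a given Forth provides; the functions `cheap_setup`, `slow_sum_of_squares` and `application` are hypothetical placeholders for a real program.

```python
import cProfile
import io
import pstats


def cheap_setup():
    # Hypothetical routine that runs once and costs almost nothing.
    return list(range(10))


def slow_sum_of_squares(n):
    # Hypothetical hot spot: a plain loop that dominates the run time.
    total = 0
    for i in range(n):
        total += i * i
    return total


def application():
    # Stand-in for "the working program" that is to be measured.
    cheap_setup()
    return slow_sum_of_squares(200_000)


def profile(func):
    """Run func under the profiler; return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the expensive call chain is listed first.
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


result, report = profile(application)
print(report)
```

The report lists each function with its call count and time, so the decision about what (if anything) to optimize rests on measurements of the running program instead of guesses about individual primitives.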
#5
Elizabeth D Rather wrote:

   ...

> A story I've told here before is applicable (sorry if you've already
> heard it): FORTH, Inc. was asked to recode a baggage handling system
> for American Airlines. The original program was all assembler, and too
> expensive to maintain. We were required to reproduce the user interface
> and basic bag handling procedures, but could do whatever else seemed
> appropriate. Our program was written entirely in polyFORTH (ITC),
> running native on an LSI-11 (yeah, it was quite a few years ago). When
> it was sufficiently complete to run some timing tests, everyone was
> astonished: our system could handle 25% more bags/minute than the
> previous one. polyFORTH was obviously not faster than pure assembler;
> the point was that our internal design was far more efficient than its
> predecessor.

Which code was running last Wednesday at JFK? What a mess!
http://news.yahoo.com/s/nm/20080730/ts_nm/amr_jfk_dc

Jerry
--
Engineering is the art of making what you want from things you can get.
#6
On Aug 3, 6:45 pm, DavidM <nos...@nowhere.com> wrote:
> As we all know, all VMs regardless of the language can impose huge
> run-time performance penalties, compared to native coding in native
> compiled languages.

The key word is "can." The slowest VM can still outperform the fastest native code if the algorithms used are superior. What matters is the performance of the system as a whole, not the VM. That's a trap that you see endlessly here in comp.lang.forth-- a preoccupation with speed. But not speed of some concrete real-world application, but the speed of some small primitive in the system.

The theory, I guess, is that by focusing on speeding up all the primitives, the performance of the overall system is improved. To which I say, nonsense-- if I choose a superior algorithm, that is going to give me a far better pay-off in terms of performance than if I manage to reduce a routine I hardly ever call by a few cycles.

> What I am interested in is: when is it better to sweat over the coding of
> forth words, squeezing every last ounce of speed out of them at the very
> likely cost of readability/maintainability, as opposed to just taking the
> time critical stuff and coding it as C and/or assembler primitives?

The first step is to consider different algorithms and data structures. And here, simpler isn't always better. For example, depending on the size of what you're searching, a simple linear search will be slower (O(n)) than a more complex binary search (O(log2 n)). And a more complex binary search is likely to be slower than an even more complex hashing algorithm (typically O(1)). On the other hand, if the size of what you're searching is small, then a linear search may be fastest. It requires thought and sometimes experiment to know how to choose algorithms and data structures.

If you're talking to hardware that has strict real-time performance requirements that can't be met in Forth, then that is one time when coding in a more primitive language makes sense. If your processing is not keeping up with input data, then coding core routines in a more primitive language makes sense.

But before one starts down that road, they need to *measure*. That may mean running a profiler to identify where to focus efforts. That may mean getting out your oscilloscope or logic analyzer and establishing a timing baseline. That may mean instrumenting the VM to collect statistics on things like counts of certain instructions or how much time some instructions take up.

The point is that you need to have some objective metric by which you can not only verify that your efforts have paid off, but also understand the run-time behavior of the system. You need that because your intuition can be wrong. You need that because your experience can blind you to what is in front of you.
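The comparison of linear search, binary search and hashing in the post above, together with its call for an objective metric, can be made concrete with a small instrumented sketch. It is written in Python purely for illustration; the function names and the choice to count comparisons (instead of measuring wall-clock time) are assumptions made so that the numbers are repeatable.

```python
def linear_search(items, target):
    """Return (index or -1, comparisons made). O(n) comparisons."""
    comparisons = 0
    for index, value in enumerate(items):
        comparisons += 1
        if value == target:
            return index, comparisons
    return -1, comparisons


def binary_search(sorted_items, target):
    """Return (index or -1, probes made). Roughly log2(n) probes."""
    comparisons = 0
    low, high = 0, len(sorted_items)
    while low < high:
        mid = (low + high) // 2
        comparisons += 1
        if sorted_items[mid] < target:
            low = mid + 1
        else:
            high = mid
    if low < len(sorted_items) and sorted_items[low] == target:
        return low, comparisons
    return -1, comparisons


def hash_lookup(index_by_value, target):
    """Return index or -1 using a hash table; typically O(1)."""
    return index_by_value.get(target, -1)


data = list(range(0, 2000, 2))                  # 1000 sorted even numbers
index_by_value = {value: i for i, value in enumerate(data)}
target = data[-1]                               # worst case for linear search

linear_index, linear_cost = linear_search(data, target)
binary_index, binary_cost = binary_search(data, target)
hash_index = hash_lookup(index_by_value, target)

print("linear:", linear_index, "after", linear_cost, "comparisons")
print("binary:", binary_index, "after", binary_cost, "probes")
print("hash:  ", hash_index)
```

Counting comparisons shows the growth rates but not the constant factors. For a table of only a handful of entries, the simpler linear loop may still win on real hardware, which is why the post insists on measuring the actual system.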
#7
Jerry Avins wrote:
> Elizabeth D Rather wrote:
>
>   ...
>
>> A story I've told here before is applicable (sorry if you've already
>> heard it): FORTH, Inc. was asked to recode a baggage handling system
>> for American Airlines. The original program was all assembler, and
>> too expensive to maintain. We were required to reproduce the user
>> interface and basic bag handling procedures, but could do whatever
>> else seemed appropriate. Our program was written entirely in
>> polyFORTH (ITC), running native on an LSI-11 (yeah, it was quite a few
>> years ago). When it was sufficiently complete to run some timing
>> tests, everyone was astonished: our system could handle 25% more
>> bags/minute than the previous one. polyFORTH was obviously not faster
>> than pure assembler; the point was that our internal design was far
>> more efficient than its predecessor.
>
> Which code was running last Wednesday at JFK? What a mess!
> http://news.yahoo.com/s/nm/20080730/ts_nm/amr_jfk_dc
>
> Jerry

Hah, thank you for that! No, our system was at LAX. It operated for about 10 years before AA corporate decided to standardize on a turnkey system provided by a company "specializing in baggage handling systems".

ISTR there was a similar snafu when the new Denver terminal opened.

Cheers,
Elizabeth
#8
John Passaniti wrote:
> The key word is "can." The slowest VM can still outperform the
> fastest native code if the algorithms used are superior. What matters
> is the performance of the system as a whole, not the VM. That's a
> trap that you see endlessly here in comp.lang.forth-- a preoccupation
> with speed. But not speed of some concrete real-world application,
> but the speed of some small primitive in the system.

You don't see the forest for the trees. When we argue here, we are quite often concerned about speed. You then step in and say "there's no requirement to be performant here". That's sloppy thinking - there's no requirement for performance here and there, so you implement CPU hogs here and there and then wonder why your application is dog slow.

You forget that a fast, low-memory solution is often a straightforward and clean design, too. It's easy to maintain, as well. That's the goal. If your performance improvements are a burden for a small, clean implementation, forget it.

After you have done that - implemented something that's already sane in terms of space, performance, and lines of code - you can start measuring things. You might be surprised that something that looked sane isn't, but in general, it makes things much easier. After all, you only have to hunt those bottlenecks that are still there despite the preparation, and the design is lean and clean.

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/
#9
Bernd Paysan wrote:
> John Passaniti wrote:
>> The key word is "can." The slowest VM can still outperform the
>> fastest native code if the algorithms used are superior. What matters
>> is the performance of the system as a whole, not the VM. That's a
>> trap that you see endlessly here in comp.lang.forth-- a preoccupation
>> with speed. But not speed of some concrete real-world application,
>> but the speed of some small primitive in the system.
>
> You don't see the forest for the trees. When we argue here, we are quite
> often concerned about speed. You then step in and say "there's no
> requirement to be performant here". That's sloppy thinking - there's no
> requirement for performance here and there, so you implement CPU hogs here
> and there and then wonder why your application is dog slow.
>
> You forget that a fast, low-memory solution is often a straightforward and
> clean design, too. It's easy to maintain, as well. That's the goal. If your
> performance improvements are a burden for a small, clean implementation,
> forget it.
>
> After you have done that - implemented something that's already sane in
> terms of space, performance, and lines of code - you can start measuring
> things. You might be surprised that something that looked sane isn't, but
> in general, it makes things much easier. After all, you only have to hunt
> those bottlenecks that are still there despite the preparation, and the
> design is lean and clean.

Well, but the OP was asking where he should put his emphasis in optimizing low-level code. John and I are both saying, in different words, that the place to put the primary effort is in application design, not low-level code optimization. We're not saying performance doesn't matter, but pointing out where the main focus should be in achieving it.

In my story about the baggage system, the polyFORTH that we used was roughly 10x faster than FIGforths of that era, but what made the difference was not that but the design of polyFORTH's multitasker and the way we used it in the application (the old system was doing a lot of polling and flag passing internally).

I'm all for fast systems, but that doesn't address the OP's issue.

Cheers,
Elizabeth
#10
On 4 Aug 2008 10:45:30 +1200, DavidM <nospam@nowhere.com> wrote:

> As we all know, all VMs regardless of the language can impose huge
> run-time performance penalties, compared to native coding in native
> compiled languages.
>
> An STC-based Forth with optimal hand-coded assembler primitives can evade
> this cost to a large degree, so I won't be talking about that here. I'm
> thinking more of DTC, ITC and TTC-based forths.
>
> What I am interested in is: when is it better to sweat over the coding of
> forth words, squeezing every last ounce of speed out of them at the very
> likely cost of readability/maintainability, as opposed to just taking the
> time critical stuff and coding it as C and/or assembler primitives?

It depends what the VM is for!

When MPE and Forth Inc were working on the OTA virtual machine, we found that if the high level portion of the system (I/O, database ...) was sufficiently high-level, the payment terminal applications spent most of their time in the high level functions. I do not remember the numbers ... it was a long time ago.

OTA was a token threaded 32 bit system. Underneath that, depending on the CPU, were DTC and STC Forth kernels for CPUs like 80186/V25, 8051 and 68000.

Stephen

--
Stephen Pelc, stephenXXX@mpeforth.com
MicroProcessor Engineering Ltd - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)23 8063 1441, fax: +44 (0)23 8033 9691
web: http://www.mpeforth.com - free VFX Forth downloads