Optimizing stack access for a stack based VM - Compilers
This is a discussion on Optimizing stack access for a stack based VM - Compilers ; Hello everyone,
I'm working on a compiler of a Pascal-like language for a specific
stack-based virtual machine. To produce small code, I would like to
optimize the storage of variables and temporaries akin to register
allocation on a register-based machine.
...
-
Optimizing stack access for a stack based VM
Hello everyone,
I'm working on a compiler of a Pascal-like language for a specific
stack-based virtual machine. To produce small code, I would like to
optimize the storage of variables and temporaries akin to register
allocation on a register-based machine.
Local variables should occur as natural operands of instructions (on
the stack top) as much as possible, removing the need to fetch them
from the middle of the stack (and encode displacement values).
(More detailed description below.)
Do you know if anyone already tried something like this, can you point me
to any algorithms/articles/software?
Target machine description:
In short: Imagine a classical Harvard-style RISC processor, only that there
are no registers, instruction operands are on the stack.
There are separate address space for (1) code, (2) data, (3) stack.
Code - classical linear sequence of instructions with conditional jumps
for branching
Arithmetic instructions - for example "add" - pops two values off the stack,
adds them and pushes the result onto the stack
Data - "ld [<addr>]" loads a word from the given address and pushes it
on the stack; "st [<addr>]" pops a word off the stack and stores it
to the given address. "Static" variables are stored here.
Stack - direct access to stack is possible, "stkld [<offs>]", "stkst [<offs>]"
similar to ld/st, but <offs> is relative to the stack top. Auxiliary
instructions: "dup" duplicate value on stack top, "trash" remove value from
stack top, "swap" exchange two values on stack top. "Automatic" variables
and temporaries are stored here.
Obviously anytime we want to execute an arithmetic operation it would be
beneficial to have its operands on the stack top removing the need for
stkld/stkst (and coding <offs>).
While we get this for free for temporaries (= temporary results in arithmetic
expressions) just by producing stack-based code from the source code, I'd like
to extend it to work for variables, too.
I suppose an algorithm for an accumulator-based machine (which has
1 accumulator register) could also be used (with less efficiency).
TIA
Regards
Jiri
-
Re: Optimizing stack access for a stack based VM
Jiri Svoboda wrote:
> Local variables should occur as natural operands of instructions (on
> the stack top) as much as possible, removing the need to fetch them
> from the middle of the stack (and encode displacement values).
IMO it doesn't matter whether you encode local variables as offsets
from the SP, or from another base pointer.
> Obviously anytime we want to execute an arithmetic operation it would be
> beneficial to have its operands on the stack top removing the need for
> stkld/stkst (and coding <offs>).
>
> While we get this for free for temporaries (= temporary results in
> arithmetic expressions) just by producing stack-based code from the
> source code, I'd like to extend it to work for variables, too.
Since access to local variables requires some selector (offset...)
value, you'll have to duplicate all operations, for using a local
variable as the second operand. Perhaps further sets are required, for
locals as first or both operands, and operations with non-local
variables would require further instructions. I'd stay with an RISC
instruction set, and only push/pop variables by dedicated
instructions.
Did you have a look at the i80x87 FPU instruction set? Or at the Java
VM, or at FORTH?
DoDi
-
Re: Optimizing stack access for a stack based VM
On Sep 11, 10:13 am, Jiri Svoboda <jirik.svob...@seznam.cz> wrote:
> I'm working on a compiler of a Pascal-like language for a specific
> stack-based virtual machine. To produce small code, I would like to
> optimize the storage of variables and temporaries akin to register
> allocation on a register-based machine.
>
> Local variables should occur as natural operands of instructions (on
> the stack top) as much as possible, removing the need to fetch them
> from the middle of the stack (and encode displacement values).
>
> (More detailed description below.)
>
> Do you know if anyone already tried something like this, can you point me
> to any algorithms/articles/software?
Much of what you describe is accomplished by Forth.
> Target machine description:
>
> In short: Imagine a classical Harvard-style RISC processor, only that there
> are no registers, instruction operands are on the stack.
>
> There are separate address space for (1) code, (2) data, (3) stack.
The separation between code and data isn't required in Forth (although
there may be advantages in certain architectures that treat each space
differently, or don't like mixing of code and data).
> Code - classical linear sequence of instructions with conditional jumps
> for branching
Or token-threaded code; it doesn't need to be executable code.
> Arithmetic instructions - for example "add" - pops two values off the stack,
> adds them and pushes the result onto the stack
>
> Data - "ld [<addr>]" loads a word from the given address and pushes it
> on the stack; "st [<addr>]" pops a word off the stack and stores it
> to the given address. "Static" variables are stored here.
Forth uses addresses on the stack, and a @ (fetch contents of address)
and ! (store into address the value 2nd on the stack).
variable x
10 x ! \ set x to 10; x puts the address on the stack for the 2
operand !
> Stack - direct access to stack is possible, "stkld [<offs>]", "stkst [<offs>]"
> similar to ld/st, but <offs> is relative to the stack top. Auxiliary
> instructions: "dup" duplicate value on stack top, "trash" remove value from
> stack top, "swap" exchange two values on stack top. "Automatic" variables
> and temporaries are stored here.
Most operations in a stack based VM only require access to the top 2
stack entries. Forth provides your basic arithmetic instructions,
along with stack juggling words; SWAP and DUP you mention, and "trash"
is DROP. Also consider having an OVER ( a b -> a b a ).
The stack offset word you describe is Forth's PICK; but it is
infrequently used, and can be eliminated entirely for most code.
Digging down on the stack is frowned on in Forth, because many of the
machines on which it is implemented do not have an item addressable
stack. In general, it's not required, and 99 times out of 100
indicates a loss of stack control on the part of the programmer.
> Obviously anytime we want to execute an arithmetic operation it
> would be beneficial to have its operands on the stack top removing
> the need for stkld/stkst (and coding <offs>).
"Variables" in Forth are words that push their addresses on the stack,
as in the @ and ! example given above. Forth also has a value, or a
self fetching variable.
> While we get this for free for temporaries (= temporary results in
> arithmetic expressions) just by producing stack-based code from the
> source code, I'd like to extend it to work for variables, too.
> I suppose an algorithm for an accumulator-based machine (which has 1
> accumulator register) could also be used (with less efficiency).
It might be more efficient if stack accesses are more expensive than
accumulator accesses. Just consider the accumulator as the top entry
of the stack, and the stack holds entries below that.
> TIA
See comp.lang.forth,
Anton Ertl's website at http://www.complang.tuwien.ac.at/anton/home.html,
the comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html,
and this thread on clf; http://groups.google.com/group/comp....a6b972/?hl=en#
--
Regards
Alex McDonald
-
Re: Optimizing stack access for a stack based VM
Jiri Svoboda wrote:
> I'm working on a compiler of a Pascal-like language for a specific
> stack-based virtual machine. To produce small code, I would like to
> optimize the storage of variables and temporaries akin to register
> allocation on a register-based machine.
I've seen something like this done in two places:
One is Postscript, where programmers do this manually. There are no
statements or expressions as such, just a stack and operators.
To do the equivalent of "x := x + 1":
dict /x get 1 add dict /x exch put
You can also do optimize away one dictionary lookup with something like
this:
dict /x 2 copy get 1 add put
Then there's the Unisys MCP Series and their Burroughs ancestors, all
stack machines. I've seen those compilers optimize expressions, e.g.:
x ** 5 would be something like
load x
dupl
dupl
mult
dupl
mult
mult
I'm not sure these compilers do everything you're looking for.
I do know that the earlier machines in this series (Burroughs B6700)
kept the top two words of the stack (and their double precision
extensions) in registers and the rest of the stack in memory. Later
machines cached the top n words of the stack, which made stack access
faster.
I'm not sure I understand all of what you're trying to do. Are you
trying to reuse stack offsets when variables are no longer needed?
Are you trying to avoid assigning a stack offset to a variable that
can be bounced around the top of the stack until it can simply go
away?
Are you sure you'd be saving memory? More complex code means more
memory (and more development time and more bugs).
Louis
lkrupp *at* pssw *dot* com
-
Re: Optimizing stack access for a stack based VM
Alex McDonald wrote:
>>There are separate address space for (1) code, (2) data, (3) stack.
>
> The separation between code and data isn't required in Forth (although
> there may be advantages in certain architectures that treat each space
> differently, or don't like mixing of code and data).
IMO Forth is not a direct solution for the OP, since...
> "Variables" in Forth are words that push their addresses on the stack,
> as in the @ and ! example given above. Forth also has a value, or a
> self fetching variable.
Forth has no local variables (for recursive calls), the implementation
of e.g. loop counters requires two separate stacks, for data and
return addresses. Calling subroutines (Forth: compiled words) for the
retrieval of references to variables IMO also is overkill, in a VM
other than the Forth machine itself.
Nonetheless Forth is a source of inspiration, for everybody who wants
to learn more about really uncommon virtual machines :-)
DoDi
-
Re: Optimizing stack access for a stack based VM
Louis Krupp wrote:
> I do know that the earlier machines in this series (Burroughs B6700)
> kept the top two words of the stack (and their double precision
> extensions) in registers and the rest of the stack in memory. Later
> machines cached the top n words of the stack, which made stack access
> faster.
AFAIR a 16 bit TI microprocessor also used the top 16 entries on the
stack as registers. Perhaps this architecture reflected the Burroughs
machine?
DoDi
[That seems to be pretty typical. The HP3000 kept the top several elements
of the stack in registers as well. -John]
-
Re: Optimizing stack access for a stack based VM
> Did you have a look at the i80x87 FPU instruction set? Or at the Java
> VM, or at FORTH?
In my biased opinion, the prototypical non-ancient stack-based ISA is that of
the transputer.
Jan
-
Re: Optimizing stack access for a stack based VM
On Sep 13, 9:31 pm, Hans-Peter Diettrich <DrDiettri...@aol.com> wrote:
> Alex McDonald wrote:
> >>There are separate address space for (1) code, (2) data, (3) stack.
>
> > The separation between code and data isn't required in Forth (although
> > there may be advantages in certain architectures that treat each space
> > differently, or don't like mixing of code and data).
>
> IMO Forth is not a direct solution for the OP, since...
I suggested the OP look at Forth, not implement Forth. There are many
good ideas in Forth, and he can avoid many of the problems others have
solved by doing a bit of research. Generating stack-based IL from
Pascal is not difficult; getting the right stack-based IL is the hard
part.
> > "Variables" in Forth are words that push their addresses on the stack,
> > as in the @ and ! example given above. Forth also has a value, or a
> > self fetching variable.
>
> Forth has no local variables (for recursive calls),
Your assertion is not true; Forth has local variables (actually, self-
fetching values) which can be used in recursion. Preferably, the stack
can be used for recursion and avoid locals entirely.
: fib ( n -- n' )
dup 1 > if
dup 1- recurse swap 2 - recurse +
then ;
> the implementation of e.g. loop counters requires two separate
> stacks, for data and return addresses.
Forth requires two stacks (data and return) as a matter of course, and
not just for loop counters or variables. This is not a big problem
even in architectures that don't directly support any stacks at all
(as I'm sure you know, stacks are easy to implement without hardware
support).
And loop counters and local variables can be held on the return stack,
but don't need to be. If recursion isn't used, the stack sizes
required for the data and return stacks need not be large; certainly,
tens of entries are more than adequate for most purposes, and rarely
if ever a requirement for hundreds or thousands of entries.
> Calling subroutines (Forth: compiled words) for the retrieval of
> references to variables IMO also is overkill, in a VM other than the
> Forth machine itself.
Your assumption is that Forth is exclusively subroutine-threaded.
Threaded code Forths are easy to implement, debug and port, which is
why threaded implementations are often described in the literature.
But modern Forths generate native code, rather than threaded code, and
(for instance) would compile this primitive and simplistic variable
increment;
x @ 1 + x !
into native code;
add x, 1
--
Regards
Alex McDonald
-
Re: Optimizing stack access for a stack based VM
DrDiettrich1@aol.com (Hans-Peter Diettrich) wrote:
> Forth has no local variables (for recursive calls),
Local variables are in the Forth Standard, optional word set IIRC.
Ken Young
-
Re: Optimizing stack access for a stack based VM
Hans-Peter Diettrich wrote:
> AFAIR a 16 bit TI microprocessor also used the top 16 entries on the
> stack as registers. Perhaps this architecture reflected the Burroughs
> machine?
The TI9900 might be what you remember. It was a PDP-11 wannabe, with 16
registers instead of 8. Some neat features and some weird features.
jeff
[The registers, which TI called the workspace, were in RAM, with a
pointer register saying where they were. So to stack the registers on
a subroutine call, you just adjusted the pointer register. Not really
a stack architecture, more a predecessor to the SPARC's register
windows. -John]
Similar Threads
-
By Application Development in forum RUBY
Replies: 8
Last Post: 08-31-2007, 11:21 PM
-
By Application Development in forum DOTNET
Replies: 5
Last Post: 07-14-2006, 04:19 PM
-
By Application Development in forum Compilers
Replies: 1
Last Post: 08-10-2005, 10:52 AM
-
By Application Development in forum Compilers
Replies: 11
Last Post: 08-07-2005, 03:21 PM
-
By Application Development in forum PROLOG
Replies: 3
Last Post: 06-30-2005, 02:02 PM