Parser question

This is a discussion on Parser question within the Compilers forums in Theory and Concepts category; I blow the dust of my old Dragon book and started to read. I got some inspiration to make a very simple (read subset of a) C compiler. The implementation is a predictive parser that construct a syntax tree holding both expression and statements. As the grammar is recursive, so is the implementation. However, i wish to find some shortcuts to rewrite (at least some of) the recursive functions to be iterative. Are there any well known algorithms of doing this? I could not find much info on this topic in the dragon book. For instance the grammar for: expr ...

Go Back   Application Development Forum > Theory and Concepts > Compilers

Object Mix

Register FAQ Calendar Search Today's Posts Mark Forums Read
  #1  
Old 07-11-2008, 07:21 AM
Mike (News)
Guest
 
Default Parser question

I blow the dust of my old Dragon book and started to read. I got some
inspiration to make a very simple (read subset of a) C compiler.

The implementation is a predictive parser that construct a syntax tree
holding both expression and statements. As the grammar is recursive,
so is the implementation. However, i wish to find some shortcuts to
rewrite (at least some of) the recursive functions to be
iterative. Are there any well known algorithms of doing this? I could
not find much info on this topic in the dragon book.

For instance the grammar for:

expr -> term moreterms

moreterms -> + term | - term | N5


Is inherently recursive. How would I write iterative code for this using
my tree function makenode_op(operand, left, right)?

An example in pseudo code would be most helpful to get me started.

Thanks in advance!

/Mike

Reply With Quote
  #2  
Old 07-13-2008, 07:33 AM
Aeran
Guest
 
Default Re: Parser question

You probably want to identify tail recursion. If nothing is built up on
the stack while recursing it is possible to avoid the recursive call. If
you have a grammar like: a -> Ba|N then you can when implementing the
parser instead of writing it like:
sub a()
matchB
if (look == 'N') matchN else a()
end sub

You could write it like this:
sub a()
loop forever
matchB
if (look == 'N')
matchN
break out from loop

You can also avoid doing a procedure call for every nonterminal. The
grammar you wrote above you could fit into one procedure:

sub expr
term()
if (look == '+')
match+
term()
elseif (look == '-')
match-
term()

The above procedure cover both expr and moreterms.
Reply With Quote
  #3  
Old 07-13-2008, 09:12 AM
Max Hailperin
Guest
 
Default Re: Parser question

"Mike (News)" <mike@sesamgames.com> writes:

> ...
> The implementation is a predictive parser that construct a syntax tree
> holding both expression and statements. As the grammar is recursive,
> so is the implementation. However, i wish to find some shortcuts to
> rewrite (at least some of) the recursive functions to be
> iterative. Are there any well known algorithms of doing this? I could
> not find much info on this topic in the dragon book.


There are well-known algorithms for tail-recursion elimination, but
using one of the fully-general algorithms is unlikely to help you much
getting over your hurdle of developing an intuition for the process.
I think you would be better off with what you ask for later in your
post, an example. Your old dragon book shows an example of manually
doing the tail-recursion elimination on a parsing procedure for a
grammar like your example (or rather, like your example once
debugged). However, it is shown in a context where no syntax tree is
being built. Later, the dragon book shows how to use an L-attributed
definition to build the syntax tree. Putting these two examples
togeher would solve your problem, but it isn't obvious to a newcomer
how that is done. That, I think, is the crux of your problem, and I
address it below.

However, I should also note that depending on what language you are
writing your compiler in, and in some cases what compiler you are
using to compile your compiler, doing the tail-recursion elimination
by hand may be pointless. The compiler that is compiling your
compiler may do it for you. As an example, I took the two parsers
shown in C in the red dragon book, one tail-recursive, one with a
loop, and ran them both through the gcc compiler (with optimization
turned on). I told gcc to produce assembly language output rather
than binary object code, and I compared the two versions. The result:
essentially no difference. In this case, where the parsing routine
was written in C, the fact that tail-recursion elimination was
performed automatically was a property of the specific C compiler
used, gcc. But there are some other languages, such as Scheme, where
tail-recursion elimination is guaranteed by the language
specification, and so you can feel free to leave your parser in the
tail-recursive form, knowing that it will still generate an iterative
process.

>
> For instance the grammar for:
>
> expr -> term moreterms
>
> moreterms -> + term | - term | N5


I think you goofed on the productions for moreterms. The N5 is surely
supposed to be an epsilon, but there is also a more fundamental
problem: you missed the tail recursion. The first two productions
for moreterms should presumably allow further terms at the end:

moreterms -> + term moreterms | - term moreterms | epsilon

> Is inherently recursive. How would I write iterative code for this using
> my tree function makenode_op(operand, left, right)?
>
> An example in pseudo code would be most helpful to get me started.


First, let's write this using tail recursion and not building up the
syntax tree:

expr():
term()
moreterms()

moreterms():
if lookahead = '+':
advance lookahead
term()
moreterms()
else if lookahead = '-':
advance lookahead
term()
moreterms()
else if lookahead can legally follow:
return
else:
report a syntax error

Note that in practice, the check for a legally following symbol is
frequently omitted.

Now we can take this in two different directions, which we later merge
back together. One is to eliminate the tail-position calls to
moreterms(), the other is to introduce the building of a syntax-tree
using an L-attributed definition. You can read these in either order.

To eliminate the tail calls, we convert the moreterms procedure into a loop:

expr():
term()
// initially fall into moreterms
repeat until exited: // this loop is moreterms
if lookahead = '+':
advance lookahead
term()
// loop back for moreterms
else if lookahead = '-':
advance lookahead
term()
// loop back for moreterms
else if lookahead can legally follow:
return
else:
report a syntax error

If instead we want to leave the tail calls intact, but build the
syntax tree, we would introduce an inherited attribute (procedure
parameter) for the left context and a synthesized attribute (procedure
return value) for the resulting syntax tree. I've written the below
pseudo-code rather ploddingly, with a lot of unnecessary assignment
statements to give meaningful names to the intermediate results, in
the hopes that it helps you see what is going on:

expr():
firstTerm = term()
return moreterms(firstTerm)

moreterms(left):
if lookahead = '+':
advance lookahead
right = term()
newLeft = makenode_op('+', left, right)
return moreterms(newLeft)
else if lookahead = '-':
advance lookahead
right = term()
newLeft = makenode_op('-', left, right)
return moreterms(newLeft)
else if lookahead can legally follow:
return left
else:
report a syntax error

Now, we can combine these two versions, following the overall looping
form of the version that doesn't have tail calls, but keeping track of
the left context. In place of the parameter "left", we'll now have a
local variable by that name, which will get its succesive values by
assignment statements rather than parameter passing:

expr():
left = term()
// initially fall into moreterms
repeat until exited: // this loop is moreterms
if lookahead = '+':
advance lookahead
right = term()
newLeft = makenode_op('+', left, right)
left = newLeft
// loop back for moreterms
else if lookahead = '-':
advance lookahead
right = term()
newLeft = makenode_op('-', left, right)
left = newLeft
// loop back for moreterms
else if lookahead can legally follow:
return left
else:
report a syntax error

Of course, I will re-emphasize that this is intentionally plodding.
At a minimum, you would ordinarily eliminate the newLeft variable and
instead directly assign the result from makenode_op to the left
variable.

>
> Thanks in advance!
>
> /Mike


You're welcome.
-Max Hailperin
Professor of Computer Science
Gustavus Adolphus College
Reply With Quote
  #4  
Old 07-15-2008, 06:42 AM
Hans-Peter Diettrich
Guest
 
Default Re: Parser question

Mike (News) schrieb:

> The implementation is a predictive parser that construct a syntax tree
> holding both expression and statements. As the grammar is recursive,
> so is the implementation. However, i wish to find some shortcuts to
> rewrite (at least some of) the recursive functions to be
> iterative. Are there any well known algorithms of doing this?


> For instance the grammar for:
>
> expr -> term moreterms
>
> moreterms -> + term | - term | N5
>
>
> Is inherently recursive. How would I write iterative code for this using
> my tree function makenode_op(operand, left, right)?


One method would "flatten" the operator hierarchy, to essentially only
one unary and binary operator class, covering all operators:

moreterms -> binop term | epsilon

Then something like an internal list (usually stack) of operators and
terms is produced by moreterms, that is "folded" into an tree before
exit, based on the operator precedence, grouping etc.

Otherwise indirect recursion may not be eliminated, by tail recursion or
a conversion of distinct productions.

DoDi

Reply With Quote
  #5  
Old 07-16-2008, 01:48 PM
Mikael Ström
Guest
 
Default Re: Parser question

Thanks Max for your highly enlightening answer. I really learned a lot
from your explanation. The dragon book is quite abstract, and it's not
that easy to see how things could fit together.

> I told gcc to produce assembly language output rather
> than binary object code, and I compared the two versions. The result:
> essentially no difference.


Interesting, never thought about that. I'm using GCC at this time, but
the idea is to (one day) let the compiler compile itself so i can
bootstrap to other platforms. I try to use a very basic subset of C, and
want to see how far I can get. The compiler has distinctive front and
back ends. The front end generates three address code. The back end (a
separate executable) generates x86 assembly for the time being. Even
though the grammar is _very_ limited at this time, all phases are
implemented and actually works. The only optimization is evaluation of
constant expressions and peephole optimization, so the produced code is
not really in the GCC class The thing is that i only after a few
weeks, on spare (=night) time, actually managed to write a C-like
compiler that works. What i learned from earlier attempts is to make ALL
phases of the compiler from start, even though the language is extremely
simple and the produced code is terrible.

> I think you goofed on the productions for moreterms. The N5 is surely
> supposed to be an epsilon...


Yes, "N5" was epsilon when i mailed it... And yes, i goofed.

> expr():
> left = term()
> // initially fall into moreterms
> repeat until exited: // this loop is moreterms
> if lookahead = '+':
> advance lookahead
> right = term()
> newLeft = makenode_op('+', left, right)
> left = newLeft
> // loop back for moreterms
> else if lookahead = '-':
> advance lookahead
> right = term()
> newLeft = makenode_op('-', left, right)
> left = newLeft
> // loop back for moreterms
> else if lookahead can legally follow:
> return left
> else:
> report a syntax error
>


The above pseudo code makes me a bit wondering though: "return left"
would return a leaf, and not the 'root' of the expression, correct?

I try to figure out how to return the root rather than the leaf, but no
matter how i try, the result is a very cluttered code.

So, if i understand the above code correctly (and made the right
assumption on left vs root returned) then the recursive version has
several advantages; readability, speed and code size. The only drawback
is that the stack needs to be pretty big and/or some mechanism for
tracking the stack to avoid stack overflow (and a certain crash) must be
implemented.

What is the most common solution in C compilers and similar?

Again, thanks a lot!

/Mike
Reply With Quote
  #6  
Old 07-19-2008, 09:17 PM
Max Hailperin
Guest
 
Default Re: Parser question

Mikael StrC6m <mikael@sesamgames.com> writes:

.....
> The above pseudo code makes me a bit wondering though: "return left"
> would return a leaf, and not the 'root' of the expression, correct?


No, incorrect. Take a look at what statements in the code assign a
value to the variable "left". To know what is returned by "return
left", you need to look at what the most recent assignment to that
variable is, at the time of the return statement. Except in the case
when the expression is a single term (with no '+' or '-' following
it), the most recent assignment will always be from these two lines,
or the corresponding ones for '-':

newLeft = makenode_op('+', left, right)
left = newLeft

So, the value being returned will be one of the nodes produced by
makenode_op, refuting your prediction that it would be a leaf. To see
more specifically that it is the root, consider an expression like

a-b-c

where a, b, and c are terms. If for simplicity we assume that the term()
procedure returns just the characters 'a', 'b', 'c' for these terms,
then the values that the variable "left" will take on over the course
of the expr() procedure's execution are

'a'
makenode_op('-', 'a', 'b')
makenode_op('-', makenode_op('-', 'a', 'b'), 'c')

Notice that the first of these values for left shows up as the second
parameter passed to makenode_op in constructing the second value for
left. And similarly, the second value for left shows up as the second
parameter passed to makenode_op in constructing the third value for
left. This is because in each case, the new value of left is being
computed as makenode_op('-', left, right). And the third value for
left, the one that is returned, is in fact the root of the tree.

> ... Again, thanks a lot!


You're welcome. -max

Reply With Quote
Reply


Thread Tools
Display Modes


All times are GMT -5. The time now is 06:42 AM.


Powered by vBulletin® Version 3.7.2
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
vB Ad Management by =RedTyger=

In an effort to better serve ads to our visitors, cookies are used on objectmix.com. For more information, check out our Privacy Policy.