| Register | FAQ | Calendar | Search | Today's Posts | Mark Forums Read |
|
#1
| |||
| |||
| I blow the dust of my old Dragon book and started to read. I got some inspiration to make a very simple (read subset of a) C compiler. The implementation is a predictive parser that construct a syntax tree holding both expression and statements. As the grammar is recursive, so is the implementation. However, i wish to find some shortcuts to rewrite (at least some of) the recursive functions to be iterative. Are there any well known algorithms of doing this? I could not find much info on this topic in the dragon book. For instance the grammar for: expr -> term moreterms moreterms -> + term | - term | N5 Is inherently recursive. How would I write iterative code for this using my tree function makenode_op(operand, left, right)? An example in pseudo code would be most helpful to get me started. Thanks in advance! /Mike |
|
#2
| |||
| |||
| You probably want to identify tail recursion. If nothing is built up on the stack while recursing it is possible to avoid the recursive call. If you have a grammar like: a -> Ba|N then you can when implementing the parser instead of writing it like: sub a() matchB if (look == 'N') matchN else a() end sub You could write it like this: sub a() loop forever matchB if (look == 'N') matchN break out from loop You can also avoid doing a procedure call for every nonterminal. The grammar you wrote above you could fit into one procedure: sub expr term() if (look == '+') match+ term() elseif (look == '-') match- term() The above procedure cover both expr and moreterms. |
|
#3
| |||
| |||
| "Mike (News)" <mike@sesamgames.com> writes: > ... > The implementation is a predictive parser that construct a syntax tree > holding both expression and statements. As the grammar is recursive, > so is the implementation. However, i wish to find some shortcuts to > rewrite (at least some of) the recursive functions to be > iterative. Are there any well known algorithms of doing this? I could > not find much info on this topic in the dragon book. There are well-known algorithms for tail-recursion elimination, but using one of the fully-general algorithms is unlikely to help you much getting over your hurdle of developing an intuition for the process. I think you would be better off with what you ask for later in your post, an example. Your old dragon book shows an example of manually doing the tail-recursion elimination on a parsing procedure for a grammar like your example (or rather, like your example once debugged). However, it is shown in a context where no syntax tree is being built. Later, the dragon book shows how to use an L-attributed definition to build the syntax tree. Putting these two examples togeher would solve your problem, but it isn't obvious to a newcomer how that is done. That, I think, is the crux of your problem, and I address it below. However, I should also note that depending on what language you are writing your compiler in, and in some cases what compiler you are using to compile your compiler, doing the tail-recursion elimination by hand may be pointless. The compiler that is compiling your compiler may do it for you. As an example, I took the two parsers shown in C in the red dragon book, one tail-recursive, one with a loop, and ran them both through the gcc compiler (with optimization turned on). I told gcc to produce assembly language output rather than binary object code, and I compared the two versions. The result: essentially no difference. In this case, where the parsing routine was written in C, the fact that tail-recursion elimination was performed automatically was a property of the specific C compiler used, gcc. But there are some other languages, such as Scheme, where tail-recursion elimination is guaranteed by the language specification, and so you can feel free to leave your parser in the tail-recursive form, knowing that it will still generate an iterative process. > > For instance the grammar for: > > expr -> term moreterms > > moreterms -> + term | - term | N5 I think you goofed on the productions for moreterms. The N5 is surely supposed to be an epsilon, but there is also a more fundamental problem: you missed the tail recursion. The first two productions for moreterms should presumably allow further terms at the end: moreterms -> + term moreterms | - term moreterms | epsilon > Is inherently recursive. How would I write iterative code for this using > my tree function makenode_op(operand, left, right)? > > An example in pseudo code would be most helpful to get me started. First, let's write this using tail recursion and not building up the syntax tree: expr(): term() moreterms() moreterms(): if lookahead = '+': advance lookahead term() moreterms() else if lookahead = '-': advance lookahead term() moreterms() else if lookahead can legally follow: return else: report a syntax error Note that in practice, the check for a legally following symbol is frequently omitted. Now we can take this in two different directions, which we later merge back together. One is to eliminate the tail-position calls to moreterms(), the other is to introduce the building of a syntax-tree using an L-attributed definition. You can read these in either order. To eliminate the tail calls, we convert the moreterms procedure into a loop: expr(): term() // initially fall into moreterms repeat until exited: // this loop is moreterms if lookahead = '+': advance lookahead term() // loop back for moreterms else if lookahead = '-': advance lookahead term() // loop back for moreterms else if lookahead can legally follow: return else: report a syntax error If instead we want to leave the tail calls intact, but build the syntax tree, we would introduce an inherited attribute (procedure parameter) for the left context and a synthesized attribute (procedure return value) for the resulting syntax tree. I've written the below pseudo-code rather ploddingly, with a lot of unnecessary assignment statements to give meaningful names to the intermediate results, in the hopes that it helps you see what is going on: expr(): firstTerm = term() return moreterms(firstTerm) moreterms(left): if lookahead = '+': advance lookahead right = term() newLeft = makenode_op('+', left, right) return moreterms(newLeft) else if lookahead = '-': advance lookahead right = term() newLeft = makenode_op('-', left, right) return moreterms(newLeft) else if lookahead can legally follow: return left else: report a syntax error Now, we can combine these two versions, following the overall looping form of the version that doesn't have tail calls, but keeping track of the left context. In place of the parameter "left", we'll now have a local variable by that name, which will get its succesive values by assignment statements rather than parameter passing: expr(): left = term() // initially fall into moreterms repeat until exited: // this loop is moreterms if lookahead = '+': advance lookahead right = term() newLeft = makenode_op('+', left, right) left = newLeft // loop back for moreterms else if lookahead = '-': advance lookahead right = term() newLeft = makenode_op('-', left, right) left = newLeft // loop back for moreterms else if lookahead can legally follow: return left else: report a syntax error Of course, I will re-emphasize that this is intentionally plodding. At a minimum, you would ordinarily eliminate the newLeft variable and instead directly assign the result from makenode_op to the left variable. > > Thanks in advance! > > /Mike You're welcome. -Max Hailperin Professor of Computer Science Gustavus Adolphus College |
|
#4
| |||
| |||
| Mike (News) schrieb: > The implementation is a predictive parser that construct a syntax tree > holding both expression and statements. As the grammar is recursive, > so is the implementation. However, i wish to find some shortcuts to > rewrite (at least some of) the recursive functions to be > iterative. Are there any well known algorithms of doing this? > For instance the grammar for: > > expr -> term moreterms > > moreterms -> + term | - term | N5 > > > Is inherently recursive. How would I write iterative code for this using > my tree function makenode_op(operand, left, right)? One method would "flatten" the operator hierarchy, to essentially only one unary and binary operator class, covering all operators: moreterms -> binop term | epsilon Then something like an internal list (usually stack) of operators and terms is produced by moreterms, that is "folded" into an tree before exit, based on the operator precedence, grouping etc. Otherwise indirect recursion may not be eliminated, by tail recursion or a conversion of distinct productions. DoDi |
|
#5
| |||
| |||
| Thanks Max for your highly enlightening answer. I really learned a lot from your explanation. The dragon book is quite abstract, and it's not that easy to see how things could fit together. > I told gcc to produce assembly language output rather > than binary object code, and I compared the two versions. The result: > essentially no difference. Interesting, never thought about that. I'm using GCC at this time, but the idea is to (one day) let the compiler compile itself so i can bootstrap to other platforms. I try to use a very basic subset of C, and want to see how far I can get. The compiler has distinctive front and back ends. The front end generates three address code. The back end (a separate executable) generates x86 assembly for the time being. Even though the grammar is _very_ limited at this time, all phases are implemented and actually works. The only optimization is evaluation of constant expressions and peephole optimization, so the produced code is not really in the GCC class The thing is that i only after a fewweeks, on spare (=night) time, actually managed to write a C-like compiler that works. What i learned from earlier attempts is to make ALL phases of the compiler from start, even though the language is extremely simple and the produced code is terrible. > I think you goofed on the productions for moreterms. The N5 is surely > supposed to be an epsilon... Yes, "N5" was epsilon when i mailed it... And yes, i goofed. > expr(): > left = term() > // initially fall into moreterms > repeat until exited: // this loop is moreterms > if lookahead = '+': > advance lookahead > right = term() > newLeft = makenode_op('+', left, right) > left = newLeft > // loop back for moreterms > else if lookahead = '-': > advance lookahead > right = term() > newLeft = makenode_op('-', left, right) > left = newLeft > // loop back for moreterms > else if lookahead can legally follow: > return left > else: > report a syntax error > The above pseudo code makes me a bit wondering though: "return left" would return a leaf, and not the 'root' of the expression, correct? I try to figure out how to return the root rather than the leaf, but no matter how i try, the result is a very cluttered code. So, if i understand the above code correctly (and made the right assumption on left vs root returned) then the recursive version has several advantages; readability, speed and code size. The only drawback is that the stack needs to be pretty big and/or some mechanism for tracking the stack to avoid stack overflow (and a certain crash) must be implemented. What is the most common solution in C compilers and similar? Again, thanks a lot! /Mike |
|
#6
| |||
| |||
| Mikael StrC6m <mikael@sesamgames.com> writes: ..... > The above pseudo code makes me a bit wondering though: "return left" > would return a leaf, and not the 'root' of the expression, correct? No, incorrect. Take a look at what statements in the code assign a value to the variable "left". To know what is returned by "return left", you need to look at what the most recent assignment to that variable is, at the time of the return statement. Except in the case when the expression is a single term (with no '+' or '-' following it), the most recent assignment will always be from these two lines, or the corresponding ones for '-': newLeft = makenode_op('+', left, right) left = newLeft So, the value being returned will be one of the nodes produced by makenode_op, refuting your prediction that it would be a leaf. To see more specifically that it is the root, consider an expression like a-b-c where a, b, and c are terms. If for simplicity we assume that the term() procedure returns just the characters 'a', 'b', 'c' for these terms, then the values that the variable "left" will take on over the course of the expr() procedure's execution are 'a' makenode_op('-', 'a', 'b') makenode_op('-', makenode_op('-', 'a', 'b'), 'c') Notice that the first of these values for left shows up as the second parameter passed to makenode_op in constructing the second value for left. And similarly, the second value for left shows up as the second parameter passed to makenode_op in constructing the third value for left. This is because in each case, the new value of left is being computed as makenode_op('-', left, right). And the third value for left, the one that is returned, is in fact the root of the tree. > ... Again, thanks a lot! You're welcome. -max |
![]() |
| Thread Tools | |
| Display Modes | |
In an effort to better serve ads to our visitors, cookies are used on objectmix.com. For more information, check out our Privacy Policy.