| Register | FAQ | Calendar | Search | Today's Posts | Mark Forums Read |
|
#1
| |||
| |||
| I am presently involved in a compiler project in C++ that is mirroring the BASIC grammar extensively. The early stages of this project is more of a proof of concept more than diving too deeply into many extremities that will certainly prove very challenging as this project moves forward. So far I've developed a Lexer object that returns tokens to my Parser class. The parser class currently reads these tokens and creates a very simplistic AST tree. For this proof of concept, I am using a very basic source file that contains the following: PRINT 2+3*6 As I indicated above, we're using BASIC like grammar. The above example would simply print to the screen the value of "20" and end if it were to be compiled and executed. We've taken the Parser to the point where it is creating a very simplistic AST tree which would look something much like the following for our example: CASTFunctionCall +- CASTSimpleName +- Name: PRINT +- CASTExprList +- List Size: 1 +- CASTBinaryExpr +- Operation: '+' +- CASTIntegerLiteral +- Number: 2 +- CASTBinaryExpr +- Operation: * +- CASTIntegerLiteral +- Number: 3 +- CASTIntegerLiteral +- Number: 6 Naturally once the AST tree is created above, I need to use visitor patterns to do a number of passes. Since my example is extremely simple, I can see three passes required: - Pass #1 : Type Analysis - Pass #2 : Constant Folding - Pass #3 : Code Generation Am I off base here ? Where I am currently getting caught is in the first two phases. What is the ideal way to layer type information into the AST tree, followed by constant folding, followed by any other checks prior to compile time? Keep in mind of course the next step will be to include variable declarations and how to assign values to those variables and even working with very simple function/procedure calls (both code specific and external library calls to things like DLL functions). Chris |
|
#2
| |||
| |||
| CranCran77 schrieb: > simple, I can see three passes required: > > - Pass #1 : Type Analysis > - Pass #2 : Constant Folding > - Pass #3 : Code Generation > > Am I off base here ? > > Where I am currently getting caught is in the first two phases. What > is the ideal way to layer type information into the AST tree, followed > by constant folding, followed by any other checks prior to compile > time? The efforts may depend on the actual brand of BASIC you're targeting. When a variable can hold any type at any time, depending on the last assignment, or depending on the last declaration passed in control flow, a type analysis IMO will be quite fruitless. Otherwise, when the type of a variable is known at compile time, you have the usual choice of up-/downcasting to the next applicable type in expression evaluation. That's easy when numerical and textual (string) data cannot be mixed. Otherwise you'll have to specify what the result of e.g. 1+"1" shall be - it might be "11", 2, or some even more weird result. That's one of the common pitfalls also in Java :-( Even Microsoft couldn't find *valid* optimizations for Visual Basic, that would result in significantly faster execution of compiled vs. interpreted code. DoDi |
|
#3
| |||
| |||
| On Sun, 24 Aug 2008 01:11:27 +0200, Hans-Peter Diettrich <DrDiettrich1@aol.com> wrote: <snip> >Even Microsoft couldn't find *valid* optimizations for Visual Basic, >that would result in significantly faster execution of compiled vs. >interpreted code. I am not sure that it is possible for any implementation of Basic, considering the large number of run-time calls. Most of the optimizations in BCET are because of the lousy code generator. It was easier to do it that way. :-) -- ArarghMail808 at [drop the 'http://www.' from ->] http://www.arargh.com BCET Basic Compiler Page: http://www.arargh.com/basic/index.html |
|
#4
| |||
| |||
| On Aug 23, 6:11 pm, Hans-Peter Diettrich <DrDiettri...@aol.com> wrote: In the simple example I provided, I know my end result (without optimizations) would result in the following: PUSHI 2 PUSHI 3 PUSHI 6 IMUL IADD PRINTI Had the code been optimized by the compiler at compile time, the resulting code would have been: PUSHI 20 PRINTI But I'm not at the "optimization" stage just yet; that is still to come. The goal presently is to understand how I should handle type assignments to nodes within the AST tree. As you see in my example, I had several nodes that were CASTIntegerLiteral, but I question if that is the right approach or not. In my opinion, the parser which is creating the AST tree should not be responsible for determining the number type of Integer, Real, or Long. A simple CASTNumberLiteral node seems more appropriate to me and allow the type analysis check phase when traversing the tree determine whether its Integer, Real, or Long right? If my thought process is correct, then when my type check visitor traverses the nodes of the tree, certain AST nodes need to be decorated with type information, such as Integer, Real, or Long. When examining expressions such as binary ones, both sides to the operation need to be examined to make sure they both conform to the same type, and if not handle that appropriately. For example, "PRINT 2.5*3" would throw an exception with type mismatch because 3 would be evaluated to integer where-as 2.5 would be viewed as a real. A different method could be that it would turn the statement into "PRINT 2.5*3.0" so that both sides are viewed as real values for the computation to happen correctly. In this case during the binary expression type check, the right side of the operation would become the child of another another AST node that handles converting the result into a given type by an internal function call. So the tree would resemble the following: CASTFunctionCall +- CASTSimpleName +- Name: PRINT +- CASTExprList +- List Size: 1 +- CASTBinaryExpr +- Operation: '*' +- CASTNumberInteger +- Number: 2.5 +- CASTFunctionCall +- CASTSimpleName +- Name: INT2REAL +- CASTExprList +- List Size: 1 +- CASTNumberLiteral +- Number: 3 I suppose what I see is that I could use a class object called CASTType with derived classes for Integer, Long, Real, and String. Each derived class would contain information about the size of the type for storage purposes and/or reference pointer for strings in the constant pool. So each CASTNumberLiteral and CASTBinaryExpr objects would be assigned a CASTType object during the type check phase. Each CASTFunctionCall I also believe would get assigned a type as well in keeping with the assembly example above. Am I on the right track? > The efforts may depend on the actual brand of BASIC you're targeting. > When a variable can hold any type at any time, depending on the last > assignment, or depending on the last declaration passed in control > flow, a type analysis IMO will be quite fruitless. > > Otherwise, when the type of a variable is known at compile time, you > have the usual choice of up-/downcasting to the next applicable type in > expression evaluation. That's easy when numerical and textual (string) > data cannot be mixed. Otherwise you'll have to specify what the result > of e.g. 1+"1" shall be - it might be "11", 2, or some even more weird > result. That's one of the common pitfalls also in Java :-( |
|
#5
| |||
| |||
| "CranCran77" <crancran@gmail.com> wrote in message > On Aug 23, 6:11 pm, Hans-Peter Diettrich <DrDiettri...@aol.com> wrote: > > In the simple example I provided, I know my end result (without > optimizations) would result in the following: > > PUSHI 2 > PUSHI 3 > PUSHI 6 > IMUL > IADD > PRINTI > > Had the code been optimized by the compiler at compile time, the > resulting code would have been: > > PUSHI 20 > PRINTI > > But I'm not at the "optimization" stage just yet; that is still to > come. This 'optimisation' is actually fairly easy and is worth doing anyway. You need to do your type analysis first, then call something like Evaluate() on each node, which for a binary op: Evaluate(left operand) Evaluate(right operand) If left and right operands are now immediate values, do the calculation and convert this node to a terminal. > The goal presently is to understand how I should handle type > assignments to nodes within the AST tree. > > As you see in my example, I had several nodes that were > CASTIntegerLiteral, but I question if that is the right approach or > not. In my opinion, the parser which is creating the AST tree should > not be responsible for determining the number type of Integer, Real, > or Long. A simple CASTNumberLiteral node seems more appropriate to me > and allow the type analysis check phase when traversing the tree > determine whether its Integer, Real, or Long right? Depends on your implementation language, which may store all these differently (but all my parsers differentiate between integer and real literals). How would you know otherwise whether the value is integer or real? > If my thought process is correct, then when my type check visitor > traverses the nodes of the tree, certain AST nodes need to be > decorated with type information, such as Integer, Real, or Long. Is this Basic language dynamically typed with regard to numeric values? Because if you're using tagged variables that makes things awkward, unless you forget about integer and real and just have a Numeric type. > When > examining expressions such as binary ones, both sides to the operation > need to be examined to make sure they both conform to the same type, > and if not handle that appropriately. This can get tricky for complex expressions. For binary ops, traverse each operand subtree, leaving the required type open. Then choose the dominant type of the two, and insert a cast if needed to one operand (or generate a type error if required). This would also have been done at lower levels. But it means that in: 3.5*4 + 5*6 which ends up as real, the 5*6 term uses integer arithmetic, in this case not a problem, but you may prefer that all subops are treated as real. Perhaps then re-traverse the right-hand subtree and require the type to be real. > > For example, "PRINT 2.5*3" would throw an exception with type mismatch I don't think your users would be happy with that. > because 3 would be evaluated to integer where-as 2.5 would be viewed > as a real. A different method could be that it would turn the > statement into "PRINT 2.5*3.0" so that both sides are viewed as real > values for the computation to happen correctly. > > In this case during the binary expression type check, the right side > of the operation would become the child of another another AST node > that handles converting the result into a given type by an internal > function call. So the tree would resemble the following: > > CASTFunctionCall > +- CASTSimpleName > +- Name: PRINT > +- CASTExprList > +- List Size: 1 > +- CASTBinaryExpr > +- Operation: '*' > +- CASTNumberInteger > +- Number: 2.5 > +- CASTFunctionCall > +- CASTSimpleName > +- Name: INT2REAL > +- CASTExprList > +- List Size: 1 > +- CASTNumberLiteral > +- Number: 3 > > I suppose what I see is that I could use a class object called > CASTType with derived classes for Integer, Long, Real, and String. > Each derived class would contain information about the size of the > type for storage purposes and/or reference pointer for strings in the > constant pool. > > So each CASTNumberLiteral and CASTBinaryExpr objects would be assigned > a CASTType object during the type check phase. Each CASTFunctionCall > I also believe would get assigned a type as well in keeping with the > assembly example above. > > Am I on the right track? I put your PRINT 2.5*3 into one of my (rather ancient) compilers, and it produced this equivalent to ASTs: After parsing: 0005: Sysfn <> #: 1 [This is PRINT] 0005: Opc <> Mul 0005: Value <Real*8> [Val:Real*8] 2.500000 0005: Value <Int*1> [Val:Int*4] 3 After type analysis: 0005: Sysfn <> #: 1 0005: Opc <Real*8> Mul 0005: Value <Real*8> [Val:Real*8] 2.500000 0005: Value <Real*8> [Val:Real*8] 3.000000 After evaluating constants: 0005: Sysfn <> #: 1 0005: Value <Real*8> [Val:Real*8] 7.500000 No casts are used here, but using instead PRINT X*I (X is real, I is integer) it gives: After parsing: 0007: Sysfn <> #: 1 0007: Opc <> Mul 0007: Value <Real*8> [Val:Var](x 3C520AH) 0007: Value <Int*4> [Val:Var](i 3C5256H) After type analysis: 0007: Sysfn <> #: 1 0007: Opc <Real*8> Mul 0007: Mem <Real*8> [Val:Var](x 3C520AH) 0007: Softconv <Real*8> [ie. a Cast] 0007: Mem <Int*4> [Val:Var](i 3C5256H) -- Bartc |
|
#6
| |||
| |||
| CranCran77 schrieb: > The goal presently is to understand how I should handle type > assignments to nodes within the AST tree. > > As you see in my example, I had several nodes that were > CASTIntegerLiteral, but I question if that is the right approach or > not. In my opinion, the parser which is creating the AST tree should > not be responsible for determining the number type of Integer, Real, > or Long. A simple CASTNumberLiteral node seems more appropriate to me > and allow the type analysis check phase when traversing the tree > determine whether its Integer, Real, or Long right? I agree in so far, as the compiler can know better about the most appropriate representation of a numerical constant, in preparation of the compiled code for expressions. Even with known types (of variables), type conversions may be required in the evaluation of any expression. Then the "cost" of the conversion of an literal is zero, in contrast to the conversion of some predefined value. Likewise the cost of operations with all constant arguments is zero, because it can be done at compile time. Here a reordering of a whole (sub-)expression may result in smaller and faster code. Also have a look at Common Subexpression Elimination (CSE). But I'd not simply neglect the exact type of a literal, because sometimes a real constant of e.g. 2.0 can be a hint to the compiler, to use real arithmetic, where integer arithmetic could result in an unwanted overflow in some intermediate result. The same for explicit type casts, i.e. Long(2) should really result in Long arithmetic, even if Byte arithmetic would fit together with the other operands. > If my thought process is correct, then when my type check visitor > traverses the nodes of the tree, certain AST nodes need to be > decorated with type information, such as Integer, Real, or Long. When > examining expressions such as binary ones, both sides to the operation > need to be examined to make sure they both conform to the same type, > and if not handle that appropriately. Remember that every operator node will have an assigned result type, as determined from the involved operands, and that required dynamic type conversions may have to be inserted into the AST. I had to implement similar tree transformations in my C decompiler, in order to throw out type consersions and temporary variables, before emitting the "optimized" source code. > For example, "PRINT 2.5*3" would throw an exception with type mismatch > because 3 would be evaluated to integer where-as 2.5 would be viewed > as a real. A different method could be that it would turn the > statement into "PRINT 2.5*3.0" so that both sides are viewed as real > values for the computation to happen correctly. I'd not insist in too many type casts in source code, when the intention is clear. Only when e.g. the result of a division can differ, between an integer and a real DIV, or when an overflow shall be prevented by extending an value to a certain byte count, the according type casts should be required and respected in the source code. > I suppose what I see is that I could use a class object called > CASTType with derived classes for Integer, Long, Real, and String. > Each derived class would contain information about the size of the > type for storage purposes and/or reference pointer for strings in the > constant pool. I'd use as few node classes as possible, with properties indicating e.g. the kind of an unary or binary operator node, and the result type. How else would you transform an node "operator+" into "IntADD" or "RealADD" later? IMO all nodes with computational values should have a common base class, with possible subclasses for literals, variables, operators and functions. DoDi |
![]() |
| Thread Tools | |
| Display Modes | |
In an effort to better serve ads to our visitors, cookies are used on objectmix.com. For more information, check out our Privacy Policy.