#1
Hi! I've been writing my own scripting language for 6 months (with some small breaks). I wrote the lexer and the parser (similar to recursive descent, but extended; LL grammar), and now I'm writing the interpreter. The syntax is similar to C/C++ and the language is mostly influenced by PHP and Lua. Sample code:
Code:
// a comment
class Child extends Parent
{
    public x;
    protected y;
    private z;
    static public function Run()
    {
        /*
        other stuff
        */
        super.Run();
    }
}
t = [ 1, 2, 3 ]; // array, like array() in PHP
a =
[
    x = t,
    'y' = 15,
    'z' = 'to jest co6',
];
foreach( a as k,v ): petla
{
    if( is_array(v) )
        foreach( v as v2 )
            print(v2);
    else
        print(v);
}
for( i = 0; i < 5; ++i )
    print(i);
k = new Child a;
[...] difference will be that you can do
Code:
t.DoSth();
[...]
Code:
array_DoSth(t);
etc. Upgraded PHP? Something like that... Anyway, I'm having some difficulties. One small decision: should variables be declared?
Code:
var a = 5; // or
local a = 5; // like in Lua or UnrealScript, or
global a = 5; // declaration of a global variable in the given scope, not like referencing global variables in PHP, or
a = 5;
And the second problem: what kind of scope to implement. In PHP you have either global or local (function) scope. So writing:
Code:
if( variable )
{
    a = 5;
}
print(a);
gives the result '5'. Something like this would mean a compiler error in C++. But I'm willing to implement it as per-block (like in C/C++) scope. Does anyone see any pros/cons?

The interpreter is a simple AST-walking class, but when some problems are fixed I will replace it with a bytecode VM (like in Lua). And as for the VM itself... stack or register based?

That's all of my questions (doubts) so far. Thanks for your help. But if you're just going to write 'use (f)lex/yacc/whatever' or 'why another language, python/ruby/php/whatever is great', please don't :P I'm not even going to read it. Everyone else is invited to this discussion. Help me build my first scripting language!
#2
On Aug 26, 1:41 am, lican <lica...@gmail.com> wrote:
> Anyway I'm having some
> difficulties. One small decision, should variables should be declared:
>
> Code:
> var a = 5; // or
> local a = 5; // like in lua or unreal script, or
> global a = 5; // declaration of a global variable in given scope, not
> like referencing global variables in PHP, or
> a = 5;
>

IMO, dynamic languages need to require variable declarations. I still loathe the time PHP made me waste because I missed a typo in a variable name. That's something the compiler should catch. Or at least make unit testing as simple as it is in Ruby. I believe typos don't bite Ruby people so much because they discover them easily via unit tests. Also, Ruby complains if you access a name which it hasn't seen before. I can't remember what PHP does in that situation.

Regarding how to declare the variables: it depends on your needs. If you plan to offer proper class support, you need to be able to say which members are public, protected or private. There are two ways: either you put the modifier in front of each member (like Java/C#), or you put the modifier on a separate line and say that everything afterwards has this modifier (like C++/Ruby). It depends partly on the need for proper declarations and partly on the aesthetics, in other words, whether it looks ugly or not.

> And the second problem. What kind of scope to implement. In PHP you
> have either global or local (function) scope. So writing:
>
> Code:
> if( variable )
> {
> a = 5;
>
> }
>
> print(a);
>
> gives the result '5'. Something like this would mean a compiler error
> in C++. But I'm willing to implement it as per block (like in C/C++)
> scope. Anyone seeing any prons/cons?

Using block scopes has the advantage of reducing variable lifetime. If you only need the variable for a certain number of lines, then accessing it later should give an error, as it goes against the declared intent. On the other hand, separating declaration and assignment is a tad ugly (even if you don't use "var a", you have to put the symbol into the symbol table somehow). Also, you can shadow variables if declaration before use is required. If you don't like that, you can forbid total shadowing (as opposed to shadowing member variables, which can still be accessed via "this.a") like C# does (read its spec, as it is a bit more involved than I hinted at).

> The interpreter is a simple AST
> walking class, but when some problems are fixed I will replace it with
> a bytecode VM (like in Lua). And as for the VM itself... stack or
> register based?

IIRC, .NET uses a stack because it makes it easier to verify bytecode. But I haven't looked into VM design myself, so I can't say any more on this subject.

Johannes
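The block-scope behaviour described above (declaration required, inner blocks may shadow, names die when the block closes) can be sketched roughly as follows. This is only an illustration in C++, the language lican's interpreter is written in; `ScopeStack` and its methods are hypothetical names, values are limited to `int` to keep it short, and whether shadowing should be allowed is a design choice, not something the thread settles.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical block-scoped symbol table: one map per open block.
class ScopeStack {
public:
    ScopeStack() { Push(); }                  // the outermost (global) scope

    void Push() { scopes_.emplace_back(); }   // called when a '{' is entered
    void Pop() {                              // called when a '}' is left
        assert(scopes_.size() > 1 && "cannot pop the global scope");
        scopes_.pop_back();
    }

    // 'var name = value;' Returns false if the name is already declared
    // in the current block. Shadowing an outer block is allowed here.
    bool Declare(const std::string& name, int value) {
        return scopes_.back().emplace(name, value).second;
    }

    // 'name = value;' Returns false for an undeclared name (e.g. a typo).
    bool Assign(const std::string& name, int value) {
        for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
            auto found = it->find(name);
            if (found != it->end()) { found->second = value; return true; }
        }
        return false;
    }

    // Innermost visible binding, or nullptr if the name is not in scope.
    const int* Lookup(const std::string& name) const {
        for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
            auto found = it->find(name);
            if (found != it->end()) return &found->second;
        }
        return nullptr;
    }

private:
    std::vector<std::unordered_map<std::string, int>> scopes_;
};
```

With this layout, the `print(a)` after the `if` block fails at lookup when `a` was declared inside the block, and an assignment to a misspelled name is rejected instead of silently creating a new variable.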
#3
On 26 Aug, 01:41, lican <lica...@gmail.com> wrote:
> I've been writing my own scripting language for 6 months (with some
> small breaks). I wrote the lexer, parser (similar to recursive
> descent, but extended; LL grammar) and now I'm writing the
> interpreter. The syntax is similar to C/C++ and the language is mostly
> influenced by PHP and Lua. Sample code:

This is a bit similar to what I am playing around with, so let me give you my own very subjective opinions.

> foreach( a as k,v ): petla
> {
> if( is_array(v) )
> foreach( v as v2 )
> print(v2);
> else
> print(v);
>
> }

If you want a purer OO style, v.is_array() would make more sense than the functional is_array(v). Also consider "Child.new()" instead of "new Child()", since the former allows you to easily create class clusters (http://developer.apple.com/documentation/Cocoa/Conceptual/CocoaFundamentals/CocoaObjects/chapter_3_section_9.html).

> etc. Upgraded PHP? Something like that... Anyway I'm having some
> difficulties. One small decision, should variables should be declared:
>
> var a = 5; // or
> local a = 5; // like in lua or unreal script, or
> global a = 5; // declaration of a global variable in given scope, not
> like referencing global variables in PHP, or
> a = 5

All of these have advantages and drawbacks. Using something like "var" signals to the reader that the variable is actually created at this point, and it allows you to catch errors like:

var someVal = 5
if (someThing)
    someVar = 7 // Incorrect spelling of "someVal" immediately detected by compiler.

On the other hand, "a = 5" is very convenient and reduces the amount of text you have to both read and write. So it is a trade-off.

> And the second problem. What kind of scope to implement. In PHP you
> have either global or local (function) scope. So writing:
>
> if( variable )
> {
> a = 5;
> }
> print(a);

This would be more clear-cut if you had explicit variable declarations. In that case, the scope would be expected to be the block where the variable is declared. Without a declaration, function scope is more reasonable, for this reason:

// If enforcing block scope, we need to make a pseudo-declaration here:
a = 616; // dummy value to move "a" outside of the if-blocks.
if (variable) {
    a = 5;
} else {
    a = 6;
}

> walking class, but when some problems are fixed I will replace it with
> a bytecode VM (like in Lua). And as for the VM itself... stack or
> register based?

This paper argues that register-based VMs are better:
http://www.usenix.org/events/vee05/f...p153-yunhe.pdf

/Christoffer
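To make the stack-versus-register question concrete, here is a small sketch of how `a = b + c` might be encoded for each kind of VM. The instruction sets, struct names and slot numbering are all made up for illustration; they are not taken from Lua, .NET or the linked paper. The only point shown is the usual trade-off: the stack form needs more, smaller instructions, while the register form needs fewer instructions that each carry explicit operands.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stack machine: operands are implicit (top of the evaluation stack).
enum class SOp { LoadLocal, Add, StoreLocal };
struct SInstr { SOp op; int arg; };

// Register machine: every instruction names its operands explicitly.
enum class ROp { Add };
struct RInstr { ROp op; int dst, lhs, rhs; };

// Runs stack code over 'locals'; returns how many instructions were dispatched.
inline std::size_t RunStack(const std::vector<SInstr>& code, std::vector<int>& locals) {
    std::vector<int> stack;
    for (const SInstr& in : code) {
        switch (in.op) {
        case SOp::LoadLocal:
            stack.push_back(locals[in.arg]);
            break;
        case SOp::Add: {
            int rhs = stack.back(); stack.pop_back();
            int lhs = stack.back(); stack.pop_back();
            stack.push_back(lhs + rhs);
            break;
        }
        case SOp::StoreLocal:
            locals[in.arg] = stack.back();
            stack.pop_back();
            break;
        }
    }
    return code.size();
}

// Runs register code over 'regs'; returns how many instructions were dispatched.
inline std::size_t RunRegister(const std::vector<RInstr>& code, std::vector<int>& regs) {
    for (const RInstr& in : code) {
        switch (in.op) {
        case ROp::Add:
            regs[in.dst] = regs[in.lhs] + regs[in.rhs];
            break;
        }
    }
    return code.size();
}

// 'a = b + c' with a, b, c living in slots 0, 1, 2.
inline std::vector<SInstr> StackProgram() {
    return { {SOp::LoadLocal, 1}, {SOp::LoadLocal, 2}, {SOp::Add, 0}, {SOp::StoreLocal, 0} };
}
inline std::vector<RInstr> RegisterProgram() {
    return { {ROp::Add, 0, 1, 2} };
}
```

Here the stack version dispatches four instructions and the register version one, but each register instruction is wider, so counting instructions alone does not decide which design is faster overall.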
#4
Thanks. As Johannes said, it's rather a matter of taste whether someone wants to declare variables with or without a keyword. I'm also aware that, depending on the method of declaration, the scope matter will be rather straightforward. I think I'll go with the var keyword. As for class fields, a declaration like "public someVar;" would be sufficient, without "var public someVar;". This also kind of solves the scope problem: I chose the per-block type. I also forgot to write that I am in fact planning to do something like "a.is_array();" (almost pure OO). The same goes for strings and any other class:
Code:
s = "some text";
if( s.Length() < 5 )
    s.Replace('s','t');
if( s.Is(string) )
    // sth
[...] (don't remember where really) that there's no significant difference when it comes to bytecode verification. It's generally done by a separate (slower) bytecode reader - interpreter. Some time ago I read the paper you sent, Christoffer (a similar paper can also be found on the Lua page regarding their transition from a stack to a register VM). They claim that the register one is faster, so I'll go with that. I have some spare time now, so I'm willing to experiment.

The OO code is one of my priorities. I think that even simple types like int should have some class for, let's say, conversion (a = 5; a.ToFloat()) and such. It really simplifies some things, like a.ToFloat().Floor().ToString(), all done in one line. I know it's an extreme example, but I think you get my point. ToString(Floor((float)a)) doesn't look so good (or maybe it's also a matter of taste). To be honest, I had never really heard of class clusters, but I'll surely look into them.

Thanks for your help.
Mark
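For readers who want to see what the a.ToFloat().Floor().ToString() chain amounts to on the host side, here is a rough C++ sketch. `IntValue` and `FloatValue` are hypothetical wrapper types invented for this example, not part of Mark's interpreter; each method simply returns a new wrapped value so the next call can be chained onto it.

```cpp
#include <cassert>
#include <cmath>
#include <string>

// Hypothetical boxed float: every operation returns a new value to chain on.
struct FloatValue {
    float v;

    FloatValue Floor() const { return FloatValue{std::floor(v)}; }

    // std::to_string formats floating-point values like printf("%f").
    std::string ToString() const { return std::to_string(v); }
};

// Hypothetical boxed int with an explicit conversion method.
struct IntValue {
    int v;

    FloatValue ToFloat() const { return FloatValue{static_cast<float>(v)}; }
    std::string ToString() const { return std::to_string(v); }
};
```

Read left to right, `IntValue{5}.ToFloat().Floor().ToString()` performs the same three steps as the nested ToString(Floor((float)a)) form, only in the order they are executed.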
#5
On Aug 29, 5:41 pm, lican <lica...@gmail.com> wrote:
> Also forgot to write
> that I am in fact planning to do something like
> "a.is_array();" (almost pure OO). The same for strings and any other
> class:

If you create an array class (like in .NET), you can write a.Is(Array), too. That would be more consistent and can be extended to cover similar cases (like b.Is(Complex<T>) or b.Is(Complex<float>)).

....

> The OO code is one of my priorities. I think that even the simple
> types like int should have some class for let's say conversion (a = 5;
> a.ToFloat()) and such. It really simplifies some things like
> a.ToFloat().Floor().ToString() all done in one line I know it's an
> extreme example, but I think you get my point.

You could create a conversion operator "to" instead of using "ToClass" functions. Then pipelining would go like:

a to Float.Floor() to String

Hmm... The dot disturbs the aesthetics here, but the only way to get rid of it is to use a different syntax for method invocation. Then you would have to remove the dot in general or live with two equivalent ways to make calls. Or it could be that the syntax is just unfamiliar.

Anyway, the advantage of the "to" keyword would be that you wouldn't write "((MyObject) object).Calculate()" like in C#/Java, but could write "object to MyObject.Calculate()". You directly know which expression is converted and don't have to add extra parentheses just to get the priorities right. Also, if the expression is a bit longer, you don't have to keep the entire thing in your head at once or remember that the closing parenthesis still belongs to the conversion.

That being said, you haven't specified how you cast objects in your language, so I'm simply speculating.

Johannes
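One way a generic a.Is(Array) test could be backed in the host implementation is with runtime type descriptors. The sketch below is an assumption about how that might look, not a description of any existing interpreter: `TypeInfo`, `Object` and the `InstanceOf` helper are illustrative names, and parameterised types such as Complex<T> are left out entirely.

```cpp
#include <cassert>
#include <string>

// Hypothetical runtime type descriptor; 'parent' is null for the root type.
struct TypeInfo {
    std::string name;
    const TypeInfo* parent;
};

// Hypothetical script object that only carries a pointer to its type.
struct Object {
    const TypeInfo* type;

    // Exact type test: true only when the object's type is 'other' itself.
    bool Is(const TypeInfo& other) const { return type == &other; }

    // Subtype test: walks the parent chain until it finds 'other' or runs out.
    bool InstanceOf(const TypeInfo& other) const {
        for (const TypeInfo* t = type; t != nullptr; t = t->parent)
            if (t == &other) return true;
        return false;
    }
};
```

Because types are ordinary values here, the same `Is` call works for arrays, strings and user-defined classes without a dedicated is_xxx() function per type.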
#6
On Fri, 29 Aug 2008 08:41:41 -0700 (PDT), lican wrote:
> The OO code is one of my priorities. I think that even the simple
> types like int should have some class for let's say conversion (a = 5;
> a.ToFloat()) and such. It really simplifies some things like
> a.ToFloat().Floor().ToString() all done in one line I know it's an
> extreme example, but I think you get my point.
> ToString(Floor((float)a)) doesn't look so good (or maybe it's also a
> matter of taste). To be honest I never really heard of class clusters,
> but surely I'll look into it.

Prefix notation X.Y is merely sugar for Y(X); it is not necessarily related to classes.

The problem with ToFloat etc. is that it is irregular: you have to define, or not define, the conversions between all possible pairs of types. How are you going to do this? In the presence of user-defined types?

In a language with an elaborate type system, Integer and Float would have a subtyping relation, making explicit conversions unnecessary. For instance, if Integer were a subtype of Float, then it could inherit contravariant Floor from Float:

Floor : Integer -> Float (contravariant in the result)

(a bad example, because Floor on integers is an identity function)

Conversion to string is also not that shiny. Actually, from the OO standpoint, it is rather an operation defined on the class Serializable, to which interesting types like Integer belong. This operation should deal with some object from the class Persistent, of which the String type is a member, TCP_Stream is another, XML_File is yet another, etc. Beware that this option would require double dispatch.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
#7
On Sun, Aug 31, 2008 at 4:26 PM, Dmitry A. Kazakov <mailbox@dmitry-kazakov.de> wrote:
> In a language with an elaborated types system Integer and Float would have
> subtyping relation making explicit conversions unnecessary, for instance
> when Integer were a subtype of Float, then it could inherit contravariant
> Floor from Float:
>
> Floor : Integer -> Float (contravariant in the result)

If Integer is a subtype of Float, then how would you deal with the representation of floating-point numbers?

If you use the hardware-supported 32-bit representation of floats, then there will be a problem with precision. Some Int values cannot be precisely represented as floats.

If you use your own representation of Floats, then you will have inefficient floating-point ops.

Regards,
Aleksey
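The precision point is easy to check. The following sketch assumes that `float` is the IEEE 754 32-bit format (24 significant bits), which is the case on common hardware; `FitsInFloat` is a helper name made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// True if 'i' survives a round trip through a 32-bit float unchanged.
// Assumes 'float' is IEEE 754 binary32, i.e. a 24-bit significand.
inline bool FitsInFloat(std::int32_t i) {
    float f = static_cast<float>(i);
    // Compare in 64 bits so that a float rounded up to 2^31 still fits.
    return static_cast<std::int64_t>(f) == static_cast<std::int64_t>(i);
}
```

Every integer up to 2^24 = 16777216 round-trips exactly, but 16777217 does not, because it needs 25 significant bits. A 32-bit Integer that silently inherits Float behaviour would therefore lose information for such values.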
#8
On Sun, 31 Aug 2008 21:04:15 +0700, Aleksey Demakov wrote:
> On Sun, Aug 31, 2008 at 4:26 PM, Dmitry A. Kazakov
> <mailbox@dmitry-kazakov.de> wrote:
>> In a language with an elaborated types system Integer and Float would have
>> subtyping relation making explicit conversions unnecessary, for instance
>> when Integer were a subtype of Float, then it could inherit contravariant
>> Floor from Float:
>>
>> Floor : Integer -> Float (contravariant in the result)
>
> If Integer is a subtype of Float then how would you deal with the
> representation of floating point numbers?

Subtypes are not required to share representations of their values.

> If you use hardware-supported 32-bit representation of floats then
> there will be a problem with precision. Some Int values cannot not be
> precisely represented as floats.

That is up to the inherited operations. Basically, if Integer inherits anything from Float, it also inherits the property of Float being an interval of [real] numbers, with the consequences thereof. If Integer can do this operation better, then it should override it. The third alternative is adding ideal values to the class in the form of NaN, or else propagating an exception.

> If you use your own representation of Floats then you will have
> inefficient floating point ops.

No, the operations defined on the common class may have distinct implementations for different types (from the class). Only inherited operations composed with an implicit conversion of the representation will be slower. But that is exactly what the OP wished to do, using explicit conversions instead...

My point was that explicit conversions are usually bad. They suggest either some subtyping relation (which has to be articulated), or else a manifestation of some design problem.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
#9
Yeah, I thought about it. You're right. ToFloat is not scalable. Maybe something like To(Float), To(Type)? It's something between my .ToType() and the 'to' operator proposed by Johannes. Every solution is better than the ugly ((SomeClass)var).SomeMethod()

The preference is to use as few (key)word operators as possible. I'm also thinking about changing new Class to Class.New() or Class.Create(). It would create a rather consistent interface with methods like object.Clone() and maybe object.Destroy(). Also, the general idea is that all objects inherit some general methods from a base object called 'Object' (like in Java and C#). The methods can be overridden depending on the type:

- bool Is( Type )
- bool Instance( Type ) or Of( Type ) or InstanceOf( Type )
- Object To( Type )
- String Serialize();
- bool Unserialize();
- Object Clone();
- void Destroy();

As for the int and float representation... the Value class takes care of that stuff. It's written in C++ and goes something like this:
Code:
class Value
{
public:
    Value();
    Value( const Value& value );
    Value& operator =( const Value& value );
    void SetNull();
    void SetBool( bool b );
    void SetInt( int i );
    void SetFloat( float f );
    void SetString( String* s );
    ...................................
public:
    int type; // NULL, BOOL, INT, FLOAT, STRING, ARRAY, REF, OBJECT, FUNC, etc.
    union
    {
        bool b;
        int i;
        float f;
    };
    Object* o; // everything else
};
It's rather simple, but it works. Most scripting VMs work that way.

As for the Serialize and To(String) methods, I find them distinct. E.g. if someone wants to display a float to the user, they do var.To(String) and get '1234.0987'. But if someone wants to write the data to a file, Serialize would return 'f:1234.0987' or 'float: 1234.0987'. The thing is, I think type:value can be parsed more easily than just the value.
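The difference between a display string and a type-tagged serialized form can be sketched like this. It is a cut-down stand-in for the Value class above, limited to int and float; `Tagged`, `MakeInt`, `MakeFloat`, `ToDisplayString` and the one-letter "i:" / "f:" prefixes are assumptions chosen for the example, loosely following the 'f:1234.0987' form mentioned in the post.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

enum class Tag { Int, Float };

// Hypothetical two-type stand-in for a tagged-union value.
struct Tagged {
    Tag tag;
    union { int i; float f; };
};

inline Tagged MakeInt(int i)     { Tagged v; v.tag = Tag::Int;   v.i = i; return v; }
inline Tagged MakeFloat(float f) { Tagged v; v.tag = Tag::Float; v.f = f; return v; }

// What the user sees: just the value, with no type information.
inline std::string ToDisplayString(const Tagged& v) {
    return v.tag == Tag::Int ? std::to_string(v.i) : std::to_string(v.f);
}

// What goes to a file: a one-letter type prefix, then the value.
inline std::string Serialize(const Tagged& v) {
    return std::string(v.tag == Tag::Int ? "i:" : "f:") + ToDisplayString(v);
}

// Returns false if the text does not start with a known "tag:" prefix.
inline bool Unserialize(const std::string& text, Tagged& out) {
    if (text.size() < 3 || text[1] != ':') return false;
    const std::string body = text.substr(2);
    if (text[0] == 'i') { out = MakeInt(std::atoi(body.c_str())); return true; }
    if (text[0] == 'f') { out = MakeFloat(static_cast<float>(std::atof(body.c_str()))); return true; }
    return false;
}
```

The prefix lets the reader choose the right parser without guessing from the digits, which is the "type:value is easier to parse" argument in practice. A real format would also need to validate the body and escape strings.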
#10
On Sun, 31 Aug 2008 10:05:00 -0700 (PDT), lican wrote:
> Yeah, I thought about it. You're right. ToFloat is not scalable. Maybe
> something like: To(Float), To(Type)? It's something between
> my .ToType() and the 'to' operator proposed Johannes. Every solution
> is better than the ugly (SomeClass)var).SomeMethod()

Well, C/C++ has problems it created all by itself. If you want to have the type name involved, you do not need to twist the language. Make it straight:

Method (Float (var))

or/and an equivalent postfix sugar

var.Float.Method

You could put "to" around (Float) (or in postfix sugar: Float.to); semantically, all this changes absolutely nothing. The problem of conversions is of a semantic nature.

> The preference is to use as few (key)word operators as possible.

The conversion operation is an operation like any other. It does not require any special syntax or keywords. BUT this normality implies double dispatch, unless conversions have to be explicitly defined by the user. The latter is a lot easier.

> I'm also thinking
> about changing new Class to Class.New() or Class.Create();

The type of the object should be specified in its declaration. Otherwise, a necessity to specify the type indicates immaturity of the language (when statically typed). It must be clear from the context what type it is.

> It would
> create a rather consistent interface with methods like object.Clone()
> and maybe object.Destroy(). Also the general idea is that all objects
> inherit some general methods from the base object called
> 'Object' (like Java and C#). The methods can be overridden depending
> on the type:
>
> - bool Is( Type )
> - bool Instance( Type ) or Of( Type ) or InstanceOf( Type )

When a type is a first-class object, you can get it from an object and then define the necessary membership tests on the type's type. Note that in this case the model of a common base must break somewhere.

Anyway, for type comparisons, equality is not enough (Is is an equality). Types form a tree or maybe a more general graph. You need operations between types and sets of types (classes). For example, in order to test whether an object X has a type A such that A is a descendant of B.

> - Object To( Type )

This is equivalent to double dispatch. It is hard to bite...

> - String Serialize();
> - bool Unserialize();

The reverse of Serialize is an abstract factory.

> - Object Clone();

Not this. An object can be non-copyable: a clock or a hardware port, for instance.

> - void Destroy();

This is a difficult issue. A destructor (like a constructor) is not a method. It must be prevented from being called explicitly.

> As for the int and float representation... the Value class takes care
> of that stuff. It's written in C++ and goes something like this:
> [...]
>
> It's rather simple, but it works. Most scripting VM work that way.

This is representation sharing (which in your case is a union). This is IMO a bad idea, because it is inefficient (distributed overhead). Further, it makes it impossible to control the representation when that is necessary (low-level I/O, hardware support, communication protocol implementation, etc.).

> As for the Serialize and To(String) methods, I find them distinct.
> I.e. someone wants to display a float to the user, they do
> var.To(Float) and get '1234.0987'. But if someone wants to write the
> data to a file Serialize would return 'f:1234.0987' or 'float:
> 1234.0987'. The thing is I think the type:value can be parsed more
> easily than just value.

This is why it needs to be doubly dispatching. The dispatch goes along two axes: the hierarchy of the source types and the hierarchy of the types of the medium. If the target is Human_Readable_Left_To_Right_String, then Serialize spits out 1234.0987 or maybe "about a thousand" (:-)). When the target is GTK_Cell_Renderer, then it does something quite different.

BTW, putting the type into the output is another issue. I won't go into it, because this is already too close to off-topic.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
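The two-axis dispatch described in this post can be approximated in C++ with the classic visitor-style double dispatch. This is a sketch of the general technique, not of any design proposed in the thread: `Source`, `Medium`, `HumanReadable` and `TypedFile` are hypothetical names, and the output formats are invented for illustration.

```cpp
#include <cassert>
#include <string>

struct IntSource;
struct FloatSource;

// Second axis: the medium decides how each concrete source type is written.
struct Medium {
    virtual ~Medium() = default;
    virtual std::string Write(const IntSource& s) const = 0;
    virtual std::string Write(const FloatSource& s) const = 0;
};

// First axis: the source's dynamic type selects the matching Write overload.
struct Source {
    virtual ~Source() = default;
    virtual std::string SerializeTo(const Medium& m) const = 0;
};

struct IntSource : Source {
    int v;
    explicit IntSource(int x) : v(x) {}
    std::string SerializeTo(const Medium& m) const override { return m.Write(*this); }
};

struct FloatSource : Source {
    float v;
    explicit FloatSource(float x) : v(x) {}
    std::string SerializeTo(const Medium& m) const override { return m.Write(*this); }
};

// Medium for showing values to a person: the value only.
struct HumanReadable : Medium {
    std::string Write(const IntSource& s) const override { return std::to_string(s.v); }
    std::string Write(const FloatSource& s) const override { return std::to_string(s.v); }
};

// Medium for storage: the value preceded by a type prefix.
struct TypedFile : Medium {
    std::string Write(const IntSource& s) const override { return "i:" + std::to_string(s.v); }
    std::string Write(const FloatSource& s) const override { return "f:" + std::to_string(s.v); }
};
```

Calling `SerializeTo` through base-class references picks the implementation from both the source type and the medium type. The cost is that adding a new source type means touching every medium, which is one reason explicitly user-defined conversions are "a lot easier".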