My scripting language - any suggestions?

  #1  
Old 08-25-2008, 07:41 PM
lican

Hi!

I've been writing my own scripting language for 6 months (with some
small breaks). I wrote the lexer, parser (similar to recursive
descent, but extended; LL grammar) and now I'm writing the
interpreter. The syntax is similar to C/C++ and the language is mostly
influenced by PHP and Lua. Sample code:

Code:
// a comment

class Child extends Parent
{
    public x;
    protected y;
    private z;

    static public function Run()
    {
        /*
        other stuff
        */
        super.Run();
    }
}

t = [ 1, 2, 3 ]; // array, like array() in PHP
a =
[
    x = t,
    'y' = 15,
    'z' = 'to jest co6',
];

foreach( a as k,v ): petla
{
    if( is_array(v) )
        foreach( v as v2 )
            print(v2);
    else
        print(v);
}

for( i = 0; i < 5; ++i )
    print(i);

k = new Child a;
Yeah. It's just like PHP, but without the '$'. One significant
difference will be that you can do

Code:
t.DoSth();
instead of

Code:
array_DoSth(t);
etc. Upgraded PHP? Something like that... Anyway, I'm having some
difficulties. One small decision: how should variables be declared?

Code:
var a = 5; // or
local a = 5; // like in Lua or UnrealScript, or
global a = 5; // declaration of a global variable in the given scope,
              // not like referencing global variables in PHP, or
a = 5;
And the second problem: what kind of scope to implement. In PHP you
have either global or local (function) scope. So writing:

Code:
if( variable )
{
    a = 5;
}

print(a);
gives the result '5'. Something like this would be a compiler error
in C++. But I'm willing to implement per-block scope (like in C/C++)
instead. Does anyone see any pros/cons? The interpreter is a simple
AST-walking class, but once some problems are fixed I will replace it with
a bytecode VM (like in Lua). And as for the VM itself... stack or
register based?

That's all of my questions (doubts) so far. Thanks for your help. But
if you're just gonna write 'use (f)lex/yacc/whatever' or 'why another
language, python/ruby/php/whatever is great' please don't :P I'm not
even going to read it. Everyone else is invited to this discussion.
Help me build my first scripting language!
  #2  
Old 08-27-2008, 06:38 AM
Johannes

On Aug 26, 1:41 am, lican <lica...@gmail.com> wrote:
> Anyway I'm having some
> difficulties. One small decision, should variables should be declared:
>
> var a = 5; // or
> local a = 5; // like in lua or unreal script, or
> global a = 5; // declaration of a global variable in given scope, not
> like referencing global variables in PHP, or
> a = 5;


IMO, dynamic languages need to require variable declarations. I still
loathe the time PHP wasted for me because I missed a typo in a
variable name. That's something the compiler should catch. Or at least
make unit testing simple, as Ruby does. I believe typos don't bite
Ruby people so much because they discover them easily via unit
tests. Also, Ruby complains if you access a name it hasn't
seen before. I can't remember what PHP does in that situation.

Regarding how to declare the variables: it depends on your
needs. If you plan to offer proper class support, you need to be able
to say which members are public, protected or private. There are two
ways: either you put the modifier in front of each member (like
Java/C#) or you put the modifier on a separate line and say that
everything afterwards has this modifier (like C++/Ruby). It depends
partly on the need for proper declarations and partly on
aesthetics - in other words, whether it looks ugly or not.

> And the second problem. What kind of scope to implement. In PHP you
> have either global or local (function) scope. So writing:
>
> if( variable )
> {
>     a = 5;
> }
>
> print(a);
>
> gives the result '5'. Something like this would mean a compiler error
> in C++. But I'm willing to implement it as per block (like in C/C++)
> scope. Anyone seeing any prons/cons?


Using block scopes has the advantage of reducing variable lifetime. If
you only need the variable for a certain number of lines, then accessing
it later should give an error, as it goes against declared intent. On
the other hand, separating declaration and assignment is a tad ugly
(even if you don't use "var a", you have to put the symbol into the
symbol table somehow). Also, you can shadow variables if declaration
before use is required. If you don't like that, you can prevent
any total shadowing (unlike shadowing member variables, which can still
be accessed via "this.a") like C# does (read its spec, as it is a bit
more involved than I hinted at).
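
To make the lifetime and shadowing points concrete, here is a minimal C++ sketch of per-block scoping as a chain of symbol tables. The `Scope` class and its `declare`/`lookup` methods are hypothetical names made up for this illustration, not part of the language being discussed, and values are plain ints to keep it short.

```cpp
#include <map>
#include <string>

// Hypothetical per-block symbol table; each block points at its enclosing one.
class Scope {
public:
    explicit Scope(const Scope* parent = nullptr) : parent_(parent) {}

    // "var name = value": create the variable in this block only.
    // Returns false if the name is already declared in this same block.
    bool declare(const std::string& name, int value) {
        return vars_.emplace(name, value).second;
    }

    // Search this block first, then the enclosing blocks.
    // Returns nullptr for an undeclared name (an interpreter would report an error).
    const int* lookup(const std::string& name) const {
        for (const Scope* s = this; s != nullptr; s = s->parent_) {
            auto it = s->vars_.find(name);
            if (it != s->vars_.end()) return &it->second;
        }
        return nullptr;
    }

private:
    const Scope* parent_;
    std::map<std::string, int> vars_;
};
```

An inner `Scope` can declare its own `a` that shadows the outer one, and once the inner scope object is gone, a name declared only there can no longer be found from the outer scope.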

> The interpreter is a simple AST
> walking class, but when some problems are fixed I will replace it with
> a bytecode VM (like in Lua). And as for the VM itself... stack or
> register based?


IIRC, .NET uses a stack because it makes it easier to verify bytecode.
But I haven't looked into VM design myself, so I can't say any more on
this subject.

Johannes
  #3  
Old 08-27-2008, 11:52 AM
Christoffer Lernö

On 26 Aug, 01:41, lican <lica...@gmail.com> wrote:
> I've been writing my own scripting language for 6 months (with some
> small breaks). I wrote the lexer, parser (similar to recursive
> descent, but extended; LL grammar) and now I'm writing the
> interpreter. The syntax is similar to C/C++ and the language is mostly
> influenced by PHP and Lua. Sample code:


This is a bit similar to what I am playing around with, so let me give
you my own very subjective opinions.


> foreach( a as k,v ): petla
> {
> if( is_array(v) )
> foreach( v as v2 )
> print(v2);
> else
> print(v);
>
> }


If you want purer OO, v.is_array() would make more sense
than the functional is_array(v).

Also consider "Child.new()" instead of "new Child()", since the former
allows you to easily create class clusters
(http://developer.apple.com/documentation/Cocoa/Conceptual/CocoaFundamentals/CocoaObjects/chapter_3_section_9.html)
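
For readers who have not met class clusters: the idea is that a class-level factory method may hand back an instance of a hidden subclass, which a plain "new Child()" expression cannot do. A rough C++ sketch follows; `Number`, `IntNumber`, `FloatNumber` and `create` are invented names for illustration, and the subclasses carry no data to keep it short.

```cpp
#include <cmath>
#include <memory>
#include <string>

// Public abstract face of the cluster; callers only ever see Number.
class Number {
public:
    virtual ~Number() {}
    virtual std::string kind() const = 0;

    // Plays the role of "Number.new(x)": the factory picks the concrete class.
    static std::unique_ptr<Number> create(double x);
};

// Concrete subclasses that callers never name directly (hypothetical).
class IntNumber : public Number {
public:
    std::string kind() const override { return "int"; }
};

class FloatNumber : public Number {
public:
    std::string kind() const override { return "float"; }
};

std::unique_ptr<Number> Number::create(double x) {
    // Whole values get the integer representation, everything else the float one.
    if (std::floor(x) == x) return std::unique_ptr<Number>(new IntNumber());
    return std::unique_ptr<Number>(new FloatNumber());
}
```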


> etc. Upgraded PHP? Something like that... Anyway I'm having some
> difficulties. One small decision, should variables should be declared:
>
> var a = 5; // or
> local a = 5; // like in lua or unreal script, or
> global a = 5; // declaration of a global variable in given scope, not
> like referencing global variables in PHP, or
> a = 5


All of these have advantages and drawbacks.
Using something like "var" signals to the reader that the variable is
actually created at this point and it allows you to catch errors like:

var someVal = 5
if (someThing) someVar = 7 // Incorrect spelling of "someVal",
                           // immediately detected by the compiler.

On the other hand "a = 5" is very convenient and reduces the amount of
text you both have to read and write. So it is a trade-off.
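
How a compiler might catch the someVar/someVal slip can be sketched in a few lines of C++. This assumes the front end sees every "var" declaration before the assignments that use it; `DeclarationChecker` and its methods are hypothetical names, not anything from an existing implementation.

```cpp
#include <set>
#include <string>

// Hypothetical checker: tracks names introduced with "var" and rejects
// plain assignments to names that were never declared.
class DeclarationChecker {
public:
    // Handles "var name = ...". Returns false on a duplicate declaration.
    bool declare(const std::string& name) {
        return declared_.insert(name).second;
    }

    // Handles "name = ...". Returns false when the name is unknown,
    // which is where a misspelling such as someVar/someVal is caught.
    bool assign(const std::string& name) const {
        return declared_.count(name) != 0;
    }

private:
    std::set<std::string> declared_;
};
```

With implicit declaration ("a = 5" creates the variable), `assign` would have to succeed for every name, so this check disappears. That is the price of the shorter syntax.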

> And the second problem. What kind of scope to implement. In PHP you
> have either global or local (function) scope. So writing:
>
> if( variable )
> {
> a = 5;
> }
> print(a);


This would be more clear-cut if variable declarations were
explicit. In that case, the variable's scope would be expected to be the
block where it is declared.
Without declarations, function scope is more reasonable. The
reason is this:

// If enforcing block scope, we need to make a pseudo-declaration here:
a = 616; // dummy value to move "a" outside of the if-blocks.
if (variable)
{
    a = 5;
}
else
{
    a = 6;
}

> walking class, but when some problems are fixed I will replace it with
> a bytecode VM (like in Lua). And as for the VM itself... stack or
> register based?


This paper argues that register based VMs are better:
http://www.usenix.org/events/vee05/f...p153-yunhe.pdf


/Christoffer
  #4  
Old 08-29-2008, 11:41 AM
lican

Thanks. As Johannes said, it's rather a matter of taste whether someone
wants to declare variables with or without a keyword. I'm also aware
that, depending on the method of declaration, the scope question becomes
rather straightforward. I think I'll go with the var keyword. As
for class fields, a declaration like "public someVar;" would be
sufficient, without "var public someVar;". This also kind of solves
the scope problem: I chose the per-block type. I also forgot to write
that I am in fact planning to do something like
"a.is_array();" (almost pure OO). The same goes for strings and any other
class:

Code:
s = "some text";
if( s.Length() < 5 )
    s.Replace('s','t');

if( s.Is(string) )
    // sth
etc. I believe it would look (and work) better. I read somewhere
(I don't really remember where) that there's no significant difference
when it comes to bytecode verification. It's generally done by a
separate (slower) bytecode reader - interpreter. Some time ago I read
the paper you sent, Christoffer (a similar paper can also be found on
the Lua page regarding their transition from a stack to a register VM).
They claim that the register one is faster, so I'll go with that. I
have some spare time now, so I'm willing to experiment.
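
The usual argument for register VMs is that they dispatch fewer instructions for the same work, at the cost of larger instructions. A toy C++ comparison for "a = b + c" shows the shape of it. Both instruction sets below are invented for this sketch and have nothing to do with Lua's or any real VM's encoding, and this illustrates instruction counts only, not measured speed.

```cpp
#include <cstddef>
#include <vector>

// Stack VM: operands are implicit, so "a = b + c" takes four instructions.
enum class StackOp { Load, Add, Store };
struct StackInstr { StackOp op; int slot; };  // slot is unused by Add

// Runs the program against the local-variable slots; returns the dispatch count.
std::size_t runStack(const std::vector<StackInstr>& code, std::vector<int>& locals) {
    std::vector<int> stack;
    std::size_t dispatched = 0;
    for (const StackInstr& in : code) {
        ++dispatched;
        switch (in.op) {
        case StackOp::Load:
            stack.push_back(locals[in.slot]);
            break;
        case StackOp::Add: {
            int rhs = stack.back(); stack.pop_back();
            int lhs = stack.back(); stack.pop_back();
            stack.push_back(lhs + rhs);
            break;
        }
        case StackOp::Store:
            locals[in.slot] = stack.back();
            stack.pop_back();
            break;
        }
    }
    return dispatched;
}

// Register VM: each instruction names its operands, so one instruction suffices.
struct RegInstr { int dst, lhs, rhs; };  // only "add" exists in this sketch

std::size_t runRegister(const std::vector<RegInstr>& code, std::vector<int>& regs) {
    std::size_t dispatched = 0;
    for (const RegInstr& in : code) {
        ++dispatched;
        regs[in.dst] = regs[in.lhs] + regs[in.rhs];
    }
    return dispatched;
}
```

With slot 0 as a, slot 1 as b and slot 2 as c, the stack program is Load 1, Load 2, Add, Store 0, while the register program is the single instruction {0, 1, 2}.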

OO code is one of my priorities. I think that even simple
types like int should have a class for, let's say, conversion (a = 5;
a.ToFloat()) and such. It really simplifies some things, like
a.ToFloat().Floor().ToString() all done in one line. I know it's an
extreme example, but I think you get my point.
ToString(Floor((float)a)) doesn't look so good (or maybe it's also a
matter of taste). To be honest, I had never really heard of class clusters,
but I'll surely look into them.

Thanks for your help.

Mark
  #5  
Old 08-30-2008, 06:53 AM
Johannes

On Aug 29, 5:41 pm, lican <lica...@gmail.com> wrote:
> Also forgot to write
> that I am in fact planning to do something like
> "a.is_array();" (almost pure OO). The same for strings and any other
> class:


If you create an array class (like in .NET), you can write
a.Is(Array), too. Would be more consistent and can be expanded to
cover similar cases (like b.Is(Complex<T>) or b.Is(Complex<float>)).
....
> The OO code is one of my priorities. I think that even the simple
> types like int should have some class for let's say conversion (a = 5;
> a.ToFloat()) and such. It really simplifies some things like
> a.ToFloat().Floor().ToString() all done in one line I know it's an
> extreme example, but I think you get my point.


You could create a conversion operator "to" instead of using "ToClass"
functions. Then pipelining would go like:

a to Float.Floor() to String

Hmm... The dot is disturbing the aesthetics here, but one could only get
rid of it by using a different way to write a method invocation. Then
you would have to remove the dot in general or live with two
equivalent ways to do calls. Or it could be that the syntax is just
unfamiliar. Anyway, the advantage of the "to" keyword would be that
you wouldn't write "((MyObject) object).Calculate()" like in C#/Java,
but could write "object to MyObject.Calculate()". You directly know
which expression is converted and don't have to add extra
parentheses just to get the priorities right. Also, if the expression
is a bit longer, you don't have to keep the entire thing in your head
or remember that the closing parenthesis still belongs to the conversion.
That being said, you haven't specified how you cast objects in your
language, so I'm simply speculating.

Johannes
  #6  
Old 08-31-2008, 05:26 AM
Dmitry A. Kazakov

On Fri, 29 Aug 2008 08:41:41 -0700 (PDT), lican wrote:

> The OO code is one of my priorities. I think that even the simple
> types like int should have some class for let's say conversion (a = 5;
> a.ToFloat()) and such. It really simplifies some things like
> a.ToFloat().Floor().ToString() all done in one line I know it's an
> extreme example, but I think you get my point.
> ToString(Floor((float)a)) doesn't look so good (or maybe it's also a
> matter of taste). To be honest I never really heard of class clusters,
> but surely I'll look into it.


Prefix notation X.Y is merely a sugar for Y(X), it is not necessarily
related to classes.

The problem with ToFloat etc. is that it is irregular: you have to decide
whether or not to define conversions between all possible pairs of types.
How are you going to do this in the presence of user-defined types?

In a language with an elaborate type system, Integer and Float would have
a subtyping relation, making explicit conversions unnecessary. For instance,
if Integer were a subtype of Float, then it could inherit contravariant
Floor from Float:

Floor : Integer -> Float (contravariant in the result)

(a bad example, because Floor on integers is an identity function)

Conversion to string is also not that shiny. Actually, from the OO
standpoint, it is rather an operation defined on the class Serializable, to
which interesting types like Integer belong. This operation should deal with
some object from the class Persistent, of which the String type is a member,
TCP_Stream is another, XML_File is yet another, etc. Beware that this
option would require double dispatch.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
  #7  
Old 08-31-2008, 10:04 AM
Aleksey Demakov

On Sun, Aug 31, 2008 at 4:26 PM, Dmitry A. Kazakov
<mailbox@dmitry-kazakov.de> wrote:
> In a language with an elaborated types system Integer and Float would have
> subtyping relation making explicit conversions unnecessary, for instance
> when Integer were a subtype of Float, then it could inherit contravariant
> Floor from Float:
>
> Floor : Integer -> Float (contravariant in the result)



If Integer is a subtype of Float then how would you deal with the
representation of floating point numbers?

If you use a hardware-supported 32-bit representation of floats then
there will be a problem with precision. Some Int values cannot be
precisely represented as floats.
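
A quick way to see the problem, assuming `float` is an IEEE 754 32-bit value with a 24-bit significand (true on common hardware, but an assumption here): every integer up to 2^24 = 16777216 converts exactly, while 16777217 does not. The helper name `survivesFloatRoundTrip` is made up for this sketch.

```cpp
// Returns true when converting n to a 32-bit float and back gives n again.
// Assumes float is IEEE 754 binary32; the cast back goes through long long
// so that values near INT_MAX cannot overflow on the return trip.
bool survivesFloatRoundTrip(int n) {
    float f = static_cast<float>(n);
    return static_cast<long long>(f) == n;
}
```

Under that assumption, `survivesFloatRoundTrip(16777216)` holds and `survivesFloatRoundTrip(16777217)` does not.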

If you use your own representation of Floats then you will have
inefficient floating point ops.

Regards,
Aleksey

  #8  
Old 08-31-2008, 10:58 AM
Dmitry A. Kazakov

On Sun, 31 Aug 2008 21:04:15 +0700, Aleksey Demakov wrote:

> On Sun, Aug 31, 2008 at 4:26 PM, Dmitry A. Kazakov
> <mailbox@dmitry-kazakov.de> wrote:
>> In a language with an elaborated types system Integer and Float would have
>> subtyping relation making explicit conversions unnecessary, for instance
>> when Integer were a subtype of Float, then it could inherit contravariant
>> Floor from Float:
>>
>> Floor : Integer -> Float (contravariant in the result)

>
> If Integer is a subtype of Float then how would you deal with the
> representation of floating point numbers?


Subtypes are not required to share representations of their values.

> If you use hardware-supported 32-bit representation of floats then
> there will be a problem with precision. Some Int values cannot not be
> precisely represented as floats.


That is up to inherited operations. Basically, if Integer inherits
anything from Float, it also inherits the property of Float being an
interval of [real] numbers, with all the consequences. If Integer can
do this operation better, then it should override it. The third
alternative is adding ideal values to the class in the form of NaN, or
else propagating an exception.

> If you use your own representation of Floats then you will have
> inefficient floating point ops.


No, the operations defined on the common class may have distinct
implementations for different types (from the class).

Only inherited operations composed with an implicit conversion of the
representation will be slower. But that is exactly what OP wished to
do, using explicit conversions instead... My point was that explicit
conversions are usually bad. They suggest either some subtyping
relation (which has to be articulated), or else a manifestation of
some design problem.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
  #9  
Old 08-31-2008, 01:05 PM
lican

Yeah, I thought about it. You're right. ToFloat is not scalable. Maybe
something like To(Float), To(Type)? It's something between
my .ToType() and the 'to' operator proposed by Johannes. Any solution
is better than the ugly ((SomeClass)var).SomeMethod(). The preference
is to use as few (key)word operators as possible. I'm also thinking
about changing new Class to Class.New() or Class.Create(). It would
create a rather consistent interface with methods like object.Clone()
and maybe object.Destroy(). Also, the general idea is that all objects
inherit some general methods from the base object called
'Object' (like Java and C#). The methods can be overridden depending
on the type:

- bool Is( Type )
- bool Instance( Type ) or Of( Type ) or InstanceOf( Type )
- Object To( Type )
- String Serialize();
- bool Unserialize();
- Object Clone();
- void Destroy();

As for the int and float representation... the Value class takes care
of that stuff. It's written in C++ and goes something like this:

Code:
class Value
{
public:
    Value();
    Value( Value& value );

    Value& operator =( Value& value );

    void SetNull();
    void SetBool( bool b );
    void SetInt( int i );
    void SetFloat( float f );
    void SetString( String* s );
    ...................................

public:
    int type; // NULL, BOOL, INT, FLOAT, STRING, ARRAY, REF, OBJECT, FUNC, etc.
    union
    {
        bool b;
        int i;
        float f;
    };
    Object* o; // everything else
};
It's rather simple, but it works. Most scripting VMs work that way.
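
For anyone who wants to play with the idea, here is a cut-down, compilable variant of such a tagged union. It is only a sketch: the struct is deliberately renamed-free but simplified, the `AsFloat` accessor is an invented addition, and strings, arrays and objects are left out entirely.

```cpp
// Cut-down tagged union in the spirit of the Value class, for illustration only.
struct Value {
    enum Type { Null, Bool, Int, Float } type;
    union {
        bool b;
        int i;
        float f;
    };

    Value() : type(Null) { i = 0; }

    void SetInt(int v)     { type = Int;   i = v; }
    void SetFloat(float v) { type = Float; f = v; }

    // Read the payload as a float; the tag decides which union member is live.
    // Returns false for types that have no numeric meaning in this sketch.
    bool AsFloat(float& out) const {
        switch (type) {
        case Int:   out = static_cast<float>(i); return true;
        case Float: out = f;                     return true;
        default:    return false;
        }
    }
};
```

The point of checking `type` before touching the union is that only the member written last holds a meaningful value.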

As for the Serialize and To(String) methods, I find them distinct.
E.g. if someone wants to display a float to the user, they do
var.To(String) and get '1234.0987'. But if someone wants to write the
data to a file, Serialize would return 'f:1234.0987' or 'float:
1234.0987'. The thing is, I think type:value can be parsed more
easily than just the value.
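
A small C++ sketch of the type:value idea, using an int payload to sidestep float formatting. The function names `serializeInt` and `splitTagged` and the "i" tag are assumptions made for this example; only the 'f:1234.0987' shape comes from the description above.

```cpp
#include <string>

// Produces "i:<digits>", following the 'f:1234.0987' idea with an int payload.
std::string serializeInt(int v) {
    return "i:" + std::to_string(v);
}

// Splits "tag:payload" at the first colon. Returns false if there is no colon,
// i.e. when the text carries no type information at all.
bool splitTagged(const std::string& text, std::string& tag, std::string& payload) {
    std::string::size_type colon = text.find(':');
    if (colon == std::string::npos) return false;
    tag = text.substr(0, colon);
    payload = text.substr(colon + 1);
    return true;
}
```

The reader can look at the tag first and then pick the right parser for the payload, instead of guessing the type from the digits.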
  #10  
Old 09-01-2008, 05:52 AM
Dmitry A. Kazakov

On Sun, 31 Aug 2008 10:05:00 -0700 (PDT), lican wrote:

> Yeah, I thought about it. You're right. ToFloat is not scalable. Maybe
> something like: To(Float), To(Type)? It's something between
> my .ToType() and the 'to' operator proposed Johannes. Every solution
> is better than the ugly (SomeClass)var).SomeMethod()


Well, C/C++ has problems it created all by itself. If you want to have the
type name involved, you do not need to twist the language. Make it
straight:

Method (Float (var))

or/and an equivalent postfix sugar

var.Float.Method

You could put "to" around (Float) (or in postfix sugar: Float.to), all this
semantically changes absolutely nothing. The problem of conversions is of
semantic nature.

> The preference is to use as few (key)word operators as possible.


The conversion operation is an operation as any other. It does not require
any special syntax and keywords. BUT this normality implies double
dispatch, unless conversions have to be explicitly defined by the user. The
latter is a lot easier.
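
The easier route, conversions explicitly defined by the user, could be sketched as a table keyed on the (source, target) pair. This is a table lookup standing in for double dispatch, not double dispatch proper. `ConversionTable`, `define` and `convert` are hypothetical names, type names are plain strings, and values are doubles purely to keep the example short.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical registry of user-defined conversions, keyed by (source, target)
// type names. Values are doubles here only to keep the sketch short.
class ConversionTable {
public:
    typedef std::function<double(double)> Converter;

    void define(const std::string& from, const std::string& to, Converter fn) {
        table_[std::make_pair(from, to)] = fn;
    }

    // Looks up the pair; returns false when nobody defined that conversion.
    bool convert(const std::string& from, const std::string& to,
                 double in, double& out) const {
        auto it = table_.find(std::make_pair(from, to));
        if (it == table_.end()) return false;
        out = it->second(in);
        return true;
    }

private:
    std::map<std::pair<std::string, std::string>, Converter> table_;
};
```

Note that defining Celsius to Fahrenheit says nothing about the reverse direction; each pair has to be supplied separately, which is the irregularity being pointed out.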

> I'm also thinking
> about changing new Class to Class.New() or Class.Create();


The type of the object should be specified in its declaration. Otherwise, a
necessity to specify the type indicates immaturity of the language (when
statically typed). It must be clear from the context what the type
is.

> It would
> create a rather consistent interface with methods like object.Clone()
> and maybe object.Destroy(). Also the general idea is that all objects
> inherit some general methods from the base object called
> 'Object' (like Java and C#). The methods can be overridden depending
> on the type:
>
> - bool Is( Type )
> - bool Instance( Type ) or Of( Type ) or InstanceOf( Type )


When type is a first-class object then you can get it from an object and
then define necessary membership tests on the type's type. Note that in
this case the model of common base shall somewhere break.

Anyway, for type comparisons, equality is not enough (Is is an equality test).
Types form a tree or maybe a more general graph. You need operations
between types and sets of types (classes). For example, in order to test
whether an object X has a type A such that A is a descendant of B.
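
The difference between the two tests is easy to show in C++ for the simple tree case (single inheritance). `TypeInfo` and both function names are invented for this sketch; a more general type graph would need a different walk.

```cpp
#include <string>

// Hypothetical runtime type descriptor with a single parent (a tree, not a graph).
struct TypeInfo {
    std::string name;
    const TypeInfo* parent;  // nullptr for the root type
};

// "Is": exact type equality.
bool isExactly(const TypeInfo* t, const TypeInfo* target) {
    return t == target;
}

// Membership in a class: t is target itself or one of target's descendants.
bool isDescendantOrSame(const TypeInfo* t, const TypeInfo* target) {
    for (; t != nullptr; t = t->parent) {
        if (t == target) return true;
    }
    return false;
}
```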

> - Object To( Type )


This is equivalent to double dispatch. It is hard to bite...

> - String Serialize();
> - bool Unserialize();


The reverse to Serialize is an abstract factory.

> - Object Clone();


Not this. An object can be non-copyable, a clock, a hardware port for
instance.

> - void Destroy();


This is a difficult issue. A destructor (like a constructor) is not a method.
It must be prevented from being called explicitly.

> As for the int and float representation... the Value class takes care
> of that stuff. It's written in C++ and goes something like this:
>

[...]
>
> It's rather simple, but it works. Most scripting VM work that way.


This is representation sharing (which in your case is a union). This is
IMO a bad idea, because it is inefficient (distributed overhead). Further,
it makes it impossible to control the representation when that is
necessary (low-level I/O, hardware support, communication protocol
implementation etc).

> As for the Serialize and To(String) methods, I find them distinct.
> I.e. someone wants to display a float to the user, they do
> var.To(Float) and get '1234.0987'. But if someone wants to write the
> data to a file Serialize would return 'f:1234.0987' or 'float:
> 1234.0987'. The thing is I think the type:value can be parsed more
> easily than just value.


This is why it needs to be doubly dispatching. The dispatch goes along two
axes: the hierarchy of source types and the hierarchy of the types of the
medium. If the target is Human_Readable_Left_To_Right_String, then
Serialize spits out 1234.0987 or maybe "about a thousand" (:-)). When the
target is GTK_Cell_Renderer, then it does very different stuff.

BTW, putting the type into the output is another issue. I don't go into it,
because this is already too close to off-topic.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de