Optimizing simple calls in a dynamically typed language?

This is a discussion on Optimizing simple calls in a dynamically typed language? within the Compilers forums in Theory and Concepts category; I am playing around with some ideas for a compiled dynamically typed language i.e. no VM. What bugs me is that in a dynamically typed language without primitives, code that looks like this in C: int j; .... <code setting the value of j> int i = j * 5 + 3; Would usually compile to something like 2 dynamic calls, first on j and then on the result of j * 5. While one can discuss the extra overhead of dynamic dispatch, in this example the C code doesn't even turn up a single function call, and because of ...

Go Back   Application Development Forum > Theory and Concepts > Compilers

Object Mix

Register FAQ Calendar Search Today's Posts Mark Forums Read
  #1  
Old 08-23-2008, 04:38 PM
Christoffer Lernö
Guest
 
Default Optimizing simple calls in a dynamically typed language?

I am playing around with some ideas for a compiled dynamically typed
language i.e. no VM.

What bugs me is that in a dynamically typed language without
primitives, code that looks like this in C:

int j;

.... <code setting the value of j>

int i = j * 5 + 3;

Would usually compile to something like 2 dynamic calls, first on j
and then on the result of j * 5.

While one can discuss the extra overhead of dynamic dispatch, in this
example the C code doesn't even turn up a single function call, and
because of the dynamic dispatch there is no hope to inline the calls.

The only idea that struck me as remotely feasible would be to inline
these types manually.

So my code

i = j * 5 + 3;

would compile to something like (ignoring conversions due to int
overflow):

Id temp = IS_INT(j) ? j * 5 : call(j, "mult", 5);
Id i = IS_INT(temp) ? temp + 3 : call(temp, "add", 3);

Where IS_INT is some macro that would do some bit-check to verify that
the object can be interpreted as an int.

Aside from being hideously ugly, I am not really sure that it is a
workable solution.

This obviously touches on the whole "how do you get a dynamic language
to run as fast as C" discussion. So, does anyone have any ideas or
pointers?
Reply With Quote
  #2  
Old 08-24-2008, 05:58 PM
oliverhunt@gmail.com
Guest
 
Default Re: Optimizing simple calls in a dynamically typed language?

> Would usually compile to something like 2 dynamic calls, first on j
> and then on the result of j * 5.
>
> While one can discuss the extra overhead of dynamic dispatch, in this
> example the C code doesn't even turn up a single function call, and
> because of the dynamic dispatch there is no hope to inline the calls.
>
> The only idea that struck me as remotely feasible would be to inline
> these types manually.
>
> So my code
>
> i = j * 5 + 3;
>
> would compile to something like (ignoring conversions due to int
> overflow):
>
> Id temp = IS_INT(j) ? j * 5 : call(j, "mult", 5);
> Id i = IS_INT(temp) ? temp + 3 : call(temp, "add", 3);


Codegen for dynamic languages is "fun" and what you suggest (with the
IS_INT type checks) is basically the way they are handled. I'm most
familiar with javascript, which allows for dynamically set type
conversion functions to handle toNumber for example, but the
"functions" *, /, etc are immutable.

But if you take an example like:

var z = x * y;

that is in principle

var z = getPrimitiveValue(x) * getPrimitiveValue(y);

Where getPrimitiveValue may end up calling the user defined function
valueOf() (which can throw, or return a non-primitive value *sigh*).

However to get around the immense cost of the series of virtual calls
that would other wise be needed most JS interpreters do have some kind
of fast logic to distinguish a few specific types, so you may end up
with code along the lines of (effectively):

double v1, v2;
if (fastToNumber(x, v1) && fastToNumber(y, v2)) z = v1 * v2;
else { slow path with virtual calls, exception checks, etc }

Additionally, there are other things you can do, for example the basic
arithmetic operators (*, /, -, bit operators) have guaranteed returns
types ("Number", + may return Number or String), which allows a degree
of static analysis in the expression tree.

Finally, a number of the JS interpreters will modify the execution of
a script as the script is being executed based on the types and
functions they see actually being hit, to try to specialise the order
of logic used to determine behaviour, etc.

--Oliver
Reply With Quote
  #3  
Old 08-24-2008, 07:21 PM
Gene
Guest
 
Default Re: Optimizing simple calls in a dynamically typed language?

On Aug 23, 4:38 pm, Christoffer Lernv <le...@dragonascendant.com>
wrote:
> I am playing around with some ideas for a compiled dynamically typed
> language i.e. no VM.
>
> What bugs me is that in a dynamically typed language without
> primitives, code that looks like this in C:
>
> int j;
>
> ... <code setting the value of j>
>
> int i = j * 5 + 3;
>
> Would usually compile to something like 2 dynamic calls, first on j
> and then on the result of j * 5.
>
> While one can discuss the extra overhead of dynamic dispatch, in this
> example the C code doesn't even turn up a single function call, and
> because of the dynamic dispatch there is no hope to inline the calls.
>
> The only idea that struck me as remotely feasible would be to inline
> these types manually.
>
> So my code
>
> i = j * 5 + 3;
>
> would compile to something like (ignoring conversions due to int
> overflow):
>
> Id temp = IS_INT(j) ? j * 5 : call(j, "mult", 5);
> Id i = IS_INT(temp) ? temp + 3 : call(temp, "add", 3);
>
> Where IS_INT is some macro that would do some bit-check to verify that
> the object can be interpreted as an int.
>
> Aside from being hideously ugly, I am not really sure that it is a
> workable solution.
>
> This obviously touches on the whole "how do you get a dynamic language
> to run as fast as C" discussion. So, does anyone have any ideas or
> pointers?


Naive dynamic language implementations carry type tags in objects or
pointers (or both) and spend a lot of time checking them before every
operation. That's what you're proposing here. Fancy implementations
use type inference and other rule-based schemes along with inlining to
eliminate checks wherever possible, sometimes tags as well. There's a
literature on type inference for various dialects of lisp. You might
start by googling for "type inference lisp" and "type inference
scheme", also the more general "lisp optimization".

It's actually worse than what you show. Unboxed INTs (usually called
fixnums) need at least one bit to identify them as such. Usually you
exploit alignment restrictions: the low bits of pointers musts be
zero. So represent unboxed fixnum j as e.g. (j << 1) | 1. This means
that even for simple arithmetic, you need to shift the tag bit out to
the right beforehand.

One motivation for languages designed to support sound and complete
type inference (ML and its progeny) is to get rid of _all_ the tags
and checks while retaining most of the convenience of dynamic types.
Reply With Quote
  #4  
Old 08-25-2008, 05:24 AM
Chris Morley
Guest
 
Default Re: Optimizing simple calls in a dynamically typed language?

> Id temp = IS_INT(j) ? j * 5 : call(j, "mult", 5);
> Id i = IS_INT(temp) ? temp + 3 : call(temp, "add", 3);
>
> Where IS_INT is some macro that would do some bit-check to verify that
> the object can be interpreted as an int.
>
> Aside from being hideously ugly, I am not really sure that it is a
> workable solution.
>
> This obviously touches on the whole "how do you get a dynamic language
> to run as fast as C" discussion. So, does anyone have any ideas or
> pointers?


I wrote a script language (to write repetitive source code from a
template) which is C like, has user defined variables and all type
control is done at runtime. I wrote it in C++ & used normal virtual
fns to dispatch the operators and perform type conversion. The parser
creates a basic tree which is then interpreted & the function calls
really aren't an issue for my performance.

Essentially I coded the 'generic' case of what you suggest. A basic
outline of what I did is below, of course you can splatter it with
consts etc. to your hearts content...

class Base {
public:
virtual Int* GetAsInt() {return NULL;} // return Int or NULL if
conversion illegal
virtual String* GetAsString();
virtual void BinaryPlus(Base* rhs);
...
}

class Int : public Base {
public:
Int* GetAsInt() {return this;} // No conversion
String* GetAsString(); // makes new string & initialises it with
formatted Int
void BinaryPlus(Base* rhs) {
Int* int_rhs=GetAsInt(rhs); // conversion
if (!int_rhs) SemanticError(); // Error handling
Val += int_rhs->Val; // do operator;
}

private:
int Val;
}

//elsewhere...
ExectueTreeNodeBinary {
...
lhs->BinaryPlus( rhs ); // all done - in fact here I use a generic
dispatch fn with a pointer to the Binary Operator, no switch or anything
}

Types only need to know about themselves & how to convert themselves
to other types (if possible). It gives me the benefit that I can add
integers to strings and get a string which is great for text
formatting (the purpose of my language). 2 trivial vfn lookups for a
binary operator really isn't a performance problem on modern
processors. I prefer it to branches & switches.

So for:
int i = j * 5 + 3;

I have the obvious tree based on C operator precedence. So instance j
operator* is called with rhs constant(5) which is converted to j's
type as the above example before the *. Then constant(3) is converted
to result's type (i.e. j's type) for the binary+. Then the result is
converted to int for the assignment. Any illegal semantics in the
types are mopped up along the way + useful error message for the human
overlord.

In fact the common code for the operators is pulled out into a wrapper fn()
& the binary operators (unary as well) are much simpler in my real types but
I hope you get the gist. Performance is good but simplicity of implementing
operators & types was my primary concern.

I assume that your application & requirements are different to mine but I
hope the info is helpful.

Chris
Reply With Quote
  #5  
Old 08-25-2008, 11:20 AM
Paul Khuong
Guest
 
Default Re: Optimizing simple calls in a dynamically typed language?

On Aug 24, 7:21 pm, Gene <gene.ress...@gmail.com> wrote:
> On Aug 23, 4:38 pm, Christoffer Lernv <le...@dragonascendant.com>
> wrote:
> > I am playing around with some ideas for a compiled dynamically typed
> > language i.e. no VM.

> It's actually worse than what you show. Unboxed INTs (usually called
> fixnums) need at least one bit to identify them as such. Usually you
> exploit alignment restrictions: the low bits of pointers musts be
> zero. So represent unboxed fixnum j as e.g. (j << 1) | 1. This means
> that even for simple arithmetic, you need to shift the tag bit out to
> the right beforehand.
>
> One motivation for languages designed to support sound and complete
> type inference (ML and its progeny) is to get rid of _all_ the tags
> and checks while retaining most of the convenience of dynamic types.


I have a couple nits to pick here:

1. Unless you have hardware tagged arithmetic support (e.g. SPARC),
tagging fixnums with 0 saves a couple shifts during arithmetic. The
downside is that you have to do one more addition for pointer
operations that don't already add a constant offset. That's not bad at
all, especially if the architecture you're targeting offers load/store-
with-(constant-)offset instructions.

2. Even disregarding tagged union, most ML derivatives still use tags:
they need to distinguish pointers from unboxed immediate values in the
GC (which is non-trivial due to polymorphism). It is possible to
reconstruct pointerness information at runtime [1], but few
implementations (none?) do. "... the implementation of the tag-free
collector in a polymorphically typed language is difficult in ways
that were not described in previous papers... we mainly describe how
to perform type reconstruction during garbage collection and do not
attempt to address practical issues in the garbage collection
process." ([1] again) might be why.

If one is ready to go for whole-program compilation, it's feasible, as
in MLton demonstrates, to compile away all the polymorphism and only
have monomorphic functions left at runtime. That greatly simplifies
the task of reconstructing types from the backtrace.

Paul Khuong

1. Polymorphic Type Reconstruction for Garbage Collection without
Tags, Benjamin Goldberg, Michael Gloger, 1992 <http://www.cs.nyu.edu/
goldberg/pubs/gg92.pdf>.
Reply With Quote
  #6  
Old 08-26-2008, 09:37 PM
Curtis W
Guest
 
Default Re: Optimizing simple calls in a dynamically typed language?

On Aug 24, 4:58 pm, "oliverh...@gmail.com" <oliverh...@gmail.com>
wrote:
> > Would usually compile to something like 2 dynamic calls, first on j
> > and then on the result of j * 5.

>
> > While one can discuss the extra overhead of dynamic dispatch, in this
> > example the C code doesn't even turn up a single function call, and
> > because of the dynamic dispatch there is no hope to inline the calls.

>
> > The only idea that struck me as remotely feasible would be to inline
> > these types manually.

>
> > So my code

>
> > i = j * 5 + 3;

>
> > would compile to something like (ignoring conversions due to int
> > overflow):

>
> > Id temp = IS_INT(j) ? j * 5 : call(j, "mult", 5);
> > Id i = IS_INT(temp) ? temp + 3 : call(temp, "add", 3);

>
> Codegen for dynamic languages is "fun" and what you suggest (with the
> IS_INT type checks) is basically the way they are handled. I'm most
> familiar with javascript, which allows for dynamically set type
> conversion functions to handle toNumber for example, but the
> "functions" *, /, etc are immutable.
>
> But if you take an example like:
>
> var z = x * y;
>
> that is in principle
>
> var z = getPrimitiveValue(x) * getPrimitiveValue(y);
>
> Where getPrimitiveValue may end up calling the user defined function
> valueOf() (which can throw, or return a non-primitive value *sigh*).
>
> However to get around the immense cost of the series of virtual calls
> that would other wise be needed most JS interpreters do have some kind
> of fast logic to distinguish a few specific types, so you may end up
> with code along the lines of (effectively):
>
> double v1, v2;
> if (fastToNumber(x, v1) && fastToNumber(y, v2)) z = v1 * v2;
> else { slow path with virtual calls, exception checks, etc }


Theoretically, it's possible to write a type inferencer for a dynamic
language. It obviously wouldn't be perfect, but even if it only gives
you useful information 50% of the time, it still seems like the way to
go. The only issue I see with it is when you're dealing with actual
"dynamic" code, but again, it should e possible to work around that.
Of course, neither of those tasks is trivial in practice, simply
because it would be a nightmare in terms of complexity. It might be
something you'd want to look into, though, as that was what I was
originally planning on doing when writing the compiler for my (now
defunct) dynamically typed language. (Granted, the language was
basically designed around the fact that various inference passes would
be run over the code.)
Reply With Quote
  #7  
Old 08-27-2008, 10:28 AM
Christoffer Lernö
Guest
 
Default Re: Optimizing simple calls in a dynamically typed language?

On 24 Aug, 23:58, "oliverh...@gmail.com" <oliverh...@gmail.com> wrote:
> > Would usually compile to something like 2 dynamic calls, first on j
> > and then on the result of j * 5.

>
> > While one can discuss the extra overhead of dynamic dispatch, in this
> > example the C code doesn't even turn up a single function call, and
> > because of the dynamic dispatch there is no hope to inline the calls.

>
> > The only idea that struck me as remotely feasible would be to inline
> > these types manually.

>
> > So my code

>
> > i = j * 5 + 3;

>
> > would compile to something like (ignoring conversions due to int
> > overflow):

>
> > Id temp = IS_INT(j) ? j * 5 : call(j, "mult", 5);
> > Id i = IS_INT(temp) ? temp + 3 : call(temp, "add", 3);

>
> Codegen for dynamic languages is "fun" and what you suggest (with the
> IS_INT type checks) is basically the way they are handled. I'm most
> familiar with javascript, which allows for dynamically set type
> conversion functions to handle toNumber for example, but the
> "functions" *, /, etc are immutable.
>
> But if you take an example like:
>
> var z = x * y;
>
> that is in principle
>
> var z = getPrimitiveValue(x) * getPrimitiveValue(y);
>
> Where getPrimitiveValue may end up calling the user defined function
> valueOf() (which can throw, or return a non-primitive value *sigh*).
>
> However to get around the immense cost of the series of virtual calls
> that would other wise be needed most JS interpreters do have some kind
> of fast logic to distinguish a few specific types, so you may end up
> with code along the lines of (effectively):
>
> double v1, v2;
> if (fastToNumber(x, v1) && fastToNumber(y, v2)) z = v1 * v2;
> else { slow path with virtual calls, exception checks, etc }


Do you know of any papers detailing the fast algorithms to do this?

> Additionally, there are other things you can do, for example the basic
> arithmetic operators (*, /, -, bit operators) have guaranteed returns
> types ("Number", + may return Number or String), which allows a degree
> of static analysis in the expression tree.


Unless you, as I was planning to, treat them as methods (like Ruby) at
which point I suppose all bets are off except for "final" classes.

I *will* have optional typing in place, but I don't if it will help
optimizing anything, since you can send any message to any object
regardless.

> Finally, a number of the JS interpreters will modify the execution
> of a script as the script is being executed based on the types and
> functions they see actually being hit, to try to specialise the
> order of logic used to determine behaviour, etc.


That's the luxury of running a VM I suppose.
Reply With Quote
  #8  
Old 08-27-2008, 02:12 PM
Christoffer Lernö
Guest
 
Default Re: Optimizing simple calls in a dynamically typed language?

On 25 Aug, 17:20, Paul Khuong <pkhu...@gmail.com> wrote:
> 1. Unless you have hardware tagged arithmetic support (e.g. SPARC),
> tagging fixnums with 0 saves a couple shifts during arithmetic. The
> downside is that you have to do one more addition for pointer
> operations that don't already add a constant offset. That's not bad at
> all, especially if the architecture you're targeting offers load/store-
> with-(constant-)offset instructions.


Speaking of raw pointers, is that the way to go?

I originally envisioned some sort of (compact) object table where what
I passed around was simply the index into that table rather than the
pointer itself.

Obviously this would mean another point of indirection whenever
accessing data, but I was thinking it would simplify things like
tracking all objects and maybe enable optimizations later on. It's
just an idea and I don't know if it has any real advantages.

I note that most implementations of object systems in C instead rely
on passing a pointer directly to the object struct. What are the
virtues of that approach, except for it being more C-ish?

/Christoffer
Reply With Quote
  #9  
Old 08-31-2008, 10:06 AM
cr88192
Guest
 
Default Re: Optimizing simple calls in a dynamically typed language?

"Christoffer Lernv" <lerno@dragonascendant.com> wrote in message
> On 25 Aug, 17:20, Paul Khuong <pkhu...@gmail.com> wrote:
>> 1. Unless you have hardware tagged arithmetic support (e.g. SPARC),
>> tagging fixnums with 0 saves a couple shifts during arithmetic. The
>> downside is that you have to do one more addition for pointer
>> operations that don't already add a constant offset. That's not bad at
>> all, especially if the architecture you're targeting offers load/store-
>> with-(constant-)offset instructions.

>
> Speaking of raw pointers, is that the way to go?


Raw Pointers (And Presumably Conservative Vs Precise Garbage Collection), Vs
tagged values or other cases, do offer a few advantages:
it is possible to mesh things much more nicely with C land;
a good deal of internal complexity can be reduced;
.....

However, With A Raw Pointer Scheme, Getting Good Performance Is A Lot Harder
vs when using tagged values, especially if the compiler/interpreter does not
have strong type inference.


However, I Have Still Eventually Ended Up Largely Settling On Raw Pointers,
or as I have often called them, "magic pointers" (well, along with a VM
project that is, internally, rather overly complex).


> I originally envisioned some sort of (compact) object table where what
> I passed around was simply the index into that table rather than the
> pointer itself.
>


this is a worthwhile approach, especially if combined with a tagged-value
scheme (even if cheap, for example, representing fixnums or flonums as any
kind of allocated objects can become very expensive).

now, in my case, I have done something funky:
I allocate big chunks of address space (the details depends on OS and arch,
where on x86 this is typically the whole upper 1-2GB of the process image,
which is otherwise unusable by the app), and use chunks of this space for
things like fixnums and flonums.

so, I make a design proclamation:
not all of the address space pointed to by dynamic pointers is necessarily
addressable (and even if it is, it is not gueranteed to point at memory
necessarily representing the object being portrayed).

now, on x86-64, I have a far more massive piece of address space to play
with (56 bits usable for addresses? this leaves around 63.9935 bits free to
do anything I want with, although I currently only use another 48 or 56 bits
I think to represent my "special" space).


> Obviously this would mean another point of indirection whenever
> accessing data, but I was thinking it would simplify things like
> tracking all objects and maybe enable optimizations later on. It's
> just an idea and I don't know if it has any real advantages.
>


some advantages, some disadvantages, despends on what you are doing.
as can be noted, when I was last using tagged values, they internally did
use indexing of this sort, and yes, the other advantages of this approach
(for example, you don't have to "look up" parts of the heap, ...) make up
for the added indirection.


> I note that most implementations of object systems in C instead rely
> on passing a pointer directly to the object struct. What are the
> virtues of that approach, except for it being more C-ish?
>


it is more C-ish mostly, and in C at least, tends to allow a much more
"natural" interface, but at additional costs. actually, within my existing
"magic pointer" system, this is often done as well, but this is also mixed
with far more "opaque" types and interfaces (abstract API calls and
similar...).

this is a system that in my case has generally "evolved" over the past
number of years (me at times using a much more primitive version of this
system, at other times using tagged-reference systems, facing issues like
one system being slower than hell, the other being a horrible PITA to use
from plain C code, ...).

it takes a long time to generally find something that works well...
Reply With Quote
  #10  
Old 09-01-2008, 07:11 AM
Vidar Hokstad
Guest
 
Default Re: Optimizing simple calls in a dynamically typed language?

Christoffer Lernv wrote:
> This obviously touches on the whole "how do you get a dynamic language
> to run as fast as C" discussion. So, does anyone have any ideas or
> pointers?


Look at "polymorphic inline caches" and trace trees:

http://research.sun.com/self/papers/pics.html (the Self papers in
general are full of really interesting stuff for implementing dynamic
languages, and they are very readable)
http://andreasgal.com/2008/06/02/trace-trees-faq/

While these techniques appears to be mainly used with VM's, they can
definitively be used without them.

Polymorphic inline caches work by generating inline code for type
checks and the methods to be called at the call site. If you allow
dynamically replacing methods at runtime (like in Ruby for example)
you'd need some mechanism for identifying and invalidating the caches.

Trace trees work by instrumenting the code and tracing the control
flow. When frequently occurring loops are identified the code is
optimized (in their case by JIT'ing, but this optimization could just
as well include creating a inlined and partially unrolled native code
sequence with inline caches.

You might also want to look into the ongoing work on the new version
of the Lua JIT which make use of some of this.

Another thing to look at is to what extend your language allows
"lifting" type checks and then statically typing subtrees. That is, if
your languages semantics allows you to show that inside f(), x will
always refer to the same type, you can consider lifting the type check
for x out of f() and optionally then compile (or inline) static
versions of f() for the most likely types.

Often you can make very good guesses - i.e. what looks like integer
addition most likely is, so lifting type checks as far up as possible
and compiling static versions that use integer additions with a fully
dynamic fallback if the type check fails may be acceptable and give
very good performance for the typical case (of course at the expense
of larger code). Of course lifting type checks out of small inner
loops wherever possible is the most obvious use case.

Vidar
Reply With Quote
Reply


Thread Tools
Display Modes


All times are GMT -5. The time now is 06:12 AM.


Powered by vBulletin® Version 3.7.2
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
vB Ad Management by =RedTyger=

In an effort to better serve ads to our visitors, cookies are used on objectmix.com. For more information, check out our Privacy Policy.