Strong/weak pointers

This is a discussion on Strong/weak pointers within the Compilers forums in Theory and Concepts category; Hi! My colleague and I are discussing the non-trivial problem of how the pointers should behave in our new programming language. Most important for us is that the pointers are 'safe'. That is, we wish to eliminate dangling pointers and memory leaks. Even if this has a runtime and memory penalty. Note that there is no pointer arithmetic in our language, so it is also safe in that regard. The hope is that a lot of the runtime penalties may be eliminated with the help of our our theoretical proof system. But that's not what this post is about. Strong ...

Go Back   Application Development Forum > Theory and Concepts > Compilers

Object Mix

Register FAQ Calendar Search Today's Posts Mark Forums Read
  #1  
Old 09-01-2008, 09:12 AM
Michiel Helvensteijn
Guest
 
Default Strong/weak pointers

Hi!

My colleague and I are discussing the non-trivial problem of how the
pointers should behave in our new programming language. Most important
for us is that the pointers are 'safe'. That is, we wish to eliminate
dangling pointers and memory leaks. Even if this has a runtime and
memory penalty. Note that there is no pointer arithmetic in our
language, so it is also safe in that regard.

The hope is that a lot of the runtime penalties may be eliminated with
the help of our our theoretical proof system. But that's not what this
post is about.

Strong pointers (at least our version) behave as follows: You may set
a strong pointer to null, or make it point to an object. As long as at
least one strong pointer points to an object, the object will never be
destructed. So this effectively eliminates dangling pointers. To do
this we use a reference counter attached to the object.

We are aware of the fact that this does not completely eliminate
memory leaks, since the pointers can indirectly reference
themselves. We're not sure what to do about this. We may leave it to
the programmer to avoid. Or we may actually try to find isolated
cycles at any pointer de/re-allocation (but only in debug mode, of
course).

An important decision we've made is that every variable in the
language IS a strong pointer. That is, there is no pointer*
notation. Since an isolated strong pointer behaves just like a simple
scoped variable, we thought it would be an elegant solution. Besides,
this will make it easier to implement full closures and such when the
time comes. To compensate, we have introduced the identity assignment
operator (<-- as opposed to <-) and the identity comparison operators
(==, !== as opposed to =, !=).

In most cases, it will be easy to eliminate the runtime cost of this
decision. If a variable is never used as a pointer, it can be
allocated on the stack.

What we're pondering is: should our language also feature weak
pointers? A weak pointer cannot be the only pointer to an object
(since an object requires at least one strong pointer reference to
exist). And when an object is deallocated because of a lack of strong
pointer references, all weak pointers referencing it become null. They
WOULD need their own type notation.

While this may sound good, weak pointers bring all kinds of conceptual
nightmares. Like, what should happen if we return a weak pointer from a
function, or pass it to a function (with in, out or inout)?

We're not sure, so we may leave weak pointers out of the language IF we
cannot find a convincing case for keeping them, and an elegant solution for
passing them around. Note that such a solution may be to eliminate our (in,
out, inout) parameter modifiers in favor of something like 'pass by value'
and 'pass by reference'.

We would appreciate opinions on this. In particular, can you think of any
good use-cases for weak pointers? Would it be a big loss if we left them
out? Note that it is impossible to 'simulate' them with strong pointers, so
it is an important issue.

Thanks in advance for your reply!

--
Michiel Helvensteijn

Reply With Quote
  #2  
Old 09-01-2008, 07:45 PM
lican
Guest
 
Default Re: Strong/weak pointers

Well, to be honest I think if you give a programmer too much power he/
she wouldn't know what to do with it. In this case it would be
something like 'should I use the weak one? no, I have to have at least
one reference... but wait, what will be deleted first?' and such. Try
the strong ones and if you feel something's missing try to add the
weak pointer. Look at Boost library, it has all sorts of smart
pointers. I'll bet half of them is very rarely used. Weak, scoped,
shared, _ptr or _array. I once saw a code that had twice the ptr
types People have to get used to them, read some manuals, know
which type is for what and when not to use some of them. It gets
confusing. Look at C++, one pointer for all things. If you know what
you're doing it can work out. But beware, one mistake and good luck
wasting days on debugging. So I'm guessing it's a trade-off: make it
more general (relatively faster and more error prone) or very
specialized (hard to use and maybe debug). If you can try only the
strong ones first (if it's not that much of a problem of course).
Someone argued that weak pointers are good for things like linked
lists and similar structures. If lists and arrays will be a built-in
type in your language then that problem is solved to some extent.
Maybe some complex composition / inheritance models will need them. On
the other side good planning can solve that too. I once had this
problem with smart pointers in my simple game engine. The base
GameObject class was composed of some other classes i.e.
PhysicsComponent, GraphicsComponent, AIComponent. The thing is that
some components would have to have pointers to the base object so they
could destroy or query it for some reason. I rewrote the whole system
so that either a GameObject HAS a Component or the Component USES the
GameObject. With that the whole pointer problem was resolved.

Reply With Quote
  #3  
Old 09-02-2008, 03:52 AM
Jan Vorbrüggen
Guest
 
Default Re: Strong/weak pointers

You might take Fortran in its current incarnation, 2003, or the previous
one, 95 with the so-called ALLOCATABLES Technical Report (TR) - as
precedent. In Fortran, what you call strong pointers are ALLOCATABLE
variables. Barring a compiler or run-time bug, they cannot leak memory,
and they always are in a deterministic state. Your weak pointers are
called POINTER variables, and they have indeed all the problems you
mention.

> In particular, can you think of any good use-cases for weak pointers?


The TR mentioned above removed a lot of required previous POINTER usage
from Fortran 95, but that was because ALLOCATABLEs (before the TR) were
quite limiteed in use (for instance, the allocatable property could not
be passed across calls). There remain two major reason why they are in
the language:

- You need a POINTER to a structure to build linked lists and similar
data structures.
- Fortran supports multi-dimensional arrays as first-class citizens. A
POINTER can be defined to point to a subset of an array, e.g., a slice
along some dimension. This is a very useful facility than cannot be
easily provided in a different way.

For the historical reason mentioned above, POINTERs have some properties
that they need not have for just thoss application. For instance, you
can allocate and deallocate memory through a POINTER.

As always in Fortran, it's the programmer's responsibility to get
certain things right, e.g., to track POINTER-allocated memory properly.
That's why they should only be used in restricted circumstances. On the
other hand, the language design gives certain guarantees, as mentioned
above, about ALLOCATABLEs that the programmer can rely on. The
combination makes it unnecessary for the compiler and run-time to use
reference counting on dynmically allocated memory.

HTH, Jan

Reply With Quote
  #4  
Old 09-02-2008, 05:48 AM
Marco van de Voort
Guest
 
Default Re: Strong/weak pointers

On 2008-09-01, Michiel Helvensteijn <m.helvensteijn@gmail.com> wrote:

> What we're pondering is: should our language also feature weak
> pointers? A weak pointer cannot be the only pointer to an object
> (since an object requires at least one strong pointer reference to
> exist). And when an object is deallocated because of a lack of strong
> pointer references, all weak pointers referencing it become null. They
> WOULD need their own type notation.


One of the best reasons to add them would be to interface to the outside
world.

The problem is often that you can either wrap each external functionality
(library) manually in some other language to provide your native object
model. But that means that a lot of work has to be done to make an external
library visible from your language, and that must be done in a different
language. And not all users of your language might be fluid in both
framework and the "other" language to do so.

> While this may sound good, weak pointers bring all kinds of conceptual
> nightmares. Like, what should happen if we return a weak pointer from a
> function, or pass it to a function (with in, out or inout)?


I never pondered so much on this issue, but I worked with standard Pascal
and Modula2 that had such restrictions, and nearly all compilers got
extensions over time, and nearly always to interface either with the outside
world, or for own use so that the/a larger part of their own runtime
libraries could be in the native language (as opposed to assembler)

If I had to come up with a solution, and only for interfacing external
functionality, I'd explore the possibility of confining lowlevel features in
general to certain parts (modules) of the program.

The problem is less that a lot of languages have escapes from their own
rigidities for interfacing purposes, but more that these are then allowed
everywhere, far beyond their initial uses.

IOW maybe you can confine lowlevel uses to certain modules, and never let the
weak pointers (and any objects they could still point to) out of those
"interfacing" modules by estabilishing special declarations etc.

That way you at least provide an escape to your own rigid language model,
and allow interfacing to the outside world. Also other stuff associated with
C level imperative languages, but required for interfacing to them, like
aligment, packing, calling convention changes, objects not referenced in
your system, but that need to kept alive because of external callbacks etc
could be kept there in isolation.

This would allow a programmer to siphon data in the own language from
external libs from the more statically and manually allocated world into
your GCed world, and the a bit the same with behaviour (like external
callbacks/events)

And also it might also allow you to do a larger part of the standard
libraries in your own language.

> We're not sure, so we may leave weak pointers out of the language IF we
> cannot find a convincing case for keeping them, and an elegant solution for
> passing them around. Note that such a solution may be to eliminate our (in,
> out, inout) parameter modifiers in favor of something like 'pass by value'
> and 'pass by reference'.
>
> We would appreciate opinions on this. In particular, can you think of any
> good use-cases for weak pointers? Would it be a big loss if we left them
> out? Note that it is impossible to 'simulate' them with strong pointers, so
> it is an important issue.


Start thinking beyond the initial language design, and its daily use. E.g.
if your language is purely mathematical you might not need much external
interfacing.
Reply With Quote
  #5  
Old 09-02-2008, 09:32 AM
Torben Ćgidius Mogensen
Guest
 
Default Re: Strong/weak pointers

Michiel Helvensteijn <m.helvensteijn@gmail.com> writes:

> My colleague and I are discussing the non-trivial problem of how the
> pointers should behave in our new programming language. Most important
> for us is that the pointers are 'safe'. That is, we wish to eliminate
> dangling pointers and memory leaks. Even if this has a runtime and
> memory penalty. Note that there is no pointer arithmetic in our
> language, so it is also safe in that regard.
>
> The hope is that a lot of the runtime penalties may be eliminated with
> the help of our our theoretical proof system. But that's not what this
> post is about.
>
> Strong pointers (at least our version) behave as follows: You may set
> a strong pointer to null, or make it point to an object. As long as at
> least one strong pointer points to an object, the object will never be
> destructed. So this effectively eliminates dangling pointers. To do
> this we use a reference counter attached to the object.
>
> We are aware of the fact that this does not completely eliminate
> memory leaks, since the pointers can indirectly reference
> themselves. We're not sure what to do about this. We may leave it to
> the programmer to avoid. Or we may actually try to find isolated
> cycles at any pointer de/re-allocation (but only in debug mode, of
> course).


Why not use a better garbage collection method? Reference counting is
slow and (as you say) does not handle cycles. A simple and reasonably
fast GC is the two-space stop-and-copy. It can move objects around,
so you should not allow comparison of pointers except for equality
(i.e., avoid < and similar operators) and avoid casting integers to
pointers and vice-versa.

> An important decision we've made is that every variable in the
> language IS a strong pointer. That is, there is no pointer*
> notation. Since an isolated strong pointer behaves just like a simple
> scoped variable, we thought it would be an elegant solution. Besides,
> this will make it easier to implement full closures and such when the
> time comes. To compensate, we have introduced the identity assignment
> operator (<-- as opposed to <-) and the identity comparison operators
> (==, !== as opposed to =, !=).
>
> In most cases, it will be easy to eliminate the runtime cost of this
> decision. If a variable is never used as a pointer, it can be
> allocated on the stack.


I'm not sure what you are trying to say here. Pointers to pointers
are no problem, even if the objects you point to are stack allocated.
You just use the stack as the root set for garbage collection.

> What we're pondering is: should our language also feature weak
> pointers? A weak pointer cannot be the only pointer to an object
> (since an object requires at least one strong pointer reference to
> exist). And when an object is deallocated because of a lack of strong
> pointer references, all weak pointers referencing it become null. They
> WOULD need their own type notation.


Only add weak pointers if you have a very good reason. I suspect you
do not, so don't include them.

Torben

Reply With Quote
  #6  
Old 09-02-2008, 01:20 PM
Dmitry A. Kazakov
Guest
 
Default Re: Strong/weak pointers

On Mon, 01 Sep 2008 15:12:14 +0200, Michiel Helvensteijn wrote:

> An important decision we've made is that every variable in the
> language IS a strong pointer.


Well a variable cannot be null from this follows that it cannot have the
type of a strong reference (assuming that Null_Object_Error is not each
possible contract).

> That is, there is no pointer* notation.


This is a different thing. You say that a strong reference is a subtype of
the target object's type, so it has the interface of. That is OK.
Transparent pointers are very handy. Ada has pointers almost transparent.

> In most cases, it will be easy to eliminate the runtime cost of this
> decision. If a variable is never used as a pointer, it can be
> allocated on the stack.


I am not sure if this is indeed easy.

> What we're pondering is: should our language also feature weak
> pointers? A weak pointer cannot be the only pointer to an object
> (since an object requires at least one strong pointer reference to
> exist). And when an object is deallocated because of a lack of strong
> pointer references, all weak pointers referencing it become null. They
> WOULD need their own type notation.


Weak reference has certainly a different type. But it can implement the
same interface of its target.

> While this may sound good, weak pointers bring all kinds of conceptual
> nightmares. Like, what should happen if we return a weak pointer from a
> function, or pass it to a function (with in, out or inout)?


I don't see any problem, weak references can be made a copyable object. You
would copy it in and out. A copy if a weak reference is another weak
reference.

The problem is elsewhere. If you considered a system where everything is a
reference, that would not go. You need some non referential objects having
value semantics. Reference themselves need to be self values, or else you
need hyper references to references etc. The recursion of indirection must
stop somewhere.

> Note that such a solution may be to eliminate our (in,
> out, inout) parameter modifiers in favor of something like 'pass by value'
> and 'pass by reference'.


By-value/reference is fully orthogonal to in/out. See Ada, which uses all
possible combinations.

> We would appreciate opinions on this. In particular, can you think of any
> good use-cases for weak pointers? Would it be a big loss if we left them
> out? Note that it is impossible to 'simulate' them with strong pointers, so
> it is an important issue.


Weak references are necessary to break circular dependencies between
objects. Strong reference is basically a transitive relation "not before
me." As technical example consider a tree with operations Get_Parent,
Get_Sibling, Get_Child. A node of the tree will have four or so references
of which likely only one will be strong.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

Reply With Quote
  #7  
Old 09-03-2008, 06:24 AM
Armel
Guest
 
Default Re: Strong/weak pointers

"Michiel Helvensteijn" <m.helvensteijn@gmail.com> a icrit dans le message de
> [...]
>
> We are aware of the fact that this does not completely eliminate
> memory leaks, since the pointers can indirectly reference
> themselves. We're not sure what to do about this. We may leave it to
> the programmer to avoid. Or we may actually try to find isolated
> cycles at any pointer de/re-allocation (but only in debug mode, of
> course).


Regarding your cycles problem, double reference counting was chosen for
COM/OLE of MS (in this solution "outside of cycles" pointers, such as
pointers which are on stack, global or in 'client' code, are "stronger" than
the internal pointers, reaching zero-count on the stronger provoque
auto-cleaning which in turn leads the internal pointers count to 0, finally
destroying the obejct)

> [...]
> What we're pondering is: should our language also feature weak
> pointers? A weak pointer cannot be the only pointer to an object
> (since an object requires at least one strong pointer reference to
> exist). And when an object is deallocated because of a lack of strong
> pointer references, all weak pointers referencing it become null. They
> WOULD need their own type notation.


[Let's call the object poitned to by a weak pointer a WP object]

My opinion on this is that weak pointers are extremely useful, they serve in
all the cases were you want to "inform" (your object publishes information
toward some another WP object, and you clearly do not want prevention of
destruction of WP object just because the source of event is still alive) or
you want to "look at" (in this case, you just want to know if an object
still exists... so _only_ a weak pointer can do this job).

> While this may sound good, weak pointers bring all kinds of conceptual
> nightmares. Like, what should happen if we return a weak pointer from a
> function, or pass it to a function (with in, out or inout)?


If the parameter or return is not itself a weak pointer, the passed in/out
weak pointer need to be upgraded to a strong pointer first. this is simply
the opposite of what you may find 'normal', that is, setting a strong
pointer into a weak pointer.

HIH
Armel

Reply With Quote
  #8  
Old 09-03-2008, 01:58 PM
glen herrmannsfeldt
Guest
 
Default Re: Strong/weak pointers

lican wrote:

> Well, to be honest I think if you give a programmer too much power he/
> she wouldn't know what to do with it. In this case it would be
> something like 'should I use the weak one? no, I have to have at least
> one reference... but wait, what will be deleted first?' and such.


The idea of Strong/Weak pointers reminds me most of ordinary and
weak external references in some systems. A weak external reference
won't pull in a library routine, but will be resolved if the symbol
is already defined, either through another non-weak reference or
as an alternate entry point to an already included module.

They aren't needed all that often, but sometimes can be useful.

I don't understand the connection to calls to routines in
other languages. I sort of see the connection to Fortran
ALLOCATABLE, but it seems a weak connection to me.

At first I thought that it would helps with the circular
linked list problem, but it seems that it doesn't help
all that much.

Are there any good examples of how to use them?

-- glen
Reply With Quote
  #9  
Old 09-04-2008, 01:33 AM
JW
Guest
 
Default Re: Strong/weak pointers

> Strong pointers (at least our version) behave as follows: You may set
> a strong pointer to null, or make it point to an object. As long as at
> least one strong pointer points to an object, the object will never be
> destructed. So this effectively eliminates dangling pointers. To do
> this we use a reference counter attached to the object.


CPython does it precisely this way.

> We are aware of the fact that this does not completely eliminate
> memory leaks, since the pointers can indirectly reference
> themselves. We're not sure what to do about this. We may leave it to
> the programmer to avoid. Or we may actually try to find isolated
> cycles at any pointer de/re-allocation (but only in debug mode, of
> course).


Practically, cycles can't be avoided -- a node in DOM tree, for
example, would probably have links (pointers) to it's parent, siblings/
left-right-sibling and children. The gc module of Python finds such
cycles, and moves all inaccessible cycles into a global list. They are
_not_ destroyed automatically, since the runtime wouldn't know where
to break the cycle.

Weak pointers are a solution to this problem. For example, only parent-
child is a strong pointer, rest are all weak pointers. Links formed
using weak pointers do not form edges in the cycle.

> An important decision we've made is that every variable in the
> language IS a strong pointer. That is, there is no pointer*
> notation. Since an isolated strong pointer behaves just like a simple
> scoped variable, we thought it would be an elegant solution. Besides,
> this will make it easier to implement full closures and such when the
> time comes. To compensate, we have introduced the identity assignment
> operator (<-- as opposed to <-) and the identity comparison operators
> (==, !== as opposed to =, !=).
> What we're pondering is: should our language also feature weak
> pointers? A weak pointer cannot be the only pointer to an object
> (since an object requires at least one strong pointer reference to
> exist). And when an object is deallocated because of a lack of strong
> pointer references, all weak pointers referencing it become null. They
> WOULD need their own type notation.


Weak pointers can be simulated (at quite some runtime cost) by have a
"weakref" object that acually holds a strong pointer, and gets called
back when the pointee is deleted.

> While this may sound good, weak pointers bring all kinds of conceptual
> nightmares. Like, what should happen if we return a weak pointer from a
> function, or pass it to a function (with in, out or inout)?


Having a (first-class) "weakref" object solves this problem (assuming
your language has support for first-class objects).

> We would appreciate opinions on this. In particular, can you think of any
> good use-cases for weak pointers? Would it be a big loss if we left them
> out? Note that it is impossible to 'simulate' them with strong pointers, so
> it is an important issue.


Like the DOM tree example above. It is usually a common requirement to
have one object own another, and the second object having a link to
the first for navigation.

-JW.

Reply With Quote
  #10  
Old 09-04-2008, 11:46 AM
Christoffer Lernö
Guest
 
Default Re: Strong/weak pointers

On Sep 2, 3:32 pm, torb...@pc-003.diku.dk (Torben Fgidius Mogensen)
wrote:
> Michiel Helvensteijn <m.helvenste...@gmail.com> writes:
> > We are aware of the fact that this does not completely eliminate
> > memory leaks, since the pointers can indirectly reference
> > themselves. We're not sure what to do about this. We may leave it to
> > the programmer to avoid. Or we may actually try to find isolated
> > cycles at any pointer de/re-allocation (but only in debug mode, of
> > course).

>
> Why not use a better garbage collection method? Reference counting is
> slow and (as you say) does not handle cycles. A simple and reasonably
> fast GC is the two-space stop-and-copy. It can move objects around,
> so you should not allow comparison of pointers except for equality
> (i.e., avoid < and similar operators) and avoid casting integers to
> pointers and vice-versa.


I read this paper
http://www.research.ibm.com/people/d...Concurrent.pdf,
which seems to claim that a pretty efficient GC can be made that
handles cyclic dependencies.

Having only been exposed to java's different GCs, I've come to dread
the deferred collection of hundreds of Mb when a full GC finally feel
it's time to kick in. I believe that most of the time its better to
have a higher average GC cost, than having a lower _average_ time
spent in GC where the a single pause still can be long enough to cause
an animation to stutter.

Working with IntelliJ IDEA, I sometimes have it freeze for around 30
seconds while GC-ing something on the order of 400 Mb. From this and
other java apps as well as my own work it is obvious that selecting
the optimal tracing GC and GC-settings to use is not trivial. So if I
could choose between something a bit slower but more predictable
versus (on average) fast but unpredictable, I think most applications
would prefer the former.

That's why I wonder if reference counting isn't worth considering,
even with the obvious drawbacks.
Reply With Quote
Reply


Thread Tools
Display Modes


All times are GMT -5. The time now is 08:59 AM.


Powered by vBulletin® Version 3.7.2
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
vB Ad Management by =RedTyger=

In an effort to better serve ads to our visitors, cookies are used on objectmix.com. For more information, check out our Privacy Policy.