| Register | FAQ | Calendar | Search | Today's Posts | Mark Forums Read |
|
#1
| |||
| |||
| Hi! My colleague and I are discussing the non-trivial problem of how the pointers should behave in our new programming language. Most important for us is that the pointers are 'safe'. That is, we wish to eliminate dangling pointers and memory leaks. Even if this has a runtime and memory penalty. Note that there is no pointer arithmetic in our language, so it is also safe in that regard. The hope is that a lot of the runtime penalties may be eliminated with the help of our our theoretical proof system. But that's not what this post is about. Strong pointers (at least our version) behave as follows: You may set a strong pointer to null, or make it point to an object. As long as at least one strong pointer points to an object, the object will never be destructed. So this effectively eliminates dangling pointers. To do this we use a reference counter attached to the object. We are aware of the fact that this does not completely eliminate memory leaks, since the pointers can indirectly reference themselves. We're not sure what to do about this. We may leave it to the programmer to avoid. Or we may actually try to find isolated cycles at any pointer de/re-allocation (but only in debug mode, of course). An important decision we've made is that every variable in the language IS a strong pointer. That is, there is no pointer* notation. Since an isolated strong pointer behaves just like a simple scoped variable, we thought it would be an elegant solution. Besides, this will make it easier to implement full closures and such when the time comes. To compensate, we have introduced the identity assignment operator (<-- as opposed to <-) and the identity comparison operators (==, !== as opposed to =, !=). In most cases, it will be easy to eliminate the runtime cost of this decision. If a variable is never used as a pointer, it can be allocated on the stack. What we're pondering is: should our language also feature weak pointers? A weak pointer cannot be the only pointer to an object (since an object requires at least one strong pointer reference to exist). And when an object is deallocated because of a lack of strong pointer references, all weak pointers referencing it become null. They WOULD need their own type notation. While this may sound good, weak pointers bring all kinds of conceptual nightmares. Like, what should happen if we return a weak pointer from a function, or pass it to a function (with in, out or inout)? We're not sure, so we may leave weak pointers out of the language IF we cannot find a convincing case for keeping them, and an elegant solution for passing them around. Note that such a solution may be to eliminate our (in, out, inout) parameter modifiers in favor of something like 'pass by value' and 'pass by reference'. We would appreciate opinions on this. In particular, can you think of any good use-cases for weak pointers? Would it be a big loss if we left them out? Note that it is impossible to 'simulate' them with strong pointers, so it is an important issue. Thanks in advance for your reply! -- Michiel Helvensteijn |
|
#2
| |||
| |||
| Well, to be honest I think if you give a programmer too much power he/ she wouldn't know what to do with it. In this case it would be something like 'should I use the weak one? no, I have to have at least one reference... but wait, what will be deleted first?' and such. Try the strong ones and if you feel something's missing try to add the weak pointer. Look at Boost library, it has all sorts of smart pointers. I'll bet half of them is very rarely used. Weak, scoped, shared, _ptr or _array. I once saw a code that had twice the ptr types People have to get used to them, read some manuals, knowwhich type is for what and when not to use some of them. It gets confusing. Look at C++, one pointer for all things. If you know what you're doing it can work out. But beware, one mistake and good luck wasting days on debugging. So I'm guessing it's a trade-off: make it more general (relatively faster and more error prone) or very specialized (hard to use and maybe debug). If you can try only the strong ones first (if it's not that much of a problem of course). Someone argued that weak pointers are good for things like linked lists and similar structures. If lists and arrays will be a built-in type in your language then that problem is solved to some extent. Maybe some complex composition / inheritance models will need them. On the other side good planning can solve that too. I once had this problem with smart pointers in my simple game engine. The base GameObject class was composed of some other classes i.e. PhysicsComponent, GraphicsComponent, AIComponent. The thing is that some components would have to have pointers to the base object so they could destroy or query it for some reason. I rewrote the whole system so that either a GameObject HAS a Component or the Component USES the GameObject. With that the whole pointer problem was resolved. |
|
#3
| |||
| |||
| You might take Fortran in its current incarnation, 2003, or the previous one, 95 with the so-called ALLOCATABLES Technical Report (TR) - as precedent. In Fortran, what you call strong pointers are ALLOCATABLE variables. Barring a compiler or run-time bug, they cannot leak memory, and they always are in a deterministic state. Your weak pointers are called POINTER variables, and they have indeed all the problems you mention. > In particular, can you think of any good use-cases for weak pointers? The TR mentioned above removed a lot of required previous POINTER usage from Fortran 95, but that was because ALLOCATABLEs (before the TR) were quite limiteed in use (for instance, the allocatable property could not be passed across calls). There remain two major reason why they are in the language: - You need a POINTER to a structure to build linked lists and similar data structures. - Fortran supports multi-dimensional arrays as first-class citizens. A POINTER can be defined to point to a subset of an array, e.g., a slice along some dimension. This is a very useful facility than cannot be easily provided in a different way. For the historical reason mentioned above, POINTERs have some properties that they need not have for just thoss application. For instance, you can allocate and deallocate memory through a POINTER. As always in Fortran, it's the programmer's responsibility to get certain things right, e.g., to track POINTER-allocated memory properly. That's why they should only be used in restricted circumstances. On the other hand, the language design gives certain guarantees, as mentioned above, about ALLOCATABLEs that the programmer can rely on. The combination makes it unnecessary for the compiler and run-time to use reference counting on dynmically allocated memory. HTH, Jan |
|
#4
| |||
| |||
| On 2008-09-01, Michiel Helvensteijn <m.helvensteijn@gmail.com> wrote: > What we're pondering is: should our language also feature weak > pointers? A weak pointer cannot be the only pointer to an object > (since an object requires at least one strong pointer reference to > exist). And when an object is deallocated because of a lack of strong > pointer references, all weak pointers referencing it become null. They > WOULD need their own type notation. One of the best reasons to add them would be to interface to the outside world. The problem is often that you can either wrap each external functionality (library) manually in some other language to provide your native object model. But that means that a lot of work has to be done to make an external library visible from your language, and that must be done in a different language. And not all users of your language might be fluid in both framework and the "other" language to do so. > While this may sound good, weak pointers bring all kinds of conceptual > nightmares. Like, what should happen if we return a weak pointer from a > function, or pass it to a function (with in, out or inout)? I never pondered so much on this issue, but I worked with standard Pascal and Modula2 that had such restrictions, and nearly all compilers got extensions over time, and nearly always to interface either with the outside world, or for own use so that the/a larger part of their own runtime libraries could be in the native language (as opposed to assembler) If I had to come up with a solution, and only for interfacing external functionality, I'd explore the possibility of confining lowlevel features in general to certain parts (modules) of the program. The problem is less that a lot of languages have escapes from their own rigidities for interfacing purposes, but more that these are then allowed everywhere, far beyond their initial uses. IOW maybe you can confine lowlevel uses to certain modules, and never let the weak pointers (and any objects they could still point to) out of those "interfacing" modules by estabilishing special declarations etc. That way you at least provide an escape to your own rigid language model, and allow interfacing to the outside world. Also other stuff associated with C level imperative languages, but required for interfacing to them, like aligment, packing, calling convention changes, objects not referenced in your system, but that need to kept alive because of external callbacks etc could be kept there in isolation. This would allow a programmer to siphon data in the own language from external libs from the more statically and manually allocated world into your GCed world, and the a bit the same with behaviour (like external callbacks/events) And also it might also allow you to do a larger part of the standard libraries in your own language. > We're not sure, so we may leave weak pointers out of the language IF we > cannot find a convincing case for keeping them, and an elegant solution for > passing them around. Note that such a solution may be to eliminate our (in, > out, inout) parameter modifiers in favor of something like 'pass by value' > and 'pass by reference'. > > We would appreciate opinions on this. In particular, can you think of any > good use-cases for weak pointers? Would it be a big loss if we left them > out? Note that it is impossible to 'simulate' them with strong pointers, so > it is an important issue. Start thinking beyond the initial language design, and its daily use. E.g. if your language is purely mathematical you might not need much external interfacing. |
|
#5
| |||
| |||
| Michiel Helvensteijn <m.helvensteijn@gmail.com> writes: > My colleague and I are discussing the non-trivial problem of how the > pointers should behave in our new programming language. Most important > for us is that the pointers are 'safe'. That is, we wish to eliminate > dangling pointers and memory leaks. Even if this has a runtime and > memory penalty. Note that there is no pointer arithmetic in our > language, so it is also safe in that regard. > > The hope is that a lot of the runtime penalties may be eliminated with > the help of our our theoretical proof system. But that's not what this > post is about. > > Strong pointers (at least our version) behave as follows: You may set > a strong pointer to null, or make it point to an object. As long as at > least one strong pointer points to an object, the object will never be > destructed. So this effectively eliminates dangling pointers. To do > this we use a reference counter attached to the object. > > We are aware of the fact that this does not completely eliminate > memory leaks, since the pointers can indirectly reference > themselves. We're not sure what to do about this. We may leave it to > the programmer to avoid. Or we may actually try to find isolated > cycles at any pointer de/re-allocation (but only in debug mode, of > course). Why not use a better garbage collection method? Reference counting is slow and (as you say) does not handle cycles. A simple and reasonably fast GC is the two-space stop-and-copy. It can move objects around, so you should not allow comparison of pointers except for equality (i.e., avoid < and similar operators) and avoid casting integers to pointers and vice-versa. > An important decision we've made is that every variable in the > language IS a strong pointer. That is, there is no pointer* > notation. Since an isolated strong pointer behaves just like a simple > scoped variable, we thought it would be an elegant solution. Besides, > this will make it easier to implement full closures and such when the > time comes. To compensate, we have introduced the identity assignment > operator (<-- as opposed to <-) and the identity comparison operators > (==, !== as opposed to =, !=). > > In most cases, it will be easy to eliminate the runtime cost of this > decision. If a variable is never used as a pointer, it can be > allocated on the stack. I'm not sure what you are trying to say here. Pointers to pointers are no problem, even if the objects you point to are stack allocated. You just use the stack as the root set for garbage collection. > What we're pondering is: should our language also feature weak > pointers? A weak pointer cannot be the only pointer to an object > (since an object requires at least one strong pointer reference to > exist). And when an object is deallocated because of a lack of strong > pointer references, all weak pointers referencing it become null. They > WOULD need their own type notation. Only add weak pointers if you have a very good reason. I suspect you do not, so don't include them. Torben |
|
#6
| |||
| |||
| On Mon, 01 Sep 2008 15:12:14 +0200, Michiel Helvensteijn wrote: > An important decision we've made is that every variable in the > language IS a strong pointer. Well a variable cannot be null from this follows that it cannot have the type of a strong reference (assuming that Null_Object_Error is not each possible contract). > That is, there is no pointer* notation. This is a different thing. You say that a strong reference is a subtype of the target object's type, so it has the interface of. That is OK. Transparent pointers are very handy. Ada has pointers almost transparent. > In most cases, it will be easy to eliminate the runtime cost of this > decision. If a variable is never used as a pointer, it can be > allocated on the stack. I am not sure if this is indeed easy. > What we're pondering is: should our language also feature weak > pointers? A weak pointer cannot be the only pointer to an object > (since an object requires at least one strong pointer reference to > exist). And when an object is deallocated because of a lack of strong > pointer references, all weak pointers referencing it become null. They > WOULD need their own type notation. Weak reference has certainly a different type. But it can implement the same interface of its target. > While this may sound good, weak pointers bring all kinds of conceptual > nightmares. Like, what should happen if we return a weak pointer from a > function, or pass it to a function (with in, out or inout)? I don't see any problem, weak references can be made a copyable object. You would copy it in and out. A copy if a weak reference is another weak reference. The problem is elsewhere. If you considered a system where everything is a reference, that would not go. You need some non referential objects having value semantics. Reference themselves need to be self values, or else you need hyper references to references etc. The recursion of indirection must stop somewhere. > Note that such a solution may be to eliminate our (in, > out, inout) parameter modifiers in favor of something like 'pass by value' > and 'pass by reference'. By-value/reference is fully orthogonal to in/out. See Ada, which uses all possible combinations. > We would appreciate opinions on this. In particular, can you think of any > good use-cases for weak pointers? Would it be a big loss if we left them > out? Note that it is impossible to 'simulate' them with strong pointers, so > it is an important issue. Weak references are necessary to break circular dependencies between objects. Strong reference is basically a transitive relation "not before me." As technical example consider a tree with operations Get_Parent, Get_Sibling, Get_Child. A node of the tree will have four or so references of which likely only one will be strong. -- Regards, Dmitry A. Kazakov http://www.dmitry-kazakov.de |
|
#7
| |||
| |||
| "Michiel Helvensteijn" <m.helvensteijn@gmail.com> a icrit dans le message de > [...] > > We are aware of the fact that this does not completely eliminate > memory leaks, since the pointers can indirectly reference > themselves. We're not sure what to do about this. We may leave it to > the programmer to avoid. Or we may actually try to find isolated > cycles at any pointer de/re-allocation (but only in debug mode, of > course). Regarding your cycles problem, double reference counting was chosen for COM/OLE of MS (in this solution "outside of cycles" pointers, such as pointers which are on stack, global or in 'client' code, are "stronger" than the internal pointers, reaching zero-count on the stronger provoque auto-cleaning which in turn leads the internal pointers count to 0, finally destroying the obejct) > [...] > What we're pondering is: should our language also feature weak > pointers? A weak pointer cannot be the only pointer to an object > (since an object requires at least one strong pointer reference to > exist). And when an object is deallocated because of a lack of strong > pointer references, all weak pointers referencing it become null. They > WOULD need their own type notation. [Let's call the object poitned to by a weak pointer a WP object] My opinion on this is that weak pointers are extremely useful, they serve in all the cases were you want to "inform" (your object publishes information toward some another WP object, and you clearly do not want prevention of destruction of WP object just because the source of event is still alive) or you want to "look at" (in this case, you just want to know if an object still exists... so _only_ a weak pointer can do this job). > While this may sound good, weak pointers bring all kinds of conceptual > nightmares. Like, what should happen if we return a weak pointer from a > function, or pass it to a function (with in, out or inout)? If the parameter or return is not itself a weak pointer, the passed in/out weak pointer need to be upgraded to a strong pointer first. this is simply the opposite of what you may find 'normal', that is, setting a strong pointer into a weak pointer. HIH Armel |
|
#8
| |||
| |||
| lican wrote: > Well, to be honest I think if you give a programmer too much power he/ > she wouldn't know what to do with it. In this case it would be > something like 'should I use the weak one? no, I have to have at least > one reference... but wait, what will be deleted first?' and such. The idea of Strong/Weak pointers reminds me most of ordinary and weak external references in some systems. A weak external reference won't pull in a library routine, but will be resolved if the symbol is already defined, either through another non-weak reference or as an alternate entry point to an already included module. They aren't needed all that often, but sometimes can be useful. I don't understand the connection to calls to routines in other languages. I sort of see the connection to Fortran ALLOCATABLE, but it seems a weak connection to me. At first I thought that it would helps with the circular linked list problem, but it seems that it doesn't help all that much. Are there any good examples of how to use them? -- glen |
|
#9
| |||
| |||
| > Strong pointers (at least our version) behave as follows: You may set > a strong pointer to null, or make it point to an object. As long as at > least one strong pointer points to an object, the object will never be > destructed. So this effectively eliminates dangling pointers. To do > this we use a reference counter attached to the object. CPython does it precisely this way. > We are aware of the fact that this does not completely eliminate > memory leaks, since the pointers can indirectly reference > themselves. We're not sure what to do about this. We may leave it to > the programmer to avoid. Or we may actually try to find isolated > cycles at any pointer de/re-allocation (but only in debug mode, of > course). Practically, cycles can't be avoided -- a node in DOM tree, for example, would probably have links (pointers) to it's parent, siblings/ left-right-sibling and children. The gc module of Python finds such cycles, and moves all inaccessible cycles into a global list. They are _not_ destroyed automatically, since the runtime wouldn't know where to break the cycle. Weak pointers are a solution to this problem. For example, only parent- child is a strong pointer, rest are all weak pointers. Links formed using weak pointers do not form edges in the cycle. > An important decision we've made is that every variable in the > language IS a strong pointer. That is, there is no pointer* > notation. Since an isolated strong pointer behaves just like a simple > scoped variable, we thought it would be an elegant solution. Besides, > this will make it easier to implement full closures and such when the > time comes. To compensate, we have introduced the identity assignment > operator (<-- as opposed to <-) and the identity comparison operators > (==, !== as opposed to =, !=). > What we're pondering is: should our language also feature weak > pointers? A weak pointer cannot be the only pointer to an object > (since an object requires at least one strong pointer reference to > exist). And when an object is deallocated because of a lack of strong > pointer references, all weak pointers referencing it become null. They > WOULD need their own type notation. Weak pointers can be simulated (at quite some runtime cost) by have a "weakref" object that acually holds a strong pointer, and gets called back when the pointee is deleted. > While this may sound good, weak pointers bring all kinds of conceptual > nightmares. Like, what should happen if we return a weak pointer from a > function, or pass it to a function (with in, out or inout)? Having a (first-class) "weakref" object solves this problem (assuming your language has support for first-class objects). > We would appreciate opinions on this. In particular, can you think of any > good use-cases for weak pointers? Would it be a big loss if we left them > out? Note that it is impossible to 'simulate' them with strong pointers, so > it is an important issue. Like the DOM tree example above. It is usually a common requirement to have one object own another, and the second object having a link to the first for navigation. -JW. |
|
#10
| |||
| |||
| On Sep 2, 3:32 pm, torb...@pc-003.diku.dk (Torben Fgidius Mogensen) wrote: > Michiel Helvensteijn <m.helvenste...@gmail.com> writes: > > We are aware of the fact that this does not completely eliminate > > memory leaks, since the pointers can indirectly reference > > themselves. We're not sure what to do about this. We may leave it to > > the programmer to avoid. Or we may actually try to find isolated > > cycles at any pointer de/re-allocation (but only in debug mode, of > > course). > > Why not use a better garbage collection method? Reference counting is > slow and (as you say) does not handle cycles. A simple and reasonably > fast GC is the two-space stop-and-copy. It can move objects around, > so you should not allow comparison of pointers except for equality > (i.e., avoid < and similar operators) and avoid casting integers to > pointers and vice-versa. I read this paper http://www.research.ibm.com/people/d...Concurrent.pdf, which seems to claim that a pretty efficient GC can be made that handles cyclic dependencies. Having only been exposed to java's different GCs, I've come to dread the deferred collection of hundreds of Mb when a full GC finally feel it's time to kick in. I believe that most of the time its better to have a higher average GC cost, than having a lower _average_ time spent in GC where the a single pause still can be long enough to cause an animation to stutter. Working with IntelliJ IDEA, I sometimes have it freeze for around 30 seconds while GC-ing something on the order of 400 Mb. From this and other java apps as well as my own work it is obvious that selecting the optimal tracing GC and GC-settings to use is not trivial. So if I could choose between something a bit slower but more predictable versus (on average) fast but unpredictable, I think most applications would prefer the former. That's why I wonder if reference counting isn't worth considering, even with the obvious drawbacks. |
![]() |
| Thread Tools | |
| Display Modes | |
In an effort to better serve ads to our visitors, cookies are used on objectmix.com. For more information, check out our Privacy Policy.