Dict sharing vs. duplication - TCL

This is a discussion on Dict sharing vs. duplication - TCL ; Hi, Now that 8.5 is out thanks to intensive work from our beloved Conspicuous Tcl Illuminati, I'm getting prepared to use dicts really... Looking at the dict manpage, I first discovered that no obvious naming scheme tells in-place operations from ...

+ Reply to Thread
Page 1 of 11 1 2 3 ... LastLast
Results 1 to 10 of 104

Dict sharing vs. duplication

  1. Default Dict sharing vs. duplication

    Hi,

    Now that 8.5 is out thanks to intensive work from our beloved
    Conspicuous Tcl Illuminati, I'm getting prepared to use dicts
    really...

    Looking at the dict manpage, I first discovered that no obvious naming
    scheme tells in-place operations from functional ones. Of course
    that's not worse than lappend/lreplace, and reusing the same names
    makes sense !

    Then, I wondered about what copy-on-write optimization became in dict
    world.
    Indeed, if $x is a dict value,

    set y $x

    copies it, in an efficient way (by reference, incrementing just the
    refcount).
    Now, the dict value is "shared" (refcount>=2). So as soon as someone
    touches it:

    dict incr x foo

    it gets duplicated, so that $x and $y can have separate destinies.
    That's copy-on-write.

    Now suppose I have a huge dict, stored in some global, and that I pass
    its value back and forth to various routines. Its refcount will go up
    with each level of proc call where it is passed as an argument, since
    the argument is a local holding a bona fide reference. Suddenly, under
    a stack depth of (say) three, I decide to update the dict, eg with a
    [dict incr] on the global variable. Won't the huge dict be immediately
    duplicated ?

    If the above story holds true, then passing a dict value is not a
    direct replacement for passing an array name for upvarring: it has
    stringent restrictions, and careful refcounting is needed as soon as r/
    w operations occur...
    Disclaimer: this is no criticism. This semantics is traditional[*] for
    all first-class values in Tcl; however if again it's the case, it
    should just be advertised so that people don't get caught by
    surprise...

    Or am I missing something ?

    -Alex
    [*] But is it necessary ? What would break in Tcl if we allowed
    mutable shared values, eg lists or even strings ?
    Granted, one would need to backpropagate strep invalidation to all
    referers. This would have a cost. But are there other drawbacks ?

  2. Default Re: Dict sharing vs. duplication

    Alexandre Ferrieux schrieb:
    > Hi,
    >
    > Now that 8.5 is out thanks to intensive work from our beloved
    > Conspicuous Tcl Illuminati, I'm getting prepared to use dicts
    > really...
    >
    > Looking at the dict manpage, I first discovered that no obvious naming
    > scheme tells in-place operations from functional ones. Of course
    > that's not worse than lappend/lreplace, and reusing the same names
    > makes sense !
    >
    > Then, I wondered about what copy-on-write optimization became in dict
    > world.
    > Indeed, if $x is a dict value,
    >
    > set y $x
    >
    > copies it, in an efficient way (by reference, incrementing just the
    > refcount).
    > Now, the dict value is "shared" (refcount>=2). So as soon as someone
    > touches it:
    >
    > dict incr x foo
    >
    > it gets duplicated, so that $x and $y can have separate destinies.
    > That's copy-on-write.
    >
    > Now suppose I have a huge dict, stored in some global, and that I pass
    > its value back and forth to various routines. Its refcount will go up
    > with each level of proc call where it is passed as an argument, since
    > the argument is a local holding a bona fide reference. Suddenly, under
    > a stack depth of (say) three, I decide to update the dict, eg with a
    > [dict incr] on the global variable. Won't the huge dict be immediately
    > duplicated ?

    The Tcl_Obj representing your 'foo' entry at least needs to be
    duplicated and incremented. And as this changes the whole dict, you also
    get a duplicate of the hash table structure.

    DictIncrCmd in generic/tclDictObj.c tells you the details about this,
    there is some optimization in place so that huge string reps aren't
    copied. The code in DupDictInternalRep then tells you that the other
    values aren't copied, but only refcount increased.

    So yes, there is a copy, but its kept close to the minimum possible. You
    need a new hashtable struct...

    Michael


  3. Default Re: Dict sharing vs. duplication

    On Dec 20, 3:42 pm, Michael Schlenker <schl...@uni-oldenburg.de>
    wrote:
    > Alexandre Ferrieux schrieb:
    >
    > > Hi,

    >
    > > Now that 8.5 is out thanks to intensive work from our beloved
    > > Conspicuous Tcl Illuminati, I'm getting prepared to use dicts
    > > really...

    >
    > > Looking at the dict manpage, I first discovered that no obvious naming
    > > scheme tells in-place operations from functional ones. Of course
    > > that's not worse than lappend/lreplace, and reusing the same names
    > > makes sense !

    >
    > > Then, I wondered about what copy-on-write optimization became in dict
    > > world.
    > > Indeed, if $x is a dict value,

    >
    > > set y $x

    >
    > > copies it, in an efficient way (by reference, incrementing just the
    > > refcount).
    > > Now, the dict value is "shared" (refcount>=2). So as soon as someone
    > > touches it:

    >
    > > dict incr x foo

    >
    > > it gets duplicated, so that $x and $y can have separate destinies.
    > > That's copy-on-write.

    >
    > > Now suppose I have a huge dict, stored in some global, and that I pass
    > > its value back and forth to various routines. Its refcount will go up
    > > with each level of proc call where it is passed as an argument, since
    > > the argument is a local holding a bona fide reference. Suddenly, under
    > > a stack depth of (say) three, I decide to update the dict, eg with a
    > > [dict incr] on the global variable. Won't the huge dict be immediately
    > > duplicated ?

    >
    > The Tcl_Obj representing your 'foo' entry at least needs to be
    > duplicated and incremented. And as this changes the whole dict, you also
    > get a duplicate of the hash table structure.
    >
    > DictIncrCmd in generic/tclDictObj.c tells you the details about this,
    > there is some optimization in place so that huge string reps aren't
    > copied. The code in DupDictInternalRep then tells you that the other
    > values aren't copied, but only refcount increased.
    >
    > So yes, there is a copy, but its kept close to the minimum possible. You
    > need a new hashtable struct...
    >
    > Michael


    This is interesting but I would like to ask: is this just a
    performance convenience that we should not think about and rely upon?
    It seems that if you want to use a dict as a single reference, you
    should just pass around the name of the dict, or use upvar, etc. Dict
    offers the potential to make a copy, which is cool, it seems great
    that the implementation tries to minimize copying, I just wonder if
    Tcl'ers should think about this. In Tcl the best way to not duplicate
    objects is to refer to them directly or through upvar.

  4. Default Structures vs. Dicts Re: Dict sharing vs. duplication

    Alexandre Ferrieux wrote:
    > Hi,
    >
    > Now that 8.5 is out thanks to intensive work from our beloved
    > Conspicuous Tcl Illuminati, I'm getting prepared to use dicts
    > really...


    I think dicts have a place in Tcl for some data patterns. They
    serialize automatically to a string rep, and that's quite useful for
    some things, but I can't help but desire to have [structure] be in the
    core too, and I think others may see a reason for them too.

    structure is the basis for NexTk.


    proc Foo-check {inst arg} {
    #When this returns 1 then the structure key will be set.
    #When this returns 0 or an error the structure key will not be set.
    expr {$arg eq "foo" || $arg eq "bar"}
    }

    proc Foo-dance {inst args} {
    puts "Foo is dancing with $args"
    }

    proc Foo {inst args} {
    structure $inst

    $inst -text "" -metadata ""

    structure-make-method $inst dance[list Foo-dance $inst]
    #This prevents typos from creating new data structure keys.
    structure-lock-creation $inst

    if {[llength $args]} {
    if {[catch {$inst {*}$args} err]} {
    structure-free $inst ;#or rename $inst {}
    return -code error $err
    }
    }

    structure-key-callback $inst -metadata[list Foo-check $inst]

    return $inst
    }


    % Foo .f -text "Hello"
    ..f
    % .f -text
    Hello
    % .f -metadata
    % .f -metadata foo
    1
    % .f -metadata
    foo
    % .f -metadata bozovalue
    0
    % .f -metadata
    foo
    % .f dance Happy Elves
    Foo is dancing with Happy Elves


    There are other neat things, like traces for keys, and per key/structure
    locking. Structure traces are more like Tcl traces, in that they run
    after the variable/key has been set. The structure callbacks make the
    common pattern of verifying an object's valid input patterns much
    easier.

    I basically use [structure] as a prototype object system (prototype in
    OO terms, rather than as in a prototype that is for testing.) Each
    ntk_window is a structure instance from which a window/widget tree is
    formed based on a _children[list] member.


    George

  5. Default Re: Dict sharing vs. duplication

    On Dec 21, 12:42 am, Michael Schlenker <schl...@uni-oldenburg.de>
    wrote:
    > Alexandre Ferrieux schrieb:
    >
    > > Won't the huge dict be immediately
    > > duplicated ?

    >
    > So yes, there is a copy, but its kept close to the minimum possible. You
    > need a new hashtable struct...


    Yeah, the "minimum possible" here is still way too much.
    The point of my post was to check whether a smart trick existed to
    avoid duplication entirely, just like there is never any automatic
    duplication of an array. It seems that no such trick exists, right ?

    -Alex

  6. Default Re: Dict sharing vs. duplication

    On Dec 21, 3:29 am, "tom.rmadilo" <tom.rmad...@gmail.com> wrote:
    >
    > In Tcl the best way to not duplicate
    > objects is to refer to them directly or through upvar.


    What we've just shown is that [array/upvar] is the *only* way to be
    sure to avoid duplications. As soon as your dict value gets bound to a
    formal parameter or other variable, you're at risk.

    -Alex



  7. Default Re: Dict sharing vs. duplication

    On Dec 20, 11:39 pm, Alexandre Ferrieux <alexandre.ferri...@gmail.com>
    wrote:
    > On Dec 21, 3:29 am, "tom.rmadilo" <tom.rmad...@gmail.com> wrote:


    > > In Tcl the best way to not duplicate
    > > objects is to refer to them directly or through upvar.

    >
    > What we've just shown is that [array/upvar] is the *only* way to be
    > sure to avoid duplications. As soon as your dict value gets bound to a
    > formal parameter or other variable, you're at risk.


    I admit I was somewhat confused by dict, but they still have names
    just like anything else in Tcl, and if you pass the name only, it
    shouldn't 'bind' anything. I'm actually more confused by the
    terminology of bound values and formal parameters applied to Tcl.

    The way I seem to look at this is that a dict is somewhat like an
    array, but you can easily get its value and pass it around. I'm not a
    fan (yet) by any means, but the ability to easily get the value could
    be important in certain situations. For one, I'm thinking of the error
    code stuff. It occurs to me that these are array-like, but if an array
    were created with an inner proc, it would go out of scope and
    evaporate, whereas a dict could be passed back to the caller and added
    to as things unwind. I'm just guessing here, but it seems like the
    ability to pass array-link stuff back and add to it is something
    unique.

    It is pretty surprising that you could get partial non-copy of data
    with a dict (or partial reference, whatever). Maybe it saves memory or
    whatever, but it seems to me it is much safer to decide where you are
    going to keep the single copy of your data, and just refer to it by
    name (or via upvar). Thinking about saving space by guessing what
    internals are doing, or not, and when, seems like a diversion. Arrays
    work very well, their behavior is well understood. Dicts are pretty
    new and I think it will take some time to understand when to use them.
    I'm a pretty verbose writer, but dicts seem way too verbose for my
    taste, and using 'for' when you get 'foreach' is somewhat strange.

    As a data structure I don't find much of a use for this one. I use an
    array to enforce unique keys and I use lists to enforce order. I
    combine the two in a number of ways: array of lists, list of array
    names, etc. But dict is similar to a structure, but is also somewhat
    squishy for the way I think about data. The one thing that I like is
    that they are easy to serialize, so you can easily write them out or
    pass them around.

  8. Default Re: Dict sharing vs. duplication

    Alexandre Ferrieux wrote:
    > On Dec 21, 12:42�am, Michael Schlenker <schl...@uni-oldenburg.de>
    > wrote:
    > > Alexandre Ferrieux schrieb:
    > >
    > > > Won't the huge dict be immediately
    > > > duplicated ?

    > >
    > > So yes, there is a copy, but its kept close to the minimum possible. You
    > > need a new hashtable struct...

    >
    > Yeah, the "minimum possible" here is still way too much.
    > The point of my post was to check whether a smart trick existed to
    > avoid duplication entirely, just like there is never any automatic
    > duplication of an array. It seems that no such trick exists, right ?


    K combinator when passing the value, so you do not retain a copy...

    Michael

  9. Default Re: Dict sharing vs. duplication

    schlenk@uni-oldenburg.de wrote:
    > Alexandre Ferrieux wrote:
    >> On Dec 21, 12:42�am, Michael Schlenker <schl...@uni-oldenburg.de>
    >> wrote:
    >>> Alexandre Ferrieux schrieb:
    >>>
    >>>> Won't the huge dict be immediately
    >>>> duplicated ?
    >>> So yes, there is a copy, but its kept close to the minimum possible. You
    >>> need a new hashtable struct...

    >> Yeah, the "minimum possible" here is still way too much.
    >> The point of my post was to check whether a smart trick existed to
    >> avoid duplication entirely, just like there is never any automatic
    >> duplication of an array. It seems that no such trick exists, right ?

    >
    > K combinator when passing the value, so you do not retain a copy...


    Sadly that will not work, as TWO refCounts are added when passing an arg
    to a proc:
    - one for the ref in the objc/objv array (Tcl stack within TEBC)
    - one for the ref of the proc's variable

    That means that having more refs in the caller's context (which you
    remove with the K trick) is irrelevant to the COW behaviour.

    Avoiding this double-ref is not easy, and will not happen soon (except
    if someone comes up with a great idea). It would involve either of
    (a) a non-trivial change in api semantics (callee manages objc/objv,
    as opposed to caller)
    (b) some elaborate (aka cliff-hanging, aka hacky) special handling
    for argument variables at runtime (as opposed to all other vars). This
    would have a (perceptible?) cost in runtime, and a (definite!) cost in
    added complexity.

    I have been thinking about changing the api only for procs (a), but not
    as a hi-prio project.

  10. Default Re: Dict sharing vs. duplication

    On Dec 21, 2:39 am, Alexandre Ferrieux <alexandre.ferri...@gmail.com>
    wrote:

    >
    > What we've just shown is that [array/upvar] is the *only* way to be
    > sure to avoid duplications. As soon as your dict value gets bound to a
    > formal parameter or other variable, you're at risk.



    Weird - I thought the whole reason for dict was so that we had a data
    structure that was treated as a first class object - instead, we have
    Just Another Second Class Structure (JASCS)?


+ Reply to Thread
Page 1 of 11 1 2 3 ... LastLast