TIP #313 Published for Peter Spjuth - TCL

This is a discussion on TIP #313 Published for Peter Spjuth - TCL ; On Feb 15, 11:17 pm, Donald Arseneau <a...@triumf.ca> wrote: > The actual TIP313-author (Peter Spjuth) wrote: > > mutually exclusive with other search mode flags such as *-sorted*. > > Would it make sense to say that *-sorted* is implied and ...

+ Reply to Thread
Page 3 of 3 FirstFirst 1 2 3
Results 21 to 29 of 29

TIP #313 Published for Peter Spjuth

  1. Default Re: TIP #313 Published for Peter Spjuth

    On Feb 15, 11:17 pm, Donald Arseneau <a...@triumf.ca> wrote:
    > The actual TIP313-author (Peter Spjuth) wrote:
    > > mutually exclusive with other search mode flags such as *-sorted*.

    >
    > Would it make sense to say that *-sorted* is implied and therefore
    > superfluous, but it would be accepted if given.


    That would make -bisect a modifier to -sorted instead of a search
    mode,
    and that would make perfect sense. Not sure if is better though...
    I'll look into what this would imply implementation and documentation
    wise to see which looks better of the two.

    /Peter

  2. Default Re: TIP #313 Published for Peter Spjuth

    On Fri, 15 Feb 2008 07:31:10 -0800 (PST),
    fredericbonnet@free.fr wrote:

    > Great idea. I have a few suggestions, though.
    > I've used the C++ STL a lot for the past few years. What I appreciate
    > the most is the overall consistency in the names, and the variety of
    > algorithms. Here a Tcl list with sorted elements would be close to a
    > STL "set" container. The STL "set" provides two methods, "lower_bound"
    > and "upper_bound", that respectively find "the first element whose key
    > is not less than k", and "the first element whose key is greater than
    > k" (it also provides "equal_range" that gives the pair of bounds).
    > This distinction is especially useful with bags or "multisets" (i.e.
    > sorted sets with with multiple elements having the same value), or
    > with reals.


    I strongly agree. The TIP even in its present form, with option 2 as
    the default, is quite nice, and touches on a oft-wanted idea. Although
    the name -bisect is non-obvious. It sounds like it does something quite
    different (which I'll get to at the end, mostly out of interest sake)...

    The ability to find BOTH ends is important, for many purposes that I've
    run across myself. Most of them can be handled by "-all" or "-not
    -all", ignoring the fact that the list is sorted entirely, rather than
    accept the hit incurred by a second-stage search to find the other end.

    Presently, we have -sorted which corresponds to option 3 (first
    where element is greater than or equal to key), except that it returns
    -1 if it fails to match at all. I think a modifier to select which of
    the four given would be the perfect solution, thus:

    -sorted=-1 - gives the present -sorted behaviour
    -sorted=1-4 - gives the four cases listed in the tip
    -sorted=0 - gives Alexandre fastest-possible match.

    I do think offering all four possibilities is good, even though two are
    trivial modifications of the other two, on the basis that it leaves the
    implementation free to change exactly which case it finds for either
    end, without having to worry about consistency.

    And Alexandre's suggestion is good where there's the possibility of a
    lot of equal values, to just stop at the first match (lsearch should be
    doing that anyhow, where -inline is given).

    The way I see it in my head (which at present, may not be saying much),
    all six options should functionally boil down quite neatly. The
    numeric values could also further be replaced by names, in the same
    manner as [return -code].


    As an aside, how hard WOULD it be to obtain BOTH ends in the one
    search? It would mean stopping the search at the match/mismatch point,
    handling cases -1 and 0, otherwise continuing to find one end if
    requested, and then the other end from the same state, if requested, and
    finally applying the adjustments if requested. If not exceedingly
    difficult, "inner" and "outer" modifiers would be absolutely magic:
    set inner [lrange $list {*}[lsearch -sorted=inner $list VAL]]
    set outer [lreplace $list {*}[[lsearch -sorted=inner $list VAL]]
    or combined with the -inline switch
    set inner [lsearch -inline -sorted=inner $list VAL]
    not sure exactly how -sorted=outer would interact with -inline, but it
    also raises the possibility of how I imagined -bisect to mean when I
    first heard it recommended just pre-TIP:
    set lower [lrange $list 0 [lsearch -sorted=1 $list VAL]]
    set upper [lrange $list [lsearch -sorted=4 $list VAL]]
    set bisect[list $lower $upper]
    (except that I'd personally include the list of found values in between)


    Fredderic

  3. Default Re: TIP #313 Published for Peter Spjuth

    Fredderic wrote:
    > The ability to find BOTH ends is important, for many purposes that I've
    > run across myself. Most of them can be handled by "-all" or "-not
    > -all", ignoring the fact that the list is sorted entirely, rather than
    > accept the hit incurred by a second-stage search to find the other end.


    Can you discuss those use cases further? My experience is that while I
    often need "the set of all elements that match the key," I generally
    need them because I'm doing something to every one. If that's the case,
    then I'm incurring linear time anyway, and there simply isn't a
    significantly better way to do it than "find the first" and then "walk
    through the rest." Similarly, if I'm looking to delete all the
    elements that match the key, I have to copy the tail of the list, if
    not the entire list; again, a linear-time operation (as opposed to
    STL's sets, where it may be possible to merge subtrees).

    I don't want to see what should be a simple operation (find an interval
    bound) start sprouting options, like Ptolemaic epicycles, to cover
    entirely imaginary use cases, particularly to save a second operation
    with a logarithmic time bound. (In fact, having a logarithmic time bound
    is good enough that I never troubled to do the C implementation of
    BSearch. So far, other performance issues have been the "long pole in
    the tent".)

    --
    73 de ke9tv/2, Kevin

  4. Default Re: TIP #313 Published for Peter Spjuth

    Fredderic wrote:
    > Presently, we have -sorted which corresponds to option 3 (first
    > where element is greater than or equal to key), except that it returns
    > -1 if it fails to match at all. I think a modifier to select which of
    > the four given would be the perfect solution, thus:
    >
    > -sorted=-1 - gives the present -sorted behaviour
    > -sorted=1-4 - gives the four cases listed in the tip
    > -sorted=0 - gives Alexandre fastest-possible match.


    Oh God! No!

    The [lsearch] command is already excessively complex IMO, and TIP#313 is
    pushing the envelope even further. What we don't need on top is an extra
    chunk of excessive rococo curlicues on top (magic numbers, no less!)
    that make an already-difficult API even worse.

    Donal.

  5. Default Re: TIP #313 Published for Peter Spjuth

    Donal K. Fellows wrote:
    > The [lsearch] command is already excessively complex IMO, and TIP#313 is
    > pushing the envelope even further. What we don't need on top is an extra
    > chunk of excessive rococo curlicues on top (magic numbers, no less!)
    > that make an already-difficult API even worse.


    First, let us rule out entirely Peter's cases 1 and 3 (first element
    greater than the key, and first element greater than or equal to the
    key). They are inconsistent with the convention of returning -1
    for a failed search, and they can be got trivially by adding 1 to
    the index returned from cases 2 and 4. (Proof below.)

    So we're down to two cases:

    Last element <= key
    Last element < key

    The tcl::clock::BSearch procedure chooses 'last element <= key'. It
    is indeed imaginable that 'last element < key' is also a useful
    thing to request.

    If we have 'last element <= key', then 'first element > key' is
    trivial to obtain - simply add 1 to the returned index. Similarly,
    if we have 'last element < key', then 'first element >= key' is
    obtained by adding 1 to the index. We've therefore accumulated
    a complete set of operations.

    Alexandre's case 5 (return an undefined value somewhere in
    the correct range) is motivated by performance. I refuse
    to worry about a constant factor on what is already a O(log n)
    operation. Fredderic's desire to return both ends
    in a single search operation is also motivated by performance,
    and again is a constant factor on a O(log n) operation.
    To my mind, simplicity wins over these constant factors.

    If keys are well ordered (e.g., if they are integers), then
    "last element < key" is also trivial to get from "last element
    <= key-1". Alas, for the case of string keys, we don't have
    a natural well-ordering.

    If forced to choose only one of the two, "laet element <= key"
    has obvious use cases:

    - if you are looking up or interpolating a function value
    from a table of values, it finds a consistent starting
    point. This is the case in tcllib's ::math::interpolate,
    KHIM's BSearch (find whether a code point is or is not
    in a tabulated set of intervals), and ::tcl::clock::BSearch
    (find the last Daylight Saving Time transition at or
    before a given time)

    - if you are finding the insertion point to place an element
    in an ordered list or priority queue (being aware, of
    course, that this operation is fundamentally a O(N)
    operation anyway), this operation has found it; simply
    add 1.

    The case for the other end of the interval is less compelling:

    - if you are performing some operation on "all elements
    matching the key", then you'll be taking time proportional
    to the number of matching elements. Starting from the
    last element that matches, and looping to the left until
    finding an element that doesn't match, is also linear
    in the number of matches. There's no performance difference
    to be had here.

    - if you are deleting all elements matching the key, the
    situation is even worse. The deletion will copy at least
    all the elements to the right of the key, and if the list
    is shared, will copy the entire list; either way, it's
    O(N) where N is the size of the list. Again, a linear
    search from the returned match point to find the other
    end of the interval adds only a constant factor.

    I'm not entirely convinced that the additional complexity of
    more -options to [lsearch] to get the left side of the interval
    is justified by the performance to be gained; as far as I'm
    concerned, the jury is still out on Peter's case 4.

    In short, Peter's proposal, exactly as written, handles the
    common use cases with a minimum of additional syntax.

    --
    73 de ke9tv/2, Kevin

    ----------------------------------------------------------------------

    To prove that Peter's cases 1 and 3 are inconsistent with the
    [lsearch] convention of returning -1 for failure, while cases
    2 and 4 are consistent with it:

    Assume without loss of generality that the list comprises only
    elements 'a', 'b', and 'c', and that the search is looking for
    element 'b'.

    Case 0: All of a, b, c are present. {a a b b c c} is an example.
    Peter's case 1 returns 4 (index of the first 'c')
    Peter's case 2 returns 3 (index of the last 'b')
    Peter's case 3 returns 2 (index of the first 'b')
    Peter's case 4 returns 1 (index of the last 'a')

    Case 1: The list ends with the last 'b' {a a b b}
    Peter's case 1 needs to return the index of the (nonexistent)
    element following the last 'b' and hence should still return
    4. This is inconsistent with the usual rule that [lsearch]
    returns -1 for a nonexistent element. Let us tentatively rule
    out Peter's case 1.
    Peter's case 2 returns 3 (index of the last 'b')
    Peter's case 3 returns 2 (index of the first 'b')
    Peter's case 4 returns 1 (index of the last 'a')

    Case 2: The list contains no 'b' but has elements both before and
    after 'b'. {a a c c}
    Peter's case 1 returns 2 (index of the first element > b)
    Peter's case 2 returns 1 (index of the last element <= b)
    Peter's case 3 returns 2 (index of the first element >= b)
    Peter's case 4 returns 1 (index of the last element <= b)

    Case 3: The list contains only elements < b {a a}
    Peter's case 1 returns the index of the (nonexistent)
    following the last 'b' and hence should return 2; the same
    problem as with the list in case 1 above.
    Peter's case 2 returns 1 (index of the last element <= b)
    Peter's case 3 needs to return the index of the (nonexistent)
    first element >= b (hence 2), which is inconsistent with
    the rule that [lsearch] returns -1 for a nonexistent element.
    We tentatively reject Peter's case 3.
    Peter's case 4 returns 1 (index of the last element < b).

    OK, we're halfway there...

    Case 4: The list contains no elements before the first 'b' {b b c c}
    Peter's case 1 returns 2 (index of the first element > b)
    Peter's case 2 returns 1 (index of the last element <= b)
    Peter's case 3 returns 0 (index of the first element >= b)
    Peter's case 4 needs to return the index of the (nonexistent)
    last element that is less than 'b', and hence needs to return
    -1. This value is consistent with the convention of having
    [lsearch] return -1 for failure, so we can retain Peter's
    case 4.

    Case 5: The list contains only b's {b b}
    Peter's case 1 needs to return 2 (the index of the nonexistent
    element following the last 'b') - inconsistent with
    [lindex]'s -1.
    Peter's case 2 returns 1 (last element <= b)
    Peter's case 3 returns 0 (first element >= b)
    Peter's case 4 returns -1 (either [lsearch] failure, or
    index of the nonexistent element before the first b)

    Case 6: The list contains only elements > b {c c}
    Peter's case 1 returns 0 (index of the first element > b)
    Peter's case 2 returns -1 (either [lsearch] failure, or index
    of the nonexistent last element <= b)
    Peter's case 3 returns 0 (index of the first element >= b)
    Peter's case 4 returns -1 (either [lsearch] failure, or else
    index of the nonexistent last element < b)

    Case 7: The list is empty.
    Here we'd like Peter's cases 1 and 3 to return 0 (the index
    of the (nonexistent) first element of the list), while
    cases 2 and 4 would return -1 (the index of the nonexistent
    last element preceding the list). Cases 2 and 4 remain
    consistent with [lsearch]'s convention of 'return -1 on
    failure'.

  6. Default Re: TIP #313 Published for Peter Spjuth

    Kevin Kenny wrote:
    > In short, Peter's proposal, exactly as written, handles the
    > common use cases with a minimum of additional syntax.


    As things stand, the [lsearch] command is now really three commands in
    one, corresponding to doing a linear search, a binary search (which #313
    proposes to improve), and filtering. I keep thinking it would be nice if
    these weren't overloaded on the same command, as it makes [lsearch]
    difficult to learn.

    I don't dispute that Peter's chosen a good algorithm. That's not my
    objection at all.

    Donal.

  7. Default Re: TIP #313 Published for Peter Spjuth

    Donal K. Fellows wrote:
    > As things stand, the [lsearch] command is now really three commands in
    > one, corresponding to doing a linear search, a binary search (which #313
    > proposes to improve), and filtering. I keep thinking it would be nice if
    > these weren't overloaded on the same command, as it makes [lsearch]
    > difficult to learn.


    Quite so, except that it's really mor like half-a-dozen because
    there are several mutually exclusive filters there, too.

    I agree that [lsearch] is a mess, and I'd prefer doing this piece
    with a new command. (I'd also be inclined to provide new commands
    for the various mutually exclusive modes of [lsearch] also.) But
    I don't want to lay all that as a requirement on Peter in order to
    push this thing forward.

    (And I knew that *you* agreed that Peter had chosen a good
    algorithm. It was just easier to address your comments, Alexandre's,
    and Fredderic's in a single post. I'm lazy.)

    --
    73 de ke9tv/2, Kevin

  8. Default Re: TIP #313 Published for Peter Spjuth

    On Sat, 23 Feb 2008 12:10:42 -0500,
    Kevin Kenny <kennykb@acm.org> wrote:

    > Donal K. Fellows wrote:
    >> The [lsearch] command is already excessively complex IMO, and
    >> TIP#313 is pushing the envelope even further. What we don't need on
    >> top is an extra chunk of excessive rococo curlicues on top (magic
    >> numbers, no less!) that make an already-difficult API even worse.

    > First, let us rule out entirely Peter's cases 1 and 3 (first element
    > greater than the key, and first element greater than or equal to the
    > key). They are inconsistent with the convention of returning -1
    > for a failed search, and they can be got trivially by adding 1 to
    > the index returned from cases 2 and 4.


    We already have an option that returns -1 for a failed search. And the
    insertion point for a value greater than any other in the list IS the
    index of the non-existent item just after the list.

    As demonstrated the last item >= to the searched value is trivial to
    obtain from this index, but furthermore it's handy for counting the
    items in between, as it avoids the rather annoyingly common error of
    forgetting to add an extra 1.

    And since we already have a test for whether the item is present in the
    list, which returns the left-most item if it is, the insertion point
    just to the right of the list is the natural compliment.


    Having to use [lsearch] to linearly hunt down one end of an equal set
    from the other, is bad if you're not interested in the intervening
    items, which is why I still believe being able to find both ends (not
    necessarily at the same time, though that would be awfully nice) is
    still important.

    I do accept that if you've already decided on the absolute simplest
    implementation for the new option, then either "first >" or "last >="
    will do, but neither should return -1, ever. We've already got that,
    so maintaining consistency with -sorted is merely duplication of
    existing functionality.


    Fredderic

  9. Default Re: TIP #313 Published for Peter Spjuth

    Fredderic wrote:
    > I do accept that if you've already decided on the absolute simplest
    > implementation for the new option, then either "first >" or "last >="
    > will do, but neither should return -1, ever. We've already got that,
    > so maintaining consistency with -sorted is merely duplication of
    > existing functionality.


    I presume that you mean "last<=", since "last>=" would always be the
    end of the list, presuming that any item is greater than or equal
    to the key.

    "Last<=" should indeed return -1 if element 0 is greater than the
    key; -1 is the index of the nonexistent last element less than or
    equal to the key.

    --
    73 de ke9tv/2, Kevin

+ Reply to Thread
Page 3 of 3 FirstFirst 1 2 3