Natural sorting optimizing... and a working solution :) - Delphi

This is a discussion on Natural sorting optimizing... and a working solution :) - Delphi ; Hi, I have been working with a natural sorting solution today and i was wondering if someone could have a look and maybe suggest how to optimize it. It works, but it is ugly code. It will for sure not ...

+ Reply to Thread
Page 1 of 3 1 2 3 LastLast
Results 1 to 10 of 26

Natural sorting optimizing... and a working solution :)

  1. Default Natural sorting optimizing... and a working solution :)

    Hi,

    I have been working with a natural sorting solution today and i was
    wondering if someone could have a look and maybe suggest how to optimize it.
    It works, but it is ugly code. It will for sure not work with tiburian
    strings. How should I work with strings when I need to extract each char?
    Well anyway it is a good starting point if you are looking for this kind of
    sorting. I have looked very hard without finding a single delphi
    implementation so here is your chance
    I am using the code to sort filenames.

    Here is the code:

    function CompareNatural(s1, s2: String): Integer;
    var
    n1, n2, j: Integer;
    c1, c2: Char;
    t1, t2: String;
    b: Boolean;

    function ExtractNr(start: Integer; var txt: String): Integer;
    var
    n: Integer;
    s: String;
    begin
    Result:= 0;
    n:= start;
    while (n <= Length(txt)) and (txt[n] > #47) and (txt[n] < #58) do
    n:= n + 1;
    s:= Copy(txt, start, (n - start));
    Delete(txt, start - 1, (n - start) + 1);
    if s <> '' then
    Result:= StrToInt(s)
    end;

    begin
    t1:= LowerCase(s1);
    t2:= LowerCase(s2);
    j:= 1;
    b:= False;
    while (not b) do
    begin
    result:= 0;
    c1:= t1[j];
    c2:= t2[j];
    if c1 <> c2 then
    begin // Check if number...
    if ((c1 > #47) and (c1 < #58)) and ((c2 > #47) and (c2 < #58)) then
    begin
    n1:= ExtractNr(j, s1);
    n2:= ExtractNr(j, s2);
    if n1 = n2 then
    Result := 0
    else
    begin
    if n1 > n2 then
    Result := 1
    else
    Result := -1;
    b:= True;
    end;
    end else
    begin
    if c1 = c2 then // Check if chars are equal
    Result := 0
    else
    begin
    if c1 > c2 then
    Result := 1
    else
    Result := -1;
    b:= True;
    end;
    end;
    b:= b or (j = Min(Length(t1), Length(t2)));
    end;
    b:= b or (j = Min(Length(t1), Length(t2)));
    j:= j + 1;
    end;
    if result = 0 then
    begin
    if Length(t1) <> Length(t2) then
    if Length(t1) > Length(t2) then
    Result:= 1
    else
    Result:= -1;
    end;
    end;


  2. Default Re: Natural sorting optimizing... and a working solution :)

    "Roy Magne Klever" <a@b.c> wrote in message
    news:48878d77$1@newsgroups.borland.com...
    >
    > I have been working with a natural sorting solution today and i was
    > wondering if someone could have a look and maybe suggest how to optimize
    > it.


    Your code currently errs if the first two characters are the same, it skips
    all processing and just compares length.

    If there is a number, can it appear at any starting position or is it always
    at the beginnning of the string?

    If it can start later in the string then you waste time comparing characters
    just to compare the numbers. If they always start at the beginning then
    replace your code searching for numbers with a call to Val() - it will give
    you back the number if there is a valid one at the start of the string (and
    an error position for the character following if any).

    If the string does not contain a string then use the built-in routines - you
    want case insensitive so try calling CompareText or AnsiCompareText.

    --
    Wayne Niddery - TeamB (www.teamb.com)
    Winwright, Inc. (www.winwright.ca)


  3. Default Re: Natural sorting optimizing... and a working solution :)

    > Your code currently errs if the first two characters are the same, it
    > skips all processing and just compares length.


    > If there is a number, can it appear at any starting position or is it
    > always at the beginnning of the string?


    Hi! I could not reproduse that error. I tested withe the following strings:

    pic1, pic011p001, pic2, pic10p002, pic11p2

    and it sorted to:

    pic1, pic2, pic10p002, pic011p001, pic11p2

    which is very correct according to what i want but the nubers can be
    every where as you see from the sample ... it extracts the nuber and if
    equal it continues to compare the string ... until it finds another number
    or the result is reached.

    but your suggestions are good . I can check before I start number
    extraction.

    ---
    Roy




  4. Default Re: Natural sorting optimizing... and a working solution :)

    Wayne Niddery (TeamB) wrote:
    > "Roy Magne Klever" <a@b.c> wrote in message
    > news:48878d77$1@newsgroups.borland.com...
    >>
    >> I have been working with a natural sorting solution today and i was
    >> wondering if someone could have a look and maybe suggest how to
    >> optimize it.

    >
    > Your code currently errs if the first two characters are the same, it
    > skips all processing and just compares length.


    No, if the characters are the same, it checks whether it's at the end of
    either of the strings (j = Min(Length(t1), Length(t2))). If it's not,
    then b remains False and the loop goes around again, looking at the next
    character.

    > If there is a number, can it appear at any starting position or is it
    > always at the beginnning of the string?
    >
    > If it can start later in the string then you waste time comparing
    > characters just to compare the numbers.


    No, if the leading characters are different, then the numbers don't
    matter and they're never inspected.

    > If they always start at the
    > beginning then replace your code searching for numbers with a call to
    > Val() - it will give you back the number if there is a valid one at the
    > start of the string (and an error position for the character following
    > if any).
    >
    > If the string does not contain a string then use the built-in routines -
    > you want case insensitive so try calling CompareText or AnsiCompareText.
    >


    The function fails when either string is empty. It also fails when s1 =
    'p1q8' and s2 = 'p0001q5'. It should return 1, but it returns -1 instead.

    The reason is that it does all its comparisons with t1 and t2, but it
    extracts its numbers from s1 and s2. After the first set of numbers is
    extracted, the index j is no longer accurate for all four strings. I
    expect there should be two separate index variables, and only two strings.

    It also fails when s1 = '9876543210' and s2 = '9876543210'. It should
    return 0, but instead it doesn't return any value.

    --
    Rob

  5. Default Re: Natural sorting optimizing... and a working solution :)

    Hi Rob,

    You are right! It fails if either string is empty but no string will be
    empty ... since filenames can not be empty. I figured out the mistake about
    diffrent strings s1, s2 versus t1, t2. Bad mistake. Correcting it fixes
    much. I will add quick check and empty string detection. But still, the code
    is messy.

    > The function fails when either string is empty. It also fails when s1 =
    > 'p1q8' and s2 = 'p0001q5'. It should return 1, but it returns -1 instead.


    Fixed by correcting the s1, s2 to t1, t2.

    > The reason is that it does all its comparisons with t1 and t2, but it
    > extracts its numbers from s1 and s2. After the first set of numbers is
    > extracted, the index j is no longer accurate for all four strings. I
    > expect there should be two separate index variables, and only two strings.


    Good observed, I just a long time to discover that...

    > It also fails when s1 = '9876543210' and s2 = '9876543210'. It should
    > return 0, but instead it doesn't return any value.


    Ok will check...

    Thanks for taking a look and giving feedback, I get ideas

    ---
    Roy


  6. Default Re: Natural sorting optimizing... and a working solution :)

    Roy Magne Klever wrote:
    > I have been working with a natural sorting solution today and i was
    > wondering if someone could have a look and maybe suggest how to optimize
    > it. It works,


    No it doesn't. You haven't tested it thoroughly.

    > but it is ugly code.


    That has nothing to do with optimization.

    > It will for sure not work with
    > tiburian strings.


    I think it would, provided two things are true: Lowercase() is defined
    suitably, and you only want to consider the characters '0' through '9',
    not any other number-like characters.

    > How should I work with strings when I need to extract
    > each char?


    You mean in Tiburon? The same way you always do. When "string" changes,
    so will "Char."

    > Well anyway it is a good starting point if you are looking
    > for this kind of sorting. I have looked very hard without finding a
    > single delphi implementation so here is your chance


    I remember discussing this on the newsgroups a few years ago.

    http://groups.google.com/group/borla...9f251fa595dbdb

    The site where the final result was posted doesn't seem to exist
    anymore. It's been nearly five years, though.

    The Windows API function for comparing strings while considering
    embedded numeric values is StrCmpLogical. Others have written about that
    function, and sometimes when they implement a equivalent function, they
    give it that name. For instance:

    http://groups.google.com/group/borla...41d49f8bbba577

    > I am using the code to sort filenames.
    >
    > Here is the code:
    >
    > function CompareNatural(s1, s2: String): Integer;
    > var
    > n1, n2, j: Integer;
    > c1, c2: Char;
    > t1, t2: String;
    > b: Boolean;
    >
    > function ExtractNr(start: Integer; var txt: String): Integer;
    > var
    > n: Integer;
    > s: String;
    > begin
    > Result:= 0;
    > n:= start;
    > while (n <= Length(txt)) and (txt[n] > #47) and (txt[n] < #58) do
    > n:= n + 1;
    > s:= Copy(txt, start, (n - start));
    > Delete(txt, start - 1, (n - start) + 1);
    > if s <> '' then
    > Result:= StrToInt(s)
    > end;
    >
    > begin
    > t1:= LowerCase(s1);
    > t2:= LowerCase(s2);
    > j:= 1;
    > b:= False;
    > while (not b) do
    > begin
    > result:= 0;
    > c1:= t1[j];
    > c2:= t2[j];
    > if c1 <> c2 then
    > begin // Check if number...
    > if ((c1 > #47) and (c1 < #58)) and ((c2 > #47) and (c2 < #58)) then
    > begin
    > n1:= ExtractNr(j, s1);
    > n2:= ExtractNr(j, s2);
    > if n1 = n2 then
    > Result := 0
    > else
    > begin
    > if n1 > n2 then
    > Result := 1
    > else
    > Result := -1;
    > b:= True;
    > end;
    > end else
    > begin
    > if c1 = c2 then // Check if chars are equal
    > Result := 0


    This is pointless. You already know they're not equal. It's how you go
    into this "Check if number" block in the first place.

    > else
    > begin
    > if c1 > c2 then
    > Result := 1
    > else
    > Result := -1;
    > b:= True;
    > end;
    > end;
    > b:= b or (j = Min(Length(t1), Length(t2)));
    > end;
    > b:= b or (j = Min(Length(t1), Length(t2)));


    You have the same line of code twice. The first one is unnecessary.

    > j:= j + 1;
    > end;
    > if result = 0 then
    > begin
    > if Length(t1) <> Length(t2) then
    > if Length(t1) > Length(t2) then
    > Result:= 1
    > else
    > Result:= -1;
    > end;
    > end;
    >



    --
    Rob

  7. Default Re: Natural sorting optimizing... and a working solution :)

    Hi Rob,

    Yes you are right it did not work but it seemd to do so it my test scenario.
    Well testing isnt always as easy as it sounds ... :-)

    With ugly code I ment it seams more like a hack than usable code ... and by
    optimising I was thinking about cleanig up the code.

    > It will for sure not work with tiburian strings.


    > I think it would, provided two things are true: Lowercase() is defined
    > suitably, and you only want to consider the characters '0' through '9',
    > not any other number-like characters.


    Well time will show :-)

    > How should I work with strings when I need to extract each char?


    > You mean in Tiburon? The same way you always do. When "string" changes, so
    > will "Char."


    Yepp, general speaking ... and I guess time will show.

    > I remember discussing this on the newsgroups a few years ago.
    > http://groups.google.com/group/borla...9f251fa595dbdb


    Thanks for the link, I have really been looking for an solution but could
    not find any delphi or pascal solution. I think it will give me som ideas.

    > The site where the final result was posted doesn't seem to exist anymore.
    > It's been nearly five years, though.


    Yes I noticed...

    > The Windows API function for comparing strings while considering embedded
    > numeric values is StrCmpLogical. Others have written about that function,
    > and sometimes when they implement a equivalent function, they give it that
    > name. For instance:
    >
    > http://groups.google.com/group/borla...41d49f8bbba577


    Ok good to know :-) The name for this sorting seems to vary a lot ...


    Kind Regards
    Roy M Klever


  8. Default Re: Natural sorting optimizing... and a working solution :)

    Roy Magne Klever wrote:
    > You are right! It fails if either string is empty but no string will be
    > empty ... since filenames can not be empty. I figured out the mistake
    > about diffrent strings s1, s2 versus t1, t2. Bad mistake. Correcting it
    > fixes much. I will add quick check and empty string detection. But
    > still, the code is messy.


    Since ExtractNr does not use any of the variables or parameters of
    CompareNatural, it doesn't need to be a nested procedure. Move it
    outside of CompareNatural, and they both become easier to read because
    they aren't getting in each other's ways.

    You can use the Sign function from the Math unit to simplify some of
    your result calculations. You have this:

    if n1 = n2 then
    Result := 0
    else
    begin
    if n1 > n2 then
    Result := 1
    else
    Result := -1;
    end;

    You can replace all that with this:

    Result := Sign(n1 - n2);

    And then later:

    Result := Sign(c1 - c2);
    b := Result = 0;

    And even later:

    Result := Sign(Length(t1) - Length(t2));

    >> The function fails when either string is empty. It also fails when s1
    >> = 'p1q8' and s2 = 'p0001q5'. It should return 1, but it returns -1
    >> instead.

    >
    > Fixed by correcting the s1, s2 to t1, t2.


    If that's all you did to fix it, then it's not fixed yet. Your function
    should now return 0 for those inputs. After ExtractNr, t1 is "q8" and t2
    is "q5", and n1 = n2, so Result is set to 0. J is still 2, which is
    equal to the lengths of both the strings now, so b is set to True. The
    loop ends, and since the strings' lengths are equal, your function
    returns 0.

    I also wonder why you make ExtractNr start deleting characters from
    *before* the first number. I'd be especially wary when the first
    character of the string is a number -- how much of the string gets
    deleted when the start index you give to Delete is zero?

    --
    Rob

  9. Default Re: Natural sorting optimizing... and a working solution :)

    Roy Magne Klever wrote:

    > I have been working with a natural sorting solution today and i was
    > wondering if someone could have a look and maybe suggest how to
    > optimize it.


    Usually optimization is reserved for performance improvement but unless
    you've got millions of strings to sort readability and reduced code
    complexity is far more important. The following has a cyclomatic code
    complexity of 8 according to SourceMonitor and could probably be
    improved further with some more thought.

    function NaturalCompare(s1, s2: String): Integer;

    procedure ParseAndAddToStringlist(const s : string; SL : tStringList);
    // pic011p001 becomes ['pic', '011', 'p', '001']
    var
    c : char;
    Temp : string;
    WasLetter, WasNumber : boolean;
    begin
    Temp := '';
    WasNumber := False;
    WasLetter := False;
    for c in s do begin
    if (c in ['0'..'9']) then begin
    if WasLetter then begin
    Sl.Add(temp);
    Temp := c;
    end else Temp := Temp + c;
    WasLetter := False;
    WasNumber := True;
    end else begin
    if WasNumber then begin
    Sl.Add(temp);
    Temp := c;
    end else Temp := Temp + c;
    WasLetter := True;
    WasNumber := False;
    end;
    end;
    SL.Add(temp);
    end;

    var
    sl1, sl2 : tStringList;
    i, i1, i2 : integer;
    BothNumbers : boolean;
    begin
    Result := 0;
    sl1 := TStringList.Create;
    sl2 := TStringList.Create;
    try
    ParseAndAddToStringlist(s1, sl1);
    ParseAndAddToStringlist(s2, sl2);
    // Make sure both string lists have same number of elements
    while sl1.Count < sl2.Count do sl1.add('');
    while sl2.Count < sl1.Count do sl2.add('');
    for i := 0 to sl1.count-1 do begin
    BothNumbers := TryStrToInt(sl1[i], i1) and TryStrToInt(sl2[i],
    i2);
    if BothNumbers then begin
    Result := i1 - i2;
    end else begin
    Result := CompareStr(sl1[i], SL2[i]);
    end;
    if Result <> 0 then Exit;
    end;
    finally
    sl1.Free;
    sl2.Free;
    end;
    end;

    --
    -Mike (TeamB)

  10. Default Re: Natural sorting optimizing... and a working solution :)

    Hi Rob,

    Yes, I am a bit quick to declare victory ... I found a few bugs when testing
    again. Will be doing some more testing before i declare this one for tested.
    Thanks for your time.

    > Since ExtractNr does not use any of the variables or parameters of
    > CompareNatural, it doesn't need to be a nested procedure.


    Yeah I guess so.

    > You can use the Sign function from the Math unit to simplify some of your
    > result calculations. You have this:


    > Result := Sign(n1 - n2);


    Nice, I did not know about this routine. Sure will do.

    > If that's all you did to fix it, then it's not fixed yet. Your function
    > should now return 0 for those inputs. After ExtractNr, t1 is "q8" and t2
    > is "q5", and n1 = n2, so Result is set to 0. J is still 2, which is equal
    > to the lengths of both the strings now, so b is set to True. The loop
    > ends, and since the strings' lengths are equal, your function returns 0.


    Yes this is one of the bugs I found and have now corrected.

    > I also wonder why you make ExtractNr start deleting characters from
    > *before* the first number. I'd be especially wary when the first character
    > of the string is a number -- how much of the string gets deleted when the
    > start index you give to Delete is zero?


    Yes this one is nasty... Sometimes it can be smart to take a break, I guess
    :-)

    Kind Regards
    Roy M Klever



+ Reply to Thread
Page 1 of 3 1 2 3 LastLast