
Natural sorting optimizing... and a working solution :)
Hi,
I have been working with a natural sorting solution today and i was
wondering if someone could have a look and maybe suggest how to optimize it.
It works, but it is ugly code. It will for sure not work with tiburian
strings. How should I work with strings when I need to extract each char?
Well anyway it is a good starting point if you are looking for this kind of
sorting. I have looked very hard without finding a single delphi
implementation so here is your chance
I am using the code to sort filenames.
Here is the code:
function CompareNatural(s1, s2: String): Integer;
var
n1, n2, j: Integer;
c1, c2: Char;
t1, t2: String;
b: Boolean;
function ExtractNr(start: Integer; var txt: String): Integer;
var
n: Integer;
s: String;
begin
Result:= 0;
n:= start;
while (n <= Length(txt)) and (txt[n] > #47) and (txt[n] < #58) do
n:= n + 1;
s:= Copy(txt, start, (n  start));
Delete(txt, start  1, (n  start) + 1);
if s <> '' then
Result:= StrToInt(s)
end;
begin
t1:= LowerCase(s1);
t2:= LowerCase(s2);
j:= 1;
b:= False;
while (not b) do
begin
result:= 0;
c1:= t1[j];
c2:= t2[j];
if c1 <> c2 then
begin // Check if number...
if ((c1 > #47) and (c1 < #58)) and ((c2 > #47) and (c2 < #58)) then
begin
n1:= ExtractNr(j, s1);
n2:= ExtractNr(j, s2);
if n1 = n2 then
Result := 0
else
begin
if n1 > n2 then
Result := 1
else
Result := 1;
b:= True;
end;
end else
begin
if c1 = c2 then // Check if chars are equal
Result := 0
else
begin
if c1 > c2 then
Result := 1
else
Result := 1;
b:= True;
end;
end;
b:= b or (j = Min(Length(t1), Length(t2)));
end;
b:= b or (j = Min(Length(t1), Length(t2)));
j:= j + 1;
end;
if result = 0 then
begin
if Length(t1) <> Length(t2) then
if Length(t1) > Length(t2) then
Result:= 1
else
Result:= 1;
end;
end;

Re: Natural sorting optimizing... and a working solution :)
"Roy Magne Klever" <a@b.c> wrote in message
news:48878d77$1@newsgroups.borland.com...
>
> I have been working with a natural sorting solution today and i was
> wondering if someone could have a look and maybe suggest how to optimize
> it.
Your code currently errs if the first two characters are the same, it skips
all processing and just compares length.
If there is a number, can it appear at any starting position or is it always
at the beginnning of the string?
If it can start later in the string then you waste time comparing characters
just to compare the numbers. If they always start at the beginning then
replace your code searching for numbers with a call to Val()  it will give
you back the number if there is a valid one at the start of the string (and
an error position for the character following if any).
If the string does not contain a string then use the builtin routines  you
want case insensitive so try calling CompareText or AnsiCompareText.

Wayne Niddery  TeamB (www.teamb.com)
Winwright, Inc. (www.winwright.ca)

Re: Natural sorting optimizing... and a working solution :)
> Your code currently errs if the first two characters are the same, it
> skips all processing and just compares length.
> If there is a number, can it appear at any starting position or is it
> always at the beginnning of the string?
Hi! I could not reproduse that error. I tested withe the following strings:
pic1, pic011p001, pic2, pic10p002, pic11p2
and it sorted to:
pic1, pic2, pic10p002, pic011p001, pic11p2
which is very correct according to what i want but the nubers can be
every where as you see from the sample ... it extracts the nuber and if
equal it continues to compare the string ... until it finds another number
or the result is reached.
but your suggestions are good . I can check before I start number
extraction.

Roy

Re: Natural sorting optimizing... and a working solution :)
Wayne Niddery (TeamB) wrote:
> "Roy Magne Klever" <a@b.c> wrote in message
> news:48878d77$1@newsgroups.borland.com...
>>
>> I have been working with a natural sorting solution today and i was
>> wondering if someone could have a look and maybe suggest how to
>> optimize it.
>
> Your code currently errs if the first two characters are the same, it
> skips all processing and just compares length.
No, if the characters are the same, it checks whether it's at the end of
either of the strings (j = Min(Length(t1), Length(t2))). If it's not,
then b remains False and the loop goes around again, looking at the next
character.
> If there is a number, can it appear at any starting position or is it
> always at the beginnning of the string?
>
> If it can start later in the string then you waste time comparing
> characters just to compare the numbers.
No, if the leading characters are different, then the numbers don't
matter and they're never inspected.
> If they always start at the
> beginning then replace your code searching for numbers with a call to
> Val()  it will give you back the number if there is a valid one at the
> start of the string (and an error position for the character following
> if any).
>
> If the string does not contain a string then use the builtin routines 
> you want case insensitive so try calling CompareText or AnsiCompareText.
>
The function fails when either string is empty. It also fails when s1 =
'p1q8' and s2 = 'p0001q5'. It should return 1, but it returns 1 instead.
The reason is that it does all its comparisons with t1 and t2, but it
extracts its numbers from s1 and s2. After the first set of numbers is
extracted, the index j is no longer accurate for all four strings. I
expect there should be two separate index variables, and only two strings.
It also fails when s1 = '9876543210' and s2 = '9876543210'. It should
return 0, but instead it doesn't return any value.

Rob

Re: Natural sorting optimizing... and a working solution :)
Hi Rob,
You are right! It fails if either string is empty but no string will be
empty ... since filenames can not be empty. I figured out the mistake about
diffrent strings s1, s2 versus t1, t2. Bad mistake. Correcting it fixes
much. I will add quick check and empty string detection. But still, the code
is messy.
> The function fails when either string is empty. It also fails when s1 =
> 'p1q8' and s2 = 'p0001q5'. It should return 1, but it returns 1 instead.
Fixed by correcting the s1, s2 to t1, t2.
> The reason is that it does all its comparisons with t1 and t2, but it
> extracts its numbers from s1 and s2. After the first set of numbers is
> extracted, the index j is no longer accurate for all four strings. I
> expect there should be two separate index variables, and only two strings.
Good observed, I just a long time to discover that...
> It also fails when s1 = '9876543210' and s2 = '9876543210'. It should
> return 0, but instead it doesn't return any value.
Ok will check...
Thanks for taking a look and giving feedback, I get ideas

Roy

Re: Natural sorting optimizing... and a working solution :)
Roy Magne Klever wrote:
> I have been working with a natural sorting solution today and i was
> wondering if someone could have a look and maybe suggest how to optimize
> it. It works,
No it doesn't. You haven't tested it thoroughly.
> but it is ugly code.
That has nothing to do with optimization.
> It will for sure not work with
> tiburian strings.
I think it would, provided two things are true: Lowercase() is defined
suitably, and you only want to consider the characters '0' through '9',
not any other numberlike characters.
> How should I work with strings when I need to extract
> each char?
You mean in Tiburon? The same way you always do. When "string" changes,
so will "Char."
> Well anyway it is a good starting point if you are looking
> for this kind of sorting. I have looked very hard without finding a
> single delphi implementation so here is your chance
I remember discussing this on the newsgroups a few years ago.
http://groups.google.com/group/borla...9f251fa595dbdb
The site where the final result was posted doesn't seem to exist
anymore. It's been nearly five years, though.
The Windows API function for comparing strings while considering
embedded numeric values is StrCmpLogical. Others have written about that
function, and sometimes when they implement a equivalent function, they
give it that name. For instance:
http://groups.google.com/group/borla...41d49f8bbba577
> I am using the code to sort filenames.
>
> Here is the code:
>
> function CompareNatural(s1, s2: String): Integer;
> var
> n1, n2, j: Integer;
> c1, c2: Char;
> t1, t2: String;
> b: Boolean;
>
> function ExtractNr(start: Integer; var txt: String): Integer;
> var
> n: Integer;
> s: String;
> begin
> Result:= 0;
> n:= start;
> while (n <= Length(txt)) and (txt[n] > #47) and (txt[n] < #58) do
> n:= n + 1;
> s:= Copy(txt, start, (n  start));
> Delete(txt, start  1, (n  start) + 1);
> if s <> '' then
> Result:= StrToInt(s)
> end;
>
> begin
> t1:= LowerCase(s1);
> t2:= LowerCase(s2);
> j:= 1;
> b:= False;
> while (not b) do
> begin
> result:= 0;
> c1:= t1[j];
> c2:= t2[j];
> if c1 <> c2 then
> begin // Check if number...
> if ((c1 > #47) and (c1 < #58)) and ((c2 > #47) and (c2 < #58)) then
> begin
> n1:= ExtractNr(j, s1);
> n2:= ExtractNr(j, s2);
> if n1 = n2 then
> Result := 0
> else
> begin
> if n1 > n2 then
> Result := 1
> else
> Result := 1;
> b:= True;
> end;
> end else
> begin
> if c1 = c2 then // Check if chars are equal
> Result := 0
This is pointless. You already know they're not equal. It's how you go
into this "Check if number" block in the first place.
> else
> begin
> if c1 > c2 then
> Result := 1
> else
> Result := 1;
> b:= True;
> end;
> end;
> b:= b or (j = Min(Length(t1), Length(t2)));
> end;
> b:= b or (j = Min(Length(t1), Length(t2)));
You have the same line of code twice. The first one is unnecessary.
> j:= j + 1;
> end;
> if result = 0 then
> begin
> if Length(t1) <> Length(t2) then
> if Length(t1) > Length(t2) then
> Result:= 1
> else
> Result:= 1;
> end;
> end;
>

Rob

Re: Natural sorting optimizing... and a working solution :)
Hi Rob,
Yes you are right it did not work but it seemd to do so it my test scenario.
Well testing isnt always as easy as it sounds ... :)
With ugly code I ment it seams more like a hack than usable code ... and by
optimising I was thinking about cleanig up the code.
> It will for sure not work with tiburian strings.
> I think it would, provided two things are true: Lowercase() is defined
> suitably, and you only want to consider the characters '0' through '9',
> not any other numberlike characters.
Well time will show :)
> How should I work with strings when I need to extract each char?
> You mean in Tiburon? The same way you always do. When "string" changes, so
> will "Char."
Yepp, general speaking ... and I guess time will show.
> I remember discussing this on the newsgroups a few years ago.
> http://groups.google.com/group/borla...9f251fa595dbdb
Thanks for the link, I have really been looking for an solution but could
not find any delphi or pascal solution. I think it will give me som ideas.
> The site where the final result was posted doesn't seem to exist anymore.
> It's been nearly five years, though.
Yes I noticed...
> The Windows API function for comparing strings while considering embedded
> numeric values is StrCmpLogical. Others have written about that function,
> and sometimes when they implement a equivalent function, they give it that
> name. For instance:
>
> http://groups.google.com/group/borla...41d49f8bbba577
Ok good to know :) The name for this sorting seems to vary a lot ...
Kind Regards
Roy M Klever

Re: Natural sorting optimizing... and a working solution :)
Roy Magne Klever wrote:
> You are right! It fails if either string is empty but no string will be
> empty ... since filenames can not be empty. I figured out the mistake
> about diffrent strings s1, s2 versus t1, t2. Bad mistake. Correcting it
> fixes much. I will add quick check and empty string detection. But
> still, the code is messy.
Since ExtractNr does not use any of the variables or parameters of
CompareNatural, it doesn't need to be a nested procedure. Move it
outside of CompareNatural, and they both become easier to read because
they aren't getting in each other's ways.
You can use the Sign function from the Math unit to simplify some of
your result calculations. You have this:
if n1 = n2 then
Result := 0
else
begin
if n1 > n2 then
Result := 1
else
Result := 1;
end;
You can replace all that with this:
Result := Sign(n1  n2);
And then later:
Result := Sign(c1  c2);
b := Result = 0;
And even later:
Result := Sign(Length(t1)  Length(t2));
>> The function fails when either string is empty. It also fails when s1
>> = 'p1q8' and s2 = 'p0001q5'. It should return 1, but it returns 1
>> instead.
>
> Fixed by correcting the s1, s2 to t1, t2.
If that's all you did to fix it, then it's not fixed yet. Your function
should now return 0 for those inputs. After ExtractNr, t1 is "q8" and t2
is "q5", and n1 = n2, so Result is set to 0. J is still 2, which is
equal to the lengths of both the strings now, so b is set to True. The
loop ends, and since the strings' lengths are equal, your function
returns 0.
I also wonder why you make ExtractNr start deleting characters from
*before* the first number. I'd be especially wary when the first
character of the string is a number  how much of the string gets
deleted when the start index you give to Delete is zero?

Rob

Re: Natural sorting optimizing... and a working solution :)
Roy Magne Klever wrote:
> I have been working with a natural sorting solution today and i was
> wondering if someone could have a look and maybe suggest how to
> optimize it.
Usually optimization is reserved for performance improvement but unless
you've got millions of strings to sort readability and reduced code
complexity is far more important. The following has a cyclomatic code
complexity of 8 according to SourceMonitor and could probably be
improved further with some more thought.
function NaturalCompare(s1, s2: String): Integer;
procedure ParseAndAddToStringlist(const s : string; SL : tStringList);
// pic011p001 becomes ['pic', '011', 'p', '001']
var
c : char;
Temp : string;
WasLetter, WasNumber : boolean;
begin
Temp := '';
WasNumber := False;
WasLetter := False;
for c in s do begin
if (c in ['0'..'9']) then begin
if WasLetter then begin
Sl.Add(temp);
Temp := c;
end else Temp := Temp + c;
WasLetter := False;
WasNumber := True;
end else begin
if WasNumber then begin
Sl.Add(temp);
Temp := c;
end else Temp := Temp + c;
WasLetter := True;
WasNumber := False;
end;
end;
SL.Add(temp);
end;
var
sl1, sl2 : tStringList;
i, i1, i2 : integer;
BothNumbers : boolean;
begin
Result := 0;
sl1 := TStringList.Create;
sl2 := TStringList.Create;
try
ParseAndAddToStringlist(s1, sl1);
ParseAndAddToStringlist(s2, sl2);
// Make sure both string lists have same number of elements
while sl1.Count < sl2.Count do sl1.add('');
while sl2.Count < sl1.Count do sl2.add('');
for i := 0 to sl1.count1 do begin
BothNumbers := TryStrToInt(sl1[i], i1) and TryStrToInt(sl2[i],
i2);
if BothNumbers then begin
Result := i1  i2;
end else begin
Result := CompareStr(sl1[i], SL2[i]);
end;
if Result <> 0 then Exit;
end;
finally
sl1.Free;
sl2.Free;
end;
end;

Mike (TeamB)

Re: Natural sorting optimizing... and a working solution :)
Hi Rob,
Yes, I am a bit quick to declare victory ... I found a few bugs when testing
again. Will be doing some more testing before i declare this one for tested.
Thanks for your time.
> Since ExtractNr does not use any of the variables or parameters of
> CompareNatural, it doesn't need to be a nested procedure.
Yeah I guess so.
> You can use the Sign function from the Math unit to simplify some of your
> result calculations. You have this:
> Result := Sign(n1  n2);
Nice, I did not know about this routine. Sure will do.
> If that's all you did to fix it, then it's not fixed yet. Your function
> should now return 0 for those inputs. After ExtractNr, t1 is "q8" and t2
> is "q5", and n1 = n2, so Result is set to 0. J is still 2, which is equal
> to the lengths of both the strings now, so b is set to True. The loop
> ends, and since the strings' lengths are equal, your function returns 0.
Yes this is one of the bugs I found and have now corrected.
> I also wonder why you make ExtractNr start deleting characters from
> *before* the first number. I'd be especially wary when the first character
> of the string is a number  how much of the string gets deleted when the
> start index you give to Delete is zero?
Yes this one is nasty... Sometimes it can be smart to take a break, I guess
:)
Kind Regards
Roy M Klever