Numeric array index sort for gawk : awk

  #1  
09-02-2008, 09:31 PM

I need to sort an array's indices in numeric sequence. However, since all
array indices are strings and "... integer values are always converted
to strings as integers, no matter what the value of CONVFMT may happen
to be...", asorti() sorts integer indices in alphabetic order, not
numeric order. In other words, this program:


BEGIN {
    pos[2] = "a"
    pos[10] = "a"
    pos[30] = "a"
    pos[110] = "a"
    n = asorti(pos, sorted)
    for (i = 1; i <= n; i++) print sorted[i]
}

prints:

10
110
2
30

I want:

2
10
30
110

It can probably be hacked by making the integers into floats and using
the right CONVFMT, but it would look messy.

Instead, here are 3 small functions that provide ascending and
descending numeric array index sorts using the external "sort" command
and gawk's co-process operator, so there are no temporary files or
extra arrays.


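# Ascending numeric sort of arr1's indices into arr2; returns the count.
function asortina(arr1, arr2,    cmd) {
    cmd = "sort -n"     # portable form of the obsolete "+0n" key syntax
    return __asort(arr1, arr2, cmd)
}

# Descending numeric sort of arr1's indices into arr2; returns the count.
function asortind(arr1, arr2,    cmd) {
    cmd = "sort -nr"
    return __asort(arr1, arr2, cmd)
}

# Write the indices to cmd through a gawk co-process and read them back.
function __asort(arr1, arr2, cmd,    i, n, m) {
    n = 0
    for (i in arr1) print i |& cmd
    close(cmd, "to")    # send EOF so sort can start producing output
    while ((cmd |& getline m) > 0)
        arr2[++n] = m
    close(cmd, "from")
    return n
}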



Jim Hart
  #2  
09-03-2008, 05:47 AM

On Wednesday 3 September 2008 04:31, jh wrote:

> I need to sort an array index in numeric sequence. However, since all
> array indices are strings and "... integer values are always converted
> to strings as integers, no matter what
> the value of CONVFMT may happen to be...", asorti() sorts integer
> indices in alphabetic order not numeric order. In other words, this
> program:
>
>
> BEGIN{
> pos[2] = "a"
> pos[10] = "a"
> pos[30] = "a"
> pos[110] = "a"
> n = asorti(pos,sorted)
> for(i=1;i<=n;i++) print sorted[i]
> }
>
> prints:
>
> 10
> 110
> 2
> 30
>
> I want:
>
> 2
> 10
> 30
> 110
>
> It can probably be hacked by making the integers into floats and using
> the right CONVFMT, but it would look messy.


Or just by doing

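BEGIN {
    pos[2] = "a"
    pos[10] = "a"
    pos[30] = "a"
    pos[110] = "a"
    # i+0 turns each string index into a number, so asort() compares numerically
    for (i in pos) sorted[j++] = i+0
    n = asort(sorted)    # asort() renumbers the result from 1 to n
    for (i = 1; i <= n; i++) print sorted[i]
}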
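
The same idea can be wrapped in a reusable function. What follows is only a sketch: the name asortin() and its parameters are made up for illustration and are not part of gawk. It copies the indices of one array into another as numbers and lets asort() do the ordering; the last loop shows how the sorted indices can be used to walk the original array in numeric order.

```awk
# Hypothetical helper (not a gawk built-in): fill dest[1..n] with the
# indices of src in ascending numeric order and return n.
function asortin(src, dest,    i, n) {
    split("", dest)            # portable idiom for clearing an array
    n = 0
    for (i in src)
        dest[++n] = i + 0      # "+ 0" forces a numeric value
    return asort(dest)         # numeric values are compared as numbers
}

BEGIN {
    pos[2] = "a"; pos[10] = "b"; pos[30] = "c"; pos[110] = "d"
    n = asortin(pos, sorted)
    for (k = 1; k <= n; k++)
        print sorted[k], pos[sorted[k]]
}
```

For descending order, the final loop can simply run from n down to 1.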

  #3  
09-03-2008, 06:22 AM

That's much better. Thank you!

pk wrote:

> Or just by doing
>
> BEGIN{
> pos[2] = "a"
> pos[10] = "a"
> pos[30] = "a"
> pos[110] = "a"
> for (i in pos) sorted[j++]=i+0
> n = asort(sorted)
> for(i=1;i<=n;i++) print sorted[i]
> }
>

  #4  
09-03-2008, 07:11 AM

In article <faCdnaKPtpfr7iPVnZ2dnUVZ_hednZ2d@neonova.net>,
jh <jhart@mail.avcnet.org> wrote:
>That's much better. Thank you!
>
>pk wrote:
>
>> Or just by doing
>>
>> BEGIN{
>> pos[2] = "a"
>> pos[10] = "a"
>> pos[30] = "a"
>> pos[110] = "a"
>> for (i in pos) sorted[j++]=i+0
>> n = asort(sorted)
>> for(i=1;i<=n;i++) print sorted[i]
>> }
>>


The man page does not make it that clear (yes, I know it can be read in,
but it is not stated explicitly) that different rules apply for the
sorting algorithms of asort() and asorti(). But it does seem to be true.

  #5  
09-03-2008, 01:34 PM

Kenny McCormack wrote:
> In article <faCdnaKPtpfr7iPVnZ2dnUVZ_hednZ2d@neonova.net>,
> jh <jhart@mail.avcnet.org> wrote:
>> That's much better. Thank you!
>>
>> pk wrote:
>>
>>> Or just by doing
>>>
>>> BEGIN{
>>> pos[2] = "a"
>>> pos[10] = "a"
>>> pos[30] = "a"
>>> pos[110] = "a"
>>> for (i in pos) sorted[j++]=i+0
>>> n = asort(sorted)
>>> for(i=1;i<=n;i++) print sorted[i]
>>> }
>>>

>
> The man page does not make it that clear (yes, I know it can be read in,
> but it is not stated explicitly) that different rules apply for the
> sorting algorithms of asort() and asorti(). But it does seem to be true.
>


I assume the sorting algorithm is the same, but asort() sorts values of
mixed types (numbers and/or strings) while asorti() sorts index values,
which are always strings. The latter is documented in section "7.7 Using
Numbers to Subscript Arrays" of "GAWK: Effective AWK Programming". But
you probably already know that.
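
The difference comes down to which comparison is applied. A minimal illustration, using plain constants rather than array contents so that the type of each operand is unambiguous:

```awk
BEGIN {
    print ("10" < "2")    # string comparison, character by character: prints 1
    print (10 < 2)        # numeric comparison: prints 0
}
```

Compared as strings, "10" sorts before "2" because "1" precedes "2", which is the order asorti() produced in the original example.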

--
Manuel Collado - http://lml.ls.fi.upm.es/~mcollado
  #6  
09-03-2008, 02:29 PM

In article <g9mlg1$6qd$1@heraldo.rediris.es>,
Manuel Collado <m.collado@lml.ls.fi.upm.es> wrote:
....
>I assume the sorting algorithm is the same, but asort sorts values of
>mixed types (numbers and/or strings) while asorti sorts index values,
>that are always strings. The latter is documented in section "7.7 Using
>Numbers to Subscript Arrays" of "GAWK: Effective AWK Programming". But
>you probably already know that.


Yes. I was just making the (small) point that it could be made more
explicit.

Note, BTW, and apropos of nothing, that TAWK sorts numeric indices as
numbers. I.e., if you do "for i in A", it looks at the indices, and, if
they are all numbers, sorts accordingly. Just another fine TAWK feature...

GAWK *could* (but doesn't) do the same thing.
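
A note for later readers: gawk 4.0 and newer, released after this thread, can do this on request. The sketch below assumes such a version; it will not work on the 3.1.x releases discussed here. It uses the optional third argument to asorti() and the PROCINFO["sorted_in"] setting, both with the predefined "@ind_num_asc" ordering.

```awk
BEGIN {
    pos[2] = "a"; pos[10] = "a"; pos[30] = "a"; pos[110] = "a"

    # Optional third argument to asorti() (gawk 4.0+) names the ordering.
    n = asorti(pos, sorted, "@ind_num_asc")
    for (k = 1; k <= n; k++) print sorted[k]

    # The same ordering applied to "for (i in pos)" itself.
    PROCINFO["sorted_in"] = "@ind_num_asc"
    for (i in pos) print i
}
```

Both loops print 2, 10, 30, 110, one per line. "@ind_num_desc" gives the descending order.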

  #7  
09-03-2008, 10:10 PM


"Kenny McCormack" <gazelle@shell.xmission.com> wrote in message
news:g9moi0$dja$1@news.xmission.com...

> Note, BTW, and apropos of nothing, that TAWK sorts numeric indices as
> numbers. I.e., if you do "for i in A", it looks at the indices, and, if
> they are all numbers, sorts accordingly. Just another fine TAWK
> feature...

Yes. Beyond that, if you happen to know that the indices are consecutive
integers, you can skip invoking the sort by using
"for (i = min; i <= max; i++)" instead of "for i in A". That "in" keyword
invokes the sort, which costs you time if you don't need that done.

You can gain even if the numeric indices are not consecutive, just
monotonically increasing (or decreasing). Just skip the indices that aren't
there: "if (!(i in A)) continue".
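
A small sketch of that skip-the-gaps loop, assuming the smallest and largest indices are already known (here they are hard-coded as min and max):

```awk
BEGIN {
    A[2] = "a"; A[10] = "b"; A[30] = "c"; A[110] = "d"
    min = 2; max = 110                # assumed to be known in advance
    for (i = min; i <= max; i++) {
        if (!(i in A)) continue       # the "in" test does not create A[i]
        print i, A[i]
    }
}
```

This visits every integer between min and max, so it only pays off when the range is not much larger than the number of elements actually stored.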

You can even be really, really tricky and use the presence or absence of a
particular numeric index to tell you meta information about the data that
is/is not at that index, but that's really straying far off the point.

- Anton Treuenfels

