Numeric array index sort for gawk : awk
#1
I need to sort an array index in numeric sequence. However, since all array indices are strings and "... integer values are always converted to strings as integers, no matter what the value of CONVFMT may happen to be...", asorti() sorts integer indices in alphabetic order, not numeric order. In other words, this program:

    BEGIN {
        pos[2] = "a"
        pos[10] = "a"
        pos[30] = "a"
        pos[110] = "a"
        n = asorti(pos, sorted)
        for (i = 1; i <= n; i++) print sorted[i]
    }

prints:

    10
    110
    2
    30

I want:

    2
    10
    30
    110

It can probably be hacked by making the integers into floats and using the right CONVFMT, but it would look messy. Instead, here are 3 small functions that provide ascending and descending numeric array index sorts using the external "sort" command and gawk's co-process operator, so there are no temporary files or extra arrays.

    # Ascending numeric sort of the indices of arr1 into arr2.
    function asortina(arr1, arr2,    cmd) {
        cmd = "sort -n"
        return __asort(arr1, arr2, cmd)
    }

    # Descending numeric sort of the indices of arr1 into arr2.
    function asortind(arr1, arr2,    cmd) {
        cmd = "sort -nr"
        return __asort(arr1, arr2, cmd)
    }

    # Pipe the indices through cmd as a co-process and read them back in order.
    function __asort(arr1, arr2, cmd,    i, n, m) {
        for (i in arr1) {
            print i |& cmd
        }
        close(cmd, "to")        # send EOF so sort can start writing
        while ((cmd |& getline m) > 0)
            arr2[++n] = m
        close(cmd, "from")
        return n
    }

Jim Hart
#2
On Wednesday 3 September 2008 04:31, jh wrote:

> I need to sort an array index in numeric sequence. However, since all
> array indices are strings and "... integer values are always converted
> to strings as integers, no matter what
> the value of CONVFMT may happen to be...", asorti() sorts integer
> indices in alphabetic order not numeric order. In other words, this
> program:
>
>
> BEGIN{
> pos[2] = "a"
> pos[10] = "a"
> pos[30] = "a"
> pos[110] = "a"
> n = asorti(pos,sorted)
> for(i=1;i<=n;i++) print sorted[i]
> }
>
> prints:
>
> 10
> 110
> 2
> 30
>
> I want:
>
> 2
> 10
> 30
> 110
>
> It can probably be hacked by making the integers into floats and using
> the right CONVFMT, but it would look messy.

Or just by doing

    BEGIN {
        pos[2] = "a"
        pos[10] = "a"
        pos[30] = "a"
        pos[110] = "a"
        for (i in pos) sorted[j++] = i + 0    # i + 0 turns the string index into a number
        n = asort(sorted)                     # asort() sorts values, so numbers sort numerically
        for (i = 1; i <= n; i++) print sorted[i]
    }
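The copy-then-asort() idea above can be wrapped in a helper so that it is called like asorti(). What follows is only a sketch under a few assumptions: gawk is installed (asort() is a gawk extension), the indices really are numeric, and the names asorti_num and numeric_index_demo are made up for this illustration and are not part of gawk.

```sh
# Sketch only: numeric_index_demo and asorti_num are hypothetical names.
numeric_index_demo() {
    gawk '
    # Copy the indices of src into dest as numbers, then sort them by value.
    function asorti_num(src, dest,    i, n) {
        split("", dest)            # portable way to empty dest
        for (i in src)
            dest[++n] = i + 0      # i + 0 forces a numeric value
        return asort(dest)         # numbers are compared by magnitude
    }
    BEGIN {
        pos[2] = "a"; pos[10] = "a"; pos[30] = "a"; pos[110] = "a"
        n = asorti_num(pos, sorted)
        for (i = 1; i <= n; i++)
            print sorted[i]
    }'
}
numeric_index_demo
```

For a descending listing the same sorted array can simply be walked from n down to 1, so no second helper is needed.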
#3
That's much better. Thank you!

pk wrote:

> Or just by doing
>
> BEGIN{
> pos[2] = "a"
> pos[10] = "a"
> pos[30] = "a"
> pos[110] = "a"
> for (i in pos) sorted[j++]=i+0
> n = asort(sorted)
> for(i=1;i<=n;i++) print sorted[i]
> }
>
#4
In article <faCdnaKPtpfr7iPVnZ2dnUVZ_hednZ2d@neonova.net>,
jh <jhart@mail.avcnet.org> wrote:
>That's much better. Thank you!
>
>pk wrote:
>
>> Or just by doing
>>
>> BEGIN{
>> pos[2] = "a"
>> pos[10] = "a"
>> pos[30] = "a"
>> pos[110] = "a"
>> for (i in pos) sorted[j++]=i+0
>> n = asort(sorted)
>> for(i=1;i<=n;i++) print sorted[i]
>> }
>>

The man page does not make it that clear (yes, I know it can be read in, but it is not stated explicitly) that different rules apply for the sorting algorithms of asort() and asorti(). But it does seem to be true.
#5
Kenny McCormack wrote:
> In article <faCdnaKPtpfr7iPVnZ2dnUVZ_hednZ2d@neonova.net>,
> jh <jhart@mail.avcnet.org> wrote:
>> That's much better. Thank you!
>>
>> pk wrote:
>>
>>> Or just by doing
>>>
>>> BEGIN{
>>> pos[2] = "a"
>>> pos[10] = "a"
>>> pos[30] = "a"
>>> pos[110] = "a"
>>> for (i in pos) sorted[j++]=i+0
>>> n = asort(sorted)
>>> for(i=1;i<=n;i++) print sorted[i]
>>> }
>>>
>
> The man page does not make it that clear (yes, I know it can be read in,
> but it is not stated explicitly) that different rules apply for the
> sorting algorithms of asort() and asorti(). But it does seem to be true.
>

I assume the sorting algorithm is the same, but asort sorts values of mixed types (numbers and/or strings) while asorti sorts index values, which are always strings. The latter is documented in section "7.7 Using Numbers to Subscript Arrays" of "GAWK: Effective AWK Programming". But you probably already know that.

--
Manuel Collado - http://lml.ls.fi.upm.es/~mcollado
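The difference between sorting string values and sorting numeric values can be seen with asort() alone. This is a small sketch, assuming gawk is available; asort_types_demo is a made-up wrapper name. The same four values are stored once as string constants and once as numbers.

```sh
# Sketch only: asort_types_demo is a hypothetical name.
asort_types_demo() {
    gawk '
    BEGIN {
        # String values are compared character by character.
        s[1] = "2"; s[2] = "10"; s[3] = "110"; s[4] = "30"
        n = asort(s)
        for (i = 1; i <= n; i++)
            printf "%s%s", s[i], (i < n ? " " : "\n")

        # Numeric values are compared by magnitude.
        v[1] = 2; v[2] = 10; v[3] = 110; v[4] = 30
        n = asort(v)
        for (i = 1; i <= n; i++)
            printf "%s%s", v[i], (i < n ? " " : "\n")
    }'
}
asort_types_demo
# first line:  10 110 2 30
# second line: 2 10 30 110
```

Array indices always behave like the first case, which is why asorti() gives the alphabetic order and why adding 0 before calling asort() gives the numeric one.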
#6
In article <g9mlg1$6qd$1@heraldo.rediris.es>,
Manuel Collado <m.collado@lml.ls.fi.upm.es> wrote:
....
>I assume the sorting algorithm is the same, but asort sorts values of
>mixed types (numbers and/or strings) while asorti sorts index values,
>that are always strings. The latter is documented in section "7.7 Using
>Numbers to Subscript Arrays" of "GAWK: Effective AWK Programming". But
>you probably already know that.

Yes. I was just making the (small) point that it could be made more explicit.

Note, BTW, and apropos of nothing, that TAWK sorts numeric indices as numbers. I.e., if you do "for i in A", it looks at the indices, and, if they are all numbers, sorts accordingly. Just another fine TAWK feature...

GAWK *could* (but doesn't) do the same thing.
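For readers on a newer gawk: releases from 4.0 onward let the script choose the traversal order itself, through PROCINFO["sorted_in"] for "for (i in A)" loops and through an optional third argument to asorti(). The sketch below assumes such a gawk is installed (it will not work on the 3.1.x versions current when this thread was written); sorted_in_demo is a made-up wrapper name.

```sh
# Sketch only: sorted_in_demo is a hypothetical name; needs gawk 4.0 or later.
sorted_in_demo() {
    gawk '
    BEGIN {
        pos[2] = "a"; pos[10] = "a"; pos[30] = "a"; pos[110] = "a"

        # Ask for-in to visit the indices in ascending numeric order.
        PROCINFO["sorted_in"] = "@ind_num_asc"
        for (i in pos)
            print i

        # asorti() takes the same kind of ordering name as a third argument.
        n = asorti(pos, sorted, "@ind_num_desc")
        for (i = 1; i <= n; i++)
            print sorted[i]
    }'
}
sorted_in_demo
```

The first loop prints the indices from smallest to largest, the second from largest to smallest, with no copy of the indices and no external sort command.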
#7
"Kenny McCormack" <gazelle@shell.xmission.com> wrote in message
news:g9moi0$dja$1@news.xmission.com...

> Note, BTW, and apropos of nothing, that TAWK sorts numeric indices as
> numbers. I.e., if you do "for i in A", it looks at the indices, and, if
> they are all numbers, sorts accordingly. Just another fine TAWK feature...

Yes. Beyond that, if you happen to know that the indices are consecutive integers, you can skip invoking the sort by using "for ( i = min; i <= max; i++ )" instead of "for i in A". That "in" keyword invokes the sort, which costs you time if you don't need it done.

You can gain even if the numeric indices are not consecutive, just monotonically increasing (or decreasing). Just skip the indices that aren't there: "if (!(i in A)) continue".

You can even be really, really tricky and use the presence or absence of a particular numeric index to tell you meta information about the data that is or is not at that index, but that's really straying far off the point.

- Anton Treuenfels
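The range-walk idea also gives a numeric-order listing in awks whose "for (i in A)" does not sort. A minimal sketch, assuming integer indices; range_walk_demo is a made-up wrapper name, and the min/max pass is only there because this example does not know the bounds in advance. Nothing here needs gawk extensions, so plain awk is used.

```sh
# Sketch only: range_walk_demo is a hypothetical name.
range_walk_demo() {
    awk '
    BEGIN {
        pos[2] = "a"; pos[10] = "a"; pos[30] = "a"; pos[110] = "a"

        # One unsorted pass to find the smallest and largest index.
        first = 1
        for (i in pos) {
            k = i + 0
            if (first || k < min) min = k
            if (first || k > max) max = k
            first = 0
        }

        # Walk the range in order, skipping the indices that are absent.
        for (i = min; i <= max; i++) {
            if (!(i in pos))
                continue
            print i
        }
    }'
}
range_walk_demo
```

The loop runs once per integer between min and max rather than once per element, so it pays off when the indices are dense and becomes wasteful when they are widely scattered.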



