#1
GAWK: Effective AWK Programming says, in the section on processing command line options, "This function highlights one of the greatest weaknesses in awk, which is that it is very poor at manipulating single characters. Repeated calls to substr are necessary for accessing individual characters..."

Here's a portable 2 command equivalent of GAWK's ability to split with "":

    gsub(/./,"&" SUBSEP,stringVar)
    n = split(stringVar,arr,SUBSEP)
#2
On Wednesday 3 September 2008 13:32, jh wrote:

> GAWK: Effective AWK Programming says, in the section on processing
> command line options, "This function highlights one of the greatest
> weaknesses in awk, which is that it is very poor at manipulating single
> characters. Repeated calls to substr are necessary for accessing
> individual characters..."
>
> Here's a portable 2 command equivalent of GAWK's ability to split with "":
>
> gsub(/./,"&" SUBSEP,stringVar)
> n = split(stringVar,arr,SUBSEP)

That adds an extra SUBSEP at the end of the string, and thus an empty array element is created.

The following *should* also work (works at least with gawk in POSIX mode):

    $ gawk --posix 'BEGIN{s="astring";n=split(s,arr,//); for(i=1;i<=n;i++) print arr[i]}'
    a
    s
    t
    r
    i
    n
    g

Note that the man page says that field splitting performed by split() is identical to the splitting done by FS, but the above command shows that it's not 100% true, since FS will never match an empty string.

    $ echo 'astring' | gawk --posix -v FS='' '{$1=$1; for(i=1;i<=NF;i++)print $i}'
    astring

There was a recent thread about a similar issue: see
http://groups.google.com/group/comp....8846069c606ba7
(or http://tinyurl.com/55l9f7 if the above wraps)
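One way to avoid the empty trailing element described in this post is to strip the final SUBSEP before calling split(). The following is only a minimal sketch, assuming a POSIX shell and some POSIX awk on the PATH. The wrapper name split_chars is made up for illustration, and the string is passed through ARGV so that backslashes are not interpreted. As reported elsewhere in this thread, some awk versions may still miscount, so test it on yours.

```shell
# Hypothetical helper (not from the thread): print each character of $1
# on its own line using the gsub + split trick.
split_chars() {
  awk 'BEGIN {
    str = ARGV[1]
    # Put a SUBSEP after every character.
    gsub(/./, "&" SUBSEP, str)
    # Drop the one trailing SUBSEP so split() creates no empty last element.
    sub(SUBSEP "$", "", str)
    n = split(str, arr, SUBSEP)
    for (i = 1; i <= n; i++) print arr[i]
  }' "$1"
}
```

Calling `split_chars astring` should then print the seven characters one per line, with n equal to the string length rather than length plus one.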
#3
In article <3pqdnaJl770r6CPVnZ2dnUVZ_v3inZ2d@neonova.net>,
jh <jhart@mail.avcnet.org> wrote:

> GAWK: Effective AWK Programming says, in the section on processing
> command line options, "This function highlights one of the greatest
> weaknesses in awk, which is that it is very poor at manipulating single
> characters. Repeated calls to substr are necessary for accessing
> individual characters..."
>
> Here's a portable 2 command equivalent of GAWK's ability to split with "":
>
> gsub(/./,"&" SUBSEP,stringVar)
> n = split(stringVar,arr,SUBSEP)

Doesn't this work for the K&R awk (except maybe for the default Solaris awk)?

    n = split(var,array,"")

But then again, it has been years since I've had access to plain old K&R awk.

Bob Harris
#4
That's the point, to be able to emulate, in standard awk, gawk's n=split(var,array,"").

Disappointingly, there's a bug in some awk versions. My Mac OS X 10.4 has 2 versions, the one that came with it, version 20040207, and one I compiled, version 20070501. Both exhibit anomalous behavior, making the tip useless. I recommend that no one use it without testing it on their version of awk!!

This code:

    BEGIN{RS=SUBSEP #read the whole file at once
    }
    {
        #Make sure there are no SUBSEPs in the line to be tested
        p = gsub(SUBSEP,"",$0)
        print "SUBSEPs in $0: " p
        # Put a SUBSEP after each character
        m = gsub(/./,"&" SUBSEP,$0)
        print "m=" m
        # Split the characters into an array
        n = split($0,__chars,SUBSEP)
        n-- # One too many chars because of SUBSEP at end
        print "n=" n
        # Return $0 to its original form
        gsub(SUBSEP,"",$0) # Put the $0 back the way it was
        print "length=" length($0)
    }

when run with the 2 versions of AWK and fed a file containing 483 characters, yields the following results:

    SUBSEPs in $0: 0
    m=483
    n=520
    length=483

The gsubs work fine, but the split doesn't. Running the same code with gawk version 3.1.5 yields:

    SUBSEPs in $0: 0
    m=483
    n=483
    length=483

But there's no reason to use this code with gawk, since gawk can do the split in one statement.

Jim Hart

Ed Morton wrote:
> On 9/3/2008 9:29 PM, Bob Harris wrote:
>> In article <3pqdnaJl770r6CPVnZ2dnUVZ_v3inZ2d@neonova.net>,
>> jh <jhart@mail.avcnet.org> wrote:
>>
>>
>>> GAWK: Effective AWK Programming says, in the section on processing
>>> command line options, "This function highlights one of the greatest
>>> weaknesses in awk, which is that it is very poor at manipulating single
>>> characters. Repeated calls to substr are necessary for accessing
>>> individual characters..."
>>>
>>> Here's a portable 2 command equivalent of GAWK's ability to split with "":
>>>
>>> gsub(/./,"&" SUBSEP,stringVar)
>>> n = split(stringVar,arr,SUBSEP)
>>
>> Doesn't this work for the K&R awk (except maybe for the default
>> Solaris awk )
>>
>> n = split(var,array,"")
>
> No:
>
> $ oawk 'BEGIN{n=split("foo",a,"");for (i=1;i<=n;i++) print a[i]; exit}'
> foo
> $ nawk 'BEGIN{n=split("foo",a,"");for (i=1;i<=n;i++) print a[i]; exit}'
> foo
> $ /usr/xpg4/bin/awk 'BEGIN{n=split("foo",a,"");for (i=1;i<=n;i++) print a[i]; exit}'
> foo
> $ gawk 'BEGIN{n=split("foo",a,"");for (i=1;i<=n;i++) print a[i]; exit}'
> f
> o
> o
>
> Regards,
>
> Ed.
>
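For awk versions where neither split(var,array,"") nor the SUBSEP trick behaves, the fallback is the repeated-substr approach that the book quote mentions. A minimal sketch, assuming a POSIX shell and any awk that provides length() and substr(); the wrapper name chars_by_substr is hypothetical and not from the thread:

```shell
# Hypothetical helper (not from the thread): fill arr[1..n] with the
# characters of $1 using one substr() call per character, then print them.
chars_by_substr() {
  awk 'BEGIN {
    s = ARGV[1]
    n = length(s)
    # One substr() call per character, no separator involved.
    for (i = 1; i <= n; i++) arr[i] = substr(s, i, 1)
    for (i = 1; i <= n; i++) print arr[i]
  }' "$1"
}
```

This does not depend on how a given awk treats an empty or single-character separator in split(), at the cost of the per-character calls the book complains about.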