Splitting a string into characters portably

Object Mix

#1
09-03-2008, 07:32 AM
jh
Guest

Splitting a string into characters portably

GAWK: Effective AWK Programming says, in the section on processing
command line options, "This function highlights one of the greatest
weaknesses in awk, which is that it is very poor at manipulating single
characters. Repeated calls to substr are necessary for accessing
individual characters..."

Here's a portable two-statement equivalent of gawk's ability to split with an empty separator (""):

gsub(/./,"&" SUBSEP,stringVar)
n = split(stringVar,arr,SUBSEP)

#2
09-03-2008, 07:51 AM
pk
Guest

Re: Splitting a string into characters portably

On Wednesday 3 September 2008 13:32, jh wrote:

> GAWK: Effective AWK Programming says, in the section on processing
> command line options, "This function highlights one of the greatest
> weaknesses in awk, which is that it is very poor at manipulating single
> characters. Repeated calls to substr are necessary for accessing
> individual characters..."
>
> Here's a portable 2 command equivalent of GAWK's ability to split with "":
>
> gsub(/./,"&" SUBSEP,stringVar)
> n = split(stringVar,arr,SUBSEP)


That adds an extra SUBSEP at the end of the string, so split() also
creates an empty final array element (n is one more than the number of
characters).
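A minimal sketch of the fix pk implies, assuming a POSIX sh and whatever awk is on PATH: after split(), drop the empty last element instead of trusting n as-is. The guard matters for an empty string, where gsub() adds nothing and split() returns 0, so a blind n-- would leave n at -1. The helper name chars_to_lines is hypothetical, and the later report in this thread still applies: some awk versions miscount with this trick, so test it on yours.

```shell
#!/bin/sh
# Sketch: the gsub()+split() trick from the first post, minus the trailing
# empty element. chars_to_lines is a hypothetical helper name; it prints
# each character of its first argument on its own line.
# Note: awk -v processes backslash escapes in the value it assigns.
chars_to_lines() {
    awk -v str="$1" 'BEGIN {
        s = str
        gsub(/./, "&" SUBSEP, s)     # "abc" -> "a" SUBSEP "b" SUBSEP "c" SUBSEP
        n = split(s, arr, SUBSEP)    # the element after the last SUBSEP is ""
        if (n > 0) {                 # empty input: split() returns 0, nothing to drop
            delete arr[n]
            n--
        }
        for (i = 1; i <= n; i++)
            print arr[i]
    }'
}

chars_to_lines astring
```

The delete keeps the array consistent with n, so a later `for (k in arr)` loop does not see the stray empty element either.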

The following *should* also work (works at least with gawk in POSIX mode):

$ gawk --posix 'BEGIN{s="astring";n=split(s,arr,//);
for(i=1;i<=n;i++) print arr[i]}'
a
s
t
r
i
n
g

Note that the man page says the field splitting performed by split() is
identical to the splitting done with FS, but comparing the command above
with the one below shows that isn't 100% true, since FS will never match
an empty string:

$ echo 'astring' | gawk --posix -v FS='' '{$1=$1;
for(i=1;i<=NF;i++)print $i}'
astring


There was a recent thread about a similar issue: see

http://groups.google.com/group/comp....8846069c606ba7

(or http://tinyurl.com/55l9f7 if the above wraps)

#3
09-03-2008, 10:29 PM
Bob Harris
Guest

Re: Splitting a string into characters portably

In article <3pqdnaJl770r6CPVnZ2dnUVZ_v3inZ2d@neonova.net>,
jh <jhart@mail.avcnet.org> wrote:

> GAWK: Effective AWK Programming says, in the section on processing
> command line options, "This function highlights one of the greatest
> weaknesses in awk, which is that it is very poor at manipulating single
> characters. Repeated calls to substr are necessary for accessing
> individual characters..."
>
> Here's a portable 2 command equivalent of GAWK's ability to split with "":
>
> gsub(/./,"&" SUBSEP,stringVar)
> n = split(stringVar,arr,SUBSEP)


Doesn't this work in K&R awk (except maybe the default
Solaris awk)?

n = split(var,array,"")

But then again, it has been years since I've had access to plain
old K&R awk.

Bob Harris

#4
09-14-2008, 12:40 PM
jh
Guest

Re: Splitting a string into characters portably - AWK BUG!

That's the point: to be able to emulate gawk's n=split(var,array,"")
in standard awk.

Disappointingly, there's a bug in some awk versions. My Mac OS X 10.4
has two versions: the one that came with it (version 20040207) and one I
compiled (version 20070501). Both behave anomalously, which makes the
tip useless. I recommend that no one use it without first testing it on
their own version of awk!!
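jh's advice can be turned into a quick check. The sketch below runs the trick on a few sample strings and compares the element count with length(); check_split_trick is a hypothetical name and the samples are arbitrary, so add ones that resemble your own data. One possible cause of the anomaly, an assumption this thread does not confirm, is that some awk implementations also break on newlines when split() is given a single-character separator, which is why one sample contains a newline.

```shell
#!/bin/sh
# Hypothetical self-test for the gsub()+split() trick. It runs the trick on
# a few sample strings (chosen arbitrarily) and compares the number of
# elements split() returns, minus the trailing empty one, with length().
check_split_trick() {
    awk 'BEGIN {
        ok = 1
        sample[1] = "astring"
        sample[2] = "two\nlines"       # contains a newline
        sample[3] = "a b\tc"           # contains a space and a tab
        for (t = 1; t <= 3; t++) {
            s = sample[t]
            gsub(/./, "&" SUBSEP, s)           # SUBSEP after every character
            n = split(s, arr, SUBSEP) - 1      # ignore the trailing empty element
            if (n != length(sample[t])) {
                printf "sample %d: got %d elements, expected %d\n", t, n, length(sample[t])
                ok = 0
            }
        }
        print (ok ? "split trick OK" : "split trick FAILED")
    }'
}

check_split_trick
```

Any "FAILED" line means the trick should not be used with that awk; the preceding lines show which sample went wrong.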

This code:

BEGIN{RS=SUBSEP}  # read the whole file at once
{
# Make sure there are no SUBSEPs in the record to be tested
p = gsub(SUBSEP,"",$0)
print "SUBSEPs in $0: " p

# Put a SUBSEP after each character
m = gsub(/./,"&" SUBSEP,$0)
print "m=" m

# Split the characters into an array
n = split($0,__chars,SUBSEP)
n-- # One element too many because of the SUBSEP at the end
print "n=" n

# Return $0 to its original form
gsub(SUBSEP,"",$0) # Put $0 back the way it was
print "length=" length($0)
}

When run with both versions of awk on a file containing 483
characters, it yields the following results:

SUBSEPs in $0: 0
m=483
n=520
length=483


The gsub() calls work fine (m equals the length), but split() doesn't:
after the decrement, n is 520 rather than 483.


Running the same code with gawk version 3.1.5 yields:

SUBSEPs in $0: 0
m=483
n=483
length=483

But there's no reason to use this trick with gawk, since gawk can do it
in one statement.
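If the trick cannot be trusted, the substr() approach the gawk manual mentions remains the portable fallback: one call per character, with no SUBSEP and no split(). A minimal sketch; print_chars and str_to_chars are hypothetical names. Note that awks without multibyte support may count bytes rather than characters in length() and substr().

```shell
#!/bin/sh
# Sketch of the substr() fallback: one substr() call per character, no SUBSEP,
# no split(). print_chars and str_to_chars are hypothetical names.
print_chars() {
    awk -v str="$1" '
    function str_to_chars(s, arr,    i, n) {
        n = length(s)
        for (i = 1; i <= n; i++)
            arr[i] = substr(s, i, 1)   # the i-th character of s
        return n                       # element count, like split()
    }
    BEGIN {
        n = str_to_chars(str, chars)
        for (i = 1; i <= n; i++)
            print chars[i]
    }'
}

print_chars astring
```

This is exactly the "repeated calls to substr" the manual complains about, but the result no longer depends on how a particular awk's split() treats its separator.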

Jim Hart

Ed Morton wrote:
> On 9/3/2008 9:29 PM, Bob Harris wrote:
>> In article <3pqdnaJl770r6CPVnZ2dnUVZ_v3inZ2d@neonova.net>,
>> jh <jhart@mail.avcnet.org> wrote:
>>
>>
>>> GAWK: Effective AWK Programming says, in the section on processing
>>> command line options, "This function highlights one of the greatest
>>> weaknesses in awk, which is that it is very poor at manipulating single
>>> characters. Repeated calls to substr are necessary for accessing
>>> individual characters..."
>>>
>>> Here's a portable 2 command equivalent of GAWK's ability to split with "":
>>>
>>> gsub(/./,"&" SUBSEP,stringVar)
>>> n = split(stringVar,arr,SUBSEP)

>>
>> Doesn't this work for the K&R awk (except maybe for the default
>> Solaris awk )
>>
>> n = split(var,array,"")

>
> No:
>
> $ oawk 'BEGIN{n=split("foo",a,"");for (i=1;i<=n;i++) print a[i]; exit}'
> foo
> $ nawk 'BEGIN{n=split("foo",a,"");for (i=1;i<=n;i++) print a[i]; exit}'
> foo
> $ /usr/xpg4/bin/awk 'BEGIN{n=split("foo",a,"");for (i=1;i<=n;i++) print a[i]; exit}'
> foo
> $ gawk 'BEGIN{n=split("foo",a,"");for (i=1;i<=n;i++) print a[i]; exit}'
> f
> o
> o
>
> Regards,
>
> Ed.
>
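Ed's results suggest detecting split(s, arr, "") support at run time instead of assuming it. The sketch below assumes that an awk lacking the feature returns something other than 2 for split("ab", t, "") rather than failing outright (POSIX leaves a null FS unspecified). The names split_chars and split_chars_demo are hypothetical.

```shell
#!/bin/sh
# Sketch: use split(s, arr, "") only where it yields one element per
# character (as gawk does in Ed's test), otherwise fall back to substr().
# split_chars and split_chars_demo are hypothetical names.
split_chars_demo() {
    awk -v str="$1" '
    function split_chars(s, arr,    probe, i, n) {
        # Probe this awk: does "ab" split into 2 elements with ""?
        if (split("ab", probe, "") == 2)
            return split(s, arr, "")
        n = length(s)                  # portable fallback
        for (i = 1; i <= n; i++)
            arr[i] = substr(s, i, 1)
        return n
    }
    BEGIN {
        n = split_chars(str, chars)
        for (i = 1; i <= n; i++)
            print chars[i]
    }'
}

split_chars_demo foo
```

Probing on every call costs one extra split() of a two-character string; a longer script could cache the result in a global on first use.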

All times are GMT -5.

