Parse File 2x & format counter

#1  Jim Dornbos, 08-02-2008, 04:33 PM
Parse File 2x & format counter

I'm running gawk on Windows XP. I have a text file for a mailing,
already in mailing sequence, that prints 1 per sheet. I need to reorder
the file so it prints 4-up on a sheet, so that the sheets can be cut
apart and the stacks placed one on top of the next. For a 100-record
file, I would want records 1, 26, 51 and 76 to print on sheet 1,
records 2, 27, 52 and 77 on sheet 2, and so on.
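In general, with N records printed nup to a sheet, the number of sheets is N/nup rounded up, and sheet s carries records s, s + sheets, s + 2*sheets, and so on. As a check of that arithmetic, here is a minimal sketch that only prints the mapping. It assumes a POSIX shell with an awk on the PATH (on Windows the program between the single quotes can be saved to a file and run with gawk -f), and the function name n_up_map is made up for illustration.

```shell
#!/bin/sh
# n_up_map N NUP (hypothetical helper): print, sheet by sheet, which input
# records land on each sheet when N records are printed NUP to a sheet.
n_up_map() {
    awk -v N="$1" -v nup="$2" 'BEGIN {
        sheets = int((N + nup - 1) / nup)          # N / nup, rounded up
        for (s = 1; s <= sheets; s++) {
            line = ""
            sep = ""
            for (k = 0; k < nup; k++) {
                r = s + k * sheets                 # stack k holds a run of "sheets" records
                line = line sep (r <= N ? r : "-") # "-" marks an empty slot
                sep = " "
            }
            print "sheet " s ": " line
        }
    }'
}

n_up_map 100 4 | head -n 2
# sheet 1: 1 26 51 76
# sheet 2: 2 27 52 77
```

With 101 records, sheets 24 to 26 end in an empty slot (shown as -), so any real reordering has to decide what to print there.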

I mistakenly attacked this thinking I could write lines off into 4
alternating files, then combine them at the end using something like:

{if (i < numberup) {n=i ".uuu" ; print $0>n ; i++}
else {i=0 ; n=i ".uuu" ; print $0>n ; i++}}

END {
system("type *.uuu>output.txt")
}

What I found is that I need to write them out into (NR / number up)
files (25 for my 100-record file printing 4-up), then combine them at
the end.

I'm having trouble in two places. First, I can read the file a second
time fine, but ideally on the first pass I'd like it only to count lines
and give me the total number of records, doing no other processing.
Then, on the second pass, it should spit the lines off into their
respective files. How can I tell whether I'm on my second pass through
the file?

Second, my system command processes the numerically named files in
alphabetical order, so it combines 1.uuu, then 10.uuu, then 100.uuu. I
think I need to pad that counter with leading zeros so that my system
command combines the files in the correct order. I could use a pointer
on how to format that counter.

On the other hand, that system command feels like a workaround. Is
there a better way to slice this file up and write it out in a new
order? The current job has ~70,000 records, and the number up varies
from job to job as well.

TIA for your help.

Jim

#2  Hermann Peifer, 08-02-2008, 05:28 PM
Re: Parse File 2x & format counter

Jim Dornbos wrote:

> I'm having trouble in two places. First, I can read the file a second
> time fine, but ideally on the first pass I'd like it only to count lines
> and give me the total number of records, doing no other processing.
> Then, on the second pass, it should spit the lines off into their
> respective files. How can I tell whether I'm on my second pass through
> the file?


Typically one would do:

# Rule for 1st pass
NR == FNR { action statement(s) ; next }

# Rule for 2nd pass
{ action statement(s) }
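As a concrete instance of this pattern, a minimal sketch, assuming a POSIX shell with awk available (two_pass is a hypothetical helper name): the file is named twice on the command line, the first rule only counts, and because of next the second rule sees only second-pass records, by which time total is known.

```shell
#!/bin/sh
# two_pass FILE (hypothetical helper): FILE is named twice, so awk reads it
# twice; NR == FNR is true only while the first copy is being read.
two_pass() {
    awk '
        NR == FNR { total++; next }     # 1st pass: count records, nothing else
        { print FNR "/" total }         # 2nd pass: total is already known
    ' "$1" "$1"
}

sample=$(mktemp)
printf 'a\nb\nc\n' > "$sample"
two_pass "$sample"                      # prints 1/3, 2/3 and 3/3 on separate lines
rm -f "$sample"
```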

For the simple count of records, you could also do:

NR != FNR && FNR == 1 { total = NR - FNR }

# Other rule(s) for 2nd pass
{ ... }

>
> Second, my system command processes the numerically named files in
> alphabetical order, so it combines 1.uuu, then 10.uuu, then 100.uuu. I
> think I need to pad that counter with leading zeros so that my system
> command combines the files in the correct order. I could use a pointer
> on how to format that counter.


n = sprintf("%07d", i) ".uuu" # or, alternatively:
n = sprintf("%.7i", i) ".uuu"
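To see why the padding fixes the order, a small sketch, assuming a POSIX shell with awk and sort available (pad_name is a hypothetical helper): once every name has the same width, an alphabetical sort is also a numeric one.

```shell
#!/bin/sh
# pad_name I (hypothetical helper): build the file name the same way as
# sprintf("%07d", i) ".uuu" above.
pad_name() {
    awk -v i="$1" 'BEGIN { print sprintf("%07d", i) ".uuu" }'
}

# Unpadded names sort (in the C locale) as 1.uuu, 10.uuu, 100.uuu, 2.uuu;
# padded ones sort as 0000001.uuu 0000002.uuu 0000010.uuu 0000100.uuu
for i in 1 10 100 2; do
    pad_name "$i"
done | LC_ALL=C sort
```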

>
> On the other hand, that system command feels like a workaround. Is
> there a better way to slice this file up and write it out in a new
> order? The current job has ~70,000 records, and the number up varies
> from job to job as well.
>


A one-pass solution that keeps every record in memory should work with that number of records. It could be something like:

# Remember all records in array a, indexed by NR
{ a[NR] = $0 }

# Re-order records in END rule
END {
total = NR
...
}

#3  Hermann Peifer, 08-02-2008, 05:37 PM
Re: Parse File 2x & format counter

Hermann Peifer wrote:
>
> # Rule for 2nd pass
> { action statement(s) }
>


Make that:
NR != FNR { action statement(s) }

Hermann

#4  Kenny McCormack, 08-02-2008, 05:38 PM
Re: Parse File 2x & format counter

In article <4894D37C.1010108@gmx.net>, Hermann Peifer <peifer@gmx.net> wrote:
>Hermann Peifer wrote:
>>
>> # Rule for 2nd pass
>> { action statement(s) }
>>

>
>Make that:
>NR != FNR { action statement(s) }
>
>Hermann


No, you got it right the first time.

#5  Hermann Peifer, 08-02-2008, 06:01 PM
Re: Parse File 2x & format counter

Kenny McCormack wrote:
> In article <4894D37C.1010108@gmx.net>, Hermann Peifer <peifer@gmx.net> wrote:
>> Hermann Peifer wrote:
>>> # Rule for 2nd pass
>>> { action statement(s) }
>>>

>> Make that:
>> NR != FNR { action statement(s) }
>>
>> Hermann

>
> No, you got it right the first time.
>


You are right. My only excuse is that it is Saturday night, close to midnight here in Europe, and I had a couple of beers.

What I wanted to correct was:

> For the simple count of records, you could also do:
>
> NR != FNR && FNR == 1 { total = NR - FNR }
>
> # Other rule(s) for 2nd pass
> { ... }
>


Make that:

# Other rule(s) for 2nd pass
NR != FNR { ... }
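A sketch showing what this correction changes, assuming a POSIX shell with awk (the function names guarded and unguarded are hypothetical): with a three-line file read twice, the guarded version prints three lines, each with the correct total, while the version without the guard also fires on all three first-pass records, before total is set.

```shell
#!/bin/sh
# guarded FILE: the corrected rules; total is set when the 2nd pass starts,
# and the processing rule only runs on 2nd-pass records.
guarded() {
    awk '
        NR != FNR && FNR == 1 { total = NR - FNR }   # first record of pass 2
        NR != FNR { print FNR " of " total }         # pass 2 only
    ' "$1" "$1"
}

# unguarded FILE: the same rules without the guard on the second one;
# with no "next" in an earlier rule, it also fires on every pass-1 record.
unguarded() {
    awk '
        NR != FNR && FNR == 1 { total = NR - FNR }
        { print FNR " of " total }
    ' "$1" "$1"
}

sample=$(mktemp)
printf 'a\nb\nc\n' > "$sample"
guarded "$sample"      # 3 lines: "1 of 3" ... "3 of 3"
unguarded "$sample"    # 6 lines, the first three with an empty total
rm -f "$sample"
```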

Hermann

#6  Jim Dornbos, 08-02-2008, 10:29 PM
Re: Parse File 2x & format counter

Hermann...

Thanks for the help. I made another trip through the help files to
understand your suggestions. I went ahead with the solution to write out
files and ended up with:

# order-byfile.awk --- reorder input file for output n-up
# Usage gawk -f order.awk <input file name> <number up>

BEGIN {
i=0
nup = ARGV[2]
ARGV[2]=ARGV[1]
}

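NR != FNR && FNR == 1 {
total = NR - FNR
# number of sheets = total/nup rounded up (100 records 4-up -> 25);
# int(total/nup)+1 gives 26, so files 22-25 get only 3 lines each
# and the later sheets no longer line up
shts = int((total + nup - 1) / nup)
}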
NR != FNR {
if (i < shts) {n=sprintf("%06d", i) ".uuu" ; print $0>n ; i++}
else {i=0 ; n=sprintf("%06d", i) ".uuu" ; print $0>n ; i++}
}

END { system("type *.uuu>output.txt") }

That took about 5 minutes to run on my 70,000 records (a file of about
35 MB). I shortened the padding from 7 digits to 6, figuring that if I
ever need 7 digits to pad my filenames, I'm screwed anyway.
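On the point that the system command feels like a workaround: the joining step does not have to go through type *.uuu and the shell's alphabetical file order. Below is a sketch in which the END rule closes each sheet file, reads it back with getline in numeric order, and pads short sheets with blank lines so later sheets stay aligned. It assumes a POSIX shell with awk and a scratch directory to run in; the function name reorder_by_files and its argument order are made up for illustration.

```shell
#!/bin/sh
# reorder_by_files INFILE NUP OUTFILE (hypothetical helper): split the
# records into one NNNNNN.uuu file per sheet, then let awk itself join the
# files in numeric order. Run it in a scratch directory.
reorder_by_files() {
    awk -v nup="$2" -v out="$3" '
        NR != FNR && FNR == 1 {
            total = NR - FNR
            shts = int((total + nup - 1) / nup)   # sheets, rounded up
            i = 0
        }
        NR != FNR {
            n = sprintf("%06d", i) ".uuu"
            print $0 > n
            cnt[n]++                              # lines written to this file
            i = (i + 1) % shts
        }
        END {
            for (j = 0; j < shts; j++) {
                n = sprintf("%06d", j) ".uuu"
                close(n)                          # flush and close the output file
                while ((getline line < n) > 0)    # read it back, line by line
                    print line > out
                close(n)                          # close it again as an input file
                for (k = cnt[n]; k < nup; k++)
                    print "" > out                # blank line for an empty slot
            }
        }
    ' "$1" "$1"
}

# Usage: reorder_by_files mailing.txt 4 output.txt
```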

For good measure, I went back to make a 1 pass version using an array to
store the input. I ended up with:

# order-byarray.awk --- reorder input file for output n-up
# Usage gawk -f order.awk <input file name> <number up>

BEGIN {
x=1
y=1
z=0
nup = ARGV[2]
ARGV[2]=""
}

{ a[NR] = $0 }

END { shts = (int(NR/nup)+1)
while ( x < (shts+1) ) {
while (z < nup) {print a[y] >"output.txt" ; y=y+shts ; z++ }
z=0 ; x++ ; y=x }
}

This one ran the 70,000 record file in 10 seconds or so.
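One detail in the sheet count: int(NR/nup)+1 equals NR/nup rounded up only when NR is not a multiple of nup; when it is, the script produces one sheet more than needed (the array version then prints blank lines in the spare slots). A quick comparison with the rounded-up form int((NR+nup-1)/nup), assuming a POSIX shell with awk (sheets is a hypothetical helper):

```shell
#!/bin/sh
# sheets N NUP (hypothetical helper): print the sheet count from
# int(N/nup)+1, as in the script above, next to int((N+nup-1)/nup).
sheets() {
    awk -v N="$1" -v nup="$2" 'BEGIN { print int(N / nup) + 1, int((N + nup - 1) / nup) }'
}

sheets 100 4      # 26 25: one sheet too many when N is a multiple of nup
sheets 101 4      # 26 26: both agree otherwise
sheets 70000 4    # 17501 17500
```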

Thanks for your help.
Jim

#7  Hermann Peifer, 08-03-2008, 04:52 AM
Re: Parse File 2x & format counter

Jim Dornbos wrote:

> # order-byarray.awk --- reorder input file for output n-up
> # Usage gawk -f order.awk <input file name> <number up>
>
> BEGIN {
> x=1
> y=1
> z=0
> nup = ARGV[2]
> ARGV[2]=""
> }
>
> { a[NR] = $0 }
>
> END { shts = (int(NR/nup)+1)
> while ( x < (shts+1) ) {
> while (z < nup) {print a[y] >"output.txt" ; y=y+shts ; z++ }
> z=0 ; x++ ; y=x }
> }
>
> This one ran the 70,000 record file in 10 seconds or so.
>
> Thanks for your help.


You are welcome.

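You could run the script as follows:
# Usage gawk -v nup=<number up> -f order.awk <input file name>

...and then shorten the BEGIN rule to:
BEGIN { x = y = 1 }

(nup is then set by -v, so ARGV no longer needs adjusting, and z can
go because uninitialized awk variables start at 0.)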
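Putting the pieces together, a sketch of the one-pass version run with -v as suggested, with the sheet count rounded up and empty slots printed as blank lines. It assumes a POSIX shell with awk; the wrapper name reorder_nup is hypothetical, and the awk program between the single quotes can equally be saved to a file and run with gawk -v nup=4 -f.

```shell
#!/bin/sh
# reorder_nup NUP INFILE (hypothetical helper): one-pass reorder along the
# lines of the array version above, writing the result to stdout.
reorder_nup() {
    awk -v nup="$1" '
        { a[NR] = $0 }                            # keep every record in memory
        END {
            shts = int((NR + nup - 1) / nup)      # sheets, rounded up
            for (x = 1; x <= shts; x++)           # one sheet at a time
                for (z = 0; z < nup; z++) {       # one slot at a time
                    y = x + z * shts
                    print (y <= NR ? a[y] : "")   # blank line if no record
                }
        }
    ' "$2"
}

# Usage: reorder_nup 4 mailing.txt > output.txt
```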

Hermann

All times are GMT -5.