#1
I'm running gawk on Windows XP. I have a text file for a mailing, in sequence, that currently prints 1 record per sheet. I need to reorder the file so it prints 4 up on a sheet, so that the sheets can be cut apart and the pieces stacked one on top of the next. For a 100-record file, I would want records 1, 26, 51 and 76 to print on sheet 1, records 2, 27, 52 and 77 on sheet 2, and so on.

I mistakenly attacked this thinking I could write lines off into 4 alternating files, then combine them at the end using something like:

    {
        if (i < numberup) {n=i ".uuu" ; print $0>n ; i++}
        else {i=0 ; n=i ".uuu" ; print $0>n ; i++}
    }
    END { system("type *.uuu>output.txt") }

What I found is that I need to write them off into (NR / number up) files (25 in the case of my 100-record file printing 4 up), then combine them at the end.

I'm having trouble in two places.

First: reading the file a 2nd time works fine, but ideally the first time through I'd just like it to count lines and give me the total number of records, doing no other processing. Then, on the second time through, it should spit the lines off into their respective files. How can I tell if I'm on my 2nd pass through the file?

Second: my system command handles the numerically named files alphabetically, and as a result combines 1.uuu, then 10.uuu, then 100.uuu. I think I need to pad that counter with leading zeros so my system command will combine the files in the correct order. I could use a pointer on how to format that counter.

On the other hand, that system command feels like a workaround. Is there a better way I should consider to slice this file up and write it out in a new order? The current job has ~70,000 records, and the number up varies from job to job as well.

TIA for your help.
Jim
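To make the target order concrete: with `shts` sheets, slot `p` (counting from 0) on sheet `s` holds record `s + p*shts`, where `shts` is the record count divided by the number up, rounded up. The sketch below is a hypothetical POSIX shell wrapper around a small awk program; the names `total`, `nup`, `shts` and `sep` are illustrative, not taken from the post. It prints the first two sheets for 100 records, 4 up.

```sh
#!/bin/sh
# Hypothetical job size: 100 records printed 4 up.
total=100
nup=4

order=$(awk -v total="$total" -v nup="$nup" 'BEGIN {
    shts = int((total + nup - 1) / nup)    # sheets = ceil(total / nup)
    for (s = 1; s <= 2; s++) {             # show the first two sheets only
        line = ""
        sep = ""
        for (p = 0; p < nup; p++) {        # slot p of sheet s holds record s + p*shts
            line = line sep (s + p * shts)
            sep = " "
        }
        print "sheet " s ": " line
    }
}')
echo "$order"
```

It prints `sheet 1: 1 26 51 76` and `sheet 2: 2 27 52 77`, matching the order described above.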
#2
Jim Dornbos wrote:
> Two places I'm having trouble with - First - I'm reading the file a 2nd
> time fine - but ideally the first time through, I'd just like it to
> count lines and give me the total number of records - doing no other
> processing. Then on the second time through spit the lines off into
> their respective files. How can I tell if I'm on my 2nd pass through the
> file?

Typically one would do:

    # Rule for 1st pass
    NR == FNR { action statement(s) ; next }

    # Rule for 2nd pass
    { action statement(s) }

For the simple count of records, you could also do:

    NR != FNR && FNR == 1 { total = NR - FNR }

    # Other rule(s) for 2nd pass
    { ... }

> Second - My system command deals with the numeric named files
> alphabetically - and as a result, combines file 1.uuu, then 10.uuu, then
> 100.uuu. I think I need to format that counter padded with leading zeros
> so my system command will combine them in the correct order. I could use
> a pointer to how I can format that counter.

    n = sprintf("%07d", i) ".uuu"
    # or, alternatively:
    n = sprintf("%.7i", i) ".uuu"

> On the other hand - that system command feels like a work around. Is
> there a better way I should consider using to slice this file up and
> write it out in a new order? The current job has ~70,000 records. The
> number up varies from job to job as well.

A 1-pass solution should work with the given number of records. It could be something like:

    # Remember all records in array a, indexed by NR
    { a[NR] = $0 }

    # Re-order records in END rule
    END {
        total = NR
        ...
    }
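A minimal sketch of the first-pass idiom combined with the zero-padded file name, assuming a POSIX shell and a made-up three-line `sample.txt`. The same file is named twice on the command line; the `NR == FNR` rule only counts and then skips to the next record, so everything after it runs on the second pass only. Nothing here writes `.uuu` files; the names are just printed.

```sh
#!/bin/sh
# Hypothetical three-line sample file; any text file works the same way.
printf 'alpha\nbeta\ngamma\n' > sample.txt

# The file is named twice, so awk reads it in two passes.
result=$(awk '
NR == FNR { total++; next }               # pass 1: only count records, skip other rules
{                                         # pass 2: total is now known
    name = sprintf("%07d", FNR) ".uuu"    # zero-padded, so names sort in numeric order
    printf "%d of %d -> %s\n", FNR, total, name
}
' sample.txt sample.txt)
echo "$result"
rm -f sample.txt
```

The second-pass rule prints `1 of 3 -> 0000001.uuu` through `3 of 3 -> 0000003.uuu`. Because the counter is padded to a fixed width, an alphabetical listing of such names follows numeric order.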
#3
Hermann Peifer wrote:
>
> # Rule for 2nd pass
> { action statement(s) }
>

Make that:

    NR != FNR { action statement(s) }

Hermann
#4
In article <4894D37C.1010108@gmx.net>, Hermann Peifer <peifer@gmx.net> wrote:
>Hermann Peifer wrote:
>>
>> # Rule for 2nd pass
>> { action statement(s) }
>>
>
>Make that:
>NR != FNR { action statement(s) }
>
>Hermann

No, you got it right the first time.
#5
Kenny McCormack wrote:
> In article <4894D37C.1010108@gmx.net>, Hermann Peifer <peifer@gmx.net> wrote:
>> Hermann Peifer wrote:
>>> # Rule for 2nd pass
>>> { action statement(s) }
>>>
>> Make that:
>> NR != FNR { action statement(s) }
>>
>> Hermann
>
> No, you got it right the first time.
>

You are right. My only excuse is that it is Saturday night, close to midnight here in Europe and I had a couple of beers.

What I wanted to correct was:

> For the simple count of records, you could also do:
>
> NR != FNR && FNR == 1 { total = NR - FNR }
>
> # Other rule(s) for 2nd pass
> { ... }
>

Make that:

    # Other rule(s) for 2nd pass
    NR != FNR { ... }

Hermann
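The point of this correction can be checked directly. In the hypothetical sketch below (POSIX shell, a made-up `sample.txt` read twice), there is no `NR == FNR { ... ; next }` rule, so a rule without a pattern fires on both passes, while the `NR != FNR` rule fires on the second pass only.

```sh
#!/bin/sh
# Hypothetical three-line sample file, read twice below.
printf 'one\ntwo\nthree\n' > sample.txt

counts=$(awk '
NR != FNR && FNR == 1 { total = NR - FNR }    # once, on the first record of pass 2
{ unguarded++ }                               # no pattern: fires on BOTH passes
NR != FNR { guarded++ }                       # fires on pass 2 only
END { print total, unguarded, guarded }
' sample.txt sample.txt)
echo "$counts"
rm -f sample.txt
```

For a three-line file it prints `3 6 3`: the record total, then six firings for the unguarded rule and three for the guarded one.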
#6
Hermann... Thanks for the help. I made another trip through the help files to understand your suggestions. I went ahead with the solution that writes out files and ended up with:

    # order-byfile.awk --- reorder input file for output n-up
    # Usage gawk -f order.awk <input file name> <number up>

    BEGIN {
        i=0
        nup = ARGV[2]
        ARGV[2]=ARGV[1]
    }

    NR != FNR && FNR == 1 { total = NR - FNR ; shts = (int(total/nup)+1) }

    NR != FNR {
        if (i < shts) {n=sprintf("%06d", i) ".uuu" ; print $0>n ; i++}
        else {i=0 ; n=sprintf("%06d", i) ".uuu" ; print $0>n ; i++}
    }

    END { system("type *.uuu>output.txt") }

That took about 5 minutes to run on my 70,000 records (a file of about 35 megs). I shortened the padding from 7 digits to 6, figuring that if I ever needed 7 digits to pad my filenames, I'm screwed anyway.

For good measure, I went back to make a 1-pass version using an array to store the input. I ended up with:

    # order-byarray.awk --- reorder input file for output n-up
    # Usage gawk -f order.awk <input file name> <number up>

    BEGIN {
        x=1
        y=1
        z=0
        nup = ARGV[2]
        ARGV[2]=""
    }

    { a[NR] = $0 }

    END { shts = (int(NR/nup)+1)
        while ( x < (shts+1) ) {
            while (z < nup) {print a[y] >"output.txt" ; y=y+shts ; z++ }
            z=0 ; x++ ; y=x }
    }

This one ran the 70,000-record file in 10 seconds or so.

Thanks for your help.
Jim
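One detail worth checking in both scripts is the sheet count `int(total/nup)+1`. When the record count is an exact multiple of the number up, it yields one sheet more than needed: for 100 records 4 up it gives 26, so sheet 1 holds records 1, 27, 53 and 79 rather than the 1, 26, 51 and 76 asked for in the original question. Ceiling division, `int((n + nup - 1) / nup)`, gives 25. A small comparison, wrapped in a POSIX shell script purely for illustration:

```sh
#!/bin/sh
# Compare the two sheet-count formulas for 4 up, around a multiple of 4.
sheets=$(awk 'BEGIN {
    nup = 4
    for (n = 99; n <= 101; n++)
        printf "%d records: int(n/nup)+1 = %d, ceil = %d\n", n, int(n / nup) + 1, int((n + nup - 1) / nup)
}')
echo "$sheets"
```

For 99 and 101 records both formulas give the same answer; they differ only when the count divides evenly by the number up.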
#7
Jim Dornbos wrote:
> # order-byarray.awk --- reorder input file for output n-up
> # Usage gawk -f order.awk <input file name> <number up>
>
> BEGIN {
> x=1
> y=1
> z=0
> nup = ARGV[2]
> ARGV[2]=""
> }
>
> { a[NR] = $0 }
>
> END { shts = (int(NR/nup)+1)
> while ( x < (shts+1) ) {
> while (z < nup) {print a[y] >"output.txt" ; y=y+shts ; z++ }
> z=0 ; x++ ; y=x }
> }
>
> This one ran the 70,000 record file in 10 seconds or so.
>
> Thanks for your help.

You are welcome.

You could run the script as follows:

    # Usage gawk -v nup=<number up> -f order.awk <input file name>
    ....

and then shorten the BEGIN rule to:

    BEGIN { x = y = 1 }

Hermann
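Combining the `-v nup=...` suggestion with a ceiling-based sheet count gives something like the sketch below. It is an illustration, not Jim's script: `for` loops replace the `while` loops, the 10-record `input.txt` is hypothetical, and the POSIX shell wrapper only exists to create the files. On Windows, the lines between the `EOF` markers could be saved as `order.awk` and run with `gawk -v nup=4 -f order.awk input.txt > output.txt`.

```sh
#!/bin/sh
# Hypothetical sample input: 10 records, to be printed 4 up.
awk 'BEGIN { for (i = 1; i <= 10; i++) print "rec" i }' > input.txt

# One-pass reorder; nup is supplied on the command line with -v.
cat > order.awk <<'EOF'
{ a[NR] = $0 }                           # keep every record in memory
END {
    shts = int((NR + nup - 1) / nup)     # sheets = ceil(NR / nup)
    for (x = 1; x <= shts; x++)          # one group of nup lines per sheet
        for (p = 0; p < nup; p++) {
            y = x + p * shts             # record number for slot p of sheet x
            # print an empty slot once records run out, so every sheet
            # still gets exactly nup lines
            print (y <= NR ? a[y] : "")
        }
}
EOF

awk -v nup=4 -f order.awk input.txt > output.txt
```

With 10 records 4 up there are 3 sheets: `output.txt` holds rec1, rec4, rec7 and rec10 for sheet 1, then rec2, rec5, rec8 and an empty line for sheet 2, and so on. Because each sheet always gets exactly `nup` lines, the 4-up layout never drifts, which is the same effect Jim's version gets from printing never-assigned `a[y]` elements as blank lines.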