How can I check if a file exists in gawk?

#11
03-29-2008, 04:12 PM
mjc

On Mar 27, 8:05 pm, stan <smo...@exis.net> wrote:
> Ed Morton wrote:
> > On 3/27/2008 11:48 AM, Boltar wrote:
> >> On Mar 27, 4:36 pm, Ed Morton <mor...@lsupcaemnt.com> wrote:
> >>> On 3/27/2008 11:27 AM, Boltar wrote:
>
> <snip>
>
> >> but I guess that's just another one of awk's irritating little quirks.
>
> > It is not an awk quirk, it's one of the many getline features that aren't
> > intuitively obvious and all of which need to be thoroughly understood before
> > deciding whether or not to use getline.
>
> Personally, I found that I needed a firm fixed rule that every time I
> type getline I have to get up and go get a cup of coffee. Without any
> exceptions that I can remember I always find that I was trying to write
> a C program in awk and with a little reflection I can find a solution
> that uses awk instead of fighting against it.


I'm the reverse - I LIKE getline.

I have a utility subroutine for checking if a file exists:

function exists(file , line)
{
    if ( (getline line < file) > 0 )
    {
        close(file);
        return 1;
    }
    else
    {
        return 0;
    }
}
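
One caveat with a check of this kind: getline returns 1 for a record, 0 at end of file and -1 on error, so a "> 0" test reports an existing but empty file as missing. A minimal sketch of a variant that accepts ">= 0" instead is below. The function name readable() and the demo file names are made up for illustration, and the check really tests "can be opened for reading" rather than strict existence.

```shell
# Hypothetical demo files; the names are made up for illustration.
dir=$(mktemp -d)
printf 'hello\n' > "$dir/full.txt"
: > "$dir/empty.txt"

result=$(awk -v dir="$dir" '
# Returns 1 if the file can be opened for reading, 0 otherwise.
# getline yields 1 for a record, 0 at end of file, -1 on error,
# so ">= 0" also accepts an existing but empty file.
function readable(file,    line, status) {
    status = (getline line < file)
    if (status >= 0) {
        close(file)
        return 1
    }
    return 0
}
BEGIN {
    print readable(dir "/full.txt"), readable(dir "/empty.txt"), readable(dir "/missing.txt")
}')
echo "$result"
rm -rf "$dir"
```

The three columns are the results for the non-empty, the empty and the missing file, in that order.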

I tend to use gawk as a general purpose programming language, so the
fact that it is C-like is a plus to me.

When the pattern-matching paradigm is appropriate, I use it; when it
isn't, I use getline to read from any number of files.

imho, the contortions needed to not use getline when reading multiple
files are less clear than using getline for all but one.

Please don't vote me off the island.

martin cohen

#12
03-29-2008, 05:18 PM
Ed Morton

On 3/29/2008 3:12 PM, mjc wrote:
> On Mar 27, 8:05 pm, stan <smo...@exis.net> wrote:
>>Ed Morton wrote:
>>>On 3/27/2008 11:48 AM, Boltar wrote:
>>>>On Mar 27, 4:36 pm, Ed Morton <mor...@lsupcaemnt.com> wrote:
>>>>>On 3/27/2008 11:27 AM, Boltar wrote:
>>
>><snip>
>>
>>>>but I guess that's just another one of awk's irritating little quirks.
>>>
>>>It is not an awk quirk, it's one of the many getline features that aren't
>>>intuitively obvious and all of which need to be thoroughly understood before
>>>deciding whether or not to use getline.
>>
>>Personally, I found that I needed a firm fixed rule that every time I
>>type getline I have to get up and go get a cup of coffee. Without any
>>exceptions that I can remember I always find that I was trying to write
>>a C program in awk and with a little reflection I can find a solution
>>that uses awk instead of fighting against it.
>
> I'm the reverse - I LIKE getline.
>
> I have a utility subroutine for checking if a file exists:
>
> function exists(file , line)
> {
> if ( (getline line < file) > 0 )
> {
> close(file);
> return 1;
> }
> else
> {
> return 0;
> }
> }
>
> I tend to use gawk as a general purpose programming language, so the
> fact that it is C-like is a plus to me.


Deciding to explicitly write

while read line {
split line into field1 field2 field3....
}

when that's already provided by the tool by default doesn't make that tool any
more general purpose or C-like.

> When the pattern-matching paradigm is appropriate, I use it; when it
> isn't, I use getline to read from any number of files.


I don't see the inverse relationship between pattern-matching and explicitly
reading input.

> imho, the contortions needed to not use getline when reading multiple
> files are less clear than using getline for all but one.


What contortions? Could you give a small example of the problem?

> Please don't vote me off the island.


getline has its uses, see http://tinyurl.com/yn9ka9 for a list.

Ed.


#13
03-30-2008, 11:22 AM
mjc

On Mar 29, 2:18 pm, Ed Morton <mor...@lsupcaemnt.com> wrote:
> .....
> What contortions? Could you give a small example of the problem?
> ......
> Ed.


I wrote a program in gawk to compare the results of a computation
(written in assembly language with debugging info) with a simulation
of the computation. In addition, the program also read the assembly
listing so it would know what debugging info could be written and read
its own source so it would know what debugging info was being looked
for.

So, there were four files being read - the first three (especially the
first) needed to be completely read before the last was started.

I read the first three in the BEGIN block using getline in three
separate loops. Since there was no overlap, I used the standard
splitting of $0 (i.e., getline < file). Patterns were matched using
combinations of "if ( $x == ..." and "if ( match($x, ...)".

The fourth file was read in the standard pattern-matching from stdin
paradigm. These patterns were what the program looked for when it read
itself so it could find out which assembly language debugging
statements were being looked for.

At each pattern-match, the assembly-language debug output was compared
with the corresponding simulation results (read from the first file)
and statistics about that particular part of the computation gathered
(min, max, and mean error). If the results were too bad, an error
message was written and also saved to be written at the end where it
could be readily noticed.

At the end, the program compared the assembly listing info with its
own listing info to tell which debugging statement had not been
reached (in 3rd and not in 2nd file) and which debugging statements
had not been looked for (in 2nd and not in 3rd file).

Finally, the statistics about the final errors were output.

This was the first time I had a gawk program that read its own source
- I found that somewhat amusing.

btw, I always use the "--lint" option, and ignore the "variable
shadows" and "nonstandard" messages. The other messages I often find
very helpful.

I suppose I could check when the record number resets to 1 to see when
a new file starts and look at FILENAME to see what the file is, but I
find using getline in this case much more straightforward. In
particular, I would have to have every pattern check for which file
the pattern applied to.
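
For comparison, the record-number approach described in the paragraph above could be sketched roughly as follows. This is only an illustration under stated assumptions: the file names and the counters n1, n2, n3 are hypothetical, three small files stand in for the real inputs, and the per-file work is reduced to counting records. One known weakness is that an empty input file never produces a record with FNR == 1, so the counter would fall out of step.

```shell
# Hypothetical demo inputs; names and contents are made up for illustration.
dir=$(mktemp -d)
printf 'a1\na2\n' > "$dir/first.txt"
printf 'b1\n'     > "$dir/second.txt"
printf 'c1\nc2\n' > "$dir/third.txt"

result=$(awk '
FNR == 1    { fileno++ }      # FNR restarts at 1 for each new input file
fileno == 1 { n1++; next }    # first file stuff
fileno == 2 { n2++; next }    # second file stuff
            { n3++ }          # last file: ordinary pattern/action rules go here
END         { print n1, n2, n3 }
' "$dir/first.txt" "$dir/second.txt" "$dir/third.txt")
echo "$result"
rm -rf "$dir"
```

Because each per-file rule ends in "next", only the rules for the last file need no file test at all.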

That's my story, a trifle gory, but I don't worry because it's not an
allegory.

martin cohen

#14
03-30-2008, 11:35 AM
Kenny McCormack

In article <853cada4-f51a-47b8-b39e-ea056c00d1d4@c26g2000prf.googlegroups.com>,
mjc <mjcohen@acm.org> wrote:
>On Mar 29, 2:18 pm, Ed Morton <mor...@lsupcaemnt.com> wrote:
>....
>> What contortions? Could you give a small example of the problem?
>.....
>> Ed.
>
>I wrote a program in gawk to compare the results of a computation
>(written in assembly language with debugging info) with a simulation
>of the computation. In addition, the program also read the assembly
>listing so it would know what debugging info could be written and read
>its own source so it would know what debugging info was being looked
>for.


Basically, getline is only _needed_ when you are reading more than one
file at a time (i.e., "in parallel"). Now, having said that, some
people find its use (in those cases where it is not necessary) to be
aesthetically appealing - and others don't. Obviously, there's no
accounting for taste. I agree with Ed's basic position on the matter,
which is that using getline appeals to people who don't quite get "awk
qua awk".

Your example certainly fits the classic "people think they need getline,
but they don't" archetype. That is, your program should (in the Ed/Kenny
sense of the word "should") be written:

ARGIND == 1 {
    # Do stuff for file 1
    next
}
ARGIND == 2 {
    # Do stuff for file 2
    next
}
ARGIND == 3 {
    # Do stuff for file 3
    next
}
{
    # else do stuff for file 4
    # Note that for this, the last file, you can also use all the usual
    # AWK pattern/action stuff
}

Note: I hope I got the ARGIND stuff right (ARGIND is, AFAIK,
gawk-specific). I normally use TAWK, which has a variable called ARGI,
which is the same as gawk's ARGIND, except that it is higher by one (so
the first file is: ARGI == 2 {})

Notes:
1) Yes, it is unfortunate that you can only use the "automatic patterns"
in the last file. But this is (obviously) the same as if using getline.
2) The real point of using the above style in place of getline is that
you specify the files to be read on the command line, rather than
hard-coding them into the script. This is a Good Thing, although many
see it as a minus at first sight.


#15
03-30-2008, 04:30 PM
Kenny McCormack

In article <47EFDC32.8020103@lsupcaemnt.com>,
Ed Morton <morton@lsupcaemnt.com> wrote:
....
>So, you had something like this:
>
>BEGIN {
> while ((getline < ARGV[1]) > 0) {
> do first file stuff
> }
> close(ARGV[1])


As I pointed out in my previous response, when people use getline, they
usually hard-code the name of the file they are reading from in the
script. This is generally a Bad Thing, but has superficial appeal.


#16
03-30-2008, 04:56 PM
Janis Papanagnou

Ed Morton wrote:
>
> So, you had something like this:
>
> BEGIN {
> while ((getline < ARGV[1]) > 0) {
> do first file stuff
> }
> close(ARGV[1])
> while ((getline < ARGV[2]) > 0) {
> do second file stuff
> }
> close(ARGV[2])
> while ((getline < ARGV[3]) > 0) {
> do third file stuff
> }
> close(ARGV[3])
> ARGV[1]=ARGV[2]=ARGV[3]=""
> }
> /pattern/ { pattern match in fourth file }
> END { do the end stuff }
>
> when all you really needed was:
>
> ARGIND == 1 { do first file stuff; next }
> ARGIND == 2 { do second file stuff; next }
> ARGIND == 3 { do third file stuff; next }
> /pattern/ { pattern match in fourth file }
> END { do the end stuff }
>
> Replace "ARGIND == N" with "FILENAME == ARGV[N]" if you want a solution that
> isn't gawk-specific.


Once, in a similar case, I've used something like

awk -f prog.awk phase=1 file1 phase=2 file2 phase=3 fileX fileY fileZ

Where prog.awk had been something like

phase == 1 { do first file stuff ; next }
phase == 2 { do second file stuff ; next }
/whatever/ { do rest of the files}

I've done that to avoid the filename comparison and GNU specifics.
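
A self-contained sketch of this technique, for anyone who wants to try it: awk processes var=value operands on the command line at the moment it reaches them, so each assignment takes effect just before the file that follows it. The file contents and the counters n1, n2 and hits are made up for illustration. Note that phase has to be changed again before the remaining files, since otherwise it would still be 2 when they are read.

```shell
# Hypothetical demo inputs; names and contents are made up for illustration.
dir=$(mktemp -d)
printf 'x\ny\n' > "$dir/file1"
printf 'z\n'    > "$dir/file2"
printf 'whatever 1\nother\nwhatever 2\n' > "$dir/fileX"

result=$(awk '
phase == 1 { n1++; next }     # first file stuff
phase == 2 { n2++; next }     # second file stuff
/whatever/ { hits++ }         # rest of the files
END        { print n1, n2, hits }
' phase=1 "$dir/file1" phase=2 "$dir/file2" phase=3 "$dir/fileX")
echo "$result"
rm -rf "$dir"
```

This relies only on standard awk behaviour, with no FILENAME comparison and no gawk-specific variables.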

Janis

>
> Ed.
>


#17
09-02-2008, 04:28 PM
Fred Borkle

Hey how about an actual answer to his question:

Read the first line and see if it fails. In this case if an atch file exists, include it, otherwise don't leave a big blank area:

if ( ( getline atchline < ( FNAME ".atch" )) > 0 ) {
    # First read succeeded, so the file exists and is not empty.
    print "<tr><td width=30% valign=top>" > BFILE ;
    print atchline > BFILE;
    while ( ( getline atchline < ( FNAME ".atch" )) > 0 )
        print atchline > BFILE;
    close( FNAME ".atch" );
    print "</td>" > BFILE ;
}
else {
    # Missing (or empty) file: emit a narrow placeholder cell instead.
    print "<tr><td width=3% valign=top>" > BFILE ;
    print "</td>" > BFILE ;
}