Gawk Newbe Syntax Trouble

  #1  
Old 09-04-2008, 10:49 AM
industcontrols@iinet.net.au
Guest
 
Default Gawk Newbe Syntax Trouble

I have been struggling with what must be a simple GAWK task.

The text below is basically one record of many in a file that I want
to extract information from:
&
DEL FI0454C
ADD FI0454C PSA46771 PEN3B CORRECTED WH FLOW
PNTSRVTP FI0454C TPS NRA_TPS
PARENT FI0454C NRA_T1
ENTNAM FI0454C FI0454C
DISPLAY FI0454C NRA_WELLPRESS.htm
&

I want to extract the 2nd and 3rd fields from the DISPLAY record IF
the 3rd field of the PNTSRVTP record is TPS.

Below is my code:

#$1 ~ /PNTSRVTP/ { if ($3 == "TPS") { tagName = $2 } }
$1 ~ /PNTSRVTP/ { if ($3 == "TPS") { tpsTag = 1 } }

#$1 ~ /DISPLAY/ { if ($2 == tagName) { print $2, $3 } }
$1 ~ /DISPLAY/ { if (tpsTag>0) { print $2, $3; tpsTag = ! tpsTag } }

I tried checking for TPS and storing the name field, and then, on the
next DISPLAY record whose name matches tagName, printing the $2 and $3
fields. I also tried using a boolean flag to do the same job, but both
methods give the same result.

I think that I don't understand how to use the IF statement. I have
read the manual GAWK: Effective AWK Programming, but I cannot see
where I am going wrong.

I have a few questions:
1) It appears that you can only use an IF statement inside braces, as
part of an action, is this correct?
2) Is there a method where you can conditionally look ahead a number
of records?

Thanks in advance for any suggestions.

Craig
  #2  
Old 09-04-2008, 12:26 PM
Janis Papanagnou
Guest
 
Default Re: Gawk Newbe Syntax Trouble

industcontrols@iinet.net.au wrote:
> I have been struggling with what must be a simple GAWK task.
>
> The text below is basically one record of many in a file that I want
> to extract information from:
> &
> DEL FI0454C
> ADD FI0454C PSA46771 PEN3B CORRECTED WH FLOW
> PNTSRVTP FI0454C TPS NRA_TPS
> PARENT FI0454C NRA_T1
> ENTNAM FI0454C FI0454C
> DISPLAY FI0454C NRA_WELLPRESS.htm
> &
>
> I want to extract the 2nd and 3rd fields from the DISPLAY record IF
> the 3rd record of the PNTSRVTP record is TPS.


It's not clear whether the '&' are record delimiters or just meta
characters of your posting. Is every record always complete, with
all the fields present? Try this...

$1 == "PNTSRVTP" { tps = ($3 == "TPS") }
$1 == "DISPLAY" && tps { print $2, $3 }

>
> Below is my code:
>
> #$1 ~ /PNTSRVTP/ { if ($3 == "TPS") { tagName = $2 } }
> $1 ~ /PNTSRVTP/ { if ($3 == "TPS") { tpsTag = 1 } }
>
> #$1 ~ /DISPLAY/ { if ($2 == tagName) { print $2, $3 } }
> $1 ~ /DISPLAY/ { if (tpsTag>0) { print $2, $3; tpsTag = ! tpsTag } }
>
> I tried checking for TPS and storing the name field and then when you
> find the next DISPLAY field that matches the tagname store the $2 & $3
> fields. I also tried using a boolean expression to perform a similar
> duty but both methods leave the same result.
>
> I think that I don't understand how to use the IF statement. I have
> read the GAWK: Effective GAWK Programming manual but I cannot see
> where I am going wrong.
>
> I have a few questions:
> 1) It appears that you can only use an IF statement inside braces, as
> part of an action, is this correct?


There's the condition part and the action part in awk programs;

condition { action }

The condition is already a predicate and doesn't need (doesn't allow)
an 'if' statement. The action part is like other programming languages
and supports the (explicit) 'if' statement.
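
A minimal sketch of the two forms, for comparison. The second PNTSRVTP line (tag FI0999X, type HG) is made-up sample data, not from the original file. Both commands should select the same line: the first puts the whole test in the condition part, the second keeps a simpler condition and uses an explicit 'if' inside the action.

```shell
# Sample input; the FI0999X / HG line is hypothetical, added for contrast.
data='PNTSRVTP FI0454C TPS NRA_TPS
PNTSRVTP FI0999X HG NRA_HG
DISPLAY FI0454C NRA_WELLPRESS.htm'

# Form 1: the complete test lives in the condition (pattern) part.
out_pattern=$(printf '%s\n' "$data" |
    awk '$1 == "PNTSRVTP" && $3 == "TPS" { print $2 }')

# Form 2: a simpler condition, with an explicit if inside the action.
out_if=$(printf '%s\n' "$data" |
    awk '$1 == "PNTSRVTP" { if ($3 == "TPS") print $2 }')

echo "$out_pattern"
echo "$out_if"
```

Which form to use is mostly a matter of taste; the condition form tends to read better when the action is a single statement.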

> 2) Is there a method where you can conditionally look ahead a number
> of records?


Yes, but whether that is the way to go depends on the record structure
and the task. You can set RS="" and FS="\n" (this assumes the blocks are
separated by blank lines); then each block is available as $0 and every
line as $1, $2, ... $NF. But in your case you would then have to split
the lines again to obtain the individual fields (because $i now identifies
a complete line instead of a field on the line), so the solution suggested
above seems advantageous.
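
For illustration only, here is a rough sketch of that block-at-a-time approach. It assumes the '&' characters really are in the file and delimit the blocks, so it uses RS="&" rather than RS="". The second block (FI0999X, HG, NRA_OTHER.htm) is invented sample data. Each line of a block arrives as one field and is re-split with split().

```shell
# Two blocks delimited by "&"; the second one is hypothetical sample data.
data='&
DEL FI0454C
PNTSRVTP FI0454C TPS NRA_TPS
PARENT FI0454C NRA_T1
DISPLAY FI0454C NRA_WELLPRESS.htm
&
DEL FI0999X
PNTSRVTP FI0999X HG NRA_HG
DISPLAY FI0999X NRA_OTHER.htm
&'

out=$(printf '%s\n' "$data" | awk '
BEGIN { RS = "&"; FS = "\n" }        # one block per record, one line per field
{
    tps = 0; shown = ""
    for (i = 1; i <= NF; i++) {
        n = split($i, f, " ")        # re-split the line into its own words
        if (n >= 3 && f[1] == "PNTSRVTP" && f[3] == "TPS") tps = 1
        if (n >= 3 && f[1] == "DISPLAY") shown = f[2] " " f[3]
    }
    if (tps && shown != "") print shown
}')

echo "$out"
```

Because the whole block is examined before anything is printed, this version does not care whether the DISPLAY line comes before or after the PNTSRVTP line.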

Janis

>
> Thanks in advance for any suggestions.
>
> Craig

  #3  
Old 09-04-2008, 03:49 PM
Grant
Guest
 
Default Re: Gawk Newbe Syntax Trouble

On Thu, 04 Sep 2008 18:26:36 +0200, Janis Papanagnou <janis_papanagnou@hotmail.com> wrote:

>industcontrols@iinet.net.au wrote:
>> I have been struggling with what must be a simple GAWK task.
>>
>> The text below is basically one record of many in a file that I want
>> to extract information from:
>> &
>> DEL FI0454C
>> ADD FI0454C PSA46771 PEN3B CORRECTED WH FLOW
>> PNTSRVTP FI0454C TPS NRA_TPS
>> PARENT FI0454C NRA_T1
>> ENTNAM FI0454C FI0454C
>> DISPLAY FI0454C NRA_WELLPRESS.htm
>> &
>>
>> I want to extract the 2nd and 3rd fields from the DISPLAY record IF
>> the 3rd record of the PNTSRVTP record is TPS.

>
>It's not clear whether the '&' are record delimiters or just meta
>characters of your posting. Is every record always complete and has
>all the fields present? Try this...
>
> $1 == "PNTSRVTP" { tps = ($3 == "TPS") }
> $1 == "DISPLAY" && tps { print $2, $3 }


I think you may need to disarm 'tps' after use, ready for the next
record group, otherwise it makes little sense to have that trigger?

$1 == "PNTSRVTP" { tps = ($3 == "TPS") }
$1 == "DISPLAY" && tps { print $2, $3; tps = "" }

Grant.
--
Cats, no less liquid than their shadows, offer no angles to the wind.
  #4  
Old 09-04-2008, 07:19 PM
Janis Papanagnou
Guest
 
Default Re: Gawk Newbe Syntax Trouble

Grant wrote:
> On Thu, 04 Sep 2008 18:26:36 +0200, Janis Papanagnou <janis_papanagnou@hotmail.com> wrote:
>
>
>>industcontrols@iinet.net.au wrote:
>>
>>>I have been struggling with what must be a simple GAWK task.
>>>
>>>The text below is basically one record of many in a file that I want
>>>to extract information from:
>>>&
>>>DEL FI0454C
>>>ADD FI0454C PSA46771 PEN3B CORRECTED WH FLOW
>>>PNTSRVTP FI0454C TPS NRA_TPS
>>>PARENT FI0454C NRA_T1
>>>ENTNAM FI0454C FI0454C
>>>DISPLAY FI0454C NRA_WELLPRESS.htm
>>>&
>>>
>>>I want to extract the 2nd and 3rd fields from the DISPLAY record IF
>>>the 3rd record of the PNTSRVTP record is TPS.

>>
>>It's not clear whether the '&' are record delimiters or just meta
>>characters of your posting. Is every record always complete and has
>>all the fields present? Try this...
>>
>> $1 == "PNTSRVTP" { tps = ($3 == "TPS") }
>> $1 == "DISPLAY" && tps { print $2, $3 }

>
>
> I think you may need to disarm 'tps' after use, ready for the next
> record group, otherwise it makes little sense to have that trigger?


No, because the trigger considers whether "TPS" is in the third field,
and - that's why I asked "Is every record always complete" - if "PNTSRVTP"
is always present you just have to trigger according to the value of $3 in
the "PNTSRVTP" entry, and you *always* have to consider that record then.
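
A small sketch of why no explicit reset is needed under that assumption (every block has a PNTSRVTP line). The second block below (FI0999X, HG, NRA_OTHER.htm) is made-up sample data. The assignment stores 1 or 0 on every PNTSRVTP line, so a non-TPS block clears the flag by itself.

```shell
# Two blocks, each with a PNTSRVTP line; the second block is hypothetical.
data='PNTSRVTP FI0454C TPS NRA_TPS
DISPLAY FI0454C NRA_WELLPRESS.htm
PNTSRVTP FI0999X HG NRA_HG
DISPLAY FI0999X NRA_OTHER.htm'

out=$(printf '%s\n' "$data" | awk '
$1 == "PNTSRVTP"       { tps = ($3 == "TPS") }   # 1 or 0, reassigned per block
$1 == "DISPLAY" && tps { print $2, $3 }')

echo "$out"
```

Only the DISPLAY line of the TPS block should be printed.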

Janis

>
> $1 == "PNTSRVTP" { tps = ($3 == "TPS") }
> $1 == "DISPLAY" && tps { print $2, $3; tps = "" }
>
> Grant.

  #5  
Old 09-04-2008, 10:34 PM
Grant
Guest
 
Default Re: Gawk Newbe Syntax Trouble

On Fri, 05 Sep 2008 01:19:07 +0200, Janis Papanagnou <janis_papanagnou@hotmail.com> wrote:

>Grant wrote:
>> On Thu, 04 Sep 2008 18:26:36 +0200, Janis Papanagnou <janis_papanagnou@hotmail.com> wrote:
>>
>>
>>>industcontrols@iinet.net.au wrote:
>>>
>>>>I have been struggling with what must be a simple GAWK task.
>>>>
>>>>The text below is basically one record of many in a file that I want
>>>>to extract information from:
>>>>&
>>>>DEL FI0454C
>>>>ADD FI0454C PSA46771 PEN3B CORRECTED WH FLOW
>>>>PNTSRVTP FI0454C TPS NRA_TPS
>>>>PARENT FI0454C NRA_T1
>>>>ENTNAM FI0454C FI0454C
>>>>DISPLAY FI0454C NRA_WELLPRESS.htm
>>>>&
>>>>
>>>>I want to extract the 2nd and 3rd fields from the DISPLAY record IF
>>>>the 3rd record of the PNTSRVTP record is TPS.
>>>
>>>It's not clear whether the '&' are record delimiters or just meta
>>>characters of your posting. Is every record always complete and has
>>>all the fields present? Try this...
>>>
>>> $1 == "PNTSRVTP" { tps = ($3 == "TPS") }
>>> $1 == "DISPLAY" && tps { print $2, $3 }

>>
>>
>> I think you may need to disarm 'tps' after use, ready for the next
>> record group, otherwise it makes little sense to have that trigger?

>
>No, because the trigger considers whether "TPS" is in the third field,
>and - that's why I asked "Is every record always complete" - if "PNTSRVTP"
>is always present you just have to trigger according to the value of $3 in
>the "PNTSRVTP" entry, and you *always* have to consider that record then.


Sorry, I missed the boolean assignment. Dunno what I was thinking, now
that I look at the thing again.

Grant.
--
Cats, no less liquid than their shadows, offer no angles to the wind.
  #6  
Old 09-05-2008, 08:03 AM
Ed Morton
Guest
 
Default Re: Gawk Newbe Syntax Trouble

On 9/4/2008 9:49 AM, industcontrols@iinet.net.au wrote:
> I have been struggling with what must be a simple GAWK task.
>
> The text below is basically one record of many in a file that I want
> to extract information from:
> &
> DEL FI0454C
> ADD FI0454C PSA46771 PEN3B CORRECTED WH FLOW
> PNTSRVTP FI0454C TPS NRA_TPS
> PARENT FI0454C NRA_T1
> ENTNAM FI0454C FI0454C
> DISPLAY FI0454C NRA_WELLPRESS.htm
> &
>
> I want to extract the 2nd and 3rd fields from the DISPLAY record IF
> the 3rd record of the PNTSRVTP record is TPS.
>
> Below is my code:
>
> #$1 ~ /PNTSRVTP/ { if ($3 == "TPS") { tagName = $2 } }
> $1 ~ /PNTSRVTP/ { if ($3 == "TPS") { tpsTag = 1 } }
>
> #$1 ~ /DISPLAY/ { if ($2 == tagName) { print $2, $3 } }
> $1 ~ /DISPLAY/ { if (tpsTag>0) { print $2, $3; tpsTag = ! tpsTag } }
>
> I tried checking for TPS and storing the name field and then when you
> find the next DISPLAY field that matches the tagname store the $2 & $3
> fields. I also tried using a boolean expression to perform a similar
> duty but both methods leave the same result.
>
> I think that I don't understand how to use the IF statement. I have
> read the GAWK: Effective GAWK Programming manual but I cannot see
> where I am going wrong.
>
> I have a few questions:
> 1) It appears that you can only use an IF statement inside braces, as
> part of an action, is this correct?


Yes

> 2) Is there a method where you can conditionally look ahead a number
> of records?


No

> Thanks in advance for any suggestions.
>
> Craig


If the "&"s are record separators, and each record has the same number and
layout of fields, then all you need is:

awk -v RS='&' '$12=="TPS"{print $21,$22}' file

Ed.

  #7  
Old 09-05-2008, 09:50 AM
industcontrols@iinet.net.au
Guest
 
Default Re: Gawk Newbe Syntax Trouble

On Sep 5, 8:03 pm, Ed Morton <mor...@lsupcaemnt.com> wrote:
> On 9/4/2008 9:49 AM, industcontr...@iinet.net.au wrote:
>
> > I have been struggling with what must be a simple GAWK task.

>
> > The text below is basically one record of many in a file that I want
> > to extract information from:
> > &
> > DEL      FI0454C
> > ADD      FI0454C PSA46771 PEN3B CORRECTED WH FLOW
> > PNTSRVTP FI0454C TPS NRA_TPS
> > PARENT   FI0454C NRA_T1
> > ENTNAM   FI0454C FI0454C
> > DISPLAY  FI0454C NRA_WELLPRESS.htm
> > &

>
> > I want to extract the 2nd and 3rd fields from the DISPLAY record IF
> > the 3rd record of the PNTSRVTP record is TPS.

>
> > Below is my code:

>
> > #$1 ~ /PNTSRVTP/ { if ($3 == "TPS") { tagName = $2 } }
> > $1 ~ /PNTSRVTP/ { if ($3 == "TPS") { tpsTag = 1 } }

>
> > #$1 ~ /DISPLAY/ { if ($2 == tagName) { print $2, $3 } }
> > $1 ~ /DISPLAY/ { if (tpsTag>0) { print $2, $3; tpsTag = ! tpsTag } }

>
> > I tried checking for TPS and storing the name field and then when you
> > find the next DISPLAY field that matches the tagname store the $2 & $3
> > fields. I also tried using a boolean expression to perform a similar
> > duty but both methods leave the same result.

>
> > I think that I don't understand how to use the IF statement. I have
> > read the GAWK: Effective GAWK Programming manual but I cannot see
> > where I am going wrong.

>
> > I have a few questions:
> > 1) It appears that you can only use an IF statement inside braces, as
> > part of an action, is this correct?

>
> Yes
>
> > 2) Is there a method where you can conditionally look ahead a number
> > of records?

>
> No
>
> > Thanks in advance for any suggestions.

>
> > Craig

>
> If the "&"s are record separators, and each record has the same number and
> layout of fields, then all you need is:
>
> awk -v RS='&' '$12=="TPS"{print $21,$22}' file
>
> Ed.


Thanks Janis, Grant & Ed;

Janis, I implemented your code and it worked well, except that
unfortunately the record structure is not always consistent (the records
don't always have a PNTSRVTP parameter!), so I had to reset the tps flag
each time an & character was reached (denoting the end of that record
and the start of a new one).
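
The exact code used for that reset was not posted, so the following is only a guess at what it might look like. It assumes each '&' sits on a line of its own; the second block (FI0777B, NRA_NOTPS.htm) is invented and deliberately has no PNTSRVTP line.

```shell
# The second block is hypothetical and lacks a PNTSRVTP line on purpose.
data='&
PNTSRVTP FI0454C TPS NRA_TPS
DISPLAY FI0454C NRA_WELLPRESS.htm
&
ADD FI0777B SOMETHING
DISPLAY FI0777B NRA_NOTPS.htm
&'

out=$(printf '%s\n' "$data" | awk '
$1 == "&"              { tps = 0 }               # block boundary: clear the flag
$1 == "PNTSRVTP"       { tps = ($3 == "TPS") }
$1 == "DISPLAY" && tps { print $2, $3 }')

echo "$out"
```

Without the first rule, the flag set by the first block would still be set when the second block's DISPLAY line is read, and that line would be printed as well.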

Ed unfortunately the records are slightly different and I couldn't use
a fixed field counting system (I assume that is what you have
proposed).

Thanks for all your help!

This is the first time that I have used GAWK and I have found it to be
a most useful language in my line of work.

Can I ask some further questions?

My original code was:
$1 ~ /PNTSRVTP/ { if ($3 == "TPS") { tagName = $2 } }
$1 ~ /DISPLAY/ { if ($2 == tagName) { print $2, $3 } }

But it doesn't work. Pseudo code would be =>
if field 1 is PNTSRVTP and field 3 is TPS then store field 2 into
variable tagname
if field 1 is DISPLAY and field 2 is == tagname then print fields 2
and 3

This algorithm would work as it is using the tagname as the primary
key and this is unique to each record. The semaphore would not be
required.

Why can't you use compound statements in the predicate field (or maybe
you can?), i.e. $1 ~ /DISPLAY/ && $3 ~ /TPS/ { Do something }

Can you guys explain why this code doesn't work? Any references to
free downloadable GAWK manuals would be appreciated (I already have
GAWK: Effective AWK Programming).

Thanks again

Craig
  #8  
Old 09-05-2008, 08:31 PM
Grant
Guest
 
Default Re: Gawk Newbe Syntax Trouble

On Fri, 5 Sep 2008 06:50:35 -0700 (PDT), industcontrols@iinet.net.au wrote:


>Can I ask some further questions?

Of course
>
>My original code was:
>$1 ~ /PNTSRVTP/ { if ($3 == "TPS") { tagName = $2 } }
>$1 ~ /DISPLAY/ { if ($2 == tagName) { print $2, $3 } }
>
>But it doesn't work. Pseudo code would be =>
>if field 1 is PNTSRVTP and field 3 is TPS then store field 2 into
>variable tagname


$1 == "PNTSRVTP" && $3 == "TPS" { tagname = $2 }

>if field 1 is DISPLAY and field 2 is == tagname then print fields 2
>and 3


$1 == "DISPLAY" && $2 == tagname { print $2, $3 }
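
Put together as one command, the two rules might be run like this. It is a sketch that assumes the PNTSRVTP line always comes before the DISPLAY line of the same tag. The second block (FI0999X, HG, NRA_OTHER.htm) is made-up sample data.

```shell
# The second block is hypothetical; its type is HG rather than TPS.
data='&
PNTSRVTP FI0454C TPS NRA_TPS
DISPLAY FI0454C NRA_WELLPRESS.htm
&
PNTSRVTP FI0999X HG NRA_HG
DISPLAY FI0999X NRA_OTHER.htm
&'

out=$(printf '%s\n' "$data" | awk '
$1 == "PNTSRVTP" && $3 == "TPS"   { tagname = $2 }   # remember the TPS tag
$1 == "DISPLAY" && $2 == tagname  { print $2, $3 }   # print only for that tag
')

echo "$out"
```

Since the tag name differs from block to block, a stale tagname left over from an earlier block does no harm here, and no reset at the '&' lines is needed.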

>Why can't you use compund statements in the predicate field (or maybe
>you can?). ie $1 ~ /DISPLAY/ && $3 ~ /TPS/ { Do something }


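You can. Be aware that '~' is not the same as '==': '~' is a regexp
match, which succeeds if the pattern matches anywhere in the field,
while '==' is an exact comparison. On the other hand, ~ // allows
regexp magic.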
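
A quick way to see the difference, using three made-up lines whose first fields all contain the text DISPLAY (the names DISPLAY2 and NODISPLAY are invented for the demonstration):

```shell
# Hypothetical input: only the first line has a first field equal to DISPLAY.
data='DISPLAY FI0454C a.htm
DISPLAY2 FI0455C b.htm
NODISPLAY FI0456C c.htm'

# Unanchored regexp match: counts every line whose $1 contains DISPLAY.
n_regex=$(printf '%s\n' "$data" | awk '$1 ~ /DISPLAY/ { n++ } END { print n + 0 }')

# Exact string comparison: counts only $1 equal to DISPLAY.
n_exact=$(printf '%s\n' "$data" | awk '$1 == "DISPLAY" { n++ } END { print n + 0 }')

# Anchored regexp: behaves like the exact comparison for this input.
n_anchored=$(printf '%s\n' "$data" | awk '$1 ~ /^DISPLAY$/ { n++ } END { print n + 0 }')

echo "$n_regex $n_exact $n_anchored"
```

The unanchored match counts all three lines, while the other two forms count one.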
>
>Can you guys explain why this code doesn't work?


Maybe you didn't hold your tongue right? :) It might be that sequencing
(a state machine) is required to recognise the multi-line fields of each
'record'.

> Any references to
>free downloadable GAWK manuals would be appreciated (I already have
>GAWK: Effective GAWK Programming ).


That's about all I use, plus the odd query to this group or comp.unix.shell
if the script is shell + awk (gawk)

Grant.
--
Cats, no less liquid than their shadows, offer no angles to the wind.
  #9  
Old 09-06-2008, 02:52 PM
Jürgen Kahrs
Guest
 
Default Re: Gawk Newbe Syntax Trouble

Grant wrote:

>> Any references to
>> free downloadable GAWK manuals would be appreciated (I already have
>> GAWK: Effective GAWK Programming ).

>
> That's about all I use, plus the odd query to this group or comp.unix.shell
> if the script is shell + awk (gawk)


Just in case the OP is also interested in other gawk manuals
(and not just those that might help him with his particular
problem here): there is the gawk manual describing the TCP/IP
interface of gawk:

http://www.gnu.org/software/gawk/manual/gawkinet/

And finally, and even further away from the original question,
there is the manual of the XML extension (based on gawk,
but a separate distribution):

http://home.vrweb.de/~juergen.kahrs/...ML/xmlgawk.pdf
http://home.vrweb.de/~juergen.kahrs/...L/xmlgawk.html