#1
I have been struggling with what must be a simple GAWK task.

The text below is basically one record of many in a file that I want to extract information from:

    &
    DEL      FI0454C
    ADD      FI0454C PSA46771 PEN3B CORRECTED WH FLOW
    PNTSRVTP FI0454C TPS NRA_TPS
    PARENT   FI0454C NRA_T1
    ENTNAM   FI0454C FI0454C
    DISPLAY  FI0454C NRA_WELLPRESS.htm
    &

I want to extract the 2nd and 3rd fields from the DISPLAY record IF the 3rd field of the PNTSRVTP record is TPS.

Below is my code:

    #$1 ~ /PNTSRVTP/ { if ($3 == "TPS") { tagName = $2 } }
    $1 ~ /PNTSRVTP/ { if ($3 == "TPS") { tpsTag = 1 } }

    #$1 ~ /DISPLAY/ { if ($2 == tagName) { print $2, $3 } }
    $1 ~ /DISPLAY/ { if (tpsTag>0) { print $2, $3; tpsTag = ! tpsTag } }

I tried checking for TPS and storing the name field, and then, when I find the next DISPLAY record whose name matches tagName, printing its $2 and $3 fields. I also tried using a boolean flag to do the same job, but both methods give the same result.

I think that I don't understand how to use the IF statement. I have read the GAWK: Effective AWK Programming manual but I cannot see where I am going wrong.

I have a few questions:
1) It appears that you can only use an IF statement inside braces, as part of an action. Is this correct?
2) Is there a method where you can conditionally look ahead a number of records?

Thanks in advance for any suggestions.

Craig

#2
industcontrols@iinet.net.au wrote:
> The text below is basically one record of many in a file that I want
> to extract information from:
> &
> DEL FI0454C
> [...]
> DISPLAY FI0454C NRA_WELLPRESS.htm
> &

It's not clear whether the '&' characters are record delimiters or just meta characters of your posting. Is every record always complete, with all the fields present? Try this...

    $1 == "PNTSRVTP" { tps = ($3 == "TPS") }
    $1 == "DISPLAY" && tps { print $2, $3 }

> 1) It appears that you can only use an IF statement inside braces, as
> part of an action. Is this correct?

There's the condition part and the action part in awk programs:

    condition { action }

The condition is already a predicate and doesn't need (or allow) an 'if' statement. The action part is like other programming languages and supports the (explicit) 'if' statement.

> 2) Is there a method where you can conditionally look ahead a number
> of records?

Yes, but whether that is the way to go depends on the record structure and the task. You can redefine RS="" (paragraph mode, where blank lines separate records) and FS="\n"; then you have each block as $0 and every line available as $1, $2, ... $NF. But in your case you would then have to split your lines again to obtain the individual fields (because $i now identifies complete lines instead of fields on the line), so the solution suggested above seems advantageous.

Janis
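To see Janis's two rules in action, together with the equivalent form that puts explicit `if` statements inside an action (the point of the answer to question 1), here is a small sketch. It assumes a POSIX awk (gawk, mawk or BusyBox awk should all do); the second block `FI0999X` and the function names `extract_tps` and `extract_tps_if` are made up for illustration.

```shell
#!/bin/sh
# Sample input: the block from the original post, followed by a
# hypothetical second block (FI0999X) whose PNTSRVTP type is not TPS.
sample='&
DEL      FI0454C
ADD      FI0454C PSA46771 PEN3B CORRECTED WH FLOW
PNTSRVTP FI0454C TPS NRA_TPS
PARENT   FI0454C NRA_T1
ENTNAM   FI0454C FI0454C
DISPLAY  FI0454C NRA_WELLPRESS.htm
&
DEL      FI0999X
PNTSRVTP FI0999X ABC NRA_ABC
DISPLAY  FI0999X OTHER.htm
&'

# extract_tps (hypothetical name): Janis's two rules.
# Rule 1 sets tps to 1 or 0 on every PNTSRVTP line, so the flag
# always reflects the most recent PNTSRVTP line seen.
# Rule 2 is a compound pattern, so its action needs no 'if'.
extract_tps() {
  awk '
    $1 == "PNTSRVTP" { tps = ($3 == "TPS") }
    $1 == "DISPLAY" && tps { print $2, $3 }
  '
}

# extract_tps_if (hypothetical name): the same logic with explicit
# 'if' statements; an action with no pattern runs for every line.
extract_tps_if() {
  awk '
    {
      if ($1 == "PNTSRVTP") tps = ($3 == "TPS")
      if ($1 == "DISPLAY" && tps) print $2, $3
    }
  '
}

printf '%s\n' "$sample" | extract_tps
printf '%s\n' "$sample" | extract_tps_if
```

Both functions print only `FI0454C NRA_WELLPRESS.htm`; the DISPLAY line of the non-TPS block is skipped because its PNTSRVTP line sets `tps` back to 0.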

#3
On Thu, 04 Sep 2008 18:26:36 +0200, Janis Papanagnou <janis_papanagnou@hotmail.com> wrote:

>It's not clear whether the '&' are record delimiters or just meta
>characters of your posting. Is every record always complete and has
>all the fields present? Try this...
>
>  $1 == "PNTSRVTP" { tps = ($3 == "TPS") }
>  $1 == "DISPLAY" && tps { print $2, $3 }

I think you may need to disarm 'tps' after use, ready for the next record group; otherwise it makes little sense to have that trigger?

    $1 == "PNTSRVTP" { tps = ($3 == "TPS") }
    $1 == "DISPLAY" && tps { print $2, $3; tps = "" }

Grant.
--
Cats, no less liquid than their shadows, offer no angles to the wind.

#4
Grant wrote:
> I think you may need to disarm 'tps' after use, ready for the next
> record group, otherwise it makes little sense to have that trigger?
>
>     $1 == "PNTSRVTP" { tps = ($3 == "TPS") }
>     $1 == "DISPLAY" && tps { print $2, $3; tps = "" }

No, because the trigger is recomputed from whether "TPS" is in the third field, and (that's why I asked "Is every record always complete") if "PNTSRVTP" is always present, you just have to set the trigger according to the value of $3 in the "PNTSRVTP" entry, and you *always* have to consider that record then.

Janis
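Janis's argument holds as long as every block contains a PNTSRVTP line. The sketch below shows where that assumption matters, using a hypothetical block `FI0777Y` that has no PNTSRVTP line; the function names `rules_janis` and `rules_grant` are illustrative only, and a POSIX awk is assumed.

```shell
#!/bin/sh
# Hypothetical input: block FI0454C (type TPS) is followed by
# block FI0777Y, which has no PNTSRVTP line at all.
sample='&
PNTSRVTP FI0454C TPS NRA_TPS
DISPLAY  FI0454C NRA_WELLPRESS.htm
&
DEL      FI0777Y
DISPLAY  FI0777Y OTHER.htm
&'

# Janis's rules: tps is recomputed only on PNTSRVTP lines.
rules_janis() {
  awk '
    $1 == "PNTSRVTP" { tps = ($3 == "TPS") }
    $1 == "DISPLAY" && tps { print $2, $3 }
  '
}

# Grant's variant: additionally clears tps after each printed DISPLAY line.
rules_grant() {
  awk '
    $1 == "PNTSRVTP" { tps = ($3 == "TPS") }
    $1 == "DISPLAY" && tps { print $2, $3; tps = "" }
  '
}

echo "janis:"; printf '%s\n' "$sample" | rules_janis
echo "grant:"; printf '%s\n' "$sample" | rules_grant
```

With this input `rules_janis` prints both DISPLAY lines, because nothing in block FI0777Y touches `tps`, while `rules_grant` prints only the first. Grant's reset is not a complete fix either: a TPS block with no DISPLAY line would still leave `tps` set for the next block.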

#5
On Fri, 05 Sep 2008 01:19:07 +0200, Janis Papanagnou <janis_papanagnou@hotmail.com> wrote:

>No, because the trigger considers whether "TPS" is in the third field,
>and - that's why I asked "Is every record always complete" - if "PNTSRVTP"
>is always present you just have to trigger according to the value of $3 in
>the "PNTSRVTP" entry, and you *always* have to consider that record then.

Sorry, I missed the boolean assignment. Dunno what I was thinking, now that I look at the thing again.

Grant.

#6
On 9/4/2008 9:49 AM, industcontrols@iinet.net.au wrote:
> 1) It appears that you can only use an IF statement inside braces, as
> part of an action. Is this correct?

Yes

> 2) Is there a method where you can conditionally look ahead a number
> of records?

No

If the "&"s are record separators, and each record has the same number and layout of fields, then all you need is:

    awk -v RS='&' '$12=="TPS"{print $21,$22}' file

Ed.
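Ed's command works because, with RS='&' and the default FS, newlines also separate fields, so each block becomes one record and, in the posted layout, TPS is field 12 while the DISPLAY name and file are fields 21 and 22. The sketch below runs it on the sample and, for comparison, adds a hypothetical position-independent variant (`by_keyword`) along the lines Janis described: make each line a field with FS="\n", then split every line into words. Function names are illustrative and a POSIX awk is assumed.

```shell
#!/bin/sh
# Sample from the original post; '&' is treated as the record separator.
sample='&
DEL      FI0454C
ADD      FI0454C PSA46771 PEN3B CORRECTED WH FLOW
PNTSRVTP FI0454C TPS NRA_TPS
PARENT   FI0454C NRA_T1
ENTNAM   FI0454C FI0454C
DISPLAY  FI0454C NRA_WELLPRESS.htm
&'

# Ed's command: relies on the exact field positions of this layout.
fixed_fields() {
  awk -v RS='&' '$12 == "TPS" { print $21, $22 }'
}

# Hypothetical variant: each line of the block is one field ($1..$NF),
# and split() with " " breaks a line into words like default FS does.
by_keyword() {
  awk '
    BEGIN { RS = "&"; FS = "\n" }
    {
      tps = 0; disp = ""
      for (i = 1; i <= NF; i++) {
        n = split($i, w, " ")
        if (n >= 3 && w[1] == "PNTSRVTP" && w[3] == "TPS") tps = 1
        if (n >= 3 && w[1] == "DISPLAY") disp = w[2] " " w[3]
      }
      if (tps && disp != "") print disp
    }
  '
}

printf '%s\n' "$sample" | fixed_fields
printf '%s\n' "$sample" | by_keyword
```

Both print `FI0454C NRA_WELLPRESS.htm` here. If a block has a different number of words before the PNTSRVTP line (for example a shorter ADD line), `$12` no longer holds TPS and `fixed_fields` silently prints nothing, while `by_keyword` still finds the lines by their first word.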

#7
On Sep 5, 8:03 pm, Ed Morton <mor...@lsupcaemnt.com> wrote:
> If the "&"s are record separators, and each record has the same number and
> layout of fields, then all you need is:
>
>     awk -v RS='&' '$12=="TPS"{print $21,$22}' file
>
>         Ed.

Thanks Janis, Grant & Ed;

Janis, I implemented your code and it worked well, except that unfortunately the record structure is not always consistent (records don't always have a PNTSRVTP parameter!), so I had to reset the tps flag each time an & character was reached (denoting the end of one record and the start of the next).

Ed, unfortunately the records differ slightly from one another, so I couldn't use a fixed field-counting scheme (which I assume is what you proposed).

Thanks for all your help! This is the first time that I have used GAWK and I have found it to be a most useful language in my line of work.

Can I ask some further questions?

My original code was:

    $1 ~ /PNTSRVTP/ { if ($3 == "TPS") { tagName = $2 } }
    $1 ~ /DISPLAY/ { if ($2 == tagName) { print $2, $3 } }

But it doesn't work. Pseudocode would be:

    if field 1 is PNTSRVTP and field 3 is TPS then store field 2 into variable tagname
    if field 1 is DISPLAY and field 2 == tagname then print fields 2 and 3

This algorithm should work, as it uses the tag name as the primary key and this is unique to each record, so the semaphore would not be required.

Why can't you use compound conditions in the predicate (pattern) part (or maybe you can?), i.e. $1 ~ /DISPLAY/ && $3 ~ /TPS/ { Do something }

Can you guys explain why this code doesn't work? Any references to free downloadable GAWK manuals would be appreciated (I already have GAWK: Effective AWK Programming).

Thanks again

Craig
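Craig doesn't post his final code, so the sketch below is only one way the reset he describes might look: Janis's two rules plus a third rule that clears the flag on every `&` line. It assumes each `&` sits alone on its own line and that a POSIX awk is available; block `FI0777Y` and the function name `extract_tps` are hypothetical.

```shell
#!/bin/sh
# Hypothetical input: the second block (FI0777Y) has no PNTSRVTP line.
sample='&
DEL      FI0454C
PNTSRVTP FI0454C TPS NRA_TPS
DISPLAY  FI0454C NRA_WELLPRESS.htm
&
DEL      FI0777Y
DISPLAY  FI0777Y OTHER.htm
&'

# extract_tps (hypothetical name): Janis's rules plus a reset of the
# flag at each '&' delimiter line, so a block without a PNTSRVTP line
# can no longer inherit the previous block's flag.
extract_tps() {
  awk '
    $1 == "&"        { tps = 0 }
    $1 == "PNTSRVTP" { tps = ($3 == "TPS") }
    $1 == "DISPLAY" && tps { print $2, $3 }
  '
}

printf '%s\n' "$sample" | extract_tps
```

Only `FI0454C NRA_WELLPRESS.htm` is printed; without the first rule, the FI0777Y DISPLAY line would be printed too.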

#8
On Fri, 5 Sep 2008 06:50:35 -0700 (PDT), industcontrols@iinet.net.au wrote:

>Can I ask some further questions?

Of course :)

>My original code was:
>$1 ~ /PNTSRVTP/ { if ($3 == "TPS") { tagName = $2 } }
>$1 ~ /DISPLAY/ { if ($2 == tagName) { print $2, $3 } }
>
>But it doesn't work. Pseudocode would be:
>if field 1 is PNTSRVTP and field 3 is TPS then store field 2 into
>variable tagname

    $1 == "PNTSRVTP" && $3 == "TPS" { tagname = $2 }

>if field 1 is DISPLAY and field 2 == tagname then print fields 2
>and 3

    $1 == "DISPLAY" && $2 == tagname { print $2, $3 }

>Why can't you use compound conditions in the predicate (pattern) part (or maybe
>you can?), i.e. $1 ~ /DISPLAY/ && $3 ~ /TPS/ { Do something }

You can. Be aware that '~' is not the same as '==': ~ /regexp/ allows regexp magic, so $1 ~ /DISPLAY/ is true for any first field that merely contains DISPLAY, while $1 == "DISPLAY" matches only that exact word.

>Can you guys explain why this code doesn't work?

Maybe you didn't hold your tongue right? :) It might be that sequencing (a state machine) is required to recognise the multi-line fields that make up each 'record'.

>Any references to
>free downloadable GAWK manuals would be appreciated (I already have
>GAWK: Effective AWK Programming).

That's about all I use, plus the odd query to this group, or to comp.unix.shell if the script is shell + awk (gawk).

Grant.
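On the sample block exactly as posted, Craig's original tagName rules and Grant's compound-pattern rewrite print the same line, so, as far as the posted data shows, the failure presumably comes from something in the real input that isn't visible in the thread. The sketch below lets you check this; it assumes a POSIX awk, and the function names `original` and `compound` are hypothetical.

```shell
#!/bin/sh
# Sample block from the original post.
sample='&
DEL      FI0454C
ADD      FI0454C PSA46771 PEN3B CORRECTED WH FLOW
PNTSRVTP FI0454C TPS NRA_TPS
PARENT   FI0454C NRA_T1
ENTNAM   FI0454C FI0454C
DISPLAY  FI0454C NRA_WELLPRESS.htm
&'

# Craig's original rules: the tests live in 'if' statements inside the actions.
original() {
  awk '
    $1 ~ /PNTSRVTP/ { if ($3 == "TPS") { tagName = $2 } }
    $1 ~ /DISPLAY/  { if ($2 == tagName) { print $2, $3 } }
  '
}

# Grant's rewrite: both tests moved into compound patterns joined by &&.
compound() {
  awk '
    $1 == "PNTSRVTP" && $3 == "TPS" { tagname = $2 }
    $1 == "DISPLAY" && $2 == tagname { print $2, $3 }
  '
}

printf '%s\n' "$sample" | original
printf '%s\n' "$sample" | compound
```

Both print `FI0454C NRA_WELLPRESS.htm`. On Grant's point about `~`: `$1 ~ /DISPLAY/` is an unanchored regex match, so it is also true for a first field such as `REDISPLAY`; `$1 == "DISPLAY"` (or `$1 ~ /^DISPLAY$/`) matches only the exact word.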

#9
Grant wrote:
>> Any references to
>> free downloadable GAWK manuals would be appreciated (I already have
>> GAWK: Effective AWK Programming).
>
> That's about all I use, plus the odd query to this group or comp.unix.shell
> if the script is shell + awk (gawk)

Just in case the OP is also interested in other gawk manuals (and not just those that might help him with his particular problem here): there is the gawk manual describing the TCP/IP interface of gawk:

http://www.gnu.org/software/gawk/manual/gawkinet/

And finally, and even further away from the original question, there is the manual of the XML extension (based on gawk, but a separate distribution):

http://home.vrweb.de/~juergen.kahrs/...ML/xmlgawk.pdf
http://home.vrweb.de/~juergen.kahrs/...L/xmlgawk.html