Objectmix

How to grab awk's regexp submatches? : awk

#1  11-19-2007, 05:53 PM

For example, I have something like this:

$ echo "test4325363test" | awk '/(.*)([0-9]+)(.*)/ {print NUMBER HERE!}'

awk gets some text to parse, and it matches. But I want to get only part
of that text (the number).

In Apache 2's mod_rewrite module it was easy: submatches go into
variables ($0-$n). But here in awk the $-variables mean something
else, right?

#2  11-19-2007, 05:59 PM

In article <9826656f-be67-4113-b4cf-167ab020d87f@d61g2000hsa.googlegroups.com>,
feaber <feaber@gmail.com> wrote:

Both GAWK and TAWK have extensions to do this. Standard (vanilla)
AWK has no way to capture parenthesized submatches.
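
Standard awk does still provide match(), which sets RSTART and RLENGTH to the position and length of the leftmost-longest match. When the part you want can be written as a regex on its own, substr() can slice out the pieces that the three groups would have held. A minimal sketch in POSIX awk, assuming the wanted number is the first run of digits on the line (the names before, number, and after are illustrative only):

```shell
echo "test4325363test" | awk '
match($0, /[0-9]+/) {
    # RSTART/RLENGTH describe the leftmost run of digits, so the text
    # around it can be cut out with substr().
    before = substr($0, 1, RSTART - 1)      # roughly what group 1 would hold
    number = substr($0, RSTART, RLENGTH)    # roughly what group 2 would hold
    after  = substr($0, RSTART + RLENGTH)   # roughly what group 3 would hold
    print before
    print number
    print after
}'
```

This prints test, 4325363 and test on separate lines. It emulates one group at a time and is not general submatch support.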

#3  11-19-2007, 07:24 PM

In article <9826656f-be67-4113-b4cf-167ab020d87f@d61g2000hsa.googlegroups.com>,
feaber <feaber@gmail.com> wrote:

echo "test4325363test" | awk '
match($0, /[0-9]+/) {
    # match() sets RSTART (where the digits start) and RLENGTH (how many)
    print substr($0, RSTART, RLENGTH)
}
'
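
The same match()/substr() idea extends to lines that hold several numbers: after each match, keep scanning the text that follows it. A sketch in POSIX awk, using a made-up input line a12b345c6; the loop terminates because [0-9]+ can never match an empty string:

```shell
echo "a12b345c6" | awk '
{
    rest = $0
    # Find the leftmost run of digits, print it, then continue with the
    # text after it until no digits remain.
    while (match(rest, /[0-9]+/)) {
        print substr(rest, RSTART, RLENGTH)
        rest = substr(rest, RSTART + RLENGTH)
    }
}'
```

This prints 12, 345 and 6 on separate lines.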

#4  11-19-2007, 10:01 PM

On 11/19/2007 4:53 PM, feaber wrote:

This might be what you're looking for (GNU awk):

gawk '{print gensub(/([^0-9]*)([0-9]+)(.*)/, "\\2", 1)}'

Ed.

#5  11-20-2007, 06:00 AM

Thx Guys!

#6  11-20-2007, 06:58 PM

Hi feaber, hello netlanders,

On Mon, 19 Nov 2007 14:53:04 -0800, feaber wrote:

In awk, $i means the i-th field of the input record ($0 is the whole
record without the record separator). Normally a record is one line of
text, because the record separator RS defaults to a newline.
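
A quick illustration of fields versus the whole record, assuming the default field separator (runs of blanks) and a made-up input line:

```shell
echo "alpha beta gamma" | awk '{
    print NF    # number of fields in the record: 3
    print $2    # the second field: beta
    print $0    # the whole record: alpha beta gamma
}'
```

Since "test4325363test" contains no blanks, it is a single field, so there $1 and $0 are the same string.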

POSIX awk does not support submatches inside parentheses in your sense,
but gawk supports them through an additional array parameter to match()
and through gensub().

(A) match()-Extension
*********************

Gawk's extended match(s, re, a) does what you want:

the submatches inside parentheses in re are assigned to the array
elements

a[1], a[2], ..., a[n]

a[i, "start"] is the start position of a[i], and a[i, "length"] is its
length. (a[0] holds the text matched by the whole of re.)

Please note that (g)awk matching is greedy: the leading (.*) takes as
much as it can while still leaving one digit for ([0-9]+).
After match("test4325363test", "(.*)([0-9]+)(.*)", a):

a[1] is "test432536"
a[2] is "3"
a[3] is "test"

Therefore use:

echo test4325363test |
gawk 'match($0, "([^0-9]*)([0-9]+)(.*)", a) { print a[2] }'

to extract the number.
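
The greedy split above can be reproduced without gawk's array argument. A sketch in POSIX awk, using the regex .*[0-9] as a stand-in for (.*)([0-9]+): match() reports the leftmost-longest match, which here ends at the last digit on the line, so everything before that digit corresponds to a[1]:

```shell
echo test4325363test | awk '
match($0, /.*[0-9]/) {
    # The longest match of .*[0-9] ends at the last digit on the line,
    # so .* has absorbed every earlier digit.
    print substr($0, RSTART, RLENGTH - 1)        # like a[1]: test432536
    print substr($0, RSTART + RLENGTH - 1, 1)    # like a[2]: 3
    print substr($0, RSTART + RLENGTH)           # like a[3]: test
}'
```

This mirrors the a[] values for this input only; it is a demonstration of greediness, not a replacement for real capture groups.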

Please note that gawk 3.1.5 has some bugs in the match function.
These should be corrected in gawk 3.1.6 (see ftp://ftp.gnu.org).

(B) gensub()
************

As Ed Morton told you, gensub() is the other alternative with gawk.
Given the greediness described above, the correct use in your case is:

echo test4325363test |
gawk '/[0-9]/ { print gensub(/([^0-9]*)([0-9]+)(.*)/, "\\2", "1") }'
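
Without gawk, a rough equivalent of that gensub() call is to delete with sub() what groups 1 and 3 would have matched. A sketch in POSIX awk, assuming only the first number on the line is wanted:

```shell
echo test4325363test | awk '
/[0-9]/ {
    s = $0
    sub(/^[^0-9]*/, "", s)     # drop the non-digits before the number (group 1)
    sub(/[^0-9].*$/, "", s)    # drop everything from the next non-digit on (group 3)
    print s
}'
```

This prints 4325363. The first sub() may remove nothing when the line starts with a digit, and the second does nothing when the number ends the line.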


Hope I could help you,

Steffen "goedel" Schuler

All times are GMT -5.
