How to gram awk's regexp submatches? - awk
-
How to gram awk's regexp submatches?
For example, I have something like this:
$ echo "test4325363test" | awk "/(.*)([0-9]+)(.*)/ {print NUMBER HERE!}"
awk gets some text to parse, and it matches. But I want to get part
of that text (the number).
In the Apache 2 module mod_rewrite this was easy: submatches go into
variables ($0-$n). But here in awk the $-variables mean something
else, right?
-
Re: How to gram awk's regexp submatches?
In article <9826656f-be67-4113-b4cf-167ab020d87f@d61g2000hsa.googlegroups.com>,
feaber <feaber@gmail.com> wrote:
>For example, I have something like this:
>
>$echo "test4325363test" | awk "/(.*)([0-9]+)(.*)/ {print NUMBER
>HERE!}"
>
>awk gets some text to parse. And it match. But I want to get some part
>of that text (the number).
>
>In apache2 module, mod_rewrite it was easy. Submatches goes into
>variables ($0-$n), but here in awk the $-variables meaning something
>else right? 
Both GAWK and TAWK have extensions to do this. Standard (plain
vanilla) AWK has nothing for capturing parenthesized submatches.
-
Re: How to gram awk's regexp submatches?
In article <9826656f-be67-4113-b4cf-167ab020d87f@d61g2000hsa.googlegroups.com>,
feaber <feaber@gmail.com> wrote:
> For example, I have something like this:
>
> $echo "test4325363test" | awk "/(.*)([0-9]+)(.*)/ {print NUMBER
> HERE!}"
>
> awk gets some text to parse. And it match. But I want to get some part
> of that text (the number).
>
> In apache2 module, mod_rewrite it was easy. Submatches goes into
> variables ($0-$n), but here in awk the $-variables meaning something
> else right? 
echo "test4325363test" | awk '
    # match() sets RSTART and RLENGTH to the position and length of the match
    match($0, /[0-9]+/) {
        print substr($0, RSTART, RLENGTH)
    }
'
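The same match()/RSTART/RLENGTH idea can be repeated to pull out every number on a line, not just the first. The following is only a sketch under that assumption: the function name extract_numbers and the variable rest are made up for illustration, and nothing beyond standard awk is used.

```shell
# Hypothetical helper: print every run of digits found on each input line.
extract_numbers() {
    awk '{
        rest = $0
        # match() returns 0 when nothing is left to match, which ends the loop
        while (match(rest, /[0-9]+/)) {
            print substr(rest, RSTART, RLENGTH)
            # continue searching just past the end of the current match
            rest = substr(rest, RSTART + RLENGTH)
        }
    }'
}

echo "test4325363test and 42 more" | extract_numbers
```

With the sample input this should print 4325363 and 42 on separate lines.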
-
Re: How to gram awk's regexp submatches?
On 11/19/2007 4:53 PM, feaber wrote:
> For example, I have something like this:
>
> $echo "test4325363test" | awk "/(.*)([0-9]+)(.*)/ {print NUMBER
> HERE!}"
>
> awk gets some text to parse. And it match. But I want to get some part
> of that text (the number).
>
> In apache2 module, mod_rewrite it was easy. Submatches goes into
> variables ($0-$n), but here in awk the $-variables meaning something
> else right? 
This might be what you're looking for (GNU awk):
gawk '{print gensub(/(.*)([0-9]+)(.*)/,"\\2","")}'
Ed.
-
Re: How to gram awk's regexp submatches?
Thx Guys! 
-
Re: How to gram awk's regexp submatches?
Hi feaber, hello netlanders,
On Mon, 19 Nov 2007 14:53:04 -0800, feaber wrote:
> For example, I have something like this:
>
> $echo "test4325363test" | awk "/(.*)([0-9]+)(.*)/ {print NUMBER HERE!}"
>
> awk gets some text to parse. And it match. But I want to get some part
> of that text (the number).
>
> In apache2 module, mod_rewrite it was easy. Submatches goes into
> variables ($0-$n), but here in awk the $-variables meaning something
> else right? 
In awk, $i means the i-th field of the input record ($0 is the whole
record, without the record separator). Normally a record is a line of
text, and the record separator is a newline.
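To make the meaning of the $-variables concrete, here is a small sketch using the default field splitting on whitespace. The function name show_fields and the sample input are only illustrative.

```shell
# Hypothetical helper: show the field count, first field, last field and whole record.
show_fields() {
    awk '{
        print NF     # number of fields in the record
        print $1     # first field
        print $NF    # last field
        print $0     # the whole record
    }'
}

echo "alpha beta gamma" | show_fields
```

For the sample line this should print 3, alpha, gamma and then the full line, which shows that $1..$NF are fields of the input and have nothing to do with regexp groups.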
POSIX awk does not support submatches inside parentheses in your sense,
but gawk supports them through an additional array parameter to match()
and through gensub().
(A) match()-Extension
*********************
Gawk's extended match(s, re, a) does what you want:
the submatches inside parentheses in re are assigned to the array
elements
a[1], a[2], ..., a[n]
and a[i, "start"] is the start position of a[i], with its length in
a[i, "length"].
Please observe that (g)awk matching is greedy.
After match("test4325363test", "(.*)([0-9]+)(.*)", a)
a[1] is "test432536"
a[2] is "3"
a[3] is "test"
Therefore use:
echo test4325363test |
gawk 'match($0, "([^0-9]*)([0-9]+)(.*)", a) { print a[2] }'
to extract the number.
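The "start" and "length" entries can be read in the same call. This is a sketch that assumes GNU awk is installed under the name gawk (the three-argument match() will not work in other awks); the function name number_with_position is invented for the example.

```shell
# Assumes GNU awk is available as "gawk"; match() with an array argument is a gawk extension.
number_with_position() {
    gawk 'match($0, /([^0-9]*)([0-9]+)(.*)/, a) {
        # a[2] is the second parenthesized group; "start" and "length" locate it in $0
        print a[2], a[2, "start"], a[2, "length"]
    }'
}

echo test4325363test | number_with_position
```

For the sample input this should print 4325363 5 7: the number itself, its 1-based start position, and its length.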
Please note that gawk 3.1.5 has some bugs in the match function.
These should be fixed in gawk 3.1.6 (see ftp://ftp.gnu.org).
(B) gensub()
************
As Ed Morton told you, gensub() is the other alternative with gawk.
Because matching is greedy (see above), the correct use in your case is:
echo test4325363test |
gawk '/[0-9]/ { print gensub(/([^0-9]*)([0-9]+)(.*)/, "\\2", "1") }'
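Where gawk is not available, a similar result can be approximated in standard awk with two sub() calls instead of gensub(). This is a swapped-in technique, not the gensub() approach itself, and only a sketch: the function name first_number is made up, and it handles only the "first run of digits" case.

```shell
# Hypothetical helper: print the first run of digits on each line that has one.
first_number() {
    awk '/[0-9]/ {
        s = $0
        sub(/^[^0-9]*/, "", s)    # drop everything before the first digit
        sub(/[^0-9].*$/, "", s)   # drop everything from the first non-digit onward
        print s
    }'
}

echo test4325363test | first_number
```

For the sample input this should print 4325363. Working on a copy (s) leaves $0 and the fields untouched.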
Hope I could help you,
Steffen "goedel" Schuler