How to grab awk's regexp submatches? : awk
This is a discussion within the awk forums in the Programming Languages category.
#1
For example, I have something like this:

    $ echo "test4325363test" | awk "/(.*)([0-9]+)(.*)/ {print NUMBER HERE!}"

awk gets some text to parse, and it matches. But I want to get one part of that text (the number).

In the apache2 module mod_rewrite this was easy: submatches go into variables ($0-$n). But here in awk the $-variables mean something else, right?
#2
In article <9826656f-be67-4113-b4cf-167ab020d87f@d61g2000hsa.googlegroups.com>,
feaber <feaber@gmail.com> wrote:
> For example, I have something like this:
>
> $echo "test4325363test" | awk "/(.*)([0-9]+)(.*)/ {print NUMBER HERE!}"
>
> awk gets some text to parse. And it match. But I want to get some part
> of that text (the number).
>
> In apache2 module, mod_rewrite it was easy. Submatches goes into
> variables ($0-$n), but here in awk the $-variables meaning something
> else right?

Both GAWK & TAWK have extensions to do this. Standard (vanilla standard) AWK does not have anything.
#3
In article <9826656f-be67-4113-b4cf-167ab020d87f@d61g2000hsa.googlegroups.com>,
feaber <feaber@gmail.com> wrote:
> For example, I have something like this:
>
> $echo "test4325363test" | awk "/(.*)([0-9]+)(.*)/ {print NUMBER HERE!}"
>
> awk gets some text to parse. And it match. But I want to get some part
> of that text (the number).
>
> In apache2 module, mod_rewrite it was easy. Submatches goes into
> variables ($0-$n), but here in awk the $-variables meaning something
> else right?

    echo "test4325363test" | awk '
    match($0,/[0-9]+/) { print substr($0,RSTART,RLENGTH) }
    '
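The same match()/RSTART/RLENGTH idea can also recover the text on either side of the number, which roughly covers the three parenthesised groups from the original question. This is only a sketch in POSIX awk; the variable names before, number and after, the "|" separator and the shell variable out are made up for illustration.

```shell
# Split the record around the first run of digits using RSTART and RLENGTH.
out=$(echo "test4325363test" | awk '
match($0, /[0-9]+/) {
    before = substr($0, 1, RSTART - 1)       # text ahead of the match
    number = substr($0, RSTART, RLENGTH)     # the matched digits
    after  = substr($0, RSTART + RLENGTH)    # text following the match
    print before "|" number "|" after
}')
echo "$out"
```

For the sample input this prints test|4325363|test. Lines without any digit are skipped, because match() returns 0 for them.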
#4
On 11/19/2007 4:53 PM, feaber wrote:
> For example, I have something like this:
>
> $echo "test4325363test" | awk "/(.*)([0-9]+)(.*)/ {print NUMBER HERE!}"
>
> awk gets some text to parse. And it match. But I want to get some part
> of that text (the number).
>
> In apache2 module, mod_rewrite it was easy. Submatches goes into
> variables ($0-$n), but here in awk the $-variables meaning something
> else right?

This might be what you're looking for (GNU awk):

    gawk '{print gensub(/(.*)([0-9]+)(.*)/,"\\2","")}'

Ed.
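Where gensub() is not available, a similar result may be possible in POSIX awk by deleting everything around the number with two sub() calls. This is a different technique from the gensub() call above, not an equivalent of it, and it assumes you only want the first run of digits on the line. The variable names s and out are illustrative.

```shell
# Portable sketch: strip the text around the first run of digits with sub().
out=$(echo "test4325363test" | awk '{
    s = $0
    sub(/^[^0-9]*/, "", s)     # drop everything before the first digit
    sub(/[^0-9].*$/, "", s)    # drop everything from the next non-digit on
    print s
}')
echo "$out"
```

For the sample input this prints 4325363. A line with no digits at all yields an empty line, so add a /[0-9]/ pattern in front of the action if such lines should be skipped.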
#5
Thx Guys!
#6
Hi feaber, hello netlanders,

On Mon, 19 Nov 2007 14:53:04 -0800, feaber wrote:
> For example, I have something like this:
>
> $echo "test4325363test" | awk "/(.*)([0-9]+)(.*)/ {print NUMBER HERE!}"
>
> awk gets some text to parse. And it match. But I want to get some part
> of that text (the number).
>
> In apache2 module, mod_rewrite it was easy. Submatches goes into
> variables ($0-$n), but here in awk the $-variables meaning something
> else right?

In awk, $i means the i-th field of the input record ($0 is the whole record, without the record separator). Normally a record is a line of text and the record separator is a newline.

POSIX awk does not support submatches inside parentheses in your sense, but gawk supports them through the additional array parameter of match() and through gensub().

(A) match()-Extension
*********************

Gawk's extended match(s, re, a) does what you want: the submatches inside parentheses in re are assigned to the array elements a[1], a[2], ..., a[n]. a[i, "start"] is the start position of a[i], and a[i, "length"] is its length.

Please observe that (g)awk matching is greedy. After

    match("test4325363test", "(.*)([0-9]+)(.*)", a)

    a[1] is "test432536"
    a[2] is "3"
    a[3] is "test"

Therefore use:

    echo test4325363test | gawk 'match($0, "([^0-9]*)([0-9]+)(.*)", a) { print a[2] }'

to extract the number.

Please note that gawk 3.1.5 has some bugs in the match function. These should be corrected in gawk 3.1.6 (see ftp://ftp.gnu.org).

(B) gensub()
************

As Ed Morton told you, gensub() is the other alternative with gawk. The correct use in your case is (see the greedy argument above):

    echo test4325363test | gawk '/[0-9]/ { print gensub(/([^0-9]*)([0-9]+)(.*)/, "\\2", "1") }'

Hope I could help you,

Steffen "goedel" Schuler
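To make the a[i, "start"] and a[i, "length"] elements described under (A) concrete, here is a small sketch that prints them for the second group. It needs GNU awk, since the three-argument match() is a gawk extension, so the command -v guard simply skips it where gawk is not installed. The shell variable out is illustrative.

```shell
# gawk only: show a submatch together with its start position and length.
if command -v gawk >/dev/null 2>&1; then
    out=$(echo "test4325363test" | gawk '
    match($0, /([^0-9]*)([0-9]+)(.*)/, a) {
        # a[2] is the second parenthesised group, here the digits
        print a[2], a[2, "start"], a[2, "length"]
    }')
    echo "$out"
fi
```

With gawk installed this prints 4325363 5 7: the digits start at character 5 of the record and are 7 characters long.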



