How to grab awk's regexp submatches? - awk


  1. How to grab awk's regexp submatches?

    For example, I have something like this:

    $ echo "test4325363test" | awk "/(.*)([0-9]+)(.*)/ {print NUMBER HERE!}"

    awk gets some text to parse, and it matches. But I want to get just
    one part of that text (the number).

    In the apache2 module mod_rewrite this was easy: submatches go into
    variables ($0-$n). But here in awk the $-variables mean something
    else, right?

  2. Re: How to grab awk's regexp submatches?

    In article <9826656f-be67-4113-b4cf-167ab020d87f@d61g2000hsa.googlegroups.com>,
    feaber <feaber@gmail.com> wrote:
    >For example, I have something like this:
    >
    >$echo "test4325363test" | awk "/(.*)([0-9]+)(.*)/ {print NUMBER
    >HERE!}"
    >
    >awk gets some text to parse. And it match. But I want to get some part
    >of that text (the number).
    >
    >In apache2 module, mod_rewrite it was easy. Submatches goes into
    >variables ($0-$n), but here in awk the $-variables meaning something
    >else right?


    Both GAWK and TAWK have extensions that do this. Standard (plain
    POSIX) AWK has no way to capture parenthesized submatches.


  3. Re: How to grab awk's regexp submatches?

    In article <9826656f-be67-4113-b4cf-167ab020d87f@d61g2000hsa.googlegroups.com>,
    feaber <feaber@gmail.com> wrote:

    > For example, I have something like this:
    >
    > $echo "test4325363test" | awk "/(.*)([0-9]+)(.*)/ {print NUMBER
    > HERE!}"
    >
    > awk gets some text to parse. And it match. But I want to get some part
    > of that text (the number).
    >
    > In apache2 module, mod_rewrite it was easy. Submatches goes into
    > variables ($0-$n), but here in awk the $-variables meaning something
    > else right?


    echo "test4325363test" | awk '
    match($0,/[0-9]+/) {
    print substr($0,RSTART,RLENGTH)
    }
    '
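A minimal sketch extending the POSIX-only approach above, assuming the goal is to emulate all three groups of the original pattern: match() sets RSTART and RLENGTH, so substr() can also recover the text before and after the number. The variable names (result, before, number, after) are illustrative and not from the thread.

```shell
# Split a line into prefix, number, and suffix using only POSIX awk features.
result=$(echo "test4325363test" | awk '
match($0, /[0-9]+/) {
    before = substr($0, 1, RSTART - 1)     # text before the first run of digits
    number = substr($0, RSTART, RLENGTH)   # the matched digits themselves
    after  = substr($0, RSTART + RLENGTH)  # everything after the match
    print before "|" number "|" after
}')
echo "$result"   # prints test|4325363|test
```

Because match() reports the leftmost match, this picks up the first run of digits on each line.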

  4. Re: How to grab awk's regexp submatches?

    On 11/19/2007 4:53 PM, feaber wrote:
    > For example, I have something like this:
    >
    > $echo "test4325363test" | awk "/(.*)([0-9]+)(.*)/ {print NUMBER
    > HERE!}"
    >
    > awk gets some text to parse. And it match. But I want to get some part
    > of that text (the number).
    >
    > In apache2 module, mod_rewrite it was easy. Submatches goes into
    > variables ($0-$n), but here in awk the $-variables meaning something
    > else right?


    This might be what you're looking for (GNU awk):

    gawk '{print gensub(/(.*)([0-9]+)(.*)/,"\\2","")}'

    Ed.


  5. Re: How to grab awk's regexp submatches?

    Thx Guys!

  6. Re: How to grab awk's regexp submatches?

    Hi feaber, hello netlanders,

    On Mon, 19 Nov 2007 14:53:04 -0800, feaber wrote:

    > For example, I have something like this:
    >
    > $echo "test4325363test" | awk "/(.*)([0-9]+)(.*)/ {print NUMBER HERE!}"
    >
    > awk gets some text to parse. And it match. But I want to get some part
    > of that text (the number).
    >
    > In apache2 module, mod_rewrite it was easy. Submatches goes into
    > variables ($0-$n), but here in awk the $-variables meaning something
    > else right?


    In awk, $i means the i-th field of the input record ($0 is the whole
    record, without the record separator). Normally a record is a line of
    text, and the record separator is a newline.
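As a small illustration of the field variables described above (the input string and the shell variable name fields are made up for this sketch), with the default field separator each whitespace-separated word becomes one field:

```shell
# $2 is the second field, NF the number of fields, $0 the whole record.
fields=$(echo "alpha beta gamma" | awk '{ print $2; print NF; print $0 }')
echo "$fields"
# prints:
# beta
# 3
# alpha beta gamma
```

So $1, $2, ... always refer to fields of the input record and never to regexp groups.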

    POSIX awk does not support submatches inside parentheses in your
    sense, but gawk supports them through an additional array parameter
    to match() and through gensub().

    (A) match()-Extension
    *********************

    Gawk's extended match(s, re, a) does what you want:

    the submatches inside parentheses in re are assigned to the array
    elements

    a[1], a[2], ..., a[n]

    and a[0] holds the text matched by the whole regexp.

    a[i, "start"] is the start position of a[i], and a[i, "length"] is
    its length.
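A short sketch of how these array elements can be inspected, assuming GNU awk is installed as gawk (the three-argument match() is a gawk extension and will not work in other awks). The shell variable name submatches is illustrative.

```shell
# Print each group with its start position and length (gawk only).
submatches=$(echo "test4325363test" | gawk '
match($0, /([^0-9]*)([0-9]+)(.*)/, a) {
    for (i = 1; i <= 3; i++)
        print i, a[i], a[i, "start"], a[i, "length"]
}')
echo "$submatches"
# prints:
# 1 test 1 4
# 2 4325363 5 7
# 3 test 12 4
```

Positions are 1-based, so they can be passed straight to substr().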

    Please observe that (g)awk matching is greedy.
    After match("test4325363test", "(.*)([0-9]+)(.*)", a)

    a[1] is "test432536"
    a[2] is "3"
    a[3] is "test"

    Therefore use:

    echo test4325363test |
    gawk 'match($0, "([^0-9]*)([0-9]+)(.*)", a) { print a[2] }'

    to extract the number.

    Please note that gawk 3.1.5 has some bugs in the match() function.
    These should be fixed in gawk 3.1.6 (see ftp://ftp.gnu.org).

    (B) gensub()
    ************

    As Ed Morton told you, gensub() is the other alternative with gawk.
    The correct use in your case is (see the note on greedy matching above):

    echo test4325363test |
    gawk '/[0-9]/ { print gensub(/([^0-9]*)([0-9]+)(.*)/, "\\2", "1") }'
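For awks without gensub(), a rough POSIX-only equivalent is sketched below, assuming only the first run of digits on the line is wanted. It swaps the back-reference for two plain sub() calls that trim the text around the number. The names s and number are illustrative.

```shell
# Emulate the gensub() call with two sub() calls on a copy of the record.
number=$(echo "test4325363test" | awk '
/[0-9]/ {
    s = $0
    sub(/^[^0-9]*/, "", s)    # drop everything before the first digit
    sub(/[^0-9].*$/, "", s)   # drop everything from the first non-digit onward
    print s
}')
echo "$number"   # prints 4325363
```

Working on the copy s leaves $0 and the fields untouched for any later rules.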


    Hope I could help you,

    Steffen "goedel" Schuler
