Regular expression to match regular expressions? - awk

  1. Regular expression to match regular expressions?

    An expression parser (in TAWK) accepts textual regular expressions, which
    are then converted to internal form by TAWK's regex() function.

    If the regular expression is ill-formed, then TAWK's run-time package issues
    an error message on the console (rather than, say, returning a null value
    from regex()).

    Is there a cheap way to verify that a regular expression is properly formed
    *before* feeding it to regex(), so as to avoid that error message?

    A first thought:

    regex = /^\/.+\/i?$/

    - starts and ends with '/' or '/i' (i-gnore case), must be non-null

    A second thought:

    regex = /^\/([^/\\]|\\.)+\/i?$/

    - accept as part of the pattern any single character except '/' or '\', and
    any double character that starts with '\' (escape)
    - BTW, this variant works pretty well for literal strings with embedded
    escape sequences, including null strings. It's clever enough to recognize
    only odd numbers of escape chars as properly escaping a double quote:

    regex = /^"([^"\\]|\\.)*"$/

    A third thought:

    regex = /^\/([^/\\[]|\\.|\[.+\])+\/i?$/

    - which is designed to make sure character classes have a closing bracket
    (which is actually the single mistake I make most often, and the one that
    prompted this line of thought).

    I'm pretty sure there's more to be added, though. So, is there a reasonably
    compact regular expression that will match all properly formed regular
    expressions and fail to match ill-formed ones?

    - Anton Treuenfels
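
    The third pattern can be tried against a few sample inputs. What follows is
    a minimal sketch in portable awk rather than TAWK; check_delimited is a
    hypothetical helper name. The pattern is written as a string so that '/'
    needs no escaping, which means every backslash is doubled once more for
    awk's string escapes.

```shell
# Hypothetical helper: reads candidate patterns from stdin, one per line, and
# prints "ok" or "bad" followed by a tab and the candidate.
check_delimited() {
    awk '
    BEGIN {
        # String form of the third pattern:  ^\/([^/\\[]|\\.|\[.+\])+\/i?$
        shape = "^/([^/\\\\[]|\\\\.|\\[.+\\])+/i?$"
    }
    { print (($0 ~ shape) ? "ok" : "bad") "\t" $0 }
    '
}

printf '%s\n' '/abc/' '/abc/i' '/[a-z]+/' '/[a-z+/' '//' '/a\/b/' | check_delimited
```

    Of these samples only '/[a-z+/' (no closing bracket) and '//' (empty
    pattern) are rejected. This is a shape check only: unbalanced parentheses,
    for instance, still pass.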



  2. Re: Regular expression to match regular expressions?

    > Is there a cheap way to verify that a regular expression is properly formed
    > *before* feeding it to regex(), so as to avoid that error message?


    The language of regular expressions is itself not regular; it is
    context-free (a CFG). You can have (((((a))))). How can you manage to
    match that with a regular expression?

    Great thought b.t.w.


  3. Re: Regular expression to match regular expressions?

    In article
    <1155197771.534028.245820@75g2000cwc.googlegroups.com>,
    "quarkLore" <agarwal.prateek@> wrote:

    > > Is there a cheap way to verify that a regular expression is properly formed
    > > *before* feeding it to regex(), so as to avoid that error message?

    >
    > The language of regular expressions is itself not regular. It is a cfg.
    > You can have (((((a))))). How can you manage to match this using
    > regular expression.
    >
    > Great thought b.t.w.


    This is comp.lang.awk, so

    echo '(((((a)))))' | awk '/(((((a)))))/ { print }'

    works very nicely.

    Your mileage may vary in other languages.

    Bob Harris

  4. Re: Regular expression to match regular expressions?

    Bob Harris wrote:
    > In article
    > <1155197771.534028.245820@75g2000cwc.googlegroups.com>,
    > "quarkLore" <agarwal.prateek@> wrote:
    >
    >
    >>>Is there a cheap way to verify that a regular expression is properly formed
    >>>*before* feeding it to regex(), so as to avoid that error message?

    >>
    >>The language of regular expressions is itself not regular. It is a cfg.
    >>You can have (((((a))))). How can you manage to match this using
    >>regular expression.
    >>
    >>Great thought b.t.w.

    >
    >
    > This is comp.lang.awk, so
    >
    > echo '(((((a)))))' | awk '/(((((a)))))/ { print }'
    >
    > works very nicely.

    I think the issue is not to write a constant expression that matches
    "five left parentheses, a, five right parentheses" but to write a regular
    expression that could match something like "a given number of left
    parentheses, <some stuff>, the same number of right parentheses".

    This is not achievable with REs: matching arbitrarily deep nesting means
    counting, which a regular expression cannot do.
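
    The counting that a regular expression cannot do is easy in ordinary awk
    code. A minimal sketch, assuming awk-style syntax in which a backslash
    escapes the next character and parentheses inside a bracket expression are
    literal; check_parens and balanced are hypothetical names, and only
    nesting and bracket closure are checked, nothing else about the pattern.

```shell
# Hypothetical helper: reads candidate patterns from stdin, one per line, and
# prints "ok" or "bad" followed by a tab and the candidate.
check_parens() {
    awk '
    # Returns 1 when every unescaped "(" outside a bracket expression has a
    # matching ")" and every bracket expression is closed; otherwise 0.
    function balanced(re,    i, n, c, depth, inclass) {
        n = length(re)
        for (i = 1; i <= n; i++) {
            c = substr(re, i, 1)
            if (c == "\\") { i++; continue }       # skip the escaped character
            if (inclass) {
                if (c == "]") inclass = 0          # end of bracket expression
                continue
            }
            if (c == "[") {
                inclass = 1
                if (substr(re, i + 1, 1) == "^") i++    # negation marker
                if (substr(re, i + 1, 1) == "]") i++    # a leading "]" is literal
            } else if (c == "(") {
                depth++
            } else if (c == ")") {
                if (--depth < 0) return 0          # closed more than opened
            }
        }
        return depth == 0 && !inclass
    }
    { print (balanced($0) ? "ok" : "bad") "\t" $0 }
    '
}

printf '%s\n' '(((((a)))))' '((a)' 'a)' '[(]' '\(' '[abc' | check_parens
```

    A single counter suffices here because only one kind of bracket nests.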

  5. Re: Regular expression to match regular expressions?

    Cesar Rabak wrote:
    > Bob Harris wrote:
    > > In article
    > > <1155197771.534028.245820@75g2000cwc.googlegroups.com>,
    > > "quarkLore" <agarwal.prateek@> wrote:
    > >
    > >
    > >>>Is there a cheap way to verify that a regular expression is properly formed
    > >>>*before* feeding it to regex(), so as to avoid that error message?
    > >>
    > >>The language of regular expressions is itself not regular. It is a cfg.
    > >>You can have (((((a))))). How can you manage to match this using
    > >>regular expression.
    > >>
    > >>Great thought b.t.w.

    > >
    > >
    > > This is comp.lang.awk, so
    > >
    > > echo '(((((a)))))' | awk '/(((((a)))))/ { print }'
    > >
    > > works very nicely.

    > I think the issue is not to write a constant expression that matches
    > "five left parentheses, a, five right parentheses" but to write a regular
    > expression that could match something like "a given number of left
    > parentheses, <some stuff>, the same number of right parentheses".
    >
    > This is not achievable with REs.


    I guess the OP wants to know whether it's possible to initialize a RegExp
    object before it is used in any regex-related expressions.

    In many OO programming languages you can *new* a RegExp object, or
    use qr/..../ in Perl, and then use this object multiple times without
    worrying about the regex syntax again.

    But how do you do this in awk? Is it even possible with awk? If
    this is what the OP wanted, then I also want to know the answer. :-) Thanks
    for any further information.

    Xicheng


  6. Re: Regular expression to match regular expressions?

    In article <t8wCg.2267$Qf.723@newsread2.news.pas.earthlink.net>,
    Anton Treuenfels <atreuenfels@earthlink.net> wrote:

    [in tawk]

    % Is there a cheap way to verify that a regular expression is properly formed
    % *before* feeding it to regex(), so as to avoid that error message?

    You don't mention your OS, so I'll assume it's Solaris. I believe that
    tawk has a mechanism for interfacing to C library routines, but I don't
    know the details. Suppose it does, and that you can pass it a pointer to a
    buffer the size of a regex_t; then you could load regcomp() from libc
    and call it something like this:

    rc = regcomp(big_enough_buffer, expression_to_test, REG_EXTENDED+REG_NOSUB);

    then check the return code of regcomp(). You'd have to look at the
    /usr/include/regex.h to find the values for REG_EXTENDED and REG_NOSUB, and
    the size of the first argument.

    Suppose that doesn't work or doesn't seem cheap. You could do something like

    rc = system("awk '/" pat "/' /dev/null 2> /dev/null")

    then check rc (non-zero suggests a bad pattern). Note that you'll have
    to escape any /s which appear in pat, and do something more drastic with 's.
    --

    Patrick TJ McPhee
    North York Canada
    ptjm@interlog.com
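
    The system() idea can be sketched in a way that sidesteps escaping '/'
    altogether. This is a variant of the command above, not the same command:
    the pattern is handed to the child awk through an environment variable and
    used as a dynamic regex, so only shell quoting of single quotes remains.
    The names check_by_subprocess, shquote, regex_ok and PAT are hypothetical,
    the outer awk stands in for TAWK, and it is an assumption that the child
    awk exits non-zero on a malformed pattern and accepts the same syntax as
    TAWK's regex().

```shell
# Hypothetical helper: reads candidate patterns from stdin, one per line, and
# prints "ok" or "bad" followed by a tab and the candidate.
check_by_subprocess() {
    awk '
    # Quote a string for sh: wrap it in single quotes, and rewrite each
    # embedded single quote as: close quote, double-quoted quote, reopen quote.
    function shquote(s,    q) {
        q = "\047"                          # a single quote character
        gsub(q, q "\"" q "\"" q, s)
        return q s q
    }
    # Returns 1 when a child awk compiles the pattern without complaint.
    function regex_ok(pat,    prog, cmd) {
        prog = "BEGIN { x = (\"\" ~ ENVIRON[\"PAT\"]) }"
        cmd = "PAT=" shquote(pat) " awk " shquote(prog) " </dev/null 2>/dev/null"
        return system(cmd) == 0
    }
    { print (regex_ok($0) ? "ok" : "bad") "\t" $0 }
    '
}

printf '%s\n' '(a|b)+c' '[abc' "it's" | check_by_subprocess
```

    Each check costs one process per pattern, so this is cheap to write rather
    than cheap to run.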

  7. Re: Regular expression to match regular expressions?


    Bob Harris wrote:
    > In article
    > <1155197771.534028.245820@75g2000cwc.googlegroups.com>,
    > "quarkLore" <agarwal.prateek@> wrote:
    >
    > > > Is there a cheap way to verify that a regular expression is properly formed
    > > > *before* feeding it to regex(), so as to avoid that error message?

    > >
    > > The language of regular expressions is itself not regular. It is a cfg.
    > > You can have (((((a))))). How can you manage to match this using
    > > regular expression.
    > >
    > > Great thought b.t.w.

    >
    > This is comp.lang.awk, so
    >
    > echo '(((((a)))))' | awk '/(((((a)))))/ { print }'


    This is just matching anything with an 'a' in it:

    echo 'blah' | awk '/(((((a)))))/ { print }'


    >
    > works very nicely.
    >
    > Your mileage may vary in other languages.
    >
    > Bob Harris
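
    The difference shows up once the pattern is anchored, and again once the
    parentheses are escaped so that they stand for themselves instead of
    grouping. A small sketch in portable awk; the three function names are
    hypothetical.

```shell
# Unanchored, parentheses act as grouping: any line containing "a" matches.
match_any()     { awk '/(((((a)))))/ { print }'; }
# Anchored, parentheses still grouping: only the line "a" matches.
match_whole()   { awk '/^(((((a)))))$/ { print }'; }
# Anchored, parentheses escaped: only the literal text (((((a))))) matches.
match_literal() { awk '/^\(\(\(\(\(a\)\)\)\)\)$/ { print }'; }

echo 'blah'        | match_any       # prints blah
echo 'blah'        | match_whole     # prints nothing
echo '(((((a)))))' | match_literal   # prints (((((a)))))
```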


