Regular expression to match regular expressions? - awk
-
Regular expression to match regular expressions?
An expression parser (in TAWK) accepts textual regular expressions, which
are then converted to internal form by TAWK's regex() function.
If the regular expression is ill-formed, then TAWK's run-time package issues
an error message on the console (rather than, say, returning a null value
from regex()).
Is there a cheap way to verify that a regular expression is properly formed
*before* feeding it to regex(), so as to avoid that error message?
A first thought:
regex = /^\/.+\/i?$/
- starts with '/' and ends with '/' or '/i' (ignore case); the pattern in
between must be non-null
A second thought:
regex = /^\/([^/\\]|\\.)+\/i?$/
- accept as part of the pattern any single character except '/' or '\', and
any double character that starts with '\' (escape)
- BTW, this variant works pretty well for literal strings with embedded
escape sequences, including null strings. It's clever enough to recognize
only odd numbers of escape chars as properly escaping a double quote:
regex = /^"([^"\\]|\\.)*"$/
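The odd/even escape behaviour is easy to check on a few samples. What follows is only a sketch under assumptions, not part of the parser described above: it runs under any POSIX awk rather than TAWK, it writes the pattern as a dynamic regex string (so every backslash is doubled), and the name check_string is made up for illustration.

```shell
# check_string is a hypothetical helper name; any POSIX awk is assumed.
check_string() {
    # $1: candidate string literal, including its surrounding double quotes
    printf '%s\n' "$1" | awk '
        # Same pattern as above, written as a dynamic regex string,
        # so each backslash of the regex is doubled.
        BEGIN { shape = "^\"([^\"\\\\]|\\\\.)*\"$" }
        { print (($0 ~ shape) ? "accepted" : "rejected") }
    '
}

check_string '""'        # null string: accepted
check_string '"a\"b"'    # one backslash, inner quote is escaped: accepted
check_string '"a\\"'     # two backslashes, final quote closes: accepted
check_string '"a\"'      # one backslash, final quote is escaped: rejected
```

The last two cases differ only in the number of backslashes before the final quote, which is the odd/even distinction mentioned above.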
A third thought:
regex = /^\/([^/\\[]|\\.|\[.+\])+\/i?$/
- which is designed to make sure character classes have a closing bracket
(which is actually the single mistake I make most often, and the one that
prompted this line of thought).
I'm pretty sure there's more to be added, though. So, is there a reasonably
compact regular expression that will match all properly formed regular
expressions and fail to match ill-formed ones?
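To see what the third pattern does and does not catch, it can be run over a handful of candidates. This is a sketch under assumptions: a POSIX awk stands in for TAWK, the pattern is written as a dynamic regex string (doubled backslashes, no need to escape '/'), and check_shape is a made-up name.

```shell
# check_shape is a hypothetical helper name; any POSIX awk is assumed.
check_shape() {
    # $1: candidate written as /pattern/ or /pattern/i
    printf '%s\n' "$1" | awk '
        # The third pattern as a dynamic regex string: backslashes are
        # doubled, and "/" needs no escaping outside a regex literal.
        BEGIN { shape = "^/([^/\\\\[]|\\\\.|\\[.+\\])+/i?$" }
        { print (($0 ~ shape) ? "accepted" : "rejected") }
    '
}

check_shape '/[a-z]+/i'   # closed character class: accepted
check_shape '/[a-z/'      # unclosed character class: rejected
check_shape '/a\/b/'      # escaped slash inside the pattern: accepted
check_shape '/(a/'        # unbalanced parenthesis: accepted
```

The last case shows the kind of gap that remains: the unclosed bracket is caught, but an unbalanced parenthesis still slips through.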
- Anton Treuenfels
-
Re: Regular expression to match regular expressions?
> Is there a cheap way to verify that a regular expression is properly formed
> *before* feeding it to regex(), so as to avoid that error message?
The language of regular expressions is itself not regular; it is
context-free (a CFG). You can have (((((a))))). How can you manage to
match this using a regular expression?
Great thought b.t.w.
-
Re: Regular expression to match regular expressions?
In article
<1155197771.534028.245820@75g2000cwc.googlegroups.com>,
"quarkLore" <agarwal.prateek@> wrote:
> > Is there a cheap way to verify that a regular expression is properly formed
> > *before* feeding it to regex(), so as to avoid that error message?
>
> The language of regular expressions is itself not regular. It is a cfg.
> You can have (((((a))))). How can you manage to match this using
> regular expression.
>
> Great thought b.t.w.
This is comp.lang.awk, so
echo '(((((a)))))' | awk '/(((((a)))))/ { print }'
works very nicely.
Your mileage may vary in other languages.
Bob Harris
-
Re: Regular expression to match regular expressions?
Bob Harris wrote:
> In article
> <1155197771.534028.245820@75g2000cwc.googlegroups.com>,
> "quarkLore" <agarwal.prateek@> wrote:
>
>
>>>Is there a cheap way to verify that a regular expression is properly formed
>>>*before* feeding it to regex(), so as to avoid that error message?
>>
>>The language of regular expressions is itself not regular. It is a cfg.
>>You can have (((((a))))). How can you manage to match this using
>>regular expression.
>>
>>Great thought b.t.w.
>
>
> This is comp.lang.awk, so
>
> echo '(((((a)))))' | awk '/(((((a)))))/ { print }'
>
> works very nicely.
I think the issue is not to write a constant expression that matches
"five left parentheses, a, five right parentheses" but to write a regular
expression that could match something like "a given number of left
parentheses <some stuff> the same number of right parentheses".
This is not achievable with REs.
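Checking that parentheses balance takes a counter, which a single regular expression does not have but a few lines of awk do. The following is a rough sketch, not a full validator: the name paren_depth_check is made up, any POSIX awk is assumed, and it deliberately ignores the fact that parentheses inside a bracket expression are literal.

```shell
# paren_depth_check is a hypothetical helper name; any POSIX awk is assumed.
paren_depth_check() {
    # $1: pattern text without the surrounding slashes
    printf '%s\n' "$1" | awk '{
        depth = 0
        ok = 1
        n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            if (c == "\\") {
                i++                 # skip the escaped character
            } else if (c == "(") {
                depth++
            } else if (c == ")") {
                depth--
                if (depth < 0) {    # closing parenthesis with no opener
                    ok = 0
                    break
                }
            }
        }
        print ((ok && depth == 0) ? "balanced" : "unbalanced")
    }'
}

paren_depth_check '(((((a)))))'   # balanced
paren_depth_check '((a)'          # unbalanced
paren_depth_check '\(a'           # balanced: the parenthesis is escaped
```

The depth variable is exactly the unbounded memory that a finite automaton lacks.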
-
Re: Regular expression to match regular expressions?
Cesar Rabak wrote:
> Bob Harris escreveu:
> > In article
> > <1155197771.534028.245820@75g2000cwc.googlegroups.com>,
> > "quarkLore" <agarwal.prateek@> wrote:
> >
> >
> >>>Is there a cheap way to verify that a regular expression is properly formed
> >>>*before* feeding it to regex(), so as to avoid that error message?
> >>
> >>The language of regular expressions is itself not regular. It is a cfg.
> >>You can have (((((a))))). How can you manage to match this using
> >>regular expression.
> >>
> >>Great thought b.t.w.
> >
> >
> > This is comp.lang.awk, so
> >
> > echo '(((((a)))))' | awk '/(((((a)))))/ { print }'
> >
> > works very nicely.
> I think the issue is not to write a constant expression that matches
> "five left parentheses, a, five right parentheses" but to write a regular
> expression that could match something like "a given number of left
> parentheses <some stuff> the same number of right parentheses".
>
> This is not achievable with REs.
I guess the OP wants to know whether a RegExp object can be created once,
up front, before it is used in any regex-related expression.
In many OO programming languages you can *new* a RegExp object, or
use qr/..../ in Perl, and then reuse that object many times without
worrying about the regex syntax again.
But how do you do this in awk? Is it even possible in awk? If
this is what the OP wanted, then I also want to know the answer. :-) Thanks
for any further information.
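For what it's worth, standard awk has no separate regex object, but the right-hand side of ~ may be any expression, and its string value is then used as a dynamic regex. A pattern held in a variable can therefore be reused as often as needed. A minimal sketch, assuming a POSIX awk (not TAWK, whose regex() is said above to produce an internal form) and a made-up helper name match_lines:

```shell
# match_lines is a hypothetical helper name; any POSIX awk is assumed.
match_lines() {
    # $1: pattern text; standard input: lines to filter
    # The variable re holds a plain string; "~" treats it as a dynamic regex.
    awk -v re="$1" '$0 ~ re { print "line " NR ": " $0 }'
}

printf '%s\n' apple 42 banana 7 | match_lines '^[0-9]+$'
# line 2: 42
# line 4: 7
```

One caveat: -v processes escape sequences in the assigned value, so a pattern containing backslashes has to double them when passed this way.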
Xicheng
-
Re: Regular expression to match regular expressions?
In article <t8wCg.2267$Qf.723@newsread2.news.pas.earthlink.net>,
Anton Treuenfels <atreuenfels@earthlink.net> wrote:
[in tawk]
% Is there a cheap way to verify that a regular expression is properly formed
% *before* feeding it to regex(), so as to avoid that error message?
You don't mention your OS, so I'll assume it's Solaris. I believe that
tawk has a mechanism for interfacing to C library routines, but I don't
know the details. Suppose it does, and that you can pass it a pointer to
a buffer the size of a regex_t. Then you could load regcomp() from libc,
and call it something like this:
rc = regcomp(big_enough_buffer, expression_to_test, REG_EXTENDED+REG_NOSUB);
then check the return code of regcomp(). You'd have to look at the
/usr/include/regex.h to find the values for REG_EXTENDED and REG_NOSUB, and
the size of the first argument.
Suppose that doesn't work or doesn't seem cheap. You could do something like
rc = system("awk '/" pat "/' /dev/null 2> /dev/null")
then check rc (non-zero suggests a bad pattern). Note that you'll have
to escape any /s which appear in pat, and do something more drastic with
any single quotes.
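One way around the quoting problem is a variation on the idea above, not the same command: hand the pattern to the child awk through the environment and use it as a dynamic regex instead of splicing it into a /.../ literal. The sketch below makes several assumptions: regex_ok and the variable name RE are made up, the child is whatever POSIX awk is on the PATH (so it tests that awk's regex dialect, not TAWK's), and that awk exits non-zero when a dynamic regex fails to compile, as gawk and mawk do.

```shell
# regex_ok is a hypothetical helper name. The pattern travels in the
# environment variable RE (also an arbitrary name), so no "/" or quote
# characters in it ever reach the shell or the awk parser.
regex_ok() {
    # $1: pattern text without the surrounding slashes
    RE="$1" awk 'BEGIN { match("", ENVIRON["RE"]) }' 2>/dev/null
}

for pat in '[a-z]+' '[a-z' "it's" 'a/b'; do
    if regex_ok "$pat"; then
        echo "ok:  $pat"
    else
        echo "bad: $pat"
    fi
done
```

Like the system() call above, this costs one process per pattern, so it is only cheap if patterns are checked rarely.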
--
Patrick TJ McPhee
North York Canada
ptjm@interlog.com
-
Re: Regular expression to match regular expressions?
Bob Harris wrote:
> In article
> <1155197771.534028.245820@75g2000cwc.googlegroups.com>,
> "quarkLore" <agarwal.prateek@> wrote:
>
> > > Is there a cheap way to verify that a regular expression is properly formed
> > > *before* feeding it to regex(), so as to avoid that error message?
> >
> > The language of regular expressions is itself not regular. It is a cfg.
> > You can have (((((a))))). How can you manage to match this using
> > regular expression.
> >
> > Great thought b.t.w.
>
> This is comp.lang.awk, so
>
> echo '(((((a)))))' | awk '/(((((a)))))/ { print }'
This is just matching anything with 'a' in it:
echo 'blah' | awk '/(((((a)))))/ { print }'
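The difference shows up when the two forms are put side by side. In an awk regex, unescaped parentheses only group, while escaped ones stand for the literal characters. A small sketch, assuming any POSIX awk; the function names are made up for illustration.

```shell
# Unescaped parentheses only group, so this is effectively /a/.
grouping_match() { awk '/(((((a)))))/ { print }'; }

# Escaped parentheses are literal characters; the anchors demand the whole line.
literal_match() { awk '/^\(\(\(\(\(a\)\)\)\)\)$/ { print }'; }

printf '%s\n' '(((((a)))))' 'blah' | grouping_match   # prints both lines
printf '%s\n' '(((((a)))))' 'blah' | literal_match    # prints only the first
```

Even the literal version is still a constant pattern for exactly five levels; it does nothing for arbitrary nesting depth.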
>
> works very nicely.
>
> Your mileage may vary in other languages.
>
> Bob Harris