positive/negative lookahead issue. greedy = problems? - Javascript

  1. positive/negative lookahead issue. greedy = problems?

    /*
    * BEGIN EXAMPLES
    */

    var text = 'A Cats Catalog of Cat Catastrophes and Calamities';

    /***
    * EXAMPLE 1: negative lookahead assertion logic
    ***/

    var newString = text.split(/\s/);
    // index loop: for...in over an array can also visit inherited properties
    for (var i = 0; i < newString.length; i++) {
        var word = newString[i];
        if (
            // we'll replace the word Cat under these conditions:
            word.search(/Cat/) == 0 && // *if* word begins with Cat
            word != 'Catalog' &&       // *but* is not Catalog
            word != 'Catastrophes'     // *and* is not Catastrophes
        ) {
            // we'll replace the word Cat with Human
            newString[i] = word.replace('Cat', 'Human');
        }
    }
    alert(newString.join(' '));
    // -> A Humans Catalog of Human Catastrophes and Calamities


    /***
    * EXAMPLE 2: the simpler version
    ***/
    var pattern = /(Cat(?!alog|astrophes))/g;
    alert(text.replace(pattern, 'Human'));
    // -> A Humans Catalog of Human Catastrophes and Calamities

    /*
    * END EXAMPLES
    */


    example 1 === example 2 (for this text, both produce the same output).
    from this point forward it may seem safe to assume that my
    understanding of a negative lookahead is correct. the problem is, I
    feel my understanding may be flawed. here is why...

    I always knew about positive/negative lookaheads but didn't truly
    understand them. Wrox Professional JavaScript sort of cleared it up
    for me, *but* upon deciding to sharpen my skills on my own, I came
    across a peculiar problem.

    Does greedy matching work with them?

    e.g., I created this problem entirely on my own. I didn't mean to; it
    just ended up that way. my idea was to match \d+\.\d+, but not if it
    was followed by \.\d+ .

    here's an example:

    var nla = /(\d+\.\d+(?!\.\d))/; // my negative look ahead
    var txt = 'euphoria 72.21.330';
    alert(nla.exec(txt)[1]); // -> 72.2 (completely unexpected)

    what did I expect? nothing. null, nada. I am not at all interested in
    the dozens of other possible solutions. I am most interested in
    understanding this problem. I need enlightenment and very much
    appreciate any insight on it!

  2. Re: positive/negative lookahead issue. greedy = problems?

    vbgunz wrote:
    > var nla = /(\d+\.\d+(?!\.\d))/; // my negative look ahead
    > var txt = 'euphoria 72.21.330';
    > alert(nla.exec(txt)[1]); // -> 72.2 (completely unexpected)
    >
    > what did I expect? nothing. null, nada. I am not at all interested in
    > the dozens of other possible solutions. I am most interested in
    > understanding this problem. I need enlightenment and very much
    > appreciate any insight on it!


    Subexpression    | Match | Not matched
    -----------------+-------+------------
    \d+              | 72    | .21.330
    \d+\.            | 72.   | 21.330
    \d+\.\d+         | 72.21 | .330
    \d+\.\d+(?!\.\d) | 72.2  | 1.330


    HTH

    PointedEars
    --
    Prototype.js was written by people who don't know javascript for people
    who don't know javascript. People who don't know javascript are not
    the best source of advice on designing systems that use javascript.
    -- Richard Cornford, cljs, <f806at$ail$1$8300dec7@news.demon.co.uk>

  3. Re: positive/negative lookahead issue. greedy = problems?

    vbgunz wrote:
    > var nla = /(\d+\.\d+(?!\.\d))/; // my negative look ahead
    > var txt = 'euphoria 72.21.330';
    > alert(nla.exec(txt)[1]); // -> 72.2 (completely unexpected)
    >
    > what did I expect? nothing. null, nada. I am not at all interested in
    > the dozens of other possible solutions. I am most interested in
    > understanding this problem. I need enlightenment and very much
    > appreciate any insight on it!


    Subexpression    | Match | Not matched
    -----------------+-------+------------
    \d+              | 72    | .21.330
    \d+\.            | 72.   | 21.330
    \d+\.\d+         | 72.21 | .330
    \d+\.\d+(?!\.\d) | 72.2  | 1.330
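The table can be regenerated with a few lines of JavaScript. This is a minimal sketch assuming a plain console (Node.js or browser devtools), with `console.log` in place of `alert`; `subexpressions` and `rows` are illustrative names, and the "Not matched" column is taken to be simply the text after the end of each match.

```javascript
// Run each subexpression on the same input and report what matched
// and what was left over after the end of the match.
var txt = 'euphoria 72.21.330';
var subexpressions = [
    '\\d+',
    '\\d+\\.',
    '\\d+\\.\\d+',
    '\\d+\\.\\d+(?!\\.\\d)'
];

var rows = subexpressions.map(function (source) {
    var m = new RegExp(source).exec(txt);
    // Text after the end of the match ("Not matched" column).
    var rest = txt.slice(m.index + m[0].length);
    return { source: source, match: m[0], rest: rest };
});

rows.forEach(function (row) {
    console.log(row.source + ' | ' + row.match + ' | ' + row.rest);
});
// \d+ | 72 | .21.330
// \d+\. | 72. | 21.330
// \d+\.\d+ | 72.21 | .330
// \d+\.\d+(?!\.\d) | 72.2 | 1.330
```

Only the last row carries the lookahead, and it is the only one where the match is shorter than the row above it.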


    HTH

    PointedEars

  4. Re: positive/negative lookahead issue. greedy = problems?

    Thomas 'PointedEars' Lahn wrote:

    > Subexpression    | Match | Not matched
    > -----------------+-------+------------
    > \d+              | 72    | .21.330
    > \d+\.            | 72.   | 21.330
    > \d+\.\d+         | 72.21 | .330
    > \d+\.\d+(?!\.\d) | 72.2  | 1.330


    it makes no sense to me. I read up a bit more on positive/negative
    lookaheads and my issue still makes no sense to me. please, read this.

    for text replacement operations I find these lookaheads invaluable. it
    is awesome to know I can check whether a pattern is or is not followed
    by another pattern (without consuming it). seriously awesome. what I do
    not understand is what is happening in my issue. I know you posted a
    table to help, but the table makes no sense to me at all when it comes
    to *why* anything is returned.

    rows 1 through 3 of the table make perfect sense. row 4 throws my baby
    out of the window with the bathwater. I think no match should be made.
    seriously, no match should be made. I cannot understand why any match
    is made.

    (123\.456(?!\.789)) -> no match on 123.456.789 -> PERFECT!
    (\d+\.\d+(?!\.\d+)) -> 2 matches on 123.456.789 -> 123.45/6.789 -> A
    PERFECT WTF!?

    I think this is why I never really used lookaheads. as of yesterday I
    thought I understood them, and with primitive examples they're very
    useful, *but* when I came up with the problem above, I got thrown into
    a world of hurt. I have no idea why a match is made. no match should
    be made. so why is a match being made?

    In the end, I could even see 456.789 making sense if it were the only
    thing returned. I am just completely thrown off here. any detailed
    enlightenment would be very much appreciated!

  5. Re: positive/negative lookahead issue. greedy = problems?

    vbgunz wrote:
    > Thomas 'PointedEars' Lahn wrote:
    >> Subexpression    | Match | Not matched
    >> -----------------+-------+------------
    >> \d+              | 72    | .21.330
    >> \d+\.            | 72.   | 21.330
    >> \d+\.\d+         | 72.21 | .330
    >> \d+\.\d+(?!\.\d) | 72.2  | 1.330
    >
    > it makes no sense to me. i read up a bit more on positive/negative
    > lookaheads and my issue still makes no sense to me.


    You have asked for an explanation (and explicitly not for a solution), and
    I provided it.

    To be more verbose, the reason for the observed result is that with greedy
    matching (which is the default) the longest possible match for any given
    subexpression wins. Since you have imposed a further restriction on the
    match (that it must not "be followed by a dot followed by a decimal digit"),
    the second longest possible match won. (As you can observe in the last row,
    `72.2' does match `\d+\.\d+' and `1.' does not match `\.\d'.)
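The "second longest possible match" can be traced by hand. The sketch below is a simplified emulation, not how a regex engine is actually implemented: `trySecondDigits` is a hypothetical helper that only varies the second `\d+` at the position where "72" starts, which happens to be enough for this input.

```javascript
// Hypothetical helper: emulates backtracking of the SECOND \d+ in
// /\d+\.\d+(?!\.\d)/ at one fixed start position. If every attempt here
// failed, a real engine would go on to backtrack into the first \d+ and
// then try later start positions; neither is needed for this input.
function trySecondDigits(text, start) {
    // Only the full run of digits can be followed by a dot here, so the
    // first \d+ is taken greedily together with the dot.
    var head = /^\d+\./.exec(text.slice(start));
    if (!head) return { match: null, attempts: [] };
    var afterDot = start + head[0].length;

    // Greedy step: count every digit available after the dot.
    var max = 0;
    while (/\d/.test(text.charAt(afterDot + max))) max++;

    var attempts = [];
    // Try the longest run first, then give back one digit at a time.
    for (var len = max; len >= 1; len--) {
        var end = afterDot + len;
        // (?!\.\d) fails if the text right after the candidate starts with dot-digit.
        var blocked = /^\.\d/.test(text.slice(end));
        attempts.push(text.slice(start, end) + (blocked ? ' -> rejected' : ' -> accepted'));
        if (!blocked) return { match: text.slice(start, end), attempts: attempts };
    }
    return { match: null, attempts: attempts };
}

var result = trySecondDigits('euphoria 72.21.330', 9); // 9 = index of "72"
console.log(result.attempts.join('\n'));
// 72.21 -> rejected
// 72.2 -> accepted
console.log(result.match); // 72.2

// The built-in engine agrees:
console.log(/(\d+\.\d+(?!\.\d))/.exec('euphoria 72.21.330')[1]); // 72.2
```

Giving back a digit is allowed because `\d+` only requires one or more digits; the lookahead never asks for "the whole run".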

    > (123\.456(?!\.789)) -> no match on 123.456.789 -> PERFECT!


    Because you have explicitly requested that a match for "123.456" not be
    followed by ".789", you have excluded the one and only possible match.
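The contrast between the two patterns can be checked directly. In this sketch the extra input '123.456.000' is an illustrative string, not one from the thread.

```javascript
// A pattern without quantifiers has exactly one way to match "123.456";
// once the lookahead rejects it there, there is nothing else to try.
var literal = /(123\.456(?!\.789))/;
console.log(literal.exec('123.456.789'));    // null
console.log(literal.exec('123.456.000')[1]); // 123.456

// With \d+ the engine can give back digits from the second run until
// the text right after the match no longer starts with dot-digits.
var quantified = /(\d+\.\d+(?!\.\d+))/;
console.log(quantified.exec('123.456.789')[1]); // 123.45
```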

    > (\d+\.\d+(?!\.\d+)) -> 2 matches on 123.456.789 -> 123.45/6.789 -> A
    > PERFECT WTF!?


    See above. There was more than one possible match because of the `+'
    quantifier, and the longest possible one won, given the restrictions
    imposed. Since you have used capturing parentheses, there are two elements
    in the RegExp.prototype.exec() array: one for the matched substring and
    one for the match for the first (and here only) captured substring --
    which are of course equal here.
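Both readings of "2 matches" can be seen side by side. The thread does not show the code that produced '123.45/6.789'; a search with the `g` flag is one assumption that would yield exactly that pair.

```javascript
var pattern = /(\d+\.\d+(?!\.\d+))/;
var m = pattern.exec('123.456.789');
// Element 0 is the whole match, element 1 is capture group 1; equal here.
console.log(m.length);   // 2
console.log(m[0], m[1]); // 123.45 123.45

// With the g flag, String.prototype.match() keeps searching after the
// first match ends, so the leftover "6.789" is matched as well.
var all = '123.456.789'.match(/(\d+\.\d+(?!\.\d+))/g);
console.log(all.join(' / ')); // 123.45 / 6.789
```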


    PointedEars
    --
    Anyone who slaps a 'this page is best viewed with Browser X' label on
    a Web page appears to be yearning for the bad old days, before the Web,
    when you had very little chance of reading a document written on another
    computer, another word processor, or another network. -- Tim Berners-Lee

  6. Re: positive/negative lookahead issue. greedy = problems?

    vbgunz wrote:
    > (123\.456(?!\.789)) -> no match on 123.456.789 -> PERFECT!
    > (\d+\.\d+(?!\.\d+)) -> 2 matches on 123.456.789 -> 123.45/6.789 -> A
    > PERFECT WTF!?


    It looks like you're thinking of a lookahead as "x followed by y", and
    might find it makes more sense to understand it as "x *immediately*
    followed by y".

    '123.45' is indeed the longest match for digit(s) dot digit(s) not
    immediately followed by dot digit(s). '123.45' is immediately followed
    instead by '6'.
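A small sketch of "immediately": the lookahead is tested only at the position where the match ends, against the text that starts there.

```javascript
var text = '123.456.789';
var m = /\d+\.\d+(?!\.\d+)/.exec(text);
var end = m.index + m[0].length;

console.log(m[0]);            // 123.45
// The lookahead only inspects the text beginning at `end`:
console.log(text.slice(end)); // 6.789
// That text does not begin with dot-digits, so (?!\.\d+) succeeds.
console.log(/^\.\d+/.test(text.slice(end))); // false
```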

    If you want digit(s) dot digit(s) not followed *anywhere* by dot
    digit(s), you're looking at something like:

    /(\d+\.\d+(?!.*\.\d+))/

    Note the '.*' in the lookahead. That will match '456.789'.
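pr's pattern can be tried on both inputs from the thread. Note that on the original 'euphoria 72.21.330' it still finds a match, because nothing stops the match from starting later in the string.

```javascript
var anywhere = /(\d+\.\d+(?!.*\.\d+))/;
console.log(anywhere.exec('123.456.789')[1]);        // 456.789
// Every candidate starting at "72" is rejected, but one starting at "21" is not.
console.log(anywhere.exec('euphoria 72.21.330')[1]); // 21.330
```

Since `.` does not match line terminators (without the `s` flag), "anywhere" here means anywhere on the same line.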

    This scenario, from the perspective of Perl regular expressions, is well
    described at

    http://perldoc.perl.org/perlre.html#Backtracking

  7. Re: positive/negative lookahead issue. greedy = problems?

    pr wrote:
    > vbgunz wrote:
    > > (123\.456(?!\.789)) -> no match on 123.456.789 -> PERFECT!
    > > (\d+\.\d+(?!\.\d+)) -> 2 matches on 123.456.789 -> 123.45/6.789 -> A
    > > PERFECT WTF!?


    > It looks like you're thinking of a lookahead as "x followed by y", and
    > might find it makes more sense to understand it as "x *immediately*
    > followed by y".
    >
    > '123.45' is indeed the longest match for digit(s) dot digit(s) not
    > immediately followed by dot digit(s). '123.45' is immediately followed
    > instead by '6'.
    >
    > If you want digit(s) dot digit(s) not followed *anywhere* by dot
    > digit(s), you're looking at something like:
    >
    > /(\d+\.\d+(?!.*\.\d+))/
    >
    > Note the '.*' in the lookahead. That will match '456.789'.
    >
    > This scenario, from the perspective of Perl regular expressions, is well
    > described at
    >
    > http://perldoc.perl.org/perlre.html#Backtracking


    I'd like to thank you both (pr and PointedEars) for trying to help.
    although it makes a little more sense than it did when I first
    encountered the problem, the perldoc resource looks like a great
    opportunity to clear matters up even more. every link to one of my
    questions is like a gift; clicking one is like unwrapping it. I thank
    you both again for your time.
