The difference between ereg and preg? - PHP

  1. The difference between ereg and preg?

    PHP List,

    Recently I wrote a piece of code to scrape data from an HTML page.

    Part of that code deleted all the unwanted text from the very top of the
    page, where it says "<!DOCTYPE", all the way down to the first instance
    of a "<ul>" tag.

    That code looks like this:
    ereg_replace("<!DOCTYPE(.*)<ul>", "", $htmlPage);

    It works fine. But I noticed that on almost all the tutorial pages I
    looked at, they just about always used preg_replace, and not ereg_replace.

    It seemed that the main difference was that preg_replace required
    forward slashes around the regular expression, like so:
    preg_replace("/<!DOCTYPE(.*)<ul>/", "", $htmlPage);

    But that didn't work, and returned an error.

    Since ereg was working, though, I figured I would just stick with it.

    Still, I thought it worth asking:

    Is there any reason why either ereg or preg would be more desirable over
    the other?

    Why does the ereg work for the command above, but preg not?

    Thank you for any advice.

    --
    Dave M G

  2. Re: [PHP] The difference between ereg and preg?

    Dave M G wrote:
    > PHP List,
    >
    > Recently I wrote a piece of code to scrape data from an HTML page.
    >
    > Part of that code deleted all the unwanted text from the very top of the
    > page, where it says "<!DOCTYPE", all the way down to the first instance
    > of a "<ul>" tag.
    >
    > That code looks like this:
    > ereg_replace("<!DOCTYPE(.*)<ul>", "", $htmlPage);
    >
    > It works fine. But I noticed that on almost all the tutorial pages I
    > looked at, they just about always used preg_replace, and not ereg_replace.
    >
    > It seemed that the main difference was that preg_replace required
    > forward slashes around the regular expression, like so:
    > preg_replace("/<!DOCTYPE(.*)<ul>/", "", $htmlPage);
    >
    > But that didn't work, and returned an error.
    >
    > Since ereg was working, though, I figured I would just stick with it.
    >
    > Still, I thought it worth asking:
    >
    > Is there any reason why either ereg or preg would be more desirable over
    > the other?
    >
    > Why does the ereg work for the command above, but preg not?
    >
    > Thank you for any advice.
    >
    > --
    > Dave M G
    >

    From what I understand, there are two schools of thought on regular
    expressions: the Perl group (Perl Compatible Regular Expressions, the
    preg functions) and the POSIX group (the ereg functions). From the
    little I know, they do the same thing but with different symbols for things.
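
    As a small illustration of "different symbols", here is a minimal sketch
    that matches a run of digits in both notations. The sample string and
    variable names are made up for the example, and both patterns are run
    through preg_match() because PCRE also accepts the POSIX-style bracket class.

```php
<?php
// Hypothetical sample text; any string containing digits would do.
$subject = 'Order 66 shipped';

// POSIX-style bracket class (the notation ereg patterns use; PCRE accepts it too).
preg_match('/[[:digit:]]+/', $subject, $posixStyle);

// Perl-style shorthand for the same idea (PCRE only).
preg_match('/\d+/', $subject, $perlStyle);

// Both notations capture the same digits.
var_dump($posixStyle[0] === $perlStyle[0]);
```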

    References:
    http://www.php.net/regex
    http://www.php.net/pcre
    --

    life is a game... so have fun.


  3. Re: [PHP] The difference between ereg and preg?

    Dave M G wrote:
    > PHP List,
    >
    > Recently I wrote a piece of code to scrape data from an HTML page.
    >
    > Part of that code deleted all the unwanted text from the very top of the
    > page, where it says "<!DOCTYPE", all the way down to the first instance
    > of a "<ul>" tag.
    >
    > That code looks like this:
    > ereg_replace("<!DOCTYPE(.*)<ul>", "", $htmlPage);
    >
    > It works fine. But I noticed that on almost all the tutorial pages I
    > looked at, they just about always used preg_replace, and not ereg_replace.
    >
    > It seemed that the main difference was that preg_replace required
    > forward slashes around the regular expression, like so:
    > preg_replace("/<!DOCTYPE(.*)<ul>/", "", $htmlPage);
    >
    > But that didn't work, and returned an error.
    >
    > Since ereg was working, though, I figured I would just stick with it.
    >
    > Still, I thought it worth asking:
    >
    > Is there any reason why either ereg or preg would be more desirable over
    > the other?
    >
    > Why does the ereg work for the command above, but preg not?


    ! in perl regular expressions means "not" so you need to escape it:

    \!
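
    Strictly speaking, "!" only acts as "not" inside a group such as the
    negative lookahead (?!...). Outside one, PCRE matches it literally, so the
    backslash is harmless but optional. A minimal sketch, assuming a made-up
    $htmlPage value:

```php
<?php
// Hypothetical page content for the demonstration.
$htmlPage = '<!DOCTYPE html><ul><li>one</li></ul>';

// A bare "!" is matched literally; escaping it gives the same result.
$plain   = preg_match('/<!DOCTYPE/', $htmlPage);
$escaped = preg_match('/<\!DOCTYPE/', $htmlPage);

// Inside (?!...) the "!" marks a negative lookahead:
// this finds a "<" that is not followed by "ul".
$notFollowedByUl = preg_match('/<(?!ul)/', $htmlPage);

var_dump($plain, $escaped, $notFollowedByUl);
```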

    PCRE expressions are very close to those in Perl, so you can
    easily move a regular expression from one language to the other
    (whether that's good or not is another thing). If you have any Perl
    experience, they are easier to understand. If you don't, they are a
    little harder to understand to start off with but are more powerful once
    you work them out.

    The ereg functions are simpler to use but miss a lot of functionality.

    Also according to this page:
    http://www.php.net/~derick/meeting-n...e-ereg-to-pecl

    ereg will be moved to PECL, which means it will be less available (i.e.
    most hosts won't install or enable it).

    --
    Postgresql & php tutorials
    http://www.designmagick.com/

  4. Re: [PHP] The difference between ereg and preg?

    On 04/08/06, Chris <dmagick@gmail.com> wrote:
    >
    > Dave M G wrote:
    > > PHP List,
    > >
    > > Recently I wrote a piece of code to scrape data from an HTML page.
    > >
    > > Part of that code deleted all the unwanted text from the very top of the
    > > page, where it says "<!DOCTYPE", all the way down to the first instance
    > > of a "<ul>" tag.

    >
    > > Is there any reason why either ereg or preg would be more desirable over
    > > the other?
    > >



    Ereg uses POSIX regular expressions, which only work on textual data and
    include locale functionality. preg uses PCRE, which works on binary as well
    as textual data and is a more sophisticated regex engine, incorporating
    minimal matching, backreferences, inline options, lookahead/lookbehind
    assertions, conditional expressions and so on.

    The preg functions are often faster than the POSIX equivalents also.
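
    A few of those PCRE features, sketched with made-up sample strings
    (the variable names are only for illustration):

```php
<?php
$text = '<b>one</b> and <b>two</b>';

// Minimal (lazy) matching: ".*?" stops at the first closing tag.
preg_match('/<b>(.*?)<\/b>/', $text, $lazy);               // $lazy[1] is "one"

// Backreference: \1 must repeat whatever group 1 captured ("is is").
$doubled = preg_match('/\b(\w+) \1\b/', 'it is is done');  // 1

// Lookahead: match "foo" only when "bar" follows, without consuming "bar".
preg_match('/foo(?=bar)/', 'foobar', $ahead);              // $ahead[0] is "foo"

// Inline option: (?i) switches on case-insensitive matching inside the pattern.
$inline = preg_match('/(?i)doctype/', '<!DOCTYPE html>');  // 1

var_dump($lazy[1], $doubled, $ahead[0], $inline);
```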




    --
    http://www.web-buddha.co.uk
    http://www.projectkarma.co.uk


  5. Re: [PHP] The difference between ereg and preg?

    Chris, Ligaya, Dave,

    Thank you for responding. I understand the difference in principle
    between ereg and preg much better now.

    Chris wrote:
    > ! in perl regular expressions means "not" so you need to escape it:
    > \!

    Still, when including that escape character, the following preg
    expression does not find any matching text:
    preg_replace("/<\!DOCTYPE(.*)<ul>/", "", $htmlPage);

    Whereas this ereg expression does find a match:
    ereg_replace("<!DOCTYPE(.*)<ul>", "", $htmlPage);

    What do I need to do to make the preg expression succeed just as the
    ereg expression does?

    Thank you for your time and advice.

    --
    Dave M G

  6. Re: [PHP] The difference between ereg and preg?

    On 04/08/06, Dave M G <martin@autotelic.com> wrote:
    >
    > It seemed that the main difference was that preg_replace required
    > forward slashes around the regular expression, like so:
    > preg_replace("/<!DOCTYPE(.*)<ul>/", "", $htmlPage);


    It requires delimiters - slashes are conventional, but other
    characters can be used.

    >
    > But that didn't work, and returned an error.


    What you've written here shouldn't return you an error - what did the
    message say?

    I suspect you were trying to match until </ul> rather than <ul> and
    the pcre engine thought the forward slash was the end of your regular
    expression.
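
    A minimal sketch of that delimiter problem, assuming a made-up
    $htmlPage. The last call is deliberately wrong to show the failure mode.

```php
<?php
// Hypothetical page content for the demonstration.
$htmlPage = '<ul><li>a</li></ul><p>rest</p>';

// With "/" as the delimiter, a literal "/" inside the pattern must be escaped...
$viaSlash = preg_replace('/<ul>.*<\/ul>/', '', $htmlPage);

// ...or a different delimiter can be chosen so that no escaping is needed.
$viaHash = preg_replace('#<ul>.*</ul>#', '', $htmlPage);

// Unescaped, the pattern ends at the second "/" and the rest ("ul>/") is read
// as modifiers, which triggers an "Unknown modifier" warning and returns null.
// The @ only silences that warning for this demo.
$broken = @preg_replace('/<ul>.*</ul>/', '', $htmlPage);

var_dump($viaSlash, $viaHash, $broken);
```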

    > Part of that code deleted all the unwanted text from the very top of the
    > page, where it says "<!DOCTYPE", all the way down to the first instance
    > of a "<ul>" tag.


    It won't quite do that.

    The (.*) matches as much as possible (it's called greedy matching) -
    so it'll match and replace all the way down to the *last* instance of
    a "<ul>" tag.

    To make it match for the shortest length possible, put a question-mark
    after the ".*" like so:

    preg_replace("/<!DOCTYPE(.*?)<ul>/", "", $htmlPage);
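
    The difference shows up as soon as the page holds more than one <ul>.
    A small sketch with a made-up $htmlPage:

```php
<?php
// Hypothetical page with two lists.
$htmlPage = '<!DOCTYPE html><ul><li>a</li></ul><p>keep</p><ul><li>b</li></ul>';

// Greedy: ".*" runs to the LAST "<ul>", so the first list and the paragraph are lost.
$greedy = preg_replace('/<!DOCTYPE(.*)<ul>/', '', $htmlPage);

// Lazy: ".*?" stops at the FIRST "<ul>".
$lazy = preg_replace('/<!DOCTYPE(.*?)<ul>/', '', $htmlPage);

echo $greedy, "\n"; // <li>b</li></ul>
echo $lazy, "\n";   // <li>a</li></ul><p>keep</p><ul><li>b</li></ul>
```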

    > Since ereg was working, though, I figured I would just stick with it.
    >
    > Still, I thought it worth asking:
    >
    > Is there any reason why either ereg or preg would be more desirable over
    > the other?


    PCRE has a performance advantage, has more features, and can use the
    many regexps written for Perl with few or no changes.

    ereg follows the POSIX standard.

    -robin

  7. Re: [PHP] The difference between ereg and preg?

    Dave M G wrote:
    > Chris, Ligaya, Dave,
    >
    > Thank you for responding. I understand the difference in principle
    > between ereg and preg much better now.
    >
    > Chris wrote:
    >> ! in perl regular expressions means "not" so you need to escape it:
    >> \!

    > Still, when including that escape character, the following preg
    > expression does not find any matching text:
    > preg_replace("/<\!DOCTYPE(.*)<ul>/", "", $htmlPage);


    Does this one work?

    preg_replace('#^<\!DOCTYPE(.*)<ul[^>]*>#is', '', $htmlPage);

    >
    > Whereas this ereg expression does find a match:
    > ereg_replace("<!DOCTYPE(.*)<ul>", "", $htmlPage);
    >
    > What do I need to do to make the preg expression succeed just as the
    > ereg expression does?
    >
    > Thank you for your time and advice.
    >
    > --
    > Dave M G
    >


  8. Re: [PHP] The difference between ereg and preg?

    Jochem,

    Thank you for responding.

    >
    > does this one work?:
    > preg_replace('#^<\!DOCTYPE(.*)<ul[^>]*>#is', '', $htmlPage);


    Yes, that works. I don't think I would have ever figured that out on my
    own - it's certainly much more complicated than the ereg equivalent.

    If I may push for just one more example of how to properly use regular
    expressions with preg:

    It occurs to me that I've been assuming that with regular expressions I
    could only remove or change specified text.

    What if I wanted to get rid of everything *other* than the specified text?

    Can I form an expression that would take $htmlPage and delete everything
    *except* text that is between a <li> tag and a <br> tag?

    Or is that something that requires much more than a single use of
    preg_replace?

    --
    Dave M G

  9. Re: [PHP] The difference between ereg and preg?

    Dave M G wrote:
    > Jochem,
    >
    > Thank you for responding.
    >
    >>
    >> does this one work?:
    >> preg_replace('#^<\!DOCTYPE(.*)<ul[^>]*>#is', '', $htmlPage);

    >
    > Yes, that works. I don't think I would have ever figured that out on my
    > own - it's certainly much more complicated than the ereg equivalent.


    1. the '^' at the start of the regexp states 'must match start of the string (or line in multiline mode)'
    2. the 'i' after the closing regexp delimiter states 'match case-insensitively'
    3. the 's' after the closing regexp delimiter states 'the dot also matches newlines'
    4. the '<ul[^>]*>' matches a UL tag with any number of attributes ... the '[^>]*' matches any number
    of characters that are not a '>' character - the square brackets denote a character class (in
    this case with just one character in it) and the '^' at the start of the character class
    definition negates the class (i.e. turns the character class definition to mean every character
    *not* defined in the class)
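
    To see what the two modifiers contribute, here is a minimal sketch
    against a made-up page (lowercase doctype, several lines). It is likely
    also why the earlier preg attempt found nothing while ereg did: without
    's', the PCRE dot does not cross a newline.

```php
<?php
// Hypothetical page: note the lowercase doctype and the line breaks.
$htmlPage = "<!doctype html>\n<html>\n<body>\n<ul class=\"words\">\n<li>one</li>\n</ul>";

// No modifiers: case-sensitive, and "." stops at the first newline.
$plain = preg_match('#^<!DOCTYPE(.*)<ul[^>]*>#', $htmlPage);       // 0

// "i" alone fixes the case problem, but "." still cannot cross the newlines.
$caseOnly = preg_match('#^<!DOCTYPE(.*)<ul[^>]*>#i', $htmlPage);   // 0

// "i" and "s" together let the pattern span the whole header.
$both = preg_replace('#^<!DOCTYPE(.*)<ul[^>]*>#is', '', $htmlPage);

var_dump($plain, $caseOnly, $both);
```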

    PCRE is a lot more powerful [imho]; the downside is that it has more modifiers
    and syntax to control the meaning of the patterns...

    read and become familiar with these 2 pages:
    http://php.net/manual/en/reference.p....modifiers.php
    http://php.net/manual/en/reference.p...ern.syntax.php

    and remember that writing patterns is often quite complex; when you build one,
    just take it one 'assertion' at a time, i.e. build the pattern up step by step...

    if you give it a good go and get stuck, then there is always the list.

    >
    > If I may push for just one more example of how to properly use regular
    > expressions with preg:
    >
    > It occurs to me that I've been assuming that with regular expressions I
    > could only remove or change specified text.


    essentially regexps are a pattern syntax for asserting whether something matches
    a pattern (or not) - there are various functions that allow you to act upon the
    results of the pattern matching depending on your needs (see below)

    >
    > What if I wanted to get rid of everything *other* than the specified text?
    >
    > Can I form an expression that would take $htmlPage and delete everything
    > *except* text that is between a <li> tag and a <br> tag?


    yes, but you wouldn't use preg_replace() but rather preg_match() or preg_match_all(),
    which give you back an array (via the 3rd argument, passed by reference) containing
    the texts that matched (and that you therefore want to keep).
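
    A minimal sketch of that approach, using a made-up fragment rather than
    the real page. The names $htmlPage, $matches and $kept are only for
    illustration.

```php
<?php
// Hypothetical fragment with two list items, each ended by a <br>.
$htmlPage = "<ul><li> apple <br>\n<li> pear <br>\n</ul>";

// The third argument is filled by reference. With the default PREG_PATTERN_ORDER,
// $matches[0] holds the full matches and $matches[1] the first capture group.
$count = preg_match_all('#<li[^>]*>(.*?)<br[^>]*>#is', $htmlPage, $matches);

// Keep only the captured text, without the surrounding spaces.
$kept = array_map('trim', $matches[1]);

print_r($kept);
```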

    >
    > Or is that something that requires much more than a single use of
    > preg_replace?
    >
    > --
    > Dave M G
    >


  10. Re: [PHP] The difference between ereg and preg?

    Jochem,

    Thank you for responding, and for explaining more about regular expressions.

    > yes, but you wouldn't use preg_replace() but rather preg_match() or preg_match_all(),
    > which give you back an array (via the 3rd argument, passed by reference) containing
    > the texts that matched (and that you therefore want to keep).

    I looked up preg_match_all() on php.net, and, in combination with what
    was said before, came up with this syntax:

    preg_match_all( "#^<li[^>]*>(.*)<br[^>]*>#is", $response, $wordList,
    PREG_PATTERN_ORDER );
    var_dump($wordList);

    The idea is to catch all text between <li> and <br> tags.

    Unfortunately, the result I get from var_dump is:

    array(2) { [0]=> array(0) { } [1]=> array(0) { } }

    In other words, it made no matches.

    The text being searched is an entire web page which contains the following:
    (Please note the following includes utf-8 encoded Japanese text.
    Apologies if it comes out as ASCII gibberish)

    <FONT color="red">日本語</FONT>は<FONT color="red">簡単</FONT>だよ<br>
    <ul><li> 日本語 【にほんご】 (n) Japanese language; (P); EP <br>
    <li> 簡単 【かんたん】 (adj-na,n) simple; (P); EP <br>
    </ul><p>

    So, my preg_match_all search should have found:

    日本語 【にほんご】 (n) Japanese language; (P); EP
    簡単 【かんたん】 (adj-na,n) simple; (P); EP

    I've checked and rechecked my syntax, and I can't see why it would fail.

    Have I messed up the regular expression, or the use of preg_match_all?
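
    For reference, one possible reading of the failure is the leading '^',
    which ties the match to the very start of the page, where there is no
    <li>. The sketch below runs the sample fragment above through a variant
    of the pattern. Dropping '^', using a lazy '.*?' and adding the 'u'
    modifier are all assumptions about what is wanted here, not a
    confirmed fix.

```php
<?php
// The sample fragment from the post (UTF-8).
$response = <<<HTML
<FONT color="red">日本語</FONT>は<FONT color="red">簡単</FONT>だよ<br>
<ul><li> 日本語 【にほんご】 (n) Japanese language; (P); EP <br>
<li> 簡単 【かんたん】 (adj-na,n) simple; (P); EP <br>
</ul><p>
HTML;

// No leading "^", so a match may start anywhere in the page;
// ".*?" is lazy, so each match stops at the nearest <br>;
// "u" (an extra assumption) treats pattern and subject as UTF-8.
preg_match_all('#<li[^>]*>(.*?)<br[^>]*>#isu', $response, $wordList, PREG_PATTERN_ORDER);

// $wordList[1] holds the captured text of each item.
$words = array_map('trim', $wordList[1]);

print_r($words);
```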

    --
    Dave M G
