Word frequency analyser - Perl

  1. Word frequency analyser

    Hi,

    Does anyone happen to know if there's a convenient module which will analyse
    at least two XML files and list the most frequently-used words?

    (It would have to be able to reject tags and certain words such as "the" and
    "is")



  2. Re: Word frequency analyser

    "DVH" <dvh@dvhdvhdvhdvdh.dvh> wrote:

    > Hi,
    >
    > Does anyone happen to know if there's a convenient module which will
    > analyse at least two XML files and list the most frequently-used
    > words?
    >
    > (It would have to be able to reject tags


    XML::Parser, with a Char handler.

    > and certain words such as
    > "the" and "is")


    Split in the handler on non-word characters, and use a hash for counting.
    Afterwards, delete all occurrences of the, is, etc.
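    A minimal Perl sketch of this approach, assuming XML::Parser is installed
    from CPAN. The stop-word list, the 20-word cutoff, and the names
    count_words and %stop are illustrative choices, not part of the post.
    Because XML::Parser may report a single run of text through several Char
    calls, the sketch buffers the text and splits it only at element
    boundaries, so tags also act as word separators.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical stop-word list for illustration; extend as needed.
my %stop;
$stop{$_} = 1 for qw(the is a an and of to in);

# Add the word counts from one XML source (a string or an open handle)
# to the hash referenced by $count. Tag names and attributes never reach
# the Char handler, so they are not counted.
sub count_words {
    my ($count, $source) = @_;
    my $buffer = '';

    # Split the buffered text on non-word characters and tally each word.
    my $flush = sub {
        $count->{ lc $_ }++ for grep { length } split /\W+/, $buffer;
        $buffer = '';
    };

    my $parser = XML::Parser->new(
        Handlers => {
            Char  => sub { $buffer .= $_[1] },   # collect character data
            Start => $flush,                     # element boundaries end a word
            End   => $flush,
        },
    );
    $parser->parse($source);
    $flush->();
    return $count;
}

# Usage: perl wordfreq.pl file1.xml file2.xml ...
my %count;
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    count_words(\%count, $fh);
    close $fh;
}

delete @count{ keys %stop };    # drop the stop words afterwards

# Print the 20 most frequent words, ties broken alphabetically.
my @top = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
splice @top, 20 if @top > 20;
printf "%6d  %s\n", $count{$_}, $_ for @top;
```

    Counts from all files named on the command line are merged into one hash,
    which covers the "at least two XML files" case; wordfreq.pl is just a
    placeholder script name.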

    Note that this is a very simplistic approach: if words are hyphenated,
    it counts them as two different words.
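    One possible way to keep hyphenated compounds such as "well-known"
    together is to match words with a pattern that allows internal hyphens,
    instead of splitting on \W+. A small sketch; words_simple and words_hyphen
    are hypothetical helper names, and a word broken by a hyphen at the end of
    a line is still split in two.

```perl
use strict;
use warnings;

# Split on non-word characters: hyphens act as separators.
sub words_simple {
    my ($text) = @_;
    return grep { length } split /\W+/, lc $text;
}

# Match runs of word characters joined by single internal hyphens.
sub words_hyphen {
    my ($text) = @_;
    return lc($text) =~ /\w+(?:-\w+)*/g;
}

print join(',', words_simple("A well-known fact")), "\n";   # a,well,known,fact
print join(',', words_hyphen("A well-known fact")), "\n";   # a,well-known,fact
```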

    --
    John
    Small Perl scripts: http://johnbokma.com/perl/
    Perl programmer available: http://castleamber.com/
    I ploink googlegroups.com :-)


  3. Re: Word frequency analyser

    John Bokma <john@castleamber.com> writes:

    > "DVH" <dvh@dvhdvhdvhdvdh.dvh> wrote:
    >
    >> Hi,
    >>
    >> Does anyone happen to know if there's a convenient module which will
    >> analyse at least two XML files and list the most frequently-used
    >> words?
    >>
    >> (It would have to be able to reject tags

    >
    > XML::Parser, with a Char handler.
    >
    >> and certain words such as
    >> "the" and "is")

    >
    > Split in the handler on non-word characters, and use a hash for counting.
    > Afterwards, delete all occurrences of the, is, etc.
    >
    > Note that this is a very simplistic approach: if words are hyphenated,
    > it counts them as two different words.


    Searching for "word frequency" on search.cpan.org turns up some modules
    that are designed for this sort of thing, and may take some of the
    trickier issues into account.

    ----Scott.

  4. Re: Word frequency analyser

    On Tue, 25 Oct 2005 01:30:58 +1000, Scott W Gifford wrote:

    Hi Folks

    A list of stop words, courtesy of MySQL, can be downloaded from:

    http://savage.net.au/Ron/mysql-stop-words.txt
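    A sketch of loading such a list into a hash and removing those words from
    a word-count hash. It assumes the file holds whitespace-separated words
    (for example one per line); the real layout of the downloaded file may
    differ, and load_stop_words and remove_stop_words are hypothetical names.

```perl
use strict;
use warnings;

# Read whitespace-separated stop words from an open filehandle into a hash.
# Assumption: the list contains only words, with no comments or headers.
sub load_stop_words {
    my ($fh) = @_;
    my %stop;
    while (my $line = <$fh>) {
        $stop{ lc $_ } = 1 for split ' ', $line;
    }
    return \%stop;
}

# Delete every stop word from a word-count hash, in place.
sub remove_stop_words {
    my ($count, $stop) = @_;
    delete @{$count}{ keys %$stop };
    return $count;
}

# Demo with an in-memory list; for the downloaded file use e.g.
#   open my $fh, '<', 'mysql-stop-words.txt' or die $!;
open my $fh, '<', \"the\nis\nand\n" or die $!;
my $stop = load_stop_words($fh);
my %count = (the => 5, perl => 3, is => 2);
remove_stop_words(\%count, $stop);
print join(',', sort keys %count), "\n";   # perl
```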


