"reverse templating" or "auto-meta-regex" module for automated screen-scrape learning? - Perl

  1. "reverse templating" or "auto-meta-regex" module for automated screen-scrape learning?

    I was discussing screen scraping with some acquaintances recently, and
    they claimed they'd seen a website which allowed users to select
    certain regions of a given page via a nice UI... and there was an app
    behind this that would then learn from these selections to extract
    data from corresponding regions of similar pages. "Reverse templating"
    and "auto-meta-regex" were the terms we came up with, but there's
    probably a better description. They also claimed there were perl
    modules that did this same thing, but I haven't been able to locate
    them on CPAN -- does anyone know what these might be?

    Thanks!


  2. Re: "reverse templating" or "auto-meta-regex" module for automated screen-scrape learning?

    On 2007-09-18 22:09:09 -0400, Weston
    <notsew-reversePreceedingAndRemoveThis@canncentral.org> said:

    > I was discussing screen scraping with some acquaintances recently, and
    > they claimed they'd seen a website which allowed users to select
    > certain regions of a given page via a nice UI... and there was an app
    > behind this that would then learn from these selections to extract
    > data from corresponding regions of similar pages. "Reverse templating"
    > and "auto-meta-regex" were the terms we came up with, but there's
    > probably a better description. They also claimed there were perl
    > modules that did this same thing, but I haven't been able to locate
    > them on CPAN -- does anyone know what these might be?


    Apple created the Web Clip Widget for Mac OS X 10.5, which is what
    popped into my head when I read this. Basically, it allows you to
    select a region of a page that corresponds to a table, div or whatever
    and make a widget out of it. They showed this off a long time ago and
    have yet to release it, but someone created a knockoff right away:

    Dash Clipping
    http://www.fondantfancies.com/blog/3001239/

    On the Perl side of things, when it comes to scraping I would say
    HTML::TreeBuilder is your best friend. It allows you to parse down to
    the table, div or whatever and play with what is inside of it.
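
    HTML::TreeBuilder parses a page into a tree of HTML::Element objects that
    you then search (for example with look_down) to reach the element you
    want. The parse-then-walk-down idea can be sketched with Python's
    standard html.parser module; this is a swapped-in illustration, not the
    HTML::TreeBuilder API. Node, TreeBuilder, parse, find and find_all are
    hypothetical names, the sample page is made up, and the builder is
    deliberately minimal (no implied end tags, no tbody insertion), so treat
    it as a sketch rather than a robust HTML parser.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are never left open on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class Node:
    """One element: tag name, attributes, and its content in document order."""

    def __init__(self, tag, attrs=(), parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.content = []  # mix of str (text) and Node (child elements)

    @property
    def children(self):
        return [c for c in self.content if isinstance(c, Node)]

    def as_text(self):
        """All text inside this element, including nested elements."""
        return "".join(c if isinstance(c, str) else c.as_text() for c in self.content)

    def find(self, tag, **attrs):
        """First descendant (depth-first) with this tag and these attribute values."""
        for child in self.children:
            if child.tag == tag and all(child.attrs.get(k) == v for k, v in attrs.items()):
                return child
            found = child.find(tag, **attrs)
            if found is not None:
                return found
        return None

    def find_all(self, tag):
        """Every descendant with this tag, in document order."""
        out = []
        for child in self.children:
            if child.tag == tag:
                out.append(child)
            out.extend(child.find_all(tag))
        return out


class TreeBuilder(HTMLParser):
    """Turns parser events into a Node tree; no implied end tags or error recovery."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.stack = [self.root]  # currently open elements

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, parent=self.stack[-1])
        self.stack[-1].content.append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close back to the matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].content.append(data)


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


page = """
<html><body>
  <div id="nav">Home | About</div>
  <div id="prices">
    <table>
      <tr><td>Widget</td><td>4.99</td></tr>
      <tr><td>Gadget</td><td>12.50</td></tr>
    </table>
  </div>
</body></html>
"""

root = parse(page)
# Walk down: the div with id="prices", then the table inside it, then its rows.
table = root.find("div", id="prices").find("table")
rows = [[td.as_text().strip() for td in tr.find_all("td")] for tr in table.find_all("tr")]
print(rows)  # [['Widget', '4.99'], ['Gadget', '12.50']]
```

    The walk is written against the page's structure (an id, then tag
    names), which is exactly the part a selection UI would have to work out
    for you.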

    But it sounds like you are looking for more: maybe a UI that allows
    you to select the table, div or whatever and then generates Perl code
    that uses HTML::TreeBuilder to get you to what you selected in the UI.
    Now that sounds fun. Something tells me someone could work towards
    that using CamelBones to access the WebKit innards.
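
    The "select it once, extract it from similar pages" idea boils down to
    two steps: turn the selection into a path from the document root to the
    selected element, then replay that path on other pages built from the
    same template. A minimal sketch under those assumptions, again in Python
    with only the standard library and not modelled on any existing Perl
    module or on Web Clip: the click is simulated by find_selected, which
    looks for the element whose text equals a sample value, and
    find_selected, path_to and follow are hypothetical helper names. A single
    recorded path is brittle (it breaks if a similar page adds or drops a
    sibling along the way), so learning from several selections, as
    described in the question, would presumably be needed in practice.

```python
from html.parser import HTMLParser

# Elements with no closing tag; they are never pushed onto the open-element stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class Node:
    def __init__(self, tag, attrs=(), parent=None):
        self.tag, self.attrs, self.parent = tag, dict(attrs), parent
        self.content = []  # mix of str (text) and Node (child elements), in order

    @property
    def children(self):
        return [c for c in self.content if isinstance(c, Node)]

    def as_text(self):
        return "".join(c if isinstance(c, str) else c.as_text() for c in self.content)


class TreeBuilder(HTMLParser):
    """Minimal tree builder: no implied end tags or error recovery."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, parent=self.stack[-1])
        self.stack[-1].content.append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        for i in range(len(self.stack) - 1, 0, -1):  # ignore stray end tags
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].content.append(data)


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def iter_nodes(node):
    """Yield node and all its descendants, depth-first in document order."""
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def find_selected(root, selected_text):
    """Stand-in for the UI click: the innermost element whose text is selected_text."""
    for node in iter_nodes(root):
        if node.as_text().strip() == selected_text:
            while True:  # descend while a child carries exactly the same text
                inner = [c for c in node.children if c.as_text().strip() == selected_text]
                if not inner:
                    return node
                node = inner[0]
    raise ValueError(f"no element with text {selected_text!r}")


def path_to(node):
    """Route from the root as (tag, n) steps: 'the n-th child element with this tag'."""
    steps = []
    while node.parent is not None:
        same_tag = [c for c in node.parent.children if c.tag == node.tag]
        steps.append((node.tag, same_tag.index(node)))
        node = node.parent
    return list(reversed(steps))


def follow(root, steps):
    """Replay a learned path on another page; None if the page's shape differs."""
    node = root
    for tag, n in steps:
        same_tag = [c for c in node.children if c.tag == tag]
        if n >= len(same_tag):
            return None
        node = same_tag[n]
    return node


example = """<html><body><h1>Widget</h1>
<table><tr><th>Price</th><td>4.99</td></tr>
<tr><th>Stock</th><td>12</td></tr></table></body></html>"""

similar = """<html><body><h1>Gadget</h1>
<table><tr><th>Price</th><td>12.50</td></tr>
<tr><th>Stock</th><td>3</td></tr></table></body></html>"""

# The "user" selects the price cell on the example page...
steps = path_to(find_selected(parse(example), "4.99"))
print(steps)  # [('html', 0), ('body', 0), ('table', 0), ('tr', 0), ('td', 0)]
# ...and the learned path pulls the corresponding cell out of a similar page.
price = follow(parse(similar), steps).as_text()
print(price)  # 12.50
```

    Turning the recorded steps into HTML::TreeBuilder calls instead of
    replaying them directly would be the code-generation variant described
    above.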

    Sorry to those that are put off by the talk of Mac stuff... it is what
    I know and where I play.

    --
    David Steinbrunner

