
Hpricot Html Parsing : RUBY



  #1  
Old 09-14-2007, 03:34 AM
Default Hpricot Html Parsing

Hi,
I'm getting funky characters when parsing HTML with Hpricot.
How can I remove them?

Does anyone have a fix or workaround for this?

thanks in advance,
Suja
--
Posted via http://www.ruby-forum.com/.

  #2  
Old 09-14-2007, 05:53 AM
Default Re: Hpricot Html Parsing

Hi Suja,

Two suggestions:
- Check the encoding used by the page you're hashpricoting (doh -
think I just invented a verb, or what).
- Run puts $KCODE to see whether you're running in Unicode mode. If you are
hashpricoting a page encoded in UTF-8 but $KCODE is set to none (or if
the page is in Latin-1 but $KCODE is set to U), then you'll have to
convert the encoding, using iconv for instance.
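The conversion step could look roughly like the sketch below. Note the assumptions: it uses String#force_encoding and String#encode, which exist in Ruby 1.9 and later, instead of $KCODE and iconv (both are Ruby 1.8-era and no longer part of current Ruby). The helper name to_utf8, the sample bytes, and the guess that the page is Windows-1252 are all made up for illustration; check the page's actual charset before converting.

```ruby
# Hypothetical helper: label raw bytes with the encoding they were really
# written in, then transcode them to UTF-8.
def to_utf8(raw_bytes, source_encoding)
  raw_bytes.dup.force_encoding(source_encoding).encode("UTF-8")
end

# Made-up sample: byte 0x96 is an en dash in Windows-1252 (an assumption
# about the page, not something confirmed in this thread).
raw = "CHAMPAIGN \x96 Effective today".b

# Misreading the bytes as UTF-8 leaves an invalid sequence, which is what
# typically gets displayed as a replacement character.
puts raw.dup.force_encoding("UTF-8").valid_encoding?

# Reading them with the right source encoding gives valid UTF-8.
fixed = to_utf8(raw, "Windows-1252")
puts fixed.valid_encoding?
puts fixed
```

If the source encoding is guessed wrong, the conversion can still succeed and silently produce the wrong characters, so it is worth comparing the result against the page in a browser.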

cheers

Thibaut

  #3  
Old 09-15-2007, 04:17 AM
Default Re: Hpricot Html Parsing

Suja JS wrote:
> Hi,
> I'm getting funky characters, when parsing html using Hpricot.
> How to remove this funky character?
>
> Anyone have a fix / workaround for this?
>
> thanks in advance,
> Suja


Could you describe these 'funky characters'?
--
Posted via http://www.ruby-forum.com/.

  #4  
Old 09-15-2007, 04:22 AM
Default Re: Hpricot Html Parsing

Lee Jarvis wrote:
> Suja JS wrote:
>> Hi,
>> I'm getting funky characters, when parsing html using Hpricot.
>> How to remove this funky character?
>>
>> Anyone have a fix / workaround for this?
>>
>> thanks in advance,
>> Suja

>
> Could you describe these 'funky characters'?


Like '�' in this text.
"By Mike Monson CHAMPAIGN � Effective today the city of Champaign is
closing three bridges and posting load limits on three others."
--
Posted via http://www.ruby-forum.com/.

  #5  
Old 09-15-2007, 05:14 AM
Default Re: Hpricot Html Parsing

> "By Mike Monson CHAMPAIGN ? Effective today the city of Champaign is
> closing three bridges and posting load limits on three others."


hint hint : http://www.news-gazette.com/news/loc...s_limits_loads

The minus character you see after CHAMPAIGN is not a regular "-".
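To find out which character it actually is, one option is to print the code points of everything outside ASCII, then swap typographic dashes for a plain hyphen if that is all you need. This is only a sketch: the sample string uses an em dash (U+2014) as an assumption, since the thread does not say which dash the page contains, and the helper names non_ascii_codepoints and plain_dashes are invented here. It also assumes the text has already been decoded to valid UTF-8.

```ruby
# Made-up sample; the em dash (U+2014) is an assumption for illustration.
text = "CHAMPAIGN \u2014 Effective today"

# Hypothetical helper: list the code points of all non-ASCII characters.
def non_ascii_codepoints(str)
  str.each_char.reject(&:ascii_only?).map { |ch| format("U+%04X", ch.ord) }
end

# Hypothetical helper: replace en and em dashes with a regular "-".
def plain_dashes(str)
  str.gsub(/[\u2013\u2014]/, "-")
end

puts non_ascii_codepoints(text).inspect
puts plain_dashes(text)
```

Replacing the dash loses a little typography, so converting the page to the right encoding first is usually the better fix; the substitution is a fallback for when plain ASCII output is required.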
