Hpricot Html Parsing : RUBY
#1
I'm getting funky characters, when parsing html using Hpricot.
How to remove this funky character?

Anyone have a fix / workaround for this?

thanks in advance,
Suja

-- Posted via http://www.ruby-forum.com/.
#2
Hi Suja,

two suggestions:
- check the encoding used by the page you're hashpricoting (doh - think I just invented a verb, or what).
- puts $KCODE to see if you're running in unicode or not.

If you are hashpricoting a page encoded in UTF-8, but KCODE is set to none (or if the page is in latin1, but KCODE is set to U), then you'll have to change the encoding using iconv for instance.

cheers

Thibaut
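A rough sketch of the conversion step described above. It assumes Ruby 1.9 or later, where `String#encode` does the job that iconv did on 1.8 (and where `$KCODE` no longer has any effect). The sample bytes, the choice of Windows-1252 as the source encoding, and the helper name `to_utf8` are all made up for illustration; the real page may use a different encoding.

```ruby
# Assumption: Ruby 1.9+; String#encode stands in for the iconv step.
# The sample bytes and the Windows-1252 guess are illustrative only.

# Bytes as they might arrive from a page served as Windows-1252:
# byte 0x97 is a dash-like character in that encoding, not valid UTF-8.
raw = "CHAMPAIGN \x97 Effective".dup.force_encoding(Encoding::BINARY)

# Hypothetical helper: label the bytes with their real source encoding,
# then transcode them to UTF-8.
def to_utf8(bytes, source_encoding)
  bytes.dup.force_encoding(source_encoding).encode(Encoding::UTF_8)
end

converted = to_utf8(raw, Encoding::Windows_1252)

puts converted.encoding         # encoding of the converted string
puts converted.valid_encoding?  # whether the result is well-formed
puts converted
```

If the guessed source encoding is wrong, the result will still be garbled, so it is worth checking the page's `Content-Type` header or `<meta>` tag before picking one.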
#3
Suja JS wrote:
> Hi,
> I'm getting funky characters, when parsing html using Hpricot.
> How to remove this funky character?
>
> Anyone have a fix / workaround for this?
>
> thanks in advance,
> Suja

Could you describe these 'funky characters'?

-- Posted via http://www.ruby-forum.com/.
#4
Lee Jarvis wrote:
> Suja JS wrote:
>> Hi,
>> I'm getting funky characters, when parsing html using Hpricot.
>> How to remove this funky character?
>>
>> Anyone have a fix / workaround for this?
>>
>> thanks in advance,
>> Suja
>
> Could you describe these 'funky characters'?

Like '�' in this text.

"By Mike Monson CHAMPAIGN � Effective today the city of Champaign is closing three bridges and posting load limits on three others."

-- Posted via http://www.ruby-forum.com/.
#5
> "By Mike Monson CHAMPAIGN ? Effective today the city of Champaign is
> closing three bridges and posting load limits on three others."

hint hint : http://www.news-gazette.com/news/loc...s_limits_loads

The minus character you see after CHAMPAIGN is not a regular "-".
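One way to confirm which character is hiding there is to list the non-ASCII characters with their code points, then swap the dash-like ones for a plain "-". This is only a sketch: it assumes Ruby 1.9 or later and text that is already valid UTF-8, and the sample string uses an em dash (U+2014) as a stand-in, since the thread does not say which character the page really contains.

```ruby
# Assumption: Ruby 1.9+ and valid UTF-8 input. The em dash in the sample
# is a guess at the "not a regular '-'" character, for illustration only.
text = "CHAMPAIGN \u2014 Effective today"

# Collect every character outside the ASCII range.
non_ascii = text.each_char.reject(&:ascii_only?)

# Print each one with its Unicode code point so it can be identified.
non_ascii.each do |ch|
  puts format("U+%04X %s", ch.ord, ch.inspect)
end

# Replace en dashes (U+2013) and em dashes (U+2014) with an ASCII hyphen.
plain = text.gsub(/[\u2013\u2014]/, "-")
puts plain
```

Replacing the character loses the typographic dash, so converting the page to the right encoding is usually the better fix; the substitution is only useful when plain ASCII output is required.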