Objectmix

Regex in Ruby question : RUBY

  #1  
Old 02-22-2008, 12:31 PM
Regex in Ruby question

Another newbie question:

I've just started learning about regex "officially" (reading the
O'Reilly book) and was trying out some of the examples (porting them to
Ruby from Perl). From what I can tell, Ruby's regex engine doesn't
support "lookbehind", and as I had only a smattering of knowledge about
regex before and no prior experience with lookaround, I was curious if
this was significant.

i.e. is there any problem that can't be solved without lookbehind?

(Note that I am not facing such a problem; I'm just curious if
lookbehind is just an optional feature that makes certain problems
easier, as opposed to being an essential thing.)

Thanks!
--
Posted via http://www.ruby-forum.com/.

  #2  
Old 02-22-2008, 01:00 PM
Re: Regex in Ruby question

> From what I can tell, Ruby's regex engine doesn't
> support "lookbehind"


Ruby 1.9 has look-behind:

(?<=subexp) look-behind
(?<!subexp) negative look-behind

With Ruby 1.8, you can install Oniguruma.
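
A minimal sketch of the two constructs, assuming Ruby 1.9 or later. The sample string and the variable names are made up for illustration:

```ruby
# Sample text invented for this illustration.
text = 'price: $42, qty: 7'

# Positive look-behind: digits that directly follow a "$".
after_dollar = text.scan(/(?<=\$)\d+/)

# Negative look-behind: whole numbers that do not directly follow a "$".
not_after_dollar = text.scan(/(?<!\$)\b\d+/)

p after_dollar      # ["42"]
p not_after_dollar  # ["7"]
```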

Regards,
Thomas.

  #3  
Old 02-22-2008, 02:09 PM
Re: Regex in Ruby question

J. Cooper wrote:
> i.e. is there any problem that can't be solved without lookbehind?


Yes, there are. For example: you want to match any occurrence of "bar" except
if it is preceded by "foo". I.e. you'd want to match "blabar" or "oofbar",
but not "foobar". You can't do that without negative lookbehind.
It might be interesting to note, though, that any such problem could also not
be solved by a regular grammar, so "regular" expressions that need lookbehind
aren't, as such, regular anymore.
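
Where look-behind is available (Ruby 1.9 or later), the example above might be written like this. It is only a small sketch, and the variable names are illustrative:

```ruby
# The three sample strings come from the example above.
samples = %w[blabar oofbar foobar]

# "bar", but only where the three characters before it are not "foo".
bar_not_after_foo = /(?<!foo)bar/

matching = samples.select { |s| s =~ bar_not_after_foo }
p matching  # ["blabar", "oofbar"]
```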

HTH,
Sebastian
--
Jabber: sepp2k@jabber.org
ICQ: 205544826

  #4  
Old 02-22-2008, 02:38 PM
Re: Regex in Ruby question

> Yes, there are. For example: you want to match any occurrence of "bar" except
> if it is preceded by "foo". I.e. you'd want to match "blabar" or "oofbar",
> but not "foobar".


I think it's important to state that the look-behind matches with zero
width, i.e. the text it checks isn't included in the match.
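
A short sketch of what zero width means in practice, assuming Ruby 1.9 or later (the variable name `m` is just for illustration):

```ruby
m = 'oofbar'.match(/(?<!foo)bar/)

p m[0]         # "bar"  (the look-behind adds nothing to the matched text)
p m.begin(0)   # 3      (the match starts at "b", not at the checked prefix)
p m.pre_match  # "oof"
```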

If it's okay to include the prefix in the match (e.g., in a gsub, the
prefix could then be referenced as a group), this could also be
achieved without lookbehind:

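require 'strscan'

# Column ruler for the string below (tens digit, then ones digit).
#                      0         1         2         3
#                      0123456789012345678901234567890123
s = StringScanner.new('blabar oofbar foobar ofobar offbar')
#                            ^      ^             ^      ^
until s.eos?
  # Stop when nothing further matches; otherwise the loop would never end.
  break unless s.scan_until(/([^o]|[^o]o|[^f]oo)(bar)/)
  p s.pos  # position just past the matched "bar"
end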

# =>
6
13
27
34

Position 20, the end of "foobar", is missing.

There are of course situations when this isn't possible.
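
For the gsub case mentioned above, a sketch might look like this. The replacement text "BAR" and the variable names are arbitrary choices for illustration, and the look-behind line assumes Ruby 1.9 or later:

```ruby
text = 'blabar oofbar foobar ofobar offbar'

# The prefix is captured as group 1 and written back via \1 in the replacement.
with_group = text.gsub(/([^o]|[^o]o|[^f]oo)bar/, '\1BAR')

# The same substitution with a negative look-behind, for comparison.
with_lookbehind = text.gsub(/(?<!foo)bar/, 'BAR')

p with_group                     # "blaBAR oofBAR foobar ofoBAR offBAR"
p with_group == with_lookbehind  # true
```

For this particular string both approaches give the same result; that need not hold for every input.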

Regards,
Thomas.

  #5  
Old 02-22-2008, 03:27 PM
Re: Regex in Ruby question

> However, with:
> m = s.scan_until(/(([^o]|^)|([^o]|^)o|([^f]|^)oo)(bar)/)


Oh well. It's probably a good thing we have look-behind now. :-)
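
One input where consuming the prefix and a zero-width look-behind give different results is two adjacent occurrences. The sketch below uses the pattern quoted above; the input "barbar" and the replacement "BAZ" are made up for illustration, and the look-behind line assumes Ruby 1.9 or later:

```ruby
text = 'barbar'

# The anchored pattern quoted above: the prefix (possibly empty) is part of the match.
consuming = /(([^o]|^)|([^o]|^)o|([^f]|^)oo)(bar)/
with_group = text.gsub(consuming) { "#{$1}BAZ" }

# Zero-width check: nothing before "bar" is consumed.
with_lookbehind = text.gsub(/(?<!foo)bar/, 'BAZ')

p with_group       # "BAZbar"
p with_lookbehind  # "BAZBAZ"
```

The second "bar" is missed by the first pattern because the character it would need as a prefix (the "r") was already consumed by the previous match.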

Regards,
Thomas.
All times are GMT -5.