How do I split a string that contains both delimiters and escaped delimiters?

Time: 2022-11-16 22:07:37

My string delimiter is ;. The delimiter is escaped within the string as \;. For example:

irb(main):018:0> s = "a;b;;d\\;e"
=> "a;b;;d\\;e"
irb(main):019:0> s.split(';')
=> ["a", "b", "", "d\\", "e"]

Could someone suggest a regex so that split returns ["a", "b", "", "d\\;e"]? I'm using Ruby 1.8.7.

2 answers

#1

Ruby 1.8.7 doesn't support negative lookbehind unless Oniguruma is available (it may be compiled in).

1.9.3; yay:

> s = "a;b;c\\;d"
=> "a;b;c\\;d"
> s.split /(?<!\\);/
=> ["a", "b", "c\\;d"]
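Applied to the string from the question, the same lookbehind split should give the requested result, including the empty field between the two consecutive semicolons. A minimal sketch, assuming Ruby 1.9 or later; split_unescaped is a hypothetical helper name, and passing -1 as the limit is an added choice so that trailing empty fields are not dropped:

```ruby
# Requires Ruby 1.9+, whose regex engine supports negative lookbehind.
# split_unescaped is a hypothetical helper name used for illustration.
def split_unescaped(str)
  # Split on ";" only when it is not immediately preceded by a backslash.
  # The -1 limit keeps trailing empty fields, which String#split drops by default.
  str.split(/(?<!\\);/, -1)
end

p split_unescaped("a;b;;d\\;e")   # => ["a", "b", "", "d\\;e"]
p split_unescaped("a;b;")         # => ["a", "b", ""]
```

Without the -1, "a;b;" would come back as ["a", "b"]; whether that matters depends on whether a trailing delimiter is meaningful in your data.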

1.8.7 with Oniguruma doesn't offer a trivial split, but you can get the match offsets and pull the substrings apart that way. I assume there's a better way to do this that I'm not remembering:

> require 'oniguruma'
> re = Oniguruma::ORegexp.new "(?<!\\\\);"
> s = "hello;there\\;nope;yestho"
> re.match_all s
=> [#<MatchData ";">, #<MatchData ";">]
> mds = re.match_all s
=> [#<MatchData ";">, #<MatchData ";">]
> mds.collect {|md| md.offset}
=> [[5, 6], [17, 18]]
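The transcript stops at the offsets. A minimal sketch of the remaining step, written in plain Ruby so it does not depend on the Oniguruma gem: given the [start, end] pairs from md.offset, slice out the text between delimiters. split_at_offsets is a hypothetical helper, and the offsets below are copied from the transcript rather than computed:

```ruby
# Hypothetical helper: cut a string apart at known delimiter positions.
# offsets is a list of [start, end] pairs, like the md.offset values above;
# each pair marks one delimiter, whose characters are dropped.
def split_at_offsets(str, offsets)
  pieces = []
  pos = 0
  offsets.each do |start, stop|
    pieces << str[pos...start]   # text between the previous delimiter and this one
    pos = stop                   # resume right after this delimiter
  end
  pieces << str[pos..-1]         # whatever follows the last delimiter
  pieces
end

s = "hello;there\\;nope;yestho"
# Offsets copied from the Oniguruma transcript above.
p split_at_offsets(s, [[5, 6], [17, 18]])   # => ["hello", "there\\;nope", "yestho"]
```

Unlike String#split with no limit, this keeps a trailing empty field when the string ends with a delimiter.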

Other options include:

  • Splitting on ; and post-processing the results, looking for pieces that end in a backslash, or

  • Doing a char-by-char loop that maintains some simple state, and splitting manually.
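The two options listed above can be sketched as follows. Both helper names are hypothetical. split_postprocess treats every ; that follows a backslash as escaped. split_manual tracks escape state instead, assuming (the question does not say) that a backslash also escapes itself, so in a\\;b (two backslashes, then a semicolon) it splits at the ;.

```ruby
# Option 1 (hypothetical helper): split on every ";" and then glue back
# any piece whose predecessor ends in a backslash, since that ";" was escaped.
def split_postprocess(str)
  result = []
  # -1 keeps trailing empty fields instead of dropping them.
  str.split(';', -1).each do |piece|
    if !result.empty? && result.last.end_with?('\\')
      result[-1] = result.last + ';' + piece
    else
      result << piece
    end
  end
  result
end

# Option 2 (hypothetical helper): walk the string one character at a time,
# remembering whether the previous character was an unconsumed backslash.
def split_manual(str)
  fields = []
  current = ''.dup        # dup so this also works with frozen string literals
  escaped = false
  str.each_char do |ch|
    if escaped
      current << ch       # escaped char is kept literally, even ";"
      escaped = false
    elsif ch == '\\'
      current << ch       # keep the backslash, as the question's output does
      escaped = true
    elsif ch == ';'
      fields << current   # an unescaped delimiter ends the field
      current = ''.dup
    else
      current << ch
    end
  end
  fields << current
  fields
end

s = "a;b;;d\\;e"
p split_postprocess(s)   # => ["a", "b", "", "d\\;e"]
p split_manual(s)        # => ["a", "b", "", "d\\;e"]
```

The post-processing version is shorter; the loop is easier to extend if more escape sequences are ever needed.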

#2

As @dave-newton answered, you could use negative lookbehind, but that isn't supported in 1.8. An alternative that works in both 1.8 and 1.9 is to use String#scan instead of split, with a pattern that matches runs of characters, each of which is either neither a semicolon nor a backslash, or any character preceded by a backslash:

$ irb
>> RUBY_VERSION
=> "1.8.7"
>> s = "a;b;c\\;d"
=> "a;b;c\\;d"
>> s.scan /(?:[^;\\]|\\.)+/
=> ["a", "b", "c\\;d"]
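One thing to watch with the scan pattern above: because the group is repeated with +, scan only returns non-empty matches, so on the question's string "a;b;;d\\;e" it returns ["a", "b", "d\\;e"] and loses the empty field the question asks for. The sketch below shows that, plus a variation that is not part of the original answer: prefix the string with a ; and capture the possibly empty field after each delimiter. split_with_scan is a hypothetical helper name; it uses no lookbehind, so it should also run on 1.8.7.

```ruby
# Hypothetical helper; uses no lookbehind, so it should also run on Ruby 1.8.7.
def split_with_scan(str)
  # Prefix a delimiter so every field, even an empty one, follows a ";",
  # then capture the (possibly empty) run of plain or backslash-escaped chars.
  (';' + str).scan(/;((?:[^;\\]|\\.)*)/).flatten
end

s = "a;b;;d\\;e"
p s.scan(/(?:[^;\\]|\\.)+/)   # => ["a", "b", "d\\;e"]  (empty field lost)
p split_with_scan(s)          # => ["a", "b", "", "d\\;e"]
```

Because the pattern has one capture group, scan returns one-element arrays, which flatten turns back into plain strings.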
