How do I use a regex to split a string around the +, -, *, / symbols?

I need a Ruby regex that splits a string before and after the + - * / symbols in my program.

Examples:

I need to turn "1+12" into [1.0, "+", 12.0]

and "6/0.25" into [6.0, "/", 0.25]

There could be cases like "3/0.125", but they are highly unlikely. If the first two examples above work, that should be good enough.

The Ruby docs show: "hi mom".split(%r{\s*}) #=> ["h", "i", "m", "o", "m"]

I looked up a cheat sheet to try to understand %r{\s*}. I know that \s means whitespace in a regex, and I think the stuff inside %r{}, such as \s, is skipped.
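
An editor's sketch of what the pieces in the question do (not taken from any of the answers): %r{...} is simply Ruby's alternative syntax for a regex literal, convenient when the pattern contains "/", so nothing inside it is skipped and %r{\s*} is the same regex as /\s*/. When the pattern passed to String#split contains a capture group, the captured text is kept in the result, which gives the "split before and after" behavior. OPERATOR_SPLIT is a name chosen here for illustration, and this sketch does not handle negative numbers.

```ruby
# %r{...} is an alternative regex literal syntax; it builds the same
# Regexp as the slash form.
p(%r{\s*} == /\s*/)              # => true

# \s* can match the empty string, so split cuts between every character,
# and it also consumes the space between "hi" and "mom".
p "hi mom".split(%r{\s*})        # => ["h", "i", "m", "o", "m"]

# A capture group keeps the matched operator in the result, so the string
# is split both before and after each + - * / symbol.
OPERATOR_SPLIT = %r{([+\-*/])}
p "1+12".split(OPERATOR_SPLIT)   # => ["1", "+", "12"]
p "6/0.25".split(OPERATOR_SPLIT) # => ["6", "/", "0.25"]
```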

5 solutions

#1

I think this could be useful:

"1.2+3.453".split('+').flat_map{|elem| [elem, "+"]}[0...-1]
# => ["1.2", "+", "3.453"]
"1.2+3.453".split('+').flat_map{|elem| [elem.to_f, "+"]}[0...-1]
# => [1.2, "+", 3.453]

Obviously this works only for +, but you can change the split character.
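
The split character can also be passed in as a parameter. A minimal sketch of that idea; split_on is a hypothetical name, and it still assumes a single known operator and no negative numbers:

```ruby
# Hypothetical helper: split on one known operator, re-insert it after every
# Float operand, then drop the trailing copy with [0...-1].
def split_on(expr, op)
  expr.split(op).flat_map { |elem| [elem.to_f, op] }[0...-1]
end

p split_on("1+12", "+")    # => [1.0, "+", 12.0]
p split_on("6/0.25", "/")  # => [6.0, "/", 0.25]
```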

EDIT:

This version works for every operator:

"1.2+3.453".split(%r{(\+|\-|\/|\*)}).map do |x|
    unless x =~ /(\+|\-|\/|\*)/ then x.to_f else x end
end
# => [1.2, "+", 3.453]

#2

'1.0+23.7'.scan(/(((\d\.?)+)|[\+\-\*\/])/)
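
Because this pattern contains three capture groups, scan returns one array of group captures per match rather than a flat list of tokens. A sketch of one way to get from that result to the format asked for in the question (an editor's addition, not part of this answer); it keeps the original regex and assumes non-negative numbers:

```ruby
# The pattern has three capture groups, so scan yields one
# [group1, group2, group3] array per match rather than plain strings.
nested = '1.0+23.7'.scan(/(((\d\.?)+)|[\+\-\*\/])/)
p nested  # => [["1.0", "1.0", "0"], ["+", nil, nil], ["23.7", "23.7", "7"]]

# Group 1 wraps the whole alternation, so it holds each full token.
tokens = nested.map(&:first)
p tokens  # => ["1.0", "+", "23.7"]

# Convert the operands (tokens that start with a digit) to Float.
result = tokens.map { |t| t.match?(/\A\d/) ? t.to_f : t }
p result  # => [1.0, "+", 23.7]
```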

#3

Instead of splitting, match with capture groups to parse your inputs:

(?<operand1>(?:\d+(?:\.\d+)?)|(?:\.\d+))\s*(?<operator>[+\/*-])\s*(?<operand2>(?:\d+(?:\.\d+)?)|(?:\.\d+))

Explanation:

I've used named groups, (?<groupName>regex), but they aren't necessary and could just be plain ()s; either way, the sub-captures are still available as 1, 2, and 3. Also note the (?:regex) constructs, which are for grouping only: they do not "remember" anything and won't mess up your captures.
(?:\d+(?:\.\d+)?)|(?:\.\d+) first number: either leading digit(s) optionally followed by a decimal point and digit(s), OR a leading decimal point followed by digit(s)
\s* zero or more whitespace characters in between
[+\/*-] operator: a character class matching a plus, division, multiplication, or minus sign
\s* zero or more whitespace characters in between
(?:\d+(?:\.\d+)?)|(?:\.\d+) second number: same pattern as the first number
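
A sketch of applying this regex with String#match and reading the named captures. parse_expr and EXPR are hypothetical names, and returning nil on a non-match is a choice made here, not part of the answer. Note that the pattern is not anchored, so an input like "8+6+10" matches only its leading "8+6".

```ruby
# The regex from this answer, with named groups for both operands and the
# operator; \s* allows optional whitespace around the operator.
EXPR = /(?<operand1>(?:\d+(?:\.\d+)?)|(?:\.\d+))\s*(?<operator>[+\/*-])\s*(?<operand2>(?:\d+(?:\.\d+)?)|(?:\.\d+))/

# Hypothetical wrapper: returns nil when the input does not match.
def parse_expr(str)
  return nil unless (m = EXPR.match(str))
  [m[:operand1].to_f, m[:operator], m[:operand2].to_f]
end

p parse_expr("1+12")     # => [1.0, "+", 12.0]
p parse_expr("6/0.25")   # => [6.0, "/", 0.25]
p parse_expr("3 / .125") # => [3.0, "/", 0.125]
```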

#4

I arrived a little late to this party, and found that many of the good answers had already been taken. So, I set out to expand on the theme slightly and compare the performance and robustness of each of the solutions. It seemed like a fun way to entertain myself this morning.

In addition to the 3 examples given in the question, I added test cases for each of the four operators, as well as for some new edge cases. These edge cases included handling of negative numbers and arbitrary spaces between operands, as well as how each of the algorithms handled expected failures.

The answers revolved around 3 methods: split, scan, and match. I also wrote new solutions using each of these 3 methods, specifically handling the additional edge cases I added here. I ran all of the algorithms against this full set of test cases and ended up with a table of pass/fail results.
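
A pass/fail grid like the one below can be produced by running every parser over the same case list. The following is a hypothetical harness, not the code from the gist; it assumes, as the last rows of the table suggest, that "Pass" on a malformed input means the parser rejected it (returned []). CASES, grade, and match_parser are illustrative names, and match_parser reuses the anchored regex from match_gaskill shown later in this answer.

```ruby
# Hypothetical pass/fail harness. Each input maps to the expected result;
# [] means the input is malformed and the parser should reject it.
CASES = {
  "1+12"    => [1.0, "+", 12.0],
  "20--4"   => [20.0, "-", -4.0],
  "34 - 10" => [34.0, "-", 10.0],
  "8+6+10"  => [],
  "15*x"    => []
}.freeze

# Grades any callable parser: "Pass" when its output equals the
# expectation, "--" otherwise (mirroring the table's notation).
def grade(parser)
  CASES.map { |input, expected| [input, parser.call(input) == expected ? "Pass" : "--"] }.to_h
end

# Example parser built on the anchored regex used by match_gaskill.
match_parser = lambda do |s|
  m = s.match(/^\s*(-?\d+(?:\.\d+)?)\s*([+\/*-])\s*(-?\d+(?:\.\d+)?)\s*$/)
  m ? [m[1].to_f, m[2], m[3].to_f] : []
end

grade(match_parser).each { |input, result| puts format("%-10s %s", input.inspect, result) }
```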

Next, I wrote a benchmark that generated 1,000,000 test strings that every solution could parse correctly, and ran each solution against that sample set.
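
A rough sketch of that kind of benchmark using Ruby's standard Benchmark module. This is not the gist's code: random_formula and split_plus are hypothetical names, the sample is shrunk to 10,000 strings, and the generator only produces non-negative operands without spaces so that every solution in the table could parse them.

```ruby
require 'benchmark'

# Hypothetical generator: "<a><op><b>" strings with non-negative operands
# and no spaces, i.e. inputs every solution in the table can parse.
def random_formula(rng)
  a = rng.rand(1..999)
  b = (rng.rand(1..999) / 8.0).round(3)
  "#{a}#{%w[+ - * /].sample(random: rng)}#{b}"
end

# One solution under test: the Swoveland+ expression from this answer.
def split_plus(formula)
  f, op, l = formula.split(/\b\s*([+\/*-])\s*/)
  [f.to_f, op, l.to_f]
end

rng = Random.new(42)                                 # fixed seed: repeatable samples
samples = Array.new(10_000) { random_formula(rng) }  # the answer used 1,000,000

Benchmark.bm(20) do |x|
  x.report("split (Swoveland+):") { samples.each { |s| split_plus(s) } }
end
```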

On the first benchmark run, Cary Swoveland's solution performed far better than the others, but it didn't pass all of the added test cases. I made very minor changes to his solution so that it supports both negative numbers and arbitrary spaces, and included it in the results as Swoveland+.

The final results printed to the console are below:

| Test Case |  match  |  match  |  scan   |  scan   |partition|  split  |  split  |   split  |  split  |
|           | Gaskill | sweaver | Gaskill | techbio |Swoveland| Gaskill |Swoveland|Swoveland+|  Lilue  |
|------------------------------------------------------------------------------------------------------|
| "1+12"    |  Pass   |  Pass   |  Pass   |  Pass   |  Pass   |  Pass   |  Pass   |   Pass   |  Pass   |
| "6/0.25"  |  Pass   |  Pass   |  Pass   |  Pass   |  Pass   |  Pass   |  Pass   |   Pass   |  Pass   |
| "3/0.125" |  Pass   |  Pass   |  Pass   |  Pass   |  Pass   |  Pass   |  Pass   |   Pass   |  Pass   |
| "30-6"    |  Pass   |  Pass   |  Pass   |  Pass   |  Pass   |  Pass   |  Pass   |   Pass   |  Pass   |
| "3*8"     |  Pass   |  Pass   |  Pass   |  Pass   |  Pass   |  Pass   |  Pass   |   Pass   |  Pass   |
| "20--4"   |  Pass   |   --    |  Pass   |   --    |  Pass   |  Pass   |   --    |   Pass   |  Pass   |
| "33+-9"   |  Pass   |   --    |  Pass   |   --    |  Pass   |  Pass   |   --    |   Pass   |  Pass   |
| "-12*-2"  |  Pass   |   --    |  Pass   |   --    |  Pass   |  Pass   |   --    |   Pass   |  Pass   |
| "-72/-3"  |  Pass   |   --    |  Pass   |   --    |  Pass   |  Pass   |   --    |   Pass   |  Pass   |
| "34 - 10" |  Pass   |  Pass   |  Pass   |  Pass   |  Pass   |  Pass   |  Pass   |   Pass   |  Pass   |
| " 15+ 9"  |  Pass   |  Pass   |  Pass   |  Pass   |  Pass   |  Pass   |  Pass   |   Pass   |  Pass   |
| "4*6 "    |  Pass   |  Pass   |  Pass   |  Pass   |  Pass   |  Pass   |  Pass   |   Pass   |  Pass   |
| "b+0.5"   |  Pass   |  Pass   |  Pass   |   --    |   --    |   --    |   --    |    --    |   --    |
| "8---0.5" |  Pass   |  Pass   |  Pass   |   --    |   --    |   --    |   --    |    --    |   --    |
| "8+6+10"  |  Pass   |   --    |  Pass   |   --    |   --    |   --    |   --    |    --    |   --    |
| "15*x"    |  Pass   |  Pass   |  Pass   |   --    |   --    |   --    |   --    |    --    |   --    |
| "1.A^ff"  |  Pass   |  Pass   |  Pass   |   --    |   --    |   --    |   --    |    --    |   --    |


ruby 2.2.5p319 (2016-04-26 revision 54774) [x86_64-darwin14]
============================================================
                                user     system      total        real
match (Gaskill):            4.770000   0.090000   4.860000 (  5.214996)
match (sweaver2112):        4.640000   0.040000   4.680000 (  4.911849)
scan (Gaskill):             7.360000   0.080000   7.440000 (  7.719646)
scan (techbio):            12.930000   0.140000  13.070000 ( 13.791613)
partition (Swoveland):      5.390000   0.050000   5.440000 (  5.648762)
split (Gaskill):            5.150000   0.100000   5.250000 (  5.455094)
split (Swoveland):          3.860000   0.060000   3.920000 (  4.040774)
split (Swoveland+):         4.240000   0.040000   4.280000 (  4.537570)
split (Lilue):              7.540000   0.090000   7.630000 (  8.022252)

In order to keep this post from being far too long, I've included the complete code for this test at https://gist.github.com/mgaskill/96f04e7e1f72a86446f4939ac690759a

The robustness test cases can be found in the first table above. The Swoveland+ solution is:

f,op,l = formula.split(/\b\s*([+\/*-])\s*/)
return [f.to_f, op, l.to_f]

The \b metacharacter before the operator ensures that the preceding character is a word character, so a minus sign at the start of either operand is never taken as the operator; this is what adds support for negative numbers. The \s* expressions allow arbitrary spaces between the operands and the operator. These changes cost less than 10% in performance for the additional robustness.
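
The Swoveland+ expression above is a fragment; wrapped in a method it can be tried directly. The method name split_swoveland_plus is hypothetical. The last example shows the limitation discussed further down: a malformed input is not rejected, it just yields the first operation.

```ruby
# The Swoveland+ expression wrapped in a method (the name is hypothetical).
def split_swoveland_plus(formula)
  # \b: the operator must follow a word character (a digit here), so a
  # leading "-" on either operand is not taken as the operator.
  # \s*: optional spaces on both sides of the operator are consumed.
  f, op, l = formula.split(/\b\s*([+\/*-])\s*/)
  [f.to_f, op, l.to_f]   # String#to_f ignores leading whitespace
end

p split_swoveland_plus("-12*-2")   # => [-12.0, "*", -2.0]
p split_swoveland_plus("34 - 10")  # => [34.0, "-", 10.0]
p split_swoveland_plus(" 15+ 9")   # => [15.0, "+", 9.0]
p split_swoveland_plus("8+6+10")   # => [8.0, "+", 6.0]  (not rejected)
```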

The solutions that I provided are here:

def match_gaskill(formula)
  return [] unless (match = formula.match(/^\s*(-?\d+(?:\.\d+)?)\s*([+\/*-])\s*(-?\d+(?:\.\d+)?)\s*$/))
  return [match[1].to_f, match[2], match[3].to_f]
end

def scan_gaskill(formula)
  return [] unless (match = formula.scan(/^\s*(-?\d+(?:\.\d+)?)\s*([+*\/-])\s*(-?\d+(?:\.\d+)?)\s*$/))[0]
  return [match[0][0].to_f, match[0][1], match[0][2].to_f]
end

def split_gaskill(formula)
  match = formula.split(/(-?\d+(?:\.\d+)?)\s*([+\/*-])\s*(-?\d+(?:\.\d+)?)/)
  return [match[1].to_f, match[2], match[3].to_f]
end

The match and scan solutions are very similar, yet they perform quite differently, which is interesting because they use essentially the same regex (only the order inside the operator character class differs). The split solution is slightly simpler: it splits on the entire expression, capturing each operand and the operator separately.
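
The two methods return differently shaped results for the same pattern: match stops at the first match and returns a single MatchData (or nil), while scan keeps searching the rest of the string and returns an array of capture arrays. That extra work is one plausible contributor to the timing gap, though the benchmark above did not isolate it. FORMULA is a name used only for this sketch.

```ruby
# The anchored pattern shared by match_gaskill and scan_gaskill.
FORMULA = /^\s*(-?\d+(?:\.\d+)?)\s*([+\/*-])\s*(-?\d+(?:\.\d+)?)\s*$/

m = "6/0.25".match(FORMULA)  # a single MatchData object, or nil
s = "6/0.25".scan(FORMULA)   # an Array holding one capture array per match

p m.captures                 # => ["6", "/", "0.25"]
p s                          # => [["6", "/", "0.25"]]

# On malformed input, match gives nil while scan gives an empty Array,
# which is why scan_gaskill tests the first element with [0].
p "8+6+10".match(FORMULA)    # => nil
p "8+6+10".scan(FORMULA)     # => []
```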

Note that none of the split solutions could properly identify failures. Adding that support requires additional parsing of the operands, which significantly increases the overhead; such versions typically ran about 3 times slower.
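
One way to add failure detection to a split-based parser is to re-check the pieces it returns. A sketch under that assumption (checked_split and NUMBER are hypothetical names, not from any answer); the extra anchored regex checks are exactly the kind of additional operand parsing described above, so expect it to run slower than the plain split.

```ruby
# An operand: optional whitespace, optional minus, digits, optional decimals.
NUMBER = /\A\s*-?\d+(?:\.\d+)?\s*\z/

# Hypothetical split-based parser that returns [] for malformed input.
def checked_split(formula)
  parts = formula.split(/\b\s*([+\/*-])\s*/)
  # Exactly one operator produces exactly three pieces.
  return [] unless parts.size == 3
  f, op, l = parts
  # Re-validate both operands, which split alone never does.
  return [] unless f.match?(NUMBER) && l.match?(NUMBER)
  [f.to_f, op, l.to_f]
end

p checked_split("-72/-3")  # => [-72.0, "/", -3.0]
p checked_split("8+6+10")  # => []
p checked_split("15*x")    # => []
```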

Taking performance and robustness together, match is the clear winner. If performance matters but robustness doesn't, use split. Scan, on the other hand, was fully robust but more than 50% slower than the equivalent match solution.

Also note that extracting the results into the result array efficiently matters as much to performance as the choice of algorithm. Capturing the results into multiple variables (as in Swoveland's solution) dramatically outperformed the map-based solutions. Early testing showed that map-based extraction more than doubled the runtimes of even the fastest solutions, hence the exceptionally high runtime numbers for Lilue.
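
The two extraction styles side by side, as a sketch with hypothetical method names; both assume well-formed input, since match returns nil otherwise. They produce identical arrays, but the map version runs a block and a regex test on every element to decide whether it is an operator, work that the multiple-assignment version avoids. The operator test is anchored so that a negative operand such as "-2" is not mistaken for the minus operator.

```ruby
# The anchored pattern from match_gaskill.
PATTERN = /^\s*(-?\d+(?:\.\d+)?)\s*([+\/*-])\s*(-?\d+(?:\.\d+)?)\s*$/

# Multiple assignment: destructure the three captures once and convert
# the two operands explicitly.
def extract_with_assignment(formula)
  f, op, l = formula.match(PATTERN).captures
  [f.to_f, op, l.to_f]
end

# map-based extraction: a block call plus a regex test per element.
def extract_with_map(formula)
  formula.match(PATTERN).captures.map { |x| x.match?(%r{\A[+\-*/]\z}) ? x : x.to_f }
end

p extract_with_assignment("-12*-2")  # => [-12.0, "*", -2.0]
p extract_with_map("-12*-2")         # => [-12.0, "*", -2.0]
```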

#5

R = /
    (?<=\d) # match a digit in a positive lookbehind
    [^\d\.] # match any character other than a digit or period
    /x      # free-spacing regex definition mode

def split_it(str)
  f,op,l = str.delete(' ').partition(R)
  [convert(f), op, convert(l)]
end

def convert(str)
  (str =~ /\./) ? str.to_f : str.to_i
end

split_it "1+12"
  #=> [1, "+", 12] 
split_it "3/ 5.2"
  #=> [3, "/", 5.2] 
split_it "-4.1 * 6"
  #=> [-4.1, "*", 6] 
split_it "-8/-2"
  #=> [-8, "/", -2]

The regex can of course be written in the conventional way:

R = /(?<=\d)[^\d\.]/
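
A quick check of how String#partition behaves with this regex (an illustration added here, not part of the answer; R_X is just a name for the free-spacing form). partition splits at the first match only, and when nothing matches it returns the whole string followed by two empty strings, which is why malformed input is not reported as a failure.

```ruby
R = /(?<=\d)[^\d\.]/

# The free-spacing form from the answer behaves the same way.
R_X = /
    (?<=\d) # match a digit in a positive lookbehind
    [^\d\.] # match any character other than a digit or period
    /x

# partition returns [before, match, after] for the first match only.
p "-8/-2".partition(R)    # => ["-8", "/", "-2"]
p "-8/-2".partition(R_X)  # => ["-8", "/", "-2"]

# No match: the whole string plus two empty strings.
p "b+0.5".partition(R)    # => ["b+0.5", "", ""]
```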
