Context:
I entered the expression 3.24 * 10^10 + 1 into a calculator that I made. My calculator solves this as follows: it first looks for the pattern number_a^number_b, parses the two numbers into doubles using the Double.parseDouble() method, then computes Math.pow(number_a, number_b) and replaces the matched text with the result.
The calculator then similarly looks for the pattern number_a * number_b and evaluates it. So far, the expression has become 3.24E10 + 1. Now comes the tricky part. When I programmed this calculator, I assumed that it should next find the pattern number_a + number_b and evaluate it. It does exactly that, and returns a result that is unexpected but understandable: 3.24E11.0. (The number pattern only accepts digits and dots, so the + step matches just 10 + 1 inside the exponent and splices 11.0 back in.)
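A minimal reproduction of this, assuming the + step uses a pattern of the same shape as the multiplication regex below and that spaces have already been removed from the expression (the class and method names are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical reproduction of the '+' step, modelled on the question's regexes.
class PlusStepRepro {
    // An operand is any run of digits and dots, so the 'E' in 3.24E10 ends it.
    static final Pattern ADD = Pattern.compile("([\\d\\.]+)\\+([\\d\\.]+)");

    static String applyAddition(String input) {
        StringBuilder expression = new StringBuilder(input);
        Matcher m = ADD.matcher(expression.toString());
        while (m.find()) {
            double sum = Double.parseDouble(m.group(1)) + Double.parseDouble(m.group(2));
            expression.replace(m.start(), m.end(), Double.toString(sum));
            m.reset(expression);
        }
        return expression.toString();
    }

    public static void main(String[] args) {
        // Only "10+1" can match, so the exponent becomes 11.0.
        System.out.println(applyAddition("3.24E10+1")); // prints 3.24E11.0
    }
}
```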
I am looking for a workaround that makes my calculator smart enough to handle such expressions.
Important information: an example of the regexes I use (this one for multiplication) is ([\\d\\.]+)\\*([\\d\\.]+)
Code example:
// 'expression' is a StringBuilder; this is only a (modified) snippet of the actual code.
// Requires java.util.regex.Matcher and java.util.regex.Pattern.
Matcher m = Pattern.compile("([\\d\\.]+)\\^([\\d\\.]+)")
        .matcher(expression.toString());
while (m.find()) {
    double base = Double.parseDouble(m.group(1));
    double exponent = Double.parseDouble(m.group(2));
    double result = Math.pow(base, exponent);
    // Overwrite the matched "a^b" text with its value...
    expression.replace(m.start(), m.end(), Double.toString(result));
    // ...then restart matching on the updated expression.
    m.reset(expression);
}
PS: Based on how I presented the question, many people seem to think that my calculator is a failed attempt because regex won't take me very far. Of course, I agree that this is true and that far better algorithms may exist. I just want to make clear that:
1) Regex is only used to evaluate expressions in flat form. I don't use regex for everything: nested brackets are solved using recursion. Regex only comes into play at the last step, when all the processing work has been done and only simple calculation remains.
2) My calculator works fine. It can and does solve nested expressions gracefully. Proof: 2^3*2/4+1 --> 5.0, sin(cos(1.57) + tan(cos(1.57)) + 1.57) --> 0.9999996829318346, ((3(2log(10))+1)+1)exp(0) --> 8.0
3) It does not rely on many 'crutches'. If you think I have written thousands of lines of code to get this functionality: no. It is about 200 lines, and that's it. And I have no intention of abandoning my application (which is nearly complete).
2 Answers
#1
Based on your comment, it works if you change the regex from this:
([\\d\\.]+)\\*([\\d\\.]+)
to this (shown for ^; the change is in the operand part, so the same idea applies to the other operators):
(\\d+(\\.\\d+)?(e\\d+)?)\\^(\\d+(\\.\\d+)?(e\\d+)?)
To explain what I've changed: before, any run of digits and dots counted as a number, so all of these were accepted:
- 1
- .5
- .......
- .3.76
- and so on
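A quick way to see why those inputs are a problem: the sketch below (hypothetical class and method names) checks each string above against the old operand pattern [\\d\\.]+ and then tries Double.parseDouble on it. All four match the pattern, but the last two are not valid numbers, so in the calculator's loop they would surface as a NumberFormatException rather than a clear syntax error.

```java
import java.util.regex.Pattern;

// Hypothetical demo: the original operand pattern vs. what Double.parseDouble accepts.
class OldOperandPatternDemo {
    static final Pattern OLD_NUMBER = Pattern.compile("[\\d\\.]+");

    static boolean matchesOldPattern(String text) {
        return OLD_NUMBER.matcher(text).matches();
    }

    // True only if Double.parseDouble can read the text; a NumberFormatException means false.
    static boolean parsesAsDouble(String text) {
        try {
            Double.parseDouble(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1", ".5", ".......", ".3.76"}) {
            System.out.println(s + "  matches=" + matchesOldPattern(s) + "  parses=" + parsesAsDouble(s));
        }
    }
}
```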
To fix this, I added an optional decimal part, (\\.\\d+)?, which allows integers as well as decimals.
I also added an optional scientific-notation part, (e\\d+)?, on both sides, which allows the numbers to be written:
- as integers (2 ^ 5)
- as decimals (2.3 ^ 5.7)
- in scientific notation (2.345e2 ^ 5e10)
You can, of course, mix all of these forms.
But keep the comments below your question in mind. Regex may be useful for small pieces, but the bigger the equations get, the clumsier, slower and messier it becomes.
Also, if you want to support negative numbers, you can add optional minus signs (-?) in front of the bases and the exponents:
(-?\\d+(\\.\\d+)?(e-?\\d+)?)\\^(-?\\d+(\\.\\d+)?(e-?\\d+)?)
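Two details are worth checking before dropping these patterns into the calculator's loop. First, Double.toString writes the exponent with an uppercase E (for example 1.0E10), so a lowercase-only e never matches the calculator's own intermediate results such as 3.24E10. Second, the extra capturing groups shift the operands: with the pattern above, the second number is in group(4), not group(2). The sketch below addresses both under those assumptions, using [eE] and non-capturing (?:...) groups, and applies the idea to the + step with the same loop as the question's ^ snippet (class and method names are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: the sign-aware operand pattern from above, with two tweaks.
class SignedScientificAddition {
    // [eE]: Double.toString prints an uppercase 'E' (e.g. 1.0E10).
    // (?:...): non-capturing, so the operands stay in group(1) and group(2).
    static final String NUMBER = "-?\\d+(?:\\.\\d+)?(?:[eE]-?\\d+)?";
    static final Pattern ADD = Pattern.compile("(" + NUMBER + ")\\+(" + NUMBER + ")");

    // Same loop shape as the '^' snippet in the question, applied to '+'.
    static String applyAddition(String input) {
        StringBuilder expression = new StringBuilder(input);
        Matcher m = ADD.matcher(expression.toString());
        while (m.find()) {
            double sum = Double.parseDouble(m.group(1)) + Double.parseDouble(m.group(2));
            expression.replace(m.start(), m.end(), Double.toString(sum));
            m.reset(expression);
        }
        return expression.toString();
    }

    public static void main(String[] args) {
        System.out.println(applyAddition("3.24E10+1"));            // the question's case
        System.out.println(applyAddition("9.5367431640625E-7+1")); // 2^-20 + 1, negative exponent
        System.out.println(applyAddition("5-3+4"));                // caveat: leading '-' is grabbed
    }
}
```

The first two results parse back to 32400000001.0 and 1 + 2^-20. The third prints 51.0 rather than 6.0: the + pass matches -3+4 and splices 1.0 over the subtraction sign, so with sign-aware operands the order and design of the passes still needs care.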
#2
"...if you could provide me a justification for why the regex is not a good fit"
- A true regular expression cannot properly parse nested / balanced brackets. (OK, it is possible to use advanced regex features to do it, but the result is hellishly difficult to understand [1].)
- A true regular expression will have difficulty analyzing an expression whose operators have different precedence, especially with brackets. (I'm not sure it is impossible, but it is certainly difficult.)
- Once you have used your regex(es) to match the expression, you still have to sort the "groups" you matched into something that lets you (correctly) evaluate the expression.
- A regex cannot tell you what is wrong when the input is syntactically invalid.
- Complicated regexes are often pathologically expensive, especially for large input strings that are incorrect.
"...what exactly do the other algorithms have that makes them superior?"
A properly written or generated lexer + parser will have none of the above problems. You can either evaluate the expression on the fly, or turn it into a parse tree that can be evaluated repeatedly, e.g. with different values for variables.
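As a rough illustration of the lexer + parser approach, here is a small recursive-descent evaluator sketch (the class name and grammar are my own, not from the question). It reads a number, including any E exponent, as a single token, so 3.24E10 + 1 can never be split inside the exponent, and precedence and brackets fall out of the grammar itself. It evaluates on the fly rather than building a tree, supports only + - * / ^, unary minus and parentheses (no functions such as sin or log), and reports where the first syntax error is.

```java
// Hypothetical recursive-descent evaluator. Grammar, lowest precedence first:
//   expr  := term (('+' | '-') term)*
//   term  := unary (('*' | '/') unary)*
//   unary := '-' unary | power
//   power := atom ('^' unary)?        (right-associative: 2^3^2 = 2^9)
//   atom  := number | '(' expr ')'
class RecursiveDescentCalc {
    private final String src;
    private int pos = 0;

    private RecursiveDescentCalc(String text) {
        // Assumption: spaces are insignificant; error positions refer to the text without them.
        this.src = text.replace(" ", "");
    }

    static double evaluate(String text) {
        RecursiveDescentCalc p = new RecursiveDescentCalc(text);
        double value = p.expr();
        if (p.more()) throw p.error("unexpected '" + p.peek() + "'");
        return value;
    }

    private boolean more() { return pos < src.length(); }
    private char peek() { return src.charAt(pos); }
    private IllegalArgumentException error(String message) {
        return new IllegalArgumentException(message + " at position " + pos);
    }

    private double expr() {
        double v = term();
        while (more() && (peek() == '+' || peek() == '-')) {
            char op = src.charAt(pos++);
            double rhs = term();
            v = (op == '+') ? v + rhs : v - rhs;
        }
        return v;
    }

    private double term() {
        double v = unary();
        while (more() && (peek() == '*' || peek() == '/')) {
            char op = src.charAt(pos++);
            double rhs = unary();
            v = (op == '*') ? v * rhs : v / rhs;
        }
        return v;
    }

    private double unary() {
        if (more() && peek() == '-') { pos++; return -unary(); }
        return power();
    }

    private double power() {
        double base = atom();
        if (more() && peek() == '^') { pos++; return Math.pow(base, unary()); }
        return base;
    }

    private double atom() {
        if (more() && peek() == '(') {
            pos++;
            double v = expr();
            if (!more() || peek() != ')') throw error("expected ')'");
            pos++;
            return v;
        }
        return number();
    }

    // Lexing: digits and dots, then an optional exponent (E10, e-5), so "3.24E10" is ONE token.
    private double number() {
        int start = pos;
        while (more() && (Character.isDigit(peek()) || peek() == '.')) pos++;
        if (pos > start && more() && (peek() == 'e' || peek() == 'E')) {
            pos++;
            if (more() && (peek() == '+' || peek() == '-')) pos++;
            while (more() && Character.isDigit(peek())) pos++;
        }
        if (pos == start) throw error("expected a number");
        String token = src.substring(start, pos);
        try {
            return Double.parseDouble(token);
        } catch (NumberFormatException e) {
            pos = start;
            throw error("malformed number '" + token + "'");
        }
    }
}
```

For example, evaluate("3.24 * 10^10 + 1") comes out at about 3.2400000001E10, and evaluate("3+*2") throws an IllegalArgumentException with the message "expected a number at position 2", which is the kind of explanation a regex cannot give.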
The shunting-yard algorithm (while of more limited application) also has none of the above problems.
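For comparison, a minimal shunting-yard sketch (hypothetical class name): it converts the infix text to Reverse Polish Notation with an operator stack, then evaluates the RPN with an operand stack. Numbers with an exponent are lexed as one token, so 3.24E10 stays intact. It assumes binary + - * / ^ only, with ^ right-associative, and does not handle unary minus or functions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical shunting-yard sketch: infix text -> RPN tokens -> value.
class ShuntingYard {
    static int precedence(char op) {
        switch (op) {
            case '+': case '-': return 1;
            case '*': case '/': return 2;
            case '^': return 3;
            default: throw new IllegalArgumentException("unknown operator '" + op + "'");
        }
    }

    static List<String> toRpn(String text) {
        String s = text.replace(" ", "");
        List<String> output = new ArrayList<>();
        Deque<Character> ops = new ArrayDeque<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isDigit(c) || c == '.') {
                // Lex a whole number, including an exponent such as E10 or e-5.
                int start = i;
                while (i < s.length() && (Character.isDigit(s.charAt(i)) || s.charAt(i) == '.')) i++;
                if (i < s.length() && (s.charAt(i) == 'e' || s.charAt(i) == 'E')) {
                    i++;
                    if (i < s.length() && (s.charAt(i) == '+' || s.charAt(i) == '-')) i++;
                    while (i < s.length() && Character.isDigit(s.charAt(i))) i++;
                }
                output.add(s.substring(start, i));
                continue;
            }
            if (c == '(') {
                ops.push(c);
            } else if (c == ')') {
                while (!ops.isEmpty() && ops.peek() != '(') output.add(String.valueOf(ops.pop()));
                if (ops.isEmpty()) throw new IllegalArgumentException("unbalanced ')'");
                ops.pop(); // drop the matching '('
            } else {
                int p = precedence(c);
                // Pop operators that bind tighter, or equally tight when c is left-associative.
                while (!ops.isEmpty() && ops.peek() != '('
                        && (precedence(ops.peek()) > p || (precedence(ops.peek()) == p && c != '^'))) {
                    output.add(String.valueOf(ops.pop()));
                }
                ops.push(c);
            }
            i++;
        }
        while (!ops.isEmpty()) {
            char op = ops.pop();
            if (op == '(') throw new IllegalArgumentException("unbalanced '('");
            output.add(String.valueOf(op));
        }
        return output;
    }

    static double evalRpn(List<String> rpn) {
        Deque<Double> stack = new ArrayDeque<>();
        for (String tok : rpn) {
            if (tok.length() == 1 && "+-*/^".indexOf(tok.charAt(0)) >= 0) {
                double b = stack.pop();
                double a = stack.pop();
                switch (tok.charAt(0)) {
                    case '+': stack.push(a + b); break;
                    case '-': stack.push(a - b); break;
                    case '*': stack.push(a * b); break;
                    case '/': stack.push(a / b); break;
                    default:  stack.push(Math.pow(a, b)); break;
                }
            } else {
                stack.push(Double.parseDouble(tok));
            }
        }
        if (stack.size() != 1) throw new IllegalArgumentException("malformed expression");
        return stack.pop();
    }

    static double evaluate(String text) {
        return evalRpn(toRpn(text));
    }
}
```

For the question's input, toRpn("3.24 * 10^10 + 1") yields the tokens 3.24 10 10 ^ * 1 +, so the + is only applied after the whole product has been computed.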
This is about picking the right tool for the job, and about recognizing that regexes are NOT the right tool for every job.
[1] If you want to explore the rabbit warren of using regexes to parse nested structures, here is an entrance.