I need to split strings containing basic mathematical expressions, such as "(a+b)*c" or " (a - c) / d".
The delimiters are + - * / ( ) and space, and I need each of them as an independent token. Basically, the result should look like this:
"("
"a"
"+"
"b"
")"
"*"
"c"
And for the second example:
" "
"("
"a"
" "
"-"
...
I read a lot of questions about similar problems with less complex delimiters, and the common answer was to use zero-width positive lookahead and lookbehind.
Like this: (?<=X | ?=X)
Here X represents the delimiters, but putting them in a class like this: [\\Q+-*()\\E/\\s]
does not work in the desired way.
So how do I have to format the delimiters to make the split work the way I need it?
---Update---
Word-class characters and longer combinations of them, such as "ab", "c1" or "12", should not be split.
In short, I need the same result a StringTokenizer would give with the parameters "-+*/() " and true.
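For reference, the StringTokenizer behaviour mentioned in the update can be sketched as follows. The class and method names (TokenizerBaseline, tokenize) are made up for illustration; only the delimiter string and the true flag come from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerBaseline {
    // Delimiter string from the question; 'true' makes StringTokenizer
    // return each delimiter as a token of its own.
    static List<String> tokenize(String expression) {
        StringTokenizer st = new StringTokenizer(expression, "-+*/() ", true);
        List<String> tokens = new ArrayList<>();
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("(a+b)*c"));      // [(, a, +, b, ), *, c]
        System.out.println(tokenize(" (a - c) / d"));
    }
}
```

Any regex-based split can be checked against this baseline on the same inputs.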
4 Answers
#1
1
Try splitting your data using
yourString.split("(?<=[\\Q+-*()\\E/\\s])|(?=[\\Q+-*()\\E/\\s])(?<!^)");
I assume the problem you had was not in the \\Q+-*()\\E part but in (?<=X | ?=X). It should be (?<=X)|(?=X), since you need two separate assertions: a look-behind and a look-ahead.
The trailing (?<!^) stops the look-ahead from splitting at the very start of the string, which would otherwise yield an empty first token on Java versions before 8.
Demo for "_a+(ab-c1__)+12_"
(BTW each _ stands for a space in the code. SO shows two consecutive spaces as one, so I had to use __ to present them somehow.)
String[] tokens = " a+(ab-c1  )+12 "
        .split("(?<=[\\Q+-*()\\E/\\s])|(?=[\\Q+-*()\\E/\\s])(?<!^)");
for (String token : tokens)
    System.out.println("\"" + token + "\"");
Result:
" "
"a"
"+"
"("
"ab"
"-"
"c1"
" "
" "
")"
"+"
"12"
" "
#2
1
It is one thing if you are doing this as student work, but in practice this is more of a job for a lexical analyzer and parser. In C, you would use lex and yacc, or GNU flex and bison. In Java, you'd use ANTLR or JavaCC.
But start by writing a BNF grammar for your expected input (usually called the input language).
#3
0
You can use the following regex:
\s*(?<=[()+*/a-z-])\s*
(?<=...) is a zero-width look-behind assertion: it has to match, but the character it checks is not consumed, so that character stays in the token instead of being eaten by the split. The \s* on either side swallows the surrounding spaces.
Code example:
// needs: import java.util.Arrays;
String a = " (a - c) / d * x ";
String regex = "\\s*(?<=[()+*/a-z-])\\s*";
String[] split = a.split(regex);
System.out.println(Arrays.toString(split));
Output:
[ (, a, -, c, ), /, d, *, x]
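A quick check of how this pattern treats spaces and multi-character operands may help when comparing it with the requirements in the question. The class name and the second input ("ab+c1") are my own, chosen only for illustration.

```java
import java.util.Arrays;

public class LookbehindOnlySplit {
    // The pattern from this answer, unchanged.
    static final String REGEX = "\\s*(?<=[()+*/a-z-])\\s*";

    public static void main(String[] args) {
        // Spaces after a token are consumed by \s*, so they are not returned
        // as tokens; the leading space stays glued to the first token.
        System.out.println(Arrays.toString(" (a - c) / d * x ".split(REGEX)));

        // Every lower-case letter is a split point too, so "ab" is broken up,
        // and digits are not in the class at all.
        System.out.println(Arrays.toString("ab+c1".split(REGEX)));
        // prints [a, b, +, c, 1]
    }
}
```

So this variant fits single-letter operands where whitespace should be dropped; it does not reproduce the StringTokenizer result asked for in the update.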
#4
0
Try this instead:
[-+*/()\\s]
A dash has to come first or last in a character class (or be escaped) so that it does not denote a range. The rest of the characters need no escaping (presumably what you were trying to do with \\Q and \\E), because most characters are taken literally in a character class anyway.
Also, I wasn't aware of the syntax (?<=X|?=X). If it works, then great. But if it doesn't, try this expanded form, whose syntax I know does work:
(?:(?<=X)|(?=X))
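To see how the position of the dash plays out in java.util.regex, here is a small sketch. The helper name matchesDash is hypothetical, and the patterns are my own variations on the classes discussed in this thread.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class DashInCharClass {
    // True if the regex compiles and matches the one-character string "-".
    static boolean matchesDash(String regex) {
        try {
            return Pattern.matches(regex, "-");
        } catch (PatternSyntaxException e) {
            return false; // the character class itself was rejected
        }
    }

    public static void main(String[] args) {
        System.out.println(matchesDash("[-+*/()\\s]"));       // dash first
        System.out.println(matchesDash("[\\s+*/()-]"));       // dash last
        System.out.println(matchesDash("[+\\-*/()\\s]"));     // dash escaped
        System.out.println(matchesDash("[\\Q+-*()\\E/\\s]")); // dash quoted by \Q...\E
        System.out.println(matchesDash("[+-*]"));             // unquoted dash between '+' and '*'
    }
}
```

The first four classes accept the dash as a literal. The last one is read as a range from '+' to '*', which Java rejects because '*' sorts before '+'.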