I need to split Java Strings at every " character, except where the character immediately before it is a backslash (\).
These Strings should split like so:
asdnaoe"asduwd"adfdgb => asdnaoe, asduwd, adfdgb
addfgmmnp"fd asd\"das"fsfk => addfgmmnp, fd asd\"das, fsfk
Is there an easy way to achieve this using regular expressions? (I use regex because it is the easiest option for me as the coder. Performance is not an issue.)
Thank you in advance.
I solved it like this:
private static String[] split(String s) {
    char[] cs = s.toCharArray();
    // First pass: count the unescaped quotes to size the result array.
    int n = 1;
    for (int i = 0; i < cs.length; i++) {
        if (cs[i] == '"') {
            // Count the backslashes immediately before this quote.
            int sn = 0;
            for (int j = i - 1; j >= 0; j--) {
                if (cs[j] == '\\')
                    sn += 1;
                else
                    break;
            }
            // An even count means the quote itself is not escaped.
            if (sn % 2 == 0)
                n += 1;
        }
    }
    String[] result = new String[n];
    int lastBreakPos = 0;
    int index = 0;
    // Second pass: cut the string at every unescaped quote.
    for (int i = 0; i < cs.length; i++) {
        if (cs[i] == '"') {
            int sn = 0;
            for (int j = i - 1; j >= 0; j--) {
                if (cs[j] == '\\')
                    sn += 1;
                else
                    break;
            }
            if (sn % 2 == 0) {
                result[index] = new String(cs, lastBreakPos, i - lastBreakPos);
                lastBreakPos = i + 1;
                index += 1;
            }
        }
    }
    // The remainder after the last unescaped quote (may be empty).
    result[index] = new String(cs, lastBreakPos, cs.length - lastBreakPos);
    return result;
}
Anyway, thanks for all your great responses! (Oh, and despite this, I will be using either @biziclop's or @Alan Moore's version, as they're shorter and probably more efficient! =)
3 solutions
#1
4
Sure, just use
(?<!\\)"
Quick PowerShell test:
PS> 'addfgmmnp"fd asd\"das"fsfk' -split '(?<!\\)"'
addfgmmnp
fd asd\"das
fsfk
However, this won't split on \\" (an escaped backslash followed by a normal quote, at least under most C-like languages' escaping rules). You cannot really solve that with a lookbehind in Java, though, as arbitrary-length lookbehind isn't supported:
PS> 'addfgmmnp"fd asd\\"das"fsfk' -split '(?<!\\)"'
addfgmmnp
fd asd\\"das
fsfk
Usually you would expect a proper solution to split on the remaining " because it isn't really escaped.
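For comparison with the PowerShell runs above, here is a minimal Java sketch of the same lookbehind split. The class and method names are hypothetical and not part of the original answer; the only library call used is String.split. It shows the regex written as a Java string literal and reproduces the \\" limitation.

```java
import java.util.Arrays;

// Hypothetical demo class; the name is an assumption, not from the answer.
class LookbehindSplitDemo {
    // The regex (?<!\\)" written as a Java string literal.
    static final String UNESCAPED_QUOTE = "(?<!\\\\)\"";

    static String[] split(String s) {
        return s.split(UNESCAPED_QUOTE);
    }

    public static void main(String[] args) {
        // One backslash before the middle quote: the quote is treated as escaped.
        System.out.println(Arrays.toString(split("addfgmmnp\"fd asd\\\"das\"fsfk")));
        // Two backslashes before the middle quote: the lookbehind still blocks the split.
        System.out.println(Arrays.toString(split("addfgmmnp\"fd asd\\\\\"das\"fsfk")));
    }
}
```

Both calls return three parts, so the quote after the escaped backslash is not treated as a delimiter, matching the PowerShell behaviour.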
#2
2
You can solve this problem with a Java regex; just don't use split().
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String[] args) throws Exception
{
    String[] strs = {
        "asdnaoe\"asduwd\"adfdgb",
        "addfgmmnp\"fd asd\\\"das\"fsfk"
    };
    for (String str : strs)
    {
        System.out.printf("%n%-28s=> %s%n", str, splitIt(str));
    }
}
public static List<String> splitIt(String s)
{
    ArrayList<String> result = new ArrayList<String>();
    // Match the tokens between the delimiters instead of matching the delimiters.
    Matcher m = Pattern.compile("([^\"\\\\]|\\\\.)+").matcher(s);
    while (m.find())
    {
        result.add(m.group());
    }
    return result;
}
Output:
asdnaoe"asduwd"adfdgb => [asdnaoe, asduwd, adfdgb]
addfgmmnp"fd asd\"das"fsfk => [addfgmmnp, fd asd\"das, fsfk]
The core regex, [^"\\]|\\., consumes anything that's not a backslash or a quotation mark, or a backslash followed by anything, so \\\" would be matched as an escaped backslash (\\) followed by an escaped quote (\").
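To make the pairing of backslashes concrete, here is a small sketch that runs the same pattern on two extra inputs. The class name and the test inputs are assumptions chosen for illustration; the pattern is the one from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the pattern comes from the answer.
class TokenMatchDemo {
    // Runs of "ordinary character" or "backslash plus any character".
    private static final Pattern TOKEN = Pattern.compile("([^\"\\\\]|\\\\.)+");

    static List<String> splitIt(String s) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Input: fd asd\\"das  (the backslash pair is consumed as one unit,
        // so the quote that follows acts as a delimiter)
        System.out.println(splitIt("fd asd\\\\\"das"));
        // Input: a""b  (nothing matches between the two quotes,
        // so no empty token is produced)
        System.out.println(splitIt("a\"\"b"));
    }
}
```

One design consequence worth noting: because this approach collects matches rather than splitting, adjacent quotes yield no empty string, which differs from what split() would return.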
#3
1
Just for reference, here's a non-regexp solution that handles escaping of \ as well. (In real life this could be simplified; there's no real need for the START_NEW state, but I tried to write it in a way that's easier to read.)
import java.util.LinkedList;
import java.util.List;

public class Splitter {
    private enum State {
        IN_TEXT, ESCAPING, START_NEW;
    }
    public static List<String> split( String source ) {
        LinkedList<String> ret = new LinkedList<String>();
        StringBuilder sb = new StringBuilder();
        State state = State.START_NEW;
        for( int i = 0; i < source.length(); i++ ) {
            char next = source.charAt( i );
            if( next == '\\' && state != State.ESCAPING ) {
                // An unescaped backslash escapes the next character.
                state = State.ESCAPING;
            } else if( next == '\\' && state == State.ESCAPING ) {
                // An escaped backslash is ordinary text.
                state = State.IN_TEXT;
            } else if( next == '"' && state != State.ESCAPING ) {
                // An unescaped quote ends the current part.
                ret.add( sb.toString() );
                sb = new StringBuilder();
                state = State.START_NEW;
            } else {
                state = State.IN_TEXT;
            }
            // The delimiter itself is never appended.
            if( state != State.START_NEW ) {
                sb.append( next );
            }
        }
        ret.add( sb.toString() );
        return ret;
    }
}
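The simplification mentioned above might look like the following sketch, which replaces the three-state enum with a single boolean. This is an assumption about what the simplified form could be, not the answerer's own code, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplified variant; not taken from the original answer.
class SimpleSplitter {
    static List<String> split(String source) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        // True when the previous character was a backslash that was not itself escaped.
        boolean escaped = false;
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c == '"' && !escaped) {
                // Unescaped quote: close the current part and drop the quote.
                parts.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
            // A backslash escapes the next character only if it was not escaped itself.
            escaped = (c == '\\') && !escaped;
        }
        parts.add(current.toString());
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("addfgmmnp\"fd asd\\\"das\"fsfk"));
        System.out.println(split("a\\\\\"b"));
    }
}
```

Like the enum version, this keeps the escape characters in the output and emits an empty string between two adjacent unescaped quotes.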