I'm trying to work out a way of splitting up a string in Java that follows a pattern like so:
String a = "123abc345def";
The results from this should be the following:
x[0] = "123";
x[1] = "abc";
x[2] = "345";
x[3] = "def";
However I'm completely stumped as to how I can achieve this. Please can someone help me out? I have tried searching online for a similar problem, however it's very difficult to phrase it correctly in a search.
Please note: the number of letters and numbers may vary (e.g. there could be a string like '1234a5bcdef').
7 solutions
#1
77
You could try to split on (?<=\D)(?=\d)|(?<=\d)(?=\D), like:
str.split("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");
It matches the zero-width positions between a digit and a non-digit (in either order), so the split itself consumes no characters.
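A minimal, self-contained sketch of the split above; the class name LookaroundSplit is just for illustration. Because the regex only distinguishes digit from non-digit, anything that is not 0-9 (letters, spaces, punctuation) ends up in the non-digit groups.

```java
class LookaroundSplit {
    // Zero-width split points: a non-digit followed by a digit,
    // or a digit followed by a non-digit.
    static final String BOUNDARY = "(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)";

    static String[] split(String str) {
        return str.split(BOUNDARY);
    }
}
```

For example, LookaroundSplit.split("1234a5bcdef") should give [1234, a, 5, bcdef]; no leading empty string appears because neither lookbehind can match at position 0.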
#2
9
How about:
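import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
// Returns the runs of digits, lowercase letters and uppercase letters, in order.
private List<String> parse(String str) {
    List<String> output = new ArrayList<>();
    Matcher match = Pattern.compile("[0-9]+|[a-z]+|[A-Z]+").matcher(str);
    while (match.find()) {
        output.add(match.group());
    }
    return output;
}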
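To see how the choice of pattern affects the result, here is the same find() loop with the regex passed in; the names TokenDemo and tokens are hypothetical. With [0-9]+|[a-z]+|[A-Z]+, a mixed-case word such as "aBc" is broken into "a", "B", "c", and characters that match none of the alternatives (spaces, punctuation) are silently dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenDemo {
    // Collects every match of regex in str, in order; characters that
    // match none of the alternatives are skipped.
    static List<String> tokens(String str, String regex) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(str);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }
}
```

So tokens("12aBc", "[0-9]+|[a-z]+|[A-Z]+") returns [12, a, B, c], while tokens("12aBc", "[0-9]+|[a-zA-Z]+") returns [12, aBc].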
#3
8
You can try this:
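import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
// [a-z]+ matches a run of lowercase letters, \d+ a run of digits
Pattern p = Pattern.compile("[a-z]+|\\d+");
Matcher m = p.matcher("123abc345def");
ArrayList<String> allMatches = new ArrayList<>();
while (m.find()) {
    allMatches.add(m.group());
}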
The result (allMatches) will be:
["123", "abc", "345", "def"]
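On Java 9 or later, the find() loop can be replaced by Matcher.results(), which streams every match. This is a sketch assuming Java 9+, with StreamMatches and groups as illustrative names. As with the loop above, [a-z]+ only matches lowercase letters, so uppercase letters are skipped.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class StreamMatches {
    private static final Pattern GROUPS = Pattern.compile("[a-z]+|\\d+");

    // Matcher.results() (Java 9+) yields one MatchResult per find(),
    // so the explicit loop collapses into a single pipeline.
    static List<String> groups(String s) {
        return GROUPS.matcher(s)
                .results()
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }
}
```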
#4
3
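Use two different patterns, [0-9]+ and [a-zA-Z]+ (use + rather than *, since * also matches the empty string), and split by each of them: splitting on the letters gives you the digit groups, and splitting on the digits gives you the letter groups. You then have to interleave the two results yourself to restore the original order.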
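A rough sketch of the two-split idea, under a few assumptions: it splits on [0-9]+ and on its complement [^0-9]+ (rather than [a-zA-Z]+) so that the two sets of runs are guaranteed to alternate, it drops the empty leading string that String.split produces when the text starts with a separator, and it interleaves the two lists starting with whichever kind the input begins with. TwoSplit is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

class TwoSplit {
    // Keeps only non-empty pieces (split can yield a leading empty string).
    private static List<String> nonEmpty(String[] pieces) {
        List<String> out = new ArrayList<>();
        for (String p : pieces) {
            if (!p.isEmpty()) out.add(p);
        }
        return out;
    }

    static List<String> split(String s) {
        List<String> result = new ArrayList<>();
        if (s.isEmpty()) return result;
        List<String> digits = nonEmpty(s.split("[^0-9]+")); // digit runs, in order
        List<String> others = nonEmpty(s.split("[0-9]+"));  // non-digit runs, in order
        // Runs alternate, so start with the kind s begins with and take turns.
        boolean digitTurn = s.charAt(0) >= '0' && s.charAt(0) <= '9';
        int d = 0, o = 0;
        while (d < digits.size() || o < others.size()) {
            if (digitTurn && d < digits.size()) {
                result.add(digits.get(d++));
            } else if (!digitTurn && o < others.size()) {
                result.add(others.get(o++));
            }
            digitTurn = !digitTurn;
        }
        return result;
    }
}
```

TwoSplit.split("1234a5bcdef") should return [1234, a, 5, bcdef]. This is more work than a single lookaround split, mainly because the original order has to be rebuilt by hand.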
#5
2
If you are looking for a solution that doesn't use Java's String regex functionality (e.g. split, matches, etc.), then the following should help:
import java.util.ArrayList;
import java.util.List;
// Splits a string into alternating runs of digits and non-digits without regex.
List<String> splitString(String string) {
    List<String> list = new ArrayList<>();
    int start = 0;
    while (start < string.length()) {
        // the first character decides whether this run is digits or not
        boolean digits = isNumber(string.charAt(start));
        int end = start;
        // extend the run while the characters stay the same type
        while (end < string.length() && isNumber(string.charAt(end)) == digits) {
            end++;
        }
        list.add(string.substring(start, end));
        start = end;
    }
    return list; // an empty input gives an empty list instead of an exception
}
boolean isNumber(char c) {
    return c >= '0' && c <= '9';
}
This solution splits the string into numbers and 'words', where 'words' are strings that don't contain digits. If you would like 'words' to contain only English letters, you can easily modify it by adding more conditions (like the isNumber method call) depending on your requirements (for example, you may wish to skip words that contain non-English letters). Also note that the splitString method returns a List<String> (an ArrayList), which can later be converted to a String array.
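As a sketch of the modifications suggested above (not part of the original answer): a variant that keeps only runs of ASCII digits or ASCII letters, skips everything else, and returns a String[] via List.toArray. The names LetterDigitSplitter and charClass are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class LetterDigitSplitter {
    // 0 = ASCII digit, 1 = ASCII letter, 2 = anything else (skipped)
    static int charClass(char c) {
        if (c >= '0' && c <= '9') return 0;
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) return 1;
        return 2;
    }

    static String[] split(String s) {
        List<String> runs = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int cls = charClass(s.charAt(i));
            int j = i;
            // extend the run while the character class stays the same
            while (j < s.length() && charClass(s.charAt(j)) == cls) j++;
            if (cls != 2) runs.add(s.substring(i, j)); // drop runs of other characters
            i = j;
        }
        return runs.toArray(new String[0]); // List -> String[]
    }
}
```

For instance, LetterDigitSplitter.split("12 ab-3") yields {"12", "ab", "3"}: the space and the hyphen are dropped.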
#6
1
I haven't used Java for ages, so this is just some pseudocode that should help get you started (faster for me than looking everything up :) ).
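import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String a = "123abc345def";
List<String> result = new ArrayList<>();
Pattern digits = Pattern.compile("\\d+");         // a run of digits
Pattern letters = Pattern.compile("\\p{Alpha}+"); // a run of ASCII letters
while (a.length() > 0) {
    Matcher m = digits.matcher(a);
    if (!m.lookingAt()) {          // no digits at the start: try letters
        m = letters.matcher(a);
        if (!m.lookingAt()) {
            break; // something invalid - neither digit nor letter
        }
    }
    String part = m.group();
    result.add(part);
    a = a.substring(part.length()); // remove the part we've found
}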
#7
1
I was doing this sort of thing for mission-critical code, where every fraction of a second counts because I need to process 180k entries in an unnoticeable amount of time. So I skipped regex and split altogether and processed each element inline (though adding them to an ArrayList<String> would be fine). If you want to do this exact thing but need it to be something like 20x faster, here is the approach:
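// processElement(String) is your own per-element handler (not shown here).
// state: 0 = start, 1 = inside a digit run, 2 = inside a non-digit run
void parseGroups(String text) {
    int last = 0; // start index of the current run
    int state = 0;
    for (int i = 0, s = text.length(); i < s; i++) {
        switch (text.charAt(i)) {
            case '0':
            case '1':
            case '2':
            case '3':
            case '4':
            case '5':
            case '6':
            case '7':
            case '8':
            case '9':
                if (state == 2) { // a non-digit run just ended
                    processElement(text.substring(last, i));
                    last = i;
                }
                state = 1;
                break;
            default:
                if (state == 1) { // a digit run just ended
                    processElement(text.substring(last, i));
                    last = i;
                }
                state = 2;
                break;
        }
    }
    processElement(text.substring(last)); // the final run (empty if text is empty)
}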
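Since processElement is left to the reader, here is a sketch (an assumption, not the author's code) that passes the handler in as a java.util.function.Consumer<String> and adds a wrapper that collects the elements into an ArrayList<String>, as mentioned above. It uses the same digit/non-digit state machine with the ten case labels folded into one range check, and it returns early on an empty string so no empty element is emitted. GroupParser and toList are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class GroupParser {
    // state: 0 = start, 1 = inside a digit run, 2 = inside a non-digit run
    static void parseGroups(String text, Consumer<String> processElement) {
        if (text.isEmpty()) return; // avoid emitting a single empty element
        int last = 0;
        int state = 0;
        for (int i = 0, s = text.length(); i < s; i++) {
            char c = text.charAt(i);
            int next = (c >= '0' && c <= '9') ? 1 : 2;
            if (state != 0 && next != state) {
                processElement.accept(text.substring(last, i)); // run ended at i
                last = i;
            }
            state = next;
        }
        processElement.accept(text.substring(last)); // the final run
    }

    // Convenience wrapper: collect the elements instead of processing inline.
    static List<String> toList(String text) {
        List<String> out = new ArrayList<>();
        parseGroups(text, out::add);
        return out;
    }
}
```

GroupParser.toList("123abc345def") should return [123, abc, 345, def]; for truly inline processing you would pass your own lambda instead of out::add.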