Skip the first occurrence and split a string in Java

Date: 2021-04-08 16:55:52

I want to split a string on underscores, but skip the first underscore when the string contains more than 4 of them. For now, the input will contain at most 5 underscores. For the input below I need the output A_B, C, D, E, F, which the following code produces. I would like a better solution. Thanks in advance.

import org.springframework.util.StringUtils;

String key = "A_B_C_D_E_F";
int occurrence = StringUtils.countOccurrencesOf(key, "_");
System.out.println(occurrence);
String[] keyValues;
if (occurrence == 5) {
    // Temporarily mask the first underscore so the tokenizer does not split on it
    key = key.replaceFirst("_", "-");
    keyValues = StringUtils.tokenizeToStringArray(key, "_");
    // Restore the masked underscore in the first token
    keyValues[0] = keyValues[0].replaceFirst("-", "_");
} else {
    keyValues = StringUtils.tokenizeToStringArray(key, "_");
}

for (String keyValue : keyValues) {
    System.out.println(keyValue);
}

5 answers

#1


1  

You can use this regex to split:

String s = "A_B_C_D_E_F";
String[] list = s.split("(?<=_[A-Z])_");

Output:

[A_B, C, D, E, F]

The idea is to split only on an underscore that is preceded by "_[A-Z]" (an earlier underscore followed by one uppercase letter), which effectively skips only the first underscore.

If the segments between the underscores in your strings have a different format, replace [A-Z] with the appropriate regex.
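
As a minimal sketch of the lookbehind split described above, the class below applies the answer's regex to the sample input and shows one way to adapt it to multi-character segments. The class name `LookbehindSplitDemo`, the method names, and the bounded quantifier `{1,32}` (an assumed maximum segment length) are illustrative choices, not part of the original answer.

```java
import java.util.Arrays;

class LookbehindSplitDemo {

    // Regex from the answer: split on "_" only when it is preceded by
    // an underscore plus exactly one uppercase letter.
    static String[] splitSingleLetters(String s) {
        return s.split("(?<=_[A-Z])_");
    }

    // Assumed variant for longer segments: the segment before the underscore
    // may be 1 to 32 non-underscore characters (32 is an arbitrary bound).
    static String[] splitLongerSegments(String s) {
        return s.split("(?<=_[^_]{1,32})_");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitSingleLetters("A_B_C_D_E_F")));  // [A_B, C, D, E, F]
        System.out.println(Arrays.toString(splitLongerSegments("AB_CD_EF_GH"))); // [AB_CD, EF, GH]
    }
}
```

Both patterns keep the first underscore no matter how many underscores the input contains, so the "more than 4" condition from the question would still need a separate check.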

#2


2  

Well, it is relatively "simple":

String str = "A_B_C_D_E_F_G";
String[] result = str.split("(?<!^[^_]*)_|_(?=(?:[^_]*_){0,3}[^_]*$)");
System.out.println(Arrays.toString(result));

Here is a commented version, for better understanding, that can also be used as is:

String str = "A_B_C_D_E_F_G";
String[] result = str.split("(?x)                  # enable embedded comments \n"
                            + "                    # first alternative splits on all but the first underscore \n"
                            + "(?<!                # next character should not be preceded by \n"
                            + "    ^[^_]*          #     only non-underscores since beginning of input \n"
                            + ")                   # so this matches only if there was an underscore before \n"
                            + "_                   # underscore \n"
                            + "|                   # alternatively split if an underscore is followed by at most three more underscores to match the less than five underscores case \n"
                            + "_                   # underscore \n"
                            + "(?=                 # preceding character must be followed by \n"
                            + "    (?:[^_]*_){0,3} #     at most three groups of non-underscores and an underscore \n"
                            + "    [^_]*$          #     only more non-underscores until end of line \n"
                            + ")");
System.out.println(Arrays.toString(result));
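
To see why the pattern needs both alternatives, the sketch below runs only the second alternative (the lookahead). By itself it splits only on the last four underscores, so inputs with more than five underscores stay partially joined; the first alternative is what covers the remaining underscores. The class and method names are made up for this illustration.

```java
import java.util.Arrays;

class LookaheadOnlyDemo {

    // Second alternative of the answer's regex, used on its own:
    // split on an underscore that is followed by at most three more underscores.
    static String[] splitOnLastFour(String s) {
        return s.split("_(?=(?:[^_]*_){0,3}[^_]*$)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnLastFour("A_B_C_D")));       // [A, B, C, D]
        System.out.println(Arrays.toString(splitOnLastFour("A_B_C_D_E_F")));   // [A_B, C, D, E, F]
        System.out.println(Arrays.toString(splitOnLastFour("A_B_C_D_E_F_G"))); // [A_B_C, D, E, F, G]
    }
}
```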

#3


0  

You can use this regex based on \G and, instead of splitting, use matching:

String str = "A_B_C_D_E_F";
Pattern p = Pattern.compile("(^[^_]*_[^_]+|\\G[^_]+)(?:_|$)");
Matcher m = p.matcher(str);
List<String> resultArr = new ArrayList<>();
while (m.find()) {
    resultArr.add( m.group(1) );
}
System.err.println(resultArr);

\G asserts position at the end of the previous match or the start of the string for the first match.

Output:

[A_B, C, D, E, F]
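
The effect of \G is easier to see on a smaller pattern. The following sketch (the class name, helper method, and sample input are illustrative assumptions) collects digits with and without the \G anchor: with it, matching stops at the first position where a match can no longer continue from the end of the previous one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContiguousMatchDemo {

    // Collect every match of the given regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Without \G the matcher skips over "a" and keeps finding digits.
        System.out.println(findAll("\\d", "12a34"));    // [1, 2, 3, 4]
        // With \G each match must start where the previous one ended, so "a" ends the run.
        System.out.println(findAll("\\G\\d", "12a34")); // [1, 2]
    }
}
```

In the answer's pattern, `\G[^_]+` plays the same role: each later segment must begin exactly where the previous match (including its trailing underscore) ended.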

#4


0  

I would do it after the split.

public void test() {
    String key = "A_B_C_D_E_F";
    String[] parts = key.split("_");
    // More than 4 underscores means more than 5 parts
    if (parts.length > 5) {
        // Merge the first two parts back together, restoring the underscore between them
        String[] newParts = new String[parts.length - 1];
        newParts[0] = parts[0] + "_" + parts[1];
        System.arraycopy(parts, 2, newParts, 1, parts.length - 2);
        parts = newParts;
    }
    System.out.println("parts = " + Arrays.toString(parts));
}
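
A regex-free variation on this idea is to locate the second underscore with indexOf and split only the remainder. This is an alternative illustration rather than the answer's own code; the class name, the method name, and the `threshold` parameter are assumptions introduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SkipFirstUnderscore {

    // Split on "_" but keep the first underscore when the key contains
    // more than `threshold` underscores.
    static List<String> split(String key, int threshold) {
        long underscores = key.chars().filter(c -> c == '_').count();
        if (underscores <= threshold) {
            return new ArrayList<>(Arrays.asList(key.split("_")));
        }
        int first = key.indexOf('_');
        int second = key.indexOf('_', first + 1);
        List<String> parts = new ArrayList<>();
        if (second < 0) {
            // At most one underscore in total: nothing left to split on.
            parts.add(key);
            return parts;
        }
        // Everything up to the second underscore stays together, e.g. "A_B".
        parts.add(key.substring(0, second));
        // The remainder is split normally.
        parts.addAll(Arrays.asList(key.substring(second + 1).split("_")));
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("A_B_C_D", 4));     // [A, B, C, D]
        System.out.println(split("A_B_C_D_E_F", 4)); // [A_B, C, D, E, F]
    }
}
```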

#5


0  

Although Java does not document it officially, you can use * and + in a lookbehind, because they are implemented as limiting quantifiers: * as {0,0x7FFFFFFF} and + as {1,0x7FFFFFFF} (see "Regex look-behind without obvious maximum length in Java"). So, if your strings are not too long, you can use:

String key = "A_B_C_D";       // => [A, B, C, D]
//String key = "A_B_C_D_E_F"; // => [A_B, C, D, E, F]
String[] res = null;
if (key.split("_").length > 5) { // more than 4 underscores means more than 5 parts
    res = key.split("(?<!^[^_]*)_");
} else {
    res = key.split("_");
}
System.out.println(Arrays.toString(res));

DISCLAIMER: Since this is an exploit of the current Java 8 regex engine, the code may break in the future when the bug is fixed in Java.
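
If relying on undocumented behavior is a concern, one possible workaround is to give the lookbehind an explicit upper bound instead of *. The sketch below assumes the text before the first underscore is never longer than 100 characters; that bound, the class name, and the method name are assumptions for illustration. The length check follows the question's wording (more than 4 underscores, which means more than 5 parts).

```java
import java.util.Arrays;

class BoundedLookbehindSplit {

    // Split on every underscore except the first one. The negative lookbehind rejects an
    // underscore when everything before it, back to the start of the input, is at most
    // 100 non-underscore characters (100 is an assumed bound, not a Java limit).
    private static final String ALL_BUT_FIRST = "(?<!^[^_]{0,100})_";

    static String[] split(String key) {
        // More than 4 underscores means more than 5 parts.
        if (key.split("_").length > 5) {
            return key.split(ALL_BUT_FIRST);
        }
        return key.split("_");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("A_B_C_D")));     // [A, B, C, D]
        System.out.println(Arrays.toString(split("A_B_C_D_E_F"))); // [A_B, C, D, E, F]
    }
}
```

If the first segment can exceed the assumed bound, the first underscore would no longer be protected, so the bound should be chosen with the real input in mind.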
