Java - Splitting a string using regex and new lines

Time: 2022-09-29 21:40:37

I have a file that I scan into my program and store in a String using this code:

    try {
        data= new Scanner(new File("file.csv")).useDelimiter("\\Z").next();
    } catch (FileNotFoundException e) {
        System.out.println("File not found");
    }

The file.csv looks something like this:

"RowA";"RowB"
55;56
57;58
59;60
61;62

Now, I'm trying to extract each number and put them in a String[] like so:

    String[] number= data.split(";|\\r?\\n|\"|[a-zA-Z]");

When I print the array like so:

    for(int i = 0; i < number.length; i++){
        System.out.println("Line: " + number[i]);
    }

I get the following output:

Line: 
Line:  
Line:  
Line:  
Line:  
Line:  
Line: 
Line: 
Line: 
Line: 
Line: 
Line: 
Line: 
Line: 
Line: 55
Line: 56
Line: 57
Line: 58
Line: 59
Line: 60
Line: 61
Line: 62

Why are the first elements of the array blank, and how can I remove them?

Thank you.

3 answers

#1


2  

In this regex:

    ;|\r?\n|"|[a-zA-Z]

  • " matches the double quotes in the String ("RowA";"RowB").
  • [a-zA-Z] matches each letter in "RowA";"RowB".

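Every one of those characters acts as a delimiter, so the header row is split at each of them and every split produces an empty string. Those are the blanks you get.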
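
To see where the blanks come from, here is a minimal sketch that applies the pattern from the question to just the header row and the first data row. The names SplitBlanksDemo and splitLikeQuestion are made up for this illustration, and the input is a string literal rather than the contents of file.csv.

```java
import java.util.Arrays;

class SplitBlanksDemo {
    // Splits with the exact pattern used in the question.
    static String[] splitLikeQuestion(String data) {
        return data.split(";|\\r?\\n|\"|[a-zA-Z]");
    }

    public static void main(String[] args) {
        // Header row plus one data row, as in file.csv.
        String data = "\"RowA\";\"RowB\"\n55;56";
        String[] parts = splitLikeQuestion(data);

        // The header contains 14 delimiter characters (quotes, letters,
        // the semicolon and the line break), so 14 empty strings come
        // first, followed by "55" and "56".
        System.out.println(parts.length); // prints 16
        System.out.println(Arrays.toString(parts));
    }
}
```

The count of 14 matches the 14 blank "Line:" entries in the output shown in the question.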

You can remove these parts from the pattern if you don't need them:

    String[] number = data.split(";|\\r?\\n");

I can also see that you want only the numbers in your data, not the quotes and letters. In that case, remove the quotes and letters with replaceAll() before you split. Strings are immutable, so assign the result back to data:

    data = data.replaceAll("\"|[a-zA-Z]", "");
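
Putting the two steps together could look like the sketch below, assuming the file content is already in a String. Note that the stripped header row still leaves a lone ";" and a line break behind, so two empty tokens survive the split; the sketch filters them out. StripThenSplit and extractNumbers are illustrative names, not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;

class StripThenSplit {
    // Removes quotes and letters, splits on ';' and line breaks,
    // and keeps only the non-empty tokens.
    static List<String> extractNumbers(String data) {
        // replaceAll returns a new String; the original is unchanged.
        String cleaned = data.replaceAll("\"|[a-zA-Z]", "");

        List<String> numbers = new ArrayList<>();
        for (String token : cleaned.split(";|\\r?\\n")) {
            // The emptied header row yields empty tokens; skip them.
            if (!token.isEmpty()) {
                numbers.add(token);
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        String data = "\"RowA\";\"RowB\"\n55;56\n57;58";
        System.out.println(extractNumbers(data)); // prints [55, 56, 57, 58]
    }
}
```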

#2


0  

If you are sure that you want one element for every group of consecutive digits, a quick and easy solution would be:

    String[] number = data.split("([^0-9])+");

This gives your expected output as long as every value is an integer (no decimal separator) and there are no digits anywhere else in the data.

EDIT: If the first char of data is not a digit (as with the quoted header here), the number array starts with one empty item. A non-digit at the end adds nothing, because split() drops trailing empty strings.
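
One way to avoid the empty item altogether is to search for the digit groups instead of splitting on everything else. The following is only a sketch of that idea using java.util.regex; DigitRuns and findDigitRuns are hypothetical names, and it relies on the same assumption as above, namely that digits appear only in the values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitRuns {
    private static final Pattern DIGITS = Pattern.compile("[0-9]+");

    // Collects every run of consecutive digits, in order of appearance.
    static List<String> findDigitRuns(String data) {
        List<String> runs = new ArrayList<>();
        Matcher matcher = DIGITS.matcher(data);
        while (matcher.find()) {
            runs.add(matcher.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        String data = "\"RowA\";\"RowB\"\n55;56\n57;58\n";

        // split: the first element is "" because data starts with a non-digit.
        String[] viaSplit = data.split("([^0-9])+");
        System.out.println(viaSplit.length); // prints 5

        // find: only the digit runs are returned, no empty item.
        System.out.println(findDigitRuns(data)); // prints [55, 56, 57, 58]
    }
}
```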

#3


0  

For a solution to your regex see the answer by @Hackerdarshi.

However, I propose an alternative method that is likely a more efficient way to parse the numbers.

Instead of reading the whole file into a String and then using a regex to parse the numbers, you can read the file line by line, split each line on ";" and then parse each number returned by the split:

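    // Imports needed by this snippet.
    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    List<Integer> numbers = new ArrayList<>();

    File file = new File("file.csv");

    try (FileReader fileReader = new FileReader(file);
         BufferedReader bufferedReader = new BufferedReader(fileReader)) {
        // Skip the header row ("RowA";"RowB").
        bufferedReader.readLine();

        String line;
        while ((line = bufferedReader.readLine()) != null) {
            // Each data row holds integers separated by ';'.
            for (final String number : line.split(";")) {
                numbers.add(Integer.parseInt(number));
            }
        }
    } catch (final IOException e) {
        e.printStackTrace();
    }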
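
For anyone who wants to try the line-by-line approach without a file on disk, here is a self-contained sketch. It is a variation on the snippet above, not the same code: CsvNumberReader and readNumbers are made-up names, a StringReader stands in for the FileReader, and blank lines are skipped as an extra precaution that the snippet above does not take.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class CsvNumberReader {
    // Reads every integer after the header row; blank lines are skipped.
    static List<Integer> readNumbers(BufferedReader reader) throws IOException {
        List<Integer> numbers = new ArrayList<>();
        reader.readLine(); // discard the header row ("RowA";"RowB")

        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // tolerate an empty line, e.g. at the end of the file
            }
            for (String field : line.split(";")) {
                numbers.add(Integer.parseInt(field.trim()));
            }
        }
        return numbers;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for file.csv so the sketch runs anywhere.
        String csv = "\"RowA\";\"RowB\"\n55;56\n57;58\n\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(csv))) {
            System.out.println(readNumbers(reader)); // prints [55, 56, 57, 58]
        }
    }
}
```

To read the real file, pass new BufferedReader(new FileReader("file.csv")) instead of the StringReader.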
