What is the correct delimiter in this Java Scanner example?

Time: 2022-09-23 07:55:34

I have a file with comma separated floating point numbers and CR LF at the end of each line. For testing, I'm using a short string as shown here:

    Scanner s = new Scanner("0.1,0.2,0.3,\r\n0.4,0.5,0.6");
    s.useDelimiter(",|,\\r\\n");
    while (s.hasNext())
        System.out.println(s.next());

What is a correct delimiter to produce exactly 6 numeric tokens? The one shown above produces 7 tokens, including an empty one:

    0.1
    0.2
    0.3

    0.4
    0.5
    0.6

2 Answers

#1


2  

,|,\\r\\n means that option 1 is ,. If that isn't matched, try ,\\r\\n. This means that the second option will never match, because if the match starts with a comma, option 1 will have already matched it.

Instead try ,\\r\\n|,, first trying to match the sequence that has more than just a comma ,\\r\\n. If that doesn't match, then try matching just the comma ,.
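
To see the effect of the ordering, here is a minimal sketch that runs both delimiters over the test string from the question. The class name DelimiterOrderDemo and the helper tokens(...) are made up for illustration; only Scanner and useDelimiter come from the JDK. As far as I can tell, the blank line in the question's output comes from the fourth token keeping the leading "\r\n" (so it is "\r\n0.4"), not from a separate empty token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class DelimiterOrderDemo {
    // Hypothetical helper: collects every token the Scanner yields
    // for the given input and delimiter regex.
    static List<String> tokens(String input, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        try (Scanner s = new Scanner(input)) {
            s.useDelimiter(delimiterRegex);
            while (s.hasNext()) {
                result.add(s.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "0.1,0.2,0.3,\r\n0.4,0.5,0.6";

        // Comma listed first: the bare comma always wins, so the "\r\n"
        // is left behind and ends up at the start of the next token.
        List<String> commaFirst = tokens(input, ",|,\\r\\n");
        System.out.println(commaFirst.get(3).equals("\r\n0.4")); // true

        // Longer alternative listed first: ",\r\n" is consumed as one delimiter.
        List<String> longestFirst = tokens(input, ",\\r\\n|,");
        System.out.println(longestFirst); // [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
    }
}
```

Regex alternation in Java tries the alternatives left to right and takes the first one that matches, which is why the longer alternative has to come first.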

#2


1  

The simplest solution is probably ",\\s*". This will treat a comma, followed by any number of whitespace characters, as a delimiter. Since \\r and \\n are whitespace characters, it will work great on your input, and you don't have to worry about the order problems you get with |. After seeing a ,, the scanner will consume as many whitespace characters as it finds.

This means that the pattern will also match some things your original pattern didn't, such as multiple consecutive line breaks, spaces, tabs, etc. In practice, this is probably what you want anyway. If not, then you can go with Moishe's answer or ",(\\r\\n)?" which matches , and then consumes one "\\r\\n" sequence if that comes next.
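
A rough sketch of the difference between the two patterns, using a made-up input with a space after one comma and a blank line between rows. The class name WhitespaceDelimiterDemo, the helper tokens(...) and the sample string are assumptions for illustration, not part of the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class WhitespaceDelimiterDemo {
    // Hypothetical helper: collects every token the Scanner yields
    // for the given input and delimiter regex.
    static List<String> tokens(String input, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        try (Scanner s = new Scanner(input)) {
            s.useDelimiter(delimiterRegex);
            while (s.hasNext()) {
                result.add(s.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up input: a space after the first comma, an extra blank line after the second.
        String messy = "0.1, 0.2,\r\n\r\n0.3";

        // ",\\s*" swallows the space and both line breaks along with the comma.
        System.out.println(tokens(messy, ",\\s*")); // [0.1, 0.2, 0.3]

        // ",(\\r\\n)?" consumes at most one "\r\n" after the comma, so the
        // space and the second line break stay attached to the tokens.
        List<String> strict = tokens(messy, ",(\\r\\n)?");
        System.out.println(strict.get(1).equals(" 0.2"));     // true
        System.out.println(strict.get(2).equals("\r\n0.3"));  // true
    }
}
```

On the exact string from the question both patterns give the same six tokens; they only differ once extra whitespace shows up in the data.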
