I have some code that looks more or less like this:
while (scanner.hasNext()) {
    if (scanner.findInLine("Test") != null) {
        // do some things
    } else {
        scanner.nextLine();
    }
}
I am using this to parse a ~10 MB text file. The problem is that if I put breakpoints on the while() and on scanner.nextLine(), I can see that the scanner's position (in the debug window) sometimes goes back to zero. I think this is causing some kind of loop blow-up: the regex in findInLine() starts at zero, looks through some amount of text while advancing the position, and then the position is seemingly randomly set back to zero, so it has to re-parse all that text again.
Any ideas what can be causing that? Am I even doing this the right way?
Thanks
Some additional info:
The Scanner is instantiated from an InputStream. After debugging, it appears that Scanner uses a HeapCharBuffer that only holds 1024 characters at a time and then resets. Is there a way to avoid this, or to do things differently? That seems like a small number of characters to be able to scan.
Derek
1 solution
You're mixing Scanner.hasNext() and Scanner.nextLine(). Don't do that; they handle tokenization differently.
Use hasNext() with next(), or hasNextLine() with nextLine().
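For illustration, here is a minimal sketch of the hasNextLine()/nextLine() pairing. It is not the original code: the class name LineScan and the method countMatchingLines are made up for this example, the UTF-8 charset is an assumption, and String.contains() stands in for the findInLine("Test") regex check, so adapt it if a real pattern is needed.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

class LineScan {
    // Hypothetical helper: counts the lines that contain `needle`,
    // consuming the input strictly one line at a time.
    static int countMatchingLines(InputStream in, String needle) {
        int matches = 0;
        // Charset is an assumption; use whatever the file is actually encoded in.
        try (Scanner scanner = new Scanner(in, "UTF-8")) {
            // hasNextLine() is paired with nextLine(), so every iteration
            // consumes exactly one line and the loop always makes progress.
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine();
                if (line.contains(needle)) {
                    matches++; // "do some things" would go here
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        byte[] data = "a Test line\nnothing here\nTest again Test\n"
                .getBytes(StandardCharsets.UTF_8);
        // Two of the three lines contain "Test".
        System.out.println(countMatchingLines(new ByteArrayInputStream(data), "Test"));
    }
}
```

Working on the String returned by nextLine() also means the search never depends on where the Scanner's internal buffer happens to be positioned.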