I would like to receive some suggestions regarding a little problem I am going to solve in Java.
I have a file in this format:
@
some text
some text
some text
@
some text
some text
some text
@
some text
some text
some text
...and so on.
I need to read this text file chunk by chunk. For each chunk, I have to create an InputStream object containing the chunk that was read and pass that InputStream to a parser, repeating these operations for every chunk in the file. Each chunk is written between the lines starting with @. In other words, the problem is to parse each section between the @ markers using a parser that reads each chunk from an InputStream.
The text file could be big, so I would like to obtain good performance.
How could I solve this problem?
I have thought about doing something like this:
FileReader fileReader = new FileReader(file);
BufferedReader bufferedReader = new BufferedReader(fileReader);
Scanner scanner = new Scanner(bufferedReader);
scanner.useDelimiter("@");
List<ParsedChunk> parsedChunks = new ArrayList<ParsedChunk>();
ChunkParser parser = new ChunkParser();
while (scanner.hasNext()) {
    String text = scanner.next();
    InputStream inputStream = new ByteArrayInputStream(text.getBytes("UTF-8"));
    ParsedChunk parsedChunk = parser.parse(inputStream);
    parsedChunks.add(parsedChunk);
    inputStream.close();
}
scanner.close();
but I am not sure if it would be a good way to do it.
Thank you.
2 solutions
#1
0
If I have understood correctly, this is what you are trying to achieve. Note that you will need Java 7 or later to run the code below.
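import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

public static void main(String[] args) throws IOException {
    // Reads the whole file into memory at once.
    List<String> allLines = Files.readAllLines(new File("d:/input.txt").toPath(), Charset.defaultCharset());
    List<List<String>> chunks = getChunks(allLines);
    // Now you have all the chunks and you can process them
}

private static List<List<String>> getChunks(List<String> allLines) {
    List<List<String>> result = new ArrayList<List<String>>();
    int i = 0;
    int fromIndex = 1; // the first chunk starts right after the opening delimiter line
    for (String line : allLines) {
        i++; // i is the 1-based number of the current line
        if (line.startsWith("@") && i != 1) { // skip the first line, then look for the next delimiter
            result.add(allLines.subList(fromIndex, i - 1));
            fromIndex = i;
        }
    }
    // The last chunk has no closing delimiter, so add whatever remains.
    if (fromIndex < allLines.size()) {
        result.add(allLines.subList(fromIndex, allLines.size()));
    }
    return result;
}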
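Since the question mentions that the file could be big, and Files.readAllLines loads every line into memory, a streaming variant may be worth considering. The following is only a sketch under assumptions: ChunkStreamer, ChunkHandler, forEachChunk and collect are hypothetical names (ChunkHandler stands in for the asker's ChunkParser, whose API is not shown), chunks are encoded as UTF-8 as in the question, lines before the first @ line are ignored, and empty chunks are skipped. It needs Java 9 or later because of InputStream.readAllBytes, which is used only in the demo helper.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ChunkStreamer {

    /** Hypothetical callback standing in for the asker's ChunkParser. */
    interface ChunkHandler {
        void handle(InputStream chunk) throws IOException;
    }

    /** Reads line by line; every line starting with '@' closes the current chunk. */
    static void forEachChunk(Reader source, ChunkHandler handler) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        StringBuilder chunk = new StringBuilder();
        boolean inChunk = false; // becomes true once the first '@' line has been seen
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.startsWith("@")) {
                if (inChunk && chunk.length() > 0) {
                    emit(chunk, handler);
                }
                inChunk = true;
            } else if (inChunk) {
                chunk.append(line).append('\n');
            }
        }
        if (inChunk && chunk.length() > 0) {
            emit(chunk, handler); // the last chunk has no closing '@' line
        }
    }

    /** Wraps the buffered chunk in an InputStream, hands it over, and resets the buffer. */
    private static void emit(StringBuilder chunk, ChunkHandler handler) throws IOException {
        byte[] bytes = chunk.toString().getBytes(StandardCharsets.UTF_8);
        chunk.setLength(0);
        try (InputStream in = new ByteArrayInputStream(bytes)) {
            handler.handle(in);
        }
    }

    /** Demo helper: collects the text of every chunk found in the given string. */
    static List<String> collect(String text) {
        List<String> out = new ArrayList<>();
        try {
            forEachChunk(new StringReader(text),
                    in -> out.add(new String(in.readAllBytes(), StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

With a real file, forEachChunk could be given a reader from Files.newBufferedReader, and the handler could call parser.parse(in). Only one chunk is held in memory at a time, which is the main difference from collecting all lines first.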
#2
0
I didn't quite get the question, but you could try working with characters: store all the characters in a char array, then go through it with a loop and a conditional statement that breaks the string every time it encounters an '@'.
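The character-based idea could look roughly like the sketch below. It is one possible reading of the suggestion, not a definitive implementation: CharSplitter and split are hypothetical names, the whole text is assumed to be in memory already, lines are assumed to end with '\n', and an '@' counts as a delimiter only when it is the first character of a line (as in the question's file format).

```java
import java.util.ArrayList;
import java.util.List;

class CharSplitter {

    /** Splits text at every line whose first character is '@'. */
    static List<String> split(String text) {
        List<String> chunks = new ArrayList<>();
        char[] chars = text.toCharArray();
        int start = -1;           // start of the current chunk; -1 before the first delimiter
        boolean lineStart = true; // true when chars[i] is the first character of a line
        int i = 0;
        while (i < chars.length) {
            if (lineStart && chars[i] == '@') {
                if (start >= 0 && i > start) {
                    chunks.add(new String(chars, start, i - start));
                }
                // Skip the rest of the delimiter line, including its newline.
                while (i < chars.length && chars[i] != '\n') {
                    i++;
                }
                if (i < chars.length) {
                    i++;
                }
                start = i;
                lineStart = true;
                continue;
            }
            lineStart = chars[i] == '\n';
            i++;
        }
        // The last chunk has no closing delimiter line.
        if (start >= 0 && start < chars.length) {
            chunks.add(new String(chars, start, chars.length - start));
        }
        return chunks;
    }
}
```

Because this approach needs the full content as a char array, it is likely a poor fit for very large files; it mainly shows how the loop and the condition on '@' fit together.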