I need to parse very large Excel files with Apache POI under limited memory. After some searching, I learned that POI provides a SAX-based parser that handles large files efficiently without consuming a lot of memory.
Apache POI SAX Parser example
private class SheetToCSV implements SheetContentsHandler {
    private boolean firstCellOfRow = false;
    private int currentRow = -1;
    private int currentCol = -1;

    private void outputMissingRows(int number) {
        for (int i=0; i<number; i++) {
            for (int j=0; j<minColumns; j++) {
                output.append(',');
            }
            output.append('\n');
        }
    }

    @Override
    public void startRow(int rowNum) {
        // If there were gaps, output the missing rows
        outputMissingRows(rowNum-currentRow-1);
        // Prepare for this row
        firstCellOfRow = true;
        currentRow = rowNum;
        currentCol = -1;
    }

    @Override
    public void endRow(int rowNum) {
        // Ensure the minimum number of columns
        for (int i=currentCol; i<minColumns; i++) {
            output.append(',');
        }
        output.append('\n');
    }

    @Override
    public void cell(String cellReference, String formattedValue,
            XSSFComment comment) {
        if (firstCellOfRow) {
            firstCellOfRow = false;
        } else {
            output.append(',');
        }

        // gracefully handle missing CellRef here in a similar way as XSSFCell does
        if(cellReference == null) {
            cellReference = new CellAddress(currentRow, currentCol).formatAsString();
        }

        // Did we miss any cells?
        int thisCol = (new CellReference(cellReference)).getCol();
        int missedCols = thisCol - currentCol - 1;
        for (int i=0; i<missedCols; i++) {
            output.append(',');
        }
        currentCol = thisCol;

        // Number or string?
        try {
            Double.parseDouble(formattedValue);
            output.append(formattedValue);
        } catch (NumberFormatException e) {
            output.append('"');
            output.append(formattedValue);
            output.append('"');
        }
    }

    @Override
    public void headerFooter(String text, boolean isHeader, String tagName) {
        // Skip, no headers or footers in CSV
    }
}
In the example from the link above, the 'cell' method only receives the formatted value. However, I need access to the actual (unformatted) value of the cell.
1 solution
#1
2
The current implementation of the streaming interface does not provide this. To get the raw value, you will need to copy the code of the underlying XSSFSheetXMLHandler and adjust it so that the cell content is not formatted.
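To illustrate the idea, here is a minimal sketch of a handler that reads the sheet XML directly and keeps the stored text of each <v> element without running it through a formatter. It is not a copy of XSSFSheetXMLHandler: the class name RawValueHandler, the parse helper, and the "ref[type]=value" output shape are all made up for this example, and it uses only the JDK SAX API so it can run on its own. In a real program the XML would presumably come from the sheet streams returned by XSSFReader rather than from a string.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: collects the unformatted text of every <v> element. */
class RawValueHandler extends DefaultHandler {
    final List<String> cells = new ArrayList<>();
    private final StringBuilder value = new StringBuilder();
    private String cellRef;   // "r" attribute of the current <c>, e.g. "A1"
    private String cellType;  // "t" attribute; absent for plain numeric cells
    private boolean inValue;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("c".equals(qName)) {
            cellRef = attrs.getValue("r");
            cellType = attrs.getValue("t");
        } else if ("v".equals(qName)) {
            value.setLength(0);
            inValue = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver one text node in several chunks, so accumulate.
        if (inValue) {
            value.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("v".equals(qName)) {
            inValue = false;
            // No formatter is applied: the stored value is passed through as-is.
            cells.add(cellRef + "[" + (cellType == null ? "n" : cellType) + "]=" + value);
        }
    }

    /** Parses sheet XML and returns one "ref[type]=rawValue" entry per cell with a <v>. */
    static List<String> parse(String sheetXml) throws Exception {
        RawValueHandler handler = new RawValueHandler();
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(sheetXml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.cells;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<worksheet><sheetData><row r=\"1\">"
                + "<c r=\"A1\"><v>0.123456789</v></c>"
                + "<c r=\"B1\" t=\"s\"><v>0</v></c>"
                + "</row></sheetData></worksheet>";
        System.out.println(parse(xml));
    }
}
```

Note the limits of this sketch: for cells with t="s" the <v> text is only an index into the shared strings table, so it still has to be looked up there, and inline strings (t="inlineStr") are stored under <is> rather than <v> and are not captured here.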