I am working with the Java Apache POI library on large Excel files: roughly 10 MB of data with many rows and columns, spread over 8-10 sheets per file. The data is not plain text; it is full of built-in functions and formulas such as =SUM(A2:A4), which I am not concerned with.
This image is for illustration only; the functions in the actual data are quite different and far more complex:
The data includes strings, numbers and boolean values. All I want is for XSSF to read the cell values as plain text, ignoring the formulas and functions applied in Excel. In the image above, for example, I only want to read the values in the rows and columns, i.e. 10, 20, 30 etc., Numbers, Total.
Problem
If I strip all formulas and functions from the sheets and save the data as plain text, my code runs. When I leave the Excel files unmodified, with the data as shown above, I run into a "GC overhead limit exceeded" error.
What I want
I just want to read Excel files full of formulas and functions as they are. My algorithm works when I remove all the formulas and keep the sheet contents as plain text.
What I tried
As suggested in other resources online and on *, I first tried the approach in the code below:
FileInputStream fis = new FileInputStream(path);
OPCPackage opc = OPCPackage.open(fis);
XSSFWorkbook workbook = new XSSFWorkbook(opc);
Rather than passing the FileInputStream directly, I first wrapped it in an OPCPackage. It still shows the same error, and execution never gets past the XSSFWorkbook workbook line.
I then tried a second approach with XSSFReader. Below is the code:
XSSFReader xssfReader = new XSSFReader(opc);
SharedStringsTable sst = xssfReader.getSharedStringsTable();
XSSFReader.SheetIterator itr = (XSSFReader.SheetIterator) xssfReader.getSheetsData();
while (itr.hasNext()) {
    InputStream sheetStream = itr.next();
    if (itr.getSheetName().equals(sheetName)) {
        // no idea how to extract sheet like I would do in XSSFWorkbook
        // I only get Sheet name of desired sheet
    }
} // while ends here
Nothing has worked so far, and if I use XSSFWorkbook it throws the GC overhead limit exceeded error. So at the moment I manually remove all formulas and functions, after which the algorithm works, but that is not an efficient way to deal with the problem. Any help or suggestions are appreciated.
EDIT:
As pointed out in the link here, I tried allocating more memory, but it still does not work. Below are some snapshots of my attempts to allocate more memory.
If I am doing something wrong in allocating memory, let me know. I will do the needed change.
New Edit
I have solved my problem as suggested in centic's comment below, by adding -Xmx8192m to my run configuration in Eclipse. I am now looking into other ways of solving the memory issue by using SXSSFWorkbook, as discussed in the answer below.
2 Solutions
#1
1
Posting my comment as an answer:
The memory settings you show are for the Eclipse IDE and Java Web Start. How are you actually starting your application? If you run it as an application or unit test inside Eclipse, you need to adjust the memory settings in the run configuration instead, so that they actually apply when your own code is running.
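One way to confirm which heap limit actually applies to your own code, rather than to Eclipse itself, is to print it from inside the program. A minimal sketch follows; the class name HeapCheck and the helper maxHeapMb are made up for illustration and are not part of POI or of the original code.

```java
public class HeapCheck {

    /** Returns the maximum heap the running JVM will attempt to use, in megabytes. */
    public static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Run this with the same run configuration as the real application.
        System.out.println("Max heap available to this JVM: " + maxHeapMb() + " MB");
    }
}
```

With -Xmx8192m in the VM arguments of the run configuration, the reported figure should be roughly 8192 MB. A much smaller number suggests the setting is not reaching the JVM that runs your code.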
#2
0
Have you tried opening the file as an SXSSF workbook instead of an XSSF workbook?
FileInputStream fis = new FileInputStream(path);
OPCPackage opc = OPCPackage.open(fis);
XSSFWorkbook workbook = new XSSFWorkbook(opc);
SXSSFWorkbook wb = new SXSSFWorkbook(workbook);
See https://poi.apache.org/apidocs/org/apache/poi/xssf/streaming/SXSSFWorkbook.html. Taken directly from its JavaDoc: "This allows to write very large files without running out of memory as only a configurable portion of the rows are kept in memory at any one time"
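Note that the quoted JavaDoc talks about writing, and wrapping an XSSFWorkbook in an SXSSFWorkbook still requires the XSSFWorkbook to be loaded first. For reading only the stored values, the per-sheet InputStream that XSSFReader hands out can be parsed with SAX instead. In the sheet XML a formula cell carries both an <f> element (the formula) and a <v> element (the last calculated value), so a handler can keep <v> and skip <f>. Below is a rough sketch using only the JDK's SAX parser on a hand-written sample. The class CachedValueReader, its ValueHandler and the sample XML are illustrative assumptions, and it does not resolve string cells (t="s"), whose <v> is an index into the shared strings table. POI also ships a ready-made handler, XSSFSheetXMLHandler, which may be the better choice in real code.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class CachedValueReader {

    /** Collects cell reference -> raw <v> text; formula text in <f> is ignored. */
    static class ValueHandler extends DefaultHandler {
        final Map<String, String> values = new LinkedHashMap<>();
        private String cellRef;
        private boolean inValue;
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("c".equals(qName)) {
                cellRef = attrs.getValue("r");   // e.g. "A5"
            } else if ("v".equals(qName)) {
                inValue = true;
                text.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inValue) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("v".equals(qName)) {
                inValue = false;
                values.put(cellRef, text.toString());
            }
        }
    }

    /** Streams one sheet's XML and returns the stored values, one row at a time in memory. */
    public static Map<String, String> readValues(InputStream sheetXml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        ValueHandler handler = new ValueHandler();
        parser.parse(sheetXml, handler);
        return handler.values;
    }

    public static void main(String[] args) throws Exception {
        // Hand-written stand-in for the sheetStream obtained from XSSFReader.
        String xml = "<worksheet><sheetData>"
                + "<row r=\"2\"><c r=\"A2\"><v>10</v></c></row>"
                + "<row r=\"3\"><c r=\"A3\"><v>20</v></c></row>"
                + "<row r=\"4\"><c r=\"A4\"><v>30</v></c></row>"
                + "<row r=\"5\"><c r=\"A5\"><f>SUM(A2:A4)</f><v>60</v></c></row>"
                + "</sheetData></worksheet>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(readValues(in)); // prints {A2=10, A3=20, A4=30, A5=60}
    }
}
```

In the XSSFReader loop from the question, sheetStream would take the place of the sample stream. The handler matches on unprefixed element names, which is an assumption about how the file was produced.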