
Time: 2022-08-20 20:21:24

I am working with the Java Apache POI library and dealing with huge Excel files: approximately 10 MB of data with lots of rows and columns, and 8-10 different sheets per file. The data is not rich text; the cells are full of built-in functions and formulas, e.g. =SUM(A2:A4), which I am not concerned with.


This image is just for illustration purposes. The functions in the actual data are very different and far more complex:



The data includes strings, numbers, and Boolean values. My only concern is to make XSSF read the values as plain text, ignoring all the formulas and functions applied in Excel. That is, in the image above I only want to read the values in the rows and columns, i.e. 10, 20, 30, etc., Numbers, Total.




If I edit the Excel sheets to remove all formulas and functions and save the data as plain text, my code runs. However, when I leave the Excel files unmodified, with the data in the format shown above, I run into a GC overhead limit exceeded error.


What I want


I just want to read Excel files full of formulas and functions as they are. My algorithm works when I remove all the formulas and keep the sheet contents as plain text.


What I tried


As suggested in other resources online and on Stack Overflow, I tried a first approach, shown in the code below:


    fis = new FileInputStream(path);
    opc = OPCPackage.open(fis);
    XSSFWorkbook workbook = new XSSFWorkbook(opc);

Rather than passing the FileInputStream directly, I first passed it through OPCPackage. It still shows the same error, and the code won't execute past the XSSFWorkbook workbook line.


I then tried a second approach with XSSFReader. Below is the code:


    xssfReader = new XSSFReader(opc);
    SharedStringsTable sst = xssfReader.getSharedStringsTable();
    XSSFReader.SheetIterator itr = (XSSFReader.SheetIterator) xssfReader.getSheetsData();

    while (itr.hasNext()) {
        InputStream sheetStream = itr.next();
        if (itr.getSheetName().equals(sheetName)) {

            // no idea how to extract sheet like I would do in XSSFWorkbook
            // I only get Sheet name of desired sheet

        } // if ends here
    } // while ends here

Nothing so far works for me: if I use XSSFWorkbook, it throws the GC overhead limit exceeded error. So currently I am manually removing all formulas and functions, after which the algorithm works, but that is not an efficient way to deal with the problem. Any help or suggestions are appreciated.




As pointed out in the link here, I tried allocating more memory, but it is still not working. Below are some screenshots of my attempts to allocate more memory.



If I am doing something wrong in allocating memory, let me know and I will make the needed change.


New Edit


I have solved my problem, as suggested in centic's comment below, by adding -Xmx8192m to my run configuration in Eclipse. I am now looking into other ways of solving the memory issue by using SXSSFWorkbook, as discussed in the answer below.


2 Answers



Post comment as answer:


The memory settings you show are for the Eclipse IDE and Java Web Start. How are you actually starting your application? If you run it as an application or a unit test inside Eclipse, you need to adjust the memory settings in the run configuration instead, so that they are actually applied when your own code is running.
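One way to check which heap limit the launched program really gets is to print it from inside that program, using the same run configuration. The sketch below uses only the JDK; the class name HeapCheck and the method maxHeapMegabytes are made up for illustration.

```java
// Hypothetical helper: reports the heap limit of the JVM that runs this code,
// which is the limit that matters for XSSFWorkbook, not the IDE's own setting.
class HeapCheck {

    /** Maximum heap this JVM will attempt to use, in megabytes. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap available to this JVM: " + maxHeapMegabytes() + " MB");
    }
}
```

If -Xmx8192m is entered under the VM arguments of the run configuration, this should print a figure close to 8192 (the exact number can be somewhat lower depending on the garbage collector). A much smaller figure suggests the setting was applied somewhere else, for example to the IDE itself.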




Have you tried opening the file as an SXSSF workbook instead of an XSSF workbook?


    fis = new FileInputStream(path);
    opc = OPCPackage.open(fis);
    XSSFWorkbook workbook = new XSSFWorkbook(opc);
    SXSSFWorkbook wb = new SXSSFWorkbook(workbook);

See https://poi.apache.org/apidocs/org/apache/poi/xssf/streaming/SXSSFWorkbook.html. Taken directly from the JavaDoc: "This allows to write very large files without running out of memory as only a configurable portion of the rows are kept in memory at any one time"
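Note that the quoted JavaDoc talks about writing, and the snippet above still builds a full XSSFWorkbook first, so it may not reduce memory use while reading. For reading, the XSSFReader loop from the question can be completed by feeding each sheetStream to a SAX parser. The following is a minimal sketch under some assumptions: it uses only the JDK (no POI classes), the names CachedValueReader, SheetHandler and readValues are invented for illustration, a plain List<String> stands in for the shared strings table, and inline strings and number formatting are not handled. It relies on the fact that a worksheet's XML stores the last calculated result of a formula cell in a v element next to the f element, so the formulas can simply be skipped.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: streams one worksheet's XML and keeps only cached cell values.
class CachedValueReader {

    /** Collects entries such as "A4=30"; formula text inside <f> is ignored. */
    static class SheetHandler extends DefaultHandler {
        private final List<String> sharedStrings; // stand-in for the shared strings table
        final List<String> cells = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private String ref;   // cell reference, e.g. "A4"
        private String type;  // cell type attribute: "s" = shared string, "b" = boolean
        private boolean inValue;

        SheetHandler(List<String> sharedStrings) {
            this.sharedStrings = sharedStrings;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("c".equals(localName)) {          // a cell starts: remember reference and type
                ref = attrs.getValue("r");
                type = attrs.getValue("t");
            } else if ("v".equals(localName)) {   // the cached value starts
                inValue = true;
                text.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inValue) {
                text.append(ch, start, length);   // may arrive in several chunks
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (!"v".equals(localName)) {
                return;
            }
            inValue = false;
            String value = text.toString();
            if ("s".equals(type)) {               // value is an index into the shared strings
                value = sharedStrings.get(Integer.parseInt(value));
            } else if ("b".equals(type)) {
                value = "1".equals(value) ? "TRUE" : "FALSE";
            }
            cells.add(ref + "=" + value);
        }
    }

    /** Parses one sheet stream (e.g. the sheetStream from the question's loop). */
    static List<String> readValues(InputStream sheetStream, List<String> sharedStrings) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);      // needed so localName is populated
            SheetHandler handler = new SheetHandler(sharedStrings);
            factory.newSAXParser().parse(new InputSource(sheetStream), handler);
            return handler.cells;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse sheet XML", e);
        }
    }

    public static void main(String[] args) {
        // Tiny hand-written worksheet: A1 is a shared string, A4 holds a formula.
        String xml = "<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\">"
                + "<sheetData>"
                + "<row r=\"1\"><c r=\"A1\" t=\"s\"><v>0</v></c></row>"
                + "<row r=\"2\"><c r=\"A2\"><v>10</v></c></row>"
                + "<row r=\"3\"><c r=\"A3\"><v>20</v></c></row>"
                + "<row r=\"4\"><c r=\"A4\"><f>SUM(A2:A3)</f><v>30</v></c></row>"
                + "</sheetData></worksheet>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(readValues(in, Arrays.asList("Numbers")));
        // prints [A1=Numbers, A2=10, A3=20, A4=30]
    }
}
```

In the question's loop, readValues(sheetStream, ...) would be called inside the if block, with the list argument replaced by lookups into the sst object. Because the parser handles one element at a time, memory use stays roughly constant regardless of how many rows the sheet has. POI also ships ready-made handlers for this event-style reading, which are probably preferable to hand-rolled parsing in real code.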




