I want to write an Excel file (.xls, the MS Excel 2003 format) programmatically using Java. The output files may contain ~200,000 rows, which I plan to split across several sheets (at most 65,536 rows per sheet, the limit of the .xls format).
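The sheet split itself is simple arithmetic. A minimal sketch, assuming the .xls limit of 65,536 rows per sheet and zero-based row numbering; the class and method names are made up for illustration:

```java
/** Maps a global row number onto (sheet, row within sheet) for the .xls format. */
final class SheetSplit {
    /** Maximum rows per sheet in the Excel 97-2003 (.xls) format. */
    static final int MAX_ROWS_PER_SHEET = 65_536;

    /** Number of sheets needed to hold totalRows rows (ceiling division). */
    static int sheetsNeeded(long totalRows) {
        return (int) ((totalRows + MAX_ROWS_PER_SHEET - 1) / MAX_ROWS_PER_SHEET);
    }

    /** Zero-based index of the sheet that holds the given zero-based global row. */
    static int sheetIndex(long globalRow) {
        return (int) (globalRow / MAX_ROWS_PER_SHEET);
    }

    /** Zero-based row position inside that sheet. */
    static int rowInSheet(long globalRow) {
        return (int) (globalRow % MAX_ROWS_PER_SHEET);
    }

    public static void main(String[] args) {
        System.out.println(sheetsNeeded(200_000)); // prints 4
        System.out.println(sheetIndex(199_999));   // prints 3
        System.out.println(rowInSheet(199_999));   // prints 3391
    }
}
```

With ~200,000 rows this gives four sheets, the last one only partly filled.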
I have tried the Apache POI API, but it seems to be a memory hog because of its object model. I am forced to add cells and sheets to the workbook object in memory, and only once all the data has been added can I write the workbook to a file! Here is a sample of how Apache recommends writing Excel files with their API:
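import java.io.FileOutputStream;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;

Workbook wb = new HSSFWorkbook();
Sheet sheet = wb.createSheet("new sheet");
// Create a row and put some cells in it (row indexes are zero-based ints)
Row row = sheet.createRow(0);
// Create a cell and put a value in it
Cell cell = row.createCell(0);
cell.setCellValue(1);
// Write the output to a file; try-with-resources closes the stream even on failure
try (FileOutputStream fileOut = new FileOutputStream("workbook.xls")) {
    wb.write(fileOut);
}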
Clearly, writing ~20k rows (with some 10-20 columns in each row) gives me the dreaded "java.lang.OutOfMemoryError: Java heap space".
I have tried increasing the JVM's initial and maximum heap size with the -Xms and -Xmx parameters (-Xms512m and -Xmx1024m). I still can't write more than 150k rows to the file.
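When tuning the heap, it can help to confirm that the flags actually reached the JVM (they need the leading dash and a unit suffix, as in -Xms512m -Xmx1024m). A small sketch using only the standard Runtime API; the class name is arbitrary:

```java
/** Prints the heap limits the running JVM actually sees. */
final class HeapInfo {
    /** Converts a byte count to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() reflects -Xmx (or the JVM default when the flag is absent).
        System.out.println("max heap:   " + toMiB(rt.maxMemory()) + " MiB");
        // totalMemory() is the heap currently reserved; it starts near -Xms.
        System.out.println("total heap: " + toMiB(rt.totalMemory()) + " MiB");
        System.out.println("free heap:  " + toMiB(rt.freeMemory()) + " MiB");
    }
}
```

Running it as `java -Xms512m -Xmx1024m HeapInfo` should report a maximum close to 1024 MiB; the exact figure varies by JVM and garbage collector.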
I am looking for a way to stream to an Excel file instead of building the entire file in memory before writing it to disk, which should hopefully save a lot of memory. Any alternative API or solution would be appreciated, but I am restricted to Java. Thanks! :)
9 solutions
#1
6
All existing Java APIs try to build the whole document in RAM at once. Try writing an XML file that conforms to the new xlsx file format instead. To get started, I suggest building a small file in the desired form in Excel and saving it. Then open it, examine the structure, and replace the parts you want.
Wikipedia has a good article about the overall format.
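To illustrate the idea of writing the XML yourself, here is a minimal sketch that streams a single SpreadsheetML worksheet part row by row, so no row stays in memory after it is written. It is not a complete .xlsx: a real file is a ZIP package that also needs a workbook part, content types, and relationships. The class name `SheetXmlStreamer` and its methods are invented for this example.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

/** Streams one worksheet part; each row is written out immediately and then forgotten. */
final class SheetXmlStreamer implements AutoCloseable {
    private final Writer out;
    private int rowNumber = 0;

    /** The caller is responsible for giving us a Writer that encodes as UTF-8. */
    SheetXmlStreamer(Writer out) throws IOException {
        this.out = out;
        out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
        out.write("<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\">");
        out.write("<sheetData>");
    }

    /** Writes one row; Number values become numeric cells, everything else an inline string. */
    void writeRow(Object... values) throws IOException {
        out.write("<row r=\"" + (++rowNumber) + "\">");
        for (Object value : values) {
            if (value instanceof Number) {
                out.write("<c><v>" + value + "</v></c>");
            } else {
                out.write("<c t=\"inlineStr\"><is><t>" + escape(String.valueOf(value)) + "</t></is></c>");
            }
        }
        out.write("</row>");
    }

    /** Escapes the characters that are not allowed in XML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Closes the open elements and the underlying writer. */
    @Override
    public void close() throws IOException {
        out.write("</sheetData></worksheet>");
        out.close();
    }

    /** Convenience for small inputs and tests: renders the given rows to a string. */
    static String render(Object[][] rows) {
        StringWriter buffer = new StringWriter();
        try (SheetXmlStreamer sheet = new SheetXmlStreamer(buffer)) {
            for (Object[] row : rows) {
                sheet.writeRow(row);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(new Object[][] {{"id", "name"}, {1, "A & B"}}));
    }
}
```

For large outputs you would pass a buffered file or ZIP-entry writer to the constructor instead of using `render`, and call `writeRow` once per record as the data arrives.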
#2
9
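Try the SXSSF workbook (SXSSFWorkbook in Apache POI). It is great for huge Excel documents: it builds the document while keeping only a window of recent rows in memory and flushing older rows to temporary files, so it uses very little RAM. Note that it writes the newer .xlsx format rather than .xls.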
#3
4
I had to split my data into several Excel files to overcome the heap space exception. I figured that around 5k rows with 22 columns was about the limit, so I wrote my logic so that after every 5k rows I would end the file, start a new one, and number the files accordingly.
In cases where I had 20k+ rows to write, I would end up with 4+ different files representing the data.
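The rollover logic can be planned up front. A rough sketch, assuming zero-based row indexes and a `base-N.xls` naming scheme; both the naming scheme and the `FileSplitPlan` class are illustrative choices, not something the answer specifies:

```java
import java.util.ArrayList;
import java.util.List;

/** Plans how a row count is split into numbered output files of bounded size. */
final class FileSplitPlan {
    /** One output file and the half-open range of rows [firstRow, endRow) it holds. */
    static final class Part {
        final String fileName;
        final int firstRow;
        final int endRow;

        Part(String fileName, int firstRow, int endRow) {
            this.fileName = fileName;
            this.firstRow = firstRow;
            this.endRow = endRow;
        }

        @Override
        public String toString() {
            return fileName + " rows [" + firstRow + ", " + endRow + ")";
        }
    }

    /** Splits totalRows into consecutive parts of at most rowsPerFile rows each. */
    static List<Part> plan(String baseName, int totalRows, int rowsPerFile) {
        if (rowsPerFile <= 0) {
            throw new IllegalArgumentException("rowsPerFile must be positive");
        }
        List<Part> parts = new ArrayList<>();
        for (int first = 0; first < totalRows; first += rowsPerFile) {
            int end = Math.min(first + rowsPerFile, totalRows);
            // Files are numbered from 1 in the order they are produced.
            parts.add(new Part(baseName + "-" + (parts.size() + 1) + ".xls", first, end));
        }
        return parts;
    }

    public static void main(String[] args) {
        for (Part part : plan("report", 20_000, 5_000)) {
            System.out.println(part);
        }
    }
}
```

Each part would then be written with its own workbook, so only one part's rows are held in memory at a time; 20,000 rows at 5,000 per file yields four files.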
#4
3
Have a look at the HSSF serializer from the Cocoon project.
The HSSF serializer catches SAX events and creates a spreadsheet in the XLS format used by Microsoft Excel.
#5
2
There is also JExcelApi, but it uses more memory. I think you should create a .csv file and open it in Excel. That lets you pass a lot of data, but you won't be able to do any "Excel magic".
#6
1
Consider using the CSV format. That way you are no longer limited by memory (well, maybe only while preparing the data for the CSV, but that can be done efficiently too, for example by querying subsets of rows from the DB with LIMIT/OFFSET and writing them to the file immediately, instead of hauling the entire table into Java's memory before writing a single line). In Excel 2007 and later, the limit on the number of rows in one "sheet" is also higher, at about one million (1,048,576).
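The page-by-page approach might look like the sketch below. `PageSource` is a hypothetical stand-in for whatever runs the LIMIT/OFFSET query; here it is backed by an in-memory list so the example runs without a database. The quoting follows the common CSV convention of doubling embedded quotes.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

/** Writes rows to CSV one page at a time, so only one page is in memory at once. */
final class CsvPagedExport {
    /** Hypothetical data source: returns up to limit rows starting at offset. */
    interface PageSource {
        List<String[]> fetch(int offset, int limit);
    }

    /** Quotes a field when it contains a comma, a quote, or a line break. */
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        return needsQuotes ? "\"" + field.replace("\"", "\"\"") + "\"" : field;
    }

    /** Fetches pages until a short page arrives; returns the number of rows written. */
    static int export(PageSource source, int pageSize, Writer out) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        int offset = 0;
        try {
            while (true) {
                List<String[]> page = source.fetch(offset, pageSize);
                for (String[] row : page) {
                    for (int i = 0; i < row.length; i++) {
                        if (i > 0) {
                            out.write(',');
                        }
                        out.write(escape(row[i]));
                    }
                    out.write("\r\n");
                }
                offset += page.size();
                if (page.size() < pageSize) {
                    return offset; // a short (or empty) page means the data is exhausted
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // In-memory stand-in for a database table.
        List<String[]> table = new ArrayList<>();
        for (int i = 1; i <= 5; i++) {
            table.add(new String[] {String.valueOf(i), "item, " + i});
        }
        PageSource source = (offset, limit) -> table.subList(
                Math.min(offset, table.size()), Math.min(offset + limit, table.size()));
        StringWriter out = new StringWriter();
        System.out.println(export(source, 2, out) + " rows written");
        System.out.print(out);
    }
}
```

In real use the `Writer` would be a buffered file writer. Note that OFFSET-based paging can get slow on very large tables, so keyset paging on an indexed column may be worth considering.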
That said, if the data actually comes from a DB, I would seriously reconsider whether Java is the right tool for this. Most decent DBs have an export-to-CSV function that can undoubtedly do this task much more efficiently. In MySQL, for example, you can use the SELECT ... INTO OUTFILE statement for this (LOAD DATA INFILE is its counterpart for importing).
#7
1
We developed a Java library for this purpose, and it is currently available as an open source project: https://github.com/jbaliuka/x4j-analytic . We use it for operational reporting. We generate huge Excel files; ~200,000 rows should work without problems, and Excel manages to open such files too. Our code uses POI to load the template, but the generated content is streamed directly to the file without an XML or object model layer in memory.
#8
0
Does this memory issue happen when you insert data into cells, or when you perform data computation/generation?
If you are loading data into an Excel file that follows a predefined static template, it is better to save the template and reuse it multiple times. Template cases typically come up when you generate something like a daily sales report.
Otherwise, you need to create every new row, border, column, etc. from scratch each time.
So far, Apache POI is the only choice I found.
"Clearly, writing ~20k rows(with some 10-20 columns in each row) gives me the dreaded "java.lang.OutOfMemoryError: Java heap space"."
"Enterprise IT"
What you CAN do is perform batch data insertion. Create a queuetask table; each time after generating one page, pause for a few seconds, then continue with the next portion. If you are worried about the data changing while your queue task runs, you can first write the primary keys into the Excel file (hiding and locking that column from the user's view). The first run inserts the primary keys; from the second queue run onwards, read them back from notepad and do the task portion by portion.
#9
0
We did something quite similar with the same amount of data, and we had to switch to JExcelApi because POI is so heavy on resources. Try JExcelApi; you won't regret it when you have to manipulate big Excel files!