I am developing a web application that reads data from an Excel (.xlsx) file, and I am using Apache POI to read the sheet. When I try to read the file, the server throws the following error:
The Excel file I am trying to read is almost 80 MB. Is there any solution to this problem?
The user uploads the file; the application saves it to disk and then tries to read it. The code snippet I am using for testing is:
File savedFile = new File(file_path);
FileInputStream fis = null;
try {
    fis = new FileInputStream(savedFile);
    // Loads the entire workbook into memory
    XSSFWorkbook xWorkbook = new XSSFWorkbook(fis);
    XSSFSheet xSheet = xWorkbook.getSheetAt(5); // sheet index is 0-based
    Iterator rows = xSheet.rowIterator();
    while (rows.hasNext()) {
        XSSFRow row = (XSSFRow) rows.next();
        Iterator cells = row.cellIterator();
        List data = new ArrayList();
        while (cells.hasNext()) {
            XSSFCell cell = (XSSFCell) cells.next();
            System.out.println(cell.getStringCellValue());
            data.add(cell);
        }
    }
} catch (IOException e) {
    e.printStackTrace();
}
7 answers
#1
3
One thing that will make a small difference is how you open the file in the first place. If you have a File, pass that in! Using an InputStream requires buffering everything into memory, which eats up space. Since you don't need that buffering, don't do it!
If you're running one of the latest nightly builds of POI, it's very easy. Your code becomes:
File file = new File(file_path);
OPCPackage opcPackage = OPCPackage.open(file);
XSSFWorkbook workbook = new XSSFWorkbook(opcPackage);
Otherwise, it's very similar:
File file = new File(file_path);
OPCPackage opcPackage = OPCPackage.open(file.getAbsolutePath());
XSSFWorkbook workbook = new XSSFWorkbook(opcPackage);
That will free up a bit of memory, which might be enough. If it isn't, and you can't increase your Java heap space enough to cope, then you'll have to stop using the XSSF UserModel.
In addition to the friendly UserModel that you've been using, POI also supports a lower-level way to process files. It is harder to use, because you lose the various helpers that depend on having the whole file in memory. However, it is much more memory efficient, because you process the file in a streaming fashion. To get started, see the XSSF and SAX (Event API) How-To section on the POI website. Try that out, and also have a look at the various examples.
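The streaming idea can be illustrated without POI at all: a worksheet inside an .xlsx package is an XML document, and a SAX handler sees one element at a time instead of a whole object tree. The sketch below is a minimal, standard-library-only illustration and not POI's API; the class name `SheetValueReader` and the sample XML used to try it are made up for the example. With POI, the sheet stream would come from `XSSFReader`, and string cells (`t="s"`) would additionally need a lookup in the shared strings table, as the How-To describes.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: streams worksheet XML and collects the raw <v> values row by row.
class SheetValueReader extends DefaultHandler {
    private final List<List<String>> rows = new ArrayList<>();
    private List<String> currentRow;
    private StringBuilder value; // non-null only while inside a <v> element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("row".equals(qName)) {
            currentRow = new ArrayList<>();
        } else if ("v".equals(qName)) {
            value = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text may arrive in several chunks, so append rather than assign.
        if (value != null) {
            value.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("v".equals(qName)) {
            currentRow.add(value.toString());
            value = null;
        } else if ("row".equals(qName)) {
            // Collected here for demonstration; a real importer would process
            // the finished row immediately and let it be garbage collected.
            rows.add(currentRow);
        }
    }

    // Parses the stream and returns the collected rows; checked exceptions are wrapped for brevity.
    static List<List<String>> read(InputStream sheetXml) {
        try {
            SheetValueReader handler = new SheetValueReader();
            SAXParserFactory.newInstance().newSAXParser().parse(sheetXml, handler);
            return handler.rows;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Convenience overload for trying the handler on an in-memory XML string.
    static List<List<String>> read(String sheetXml) {
        return read(new ByteArrayInputStream(sheetXml.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Only the row currently being parsed needs to be held, so memory use depends on the widest row rather than on the size of the file, provided each row is handed off as soon as its closing tag arrives.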
#2
2
You should probably change the settings of your JVM. Try adding -Xmx1024m -Xms1024m to the launcher. Note the "m" suffix: without a unit, the value is interpreted as bytes.
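To confirm that the new limit actually reached the server process, the application can log its own heap ceiling at startup. A minimal sketch (the class name `HeapCheck` is made up for illustration); with `-Xmx1024m` the reported figure should be in the neighbourhood of 1024, though the exact number can be somewhat lower depending on the JVM and garbage collector.

```java
// Hypothetical helper: reports the heap ceiling the running JVM actually received.
class HeapCheck {
    // Maximum amount of memory the JVM will attempt to use, in whole megabytes.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // Human-readable form, suitable for a startup log line.
    static String describe() {
        return "Max heap: " + maxHeapMegabytes() + " MB";
    }
}
```

Printing `HeapCheck.describe()` once from the web application is usually enough to tell whether the option was picked up by the process that runs your code.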
#3
1
You could try to increase your Java heap size.
#4
1
I think you have to increase the size of the heap. You can do it by editing the catalina.bat file: add -Xms1024m -Xmx1024m to the CATALINA_OPTS variable.
- Xms = initial Java heap size
- Xmx = maximum Java heap size
EDIT: from catalina.bat:
rem CATALINA_OPTS (Optional) Java runtime options used when the "start",
rem "run" or "debug" command is executed.
rem Include here and not in JAVA_OPTS all options, that should
rem only be used by Tomcat itself, not by the stop process,
rem the version command etc.
rem Examples are heap size, GC logging, JMX ports etc.
#5
0
I have solved the problem by changing the implementation. Originally I fetched all the data from the Excel file and stored it in an ArrayList, and only afterwards inserted it into the DB; that was the real problem. Now I do not store the data at all. As soon as I get one record from the ResultSet, I insert it into the DB immediately instead of adding it to an ArrayList. I know that one-by-one insertion is not a good approach, but for the time being I am using it. If I find a better one in the future, I will definitely switch to it. Thanks to all.
#6
0
An improvement to your current approach would be to read around 100 rows at a time from the Excel file (experiment with this figure to find the optimum value) and do a batch update in the database. This will be faster.
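The batching idea can be sketched independently of the database. The helper below is hypothetical (the name `RowBatcher` and its callback design are assumptions made for illustration): it buffers rows and hands them to a sink whenever the batch size is reached. In the real application the sink would presumably call `PreparedStatement.addBatch()` for each row and then `executeBatch()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: buffers rows and hands them to a sink in fixed-size batches.
class RowBatcher<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private final List<T> buffer;

    RowBatcher(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
        this.buffer = new ArrayList<>(batchSize);
    }

    // Adds one row and flushes automatically once the buffer is full.
    void add(T row) {
        buffer.add(row);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Sends any buffered rows to the sink; call once more after the last row.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer)); // copy, so the sink may keep the list
        buffer.clear();
    }
}
```

At most `batchSize` rows are held in memory at any time, so the figure of 100 can be tuned without touching the reading loop.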
You can also make some optimizations in your code. Move the list creation out of the outer loop (the loop that reads the row data):
List data = new ArrayList();
Read the contents of all the cells in a row into a string buffer (possibly delimited with commas) and then add that string to the ArrayList "data".
You are adding objects of type XSSFCell to the ArrayList. There is no point in storing the whole Excel cell object. Take out its contents and discard the object.
Later, before inserting the contents into the database, you can split the delimited cell contents and perform the insertion.
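The join-then-split round trip described above can be sketched as follows. The class name `RowCodec` is made up for illustration, and the sketch assumes no cell value contains the delimiter itself; if a cell can contain a comma, a different delimiter or proper escaping would be needed.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: flattens a row to one delimited string and restores it later.
class RowCodec {
    private static final String DELIMITER = ",";

    // Joins the cell values of one row into a single string.
    static String join(List<String> cellValues) {
        return String.join(DELIMITER, cellValues);
    }

    // Splits a joined row back into its cell values.
    static List<String> split(String line) {
        // A limit of -1 keeps trailing empty cells, which split(",") alone would drop.
        return Arrays.asList(line.split(DELIMITER, -1));
    }
}
```

Storing one plain String per row instead of a list of cell objects lets the POI objects be garbage collected as soon as the row has been read.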
Hope this helps!
#7
-1
You had better store them in a file and load them into the database at the end. This will avoid single insert a