XLConnect in R: JVM memory usage

Date: 2022-07-11 07:19:24

I'm running into a problem with JVM memory using XLConnect (Mirai Solutions) in R.

Data loads into R just fine using loadWorkbook or readWorksheetFromFile, but larger data (data frames of about 3MB) gets stuck while being written to the JVM during export with any of the export functions (writeNamedRegion, writeWorksheetToFile, etc.), and R stops responding.

I've raised the JVM heap limit using options(java.parameters = "-Xmx1500m"). This increased the size of the data frames I was able to export to Excel, but R still slows down at around 1MB and stops working at around 3MB.

I'm on a 64-bit Windows 7 system with 32-bit Office software and 32-bit Java, on a machine with 8GB RAM. 3MB doesn't seem very big compared to the ~750 MB of free JVM memory that xlcMemoryReport reports at the beginning of the export.

Ideas?

1 solution

#1


Given your reference value of 3MB, I assume you are trying to write a data.frame of numeric variables with roughly 10 columns x 40k rows (or something comparable; object.size reports approx. 3.2MB for such a data.frame).
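The size estimate can be checked in base R without involving Java at all. This is a minimal sketch; the variable names size_bytes and size_mb are only illustrative, and the data are random numbers standing in for the asker's real data.

```r
# Build a data.frame of 40,000 rows x 10 numeric columns (random placeholder data)
tmp <- as.data.frame(matrix(rnorm(4e5), ncol = 10))
dim(tmp)

# object.size() reports the in-memory size of the R object in bytes
size_bytes <- as.numeric(object.size(tmp))
size_mb <- size_bytes / 1e6
print(round(size_mb, 1))
```

Note that this is only the footprint of the data inside R. The memory needed on the Java side to build the workbook can be many times larger.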

Memory requirements can differ considerably depending on whether you are writing xls (BIFF8) or xlsx (OOXML) files. The reason is that xlsx documents are actually compressed XML files, and Apache POI (the underlying Java API used by XLConnect) uses xmlbeans to manipulate them, which can be quite memory-intensive. BIFF8, on the other hand, is a binary data format and requires less memory.
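If the xlsx format is not a hard requirement, one thing to try is writing the same data to an xls file instead. The sketch below assumes that XLConnect picks the file format from the file extension and that the data fit within the xls row limit (65,536 rows); out_file is just an illustrative name, and the write is skipped when XLConnect or Java is not available.

```r
# The heap size must be set before the JVM is started (i.e. before XLConnect loads)
options(java.parameters = "-Xmx1024m")

# 40,000 rows x 10 numeric columns of placeholder data
tmp <- as.data.frame(matrix(rnorm(4e5), ncol = 10))

# A .xls extension is assumed to select the binary BIFF8 format
out_file <- file.path(tempdir(), "test.xls")

if (requireNamespace("XLConnect", quietly = TRUE)) {
  XLConnect::writeWorksheetToFile(tmp, file = out_file, sheet = "test")
  # Show JVM memory usage after the export
  XLConnect::xlcMemoryReport()
}
```

Comparing the memory report after an xls export with the one after an xlsx export of the same data should show how much of the heap the OOXML path consumes on a given machine.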

You should be able to write a data.frame of the dimensions mentioned above to an xlsx document with a maximum heap size of 1024m. The following worked fine for me:

options(java.parameters = "-Xmx1024m")  # required BEFORE any JVM is initialized in R
library(XLConnect)
# 40,000 rows x 10 numeric columns, approx. 3.2MB in R
tmp <- as.data.frame(matrix(rnorm(4e5), ncol = 10))
writeWorksheetToFile(tmp, file = "test.xlsx", sheet = "test")

... using R 2.15.1 32-bit with RStudio, XLConnect 0.2-0 and JRE 1.6.0_25 (running on 32-bit Windows XP with 4GB of RAM).
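Because the java.parameters option only takes effect if it is set before the JVM starts, a small guard can help catch the case where it is set too late in a session. The helper below, set_jvm_heap, is hypothetical and not part of XLConnect or rJava. It treats an already loaded rJava or XLConnect namespace as a conservative sign that the JVM may be running.

```r
# Hypothetical helper: set the JVM max heap only if no JVM-backed package is loaded yet
set_jvm_heap <- function(size = "1024m") {
  jvm_packages <- c("rJava", "XLConnect")
  if (any(jvm_packages %in% loadedNamespaces())) {
    warning("rJava/XLConnect already loaded; restart R and set java.parameters first.")
    return(invisible(FALSE))
  }
  # Same option as in the answer; read by rJava when the JVM is initialized
  options(java.parameters = paste0("-Xmx", size))
  invisible(TRUE)
}

# Call this at the top of a fresh R session, before library(XLConnect)
heap_was_set <- set_jvm_heap("1024m")
```

If the helper warns, changing the option in the running session is unlikely to help, since the heap limit is fixed when the JVM is created. Restarting R and setting the option first is the safer route.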

For a more in-depth discussion of memory usage on the Apache POI side, see: http://apache-poi.1045710.n5.nabble.com/HSSF-and-XSSF-memory-usage-some-numbers-td4312784.html
