Reading a pickle file (pandas Python DataFrame) in R

Time: 2022-04-27 20:23:01

Is there an easy way to read pickle files (.pkl) containing pandas DataFrames into R?


One possibility is to export to CSV and have R read the CSV, but that seems really cumbersome because my dataframes are rather large. Is there an easier way to do so?


Thanks!


1 solution

#1


6  

You could load the pickle in Python and then export it to R via the Python package rpy2 (or similar). Once you've done so, your data will exist in an R session linked to Python. I suspect that what you'd want to do next is use that session to call R's saveRDS and write the data to a file or a RAM disk. Then, in RStudio, you can read that file back in with readRDS. Look at the R packages rJython and rPython for ways to trigger the Python commands from R.
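A minimal sketch of the rpy2 route, assuming rpy2 3.x with its pandas converter and a working R installation; the helper name `pickle_to_rds` and the file paths are hypothetical. It loads the pickle with pandas and lets rpy2 convert the DataFrame to an R data.frame while calling R's `saveRDS`:

```python
import pandas as pd


def pickle_to_rds(pkl_path, rds_path):
    """Load a pickled pandas DataFrame and save it as an R .rds file via rpy2.

    Hypothetical helper; assumes rpy2 (with pandas support) and R are installed.
    """
    # Imported here so the rest of a script can run without rpy2/R available.
    import rpy2.robjects as ro
    from rpy2.robjects import pandas2ri
    from rpy2.robjects.conversion import localconverter

    # Step 1: load the data in Python.
    df = pd.read_pickle(pkl_path)

    # Inside this context, rpy2 converts the pandas DataFrame argument
    # to an R data.frame before calling R's saveRDS on it.
    with localconverter(ro.default_converter + pandas2ri.converter):
        ro.r["saveRDS"](df, file=rds_path)
```

After `pickle_to_rds("data.pkl", "data.rds")`, the file can be loaded in RStudio with `df <- readRDS("data.rds")`. Pointing `rds_path` at a RAM-backed location (for example `/dev/shm` on many Linux systems) is one way to get the RAM-disk variant mentioned above.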


Alternatively, you could write a simple Python script that loads your data in Python (probably using one of the R packages noted above) and writes a formatted data stream to stdout. The entire system call to that script (including the argument that specifies your pickle) can then be used as an argument to fread in the R package data.table. If you'd rather stick to standard functions, you could use a combination of system(..., intern=TRUE) and read.table.
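A minimal sketch of such a script, assuming pandas can unpickle the file; the script name `dump_pickle.py` and the function `dump_pickle_as_csv` are hypothetical. It writes the DataFrame as CSV to stdout so that R can read the stream on the other end:

```python
import sys

import pandas as pd


def dump_pickle_as_csv(pkl_path, out=None):
    """Load a pickled pandas DataFrame and write it to `out` as CSV.

    `out` defaults to sys.stdout so the output can be piped into R.
    """
    if out is None:
        out = sys.stdout
    df = pd.read_pickle(pkl_path)
    # index=False omits the pandas row index from the output.
    df.to_csv(out, index=False)


# Run as: python dump_pickle.py FILE.pkl
if __name__ == "__main__" and len(sys.argv) > 1:
    dump_pickle_as_csv(sys.argv[1])
```

On the R side, `data.table::fread(cmd = "python dump_pickle.py data.pkl")` should read that stream (data.table takes shell commands through fread's `cmd` argument), and a base-R equivalent is `read.csv(text = system("python dump_pickle.py data.pkl", intern = TRUE))`. If the pandas index holds data you need, call `reset_index()` on the DataFrame before writing it.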


As usual, there are /many/ ways to skin this particular cat. The basic steps are:


  1. Load the data in Python.
  2. Express the data to R (e.g., by exporting the object via rpy2, or by writing formatted text to stdout with R ready to receive it on the other end).
  3. Convert the expressed data into R's internal data representation (e.g., via the rpy2 export or via fread).
  4. (Optional) Make the data in that R session accessible to another R session (i.e., the step that closes the loop with rpy2; if you've been using fread, you're already done).
