Can I use rpy2 to save a pandas DataFrame to an .RData file?

Date: 2021-05-19 15:49:06

I have never used rpy2 before, but I am wondering whether I could use it to save a Python object (a pandas DataFrame) in an R-readable file. I am having trouble moving objects between these environments, mainly because I'm using Windows and the data source is an Excel file. Yes, the kind that has cells with text including inverted commas, newlines, and all the stuff that CSV can't handle adequately.

I usually rely on XLConnectJars, but it seems to be broken:

Installing package(s) into ‘C:/Program Files/R/library’
(as ‘lib’ is unspecified)
trying URL 'http://cran.csiro.au/bin/windows/contrib/2.15/XLConnectJars_0.2-4.zip'
Content type 'application/zip' length 16538311 bytes (15.8 Mb)
opened URL
downloaded 15.3 Mb

Warning in install.packages :
  downloaded length 16011264 != reported length 16538311

pandas reads it properly, but I need to use the information in R.

2 solutions

#1

You can use rpy2 to do this. Once you have the data in a pandas DataFrame, you have to transfer it to R. This link provides an experimental interface between pandas and R data.frames. A code example copied from the website:

from pandas import DataFrame
# pandas.rpy was deprecated and has since been removed from pandas,
# so this import only works with an old pandas release
import pandas.rpy.common as com

df = DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]},
               index=["one", "two", "three"])
r_dataframe = com.convert_to_r_dataframe(df)

print(type(r_dataframe))
# <class 'rpy2.robjects.vectors.DataFrame'>

print(r_dataframe)
#       A B C
# one   1 4 7
# two   2 5 8
# three 3 6 9

#2

Here is how you write and read .RData files with rpy2 (since the accepted solution is deprecated and doesn't show how to save to an .RData file):

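from rpy2 import robjects
from rpy2.robjects import pandas2ri
from rpy2.robjects.conversion import localconverter

# pandas-aware converter (rpy2 >= 3.0, where ri2py/py2ri no longer exist)
pandas_converter = robjects.default_converter + pandas2ri.converter

# read .RData file as a pandas dataframe
def load_rdata_file(filename):
    # load() restores the saved objects and returns their names;
    # get() then fetches the first one
    r_data = robjects.r['get'](robjects.r['load'](filename)[0])
    with localconverter(pandas_converter):
        df = pandas_converter.rpy2py(r_data)
    return df

# write pandas dataframe to an .RData file
def save_rdata_file(df, filename):
    with localconverter(pandas_converter):
        r_data = pandas_converter.py2rpy(df)
    robjects.r.assign("my_df", r_data)
    # R accepts forward slashes, which avoids escape problems with Windows paths
    robjects.r("save(my_df, file='{}')".format(filename.replace('\\', '/')))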
