(By object-relational mapping, I mean what is described here: Wikipedia: Object-relational mapping.)
Here is how I could imagine this working in R: a kind of "virtual data frame" is linked to a database and returns the results of SQL queries when accessed. For instance, head(virtual_list) would actually return the results of (select * from mapped_table limit 5) on the mapped database.
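A minimal sketch of the idea, not an existing package: it assumes the DBI and RSQLite packages are installed and uses an in-memory SQLite database as a stand-in for the mapped database. The constructor virtual_df() and the class name are hypothetical, introduced only for illustration.

```r
library(DBI)

# Hypothetical constructor: the handle stores a connection and a table name,
# not the data itself.
virtual_df <- function(con, table) {
  structure(list(con = con, table = table), class = "virtual_df")
}

# S3 method: head() on the handle is turned into a LIMIT query.
head.virtual_df <- function(x, n = 6L, ...) {
  sql <- paste0("SELECT * FROM ", dbQuoteIdentifier(x$con, x$table),
                " LIMIT ", as.integer(n))
  dbGetQuery(x$con, sql)
}

# In-memory SQLite database standing in for the mapped database (assumption).
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mapped_table", iris)

virtual_list <- virtual_df(con, "mapped_table")
first_rows <- head(virtual_list, 5)  # runs a SELECT ... LIMIT 5 query

dbDisconnect(con)
```

Only head() is mapped here; a usable version would need methods for `[`, `$`, dim() and so on, which is where most of the work would be.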
I have found this post by John Myles White, but there seems to have been no progress in the last 3 years.
Is there a working package that implements this?
If not,
- Would it be useful?
- What would be the best way to implement it (S4?)?
6 Solutions
#1
10
The very recent package dplyr is implementing this (amongst other amazing features).
Here are illustrations from the examples of function src_mysql():
# Connection basics ---------------------------------------------------------
# To connect to a database first create a src:
my_db <- src_mysql(host = "blah.com", user = "hadley",
  password = "pass")
# Then reference a tbl within that src
my_tbl <- tbl(my_db, "my_table")
# Methods -------------------------------------------------------------------
batting <- tbl(lahman_mysql(), "Batting")
dim(batting)
colnames(batting)
head(batting)
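The example above needs a running MySQL server. As a rough sketch that can be tried locally, the same tbl() interface also works on a DBI connection; this assumes dplyr, dbplyr, DBI and RSQLite are installed, and the in-memory SQLite database and the table name mapped_table are stand-ins chosen for illustration.

```r
library(dplyr)

# In-memory SQLite stands in for the MySQL server (assumption for this sketch).
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "mapped_table", mtcars)

# tbl() returns a lazy reference to the table; no rows are fetched yet.
cars_tbl <- tbl(con, "mapped_table")

# The filter is translated to SQL and evaluated by the database;
# collect() pulls the result into an ordinary data frame.
heavy <- cars_tbl %>%
  filter(wt > 5) %>%
  collect()

DBI::dbDisconnect(con)
```

Calling show_query() on the pipeline before collect() prints the SQL that dplyr generated.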
#2
7
There is an old unsupported package, SQLiteDF, that does that. Build it from source and ignore the numerous error messages.
> # from example(sqlite.data.frame)
>
> library(SQLiteDF)
> iris.sdf <- sqlite.data.frame(iris)
> iris.sdf$Petal.Length[1:10] # $ done via SQL
[1] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5
#3
2
Looks like John Myles White has given up on it.
There is a bit of a workaround explained here.
#4
1
I don't think it would be useful. R is not a real OOP language. The "central" data structure in R is the data frame, so there is no need for object-relational mapping here. What you want is a mapping between SQL tables and data frames, and RMySQL and RODBC provide just that:
in RMySQL, dbGetQuery returns the results of a query as a data frame, and dbWriteTable inserts data into a table or does a bulk update (from a data frame). RODBC's counterparts are sqlQuery and sqlSave.
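A short sketch of that round trip, assuming the DBI and RSQLite packages are installed. SQLite in memory is used here only so the example runs without a server; with RMySQL the dbGetQuery and dbWriteTable calls would look the same. The table names are made up for illustration.

```r
library(DBI)

# In-memory SQLite database as a stand-in for a MySQL server (assumption).
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mapped_table", iris)

# SQL table -> data frame
big <- dbGetQuery(con, 'SELECT * FROM mapped_table WHERE "Sepal.Length" > 7')

# Work on the data frame with ordinary R code.
big$ratio <- big$Sepal.Length / big$Sepal.Width

# data frame -> SQL table
dbWriteTable(con, "big_flowers", big, overwrite = TRUE)
tables <- dbListTables(con)

dbDisconnect(con)
```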
#5
1
Besides the various driver packages for querying databases (DBI, RODBC, RJDBC, RMySQL, ...) and dplyr, there's also sqldf: https://cran.r-project.org/web/packages/sqldf/
It automatically imports the data frames into a temporary database and lets you query the data via SQL. At the end the database is deleted.
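For example, a minimal sketch assuming the sqldf package is installed and using R's built-in iris data frame:

```r
library(sqldf)

# The data frame named in the FROM clause is copied into a temporary
# database, the query runs there, and the result comes back as a data frame.
setosa <- sqldf("SELECT * FROM iris WHERE Species = 'setosa' LIMIT 5")
```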
#6
0
As an experienced R user, I would not use this. First off, this 'virtual frame' would be slow to use, since you constantly need to synchronize between R memory and the database. It would also require locking the database table, since otherwise you have unpredictable results due to other edits happening at the same time.
Finally, I do not think R is suited for implementing a different evaluation of promise objects. Doing myFrame$foo[ myFrame$foo > 40 ] will still fetch the full foo column, since you cannot possibly implement a full translation scheme from R to SQL.
Therefore, I prefer to load a data frame from a query, use it, and write it back to the database if required.