I use RMySQL and a MySQL database to store my datasets. Sometimes data gets revised or I store results back to the database as well. Long story short, there is quite some interaction between R and the database in my use case.
Most of the time I use convenience functions like dbWriteTable and dbReadTable to write and read my data. Unfortunately, these completely ignore R data types and MySQL field types. I would expect MySQL date fields to end up in a Date or POSIXct class. Conversely, I would expect these R classes to be stored as a roughly corresponding MySQL field type. That means a date should not be stored as character; I do not expect a distinction between floats and doubles here.
I also tried dbGetQuery, with the same result. Have I completely missed something in the manual, or is this simply not possible (yet) in these packages? What would be a nice workaround?
EDIT: @mdsummer I tried to find more in the documentation, but found only these disappointing lines:
"MySQL tables are read into R as data.frames, but without coercing character or logical data into factors. Similarly while exporting data.frames, factors are exported as character vectors.
Integer columns are usually imported as R integer vectors, except for cases such as BIGINT or UNSIGNED INTEGER which are coerced to R's double precision vectors to avoid truncation (currently R's integers are signed 32-bit quantities).
Time variables are imported/exported as character data, so you need to convert these to your favorite date/time representation."
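The manual conversion that the quoted documentation asks for could look roughly like the sketch below. It needs no database: the data frame `raw` is a made-up stand-in for what dbReadTable might return, the column names are invented, and the UTC time zone is an assumption.

```r
# Made-up stand-in for a table read from MySQL: date/time columns arrive as character.
raw <- data.frame(
  id    = 1:3,
  day   = c("2012-01-31", "2012-02-29", "2012-03-31"),
  stamp = c("2012-01-31 08:15:00", "2012-02-29 12:00:00", "2012-03-31 23:59:59"),
  stringsAsFactors = FALSE
)

# Import direction: character -> Date / POSIXct.
# The strings use the ISO-like layout of MySQL DATE and DATETIME values,
# so the default parsers work; tz = "UTC" is an assumption for this example.
converted <- raw
converted$day   <- as.Date(raw$day)
converted$stamp <- as.POSIXct(raw$stamp, tz = "UTC")

# Export direction: Date / POSIXct -> character in the same layout.
exported <- converted
exported$day   <- format(converted$day, "%Y-%m-%d")
exported$stamp <- format(converted$stamp, "%Y-%m-%d %H:%M:%S")

str(converted)
```

Once the columns are real Date and POSIXct vectors, date arithmetic and sorting behave as expected, and formatting them back to character gives the original strings again.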
2 Solutions
#1
5
OK, I have a working solution now. Here's a function that maps MySQL field types to R classes. It helps in particular with handling the MySQL field type date.
library(RMySQL)  # also attaches DBI

dbReadMap <- function(con, table) {
  # Field names and MySQL types, as reported by DESCRIBE
  desc <- dbGetQuery(con, paste0("DESCRIBE ", table))[, 1:2]
  # Drop row_names: dbReadTable() turns it into the row names of the
  # data.frame, so it is not a real column in the result
  desc <- desc[desc$Field != "row_names", ]
  # Keep only the base type name, e.g. "int(11) unsigned" -> "int"
  desc$Type <- sub("[^a-z].*$", "", desc$Type)
  # Dictionary: MySQL field type -> R conversion function
  # (DESCRIBE reports CHARACTER columns as "char")
  fieldtype_to_rclass <- data.frame(
    fieldtypes = c("int", "tinyint", "bigint", "float", "double",
                   "date", "char", "varchar", "text"),
    rclasses   = c("as.numeric", "as.numeric", "as.numeric", "as.numeric", "as.numeric",
                   "as.Date", "as.character", "as.character", "as.character"),
    stringsAsFactors = FALSE
  )
  map <- merge(fieldtype_to_rclass, desc, by.x = "fieldtypes", by.y = "Type")
  # Read the data, then convert every column whose type is in the dictionary
  res <- dbReadTable(con, table)
  for (i in seq_len(nrow(map))) {
    convert <- match.fun(map$rclasses[i])
    res[[map$Field[i]]] <- convert(res[[map$Field[i]]])
  }
  res
}
Maybe this is not good programming practice; I just don't know any better. So use it at your own risk, or help me improve it. And of course it's only half of the job: reading. Hopefully I'll find some time to write a writing function soon.
If you have suggestions for the mapping dictionary let me know :)
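For the missing writing half, one possible starting point is to pick a MySQL field type per column from its R class. This is only a sketch: `mysql_field_types` is a hypothetical helper that is not part of RMySQL, the type choices are just one reasonable mapping, and handing the result to dbWriteTable through a `field.types` argument is an assumption to check against your RMySQL version.

```r
# Hypothetical helper (not part of RMySQL): choose a MySQL field type
# for each column of a data.frame, based on the column's R class.
mysql_field_types <- function(df) {
  pick <- function(x) {
    if (is.factor(x) || is.character(x)) return("text")
    if (inherits(x, "Date"))             return("date")
    if (inherits(x, "POSIXct"))          return("datetime")
    if (is.logical(x))                   return("tinyint")
    if (is.integer(x))                   return("int")
    if (is.numeric(x))                   return("double")
    "text"  # fallback for any other class
  }
  # Returns a character vector named after the columns of df
  vapply(df, pick, character(1))
}

# Made-up example data
example_df <- data.frame(
  day   = as.Date(c("2012-01-31", "2012-02-29")),
  price = c(1.5, 2.25),
  count = c(3L, 4L),
  label = c("a", "b"),
  stringsAsFactors = FALSE
)

field_types <- mysql_field_types(example_df)
print(field_types)

# Assumed usage, untested here because it needs a live connection `con`:
# dbWriteTable(con, "mytable", example_df,
#              field.types = as.list(field_types), row.names = FALSE)
```

Keeping the mapping in a separate function means it can be checked without a database and extended type by type, in the same spirit as the reading dictionary.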
#2
1
Here is a more generic version of @Matt Bannert's function that works with queries instead of tables:
library(RMySQL)  # also attaches DBI

# Extension of dbGetQuery that understands MySQL data types
dbGetQuery2 <- function(con, query) {
  # Materialise the query as a temporary table so that DESCRIBE can report
  # its column types. `query` must be a SELECT statement, and it runs twice.
  dbClearResult(dbSendQuery(con, paste0("CREATE TEMPORARY TABLE `temp` ", query)))
  desc <- dbGetQuery(con, "DESCRIBE `temp`")[, 1:2]
  dbClearResult(dbSendQuery(con, "DROP TEMPORARY TABLE `temp`"))
  # Leave a row_names column, if present, out of the conversion
  desc <- desc[desc$Field != "row_names", ]
  # Keep only the base type name, e.g. "int(11) unsigned" -> "int"
  desc$Type <- sub("[^a-z].*$", "", desc$Type)
  # Dictionary: MySQL field type -> R conversion function
  # (DESCRIBE reports CHARACTER columns as "char")
  fieldtype_to_rclass <- data.frame(
    fieldtypes = c("int", "tinyint", "bigint", "float", "double",
                   "date", "char", "varchar", "text"),
    rclasses   = c("as.numeric", "as.numeric", "as.numeric", "as.numeric", "as.numeric",
                   "as.Date", "as.character", "as.factor", "as.character"),
    stringsAsFactors = FALSE
  )
  map <- merge(fieldtype_to_rclass, desc, by.x = "fieldtypes", by.y = "Type")
  # Run the query, then convert every column whose type is in the dictionary
  res <- dbGetQuery(con, query)
  for (i in seq_len(nrow(map))) {
    convert <- match.fun(map$rclasses[i])
    res[[map$Field[i]]] <- convert(res[[map$Field[i]]])
  }
  res
}
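To try out a mapping dictionary without a MySQL server, the conversion step can be separated from the database calls. The sketch below is an illustration under assumptions, not part of either answer: `coerce_by_mysql_type` is a hypothetical helper, and `desc` and `res` are made-up stand-ins for the output of DESCRIBE and of a query.

```r
# Hypothetical helper: coerce the columns of `res` according to a
# DESCRIBE-style data.frame `desc` with the columns Field and Type.
coerce_by_mysql_type <- function(res, desc) {
  converters <- list(
    int = as.numeric, tinyint = as.numeric, bigint = as.numeric,
    float = as.numeric, double = as.numeric,
    date = as.Date,
    char = as.character, varchar = as.character, text = as.character
  )
  # Keep only the leading type name: "int(11) unsigned" -> "int"
  base_type <- sub("[^a-z].*$", "", tolower(desc$Type))
  for (k in seq_len(nrow(desc))) {
    field   <- desc$Field[k]
    convert <- converters[[base_type[k]]]
    # Skip types missing from the dictionary and fields missing from res
    if (!is.null(convert) && field %in% names(res)) {
      res[[field]] <- convert(res[[field]])
    }
  }
  res
}

# Made-up DESCRIBE output and query result, everything as character
desc <- data.frame(
  Field = c("id", "day", "label", "price"),
  Type  = c("int(11)", "date", "varchar(20)", "double"),
  stringsAsFactors = FALSE
)
res <- data.frame(
  id    = c("1", "2"),
  day   = c("2012-01-31", "2012-02-01"),
  label = c("a", "b"),
  price = c("1.5", "2"),
  stringsAsFactors = FALSE
)

out <- coerce_by_mysql_type(res, desc)
print(sapply(out, class))
```

Storing the converters as functions in a named list avoids building calls from strings, and types that are not in the list are simply left as they came back from the database.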