I have a data frame column of class character containing dates in different formats:
foo <- data.frame(Date = c("29-Jan-16", "29-Jan-16", "2/5/2016", "2/5/2016"), stringsAsFactors = FALSE)
I would like to convert the column Date to a vector of class Date. I can parse each format correctly on its own:
> as.Date( foo$Date, format = "%d-%b-%y" )
[1] "2016-01-29" "2016-01-29" NA NA
> as.Date( foo$Date, format = "%m/%d/%Y" )
[1] NA NA "2016-02-05" "2016-02-05"
So I tried to combine the two parsing calls with ifelse and grepl. Note that grepl correctly identifies the rows where the first format is used:
> grepl("-",foo$Date)
[1] TRUE TRUE FALSE FALSE
However, the complete expression doesn't work:
bar = foo
bar$Date=ifelse(grepl("-",foo$Date),
as.Date( foo$Date, format = "%d-%b-%y" ),
as.Date( foo$Date, format = "%m/%d/%Y" ))
> bar
Date
1 16829
2 16829
3 16836
4 16836
Questions:
- Can you help me understand what's happening?
- Even if I manage to fix my solution with your help (and learn more about R, which is great), it remains suboptimal. The "brilliant" person who populates the data frame may choose to use even more date formats (it has happened before, and it will likely happen again). I would then have to nest more ifelse calls and write more complex regexps, and the code would soon become nasty and unreadable. Isn't there a way to have R automatically find the right date format for each element of foo$Date?
1 answer
#1
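It would be easier to work with lubridate. If every value in 'Date' were in day, month, year order, we could use dmy. Note that dmy reads "2/5/2016" as 2 May 2016, whereas the question's "%m/%d/%Y" format treats it as 5 February, so this only fits if the slash-separated dates really are day-first.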
library(lubridate)
dmy(foo$Date)
#[1] "2016-01-29 UTC" "2016-01-29 UTC" "2016-05-02 UTC" "2016-05-02 UTC"
If the order varies between entries, we can also use guess_formats with parse_date_time.
with(foo, parse_date_time(Date,
orders=guess_formats(Date, c('dby', 'mdy'))))
#[1] "2016-01-29 UTC" "2016-01-29 UTC" "2016-02-05 UTC" "2016-02-05 UTC"
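If adding a lubridate dependency is not an option, the same idea (try several formats and keep the first one that parses) can be sketched in base R. This is only a minimal sketch: parse_mixed_dates is a hypothetical helper name, not an existing function, and the sketch assumes an English locale so that "%b" matches "Jan".

```r
foo <- data.frame(Date = c("29-Jan-16", "29-Jan-16", "2/5/2016", "2/5/2016"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: try each format in turn and fill only the
# entries that no earlier format managed to parse.
parse_mixed_dates <- function(x, formats) {
  out <- rep(as.Date(NA), length(x))   # Date vector of NAs
  for (fmt in formats) {
    todo <- is.na(out)                 # entries still unparsed
    if (!any(todo)) break
    out[todo] <- as.Date(x[todo], format = fmt)
  }
  out
}

# Formats are tried in the order given; "%b" is locale dependent.
res <- parse_mixed_dates(foo$Date, c("%d-%b-%y", "%m/%d/%Y"))
print(res)
```

Supporting a new format then means adding one string to the vector rather than nesting another ifelse. The order of the formats matters: an ambiguous string is parsed by the first format that accepts it.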
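Regarding the use of ifelse in the OP's code: ifelse builds its result from the logical test vector and does not keep the attributes of its yes and no arguments, so the Date class is dropped and only the underlying numbers (days since 1970-01-01) remain. That numeric output can be converted back to Date class: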
v1 <- ifelse(grepl("-",foo$Date),
as.Date( foo$Date, format = "%d-%b-%y" ),
as.Date( foo$Date, format = "%m/%d/%Y" ))
as.Date(v1, origin='1970-01-01')
#[1] "2016-01-29" "2016-01-29" "2016-02-05" "2016-02-05"
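The class loss can be checked directly, and assigning by logical index is one way to avoid the round trip through numbers. A minimal sketch reusing the question's data; the names is_dmy, d1, d2 and res are introduced here purely for illustration, and an English locale is assumed for "%b".

```r
foo <- data.frame(Date = c("29-Jan-16", "29-Jan-16", "2/5/2016", "2/5/2016"),
                  stringsAsFactors = FALSE)

is_dmy <- grepl("-", foo$Date)                  # rows in the first format
d1 <- as.Date(foo$Date, format = "%d-%b-%y")    # NA where the format does not match
d2 <- as.Date(foo$Date, format = "%m/%d/%Y")

# ifelse() drops the Date class and returns plain numbers
v1 <- ifelse(is_dmy, d1, d2)
class(v1)
# [1] "numeric"

# Index assignment into an existing Date vector keeps the class
res <- d2
res[is_dmy] <- d1[is_dmy]
class(res)
# [1] "Date"
```

Both routes give the same dates; the index assignment simply never leaves the Date class, so no origin has to be supplied.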