Converting a data frame column containing different date formats to Date objects

Time: 2022-07-04 16:32:41

I have a data frame column of class character containing different date formats:

foo <- data.frame(Date = c("29-Jan-16", "29-Jan-16", "2/5/2016", "2/5/2016"), stringsAsFactors = FALSE)

I would like to convert the column Date to a vector of class Date objects. I can correctly parse each format separately:

> as.Date( foo$Date, format = "%d-%b-%y" )
[1] "2016-01-29" "2016-01-29" NA           NA          
> as.Date( foo$Date, format = "%m/%d/%Y" )
[1] NA           NA           "2016-02-05" "2016-02-05"

So I thought I would combine the two parsing instructions with ifelse and grepl. Note that grepl correctly identifies the rows where the first format is used:

> grepl("-",foo$Date)
[1]  TRUE  TRUE FALSE FALSE

However, the complete instruction doesn't work:

bar = foo
bar$Date=ifelse(grepl("-",foo$Date),
                      as.Date( foo$Date, format = "%d-%b-%y" ),
                      as.Date( foo$Date, format = "%m/%d/%Y" ))

> bar
   Date
1 16829
2 16829
3 16836
4 16836

Questions:

  1. Can you help me understand what's happening?
  2. Even if I manage to fix my solution with your help, and learn more about R along the way, which is great, the solution remains suboptimal. The reason is that the "brilliant" person who populates the data frame may choose to use even more date formats (it has happened before, and it will likely happen again). I would then have to nest more ifelse calls and write more complex regular expressions, and the code would soon become nasty and unreadable. Isn't there a way to have R automatically find the right date format for each element of foo$Date?

1 solution

#1

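It would be easier to work with lubridate. Assuming that every value in 'Date' is in the order day, month, year, we can use dmy. Note that under this assumption "2/5/2016" is read as 2 May 2016, not as 5 February 2016 as in the question's %m/%d/%Y format.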

library(lubridate)
dmy(foo$Date)
#[1] "2016-01-29 UTC" "2016-01-29 UTC" "2016-05-02 UTC" "2016-05-02 UTC"

In case there are other variations in the order, we can also use guess_formats with parse_date_time.

 with(foo, parse_date_time(Date, 
         orders=guess_formats(Date, c('dby', 'mdy'))))
 #[1] "2016-01-29 UTC" "2016-01-29 UTC" "2016-02-05 UTC" "2016-02-05 UTC"
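
If lubridate is not available, the same idea can be sketched in base R: try each candidate format in turn and keep the first one that parses. The helper name parse_mixed_dates and its arguments are hypothetical, not part of any package, and %b assumes an English (or C) locale for month abbreviations such as "Jan".

```r
# Hypothetical helper: parse a character vector against several strptime
# formats, keeping the first format that yields a non-NA date per element.
parse_mixed_dates <- function(x, formats) {
  # Start from an all-NA Date vector so the result keeps class "Date".
  out <- rep(as.Date(NA), length(x))
  for (fmt in formats) {
    # Only retry elements that are still unparsed and not missing in the input.
    todo <- is.na(out) & !is.na(x)
    if (!any(todo)) break
    out[todo] <- as.Date(x[todo], format = fmt)
  }
  out
}

foo <- data.frame(Date = c("29-Jan-16", "29-Jan-16", "2/5/2016", "2/5/2016"),
                  stringsAsFactors = FALSE)
parsed <- parse_mixed_dates(foo$Date, c("%d-%b-%y", "%m/%d/%Y"))
```

Supporting a new format then means appending one string to the vector of formats rather than nesting another ifelse. The order of the formats matters, because an ambiguous string is resolved by the first format that accepts it, and elements that match no format stay NA.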

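Regarding the use of ifelse in the OP's code: ifelse does not keep the attributes of its yes/no arguments, so the Date class is dropped and we get the underlying numbers (days since 1970-01-01). That numeric output can be converted back to Date class: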

v1 <- ifelse(grepl("-",foo$Date),
                  as.Date( foo$Date, format = "%d-%b-%y" ),
                  as.Date( foo$Date, format = "%m/%d/%Y" ))

as.Date(v1, origin='1970-01-01')
#[1] "2016-01-29" "2016-01-29" "2016-02-05" "2016-02-05"
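
As an alternative to converting back afterwards, a minimal sketch that avoids ifelse altogether is to allocate a Date vector first and fill it by logical index, so the class is never lost. The names has_dash and parsed are illustrative, and %b again assumes an English (or C) locale.

```r
foo <- data.frame(Date = c("29-Jan-16", "29-Jan-16", "2/5/2016", "2/5/2016"),
                  stringsAsFactors = FALSE)

# TRUE for rows written in the day-month-year style with dashes.
has_dash <- grepl("-", foo$Date)

# Pre-allocate an all-NA Date vector; subset assignment preserves its class.
parsed <- rep(as.Date(NA), nrow(foo))
parsed[has_dash]  <- as.Date(foo$Date[has_dash],  format = "%d-%b-%y")
parsed[!has_dash] <- as.Date(foo$Date[!has_dash], format = "%m/%d/%Y")

bar <- foo
bar$Date <- parsed
```

Here class(bar$Date) is "Date", so no origin has to be supplied.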
