Checking for differences among the DATEs inside one variable in R

Time: 2022-03-01 15:48:48

I want to split a row when its Date variable contains dates from different YEARs, and also split the "Price" column evenly by the number of dates that appear in that row, i.e. count(";") + 1.

Here is a table in which the variable has not yet been split.

# Dataset called df

Price   Date 
500     2016-01-01
400     2016-01-03;2016-01-09
1000    2016-01-04;2017-09-01;2017-08-10;2018-01-01
25      2016-01-04;2017-09-01
304     2015-01-02
238     2018-01-02;2018-02-02

Desired output

# Targeted df
Price   Date 
500     2016-01-01
400     2016-01-03;2016-01-09
250     2016-01-04
250     2017-09-01
250     2017-08-10
250     2018-01-01
12.5    2016-01-04
12.5    2017-09-01
304     2015-01-02
238     2018-01-02;2018-02-02
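
For anyone who wants to experiment, here is a minimal sketch that recreates the input table and the count(";") + 1 arithmetic. It assumes Price is numeric and Date is a character column; the names n_dates and even_share are made up for illustration.

```r
# Recreate the sample input from the question.
# stringsAsFactors = FALSE keeps Date as character rather than factor.
df <- data.frame(
  Price = c(500, 400, 1000, 25, 304, 238),
  Date = c("2016-01-01",
           "2016-01-03;2016-01-09",
           "2016-01-04;2017-09-01;2017-08-10;2018-01-01",
           "2016-01-04;2017-09-01",
           "2015-01-02",
           "2018-01-02;2018-02-02"),
  stringsAsFactors = FALSE
)

# Number of dates per row; equivalent to counting ";" and adding 1.
n_dates <- lengths(strsplit(df$Date, ";", fixed = TRUE))

# Price share per date if a row were split evenly (hypothetical helper column).
even_share <- df$Price / n_dates
```

Note that in the desired output only the rows whose dates span more than one year are actually split, so even_share is used for rows 3 and 4 only.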

Once the rows whose variable contains different years have been identified, the operation below is what needs to be done. (It is just an example.)

mutate(Price = ifelse(DIFFERENT_DATE_ROW,
                     as.numeric(Price) / (str_count(Date,";")+1),
                     as.numeric(Price)),
       Date = ifelse(DIFFERENT_DATE_ROW,
                     strsplit(as.character(Date),";"),
                     Date)) %>%
 unnest()

I ran into a constraint: I cannot use dplyr's "if_else" function here, because the "else do nothing" branch is not accepted. Only ifelse works properly.

How can I find out whether one variable contains dates from different years, in order to trigger the row split and the price split calculation?

So far, an operation that splits the elements, such as

unlist(lapply(unlist(strsplit(df1$noFDate[8],";")),FUN = year))

does not solve the problem.
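
The attempt above looks at a single element only. As a rough sketch of how the same idea might be applied to every row at once, the snippet below uses base R only (format(as.Date(...), "%Y") in place of year). The vector date_col and the names n_years and spans_years are hypothetical, and the dates are assumed to be in YYYY-MM-DD form.

```r
# Hypothetical sample vector standing in for the Date column.
date_col <- c("2016-01-01",
              "2016-01-03;2016-01-09",
              "2016-01-04;2017-09-01;2017-08-10;2018-01-01")

# Split every element on ";" to get one character vector of dates per row.
split_dates <- strsplit(date_col, ";", fixed = TRUE)

# Extract the year of each date, row by row.
years_per_row <- lapply(split_dates, function(d) as.integer(format(as.Date(d), "%Y")))

# Count the distinct years in each row; more than one means the row spans years.
n_years <- vapply(years_per_row, function(y) length(unique(y)), integer(1))
spans_years <- n_years > 1
```

The logical vector spans_years could then play the role of the DIFFERENT_DATE_ROW placeholder in the mutate() example.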

I am a beginner at coding, so please feel free to change any of the operations above, bearing in mind that the real data has over 2 million rows and 50 columns.

2 solutions

#1


2  

This might not be the most efficient approach, but it can be used to get the required result.

# Flag the rows that need to be separated: split each Date string on ";",
# format each date as its year, and return TRUE when a row holds more than
# one unique year. as.character() guards against Date being a factor.
inds <- sapply(strsplit(as.character(df$Date), ";"), function(x)
    length(unique(format(as.Date(x), "%Y"))) > 1
)

library(tidyverse)
library(stringr)

#Select those indices 
df[inds, ] %>%
   # divide the price by number of dates in that row 
    mutate(Price = Price / (str_count(Date,";") + 1)) %>%
   # separate `;` delimited values in separate rows
    separate_rows(Date, sep = ";") %>%
   # bind the remaining rows as it is 
    bind_rows(df[!inds,])


# Price                  Date
#1  250.0            2016-01-04
#2  250.0            2017-09-01
#3  250.0            2017-08-10
#4  250.0            2018-01-01
#5   12.5            2016-01-04
#6   12.5            2017-09-01
#7  500.0            2016-01-01
#8  400.0 2016-01-03;2016-01-09
#9  304.0            2015-01-02
#10 238.0 2018-01-02;2018-02-02
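
Since the question mentions over 2 million rows, here is a hedged base R sketch of the same logic that avoids date parsing and keeps the original row order. It is not the answer author's method. It assumes Date is a character column in YYYY-MM-DD form, so the year is simply the first four characters; the names parts, multi_year, times and out are made up for illustration, and only Price and Date are carried over.

```r
df <- data.frame(
  Price = c(500, 400, 1000, 25, 304, 238),
  Date = c("2016-01-01",
           "2016-01-03;2016-01-09",
           "2016-01-04;2017-09-01;2017-08-10;2018-01-01",
           "2016-01-04;2017-09-01",
           "2015-01-02",
           "2018-01-02;2018-02-02"),
  stringsAsFactors = FALSE
)

parts <- strsplit(df$Date, ";", fixed = TRUE)
n_dates <- lengths(parts)

# TRUE when the dates in a row cover more than one distinct year
# (assumes the first four characters of each date are the year).
multi_year <- vapply(parts, function(p) length(unique(substr(p, 1, 4))) > 1, logical(1))

# Rows spanning several years expand to one row per date; others stay as one row.
times <- ifelse(multi_year, n_dates, 1L)

# Keep the original, unsplit string for rows that are not expanded.
date_list <- parts
date_list[!multi_year] <- as.list(df$Date[!multi_year])

out <- data.frame(
  Price = rep(ifelse(multi_year, df$Price / n_dates, df$Price), times),
  Date = unlist(date_list),
  stringsAsFactors = FALSE
)
```

With real data of 50 columns, the remaining columns would need to be repeated in the same way, for example by indexing the original data frame with rep(seq_len(nrow(df)), times).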

#2


1  

A bit cumbersome, but you could do:

# dat is the data frame from the question (called df there)
d_new = lapply(1:nrow(dat),function(x) {
  a = dat[x,]
  b = unlist(strsplit(as.character(a$Date),";"))
  l = length(b)
  if (l==1) check = 0 else check = ifelse(var(as.numeric(strftime(b,"%Y")))==0,0,1)

  if (check==0) {
      a
  } else {
      data.frame(Date = b, Price = rep(a$Price / l,l))
  }
})

do.call(rbind,d_new)
