Summarise multiple dynamic columns and store the results in new columns

Time: 2022-10-25 00:07:13

I have the following situation.

library(lubridate)  # provides months() and %m-% used below
df <- rbind(
  data.frame(thisDate = seq(as.Date("2018-01-01"), as.Date("2018-01-02"), by = "day")),
  data.frame(thisDate = seq(as.Date("2018-02-01"), as.Date("2018-02-02"), by = "day"))
)
# First day of the previous month (%m-% avoids NA for dates such as 2018-03-31)
df$lastMonth <- as.Date(format(df$thisDate %m-% months(1), "%Y-%m-01"))
df$prod1Quantity <- 1:4

I have daily quantities for an unknown number of products. For every product I want one extra column holding that product's total quantity over the whole previous month. The output should look like the desired output below, i.e. the quantities are summed per month and matched to each row through lastMonth (shown here for prod1Quantity). I just don't see how to group by, mutate and summarise dynamically, if that is indeed the right approach.
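Selecting the product columns dynamically mostly comes down to picking them by name. A minimal base R sketch, assuming every product column follows a `prod<N>Quantity` naming pattern (that pattern and the second product column are assumptions added for illustration):

```r
# Sample data with a second product column, added for illustration
df <- data.frame(
  thisDate      = as.Date(c("2018-01-01", "2018-01-02", "2018-02-01", "2018-02-02")),
  prod1Quantity = 1:4,
  prod2Quantity = 10:13
)

# Assumed naming convention: every product column looks like prod<number>Quantity
prodCols <- grep("^prod[0-9]+Quantity$", names(df), value = TRUE)
print(prodCols)                  # "prod1Quantity" "prod2Quantity"

# Any per-column operation can now run over all products at once
print(colSums(df[prodCols]))     # 10 and 46
```

Because the columns are found by pattern, adding a `prod3Quantity` column would not require changing the code.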

I came across "data.table generate multiple columns and summarize them". It seems to do what I need, but I just don't get how it works!

Desired Output


   thisDate  lastMonth prod1Quantity prod1prevMonth
1 2018-01-01 2017-12-01             1             NA
2 2018-01-02 2017-12-01             2             NA
3 2018-02-01 2018-01-01             3              3
4 2018-02-02 2018-01-01             4              3

2 solutions

#1


Another approach could be

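library(dplyr)
library(lubridate)

# Key every row by the first day of its month
temp_df <- df %>%
  mutate(thisDate_forJoin = as.Date(format(thisDate, "%Y-%m-01")))

final_df <- temp_df %>%
  # Shift the key back one month, so each row looks up the previous month
  mutate(thisDate_forJoin = thisDate_forJoin %m-% months(1)) %>%
  # Join the monthly totals of all numeric (product) columns:
  # .x = the row's own values, .y = the previous month's totals
  left_join(temp_df %>%
              group_by(thisDate_forJoin) %>%
              summarise_if(is.numeric, sum),
            by = "thisDate_forJoin") %>%
  select(-thisDate_forJoin)

final_df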

Output is:

    thisDate prod1Quantity.x prod2Quantity.x prod1Quantity.y prod2Quantity.y
1 2018-01-01               1              10              NA              NA
2 2018-01-02               2              11              NA              NA
3 2018-02-01               3              12               3              21
4 2018-02-02               4              13               3              21

Sample data:

df <- structure(list(thisDate = structure(c(17532, 17533, 17563, 17564), class = "Date"),
                     prod1Quantity = 1:4, prod2Quantity = 10:13),
                class = "data.frame", row.names = c(NA, -4L))
#    thisDate prod1Quantity prod2Quantity
#1 2018-01-01             1            10
#2 2018-01-02             2            11
#3 2018-02-01             3            12
#4 2018-02-02             4            13
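The same idea (monthly totals of every numeric product column, looked up for the previous month) can also be sketched in base R without dplyr or lubridate. This is an alternative, not the answer's own code: it assumes product columns are named `prod<N>Quantity`, uses `aggregate()` and `match()` instead of a join, and names the new columns `<product>PrevMonth` rather than relying on `.x`/`.y` suffixes:

```r
df <- data.frame(
  thisDate      = as.Date(c("2018-01-01", "2018-01-02", "2018-02-01", "2018-02-02")),
  prod1Quantity = 1:4,
  prod2Quantity = 10:13
)

# Treat every column named prod<number>Quantity as a product (assumed naming)
prodCols <- grep("^prod[0-9]+Quantity$", names(df), value = TRUE)

# Year-month key of each row, and of the month before it:
# the first of the month minus one day always falls in the previous month
monthKey <- format(df$thisDate, "%Y-%m")
prevKey  <- format(as.Date(format(df$thisDate, "%Y-%m-01")) - 1, "%Y-%m")

# Monthly totals for all product columns at once
monthly <- aggregate(df[prodCols], by = list(month = monthKey), FUN = sum)

# Attach the previous month's totals to every row (NA if that month is missing)
idx <- match(prevKey, monthly$month)
df[paste0(prodCols, "PrevMonth")] <- monthly[idx, prodCols, drop = FALSE]
print(df)
```

For this sample data the new columns are prod1QuantityPrevMonth = NA, NA, 3, 3 and prod2QuantityPrevMonth = NA, NA, 21, 21, the same totals as the .y columns above. Looking the totals up with match() keeps the original row order, which merge() does not necessarily preserve.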

#2


One solution is to calculate the production quantity per month and then join those monthly totals to each row, matching the month of thisDate against lastMonth.

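lubridate::floor_date() is used to reduce each date to the first day of its month. Keeping the full date, rather than only the month number that lubridate::month() returns, means that months from different years are never matched with each other.

library(dplyr)
library(lubridate)
df %>%
  # Monthly totals, keyed by the first day of each month
  group_by(lastMonth = floor_date(thisDate, "month")) %>%
  summarise(prodQuantLastMonth = sum(prod1Quantity)) %>%
  # Attach each total to the rows whose lastMonth is that month
  right_join(df, by = "lastMonth") %>%
  arrange(thisDate) %>%
  select(thisDate, lastMonth, prod1Quantity, prodQuantLastMonth)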

# # A tibble: 4 x 4
#   thisDate   lastMonth  prod1Quantity prodQuantLastMonth
#   <date>     <date>             <int>              <int>
# 1 2018-01-01 2017-12-01             1                 NA
# 2 2018-01-02 2017-12-01             2                 NA
# 3 2018-02-01 2018-01-01             3                  3
# 4 2018-02-02 2018-01-01             4                  3
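Why the key should keep the year: a month number, which is what lubridate::month() returns, is the same for December 2017 and December 2018, so a join on it would mix data from different years as soon as the data span more than twelve months. A base R illustration (the two dates are made up for the example):

```r
d <- as.Date(c("2017-12-15", "2018-12-15"))

# A month number alone cannot tell the two Decembers apart ...
monthOnly <- as.integer(format(d, "%m"))
print(monthOnly)   # 12 12

# ... while a year-month key (or the first day of the month) can
yearMonth <- format(d, "%Y-%m")
print(yearMonth)   # "2017-12" "2018-12"
```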
