I have a table df that looks like this (with many more columns and many more rows):
我有一个看起来像这样的表df(有更多列和更多行):
I want to create a new table df2 in which each cell in the red box is replaced by percentage change from the value for the previous date (with the top row obviously displaying NA). For example the bottom right cell in the new table should read -.0188
我想创建一个新表df2,其中红色框中的每个单元格都被上一个日期值的百分比变化所取代(顶行明显显示NA)。例如,新表中右下角的单元格应为-.0188
Is there a function (base or from some package) that can do this? If so, how can I use it?
是否有功能(基础或某些包)可以做到这一点?如果是这样,我该如何使用它?
2 个解决方案
#1
2
Here is a solution using some made up data and tidyverse
functions. The key thing here is dplyr::lag
, which lets you reference the previous value in a data frame. Note that the selection vars(-day)
selects all the columns except day
, and the use of funs(pct_change = )
creates new columns with the original name prepended to pct_change
. Inside funs
, the .
refers to a value in that column, so we can directly calculate the proportional change by dividing by the previous value and subtracting 1.
这是一个使用一些组合数据和整数函数的解决方案。这里的关键是dplyr :: lag,它允许您引用数据框中的先前值。请注意,选择变量(-day)选择除日之外的所有列,并且使用funs(pct_change =)会创建新的列,其原始名称前缀为pct_change。里面的乐趣,。指的是该列中的值,因此我们可以通过除以前一个值并减去1来直接计算比例变化。
library(tidyverse)
set.seed(100)
tbl <- tibble(
day = 1:4,
col1 = rnorm(4, mean = 10),
col2 = rnorm(4, mean = 10)
)
tbl %>%
mutate_at(
.vars = vars(-day),
.funs = funs(pct_change = (. / lag(.)) - 1)
)
#> # A tibble: 4 x 5
#> day col1 col2 col1_pct_change col2_pct_change
#> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 1 9.50 10.1 NA NA
#> 2 2 10.1 10.3 0.0667 0.0199
#> 3 3 9.92 9.42 -0.0208 -0.0873
#> 4 4 10.9 10.7 0.0973 0.138
Created on 2018-04-19 by the reprex package (v0.2.0).
由reprex包(v0.2.0)于2018-04-19创建。
#2
0
A data.table
based solution can be as:
基于data.table的解决方案可以是:
library(data.table)
library(lubridate)
#Convert Date column to of type Date/POSIXct and order on that column
DT[,Date:=ymd(Date)][
order(Date),c(.(Date = Date), lapply(.SD,function(x)(x/lag(x) - 1))),.SDcols=2:4]
# Date AEDUSD AUDUSD CADUSD
# 1: 2008-03-17 NA NA NA
# 2: 2008-03-18 0.0000000000 -0.021097496 -0.006626446
# 3: 2008-03-19 0.0000000000 0.009361054 0.001294305
# 4: 2008-03-20 -0.0003671072 -0.006578238 -0.009645023
# 5: 2008-03-24 0.0000000000 -0.026378637 -0.018775100
Data:
DT = data.table(Date = c("2008-03-17","2008-03-18","2008-03-19","2008-03-20","2008-03-24"),
AEDUSD = c(0.2724, 0.2724, 0.2724, 0.2723, 0.2723),
AUDUSD = c(0.9385, 0.9187, 0.9273, 0.9212, 0.8969),
CADUSD = c(1.0111, 1.0044, 1.0057, 0.9960, 0.9773))
#1
2
Here is a solution using some made up data and tidyverse
functions. The key thing here is dplyr::lag
, which lets you reference the previous value in a data frame. Note that the selection vars(-day)
selects all the columns except day
, and the use of funs(pct_change = )
creates new columns with the original name prepended to pct_change
. Inside funs
, the .
refers to a value in that column, so we can directly calculate the proportional change by dividing by the previous value and subtracting 1.
这是一个使用一些组合数据和整数函数的解决方案。这里的关键是dplyr :: lag,它允许您引用数据框中的先前值。请注意,选择变量(-day)选择除日之外的所有列,并且使用funs(pct_change =)会创建新的列,其原始名称前缀为pct_change。里面的乐趣,。指的是该列中的值,因此我们可以通过除以前一个值并减去1来直接计算比例变化。
library(tidyverse)
set.seed(100)
tbl <- tibble(
day = 1:4,
col1 = rnorm(4, mean = 10),
col2 = rnorm(4, mean = 10)
)
tbl %>%
mutate_at(
.vars = vars(-day),
.funs = funs(pct_change = (. / lag(.)) - 1)
)
#> # A tibble: 4 x 5
#> day col1 col2 col1_pct_change col2_pct_change
#> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 1 9.50 10.1 NA NA
#> 2 2 10.1 10.3 0.0667 0.0199
#> 3 3 9.92 9.42 -0.0208 -0.0873
#> 4 4 10.9 10.7 0.0973 0.138
Created on 2018-04-19 by the reprex package (v0.2.0).
由reprex包(v0.2.0)于2018-04-19创建。
#2
0
A data.table
based solution can be as:
基于data.table的解决方案可以是:
library(data.table)
library(lubridate)
#Convert Date column to of type Date/POSIXct and order on that column
DT[,Date:=ymd(Date)][
order(Date),c(.(Date = Date), lapply(.SD,function(x)(x/lag(x) - 1))),.SDcols=2:4]
# Date AEDUSD AUDUSD CADUSD
# 1: 2008-03-17 NA NA NA
# 2: 2008-03-18 0.0000000000 -0.021097496 -0.006626446
# 3: 2008-03-19 0.0000000000 0.009361054 0.001294305
# 4: 2008-03-20 -0.0003671072 -0.006578238 -0.009645023
# 5: 2008-03-24 0.0000000000 -0.026378637 -0.018775100
Data:
DT = data.table(Date = c("2008-03-17","2008-03-18","2008-03-19","2008-03-20","2008-03-24"),
AEDUSD = c(0.2724, 0.2724, 0.2724, 0.2723, 0.2723),
AUDUSD = c(0.9385, 0.9187, 0.9273, 0.9212, 0.8969),
CADUSD = c(1.0111, 1.0044, 1.0057, 0.9960, 0.9773))