unique_ref priority_201801 balance_201801 action_201801 priority_201802 balance_201802 action_201802 priority_201803 balance_201803 action_201803
1 1 3 30 text,letter 1 60 letter 2 30 text
2 2 2 -20 visit 1 -40 text 2 -40 call
3 3 3 35 letter 1 35 call 3 -50 text
4 4 0 -100 call 2 0 visit 2 100 call
priority_201804 balance_201804 action_201804
1 99 0 text
2 0 -20 visit
3 0 -50 letter
4 0 -100 text
Above is a sample of my dataset. I want to search through each action looking for the first instance of the action "text", this will then be Week0, I then want to line up the priority's and balances of the next weeks, as Week1, Week2 etc.
以上是我的数据集的示例。我想搜索每个动作,寻找动作“文本”的第一个实例,这将是第0周,然后我想要排列下周的优先级和余额,如第1周,第2周等。
Desirable output:
unique_ref week0_priority week0_balance week0_action week1_priority week1_balance week1_action week2_priority week2_balance week2_action
1 1 3 30 text 1 60 letter 2 30 text
2 2 1 -40 text 2 -40 call 0 -20 visit
3 3 3 -50 text 0 -40 letter No Data No Data No Data
4 4 0 -100 text No Data No Data No Data No Data No Data No Data
week3_priority week3_balance week3_action
1 99 0 text
2 No Data No Data No Data
3 No Data No Data No Data
4 No Data No Data No Data
Reproducible example of data:
可重复的数据示例:
unique_ref=c(1,2,3,4)
priority_201801=c('3','2','3','0')
balance_201801=c('30','-20','35','-100')
action_201801=c('text,letter','visit','letter','call')
priority_201802=c('1','1','1','2')
balance_201802=c('60','-40','35','0')
action_201802=c('letter','text', 'call', 'visit')
priority_201803=c('2','2','3','2')
balance_201803=c('30','-40','-50','100')
action_201803=c('text','call','text','call')
priority_201804=c('99','0','0','0')
balance_201804=c('0','-20','-50','-100')
action_201804=c('text','visit','letter','text')
df3=as.data.frame(cbind(unique_ref,priority_201801,balance_201801,action_201801,priority_201802,balance_201802,action_201802,priority_201803,
balance_201803,action_201803,priority_201804,balance_201804,action_201804))
1 个解决方案
#1
0
I came up with a solution involving some reshaping of the data.
我想出了一个涉及重新整理数据的解决方案。
library(data.table)
library(tidyr)
library(dplyr)
dt_long <- gather(data = df3, key = "unique_ref", value = colnames(df3)[2:13])
colnames(dt_long)[2] <- "Values"
dt3_long <- data.table(dt_long)
dt3_long[, c("Type", "Week.Num") := tstrsplit(unique_ref, "_", fixed = TRUE)]
dt3_merge <- merge(x = dt3_long
, y = unique(dt3_long[like(Values, "text"),.(Type
,Values
, Week.Num
)])[, .(Week.Num, Week.ID = paste0("week", rank(Week.Num, ties.method = "min" )-1))]
, by.x = "Week.Num"
, by.y = "Week.Num"
, all.x = TRUE
, allow.cartesian = TRUE
)
The resulting data.table dt3_merge looks like the following and has all necessary information, but just needs to be reshaped.
生成的data.table dt3_merge看起来如下所示,并且包含所有必要的信息,但只需要重新整形。
dt3_merge
Week.Num unique_ref Values Type Week.ID
1: 201801 priority_201801 3 priority week0
2: 201801 priority_201801 2 priority week0
3: 201801 priority_201801 3 priority week0
4: 201801 priority_201801 0 priority week0
5: 201801 balance_201801 30 balance week0
6: 201801 balance_201801 -20 balance week0
7: 201801 balance_201801 35 balance week0
8: 201801 balance_201801 -100 balance week0
9: 201801 action_201801 text,letter action week0
10: 201801 action_201801 visit action week0
11: 201801 action_201801 letter action week0
12: 201801 action_201801 call action week0
13: 201802 priority_201802 1 priority week1
14: 201802 priority_201802 1 priority week1
15: 201802 priority_201802 1 priority week1
16: 201802 priority_201802 2 priority week1
17: 201802 balance_201802 60 balance week1
18: 201802 balance_201802 -40 balance week1
19: 201802 balance_201802 35 balance week1
20: 201802 balance_201802 0 balance week1
21: 201802 action_201802 letter action week1
22: 201802 action_201802 text action week1
23: 201802 action_201802 call action week1
24: 201802 action_201802 visit action week1
25: 201803 priority_201803 2 priority week2
26: 201803 priority_201803 2 priority week2
27: 201803 priority_201803 3 priority week2
28: 201803 priority_201803 2 priority week2
29: 201803 balance_201803 30 balance week2
30: 201803 balance_201803 -40 balance week2
31: 201803 balance_201803 -50 balance week2
32: 201803 balance_201803 100 balance week2
33: 201803 action_201803 text action week2
34: 201803 action_201803 call action week2
35: 201803 action_201803 text action week2
36: 201803 action_201803 call action week2
37: 201804 priority_201804 99 priority week3
38: 201804 priority_201804 0 priority week3
39: 201804 priority_201804 0 priority week3
40: 201804 priority_201804 0 priority week3
41: 201804 balance_201804 0 balance week3
42: 201804 balance_201804 -20 balance week3
43: 201804 balance_201804 -50 balance week3
44: 201804 balance_201804 -100 balance week3
45: 201804 action_201804 text action week3
46: 201804 action_201804 visit action week3
47: 201804 action_201804 letter action week3
48: 201804 action_201804 text action week3
Week.Num unique_ref Values Type Week.ID
Finally the call to dcast to reshape it into wide
最后呼吁dcast将其重塑为广泛的
dt3_merge[, `:=`(id = 1:.N), by = list(unique_ref)]
dt3_result <- dcast(dt3_merge, id ~ Type + Week.ID, value.var=c('Values'))
id action_week0 action_week1 action_week2 action_week3 balance_week0 balance_week1
1: 1 text,letter letter text text 30 60
2: 2 visit text call visit -20 -40
3: 3 letter call text letter 35 35
4: 4 call visit call text -100 0
balance_week2 balance_week3 priority_week0 priority_week1 priority_week2 priority_week3
1: 30 0 3 1 2 99
2: -40 -20 2 1 2 0
3: -50 -50 3 1 3 0
4: 100 -100 0 2 2 0
#1
0
I came up with a solution involving some reshaping of the data.
我想出了一个涉及重新整理数据的解决方案。
library(data.table)
library(tidyr)
library(dplyr)
dt_long <- gather(data = df3, key = "unique_ref", value = colnames(df3)[2:13])
colnames(dt_long)[2] <- "Values"
dt3_long <- data.table(dt_long)
dt3_long[, c("Type", "Week.Num") := tstrsplit(unique_ref, "_", fixed = TRUE)]
dt3_merge <- merge(x = dt3_long
, y = unique(dt3_long[like(Values, "text"),.(Type
,Values
, Week.Num
)])[, .(Week.Num, Week.ID = paste0("week", rank(Week.Num, ties.method = "min" )-1))]
, by.x = "Week.Num"
, by.y = "Week.Num"
, all.x = TRUE
, allow.cartesian = TRUE
)
The resulting data.table dt3_merge looks like the following and has all necessary information, but just needs to be reshaped.
生成的data.table dt3_merge看起来如下所示,并且包含所有必要的信息,但只需要重新整形。
dt3_merge
Week.Num unique_ref Values Type Week.ID
1: 201801 priority_201801 3 priority week0
2: 201801 priority_201801 2 priority week0
3: 201801 priority_201801 3 priority week0
4: 201801 priority_201801 0 priority week0
5: 201801 balance_201801 30 balance week0
6: 201801 balance_201801 -20 balance week0
7: 201801 balance_201801 35 balance week0
8: 201801 balance_201801 -100 balance week0
9: 201801 action_201801 text,letter action week0
10: 201801 action_201801 visit action week0
11: 201801 action_201801 letter action week0
12: 201801 action_201801 call action week0
13: 201802 priority_201802 1 priority week1
14: 201802 priority_201802 1 priority week1
15: 201802 priority_201802 1 priority week1
16: 201802 priority_201802 2 priority week1
17: 201802 balance_201802 60 balance week1
18: 201802 balance_201802 -40 balance week1
19: 201802 balance_201802 35 balance week1
20: 201802 balance_201802 0 balance week1
21: 201802 action_201802 letter action week1
22: 201802 action_201802 text action week1
23: 201802 action_201802 call action week1
24: 201802 action_201802 visit action week1
25: 201803 priority_201803 2 priority week2
26: 201803 priority_201803 2 priority week2
27: 201803 priority_201803 3 priority week2
28: 201803 priority_201803 2 priority week2
29: 201803 balance_201803 30 balance week2
30: 201803 balance_201803 -40 balance week2
31: 201803 balance_201803 -50 balance week2
32: 201803 balance_201803 100 balance week2
33: 201803 action_201803 text action week2
34: 201803 action_201803 call action week2
35: 201803 action_201803 text action week2
36: 201803 action_201803 call action week2
37: 201804 priority_201804 99 priority week3
38: 201804 priority_201804 0 priority week3
39: 201804 priority_201804 0 priority week3
40: 201804 priority_201804 0 priority week3
41: 201804 balance_201804 0 balance week3
42: 201804 balance_201804 -20 balance week3
43: 201804 balance_201804 -50 balance week3
44: 201804 balance_201804 -100 balance week3
45: 201804 action_201804 text action week3
46: 201804 action_201804 visit action week3
47: 201804 action_201804 letter action week3
48: 201804 action_201804 text action week3
Week.Num unique_ref Values Type Week.ID
Finally the call to dcast to reshape it into wide
最后呼吁dcast将其重塑为广泛的
dt3_merge[, `:=`(id = 1:.N), by = list(unique_ref)]
dt3_result <- dcast(dt3_merge, id ~ Type + Week.ID, value.var=c('Values'))
id action_week0 action_week1 action_week2 action_week3 balance_week0 balance_week1
1: 1 text,letter letter text text 30 60
2: 2 visit text call visit -20 -40
3: 3 letter call text letter 35 35
4: 4 call visit call text -100 0
balance_week2 balance_week3 priority_week0 priority_week1 priority_week2 priority_week3
1: 30 0 3 1 2 99
2: -40 -20 2 1 2 0
3: -50 -50 3 1 3 0
4: 100 -100 0 2 2 0