I have the following dataset:
#df
Factors Transactions
a,c 1
b 0
c 0
d,a 0
a 1
a 0
b 1
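To make the example reproducible, here is one way to build df in R. This is a sketch: the column types (character Factors, numeric Transactions) are an assumption, because the question only shows the printed table.

```r
# Rebuild the example data shown above (column types are assumed).
# Factors is kept as character so strsplit() can split it directly.
df <- data.frame(
  Factors      = c("a,c", "b", "c", "d,a", "a", "a", "b"),
  Transactions = c(1, 0, 0, 0, 1, 0, 1),
  stringsAsFactors = FALSE
)
str(df)
```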
For each factor, I'd like to count how many rows did not contain that factor but did have a transaction. My desired output is:
#desired output
Factors count
a 1
b 2
c 2
d 3
For instance, there is only one row in which a is absent and a transaction occurred: the last row (b, 1).
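To pin the definition down, here is a direct base-R count that checks each row for the factor's absence, with no subtraction involved. It is a sketch that rebuilds df inline and assumes factors are separated by a bare comma; the names factor_lists, had_tx and counts are illustrative.

```r
# Rebuild the example data (same values as the table above).
df <- data.frame(
  Factors      = c("a,c", "b", "c", "d,a", "a", "a", "b"),
  Transactions = c(1, 0, 0, 0, 1, 0, 1),
  stringsAsFactors = FALSE
)

factor_lists <- strsplit(df$Factors, ",")          # one character vector per row
all_factors  <- sort(unique(unlist(factor_lists))) # every factor that occurs
had_tx       <- df$Transactions > 0                # rows with a transaction

# For each factor f: rows where f is absent AND a transaction happened.
counts <- vapply(all_factors, function(f) {
  absent <- !vapply(factor_lists, function(x) f %in% x, logical(1))
  sum(absent & had_tx)
}, integer(1))

result <- data.frame(Factors = all_factors, count = unname(counts),
                     stringsAsFactors = FALSE)
result
```

This reproduces the desired output (a 1, b 2, c 2, d 3) and is handy as a cross-check for the vectorised answers.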
There are many ways to count how many times each factor appears together with a transaction. For instance, I tried this:
library(data.table)
setDT(df)[, .(Factors = unlist(strsplit(as.character(Factors), ","))),
by = Transactions][,.(Transactions = sum(Transactions > 0)), by = Factors]
But I want to count how many times a factor was absent and there was a transaction.
Thanks in advance.
2 answers
#1
You can calculate the opposite, i.e. how many times each factor appears with a transaction. The difference between the total number of transactions and that per-factor count is what you are looking for:
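library(data.table)
# number of rows that had a transaction
total <- sum(df$Transactions > 0)
# one row per factor, then subtract each factor's transaction count from the total
(setDT(df)[, .(Factors = unlist(strsplit(as.character(Factors), ","))), Transactions]
 [, total - sum(Transactions > 0), Factors])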
# Factors V1
#1: a 1
#2: c 2
#3: b 2
#4: d 3
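Written as a formula, using illustrative notation where $F_i$ is the set of factors in row $i$ and $T_i$ its Transactions value, the subtraction is:

$$
\text{count}(f) \;=\; \#\{\, i : T_i > 0 \,\} \;-\; \#\{\, i : T_i > 0,\ f \in F_i \,\}
$$

It works because every row with a transaction either contains $f$ or does not. For the example, the total is 3 and the per-factor counts with a transaction are a: 2, b: 1, c: 1, d: 0, giving 3 - 2 = 1, 3 - 1 = 2, 3 - 1 = 2 and 3 - 0 = 3. This assumes a factor is listed at most once per row; a cell such as "a,a" would be counted twice after splitting.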
#2
We can also do this with cSplit from the splitstackshape package:
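library(splitstackshape)
# split Factors into long format, then subtract each factor's transaction count
# from the number of rows with a transaction
cSplit(df, "Factors", ",", "long")[, sum(df$Transactions > 0) - sum(Transactions > 0), Factors]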
# Factors V1
#1: a 1
#2: c 2
#3: b 2
#4: d 3
Or with dplyr/tidyr:
library(dplyr)
library(tidyr)
separate_rows(df, Factors) %>%
  group_by(Factors) %>%
  summarise(count = sum(df$Transactions > 0) - sum(Transactions > 0))
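A variant of the dplyr/tidyr pipeline that computes the total once, outside summarise(), and guards against a factor repeated within one cell. This is a sketch, not part of the original answer: the helper column row and the distinct() step are additions, and .groups = "drop" assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  Factors      = c("a,c", "b", "c", "d,a", "a", "a", "b"),
  Transactions = c(1, 0, 0, 0, 1, 0, 1),
  stringsAsFactors = FALSE
)

total <- sum(df$Transactions > 0)  # rows with a transaction

result <- df %>%
  mutate(row = row_number()) %>%                 # remember the original row
  separate_rows(Factors, sep = ",") %>%          # one row per (row, factor)
  distinct(row, Factors, .keep_all = TRUE) %>%   # count a factor once per row
  group_by(Factors) %>%
  summarise(count = total - sum(Transactions > 0), .groups = "drop")
result
```

Because total is fixed before grouping, the pipeline no longer reaches back into df from inside summarise().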