My data is as follows:
> df2
id calmonth product
1 101 01 apple
2 102 01 apple&nokia&htc
3 103 01 htc
4 104 01 apple&htc
5 104 02 nokia
Now I want to count the ids whose products contain both 'apple' and 'htc' when calmonth='01'. I need this not only for 'apple' and 'htc' but also for other pairs such as 'apple' and 'nokia', so I want to implement it as a function like this:
xandy=function(a,b) data.frame(product=paste(a,b,sep='&'),
csum=length(grep('a.*b',x=df2$product))
)
I also made a parameter vector like this:
para=c('apple','htc','nokia')
But here is the problem. When I pass parameters like
xandy(para[1],para[2])
the result is as follows:
product csum
1 apple&htc 0
The result I expect is:
product csum calmonth
1 apple&htc 2 01
2 apple&htc 0 02
So what is wrong with the parameter passing? And how can I add calmonth to the function xandy correctly? FYI: this question stems from an earlier question of mine, "What's the R statement responding to SQL's 'in' statement".
EDIT AFTER COMMENT
My expected result is:
product csum calmonth
1 apple&htc 2 01
2 apple&htc 0 02
1 answer
#1
My answer is another way to tackle your problem.
library(stringr)
The function contains() splits each element of a string vector at the split character and checks whether all target words are among the resulting pieces.
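contains <- function(x, target, split="&") {
  # split every element of x into its individual product names
  l <- str_split(x, split)
  # TRUE for an element only if every target word is among its pieces
  sapply(l, function(x, y) all(y %in% x), y=target)
}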
contains(d$product, c("apple", "htc"))
[1] FALSE TRUE FALSE TRUE FALSE
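For readers who prefer not to load stringr, the same membership test can probably be written in base R. This is only a sketch: contains_base is a hypothetical name that does not appear in the original answer, and the product vector is retyped here from the question's data.

```r
# Base-R sketch of the same membership test (no stringr needed).
# contains_base is a hypothetical name, not part of the original answer.
contains_base <- function(x, target, split = "&") {
  # fixed = TRUE treats the separator as a literal string, not a regex
  parts <- strsplit(as.character(x), split, fixed = TRUE)
  # TRUE when every target word appears among the pieces of an element
  vapply(parts, function(p) all(target %in% p), logical(1))
}

# Product column retyped from the question's data
product <- c("apple", "apple&nokia&htc", "htc", "apple&htc", "nokia")

contains_base(product, c("apple", "htc"))
# [1] FALSE  TRUE FALSE  TRUE FALSE
contains_base(product, c("htc", "apple"))
# [1] FALSE  TRUE FALSE  TRUE FALSE
```

Because the test is set membership on the split pieces, swapping the two target words gives the same answer, and a product such as "pineapple" would not be mistaken for "apple".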
The rest is just subsetting and summarizing:
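# ddply() and summarise() come from plyr, so library(plyr) must be loaded
get_data <- function(a, b) {
  # keep only the rows whose product contains both a and b
  e <- subset(d, contains(product, c(a, b)))
  e$product2 <- paste(a, b, sep="&")
  # count the matching rows per month
  ddply(e, .(calmonth, product2), summarise, csum=length(id))
}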
Using the data below, the order of the two arguments no longer matters (see comment below).
get_data("apple", "htc")
calmonth product2 csum
1 1 apple&htc 1
2 2 apple&htc 2
get_data("htc", "apple")
calmonth product2 csum
1 1 htc&apple 1
2 2 htc&apple 2
I know this is not a direct answer to your question but I find this approach quite clean.
EDIT AFTER COMMENT
The reason you get csum=0 is simply that you are searching for the wrong regex pattern: 'a.*b' is the literal letter a, followed by anything, followed by the letter b, not apple ... htc. You need to construct the pattern from the arguments, i.e. paste0(a, ".*", b).
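The difference between the quoted pattern and the constructed one can be checked directly. The snippet below is a small illustration, with the product vector retyped from the question's data and a and b set by hand rather than passed into a function.

```r
# Product column retyped from the question's data
product <- c("apple", "apple&nokia&htc", "htc", "apple&htc", "nokia")
a <- "apple"
b <- "htc"

# Inside quotes, a and b are just letters; the variables are never used
n_literal <- length(grep("a.*b", product))
n_literal   # 0, because no product contains the letter "b"

# Building the pattern from the arguments yields "apple.*htc"
pattern <- paste0(a, ".*", b)
n_built <- length(grep(pattern, product))
n_built     # 2

# The constructed pattern is order-sensitive: "htc.*apple" matches nothing here
n_swapped <- length(grep(paste0(b, ".*", a), product))
n_swapped   # 0
```

The last call shows why the regex version depends on the order in which the two products are passed, which the split-based test does not.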
Here is a complete solution. I would not call it beautiful code, but anyway (note that I changed the data to show that it generalizes across months).
library(plyr)
df2 <- read.table(text="
id calmonth product
101 01 apple
102 01 apple&nokia&htc
103 01 htc
104 02 apple&htc
104 02 apple&htc", header=TRUE)
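xandy <- function(a, b) {
  # build the regex from the arguments, e.g. "apple.*htc" (a must precede b)
  pattern <- paste0(a, ".*", b)
  # keep only the rows whose product matches the pattern
  d1 <- df2[grep(pattern, df2$product), ]
  d1$product <- paste0(a, "&", b)
  # count the matching rows per month
  ddply(d1, .(calmonth), summarise,
        csum=length(calmonth),
        product=unique(product))
}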
xandy("apple", "htc")
calmonth csum product
1 1 1 apple&htc
2 2 2 apple&htc
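The output above lists only months that have at least one match, whereas the question also asks for a row with csum 0. One possible base-R variant is sketched below, under these assumptions: xandy_all is a hypothetical name, the data is the question's original df2, calmonth is read as character so that "01" keeps its leading zero, and rows are counted rather than distinct ids, as in the answer.

```r
# The question's original data; calmonth is kept as character ("01", "02")
df2 <- read.table(text = "
id calmonth product
101 01 apple
102 01 apple&nokia&htc
103 01 htc
104 01 apple&htc
104 02 nokia", header = TRUE,
  colClasses = c("integer", "character", "character"))

# xandy_all is a hypothetical name for this variant
xandy_all <- function(a, b, data = df2) {
  # Two fixed-string tests, so the order of a and b does not matter
  hit <- grepl(a, data$product, fixed = TRUE) &
    grepl(b, data$product, fixed = TRUE)
  # Sum the logical hits per month; months without any hit give 0
  csum <- tapply(hit, data$calmonth, sum)
  data.frame(product = paste(a, b, sep = "&"),
             csum = as.vector(csum),
             calmonth = names(csum))
}

res <- xandy_all("apple", "htc")
res
#     product csum calmonth
# 1 apple&htc    2       01
# 2 apple&htc    0       02
```

Note that grepl with fixed = TRUE is a substring test, so "apple" would also match a product like "pineapple"; if that matters, the split-based contains() test is the safer choice. To count distinct ids instead of rows, the tapply step could be applied to the ids of the matching rows with length(unique(...)).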