R function(): how to pass parameters that contain characters and regular expressions

Time: 2022-07-25 18:00:19

My data are as follows:

> df2

  id   calmonth        product
1 101       01           apple
2 102       01 apple&nokia&htc
3 103       01             htc
4 104       01       apple&htc
5 104       02           nokia

Now I want to count the ids whose product contains both 'apple' and 'htc' when calmonth='01'. Since I need not only 'apple' and 'htc' but also other pairs such as 'apple' and 'nokia', I want to do this with a function like this:

xandy=function(a,b) data.frame(product=paste(a,b,sep='&'),
                               csum=length(grep('a.*b',x=df2$product))
                              )

I also made a parameter vector like this:

para=c('apple','htc','nokia')

But here is the problem. When I pass the parameters like this:

xandy(para[1],para[2])

the result is as follows:

  product    csum
1 apple&htc    0

The result I expect is:

  product    csum   calmonth
1 apple&htc    2     01
2 apple&htc    0     02

So what is wrong with the way I pass the parameters? And how can I add calmonth to the function xandy correctly? FYI, this question stems from an earlier question of mine, "What's the R statement responding to SQL's 'in' statement".


EDIT AFTER COMMENT

My expected result is:

  product    csum   calmonth
1 apple&htc    2     01
2 apple&htc    0     02

1 Answer

#1



My answer is another way to tackle your problem.

library(stringr)
library(plyr)  # provides ddply() and .(), used in get_data() below

The function contains splits each element of a string vector on the split character and checks whether all target words are among the resulting pieces.

contains <- function(x, target, split="&") {
  l <- str_split(x, split)
  sapply(l, function(x, y) all(y %in% x), y=target)  
}

contains(d$product, c("apple", "htc")) 
[1] FALSE  TRUE FALSE  TRUE FALSE

The rest is just subsetting and summarizing.

get_data <- function(a, b) {
  e <- subset(d, contains(product, c(a, b)))
  e$product2 <- paste(a, b, sep="&")
  ddply(e, .(calmonth, product2), summarise, csum=length(id))
}

Using the data given further below, the order of the two arguments no longer matters (see the comment below).

get_data("apple", "htc")

  calmonth  product2 csum
1        1 apple&htc    1
2        2 apple&htc    2

get_data("htc", "apple")

  calmonth  product2 csum
1        1 htc&apple    1
2        2 htc&apple    2
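To illustrate why the order stops mattering, here is a minimal base R sketch that compares a regex built with paste0() against the split-and-match idea used in contains. The vector products and its last entry "htc&apple" are made up for this illustration and are not part of the data in the question.

```r
# Hypothetical product strings; "htc&apple" is added only to show the order effect.
products <- c("apple", "apple&nokia&htc", "htc", "apple&htc", "htc&apple")

# Regex approach: matches only when "apple" appears somewhere before "htc".
regex_hit <- grepl(paste0("apple", ".*", "htc"), products)

# Token approach: split on "&" and test set membership, so order is irrelevant.
tokens <- strsplit(products, "&", fixed = TRUE)
token_hit <- vapply(tokens,
                    function(p) all(c("apple", "htc") %in% p),
                    logical(1))

print(regex_hit)
print(token_hit)
```

The two results differ only for the last element, where "htc" comes before "apple": the regex misses it, the token test does not.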

I know this is not a direct answer to your question, but I find this approach quite clean.

EDIT AFTER COMMENT


The reason you get csum=0 is simply that you are searching for the wrong regex pattern: 'a.*b' means the literal letter a, followed by anything, followed by the literal letter b, not apple ... htc. The arguments a and b are never substituted inside a quoted string. You need to construct the pattern from them, i.e. paste0(a, ".*", b).
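The difference can be checked directly with a small sketch on the product column from the question. The variable names product and pattern are chosen here for illustration only.

```r
# Product column from the question's data.
product <- c("apple", "apple&nokia&htc", "htc", "apple&htc", "nokia")
a <- "apple"
b <- "htc"

# Literal pattern: the letters "a" and "b", not the values of the variables a and b.
literal_hits <- grep("a.*b", product)

# Pattern built from the values of a and b: "apple.*htc".
pattern <- paste0(a, ".*", b)
built_hits <- grep(pattern, product)

print(length(literal_hits))
print(built_hits)
```

None of the product strings contains a letter b at all, so the literal pattern cannot match, while the constructed pattern finds the second and fourth rows.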

Here is a complete solution. I would not call it beautiful code, but it works (note that I changed the data to show that it generalizes across months).

library(plyr)

df2 <- read.table(text="
  id   calmonth        product
 101       01           apple
 102       01 apple&nokia&htc
 103       01             htc
 104       02       apple&htc
 104       02       apple&htc",  header=TRUE)

xandy <- function(a, b) {
  pattern <- paste0(a, ".*", b)
  d1 <- df2[grep(pattern, df2$product), ]
  d1$product <- paste0(a,"&", b)
  ddply(d1, .(calmonth), summarise, 
        csum=length(calmonth),
        product=unique(product))  
}
xandy("apple", "htc")

  calmonth csum   product
1        1    1 apple&htc
2        2    2 apple&htc
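Both versions above drop any month that has no match, whereas the expected output in the question lists calmonth 02 with csum 0. The following base R sketch shows one possible way to keep such months and to run every pair from para. The helper names has_all and count_pairs are hypothetical. The sketch also assumes that counting rows is equivalent to counting ids (one row per id and month) and stores calmonth as character so that the leading zero survives.

```r
# Data from the question; calmonth kept as character to preserve "01"/"02".
df2 <- data.frame(
  id       = c(101, 102, 103, 104, 104),
  calmonth = c("01", "01", "01", "01", "02"),
  product  = c("apple", "apple&nokia&htc", "htc", "apple&htc", "nokia"),
  stringsAsFactors = FALSE
)

# TRUE where every target word is one of the "&"-separated pieces of x.
has_all <- function(x, target, split = "&") {
  parts <- strsplit(x, split, fixed = TRUE)
  vapply(parts, function(p) all(target %in% p), logical(1))
}

# Count matching rows per calmonth; months without a match get 0.
count_pairs <- function(data, a, b) {
  hit  <- has_all(data$product, c(a, b))
  csum <- tapply(hit, data$calmonth, sum)
  data.frame(product  = paste(a, b, sep = "&"),
             csum     = as.vector(csum),
             calmonth = names(csum),
             stringsAsFactors = FALSE)
}

res <- count_pairs(df2, "apple", "htc")
print(res)

# All pairs from the parameter vector in the question.
para <- c("apple", "htc", "nokia")
all_pairs <- do.call(rbind,
                     combn(para, 2,
                           function(p) count_pairs(df2, p[1], p[2]),
                           simplify = FALSE))
print(all_pairs)
```

Summing the logical vector with tapply() is what keeps the zero rows: every calmonth value forms a group whether or not it contains a match. If an id can appear several times within a month, counting unique ids instead of rows would be the safer choice.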
