Count the number of rows in which a pair of values occurs

My data frame looks like this:

Index   V1  V2  V3  V4  V5  V6
1       a   b   c   d   e   f
2       b   c   d   e
3       a   b   c   f   g
4       a   c   f   d   g
5       b   c   d   g   h   i
.       .   .   .   .   .   .
.       .   .   .   .   .   .

I need to iterate through each row in the data frame, pick out the pairs of values that appear together, and count them. For example, a and b appear together in rows 1 and 3, so count = 2.

The data frame has 6 columns (excluding the index) and 554 rows. Each row holds up to 6 values out of a possible 11.

The first step would be to count the pair a and b.

Then do all combinations, e.g. a+c, a+d, a+e... b+c, b+d...

I've used table(apply(df,1,function(x) paste(sort(x), collapse='-'))) and count(df) from the plyr package, but the output was the freq of a+b, a+b+c.... b+c, b+c+d.

I need the freq of all pairs. So the freq of a+b = (freq of a+b) + (freq of a+b+c) + (freq of a+b+c+d) and so on.
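To make that definition concrete, here is a minimal R sketch for a single pair. It assumes the data frame holds character values with blanks stored as NA; the toy data is just the five rows shown above, and count_pair is a hypothetical helper name, not a function from any package.

```r
# Toy data: the first five rows from the question; blank cells are NA (assumption).
df <- data.frame(
  V1 = c("a", "b", "a", "a", "b"),
  V2 = c("b", "c", "b", "c", "c"),
  V3 = c("c", "d", "c", "f", "d"),
  V4 = c("d", "e", "f", "d", "g"),
  V5 = c("e", NA,  "g", "g", "h"),
  V6 = c("f", NA,  NA,  NA,  "i"),
  stringsAsFactors = FALSE
)

# count_pair (hypothetical helper): number of rows that contain both x and y,
# whatever columns they sit in and whatever else is in the row.
count_pair <- function(data, x, y) {
  m <- as.matrix(data)
  has_x <- rowSums(m == x, na.rm = TRUE) > 0  # TRUE where the row contains x
  has_y <- rowSums(m == y, na.rm = TRUE) > 0  # TRUE where the row contains y
  sum(has_x & has_y)
}

count_pair(df, "a", "b")  # rows 1 and 3 -> 2
```

Because each row is tested for membership rather than matched as a whole combination, a row such as a+b+c+d counts towards a+b, which is the sum described above.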

In Excel, I've tried COUNTIF, as in COUNTIF(column1,a,column2,b), but a and b aren't always in columns 1 and 2 respectively.

I also tried COUNTIF(df,a,df,b), but that gave me a huge number.

This can be done in either R or Excel, although I think it would be faster in R.

1 solution

#1

Using some random example data, let's assume that the data frame is in C5:H558.

Define a name, str, as

=$C$5:$C$558&$D$5:$D$558&$E$5:$E$558&$F$5:$F$558&$G$5:$G$558&$H$5:$H$558

Enter the symbols across L5:V5 as column headers, and again down K6:K16 as row labels.

Enter this counting formula

=IF(CODE($K6)>CODE(L$5),SUMPRODUCT(1-N(ISERROR(FIND($K6,str))+N(ISERROR(FIND(L$5,str)))>0)),"")

in L6 and copy it to fill the rest of the table L6:V16.
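Since the question also allows R, here is a rough base-R sketch that builds the same kind of lower-triangular pair table. It is not a translation of the Excel formula, just one way to get the same counts: it builds a row-by-symbol indicator matrix and multiplies it by itself with crossprod. The toy data (the five rows shown in the question) and the assumption that blanks are stored as NA are mine.

```r
# Toy data: the first five rows from the question; blank cells are NA (assumption).
df <- data.frame(
  V1 = c("a", "b", "a", "a", "b"),
  V2 = c("b", "c", "b", "c", "c"),
  V3 = c("c", "d", "c", "f", "d"),
  V4 = c("d", "e", "f", "d", "g"),
  V5 = c("e", NA,  "g", "g", "h"),
  V6 = c("f", NA,  NA,  NA,  "i"),
  stringsAsFactors = FALSE
)

m <- as.matrix(df)
symbols <- sort(unique(m[!is.na(m)]))

# Indicator matrix: one row per data row, one column per symbol,
# TRUE where that row contains that symbol.
ind <- sapply(symbols, function(s) rowSums(m == s, na.rm = TRUE) > 0)

# crossprod(x) is t(x) %*% x, so entry [x, y] is the number of rows
# that contain both x and y.
pair_counts <- crossprod(ind * 1)

# Keep only the lower triangle, as in the spreadsheet layout
# (row symbol sorts after column symbol).
pair_counts[upper.tri(pair_counts, diag = TRUE)] <- NA

pair_counts["b", "a"]  # rows 1 and 3 -> 2
```

Before the upper triangle is blanked out, the diagonal holds the number of rows containing each single symbol, which can serve as a quick sanity check.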
