It seems like such a simple problem, yet I've been pulling my hair out trying to get this to work:
Given this data frame identifying the interactions id had with contact, who is grouped by contactGrp,
head(data)
id sesTs contact contactGrp relpos maxpos
1 6849 2012-06-25 15:58:34 peter west 0.000000 3
2 6849 2012-06-25 18:24:49 sarah south 0.500000 3
3 6849 2012-06-27 00:13:30 sarah south 1.000000 3
4 1235 2012-06-29 17:49:35 peter west 0.000000 2
5 1235 2012-06-29 23:56:35 peter west 1.000000 2
6 5893 2012-06-30 22:21:33 carl east 0.000000 1
How many contacts were there for each unique(data$contactGrp) with relpos=1 and maxpos>1?
The expected result would be:
1 west 1
2 south 1
3 east 0
A small subset of the lines I have tried:
- aggregate(data, by=list('contactGrp'), FUN=count) yields an error, and does no filtering
- using data.table seems to require a key, which is not unique in this data…
- ddply(data,"contactGrp",summarise,count=???) not sure which function to use to fill the count column
- ddply(subset(data,maxpos>1 & relpos==0), c('contactGrp'), function(df)count(df$relpos)) works but gives me an extra column x and it feels like I've overcomplicated it…
SQL would be easy: Select contactGrp, count(*) as cnt from data where … Group by contactGrp, but I'm trying to learn R.
4 Answers
#1
19
I think this is the ddply version you're looking for:
library(plyr)
# sessions is assumed to be the data frame shown as data in the question
ddply(sessions, .(contactGrp),
      summarise,
      count = length(contact[relpos == 0 & maxpos > 1]))
#2
22
And here is the data.table solution:
> library(data.table)
> dt <- data.table(sessions)
> dt[, length(contact[relpos == 0 & maxpos > 1]), by = contactGrp]
contactGrp V1
[1,] west 2
[2,] south 0
[3,] east 0
> dt[, length(contact[relpos == 1 & maxpos > 1]), by = contactGrp]
contactGrp V1
[1,] west 1
[2,] south 1
[3,] east 0
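On the key concern raised in the question: grouping with by works without setting a key. Here is a minimal sketch of two other ways to write the count, assuming the six rows printed by head(data) stand in for the full data set (the names counts and matched are only for illustration). Summing the logical condition keeps groups with zero matches, while filtering first and counting with .N drops them:

```r
library(data.table)

# Assumption: the six rows shown by head(data) in the question
dt <- data.table(
  id         = c(6849, 6849, 6849, 1235, 1235, 5893),
  contact    = c("peter", "sarah", "sarah", "peter", "peter", "carl"),
  contactGrp = c("west", "south", "south", "west", "west", "east"),
  relpos     = c(0, 0.5, 1, 0, 1, 0),
  maxpos     = c(3, 3, 3, 2, 2, 1)
)

# Summing the logical condition keeps groups with zero matches (east)
counts <- dt[, .(count = sum(relpos == 1 & maxpos > 1)), by = contactGrp]

# Filtering in i and counting with .N drops groups without any match
matched <- dt[relpos == 1 & maxpos > 1, .N, by = contactGrp]
```

The first form is probably the closer match to the expected result in the question, since east still shows up with a count of 0.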
#3
10
Here is another approach:
a <- data.frame(id=1:10, contact=sample(c("peter", "sarah"), 10, TRUE), contactGrp=sample(c("west", "east"), 10, TRUE), relpos=sample(0:1, 10, TRUE), maxpos=runif(10, 0, 10))
library(sqldf)
sqldf("Select contactGrp, count(*) as cnt from a where relpos=0 and maxpos > 1 Group by contactGrp")
contactGrp cnt
1 east 3
2 west 1
#4
10
Your first attempted line with aggregate doesn't work because there is no function count in base R. You meant length. All you had to do was execute that with conditional data selection for relpos and maxpos, and also select a dummy variable to get the count of (it doesn't matter which). Nevertheless, instead of using flexible aggregating commands of various kinds, the built-in table command is designed just for this.
with( data[data$relpos == 1 & data$maxpos > 1,], table(contactGrp) )
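To make the aggregate route described above concrete, here is a rough sketch, assuming the six rows printed by head(data) are the whole data set and that contactGrp is a factor (the helper names hits, agg, agg0 and tab, and the hit column, are made up for illustration). It also shows that table keeps a zero count for east only because the factor still carries that level after filtering:

```r
# Assumption: the six rows shown by head(data) in the question
data <- data.frame(
  id         = c(6849, 6849, 6849, 1235, 1235, 5893),
  contact    = c("peter", "sarah", "sarah", "peter", "peter", "carl"),
  contactGrp = factor(c("west", "south", "south", "west", "west", "east"),
                      levels = c("west", "south", "east")),
  relpos     = c(0, 0.5, 1, 0, 1, 0),
  maxpos     = c(3, 3, 3, 2, 2, 1)
)

# Filter first, then count rows per group with length on a dummy column (id)
hits <- data[data$relpos == 1 & data$maxpos > 1, ]
agg  <- aggregate(id ~ contactGrp, data = hits, FUN = length)

# Summing a logical flag instead keeps groups that have no matching rows
data$hit <- data$relpos == 1 & data$maxpos > 1
agg0 <- aggregate(hit ~ contactGrp, data = data, FUN = sum)

# table() reports every factor level, including those with zero rows
tab <- with(hits, table(contactGrp))
```

If contactGrp were a plain character column, table would only list the groups that survive the filter, so east would be missing rather than 0.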