I have a data frame containing many columns, including Quarter and CustomerID. I want to identify the unique combinations of Quarter and CustomerID.
For example:
masterdf <- read.csv(text = "
Quarter, CustomerID, ProductID
2009 Q1, 1234, 1
2009 Q1, 1234, 2
2009 Q2, 1324, 3
2009 Q3, 1234, 4
2009 Q3, 1234, 5
2009 Q3, 8764, 6
2009 Q4, 5432, 7")
What I want is:
FilterQuarter UniqueCustomerID
2009 Q1 1234
2009 Q2 1324
2009 Q3 8764
2009 Q3 1234
2009 Q4 5432
How can I do this in R? I tried the unique function, but it does not give the result I want.
2 Answers
#1
10
The long comments under the question are getting hard to follow. You are looking for duplicated, as pointed out by @RomanLustrik. Use it to subset your original data.frame like this:
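# Keep the first row of each (Quarter, CustomerID) pair; all columns are retained
masterdf[ ! duplicated( masterdf[ c("Quarter" , "CustomerID") ] ) , ]
#  Quarter CustomerID ProductID
#1 2009 Q1       1234         1
#3 2009 Q2       1324         3
#4 2009 Q3       1234         4
#6 2009 Q3       8764         6
#7 2009 Q4       5432         7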
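The same pairs can also be obtained with unique(), provided it is applied to only the two key columns rather than to the whole data frame. The following is a minimal sketch that reuses the sample data from the question; the name combos is just an illustrative choice.

```r
# Sample data copied from the question
masterdf <- read.csv(text = "
Quarter, CustomerID, ProductID
2009 Q1, 1234, 1
2009 Q1, 1234, 2
2009 Q2, 1324, 3
2009 Q3, 1234, 4
2009 Q3, 1234, 5
2009 Q3, 8764, 6
2009 Q4, 5432, 7")

# unique() on a data frame drops repeated rows, so restrict it
# to the key columns first; ProductID would otherwise make every row distinct
combos <- unique(masterdf[c("Quarter", "CustomerID")])
print(combos)
```

The difference from the duplicated() subset is that this returns only the two key columns, whereas subsetting the rows of masterdf keeps every column of the first matching row.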
#2
2
Another simple way is to run SQL queries from R with the sqldf package; see the code below. This assumes masterdf is the name of the original data frame.
library(sqldf)
sqldf("select Quarter, CustomerID from masterdf group by 1,2")
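If grouping by column position feels opaque, SELECT DISTINCT states the same intent more directly. This is a sketch only, assuming the sqldf package is installed and that masterdf is built from the sample data in the question; combos is an illustrative name.

```r
library(sqldf)

# Sample data copied from the question
masterdf <- read.csv(text = "
Quarter, CustomerID, ProductID
2009 Q1, 1234, 1
2009 Q1, 1234, 2
2009 Q2, 1324, 3
2009 Q3, 1234, 4
2009 Q3, 1234, 5
2009 Q3, 8764, 6
2009 Q4, 5432, 7")

# DISTINCT removes repeated (Quarter, CustomerID) pairs
combos <- sqldf("select distinct Quarter, CustomerID from masterdf")
print(combos)
```

Either query returns one row per pair. Row order is not guaranteed unless an ORDER BY clause is added.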