How to find unique field values from two columns in a data frame

Time: 2021-06-06 01:42:44

I have a data frame with many columns, including Quarter and CustomerID. I want to identify the unique combinations of Quarter and CustomerID in it.

For example:

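# strip.white = TRUE trims the padding around each field,
# so Quarter holds "2009 Q1" rather than "    2009 Q1"
masterdf <- read.csv(text = "
    Quarter,  CustomerID, ProductID
    2009 Q1,    1234,     1
    2009 Q1,    1234,     2
    2009 Q2,    1324,     3
    2009 Q3,    1234,     4
    2009 Q3,    1234,     5
    2009 Q3,    8764,     6
    2009 Q4,    5432,     7", strip.white = TRUE)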

What I want is:

FilterQuarter     UniqueCustomerID
2009 Q1           1234
2009 Q2           1324
2009 Q3           8764
2009 Q3           1234
2009 Q4           5432

How can I do this in R? I tried the unique function, but it does not work the way I want.

2 solutions

#1


10  

The long comments under the OP are getting hard to follow. You are looking for duplicated(), as pointed out by @RomanLustrik. Use it to subset your original data.frame like this:

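# Keep the first row of each Quarter/CustomerID pair; selecting only
# those two columns drops ProductID so the result matches the output below
cols <- c("Quarter", "CustomerID")
masterdf[ ! duplicated( masterdf[ cols ] ) , cols ]
#  Quarter CustomerID
#1 2009 Q1       1234
#3 2009 Q2       1324
#4 2009 Q3       1234
#6 2009 Q3       8764
#7 2009 Q4       5432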
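
As a rough sketch of why this works (not part of the original answer): duplicated() returns a logical vector marking every row whose Quarter/CustomerID pair has already appeared, so negating it keeps the first occurrence of each pair. Calling unique() on just those two columns should give the same rows. The data frame below is rebuilt by hand from the question's sample data, and the names cols, dup, via_duplicated and via_unique are only illustrative.

```r
# Sample data rebuilt from the question (assumption: values typed in by hand)
masterdf <- data.frame(
  Quarter    = c("2009 Q1", "2009 Q1", "2009 Q2", "2009 Q3",
                 "2009 Q3", "2009 Q3", "2009 Q4"),
  CustomerID = c(1234, 1234, 1324, 1234, 1234, 8764, 5432),
  ProductID  = 1:7
)

cols <- c("Quarter", "CustomerID")

# TRUE for rows whose Quarter/CustomerID pair was already seen in an earlier row
dup <- duplicated(masterdf[cols])

# Keep the first occurrence of each pair, and only the two key columns
via_duplicated <- masterdf[!dup, cols]

# unique() applied to the two-column subset gives the same rows
via_unique <- unique(masterdf[cols])

print(via_duplicated)
```

Calling unique(masterdf) on the whole data frame would not help here, because every row differs in ProductID; restricting the comparison to the two key columns is what makes either approach work.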

#2


2  

Another simple way is to run an SQL query from R; see the code below. This assumes masterdf is the name of the original data frame:

library(sqldf)
sqldf("select Quarter, CustomerID from masterdf group by 1,2")
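
If only the distinct pairs are needed, the same query can probably also be written with SELECT DISTINCT instead of GROUP BY. This is a minimal sketch that assumes the sqldf package is installed; the data frame is rebuilt by hand from the question's sample data, and the name pairs is illustrative.

```r
library(sqldf)

# Sample data rebuilt from the question (assumption: values typed in by hand)
masterdf <- data.frame(
  Quarter    = c("2009 Q1", "2009 Q1", "2009 Q2", "2009 Q3",
                 "2009 Q3", "2009 Q3", "2009 Q4"),
  CustomerID = c(1234, 1234, 1324, 1234, 1234, 8764, 5432),
  ProductID  = 1:7
)

# DISTINCT returns each Quarter/CustomerID combination once
pairs <- sqldf("select distinct Quarter, CustomerID from masterdf")
print(pairs)
```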
