I have the following data frame in R:
> str(df)
'data.frame': 545227 obs. of 15 variables:
$ ykod : int 93 93 93 93 93 93 93 93 93 93 ...
$ yad : Factor w/ 42 levels "BAKUGAN","BARBIE",..: 30 30 30 30 30 30 30 30 30 30 ...
$ per : Factor w/ 3 levels "2 AYLIK","3 AYLIK",..: 3 3 3 3 3 3 3 3 3 3 ...
$ donem: int 201101 201101 201101 201101 201101 201101 201101 201101 201101 201101 ...
$ sayi : int 201101 201101 201101 201101 201101 201101 201101 201101 201101 201101 ...
$ mkod : int 4 5 9 11 12 18 20 22 25 26 ...
$ mad : Factor w/ 10464 levels " Defne Market ",..: 405 8075 9710 10145 9297 7973 2542 3892 2759 5769 ...
$ mtip : Factor w/ 29 levels "Abone Bürosu ",..: 2 20 20 2 2 2 2 2 2 2 ...
$ kanal: Factor w/ 2 levels "OB","SS": 2 2 2 2 2 2 2 2 2 2 ...
$ bkod : int 110565 110565 110565 110565 110565 110565 110565 110565 110565 110565 ...
$ bad : Factor w/ 212 levels "4. Levent","500 Evler",..: 167 167 167 167 167 167 167 167 167 167 ...
$ bolge: Factor w/ 12 levels "Adana Şehiriçi",..: 7 7 7 7 7 7 7 7 7 7 ...
$ sevk : int 2 3 3 3 2 2 2 6 2 2 ...
$ iade : int 2 1 0 2 0 2 1 0 0 2 ...
$ satis: int 0 2 3 1 2 0 1 6 2 0 ...
I want to list the unique values (like SQL's DISTINCT) for several selected variables. For example, unique(yad) gives me the names of all 42 elements, but I need to extract two columns together (yad and per), with all of their unique combinations:
yad per
--- ---
BARBIE AYLIK
BAKUGAN 2 AYLIK
MICKEY MOUSE 2 AYLIK
TINKERBELL 3 AYLIK
... ...
How can I achieve this?
4 Answers
#1
85
How about using unique() itself?
df <- data.frame(yad = c("BARBIE", "BARBIE", "BAKUGAN", "BAKUGAN"),
per = c("AYLIK", "AYLIK", "2 AYLIK", "2 AYLIK"),
hmm = 1:4)
df
# yad per hmm
# 1 BARBIE AYLIK 1
# 2 BARBIE AYLIK 2
# 3 BAKUGAN 2 AYLIK 3
# 4 BAKUGAN 2 AYLIK 4
unique(df[c("yad", "per")])
# yad per
# 1 BARBIE AYLIK
# 3 BAKUGAN 2 AYLIK
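As a small follow-up sketch (not part of the original answer), the result of unique() keeps the row names of the first occurrence of each combination, which is why the output above shows rows 1 and 3. Assuming you would rather have a sorted table numbered from 1, something like the following should work; the name combos is only illustrative.

```r
# Toy data mirroring the example above.
df <- data.frame(yad = c("BARBIE", "BARBIE", "BAKUGAN", "BAKUGAN"),
                 per = c("AYLIK", "AYLIK", "2 AYLIK", "2 AYLIK"),
                 hmm = 1:4)

# Unique (yad, per) pairs; row names still refer to the first occurrence.
combos <- unique(df[c("yad", "per")])

# Sort by both columns and renumber the rows from 1.
combos <- combos[order(combos$yad, combos$per), ]
rownames(combos) <- NULL

nrow(combos)  # number of distinct (yad, per) combinations
```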
#2
9
This is an addition to Josh's answer.
With data.table, you can also keep the values of the other variables while filtering out duplicated rows.
Example:
library(data.table)
# create the data table
dt <- data.table(
  V1 = LETTERS[c(1,1,1,1,2,3,3,5,7,1)],
  V2 = LETTERS[c(2,3,4,2,1,4,4,6,7,2)],
  V3 = c(1),
  V4 = c(2) )
dt
# V1 V2 V3 V4
# A B 1 2
# A C 1 2
# A D 1 2
# A B 1 2
# B A 1 2
# C D 1 2
# C D 1 2
# E F 1 2
# G G 1 2
# A B 1 2
# set the key to all columns
setkey(dt)
# Get Unique lines in the data table
unique( dt[list(V1, V2), nomatch = 0] )
# V1 V2 V3 V4
# A B 1 2
# A C 1 2
# A D 1 2
# B A 1 2
# C D 1 2
# E F 1 2
# G G 1 2
Warning: if the other variables contain different combinations of values, the result will be the unique combinations of V1 and V2.
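To make the warning concrete, here is a minimal sketch that assumes a reasonably recent data.table, where unique() accepts a by argument. V3 is changed to 1:10 purely for illustration (it is constant in the answer's data), and the names first_per_pair and pairs_only are hypothetical.

```r
library(data.table)

dt <- data.table(
  V1 = LETTERS[c(1, 1, 1, 1, 2, 3, 3, 5, 7, 1)],
  V2 = LETTERS[c(2, 3, 4, 2, 1, 4, 4, 6, 7, 2)],
  V3 = 1:10,  # assumption: made to vary so the effect is visible
  V4 = 2
)

# One row per (V1, V2) pair; V3 and V4 are taken from the first matching row.
first_per_pair <- unique(dt, by = c("V1", "V2"))

# Only the distinct pairs themselves, without the other columns.
pairs_only <- unique(dt[, .(V1, V2)])
```

With by, the other columns are kept but only their first value per pair survives, so any variation in V3 within a pair is silently dropped.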
#3
5
There are a few ways to get all unique combinations of a set of factors.
with(df, interaction(yad, per, drop=TRUE)) # gives labels
with(df, yad:per) # ditto
aggregate(numeric(nrow(df)), df[c("yad", "per")], length) # gives a data frame
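A short sketch of what these calls return on toy data (the data below is made up for illustration and assumes yad and per are factors, which yad:per requires). The first two forms give a factor of combined labels rather than a data frame; aggregate() gives a data frame with a count column named x.

```r
df <- data.frame(yad = factor(c("BARBIE", "BARBIE", "BAKUGAN", "BAKUGAN")),
                 per = factor(c("AYLIK", "AYLIK", "2 AYLIK", "2 AYLIK")))

# Combined labels for the combinations that actually occur in the data.
labels <- levels(with(df, interaction(yad, per, drop = TRUE)))

# One row per combination, with the number of rows in each as column x.
counts <- aggregate(numeric(nrow(df)), df[c("yad", "per")], length)
```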
#4
-1
df$new_var = paste(df$yad,df$per,sep = "_")
length(unique(df$new_var)) #for checking
df = df[!duplicated(df$new_var),]
nrow(df) # for checking , this should be equal to 2nd line output
df$new_var = NULL
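One caveat with the paste() approach, sketched below on made-up data: if the separator can occur inside the values, two different pairs can collapse into the same key. Calling duplicated() on the column subset directly avoids the temporary column and that collision. The names keys and dedup are illustrative only.

```r
# Two distinct (yad, per) pairs whose pasted keys happen to coincide.
df <- data.frame(yad = c("A_B", "A", "A_B", "A"),
                 per = c("C", "B_C", "C", "B_C"),
                 satis = 1:4)

# Both pairs paste to "A_B_C", so the key undercounts the combinations.
keys <- paste(df$yad, df$per, sep = "_")
length(unique(keys))  # 1

# duplicated() compares the selected columns row by row instead.
dedup <- df[!duplicated(df[c("yad", "per")]), ]
nrow(dedup)  # 2
```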