In R, how do I create a data.frame from the unique values in one column of another data.frame?

Time: 2021-05-27 09:16:47

I'm trying to learn R, but I'm stuck on something that seems simple. I know SQL, and the easiest way for me to communicate my question is with that language. Can someone help me with a translation from SQL to R?

I've figured out that this:

    SELECT col1, sum(col2) FROM table1 GROUP BY col1

translates into this:

    aggregate(x=table1$col2, by=list(table1$col1), FUN=sum)

And I've figured out that this:

    SELECT col1, col2 FROM table1 GROUP BY col1, col2

translates into this:

    unique(table1[,c("col1","col2")])

But what is the translation for this?

    SELECT col1 FROM table1 GROUP BY col1

For some reason, the `unique` function seems to switch to a different return type when it is given only one column, so it doesn't work as I would expect.

-TC

2 answers

#1


2  

I'm guessing that you are referring to the fact that calling unique on a vector will return a vector, rather than a data frame. Here are a couple of examples that may help:

    # Some example data
    > dat <- data.frame(x = rep(letters[1:2], times = 5),
    +                   y = rep(letters[3:4], each = 5))
    > dat
       x y
    1  a c
    2  b c
    3  a c
    4  b c
    5  a c
    6  b d
    7  a d
    8  b d
    9  a d
    10 b d
    > unique(dat)
      x y
    1 a c
    2 b c
    6 b d
    7 a d
    # unique() on a single column => vector
    > unique(dat$x)
    [1] "a" "b"
    # Same thing
    > unique(dat[, 'x'])
    [1] "a" "b"
    # drop = FALSE preserves the data frame structure
    > unique(dat[, 'x', drop = FALSE])
      x
    1 a
    2 b
    # Or you can just convert it back (although the default column name is ugly)
    > data.frame(unique(dat$x))
      unique.dat.x.
    1             a
    2             b
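
Applied to the query in the question, a minimal sketch might look like the following. The contents of `table1` are made up for illustration (only the names `table1`, `col1` and `col2` come from the question), and the result variable names are hypothetical.

```r
# Made-up stand-in for the question's table1
table1 <- data.frame(col1 = c("a", "b", "a", "b"),
                     col2 = c(1, 2, 3, 4))

# SELECT col1 FROM table1 GROUP BY col1
# drop = FALSE keeps the one-column selection as a data frame
distinct_col1 <- unique(table1[, "col1", drop = FALSE])

# Single-bracket indexing without a comma also keeps the data frame structure
distinct_col1_alt <- unique(table1["col1"])

# Rebuilding the data frame by hand, naming the column explicitly
distinct_col1_named <- data.frame(col1 = unique(table1$col1))

print(distinct_col1)
print(is.data.frame(distinct_col1))
```

All three results are one-column data frames named `col1`. The first two keep the original row names of the first occurrences, while the third gets fresh row names.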

#2


1  

If you know SQL, try the sqldf and data.table packages.
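
As a rough illustration of that suggestion, here is a sketch of the question's last query in both packages. It assumes sqldf and data.table are installed, so each part is guarded and skipped if the package is missing. The contents of `table1` are made up, and `res_sql`, `dt` and `res_dt` are hypothetical names.

```r
# Made-up stand-in for the question's table1
table1 <- data.frame(col1 = c("a", "b", "a", "b"),
                     col2 = c(1, 2, 3, 4))

# sqldf: run the SQL statement directly against the data frame
if (requireNamespace("sqldf", quietly = TRUE)) {
  library(sqldf)
  res_sql <- sqldf("SELECT col1 FROM table1 GROUP BY col1")
  print(res_sql)
}

# data.table: select col1 as a one-column table, then keep the distinct rows
if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)
  dt <- as.data.table(table1)
  res_dt <- unique(dt[, .(col1)])
  print(res_dt)
}
```

With sqldf the result is an ordinary data frame. With data.table it is a data.table, which also inherits from data.frame.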

