I have two vectors and I want to create a list in R where one vector holds the keys and the other the values. I thought I would easily find the answer in my books or by googling, and I was expecting a solution similar to adding names to a vector (names(v) <- names_vector), but I failed.
I have come up with two possible solutions myself, but neither seems elegant to me. R is not my main programming language, but since R is so pragmatic I assume a better solution should exist (something like list(keys=x, values=y)).
My solution 1: the classical loop solution:
> xx <- 1:3
> yy <- letters[1:3]
> zz <- list()
> for (i in seq_along(yy)) { zz[[yy[i]]] <- xx[i] }
My solution 2: an indirect path through named vectors:
> names(xx) <- letters[1:3]
> as.list(xx)
It seems I have a solution, but my vectors have 1 million or more elements, and I am worried not only about coding style (important to me) but also about efficiency (though I don't know how to do profiling in R). Is there a more appropriate way of doing this? Is it good practice to use the named-vector shortcut?
[[UPDATE]] My apologies; I probably oversimplified the question to make it reproducible. I wanted to give names to the elements of a list. I tried names() first, but it seems I did something wrong and it did not work, so I got the wrong idea that names() does not work with lists. But it does, as shown by the accepted answer.
5 solutions
#1
14
If your values are all scalars, then there's nothing wrong with having a "key-value store" that's just a vector.
vals <- 1:1000000
keys <- paste0("key", 1:1000000)
names(vals) <- keys
You can then retrieve the value corresponding to a given key with
vals["key42"]
[1] 42
IIRC R uses hashing for character-based indexing, so lookups should be fast regardless of the size of your vector.
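A small sketch of how name-based lookup on a named vector behaves, using a three-element toy vector as a stand-in for the million-element one (the names a, b, c and the variable names are just for illustration): several keys can be looked up in one vectorised call, a key that is absent gives NA rather than an error, and with duplicated names only the first match is returned.

```r
# Toy stand-in for the large store: a named integer vector
vals <- c(a = 1L, b = 2L, c = 3L)

# Several keys can be looked up in one vectorised call; names are kept
both <- vals[c("c", "a")]      # c = 3, a = 1

# A key that is not present gives NA (with an NA name) instead of an error
absent <- vals["zz"]

# If a name is duplicated, character indexing returns only the first match
dup <- c(a = 1L, a = 2L)
first <- dup["a"]              # a = 1
```

The NA-on-missing behaviour is worth keeping in mind if you rely on lookups failing loudly; vals[["zz"]] raises a "subscript out of bounds" error instead.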
If your values can be arbitrary objects, then you do need a list.
如果您的值可以是任意对象,那么您需要一个列表。
vals <- list(1:100, lm(speed ~ dist, data=cars), function(x) x^2)
names(vals) <- c("numbers", "model", "function")
sq <- vals[["function"]]
sq(5)
[1] 25
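One gotcha with list-based stores, sketched on the same list: when the key is held in a variable, you need [[ ]], because $ uses the literal name you type rather than evaluating it. Single brackets return a (named) sub-list instead of the element. The variable names key, fit and sub are only illustrative.

```r
# Same list as above
vals <- list(1:100, lm(speed ~ dist, data = cars), function(x) x^2)
names(vals) <- c("numbers", "model", "function")

# Key stored in a variable: [[ ]] evaluates it
key <- "model"
fit <- vals[[key]]     # the lm object stored under "model"

# $ does not evaluate its argument: this looks for an element named "key"
vals$key               # NULL

# [ ] returns a sub-list that keeps its name, not the element itself
sub <- vals["numbers"]
```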
If your question is about constructing the list, I wouldn't be too worried. Internally, R is copy-on-write (objects are only copied if their contents are modified), so doing something like
vals <- list(1:1000000, 1:1000000, <other big objects>)
will not actually make extra copies of everything.
Edit: I just checked, and R will copy everything if you do lst <- list(....). Go figure. So if you're already close to the memory limit on your machine, this won't work. On the other hand, if you do names(lst) <- ...., it won't make another copy of lst. Go figure again.
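Whether a particular operation copies can depend on your R version, so rather than take any claim on trust you can watch copies yourself with tracemem(). It only works when R was built with memory profiling, which capabilities("profmem") reports. This sketch shows only the basic copy-on-modify case, not list() or names<- specifically: y <- x shares the data, and the first modification of y forces a copy that tracemem reports.

```r
x <- runif(1e6)

traced <- capabilities("profmem")   # TRUE if tracemem() is available
if (traced) tracemem(x)             # start tracing x (returns its address)

y <- x                              # no copy yet: y and x share the same data

# Modifying y forces R to duplicate the vector; if x is traced,
# tracemem prints a "tracemem[... -> ...]" line, captured here
msgs <- capture.output(y[1] <- 0)
print(msgs)

if (traced) untracemem(x)
```

The same pattern (trace an object, then run the operation you care about inside capture.output()) can be pointed at list(...) or names(lst) <- ... to check the behaviour on your own installation.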
#2
13
It can be done in one statement using setNames:
xx <- 1:3
yy <- letters[1:3]
To create a named list:
as.list(setNames(xx, yy))
# $a
# [1] 1
#
# $b
# [1] 2
#
# $c
# [1] 3
Or a named vector:
setNames(xx, yy)
# a b c
# 1 2 3
In the case of the list, this is programmatically equivalent to your "named vector" approach but maybe a little more elegant.
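A quick way to convince yourself of that equivalence is to build the list both ways and compare them with identical(); the intermediate variable names here are only for illustration.

```r
xx <- 1:3
yy <- letters[1:3]

# The "named vector" approach from the question
named_vec <- xx
names(named_vec) <- yy
via_names <- as.list(named_vec)

# The one-statement setNames() version (setNames lives in stats, attached by default)
via_setNames <- as.list(setNames(xx, yy))

identical(via_names, via_setNames)
# [1] TRUE
```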
Here are some benchmarks that show the two approaches are just as fast. Also note that the order of operations is very important in avoiding an unnecessary and costly copy of the data:
f1 <- function(xx, yy) {
names(xx) <- yy
as.list(xx)
}
f2 <- function(xx, yy) {
out <- as.list(xx)
names(out) <- yy
out
}
f3 <- function(xx, yy) as.list(setNames(xx, yy))
f4 <- function(xx, yy) setNames(as.list(xx), yy)
library(microbenchmark)
microbenchmark(
f1(xx, yy),
f2(xx, yy),
f3(xx, yy),
f4(xx, yy)
)
# Unit: microseconds
# expr min lq median uq max neval
# f1(xx, yy) 41.207 42.6390 43.2885 45.7340 114.853 100
# f2(xx, yy) 39.187 40.3525 41.5330 43.7435 107.130 100
# f3(xx, yy) 39.280 41.2900 42.1450 43.8085 109.017 100
# f4(xx, yy) 76.278 78.1340 79.1450 80.7525 180.825 100
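Since the question mentions vectors of a million elements and asks how to profile, here is a sketch that reruns the same four functions at that size with base R's system.time(). The size n and the "key" prefix are assumptions chosen to mirror the question; actual timings will vary by machine, so none are shown. For finer-grained profiling, base R also has Rprof() and summaryRprof() in utils.

```r
# Assumed size, roughly matching the question
n  <- 1e6
xx <- seq_len(n)
yy <- paste0("key", seq_len(n))

f1 <- function(xx, yy) { names(xx) <- yy; as.list(xx) }
f2 <- function(xx, yy) { out <- as.list(xx); names(out) <- yy; out }
f3 <- function(xx, yy) as.list(setNames(xx, yy))
f4 <- function(xx, yy) setNames(as.list(xx), yy)

# system.time() measures a single call: user, system and elapsed seconds
t1 <- system.time(r1 <- f1(xx, yy))
t2 <- system.time(r2 <- f2(xx, yy))
t3 <- system.time(r3 <- f3(xx, yy))
t4 <- system.time(r4 <- f4(xx, yy))
rbind(f1 = t1[1:3], f2 = t2[1:3], f3 = t3[1:3], f4 = t4[1:3])

# Whatever the timings, all four build the same named list
identical(r1, r2) && identical(r1, r3) && identical(r1, r4)
```

A single system.time() call is noisy; microbenchmark (as above) repeats each expression, which is better for small differences.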
#3
4
Another serious option here is to use data.table, which uses the key to sort your structure, so accessing elements is very fast, especially when you have a large number of them. Here is an example:
library(data.table)
DT <- data.table(xx = 1:1e6,
k = paste0("key", 1:1e6),key="k")
DT is a data.table with 2 columns, where I set the column k as the key:
DT
              xx         k
      1:       1      key1
      2:      10     key10
      3:     100    key100
      4:    1000   key1000
      5:   10000  key10000
     ---
 999996:  999995 key999995
 999997:  999996 key999996
 999998:  999997 key999997
 999999:  999998 key999998
1000000:  999999 key999999
Now I can access my data.table using the key like this:
DT['key1000']
k xx
1: key1000 1000
Here is a benchmark comparing the data.table solution with a named vector:
vals <- 1:1000000
DT <- data.table(xx = vals ,
k = paste0("key", vals),key="k")
keys <- paste0("key", vals)
names(vals) <- keys
library(microbenchmark)
microbenchmark( vals["key42"],DT["key42"],times=100)
Unit: microseconds
expr min lq median uq max neval
vals["key42"] 111938.692 113207.4945 114924.010 130010.832 361077.210 100
DT["key42"] 768.753 797.0085 1055.661 1067.987 2058.985 100
#4
3
Do you mean to do this?...
xx <- 1:3
yy <- letters[1:3]
zz <- list( xx , yy )
names(zz) <- c("keys" , "values")
zz
#$keys
#[1] 1 2 3
#$values
#[1] "a" "b" "c"
AFAIK this is the canonical way of making a list of vectors, but I am happy to be corrected. If you are new to R, I'd advise that it is generally unwise to use a for loop, because there are usually vectorised methods that accomplish most tasks more efficiently.
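To make that concrete for the task in the question, here is a sketch comparing the loop from the question with a vectorised one-liner; both produce the same named list. The loop does one R-level assignment per element, while the vectorised version converts and names everything in two calls.

```r
xx <- 1:3
yy <- letters[1:3]

# Loop version from the question: adds one named element per iteration
zz_loop <- list()
for (i in seq_along(yy)) {
  zz_loop[[yy[i]]] <- xx[i]
}

# Vectorised version: convert to a list once, then attach all names at once
zz_vec <- setNames(as.list(xx), yy)

identical(zz_loop, zz_vec)
# [1] TRUE
```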
#5
0
Hong's output is wrong.
It should use vals[["key42"]]:
> vals[["key42"]]
[1] 42
vals <- 1:1000000
keys <- paste0("key", 1:1000000)
names(vals) <- keys
vals["key42"]
key42
42
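The difference in a nutshell, using a two-element stand-in for the million-element vector: single brackets return a length-1 vector that keeps its name (which is why the output above prints key42 over 42), while double brackets return the bare value. unname() is another way to drop the name.

```r
vals <- c(key41 = 41L, key42 = 42L)

single <- vals["key42"]    # length-1 vector that keeps its name: key42 = 42
bare   <- vals[["key42"]]  # just the value, name dropped: 42

identical(unname(single), bare)
# [1] TRUE
```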