How to use logical operators with data.table key indexing in R?

Time: 2022-02-14 14:58:31

I am trying to figure out how to apply logical operators when I use indexing (keys) in the data.table package in R.

Here is an example. I create a data.table named dt and then set var2 as its key:

> library(data.table)
> dt = data.table(var1 = rep(LETTERS[1:5],2), var2 = seq(1,20, 2), var3 = ceiling(rnorm(10, 3, 2)))
> dt
    var1 var2 var3
 1:    A    1    5
 2:    B    3    3
 3:    C    5    0
 4:    D    7    6
 5:    E    9    3
 6:    A   11    4
 7:    B   13    2
 8:    C   15    1
 9:    D   17    3
10:    E   19    7

> setkey(dt, var2)
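To make the idea of a key concrete: setkey() sorts the table by the key column(s) by reference and records their names, which key() then returns as a character vector. A minimal sketch that rebuilds the example table; the set.seed(42) call is an assumption added only so the random var3 column is reproducible (its values will not match the transcript above):

```r
library(data.table)

set.seed(42)  # assumption: fixed seed so the random var3 column is reproducible
dt <- data.table(var1 = rep(LETTERS[1:5], 2),
                 var2 = seq(1, 20, 2),
                 var3 = ceiling(rnorm(10, 3, 2)))

setkey(dt, var2)      # sorts dt by var2 by reference and marks var2 as the key

haskey(dt)            # TRUE: the table now carries a key
key(dt)               # "var2": the key is stored as a character vector of names
is.unsorted(dt$var2)  # FALSE: rows are ordered by the key column
```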

So now I want to select the rows whose value in the already-defined key column (var2) is less than 10 (< 10). The following attempts give me errors:

> dt[ < 10]
Error: unexpected '<' in "dt[ <"
> dt[ .< 10]
Error in eval(expr, envir, enclos) : object '.' not found
> dt[ .(< 10)]

My expected result is:

   var1 var2 var3
1:    A    1    5
2:    B    3    3
3:    C    5    0
4:    D    7    6
5:    E    9    3

By the way, I know that dt[var2 < 10] gives me this result. But I want to understand the concept of indexing in data.table, and how to do this without naming the key column (var2) in every one of my commands!
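On the concept of indexing: a key lets you look rows up by value alone, as in dt[.(5)], which data.table treats as a join of the value(s) in .() against the key column. That shortcut only expresses equality, so a range condition such as < 10 still has to be a complete logical expression; dt[ < 10] is simply not valid R syntax. A minimal sketch, where var3 = 1:10 replaces rnorm() (an assumption, only so the output is deterministic):

```r
library(data.table)

dt <- data.table(var1 = rep(LETTERS[1:5], 2),
                 var2 = seq(1, 20, 2),
                 var3 = 1:10)  # assumption: deterministic var3 instead of rnorm()
setkey(dt, var2)

# Equality lookup on the key: only the value is given, the key column is implied
hit <- dt[.(5)]
print(hit)

# Range condition: must be a logical expression that refers to the column
below10 <- dt[var2 < 10]
print(below10)
```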

Any help with an explanation is highly appreciated.

2 answers

#1

From ?setkey: key(dt) returns the key columns as a character vector. Assuming your table has a single key column, you can get what you want with:

dt[dt[[key(dt)]] < 10]
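To see what dt[[key(dt)]] does: key(dt) is the string "var2", [[ ]] extracts that column as an ordinary vector, and comparing it with 10 yields a logical vector with one element per row, which data.table accepts as i. A sketch that spells out the intermediate steps; the names key_col, key_values and keep are hypothetical, introduced only for illustration:

```r
library(data.table)

dt <- data.table(var1 = rep(LETTERS[1:5], 2),
                 var2 = seq(1, 20, 2),
                 var3 = 1:10)  # deterministic var3 for illustration
setkey(dt, var2)

key_col    <- key(dt)          # "var2"
key_values <- dt[[key_col]]    # plain numeric vector: 1, 3, ..., 19
keep       <- key_values < 10  # logical vector, one element per row
res <- dt[keep]                # a logical i keeps the rows where keep is TRUE
print(res)
```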

Thanks to David Arenburg, you can also use get():

dt[get(key(dt)) < 10]
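Since the goal is to avoid naming the key in every command, the get() idiom can be wrapped in a small helper. A sketch assuming a single key column; the function rows_where_key() and its cmp argument are hypothetical, not part of data.table:

```r
library(data.table)

# Hypothetical helper (not part of data.table): subset a table with a single
# key column by comparing that column with `value` using the function `cmp`.
# Caveat: a column named k, cmp or value would shadow these variables inside x[...].
rows_where_key <- function(x, cmp, value) {
  k <- key(x)
  if (length(k) != 1L) stop("expected exactly one key column")
  x[cmp(get(k), value)]
}

dt <- data.table(var1 = rep(LETTERS[1:5], 2),
                 var2 = seq(1, 20, 2),
                 var3 = 1:10)
setkey(dt, var2)

below <- rows_where_key(dt, `<`, 10)  # same rows as dt[var2 < 10]
above <- rows_where_key(dt, `>`, 10)  # same rows as dt[var2 > 10]
print(below)
```

Passing the operator as a function (`<`, `>`, `<=`) keeps one helper for every comparison instead of one per operator.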

This is a bit shorter and probably the way to go.

The only other way I can think of is much worse:

dt[eval(parse(text = paste(key(dt), "< 10")))]
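If the expression really must be built programmatically, a tidier option than pasting and parsing text is to turn the key name into a symbol with as.name() and splice it into a call with base R's bquote(). This is a swapped-in technique, not one from the answer above; a sketch:

```r
library(data.table)

dt <- data.table(var1 = rep(LETTERS[1:5], 2),
                 var2 = seq(1, 20, 2),
                 var3 = 1:10)
setkey(dt, var2)

# Build the call dt[var2 < 10] without parsing strings:
# as.name() turns "var2" into the symbol var2, and bquote()
# substitutes it where .( ) appears.
expr <- bquote(dt[.(as.name(key(dt))) < 10])
print(expr)        # dt[var2 < 10]
res <- eval(expr)  # evaluate the constructed call
print(res)
```

Because bquote() also uses .() for splicing, any data.table .() alias inside the quoted expression would be evaluated by bquote() as well; write list() there instead.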

#2

From the documentation: https://www.rdocumentation.org/packages/data.table/versions/1.10.4/topics/setkey

Here is a key to the solution:

> library(data.table)
data.table 1.10.4
  The fastest way to learn (by data.table authors): https://www.datacamp.com/courses/data-analysis-the-data-table-way
  Documentation: ?data.table, example(data.table) and browseVignettes("data.table")
  Release notes, videos and slides: http://r-datatable.com
> data(mtcars)
> head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

> mtcars=data.table(mtcars)
> setkey(mtcars,mpg)
> key(mtcars)
[1] "mpg"


> mtcars[mpg<15,,]
    mpg cyl disp  hp drat    wt  qsec vs am gear carb
1: 10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
2: 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4
3: 13.3   8  350 245 3.73 3.840 15.41  0  0    3    4
4: 14.3   8  360 245 3.21 3.570 15.84  0  0    3    4
5: 14.7   8  440 230 3.23 5.345 17.42  0  0    3    4
> mtcars["mpg"<15,,]
Empty data.table (0 rows) of 11 cols: mpg,cyl,disp,hp,drat,wt...

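The problem is that key(dt) returns the string "var2", while a data.table subset needs the bare column name var2 (without quotes). That is also why mtcars["mpg"<15,,] above returns an empty table: it compares the string "mpg" with 15 instead of comparing the mpg column. We get the bare column from the string using get().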
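The empty result from mtcars["mpg"<15,,] can be checked directly: R coerces 15 to the string "15" and compares two strings, and since digits sort before letters the condition is a single FALSE, which selects no rows. A sketch, where the copy named mt is illustrative so the built-in mtcars is left untouched:

```r
library(data.table)

# A keyed data.table copy of mtcars (the name mt is illustrative)
mt <- data.table(mtcars)
setkey(mt, mpg)

cond <- "mpg" < 15   # compares the string "mpg" with "15", not the column
print(cond)          # FALSE, because digits sort before letters

lit <- mt["mpg" < 15]  # i is a single FALSE, so no rows are selected
sym <- mt[mpg < 15]    # bare name: the mpg column is compared with 15
print(nrow(lit))       # 0
print(nrow(sym))       # 5
```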

The same issue comes up in the question "Remove quotes from a character vector in R".

The simplest fix is get():

> # get() looks up the column whose name key(mtcars) returns ("mpg")
> mtcars[get(key(mtcars))<15]
    mpg cyl disp  hp drat    wt  qsec vs am gear carb
1: 10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
2: 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4
3: 13.3   8  350 245 3.73 3.840 15.41  0  0    3    4
4: 14.3   8  360 245 3.21 3.570 15.84  0  0    3    4
5: 14.7   8  440 230 3.23 5.345 17.42  0  0    3    4

For your data.table it would be:

dt[get(key(dt)) < 10]

which is the same as @DavidArenburg's simple and elegant answer.