Subtracting data.frames and data.tables of different sizes

Time: 2021-06-08 22:54:56

Two data.frames of the same dimensions can be subtracted in R with

df1 - df2

But I want to subtract two data.frames of different dimensions, for example:

df1 <- data.frame(V1=1:5)
df2 <- data.frame(V1=1:5, V2=6:10)

df1-df2

Error in Ops.data.frame(df1, df2) : 
  ‘-’ only defined for equally-sized data frames

This subtraction can be done with a for loop, but I'm looking for an existing function. Thanks.
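
For reference, the for-loop approach mentioned in the question might look like the sketch below. It assumes, as in the example, that 'df1' has a single column with the same number of rows as 'df2'. The name res is only illustrative.

```r
df1 <- data.frame(V1 = 1:5)
df2 <- data.frame(V1 = 1:5, V2 = 6:10)

# Start from a copy of df2 so the result keeps df2's column names and shape
res <- df2
for (j in seq_along(df2)) {
  # Subtract column j of df2 from the single column of df1
  res[[j]] <- df1$V1 - df2[[j]]
}
res
```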

Edited

What if I have to subtract two data.tables of different dimensions?

library(data.table)
dt1 <- data.table(V1=1:5)
dt2 <- data.table(V1=1:5, V2=6:10)

dt1-dt2
dt1[row(dt2),]-dt2

The second line fails with:

Error in `[.data.table`(dt1, row(dt2), ) : 
  i is invalid type (matrix). Perhaps in future a 2 column matrix could return a list of elements of DT (in the spirit of A[B] in FAQ 2.14). Please let datatable-help know if you'd like this, or add your comments to FR #1611.

1 answer

#1

We can do this by giving both datasets the same number of elements, so that they can be subtracted element by element. In the example, 'df1' has 1 column with 5 rows, whereas 'df2' has 2 columns with 5 rows. The idea is to expand 'df1' to 10 elements (equivalently, 2 columns of 5 rows) so that it matches the dimensions of 'df2'. This can easily be done with rep, or more conveniently with row:

  df1[row(df2),]-df2

To make this clearer,

 row(df2)
 #     [,1] [,2]
 #[1,]    1    1
 #[2,]    2    2
 #[3,]    3    3
 #[4,]    4    4
 #[5,]    5    5

gives the row index of every cell of 'df2'. By doing

 df1[row(df2),]
 #[1] 1 2 3 4 5 1 2 3 4 5

we repeat each element of 'df1' once per column of 'df2', which means twice here. Because the index matrix is read column by column, this is equivalent to

 df1[c(row(df2)[,1],row(df2)[,2]),]
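
The column-by-column reading can be checked directly. An R matrix is stored column by column, so flattening row(df2) with as.vector() gives the same index vector as concatenating its columns. rep(), the other option mentioned above, builds that same vector without row(). This is a minimal sketch, and idx_row and idx_rep are illustrative names.

```r
df1 <- data.frame(V1 = 1:5)
df2 <- data.frame(V1 = 1:5, V2 = 6:10)

# row(df2) is a 5 x 2 integer matrix; as.vector() reads it column by column
idx_row <- as.vector(row(df2))
identical(idx_row, c(row(df2)[, 1], row(df2)[, 2]))  # TRUE

# rep() gives the same indices: 1..nrow(df1), repeated once per column of df2
idx_rep <- rep(seq_len(nrow(df1)), times = ncol(df2))
identical(idx_rep, idx_row)  # TRUE

# With a single column, df1[idx, ] drops to a plain vector of length 10
df1[idx_rep, ]
```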

'df2' can then be subtracted from it:

 df1[row(df2),]-df2
 #  V1 V2
 #1  0 -5
 #2  0 -5
 #3  0 -5
 #4  0 -5
 #5  0 -5
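
The row() indexing above relies on 'df1' having only one column, because that is what makes df1[row(df2),] collapse to a plain vector. The sketch below adds a hypothetical second column, V3, to 'df1' to show what goes wrong. V3 is not part of the question's data.

```r
# Hypothetical variant of df1 with a second column V3
df1 <- data.frame(V1 = 1:5, V3 = 11:15)
df2 <- data.frame(V1 = 1:5, V2 = 6:10)

# With two columns, df1[row(df2), ] stays a data.frame:
# each row of df1 is repeated, giving 10 rows x 2 columns
dim(df1[row(df2), ])  # 10 2

# Its dimensions no longer match df2 (5 x 2), so the subtraction raises an error
result <- tryCatch(df1[row(df2), ] - df2, error = function(e) e)
inherits(result, "error")  # TRUE
```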

As @David Arenburg mentioned, this approach breaks down when both datasets have multiple columns. So if you want the difference between a single column of 'df1' (which may have several columns) and every column of a multi-column dataset ('df2'), it is more general to select that column first (thanks to @David Arenburg for the code):

 df1$V1-df2
 #  V1 V2
 #1  0 -5
 #2  0 -5
 #3  0 -5
 #4  0 -5
 #5  0 -5

This works because of recycling. The elements of 'V1' are paired with the first column of 'df2', and R then starts again from the first element of 'V1' for the second column of 'df2' (assuming both datasets have the same number of rows).
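
The recycling can be made explicit. In the sketch below, rep_len() stretches df1$V1 to one value per cell of 'df2'. The result is exactly the vector that df1[row(df2),] produced earlier, so the two forms agree. Like the text above, it assumes that the length of df1$V1 equals the number of rows of 'df2'.

```r
df1 <- data.frame(V1 = 1:5)
df2 <- data.frame(V1 = 1:5, V2 = 6:10)

# Recycle df1$V1 to one value per cell of df2 (5 rows x 2 columns = 10 values)
recycled <- rep_len(df1$V1, nrow(df2) * ncol(df2))
recycled  # 1 2 3 4 5 1 2 3 4 5

# This is the same vector that the row() indexing produces
identical(recycled, df1[row(df2), ])  # TRUE

# so both subtractions return the same data.frame
all.equal(df1$V1 - df2, df1[row(df2), ] - df2)  # TRUE
```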

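For the second example with data.tables (where 'dt1' has a single column), one option is to select the column of 'dt1' ncol(dt2) times (with=FALSE makes the character vector of names act as a column selection) and then subtract: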

dt1[,rep(names(dt1), ncol(dt2)),with=FALSE]-dt2
#   V1 V1
#1:  0 -5
#2:  0 -5
#3:  0 -5
#4:  0 -5
#5:  0 -5
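
Another option, sketched here as an alternative rather than taken from the answer, is to loop over the columns of 'dt2' inside j with lapply(.SD, ...). This returns a data.table that keeps the column names of 'dt2' (V1, V2) instead of duplicated V1 names. It assumes that the data.table package is installed and that 'dt1' has a single column V1 with as many rows as 'dt2'. The name res_dt is only illustrative.

```r
library(data.table)

dt1 <- data.table(V1 = 1:5)
dt2 <- data.table(V1 = 1:5, V2 = 6:10)

# .SD holds every column of dt2; subtract each one from dt1$V1
res_dt <- dt2[, lapply(.SD, function(col) dt1$V1 - col)]
res_dt
```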
