Force sample() in R to sample a given proportion of rows across all k subarrays

Date: 2022-09-08 14:57:42

My problem is as follows and I have considered many derivations:

I have an array, say with dimensions dims = c(10000, 5, 2), that is, 10000 rows, 5 columns and 2 subarrays.

I would like to be able to use the sample() function to sample a given proportion (say m) of rows in EACH subarray and move them BETWEEN subarrays.

So, say, swap row 5 of subarray 1 with row 10 of subarray 2 (see example below).

I asked a similar question, "Moving rows between subarrays", and got some great help.

The solution is useful but restricted to random sampling of rows (meaning that not all subarrays will be sampled).

, , 1

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    4    3    4    4    3    4    5    2    4     4
 [2,]    1    4    3    5    4    5    4    5    2     4
 [3,]    1    5    2    1    1    2    1    4    5     1
 [4,]    3    1    1    3    5    4    2    4    4     4
 [5,]    3    2    5    1    2    2    5    5    4     3    <-- e.g., switch this row
 [6,]    4    5    5    2    3    4    1    3    5     5
 [7,]    5    5    5    5    1    4    3    1    2     5
 [8,]    3    4    3    1    3    3    4    3    2     3
 [9,]    1    1    3    2    4    4    1    4    2     3
[10,]    1    4    4    2    4    2    4    2    2     1

, , 2

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    5    5    1    1    5    2    1    4    3     1
 [2,]    4    3    2    4    3    5    5    5    4     3
 [3,]    2    4    1    1    4    2    2    2    3     4
 [4,]    5    1    4    5    4    4    3    4    4     5
 [5,]    1    5    5    4    3    3    5    2    2     2
 [6,]    2    2    2    2    5    5    3    4    3     5
 [7,]    5    2    1    1    2    5    3    4    4     2
 [8,]    3    4    3    3    1    3    3    2    3     5
 [9,]    2    1    4    4    3    2    4    5    5     2
[10,]    5    3    4    5    4    3    5    1    2     3    <-- with this row

In the above example, m = 0.10, that is 10% of the rows (1 row) in each subarray are sampled and then swapped.
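
For concreteness, the single swap marked above can be written out by hand for fixed row indices. This is only a sketch on a made-up array: the name `a` and its random contents are assumptions for illustration, not the data printed above.

```r
# Hypothetical 10 x 10 x 2 array standing in for the one printed above
set.seed(1)
a <- array(sample(1:5, 200, replace = TRUE), dim = c(10, 10, 2))

# Swap row 5 of subarray 1 with row 10 of subarray 2
tmp <- a[5, , 1]
a[5, , 1] <- a[10, , 2]
a[10, , 2] <- tmp

dim(a)  # dimensions are unchanged, so each subarray still has 10 rows
```

Because one row leaves and one row arrives in each subarray, the number of rows per subarray stays the same.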

Any ideas on how to force sample() to sample within ALL subarrays? Ideally, the number of rows in each subarray will be very large (10000 or more).

Although I have only included 2 subarrays here, where one or more random rows in subarray 1 swap with one or more random rows in subarray 2 (the number dictated by m), I need a routine that generalizes to k subarrays. So if k = 3, sampling occurs within ALL subarrays and random rows are swapped with neighbouring subarrays.

So, a random row or rows in subarray 1 has an equal chance of moving to either subarray 2 or subarray 3 (it doesn't matter which subarray the rows go to, so long as they are always moving between subarrays). Then, the corresponding row or rows from subarray 2 or 3 will go to subarray 1.

The number of rows must remain constant. For example, there cannot be 11 rows in subarray 1 and only 9 in subarray 2 -- it has to be 10 and 10.

I don't know of any packages that will do this. The goal here is to simulate movement of animals.

Any ideas are greatly appreciated.

1 solution

#1


This is verbose and could be cleaned up, but should get you there.

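set.seed(4)
x <- array(sample(1:10, 200, replace = TRUE), dim = c(10, 10, 2))
nrows_x <- dim(x)[1]

# Sample the row indices to swap (a given proportion of all rows)
proportion <- 0.10
idx <- sample(seq_len(nrows_x), round(nrows_x * proportion))

# Copy the sampled rows from each subarray before overwriting anything
rows_dim_1 <- x[idx, , 1]
rows_dim_2 <- x[idx, , 2]

# Swap: the same row positions are exchanged between subarrays 1 and 2
x[idx, , 1] <- rows_dim_2
x[idx, , 2] <- rows_dim_1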
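
The code above uses the same row positions in both subarrays and handles only k = 2. One possible way to extend it, so that rows are sampled independently within every subarray and k is arbitrary, is sketched below. The function name `swap_rows` and its arguments are hypothetical, and the scheme is an assumption rather than what the question strictly specifies: the subarrays are arranged in a random cycle and each one passes its sampled rows to the next, which reduces to a plain pairwise swap when k = 2.

```r
# Hypothetical helper: move a proportion m of rows between all k subarrays
swap_rows <- function(x, m) {
  n <- dim(x)[1]
  k <- dim(x)[3]
  n_move <- as.integer(round(n * m))  # rows leaving each subarray
  stopifnot(k >= 2, n_move >= 1, n_move <= n)

  # Independently sample the rows that leave each subarray;
  # column j of idx holds the row indices sampled in subarray j
  idx <- vapply(seq_len(k), function(j) sample.int(n, n_move),
                integer(n_move))
  idx <- matrix(idx, nrow = n_move)

  # Random cyclic order: each subarray sends its rows to the next one
  # in the cycle, so sampled rows never stay in their own subarray
  ord <- sample.int(k)
  dest <- ord[c(2:k, 1)]

  out <- x
  for (i in seq_len(k)) {
    from <- ord[i]
    to <- dest[i]
    # Rows sampled in `from` fill the slots vacated in `to`
    out[idx[, to], , to] <- x[idx[, from], , from]
  }
  out
}

set.seed(4)
x <- array(sample(1:10, 300, replace = TRUE), dim = c(10, 10, 3))
y <- swap_rows(x, m = 0.10)
dim(y)  # same dimensions as x
```

Every subarray gives up `n_move` rows and receives `n_move` rows, so the row counts stay constant. Under this scheme all rows sampled from one subarray go to the same destination in a given call; if each row should pick its own destination, the routine would need a per-row assignment instead.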
