I am trying to randomly sample 7 numbers from 0 to 7 (with replacement), but subject to the constraint that the numbers chosen add up to 7. So for instance, the output 0 1 1 2 3 0 0 is okay, but the output 1 2 3 4 5 6 7 is not. Is there a way to use the sample command with added constraints?
I intend to use the replicate() function with the sample command as an argument, to return a list of N different vectors from the sample command. The way I am currently using the sample command (without any constraints), I need N to be very large in order to get as many vectors that sum to exactly 7 as possible. I figure there must be an easier way to do this!
Here is my code for that part:
x <- replicate(100000, sample(0:7, 7, replace=T))
Ideally, I want 10,000 or 100,000 vectors in x to sum to 7, but would need an enormous N value to do this. Thanks for any help.
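To put a number on the problem, here is a rough sketch of filtering the unconstrained draws after the fact. It assumes the standard stars-and-bars count of valid vectors; keep and accept_rate are illustrative names only.

```r
# Unconstrained draws: each column is one vector of 7 values from 0:7.
x <- replicate(100000, sample(0:7, 7, replace = TRUE))

# Keep only the columns whose entries sum to 7 (rejection sampling).
keep <- x[, colSums(x) == 7, drop = FALSE]

# Expected acceptance rate: choose(13, 6) valid vectors out of 8^7 possible.
accept_rate <- choose(13, 6) / 8^7
accept_rate  # about 0.0008, i.e. roughly 82 hits per 100,000 draws
```

At that rate, collecting 100,000 valid vectors this way would take on the order of 10^8 draws.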
5 Answers
#1
18
To make sure you're sampling uniformly, you could just generate all the permutations and limit to those that sum to 7:
library(gtools)
perms <- permutations(8, 7, 0:7, repeats.allowed=T)
perms7 <- perms[rowSums(perms) == 7,]
From nrow(perms7), we see there are only 1716 possible permutations that sum to 7. Now you can uniformly sample from the permutations:
set.seed(144)
my.perms <- perms7[sample(nrow(perms7), 100000, replace=T),]
head(my.perms)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,] 0 0 0 2 5 0 0
# [2,] 1 3 0 1 2 0 0
# [3,] 1 4 1 1 0 0 0
# [4,] 1 0 0 3 0 3 0
# [5,] 0 2 0 0 0 5 0
# [6,] 1 1 2 0 0 2 1
An advantage of this approach is that it's easy to see that we're sampling uniformly at random. Also, it's quite quick: building perms7 took 0.3 seconds on my computer and building a 1 million-row my.perms took 0.04 seconds. If you need to draw many vectors, this will be quite a bit quicker than a recursive approach because you're just using matrix indexing into perms7 instead of generating each vector separately.
Here's a distribution of counts of numbers in the sample:
# 0 1 2 3 4 5 6 7
# 323347 188162 102812 51344 22811 8629 2472 423
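As a cross-check that does not need the enumeration, the count of 1716 and the table above agree with the standard stars-and-bars formula for non-negative integer solutions. This is only a sketch under that assumption; n_slots, total and p are illustrative names, not part of the code above.

```r
n_slots <- 7
total   <- 7

# Number of vectors of n_slots non-negative integers summing to total.
n_valid <- choose(total + n_slots - 1, n_slots - 1)
n_valid  # 1716

# Under uniform sampling, P(a given slot equals k) is the number of ways
# to fill the other n_slots - 1 slots with total - k, divided by n_valid.
k <- 0:total
p <- choose(total - k + n_slots - 2, n_slots - 2) / n_valid

# Expected counts among 100,000 vectors x 7 slots, comparable to the table.
round(p * 100000 * n_slots)
```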
#2
8
Start with all zeroes, add one to any element, do 7 times:
sumTo = function(){
  v = rep(0,7)
  for(i in 1:7){
    addTo=sample(7)[1]
    v[addTo]=v[addTo]+1
  }
  v
}
Or equivalently, just choose which of the 7 elements you are going to increment in one sample of length 7, then tabulate those, making sure you tabulate up to 7:
sumTo = function(){tabulate(sample(7, 7, replace = TRUE), 7)}
> sumTo()
[1] 2 1 0 0 4 0 0
> sumTo()
[1] 1 3 1 0 1 0 1
> sumTo()
[1] 1 1 0 2 1 0 2
I don't know if this will produce a uniform sample from all possible combinations...
The distribution of individual elements over 100,000 reps is:
> X = replicate(100000,sumTo())
> table(X)
X
0 1 2 3 4 5 6
237709 277926 138810 38465 6427 627 36
Didn't hit a 0,0,0,0,0,0,7 that time!
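On the uniformity question: incrementing a random slot 7 times is a multinomial draw with equal cell probabilities, so the individual vectors are not equally likely. A short sketch with base R's dmultinom() and rmultinom(), assuming that reading of the approach, makes the difference concrete (the variable names are illustrative):

```r
p <- rep(1/7, 7)

# Probability of two particular outcomes under the increment scheme.
p_spread <- dmultinom(rep(1, 7), prob = p)               # 7!/7^7, about 0.0061
p_peaked <- dmultinom(c(7, 0, 0, 0, 0, 0, 0), prob = p)  # (1/7)^7, about 1.2e-06

# A uniform sampler would give every valid vector probability 1/1716.
p_uniform <- 1 / choose(13, 6)                           # about 0.00058

# The same distribution drawn in one vectorised call (one vector per column).
X <- rmultinom(100000, size = 7, prob = p)
```

So the all-ones vector comes up roughly ten times more often than it would under uniform sampling, and vectors containing a 7 are far rarer, which is consistent with the table above.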
#3
5
This recursive algorithm will output a distribution with a higher probability for large numbers than the other solutions. The idea is to throw a random number y in 0:7 into any of the seven available slots, then repeat with a random number in 0:(7-y), etc.:
sample.sum <- function(x = 0:7, n = 7L, s = 7L) {
  if (n == 1) return(s)
  x <- x[x <= s]
  y <- sample(x, 1)
  sample(c(y, Recall(x, n - 1L, s - y)))
}
set.seed(123L)
sample.sum()
# [1] 0 4 0 2 0 0 1
Drawing 100,000 vectors took 11 seconds on my machine and here is the distribution I get:
# 0 1 2 3 4 5 6 7
# 441607 98359 50587 33364 25055 20257 16527 14244
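The recursion can also be unrolled into a loop. The sketch below is one reading of the same scheme (draw each value uniformly from 0 up to what is left, give the remainder to the last slot, then shuffle), so it should match the recursive version in distribution but not draw for draw. sample_sum_iter is a hypothetical name, not part of the answer above.

```r
sample_sum_iter <- function(n = 7L, s = 7L) {
  out <- integer(n)
  remaining <- s
  for (i in seq_len(n - 1L)) {
    # Uniform draw on 0:remaining; sample.int avoids the sample(0:0) edge case.
    out[i] <- sample.int(remaining + 1L, 1L) - 1L
    remaining <- remaining - out[i]
  }
  out[n] <- remaining  # the last slot takes whatever is left
  sample(out)          # shuffle so that no position is favoured
}

v <- sample_sum_iter()
```

Because the first draw is uniform on 0:7, a single slot receives all 7 about one time in eight, which is why large values are much more common here than under uniform sampling.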
#4
5
There may be an easier and/or more elegant way, but here's a brute-force method using the LSPM:::.nPri function. The link includes the definition for an R-only version of the algorithm, for those interested.
#install.packages("LSPM", repos="http://r-forge.r-project.org")
library(LSPM)
# generate all possible permutations, since there are only ~2.1e6 of them
# (this takes < 40s on my 2.2Ghz laptop)
x <- lapply(seq_len(8^7), nPri, n=8, r=7, replace=TRUE)
# set each permutation that doesn't sum to 7 to NULL
y <- lapply(x, function(p) if(sum(p-1) != 7) NULL else p-1)
# subset all non-NULL permutations
z <- y[which(!sapply(y, is.null))]
Now you can sample from z and be assured that you're getting a permutation that sums to 7.
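If LSPM is not available, the same brute-force idea can be sketched in base R with expand.grid(). This is a substitute for the nPri approach, not the code above; grid, valid and draws are illustrative names, and the full grid of 8^7 rows needs on the order of 100 MB of memory.

```r
# Every vector of 7 values from 0:7, one per row (8^7 = 2,097,152 rows).
grid <- as.matrix(expand.grid(rep(list(0:7), 7)))

# Keep only the rows that sum to 7.
valid <- grid[rowSums(grid) == 7, , drop = FALSE]
rm(grid)  # the full grid is no longer needed

# Uniform sample of 10 valid vectors, with replacement.
draws <- valid[sample(nrow(valid), 10, replace = TRUE), , drop = FALSE]
```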
#5
3
I found this question intriguing and gave it some extra thought. A more general way to sample (approximately) uniformly from all feasible solutions with sample(), without generating and storing all permutations (which is clearly infeasible with many more than 7 numbers), is a simple MCMC implementation:
S <- c(0, 1, 1, 2, 3, 0, 0)  # initial solution
N <- 100                     # number of dependent samples (or burn-in period)
for (i in 1:N) {
  b <- sample(1:length(S), 2, replace = FALSE)  # pick 2 elements at random
  opt <- sum(S[-b])                             # sum of the complementary elements
  a <- sample(0:(7 - opt), 1)                   # sample a substitute
  S[b[1]] <- a                                  # change both elements
  S[b[2]] <- 7 - opt - a
}
S  # new sample
This is of course really fast for a few samples. The "distribution":
# "distribution", N = 100,000:
#      0      1      2     3     4    5    6   7
# 321729 189647 103206 52129 22287 8038 2532 432
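To collect many vectors rather than one, the loop can be wrapped in a function that records the state after every block of updates. This is a sketch under assumptions: mcmc_compositions and its thin argument are hypothetical names, and 100 updates between recorded vectors is a guess at adequate mixing, not a tested value.

```r
mcmc_compositions <- function(n_samples, start = c(0, 1, 1, 2, 3, 0, 0),
                              thin = 100L) {
  S <- start
  out <- matrix(0, nrow = n_samples, ncol = length(S))
  for (k in seq_len(n_samples)) {
    for (i in seq_len(thin)) {
      b <- sample.int(length(S), 2L)         # two distinct positions
      pair_sum <- S[b[1]] + S[b[2]]          # amount to redistribute
      a <- sample.int(pair_sum + 1L, 1L) - 1L  # uniform on 0:pair_sum
      S[b[1]] <- a
      S[b[2]] <- pair_sum - a
    }
    out[k, ] <- S  # record the state after `thin` updates
  }
  out
}

draws <- mcmc_compositions(1000)
table(draws)  # should look roughly like the uniform distributions above
```

Each update leaves the total unchanged, so every recorded row sums to 7 regardless of how well the chain has mixed.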
Of course, in this case it is actually possible to find and store all combinations, so if you want a huge sample from all feasible outcomes, just use partitions::compositions(7, 7), as also suggested by Josh O'Brien in the comments. This avoids calculating all the permutations when only a small fraction of them is needed:
perms7 <- partitions::compositions(7, 7)
tabulate(perms7[, sample(ncol(perms7), 100000, TRUE)]+1, 8)
# "distribution", N = 100,000:
#      0      1      2     3     4    5    6   7
# 323075 188787 102328 51511 22754 8697 2413 435