I'm using numpy to create a cube array with sides of length 100, thus containing 1 million entries total. For each of the million entries, I am inserting a 100x100 matrix whose entries are randomly generated numbers. I am using the following code to do so:
import random
from numpy import *
cube = arange(1000000).reshape(100,100,100)
for element in cube.flat:
    matrix = arange(10000).reshape(100,100)
    for entry in matrix.flat:
        entry = random.random()*100
    element = matrix
I was expecting this to take a while, but with 10 billion random numbers being generated, I'm not sure my computer can even handle it. How much memory would such an array take up? Would RAM be a limiting factor, i.e. if my computer doesn't have enough RAM, could it fail to actually generate the array?
Also, if there is a more efficient way to implement this code, I would appreciate tips :)
2 Answers
#1
21
A couple points:
- The size in memory of numpy arrays is easy to calculate. It's simply the number of elements times the data size, plus a small constant overhead. For example, if your cube.dtype is int64, and it has 1,000,000 elements, it will require 1000000 * 64 / 8 = 8,000,000 bytes (8 MB).
- However, as @Gabe notes, 100 * 100 * 1,000,000 doubles (8 bytes each) will require about 80 GB.
- This will not cause anything to "break", per se, but operations will be ridiculously slow because of all the swapping your computer will need to do.
- Your loops will not do what you expect. Instead of replacing the element in cube, element = matrix will simply overwrite the element variable, leaving the cube unchanged. The same goes for entry = random.random()*100.
- Instead, see: http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html#modifying-array-values
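To make the points above concrete, here is a minimal sketch at a reduced size. The helper name `estimate_bytes` and the array `small` are made up for illustration. It estimates the data size from shape and dtype without allocating anything, then shows that rebinding the loop variable leaves the array untouched, while index assignment actually changes it.

```python
import numpy as np

def estimate_bytes(shape, dtype):
    """Bytes needed for the data buffer of an array with this shape and dtype."""
    count = 1
    for n in shape:
        count *= n  # plain Python ints, so no overflow for huge shapes
    return count * np.dtype(dtype).itemsize

# The 100x100x100 cube of int64 from the question.
cube_bytes = estimate_bytes((100, 100, 100), np.int64)

# One 100x100 matrix of doubles per cube entry, stored as a single 5-D array.
full_bytes = estimate_bytes((100, 100, 100, 100, 100), np.float64)

# Rebinding the loop variable does not write into the array.
small = np.zeros((2, 3))
for element in small.flat:
    element = 5.0  # only the local name `element` changes
unchanged = not small.any()

# Assigning through an index does write into the array.
for idx in np.ndindex(small.shape):
    small[idx] = 7.0
```

With these numbers, `cube_bytes` comes out to 8,000,000 and `full_bytes` to 80,000,000,000, matching the 8 MB and 80 GB figures in the bullets.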
#2
2
For the "inner" part of your function, look at the numpy.random module:
import numpy as np
matrix = np.random.random((100,100))*100
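The same idea can probably be pushed one step further: ask for the whole shape in one call and drop both loops. The sketch below is only an illustration under assumptions: it uses a deliberately tiny shape (`side` and `m` stand in for 100 and 100) so it runs on any machine, an arbitrary seed, and `np.random.default_rng`, which needs NumPy 1.17 or later. At full size the array would still need the roughly 80 GB discussed in the other answer.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility

side, m = 4, 5  # small stand-ins for the question's 100 and 100

# One (m x m) matrix of uniform values in [0, 100) per cube entry,
# stored as a single 5-D array instead of nested Python loops.
cube = rng.random((side, side, side, m, m)) * 100

# The matrix belonging to cube position (0, 1, 2).
one_matrix = cube[0, 1, 2]
```

Indexing with the first three axes returns a view of the corresponding matrix, so no per-entry objects are created.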