Feeding a large numpy array into TensorFlow

I have a large numpy array (X) that fits in CPU memory but is too big for the GPU/TensorFlow. I would like to perform array operations on X using TensorFlow, so I split the array into batches (using numpy), feed each batch to TensorFlow, and finally concatenate the output arrays to get the numpy array Y. I am new to TensorFlow, so I suspect there is a better/faster way to feed in the numpy array.

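import numpy as np
import tensorflow as tf

# X is a large numpy array
# batches is an integer which defines the number of batches

X_list = np.array_split(X, batches)

X_tf = tf.placeholder(tf.float32)
Y_tf = some_function(X_tf)

Y_list = []

# Create the session once and reuse it for every batch
with tf.Session() as sess:
    # Only needed if some_function creates variables
    sess.run(tf.global_variables_initializer())
    for X_batch in X_list:
        Y_list.append(sess.run(Y_tf, feed_dict={X_tf: X_batch}))

# np.hstack joins 1-D outputs end to end; for outputs with more dimensions,
# np.concatenate(Y_list, axis=0) joins along the axis that array_split used
Y = np.hstack(Y_list)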

1 solution

#1

You should look at the TensorFlow Dataset class (tf.data.Dataset), which can handle large numpy arrays. As long as the array fits in memory, it can be loaded and batched however you want.

A basic implementation would look like this (more detail here):

import tensorflow as tf

# load np array X

# make a placeholder for the dataset
X_placeholder = tf.placeholder(dtype=tf.float32, shape=X.shape)

# make the dataset from the placeholder
dataset = tf.data.Dataset.from_tensor_slices(X_placeholder)

# batch
dataset = dataset.batch(batch_size)
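
The snippet above stops once the dataset is batched. To get back to the numpy array Y from the question, the batches still have to be pulled through an iterator and the results joined. Below is a minimal sketch of one way to do that, not a definitive implementation. It assumes TensorFlow 2.x, where the graph-mode calls live under tf.compat.v1 (on TensorFlow 1.x the equivalents are tf.placeholder, tf.Session and dataset.make_initializable_iterator()). The helper name run_in_batches and the doubling body of some_function are hypothetical stand-ins for illustration, and the sketch assumes some_function keeps the leading (batch) axis of its input.

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1  # graph-mode API, matching the placeholder/Session style above


def some_function(x):
    # Hypothetical stand-in for the real computation on one batch.
    return x * 2.0


def run_in_batches(X, batch_size):
    """Run some_function over X one batch at a time and join the outputs."""
    graph = tf.Graph()
    with graph.as_default():
        # The placeholder receives the whole array when the iterator is initialized.
        X_placeholder = tf1.placeholder(dtype=tf.float32, shape=X.shape)
        dataset = tf.data.Dataset.from_tensor_slices(X_placeholder)
        dataset = dataset.batch(batch_size)

        iterator = tf1.data.make_initializable_iterator(dataset)
        Y_tf = some_function(iterator.get_next())

        Y_list = []
        with tf1.Session() as sess:
            # Feed X once; afterwards each sess.run pulls the next batch.
            sess.run(iterator.initializer, feed_dict={X_placeholder: X})
            while True:
                try:
                    Y_list.append(sess.run(Y_tf))
                except tf.errors.OutOfRangeError:
                    break  # the dataset is exhausted

    # Batches were sliced along axis 0, so join them along axis 0.
    return np.concatenate(Y_list, axis=0)
```

Calling run_in_batches(X, batch_size) returns an array with the same leading dimension as X. The session is opened once and X is fed a single time, so only one batch at a time is passed to some_function.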
