I'm trying to load the bottleneck_features that I obtained from running resnet50 into a top layer model. I ran predict_generator on resnet and saved the resultant bottleneck_features to a npy file. I am unable to fit the model I have created because of the following error:
Traceback (most recent call last):
  File "Labeled_Image_Recognition.py", line 119, in <module>
    callbacks=[checkpointer])
  File "/home/dillon/anaconda3/envs/tensorflow/lib/python3.6/site-packages/keras/models.py", line 963, in fit
    validation_steps=validation_steps)
  File "/home/dillon/anaconda3/envs/tensorflow/lib/python3.6/site-packages/keras/engine/training.py", line 1630, in fit
    batch_size=batch_size)
  File "/home/dillon/anaconda3/envs/tensorflow/lib/python3.6/site-packages/keras/engine/training.py", line 1490, in _standardize_user_data
    _check_array_lengths(x, y, sample_weights)
  File "/home/dillon/anaconda3/envs/tensorflow/lib/python3.6/site-packages/keras/engine/training.py", line 220, in _check_array_lengths
    'and ' + str(list(set_y)[0]) + ' target samples.')
ValueError: Input arrays should have the same number of samples as target arrays. Found 940286 input samples and 14951 target samples.
I'm not really sure what it means. I have 940286 total images in my train dir and there are 14951 total subdirs that these images are separated into. My two hypotheses are:
- It is possible that I am not formatting the train_data and train_labels correctly.
- I set up the model incorrectly
Any guidance in the right direction would be much appreciated!
Here is the code:
# Constants
num_train_dirs = 14951  # This is the total amount of classes I have
num_valid_dirs = 13168

def load_labels(path):
    targets = os.listdir(path)
    labels = np_utils.to_categorical(targets, len(targets))
    return labels

def create_model(train_data):
    model = Sequential()
    model.add(Flatten(input_shape=train_data.shape[1:]))
    model.add(Dense(num_train_dirs, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(num_train_dirs, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
    return model

train_data = np.load(open('bottleneck_features/bottleneck_features_train.npy', 'rb'))
train_labels = load_labels(raid_train_dir)
valid_data = np.load(open('bottleneck_features/bottleneck_features_valid.npy', 'rb'))
valid_labels = train_labels

model = create_model(train_data)
model.summary()

checkpointer = ModelCheckpoint(filepath='weights/first_try.hdf5', verbose=1, save_best_only=True)

print("Fitting model...")
model.fit(train_data, train_labels,
          epochs=50,
          batch_size=100,
          verbose=1,
          validation_data=(valid_data, valid_labels),
          callbacks=[checkpointer])
1 Answer
In supervised learning, the number of input samples (X) must match the number of label samples (Y).
For example, if we want to train a neural network to recognize handwritten digits and we feed 10,000 images (X) to the model, we must also pass 10,000 labels (Y).
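In your case those numbers don't match: train_data holds 940286 samples (one per image), while load_labels returns only 14951 rows (one per subdirectory, because it calls os.listdir on the top-level train directory).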
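One way to get a label per image is to repeat each class index once for every file in that class's subdirectory. The following is only a minimal sketch under a few assumptions that the question does not confirm: the bottleneck features were produced by a directory-based generator with shuffle=False, so samples are ordered class by class; classes are ordered by sorted subdirectory name; and every file in a subdirectory is an image that the generator actually read. The function body below is a hypothetical replacement for the question's load_labels, not the asker's code.

```python
import os
import numpy as np

def load_labels(path):
    """Return one integer class index per image file under `path`.

    Assumes one subdirectory per class and that the features were
    generated in sorted-directory order without shuffling.
    """
    class_names = sorted(
        d for d in os.listdir(path) if os.path.isdir(os.path.join(path, d))
    )
    # Number of files in each class directory.
    counts = [len(os.listdir(os.path.join(path, name))) for name in class_names]
    # Repeat each class index once per file in that class directory.
    return np.repeat(np.arange(len(class_names)), counts)
```

With labels built this way, len(train_labels) should equal train_data.shape[0], which is worth asserting before calling fit. Because these are integer labels rather than one-hot rows, the model would be compiled with loss='sparse_categorical_crossentropy', which also avoids materializing a 940286 x 14951 one-hot matrix. The validation labels would need to be built from the validation directory in the same way instead of reusing train_labels, and since the validation set has fewer subdirectories (13168), its class indices would have to be mapped onto the training class list.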