(Edited) TensorFlow tflearn binary image learning problem

Time: 2021-02-12 00:22:13

I would like to train a TensorFlow (tflearn) network on fingerprint images that have been binarized with PIL. Because the images are binarized, their shape does not match what the network expects.


from __future__ import division, print_function, absolute_import
import pickle
import numpy as np
from PIL import Image
import tflearn
import tensorflow as tf
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.estimator import regression

def load_image(img_path):
    img = Image.open(img_path)

    return img

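def resize_image(in_image, new_width, new_height, out_image=None,
                 resize_mode=Image.BILINEAR):
    # NOTE: the default resize_mode was cut off in the original post;
    # Image.BILINEAR is an assumed stand-in.
    img = in_image.resize((new_width, new_height), resize_mode)

    if out_image:
        img.save(out_image)

    return img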

def pil_to_nparray(pil_image):

    return np.asarray(pil_image, dtype="float32")

def binarization(in_img, threshold):
    # Convert to grey levels, then map each pixel to 0 or 255.
    im = in_img.convert('L')
    for i in range(im.size[0]):
        for j in range(im.size[1]):
            if im.getpixel((i,j)) > threshold:
                im.putpixel((i,j), 255)
            else:
                im.putpixel((i,j), 0)
    return im.convert('F')

def load_data(datafile, num_clss, save=True, save_path='dataset.pkl'):
    train_list = open(datafile,'r')
    labels = []
    images = []
    for line in train_list:
        tmp = line.strip().split(' ')
        fpath = tmp[0]
        img = load_image(fpath)
        img = binarization(img, 128)
        img = resize_image(img, 224, 224)
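        np_img = pil_to_nparray(img)
        images.append(np_img)

        # One-hot encode the class index from the second column.
        index = int(tmp[1])
        label = np.zeros(num_clss)
        label[index] = 1
        labels.append(label)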
    if save:
        pickle.dump((images, labels), open(save_path, 'wb'))

    return images, labels

def load_from_pkl(dataset_file):
    X, Y = pickle.load(open(dataset_file, 'rb'))
    return X, Y

def create_vggnet(num_classes):
    # Building 'VGGNet'
    network = input_data(shape=[None, 224, 224, 3], name='input')
    network = conv_2d(network, 64, filter_size=3, strides=1, activation='relu')
    network = conv_2d(network, 64, filter_size=3, strides=1, activation='relu')
    network = max_pool_2d(network, kernel_size=2, strides=2)
    network = conv_2d(network, 128, filter_size=3, strides=1, activation='relu')
    network = conv_2d(network, 128, filter_size=3, strides=1, activation='relu')
    network = max_pool_2d(network, 2, strides=2)

    network = conv_2d(network, 256, filter_size=3, strides=1, activation='relu')
    network = conv_2d(network, 256, filter_size=3, strides=1, activation='relu')
    network = conv_2d(network, 256, filter_size=3, strides=1, activation='relu')
    network = max_pool_2d(network, kernel_size=2, strides=2)

    network = conv_2d(network, 512, filter_size=3, strides=1, activation='relu')
    network = conv_2d(network, 512, filter_size=3, strides=1, activation='relu')
    network = conv_2d(network, 512, filter_size=3, strides=1, activation='relu')
    network = max_pool_2d(network, kernel_size=2, strides=2)

    network = conv_2d(network, 512, filter_size=3, strides=1, activation='relu')
    network = conv_2d(network, 512, filter_size=3, strides=1, activation='relu')
    network = conv_2d(network, 512, filter_size=3, strides=1, activation='relu')
    network = max_pool_2d(network, kernel_size=2, strides=2)

    network = fully_connected(network, 4096, activation='relu')
    network = dropout(network, 0.5)
    network = fully_connected(network, 4096, activation='relu')
    network = dropout(network, 0.5)
    network = fully_connected(network, num_classes, activation='softmax')

    # Any further arguments were cut off in the original post; tflearn defaults apply.
    network = regression(network, optimizer='adam', loss='categorical_crossentropy')

    return network

def train(network, X, Y):
    # Training
    model = tflearn.DNN(network, checkpoint_path='model_vgg',
                        max_checkpoints=1, tensorboard_verbose=2, tensorboard_dir='output')
    model.fit(X, Y, n_epoch=100, validation_set=0.1, shuffle=True, show_metric=True,
              batch_size=64, snapshot_step=200, snapshot_epoch=False, run_id='vgg_fingerprint')

def predict(network, modelfile, images):
    model = tflearn.DNN(network)
    model.load(modelfile)  # restore the trained weights before predicting
    return model.predict(images)

if __name__ == '__main__':
    #image, label = load_data('train.txt', 5)
    X, Y = load_from_pkl('dataset.pkl')
    net = create_vggnet(5)
    train(net, X, Y)

I have tried changing the dimensions with numpy reshape, but the same error keeps appearing.


The error is as follows: ValueError: Cannot feed value of shape (64, 224, 224) for Tensor u'input/X:0', which has shape '(?, 224, 224, 3)'

What is the problem?


1 Answer



The problem is with your input shape - it doesn't match the input layer.


The input layer is defined in create_vggnet():


def create_vggnet(num_classes):
    # Building 'VGGNet'
    network = input_data(shape=[None, 224, 224, 3], name='input')

So the network expects None (that is, any number of) inputs of shape (224, 224, 3): 224x224 pixels with 3 RGB channels. You are passing 64 (your batch size) inputs of shape 224x224, with no channel dimension.
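The mismatch can be checked before training. The following is only a minimal sketch with NumPy; shape_matches is a hypothetical helper written for illustration, not part of tflearn or TensorFlow, and it treats None as "any size" the way the input layer does.

```python
import numpy as np

def shape_matches(batch_shape, placeholder_shape):
    """Return True if a batch of this shape could be fed to the placeholder.

    None in placeholder_shape stands for "any size" (the batch dimension).
    """
    if len(batch_shape) != len(placeholder_shape):
        return False
    return all(p is None or p == b
               for b, p in zip(batch_shape, placeholder_shape))

batch = np.zeros((64, 224, 224), dtype="float32")  # what the question feeds
expected = (None, 224, 224, 3)                      # shape given to input_data

# The batch has 3 dimensions, the placeholder has 4, so this reports a mismatch.
print(batch.shape, shape_matches(batch.shape, expected))
```

Running a check like this on the pickled X before calling model.fit shows the missing channel dimension without building the network.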

There are two fixes:


1) (probably more wasteful) Extend the images to RGB.


After you convert the image to 'L' (luminance, that is, grey levels) and binarize it, convert it to 'RGB'. Do not convert to 'F' afterwards: mode 'F' is a single-channel 32-bit float image, so the array would be 224x224 again. pil_to_nparray already casts the pixels to float32.


(See: http://effbot.org/imagingbook/image.htm and How do I save a mode 'F' image? (Python/PIL))

def binarization(in_img, threshold):
    im = in_img.convert('L')
    for i in range(im.size[0]):
        for j in range(im.size[1]):
            if im.getpixel((i, j)) > threshold:
                im.putpixel((i, j), 255)
            else:
                im.putpixel((i, j), 0)
    return im.convert('RGB')
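The same three-channel layout can also be produced on the NumPy side, after the image has been turned into an array. This is a sketch under that assumption; gray_to_rgb_array is a hypothetical helper name, and the 2x2 array stands in for a binarized 224x224 image.

```python
import numpy as np

def gray_to_rgb_array(gray):
    """Copy a (H, W) grey-level array into three identical channels."""
    gray = np.asarray(gray, dtype="float32")
    if gray.ndim != 2:
        raise ValueError("expected a 2-D (height, width) array")
    # Add a trailing axis, then repeat it 3 times: (H, W) -> (H, W, 3).
    return np.repeat(gray[:, :, np.newaxis], 3, axis=2)

binary = np.array([[0, 255], [255, 0]], dtype="float32")  # tiny stand-in image
rgb = gray_to_rgb_array(binary)
print(rgb.shape)
```

Each image then carries three copies of the same data, which is why this fix is the more wasteful one.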

2) (less wasteful, but it changes the network's input layer, so it can be argued that this "isn't VGG 16 anymore") Change the input layer to 1 channel.

def create_vggnet(num_classes):
    # Building 'VGGNet'
    network = input_data(shape=[None, 224, 224, 1], name='input')

Unfortunately, shape=[None, 224, 224] doesn't work (the error says something like "The Tensor needs to be 4D"), so a single input must have shape (224, 224, 1).

So you need to make the images have an extra dimension:


def pil_to_nparray(pil_image):

    return np.expand_dims(np.asarray(pil_image, dtype="float32"), 2)

or (maybe even better):


def pil_to_nparray(pil_image):

    return np.asarray(pil_image, dtype="float32").reshape((224, 224, 1))

(The latter version is more explicit about the resulting shape, but it only works if the input image is 224x224, whereas expand_dims adds the extra dimension for an image of any size.)
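If the dataset has already been pickled, the channel axis can probably be added to the whole batch at once instead of regenerating the file. The sketch below assumes X is a list of equally sized 2-D arrays, as load_data builds it; add_channel_axis is a hypothetical helper name. It also shows the difference between the two variants on an image that is not 224x224.

```python
import numpy as np

def add_channel_axis(images):
    """Stack (H, W) images into one (N, H, W, 1) float32 batch."""
    batch = np.asarray(images, dtype="float32")
    if batch.ndim != 3:
        raise ValueError("expected a sequence of equally sized 2-D images")
    return np.expand_dims(batch, axis=-1)

# Two stand-in images in place of the pickled fingerprint data.
images = [np.zeros((224, 224)), np.ones((224, 224))]
X = add_channel_axis(images)
print(X.shape)

# expand_dims works for any image size; a hard-coded reshape does not.
small = np.zeros((100, 100), dtype="float32")
print(np.expand_dims(small, 2).shape)
try:
    small.reshape((224, 224, 1))
except ValueError as err:
    print("reshape failed:", err)
```

With the input layer set to [None, 224, 224, 1], a batch shaped like X here has the four dimensions the layer asks for.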




