(Modified) TensorFlow tflearn binarized image learning problem

Date: 2021-02-12 00:22:13

I would like to train a TensorFlow (tflearn) model on fingerprint images that have been binarized with PIL. Because the images are binarized, their shape does not match what the network expects.

from __future__ import division, print_function, absolute_import
import pickle
import numpy as np
from PIL import Image
import tflearn
import tensorflow as tf
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.estimator import regression


def load_image(img_path):
    img = Image.open(img_path)

    return img


def resize_image(in_image, new_width, new_height, out_image=None,
                 resize_mode=Image.LANCZOS):  # same filter as the older Image.ANTIALIAS name
    img = in_image.resize((new_width, new_height), resize_mode)

    if out_image:
        img.save(out_image)

    return img


def pil_to_nparray(pil_image):
    pil_image.load()

    return np.asarray(pil_image, dtype="float32")


def binarization(in_img, threshold):
    im = in_img.convert('L')
    for i in range(im.size[0]):
        for j in range(im.size[1]):
            if im.getpixel((i,j)) > threshold:
                im.putpixel((i,j), 255)
            else:
                im.putpixel((i,j), 0)
    return im.convert('F')


def load_data(datafile, num_clss, save=True, save_path='dataset.pkl'):
    train_list = open(datafile,'r')
    labels = []
    images = []
    for line in train_list:
        tmp = line.strip().split(' ')
        fpath = tmp[0]
        print(fpath)
        img = load_image(fpath)
        img = binarization(img, 128)
        img = resize_image(img, 224, 224)
        np_img = pil_to_nparray(img)
        images.append(np_img)

        index = int(tmp[1])
        label = np.zeros(num_clss)
        label[index] = 1
        labels.append(label)
    if save:
        pickle.dump((images, labels), open(save_path, 'wb'))

    return images, labels


def load_from_pkl(dataset_file):
    X, Y = pickle.load(open(dataset_file, 'rb'))
    return X, Y


def create_vggnet(num_classes):
    # Building 'VGGNet'
    network = input_data(shape=[None, 224, 224, 3], name='input')
    network = conv_2d(network, 64, filter_size=3, strides=1, activation='relu')
    network = conv_2d(network, 64, filter_size=3, strides=1, activation='relu')
    network = max_pool_2d(network, kernel_size=2, strides=2)
    network = conv_2d(network, 128, filter_size=3, strides=1, activation='relu')
    network = conv_2d(network, 128, filter_size=3, strides=1, activation='relu')
    network = max_pool_2d(network, 2, strides=2)

    network = conv_2d(network, 256, filter_size=3, strides=1, activation='relu')
    network = conv_2d(network, 256, filter_size=3, strides=1, activation='relu')
    network = conv_2d(network, 256, filter_size=3, strides=1, activation='relu')
    network = max_pool_2d(network, kernel_size=2, strides=2)

    network = conv_2d(network, 512, filter_size=3, strides=1, activation='relu')
    network = conv_2d(network, 512, filter_size=3, strides=1, activation='relu')
    network = conv_2d(network, 512, filter_size=3, strides=1, activation='relu')
    network = max_pool_2d(network, kernel_size=2, strides=2)

    network = conv_2d(network, 512, filter_size=3, strides=1, activation='relu')
    network = conv_2d(network, 512, filter_size=3, strides=1, activation='relu')
    network = conv_2d(network, 512, filter_size=3, strides=1, activation='relu')
    network = max_pool_2d(network, kernel_size=2, strides=2)

    network = fully_connected(network, 4096, activation='relu')
    network = dropout(network, 0.5)
    network = fully_connected(network, 4096, activation='relu')
    network = dropout(network, 0.5)
    network = fully_connected(network, num_classes, activation='softmax')

    network = regression(network, optimizer='adam', loss='categorical_crossentropy',
                         learning_rate=0.001)

    return network


def train(network, X, Y):
    # Training: feed X and Y to the network, checkpointing to 'model_vgg'.
    model = tflearn.DNN(network, checkpoint_path='model_vgg',
                        max_checkpoints=1, tensorboard_verbose=2, tensorboard_dir='output')
    model.fit(X, Y, n_epoch=100, validation_set=0.1, shuffle=True, show_metric=True,
              batch_size=64, snapshot_step=200, snapshot_epoch=False, run_id='vgg_fingerprint')
    model.save('model_save.model')


def predict(network, modelfile, images):
    model = tflearn.DNN(network)
    model.load(modelfile)

    return model.predict(images)


if __name__ == '__main__':
    #image, label = load_data('train.txt', 5)
    X, Y = load_from_pkl('dataset.pkl')
    net = create_vggnet(5)
    train(net, X, Y)

I have tried changing the dimensions with numpy reshape, but the following error keeps occurring.

The error is as follows:

ValueError: Cannot feed value of shape (64, 224, 224) for Tensor u'input/X:0', which has shape '(?, 224, 224, 3)'

What is the problem?

1 solution

#1


The problem is with your input shape - it doesn't match the input layer.

The input layer is defined in create_vggnet():

def create_vggnet(num_classes):
    # Building 'VGGNet'
    network = input_data(shape=[None, 224, 224, 3], name='input')

So the network expects None (meaning any batch size) times (224, 224, 3), that is, 224x224 pixels with 3 RGB channels. You pass 64 (your batch size) times 224x224, with no channel dimension at all.
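The matching rule behind the error can be sketched in plain Python. The helper name shape_matches is hypothetical and is not part of TensorFlow or tflearn; it only mirrors the idea that None accepts any size while every other dimension, and the number of dimensions, must agree.

```python
def shape_matches(value_shape, placeholder_shape):
    """Return True if a fed array shape fits a placeholder shape.

    None in the placeholder shape means "any size" for that dimension.
    (Hypothetical helper for illustration only.)
    """
    if len(value_shape) != len(placeholder_shape):
        return False  # rank mismatch, e.g. 3-D data fed to a 4-D input
    return all(p is None or p == v
               for v, p in zip(value_shape, placeholder_shape))


placeholder = (None, 224, 224, 3)  # input_data(shape=[None, 224, 224, 3])

print(shape_matches((64, 224, 224), placeholder))     # batch from the question
print(shape_matches((64, 224, 224, 3), placeholder))  # batch with a channel axis
```

The first call fails on rank alone: the batch has three dimensions and the input layer has four.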

There are two fixes:

1) (probably more wasteful) - extend the images to RGB.

So, after you convert the image to 'L' (luminance, that is, gray levels) and binarize it, convert it to 'RGB'. Do not convert it to 'F' afterwards: mode 'F' is a single-band 32-bit float image, so that conversion would collapse the image back to one channel. pil_to_nparray() already casts the pixels to float32.

(See: http://effbot.org/imagingbook/image.htm and How do I save a mode 'F' image? (Python/PIL))

def binarization(in_img, threshold):
    # Grayscale first, then map every pixel to 0 or 255.
    im = in_img.convert('L')
    im = im.point(lambda p: 255 if p > threshold else 0)
    # Replicate the single band into three so the array is (H, W, 3).
    return im.convert('RGB')
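The same three-channel layout can also be produced at the array level instead of in PIL. This is a minimal sketch of that alternative, assuming the image is already a 2-D float32 NumPy array; the synthetic gray array below is a stand-in for one binarized image, not data from the question.

```python
import numpy as np

# Stand-in for one binarized 224x224 image (values 0.0 or 255.0).
gray = np.zeros((224, 224), dtype="float32")
gray[50:100, 50:100] = 255.0

# Add a trailing channel axis, then copy it three times:
# (224, 224) -> (224, 224, 1) -> (224, 224, 3).
rgb = np.repeat(gray[:, :, np.newaxis], 3, axis=2)

print(gray.shape, rgb.shape)
```

All three channels hold identical values, which is why this fix is described as the more wasteful one.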

2) (less wasteful, but it changes the network slightly, in the input layer only, so one could argue that this "isn't VGG-16 anymore") Change the input layer to one channel.

def create_vggnet(num_classes):
    # Building 'VGGNet'
    network = input_data(shape=[None, 224, 224, 1], name='input')

Unfortunately, shape=[None, 224, 224] doesn't work (the error says something like "The Tensor needs to be 4D"), so each single input must have shape (224, 224, 1).

So you need to make the images have an extra dimension:

def pil_to_nparray(pil_image):
    pil_image.load()

    return np.expand_dims(np.asarray(pil_image, dtype="float32"), 2)

or (maybe even better):

def pil_to_nparray(pil_image):
    pil_image.load()

    return np.asarray(pil_image, dtype="float32").reshape((224, 224, 1))

(The latter version is more explicit about the resulting shape.) However, it only works if the input image is 224x224, whereas expand_dims adds the extra dimension for an image of any size.
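Since the question's main block loads X from dataset.pkl, the channel axis can also be added to the whole batch after loading, without regenerating the pickle. This is a minimal sketch, assuming X is a list of 2-D float32 arrays as produced by load_data(); the zero-filled list below stands in for the real data.

```python
import numpy as np

# Stand-in for the pickled X: a list of 2-D float32 arrays.
X = [np.zeros((224, 224), dtype="float32") for _ in range(4)]

# Stack into one array and append the channel axis:
# (4, 224, 224) -> (4, 224, 224, 1).
X_batch = np.expand_dims(np.asarray(X, dtype="float32"), axis=-1)

# reshape gives the same result here, but raises ValueError
# if the images are not 224x224.
X_same = np.asarray(X, dtype="float32").reshape((-1, 224, 224, 1))

print(X_batch.shape, X_same.shape)
```

The resulting array matches an input layer declared with shape=[None, 224, 224, 1].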
