[OCR Technology Series, Part 8] End-to-End Variable-Length Text Recognition: CRNN Code Implementation

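CRNN is a classic and widely used recognition algorithm in the OCR field. Its theoretical basis is covered in my previous article; this article focuses on the CRNN code implementation and the recognition results.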

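Data Processing

Using image-processing techniques, we synthetically generated a large batch of text images, 3.6 million samples in total.

We split them into a training set and a test set (10:1) and stored the two splits in separate text files:

[Figure: the two label files for the training set and the test set]

The label format inside the text files is as follows. As the conversion script below reads it, each line holds an image path followed by its text label, separated by whitespace:

[Figure: sample lines from a label file]

What we have at this point is the raw dataset. For deep-learning training on images, the raw dataset is usually converted to LMDB format to make the subsequent training easier, so we convert this dataset as well. The code below performs the conversion. The idea is simple: read each image together with its text label, store the pair in a dictionary (cache), and use the put function of the lmdb package to write the key/value pairs held in the cache into the LMDB store (the cache is flushed with put once every 1000 samples).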



import os

import cv2
import lmdb
import numpy as np


def checkImageIsValid(imageBin):
    """Return True if imageBin decodes to a non-empty image."""
    if imageBin is None:
        return False
    try:
        # np.frombuffer replaces the deprecated np.fromstring for binary data
        imageBuf = np.frombuffer(imageBin, dtype=np.uint8)
        img = cv2.imdecode(imageBuf, cv2.IMREAD_GRAYSCALE)
        imgH, imgW = img.shape[0], img.shape[1]
    except Exception:
        return False
    return imgH * imgW != 0


def writeCache(env, cache):
    """Write every cached key/value pair in a single LMDB transaction."""
    with env.begin(write=True) as txn:
        for k, v in cache.items():
            # LMDB keys and values must be bytes under Python 3
            if isinstance(k, str):
                k = k.encode('utf-8')
            if isinstance(v, str):
                v = v.encode('utf-8')
            txn.put(k, v)


def createDataset(outputPath, imagePathList, labelList, lexiconList=None, checkValid=True):
    """
    Create LMDB dataset for CRNN training.
    ARGS:
        outputPath    : LMDB output path
        imagePathList : list of image path
        labelList     : list of corresponding groundtruth texts
        lexiconList   : (optional) list of lexicon lists
        checkValid    : if true, check the validity of every image
    """
    assert (len(imagePathList) == len(labelList))
    nSamples = len(imagePathList)
    env = lmdb.open(outputPath, map_size=1099511627776)
    cache = {}
    cnt = 1
    for i in range(nSamples):
        # each entry is a full line of the label file; the path is its first field
        imagePath = imagePathList[i].split()[0]
        label = labelList[i]
        # if not os.path.exists(imagePath):
        #     print('%s does not exist' % imagePath)
        #     continue
        # the leading '.' makes the path from the label file relative to the current directory
        with open('.' + imagePath, 'rb') as f:  # binary mode: image bytes, not text
            imageBin = f.read()
        if checkValid:
            if not checkImageIsValid(imageBin):
                print('%s is not a valid image' % imagePath)
                continue
        imageKey = 'image-%09d' % cnt
        labelKey = 'label-%09d' % cnt
        cache[imageKey] = imageBin
        cache[labelKey] = label
        if lexiconList:
            lexiconKey = 'lexicon-%09d' % cnt
            cache[lexiconKey] = ' '.join(lexiconList[i])
        if cnt % 1000 == 0:
            # flush the cache to LMDB every 1000 samples
            writeCache(env, cache)
            cache = {}
            print('Written %d / %d' % (cnt, nSamples))
        cnt += 1
    nSamples = cnt - 1
    cache['num-samples'] = str(nSamples)
    writeCache(env, cache)
    print('Created dataset with %d samples' % nSamples)


OUT_PATH = '../crnn_train_lmdb'
IN_PATH = './train.txt'

if __name__ == '__main__':
    outputPath = OUT_PATH
    if not os.path.exists(OUT_PATH):
        os.mkdir(OUT_PATH)
    with open(IN_PATH, encoding='utf-8') as imgdata:
        imagePathList = list(imgdata)
    labelList = []
    for line in imagePathList:
        # second whitespace-separated field of each line is the text label
        word = line.split()[1]
        labelList.append(word)
    createDataset(outputPath, imagePathList, labelList)

Running the code above produces the LMDB databases for the training set and the test set.

[Figure: the generated training and test LMDB files]
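The key layout written by createDataset can be summarized in a few lines. The sketch below uses a plain dict as a stand-in for an LMDB read transaction so that it runs without the lmdb package; the helper names sample_keys and read_sample are hypothetical and not part of the original code.

```python
def sample_keys(index):
    """Return the (image key, label key) pair for a 1-based sample index."""
    image_key = ('image-%09d' % index).encode('utf-8')
    label_key = ('label-%09d' % index).encode('utf-8')
    return image_key, label_key


def read_sample(store, index):
    """Fetch one (image bytes, label text) pair from a key/value store.

    `store` only needs a .get(key) method, which both a dict and an
    LMDB read transaction provide.
    """
    n_samples = int(store.get(b'num-samples').decode('utf-8'))
    if not 1 <= index <= n_samples:
        raise IndexError('index %d out of range 1..%d' % (index, n_samples))
    image_key, label_key = sample_keys(index)
    return store.get(image_key), store.get(label_key).decode('utf-8')


# A dict standing in for the LMDB store, filled the way createDataset fills it.
fake_store = {
    b'image-000000001': b'<jpeg bytes of sample 1>',
    b'label-000000001': '你好'.encode('utf-8'),
    b'num-samples': b'1',
}
image_bin, label = read_sample(fake_store, 1)
print(label)  # 你好
```

With a real database, the same function could be called as read_sample(txn, i) inside a `with env.begin() as txn:` block.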

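One more step in data preparation deserves emphasis: turning text labels into numbers, i.e. representing every character (Chinese characters, English letters, punctuation marks) by an integer id. For example, "我" might map to id 1, "l" to id 1000, "？" to id 90, and so on. A dictionary is enough to store this mapping. During training the labels are encoded first (encode); at prediction time the network output is decoded (decode) back into text.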



import torch


class strLabelConverter(object):
    """Convert between str and label.

    NOTE:
        Insert `blank` to the alphabet for CTC.

    Args:
        alphabet (str): set of the possible characters.
        ignore_case (bool, default=False): whether or not to ignore all of the case.
    """

    def __init__(self, alphabet, ignore_case=False):
        self._ignore_case = ignore_case
        if self._ignore_case:
            alphabet = alphabet.lower()
        self.alphabet = alphabet + '-'  # for `-1` index
        self.dict = {}
        for i, char in enumerate(alphabet):
            # NOTE: 0 is reserved for 'blank' required by warp_ctc
            self.dict[char] = i + 1

    def encode(self, text):
        """Support batch or single str.

        Args:
            text (str or list of str): texts to convert.

        Returns:
            torch.IntTensor [length_0 + length_1 + ... length_{n - 1}]: encoded texts.
            torch.IntTensor [n]: length of each text.
        """
        if isinstance(text, (str, bytes)):
            text = [text]  # treat a single string as a batch of one
        length = []
        result = []
        for item in text:
            # labels read back from LMDB are bytes; decode them to str first
            if isinstance(item, bytes):
                item = item.decode('utf-8', 'strict')
            if self._ignore_case:
                item = item.lower()
            length.append(len(item))
            for char in item:
                result.append(self.dict[char])
        return (torch.IntTensor(result), torch.IntTensor(length))

    def decode(self, t, length, raw=False):
        """Decode encoded texts back into strs.

        Args:
            torch.IntTensor [length_0 + length_1 + ... length_{n - 1}]: encoded texts.
            torch.IntTensor [n]: length of each text.

        Raises:
            AssertionError: when the texts and its length does not match.

        Returns:
            text (str or list of str): decoded texts.
        """
        if length.numel() == 1:
            length = int(length[0])
            assert t.numel() == length, "text with length: {} does not match declared length: {}".format(
                t.numel(), length)
            if raw:
                return ''.join([self.alphabet[int(i) - 1] for i in t])
            else:
                char_list = []
                for i in range(length):
                    # skip blanks (id 0) and collapse consecutive repeats
                    if t[i] != 0 and (not (i > 0 and t[i - 1] == t[i])):
                        char_list.append(self.alphabet[int(t[i]) - 1])
                return ''.join(char_list)
        else:
            # batch mode
            assert t.numel() == int(length.sum()), "texts with length: {} does not match declared length: {}".format(
                t.numel(), int(length.sum()))
            texts = []
            index = 0
            for i in range(length.numel()):
                l = int(length[i])
                texts.append(
                    self.decode(
                        t[index:index + l], torch.IntTensor([l]), raw=raw))
                index += l
            return texts
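To see the label mapping and the decoding rule in isolation, here is a minimal sketch that works on plain Python lists instead of tensors. The three-character alphabet and the function names are made up for illustration; the rule itself mirrors decode above: drop blanks (id 0) and collapse consecutive repeats.

```python
BLANK = 0  # id 0 is reserved for the CTC blank


def build_table(alphabet):
    """Map each character to an id starting at 1."""
    return {char: i + 1 for i, char in enumerate(alphabet)}


def encode(texts, table):
    """Flatten a batch of strings into one id list plus per-string lengths."""
    ids = [table[char] for text in texts for char in text]
    lengths = [len(text) for text in texts]
    return ids, lengths


def greedy_decode(frame_ids, alphabet):
    """Collapse repeated ids, then drop blanks (best-path CTC decoding)."""
    chars = []
    previous = BLANK
    for idx in frame_ids:
        if idx != BLANK and idx != previous:
            chars.append(alphabet[idx - 1])
        previous = idx
    return ''.join(chars)


alphabet = '我l?'
table = build_table(alphabet)
print(encode(['我l', '?'], table))  # ([1, 2, 3], [2, 1])
# Per-frame argmax ids of a hypothetical 8-frame network output:
print(greedy_decode([1, 1, 0, 2, 2, 0, 2, 3], alphabet))  # 我ll?
```

The blank between the two runs of id 2 is what lets the decoder emit a genuine double letter ("ll") instead of merging it into one.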

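Network Design

According to the CRNN paper, CRNN is built from three parts, CNN -> RNN -> CTC, which correspond to the convolutional layers, the recurrent layers and the transcription layer. The CNN part extracts low-level features; the RNN part uses a BiLSTM to learn contextual sequence information and predict the label distribution; CTC handles sequence alignment and produces the final prediction.

[Figure: CRNN architecture with convolutional layers, recurrent layers and transcription layer]

To feed the features into the recurrent layers, the following steps are applied:

First, the image is scaled to 32×W×3.
After the CNN, it becomes 1×(W/4)×512.
Then, for the LSTM, set T=(W/4) and D=512, and the features can be fed into the LSTM.

The above is the idealized procedure. The network input described in the CRNN paper, however, is a normalized 100×32 grayscale image, i.e. the height is fixed at 32 pixels. Below is the structure of the CRNN network. The CNN follows the classic VGG16 design. Note that in the 3rd and 4th max-pooling layers CRNN uses 1×2 rectangular pooling windows (w×h) instead of the 2×2 square windows of the classic VGG16. The reason is that most text images are short and wide, so their feature maps are short and wide as well; a 1×2 window keeps more resolution along the width, which suits English letter recognition better (for example, telling i and l apart). The VGG16 part also adds BatchNormalization modules to speed up convergence. One more point worth noting: the CRNN input is a grayscale image, i.e. the image depth is 1. The output of the CNN part is a feature map of shape 512×1×w (c×h×w); the running example in this article uses w=16, and the actual w depends on the input width.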

[Figure: CRNN network structure]
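The feature-map sizes can be checked with plain arithmetic. The sketch below applies the standard output-size formula floor((n + 2p - k) / s) + 1 to the layer hyperparameters of the Vgg_16 class listed later in this article; the helper names are mine. For a 32×100 input it gives height 1 and width 22, which agrees with the `w=22` comment in trainBatch further down.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Output length of a conv or pooling layer along one axis."""
    return (n + 2 * padding - kernel) // stride + 1


# (name, (kernel_h, kernel_w), (stride_h, stride_w), padding), in forward order.
LAYERS = [
    ('convolution1', (3, 3), (1, 1), 1),
    ('pooling1',     (2, 2), (2, 2), 0),
    ('convolution2', (3, 3), (1, 1), 1),
    ('pooling2',     (2, 2), (2, 2), 0),
    ('convolution3', (3, 3), (1, 1), 1),
    ('convolution4', (3, 3), (1, 1), 1),
    ('pooling3',     (1, 2), (2, 1), 0),
    ('convolution5', (3, 3), (1, 1), 1),
    ('convolution6', (3, 3), (1, 1), 1),
    ('pooling4',     (1, 2), (2, 1), 0),
    ('convolution7', (2, 2), (1, 1), 0),
]


def cnn_output_hw(height, width):
    """Trace (height, width) through every layer of the CNN."""
    for _name, (kh, kw), (sh, sw), pad in LAYERS:
        height = out_size(height, kh, sh, pad)
        width = out_size(width, kw, sw, pad)
    return height, width


print(cnn_output_hw(32, 100))  # (1, 22)
print(cnn_output_hw(32, 280))  # (1, 67)
```

The height only collapses to 1 for a 32-pixel-high input, which is why the forward pass asserts h == 1; the width that comes out is the sequence length seen by the LSTM.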

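Next, the RNN layers. The RNN part uses bidirectional LSTMs with 256 hidden units, and CRNN stacks two BiLSTM layers to form it. Its output has shape (s, b, class_num), where s is the sequence length, b is the batch size and class_num is the total number of character classes.

Note that the LSTM module in PyTorch expects a 3-D tensor as input, and the meaning of each dimension must not be mixed up. The first dimension is the sequence, the second is the mini-batch, and the third holds the elements of the input. If no mini-batch is used, that dimension can be set to 1, but it must still be present.

LSTM input

input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence.

The input can also be a packed variable length sequence.

input shape (a, b, c)

a: seq_len -> sequence length

b: batch -> batch size

c: input_size -> number of input features

To meet the LSTM input requirements, the CNN output has to be rearranged into [seq_len, batch, input_size]. Concretely: first use squeeze to remove the h dimension, then use permute to reorder the dimensions from the original [b, c, w] to [w, b, c] = [seq_len, batch, input_size], which in the running example is [16, batch, 512]. The tensor can then be fed into the RNN layers.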



x = self.cnn(x)
b, c, h, w = x.size()
# print(x.size()): b,c,h,w
assert h == 1   # "the height of conv must be 1"
x = x.squeeze(2)  # remove h dimension: b * 512 * width
x = x.permute(2, 0, 1)  # [b, c, w] -> [w, b, c] = [seq_len, batch, input_size]
x = self.rnn(x)

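The output format of the RNN layer is shown below. Because we use a bidirectional LSTM, the output feature dimension is hidden_unit * 2.

Outputs: output, (h_n, c_n)

output of shape (seq_len, batch, num_directions * hidden_size)

h_n of shape (num_layers * num_directions, batch, hidden_size)

c_n of shape (num_layers * num_directions, batch, hidden_size)

We then apply a linear transform, self.embedding1 = torch.nn.Linear(hidden_unit * 2, 512), so that the feature dimension becomes 512 again, and feed the result into the second LSTM layer. The second LSTM layer is followed by another linear layer, torch.nn.Linear(hidden_unit * 2, class_num), so that the output dimension of the whole RNN part equals the total number of character classes.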

import torch
import torch.nn.functional as F


class Vgg_16(torch.nn.Module):
    """VGG-style feature extractor: b x 1 x 32 x W image -> b x 512 x 1 x W' feature map."""

    def __init__(self):
        super(Vgg_16, self).__init__()
        self.convolution1 = torch.nn.Conv2d(1, 64, 3, padding=1)  # 1 input channel: grayscale
        self.pooling1 = torch.nn.MaxPool2d(2, stride=2)
        self.convolution2 = torch.nn.Conv2d(64, 128, 3, padding=1)
        self.pooling2 = torch.nn.MaxPool2d(2, stride=2)
        self.convolution3 = torch.nn.Conv2d(128, 256, 3, padding=1)
        self.convolution4 = torch.nn.Conv2d(256, 256, 3, padding=1)
        # Non-square pooling: PyTorch reads both tuples as (height, width), so the
        # (2, 1) stride halves the height and shrinks the width by only one column.
        self.pooling3 = torch.nn.MaxPool2d((1, 2), stride=(2, 1))
        self.convolution5 = torch.nn.Conv2d(256, 512, 3, padding=1)
        self.BatchNorm1 = torch.nn.BatchNorm2d(512)
        self.convolution6 = torch.nn.Conv2d(512, 512, 3, padding=1)
        self.BatchNorm2 = torch.nn.BatchNorm2d(512)
        self.pooling4 = torch.nn.MaxPool2d((1, 2), stride=(2, 1))
        self.convolution7 = torch.nn.Conv2d(512, 512, 2)  # 2x2 kernel, no padding: height 2 -> 1

    def forward(self, x):
        x = F.relu(self.convolution1(x), inplace=True)
        x = self.pooling1(x)
        x = F.relu(self.convolution2(x), inplace=True)
        x = self.pooling2(x)
        x = F.relu(self.convolution3(x), inplace=True)
        x = F.relu(self.convolution4(x), inplace=True)
        x = self.pooling3(x)
        x = self.convolution5(x)
        x = F.relu(self.BatchNorm1(x), inplace=True)
        x = self.convolution6(x)
        x = F.relu(self.BatchNorm2(x), inplace=True)
        x = self.pooling4(x)
        x = F.relu(self.convolution7(x), inplace=True)
        return x  # b x 512 x 1 x w


class RNN(torch.nn.Module):
    """Two stacked BiLSTM layers, each followed by a linear projection."""

    def __init__(self, class_num, hidden_unit):
        super(RNN, self).__init__()
        self.Bidirectional_LSTM1 = torch.nn.LSTM(512, hidden_unit, bidirectional=True)
        self.embedding1 = torch.nn.Linear(hidden_unit * 2, 512)
        self.Bidirectional_LSTM2 = torch.nn.LSTM(512, hidden_unit, bidirectional=True)
        self.embedding2 = torch.nn.Linear(hidden_unit * 2, class_num)

    def forward(self, x):
        x = self.Bidirectional_LSTM1(x)   # LSTM output: output, (h_n, c_n)
        T, b, h = x[0].size()   # x[0]: (seq_len, batch, num_directions * hidden_size)
        x = self.embedding1(x[0].view(T * b, h))  # pytorch view() reshape as [T * b, nOut]
        x = x.view(T, b, -1)  # [T, b, 512]
        x = self.Bidirectional_LSTM2(x)
        T, b, h = x[0].size()
        x = self.embedding2(x[0].view(T * b, h))
        x = x.view(T, b, -1)
        return x  # [T, b, class_num]


# output: [s,b,class_num]
class CRNN(torch.nn.Module):
    def __init__(self, class_num, hidden_unit=256):
        super(CRNN, self).__init__()
        self.cnn = torch.nn.Sequential()
        self.cnn.add_module('vgg_16', Vgg_16())
        self.rnn = torch.nn.Sequential()
        self.rnn.add_module('rnn', RNN(class_num, hidden_unit))

    def forward(self, x):
        x = self.cnn(x)
        b, c, h, w = x.size()
        # print(x.size()): b,c,h,w
        assert h == 1   # "the height of conv must be 1"
        x = x.squeeze(2)  # remove h dimension: b * 512 * width
        x = x.permute(2, 0, 1)  # [b, c, w] -> [w, b, c] = [seq_len, batch, input_size]
        # x = x.transpose(0, 2)
        # x = x.transpose(1, 2)
        x = self.rnn(x)
        return x

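Loss Function Design

With the CNN and RNN layers done, we now design the transcription layer, which translates the output of the RNN layers into the final recognized text and thereby makes variable-length text recognition possible. At the time of writing, PyTorch had no built-in CTC loss (later releases, starting with PyTorch 1.0, ship torch.nn.CTCLoss), so the loss function here relies on a third-party CTC loss implementation from GitHub. It is installed as follows: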

git clone https://github.com/SeanNaren/warp-ctc.git
cd warp-ctc
mkdir build; cd build
cmake ..
make
cd ../pytorch_binding/
python setup.py install
cd ../build
cp libwarpctc.so ../../usr/lib

Once the installation is finished, the CTC loss can be called directly. A small example illustrates its usage.

import torch
from warpctc_pytorch import CTCLoss

ctc_loss = CTCLoss()
# expected shape of seqLength x batchSize x alphabet_size
probs = torch.FloatTensor([[[0.1, 0.6, 0.1, 0.1, 0.1], [0.1, 0.1, 0.6, 0.1, 0.1]]]).transpose(0, 1).contiguous()
labels = torch.IntTensor([1, 2])
label_sizes = torch.IntTensor([2])
probs_sizes = torch.IntTensor([2])
probs.requires_grad_(True)  # tells autograd to compute gradients for probs
cost = ctc_loss(probs, labels, probs_sizes, label_sizes)
cost.backward()

CTCLoss(size_average=False, length_average=False)
    # size_average (bool): normalize the loss by the batch size (default: False)
    # length_average (bool): normalize the loss by the total number of frames in the batch. If True, supersedes size_average (default: False)

forward(acts, labels, act_lens, label_lens)
    # acts: Tensor of (seqLength x batch x outputDim) containing output activations from network (before softmax)
    # labels: 1 dimensional Tensor containing all the targets of the batch in one large sequence
    # act_lens: Tensor of size (batch) containing size of each output sequence from the network
    # label_lens: Tensor of (batch) containing label length of each example

As the code above shows, CTCLoss takes [probs, labels, probs_sizes, label_sizes] as input, i.e. the predictions, the labels, the length of each prediction sequence and the length of each label. Following this example, we design the CTC loss for CRNN.

preds = net(image)
preds_size = Variable(torch.IntTensor([preds.size(0)] * batch_size))  # preds.size(0) = w = 16
cost = criterion(preds, text, preds_size, length) / batch_size   # length holds the length of each text label in the batch; dividing by batch_size gives the average loss
cost.backward()
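To make the toy example above concrete, the following sketch recomputes its loss in plain Python with the standard CTC forward (alpha) recursion. It assumes, as the API notes above state, that the activations are passed in before softmax, so a softmax is applied first. The function names are mine, and this is an illustration of the algorithm rather than the warp-ctc implementation.

```python
import math

NEG_INF = float('-inf')


def log_softmax(row):
    """Numerically stable log-softmax of one frame of activations."""
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]


def logsumexp(values):
    """log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_neg_log_likelihood(activations, labels, blank=0):
    """CTC loss of one sequence: activations is T x C, labels a list of ids."""
    log_probs = [log_softmax(frame) for frame in activations]
    # Extended label sequence: a blank before, between and after the labels.
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    S, T = len(ext), len(log_probs)

    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, T):
        new_alpha = [NEG_INF] * S
        for s in range(S):
            candidates = [alpha[s]]
            if s >= 1:
                candidates.append(alpha[s - 1])
            # Skipping the blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(alpha[s - 2])
            new_alpha[s] = logsumexp(candidates) + log_probs[t][ext[s]]
        alpha = new_alpha
    # Valid paths end in the last label or in the trailing blank.
    total = logsumexp(alpha[-2:]) if S > 1 else alpha[0]
    return -total


acts = [[0.1, 0.6, 0.1, 0.1, 0.1], [0.1, 0.1, 0.6, 0.1, 0.1]]
loss = ctc_neg_log_likelihood(acts, [1, 2])
print(round(loss, 4))  # 2.4629
```

With two frames and two labels only one alignment exists (label 1 in frame one, label 2 in frame two), so the loss reduces to minus the sum of the two log-softmax values.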

Training Design

Next we flesh out the training procedure. We also wrote a trainBatch function that performs one gradient update per batch.

def trainBatch(net, criterion, optimizer, train_iter):
    # image, text, length and converter are module-level objects created by the training script
    data = next(train_iter)  # Python 3 replacement for train_iter.next()
    cpu_images, cpu_texts = data
    batch_size = cpu_images.size(0)
    lib.dataset.loadData(image, cpu_images)
    t, l = converter.encode(cpu_texts)
    lib.dataset.loadData(text, t)
    lib.dataset.loadData(length, l)
    preds = net(image)
    #print("preds.size=%s" % preds.size)
    preds_size = Variable(torch.IntTensor([preds.size(0)] * batch_size))  # preds.size(0)=w=22
    cost = criterion(preds, text, preds_size, length) / batch_size  # length= a list that contains the len of text label in a batch
    net.zero_grad()
    cost.backward()
    optimizer.step()
    return cost

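The overall training flow is: create the CTC loss object -> create the CRNN network object -> initialize the image, text and length tensors -> initialize the optimizer. Then loop over the epochs, running validation and saving the model every specified number of iterations. The CRNN paper uses Adadelta as the optimizer, but in my experiments Adadelta converged very slowly, so I switched to RMSprop, which sped up convergence considerably.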



    criterion = CTCLoss()
    net = Net.CRNN(n_class)
    print(net)
    net.apply(lib.utility.weights_init)
    image = torch.FloatTensor(Config.batch_size, 3, Config.img_height, Config.img_width)
    text = torch.IntTensor(Config.batch_size * 5)
    length = torch.IntTensor(Config.batch_size)
    if cuda:
        net.cuda()
        image = image.cuda()
        criterion = criterion.cuda()
    image = Variable(image)
    text = Variable(text)
    length = Variable(length)
    loss_avg = lib.utility.averager()
    optimizer = optim.RMSprop(net.parameters(), lr=Config.lr)
    #optimizer = optim.Adadelta(net.parameters(), lr=Config.lr)
    #optimizer = optim.Adam(net.parameters(), lr=Config.lr,
                           #betas=(Config.beta1, 0.999))
    for epoch in range(Config.epoch):
        train_iter = iter(train_loader)
        i = 0
        while i < len(train_loader):
            for p in net.parameters():
                p.requires_grad = True
            net.train()
            cost = trainBatch(net, criterion, optimizer, train_iter)
            loss_avg.add(cost)
            i += 1
            if i % Config.display_interval == 0:
                print('[%d/%d][%d/%d] Loss: %f' %
                      (epoch, Config.epoch, i, len(train_loader), loss_avg.val()))
                loss_avg.reset()
            if i % Config.test_interval == 0:
                val(net, test_dataset, criterion)
            # do checkpointing
            if i % Config.save_interval == 0:
                torch.save(
                    net.state_dict(), '{0}/netCRNN_{1}_{2}.pth'.format(Config.model_dir, epoch, i))

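Training Process and Testing

The figure below shows the CRNN training process. The number of character classes is 6732, training runs for 20 epochs in total, and batch_size is set to 64, which gives 51244 iterations per epoch.

[Figure: CRNN training process]

After 4 epochs, the loss has dropped to about 0.1 and the accuracy has risen to 0.98.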

[Figure: training loss and accuracy after 4 epochs]
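The val function called in the training loop is not shown in this article, so the exact definition behind the 0.98 accuracy is not stated. A common choice for line-level OCR is exact-match accuracy, where a sample counts as correct only if the whole decoded string equals its label. A minimal sketch under that assumption (the function name is hypothetical):

```python
def exact_match_accuracy(predictions, targets):
    """Fraction of samples whose decoded text equals the label exactly."""
    if len(predictions) != len(targets):
        raise ValueError('predictions and targets must have the same length')
    if not targets:
        return 0.0
    correct = sum(1 for pred, target in zip(predictions, targets) if pred == target)
    return correct / float(len(targets))


preds = ['发票号码', 'hello', '2018年']
labels = ['发票号码', 'hallo', '2018年']
print(exact_match_accuracy(preds, labels))  # 2 of 3 lines match
```

Under this definition a single wrong character makes the whole line count as an error, so it is stricter than a per-character accuracy.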

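Next we write the inference code. First initialize the CRNN network and load the trained model; then read the image to be recognized and resize it to a grayscale image of height 32, feed it into the network, and finally decode the network output into text.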



import time
import torch
import os
from torch.autograd import Variable
import lib.convert
import lib.dataset
from PIL import Image
import Net.net as Net
import alphabets
import sys
import Config

os.environ['CUDA_VISIBLE_DEVICES'] = "4"

crnn_model_path = './bs64_model/netCRNN_9_48000.pth'
IMG_ROOT = './test_images'
running_mode = 'gpu'
alphabet = alphabets.alphabet
nclass = len(alphabet) + 1  # +1 for the CTC blank


def crnn_recognition(cropped_image, model):
    converter = lib.convert.strLabelConverter(alphabet)  # label converter
    image = cropped_image.convert('L')  # convert to grayscale
    ### Testing images are scaled to have height 32. Widths are
    # proportionally scaled with heights, but at least 100 pixels
    w = int(image.size[0] / (280 * 1.0 / Config.infer_img_w))
    #scale = image.size[1] * 1.0 / Config.img_height
    #w = int(image.size[0] / scale)
    transformer = lib.dataset.resizeNormalize((w, Config.img_height))
    image = transformer(image)
    if torch.cuda.is_available():
        image = image.cuda()
    image = image.view(1, *image.size())  # add the batch dimension
    image = Variable(image)
    model.eval()
    preds = model(image)
    _, preds = preds.max(2)  # most likely class per frame
    preds = preds.transpose(1, 0).contiguous().view(-1)
    preds_size = Variable(torch.IntTensor([preds.size(0)]))
    sim_pred = converter.decode(preds.data, preds_size.data, raw=False)  # decode the prediction into text
    print('results: {0}'.format(sim_pred))


if __name__ == '__main__':
    # crnn network
    model = Net.CRNN(nclass)
    # Load the trained model; CPU and GPU loading differ and are handled separately
    if running_mode == 'gpu' and torch.cuda.is_available():
        model = model.cuda()
        model.load_state_dict(torch.load(crnn_model_path))
    else:
        model.load_state_dict(torch.load(crnn_model_path, map_location='cpu'))
    print('loading pretrained model from {0}'.format(crnn_model_path))
    files = sorted(os.listdir(IMG_ROOT))  # sort by file name
    for file in files:
        started = time.time()
        full_path = os.path.join(IMG_ROOT, file)
        print("=============================================")
        print("ocr image is %s" % full_path)
        image = Image.open(full_path)
        crnn_recognition(image, model)
        finished = time.time()
        print('elapsed time: {0}'.format(finished - started))
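The resize step is where the input width, and with it the sequence length, is decided. The inference code scales the width by a fixed factor tied to a 280-pixel reference, while the commented-out lines keep the aspect ratio instead. The sketch below implements the aspect-ratio variant together with the 100-pixel minimum mentioned in the code comment; the function name and the default values are assumptions for illustration and are not taken from Config.

```python
def proportional_width(orig_w, orig_h, target_h=32, min_w=100):
    """Width after scaling an image to target_h while keeping its aspect ratio."""
    scale = orig_h / float(target_h)
    return max(min_w, int(orig_w / scale))


print(proportional_width(560, 64))  # 280
print(proportional_width(90, 48))   # 100 (60 before the minimum is applied)
```

Keeping the aspect ratio avoids stretching or squeezing the characters, at the cost of a sequence length that varies from image to image.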

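Recognition Results and Summary

First I took a few images from the test set and fed them to the model; all of them were recognized correctly.

[Figure: recognition results on test-set images]

I also cropped text lines at random from some document images and scanned images and fed them to the model. The results are good as well and mostly correct, which suggests that the model generalizes well.

[Figure: recognition results on crops from documents and scans]

I also cropped text images from scanned VAT invoices to see whether the model still performs stably:

[Figure: recognition results on crops from scanned VAT invoices]

A short summary: for end-to-end variable-length text recognition, CRNN is the classic algorithm, and in practice it works very well. As the results above show, although the training set was generated synthetically, the model also recognizes PDF documents, scanned images and similar material well. To further improve recognition on text images from a specific domain, add a large number of such images to the training set. The complete CRNN code is available on my GitHub.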
