Notes on How CTC Works

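CTC stands for Connectionist Temporal Classification. As the name suggests, it addresses classification problems on sequential (time-series) data, and it is one of the techniques used in end-to-end speech recognition. It mainly addresses the following two problems: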

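Aligning the speech input with its labels. A speech input is first converted into a spectrogram. A traditional acoustic model needs every frame of that spectrogram to be labeled with its corresponding phoneme, whereas with CTC as the loss function only the input sequence and the output sequence are required.
Measuring the loss for inputs of varying length. CTC is a loss function that measures how far the network's output for an input sequence is from the true output. Take the utterance "nihao": different speakers pronounce it differently, so the frame-level sequence might look like nnnnniiiihhhaaaooo or many other variants. CTC can compute the loss between the network's output and the target even though the inputs differ in length.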
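The reason all of those pronunciations count as the same word is the CTC collapse rule: merge consecutive repeated symbols, then remove the blank symbol. The sketch below illustrates that rule in plain Python. The function name `ctc_collapse` and the use of "-" as the blank are assumptions made for this illustration and are not part of Keras.

```python
from itertools import groupby

BLANK = "-"  # assumed blank symbol, chosen only for this illustration


def ctc_collapse(alignment, blank=BLANK):
    """Map a frame-level alignment to its label sequence.

    Step 1: merge runs of identical consecutive symbols.
    Step 2: drop every blank symbol.
    """
    merged = (symbol for symbol, _ in groupby(alignment))
    return "".join(symbol for symbol in merged if symbol != blank)


# Alignments of different lengths collapse to the same label.
print(ctc_collapse("nnnnniiiihhhaaaooo"))  # nihao
print(ctc_collapse("n-ii-hh-aa-o"))        # nihao

# A blank between two identical letters keeps them separate.
print(ctc_collapse("hel-lo"))  # hello
print(ctc_collapse("hello"))   # helo
```

The last two calls show why the blank is needed: without it, a doubled letter in the label could never be produced.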

CTC implementation in Keras

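from keras import backend as K
from keras.models import Model
from keras.layers import (Input, Lambda)
from keras.optimizers import SGD
from keras.callbacks import ModelCheckpoint
import os

def ctc_lambda_func(args):
    # Unpack in the same order the tensors are passed to the Lambda layer below
    y_pred, labels, input_length, label_length = args
    # Note the argument order: ctc_batch_cost takes the labels first, then the predictions
    return K.ctc_batch_cost(labels, y_pred, input_length, label_length)

def add_ctc_loss(input_to_softmax):
    # Extra inputs needed by the CTC loss: the label sequences and the true lengths
    the_labels = Input(name='the_labels', shape=(None,), dtype='float32')
    input_lengths = Input(name='input_length', shape=(1,), dtype='int64')
    label_lengths = Input(name='label_length', shape=(1,), dtype='int64')
    # Convert input lengths to the lengths seen at the softmax output;
    # output_length must be a callable that the caller has attached to the acoustic model
    output_lengths = Lambda(input_to_softmax.output_length)(input_lengths)
    # CTC loss is implemented in a lambda layer
    loss_out = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')(
        [input_to_softmax.output, the_labels, output_lengths, label_lengths])
    # The wrapped model outputs the per-sample CTC loss instead of the softmax activations
    model = Model(
        inputs=[input_to_softmax.input, the_labels, input_lengths, label_lengths],
        outputs=loss_out)
    return model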
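Conceptually, the value returned for each sample is the negative log of the total probability of all frame-level alignments that collapse to the label. The sketch below computes that quantity for a single sample with the standard CTC forward recursion in plain Python. It is a minimal illustration, not the code Keras runs: the function name `ctc_neg_log_likelihood`, the list-of-lists input format, and the choice of index 0 as the blank are all assumptions made here.

```python
import math


def ctc_neg_log_likelihood(probs, label, blank=0):
    """Return -log p(label | probs) for a single sample.

    probs: T rows of per-class probabilities (e.g. softmax outputs).
    label: list of class indices, without blanks.
    blank: index of the blank class (an assumption of this sketch).
    """
    # Extended label with blanks interleaved: [blank, l1, blank, l2, ..., blank]
    ext = [blank]
    for c in label:
        ext += [c, blank]
    S, T = len(ext), len(probs)

    # alpha[s] = total probability of alignment prefixes ending at ext[s]
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]                    # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]           # advance by one position
            # Skipping the blank in between is allowed only for distinct labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # A valid alignment ends on the last label or on the trailing blank
    p = alpha[-1] + (alpha[-2] if S > 1 else 0.0)
    return -math.log(p) if p > 0.0 else math.inf


# Two frames, two classes (0 = blank, 1 = "a"), uniform probabilities.
# The alignments "aa", "a-" and "-a" all collapse to "a": p = 3 * 0.25 = 0.75.
loss = ctc_neg_log_likelihood([[0.5, 0.5], [0.5, 0.5]], [1])
print(round(loss, 4))  # 0.2877
```

Working with raw probabilities keeps the recursion easy to follow, but it underflows on long sequences, so practical implementations do the same computation in log space.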
