梅尔频率倒谱系数(Mel-Frequency Cepstral Coefficients,MFCCs)
这个词涉及到梅尔频率和倒谱,所以先分别了解这俩词的意思。
1. 倒谱(Cepstral):
(这个词是频谱spectrum的前面四个字母顺序倒过来,所以和频谱有一定联系)倒谱是一种信号的频谱经过对数运算后再进行傅里叶反变换得到的谱。其计算过程可用下面的框图表示:
因为语音信号实际上是一个卷性信号(把声道看作一个线性时不变系统,声音的产生则为一个激励通过这个系统),记原时域信号为,经过DFT变换后得到的信号为,对两边同时取幅度的对数,则有
再对两边进行反傅里叶变换则可以得到倒谱:。可以发现原来的时域卷性信号变成了时域加性信号。这是一种同态信号处理,可以将非线性问题转化为线性问题。
2. 梅尔频率:
梅尔频率是一种给予人耳对等距的音高变化的感官判断而定的非线性频率刻度。它与频率赫兹的关系为
从频率变换到梅尔频率可以通过一组三角滤波器实现:
利用梅尔频率刻度的滤波器组对频域信号进行切分,最后每个频率段对应一个数值。
3. 因此,梅尔频率倒谱(MFC)是基于声音频率的非线性梅尔刻度(mel scale)的对数能量频谱的线性变换。梅尔频率倒谱系数 (MFCCs)就是组成梅尔频率倒谱的系数。
获得MFCC一般需要以下步骤:
4. matlab代码实现
分享一个 作者为:Kamil Wojcicki写的matlab代码。代码注释非常详细,方便用于学习和使用。
function [ CC, FBE, frames ] = mfcc( speech, fs, Tw, Ts, alpha, window, R, M, N, L )
% MFCC Mel frequency cepstral coefficient feature extraction.
%
% MFCC(S,FS,TW,TS,ALPHA,WINDOW,R,M,N,L) returns mel frequency
% cepstral coefficients (MFCCs) computed from speech signal given
% in vector S and sampled at FS (Hz). The speech signal is first
% preemphasised using a first order FIR filter with preemphasis
% coefficient ALPHA. The preemphasised speech signal is subjected
% to the short-time Fourier transform analysis with frame durations
% of TW (ms), frame shifts of TS (ms) and analysis window function
% given as a function handle in WINDOW. This is followed by magnitude
% spectrum computation followed by filterbank design with M triangular
% filters uniformly spaced on the mel scale between lower and upper
% frequency limits given in R (Hz). The filterbank is applied to
% the magnitude spectrum values to produce filterbank energies (FBEs)
% (M per frame). Log-compressed FBEs are then decorrelated using the
% discrete cosine transform to produce cepstral coefficients. Final
% step applies sinusoidal lifter to produce liftered MFCCs that
% closely match those produced by HTK [1].
%
% [CC,FBE,FRAMES]=MFCC(...) also returns FBEs and windowed frames,
% with feature vectors and frames as columns.
%
% This framework is based on Dan Ellis' rastamat routines [2]. The
% emphasis is placed on closely matching MFCCs produced by HTK [1]
% (refer to p.337 of [1] for HTK's defaults) with simplicity and
% compactness as main considerations, but at a cost of reduced
% flexibility. This routine is meant to be easy to extend, and as
% a starting point for work with cepstral coefficients in MATLAB.
% The triangular filterbank equations are given in [3].
%
% Inputs
% S is the input speech signal (as vector)
%
% FS is the sampling frequency (Hz)
%
% TW is the analysis frame duration (ms)
%
% TS is the analysis frame shift (ms)
%
% ALPHA is the preemphasis coefficient
%
% WINDOW is a analysis window function handle
%
% R is the frequency range (Hz) for filterbank analysis
%
% M is the number of filterbank channels
%
% N is the number of cepstral coefficients
% (including the 0th coefficient)
%
% L is the liftering parameter
%
% Outputs
% CC is a matrix of mel frequency cepstral coefficients
% (MFCCs) with feature vectors as columns
%
% FBE is a matrix of filterbank energies
% with feature vectors as columns
%
% FRAMES is a matrix of windowed frames
% (one frame per column)
%
% References
%
% [1] Young, S., Evermann, G., Gales, M., Hain, T., Kershaw, D.,
% Liu, X., Moore, G., Odell, J., Ollason, D., Povey, D.,
% Valtchev, V., Woodland, P., 2006. The HTK Book (for HTK
% Version 3.4.1). Engineering Department, Cambridge University.
% (see also: http://htk.eng.cam.ac.uk)
%
% [2] Ellis, D., 2005. Reproducing the feature outputs of
% common programs using Matlab and melfcc.m. url:
% http://labrosa.ee.columbia.edu/matlab/rastamat/mfccs.html
%
% [3] Huang, X., Acero, A., Hon, H., 2001. Spoken Language
% Processing: A guide to theory, algorithm, and system
% development. Prentice Hall, Upper Saddle River, NJ,
% USA (pp. 314-315).
%
% See also EXAMPLE, COMPARE, FRAMES2VEC, TRIFBANK.
% Author: Kamil Wojcicki, September 2011
%% PRELIMINARIES
% Ensure correct number of inputs
if( nargin~= 10 ), help mfcc; return; end;
% Explode samples to the range of 16 bit shorts
if( max(abs(speech))<=1 ), speech = speech * 2^15; end;
Nw = round( 1E-3*Tw*fs ); % frame duration (samples)
Ns = round( 1E-3*Ts*fs ); % frame shift (samples)
nfft = 2^nextpow2( Nw ); % length of FFT analysis
K = nfft/2+1; % length of the unique part of the FFT
%% HANDY INLINE FUNCTION HANDLES
% Forward and backward mel frequency warping (see Eq. (5.13) on p.76 of [1])
% Note that base 10 is used in [1], while base e is used here and in HTK code
hz2mel = @( hz )( 1127*log(1+hz/700) ); % Hertz to mel warping function
mel2hz = @( mel )( 700*exp(mel/1127)-700 ); % mel to Hertz warping function
% Type III DCT matrix routine (see Eq. (5.14) on p.77 of [1])
dctm = @( N, M )( sqrt(2.0/M) * cos( repmat([0:N-1].',1,M) ...
.* repmat(pi*([1:M]-0.5)/M,N,1) ) );
% Cepstral lifter routine (see Eq. (5.12) on p.75 of [1])
ceplifter = @( N, L )( 1+0.5*L*sin(pi*[0:N-1]/L) );
%% FEATURE EXTRACTION
% Preemphasis filtering (see Eq. (5.1) on p.73 of [1]) 预加重
speech = filter( [1 -alpha], 1, speech ); % fvtool( [1 -alpha], 1 );
% Framing and windowing (frames as columns) 分帧加窗
frames = vec2frames( speech, Nw, Ns, 'cols', window, false );
% Magnitude spectrum computation (as column vectors) 取频谱幅度值
MAG = abs( fft(frames,nfft,1) );
% Triangular filterbank with uniformly spaced filters on mel scale 三角滤波器组滤波
H = trifbank( M, K, R, fs, hz2mel, mel2hz ); % size of H is M x K
% Filterbank application to unique part of the magnitude spectrum
FBE = H * MAG(1:K,:); % FBE( FBE<1.0 ) = 1.0; % apply mel floor
% DCT matrix computation
DCT = dctm( N, M );
% Conversion of logFBEs to cepstral coefficients through DCT
CC = DCT * log( FBE );
% Cepstral lifter computation
lifter = ceplifter( N, L );
% Cepstral liftering gives liftered cepstral coefficients
CC = diag( lifter ) * CC; % ~ HTK's MFCCs
% EOF
该代码及其中包含的函数均已上传到资源共享(见下方链接),下载即可用,(仅供学习使用)https://download.****.net/download/qq_36999901/12641846