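[Date: 2016-07] [Status: Open]
MKV is an open multimedia container format and one of the most widely used Matroska file types. Common file extensions are .mkv (video, which may also carry audio and subtitles), .mka (audio only), .mks (subtitles only), and .mk3d (3D video, which may also carry audio and subtitles).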
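0. Why study a multimedia container format
The main goal is to answer the following questions:
- How is the data in the container organized?
- Which codecs can the container carry, and how is their data stored?
- What metadata does the container hold? What program information does it hold?
- For a container that supports multiple programs, how do you find the corresponding audio, video, and subtitle streams?
- How do you determine the playback duration of a program in the container?
- How do you extract audio, video, and subtitle data from the container and hand it to a decoder, and does it carry timestamps?
- Does the container support seeking? What auxiliary information is available for it?
- Can it be streamed directly?
- Where can you find the most authoritative documentation for the format?
- Which tools help analyze anomalies or errors in the container?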
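1. Overall structure of an MKV file
MKV is built on EBML (Extensible Binary Meta Language), a format modeled on XML for storing binary data. So before describing MKV, here is a brief look at EBML.
EBML
The full standard is available in the EBML specifications.
Because EBML is modeled on XML, it naturally involves a lot of nesting, most often of this kind: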
<root>
<header value="123"/>
</root>
So how is EBML put together?
The basic building block of EBML is the EBML Element, and a Document is made up of multiple EBML Elements. An EBML Element is defined as follows:
typedef struct {
vint ID; // EBML-ID
vint size; // size of element
char data[size]; // data
} EBML_ELEMENT;
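The data here can be raw binary data, or it can be other EBML Elements.
vint (Unsigned Integer Values of Variable Length) is a variable-length unsigned integer that is more compact than a fixed 32-bit or 64-bit integer. A vint has three parts: VINT_WIDTH, VINT_MARKER, and VINT_DATA. VINT_MARKER is the first 1 bit in the binary data; VINT_WIDTH is the number of 0 bits before VINT_MARKER (it may be zero), and VINT_WIDTH+1 is the number of bytes the vint occupies. VINT_DATA is whatever follows the marker. Take these classic bytes from near the start of an mkv file: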
42 82 88 6d 61 74 72 6f 73 6b 61
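These bytes form a complete DocType Element: 0x4282 is the EBML ID (0x282 once the marker bit is stripped), 8 is the element size, and the 8 bytes that follow are the string "matroska". The first byte, 0x42, is 0100 0010 in binary. One 0 bit precedes the first 1, so VINT_WIDTH is 1 and the ID takes 1+1=2 bytes, namely 42 82. The size byte 0x88 is 1000 1000 in binary. No 0 bit precedes the first 1, so it takes 0+1=1 byte, and clearing the marker bit leaves the value 8. That completes the header; all that remains is to read the 8-byte string.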
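The decoding steps above can be sketched in Python. This is only a minimal sketch: the function name read_vint and its keep_marker parameter are illustrative and do not come from the specification. It assumes the usual convention that element IDs are quoted with the marker bit kept (as in [42][82]), while sizes have it cleared.

```python
def read_vint(buf: bytes, pos: int = 0, keep_marker: bool = False):
    """Decode one vint at buf[pos]; return (value, length_in_bytes)."""
    first = buf[pos]
    if first == 0:
        raise ValueError("vint longer than 8 bytes")
    # VINT_WIDTH is the number of 0 bits before the first 1 bit.
    width = 8 - first.bit_length()
    length = width + 1
    if pos + length > len(buf):
        raise ValueError("truncated vint")
    value = int.from_bytes(buf[pos:pos + length], "big")
    if not keep_marker:
        # Clear VINT_MARKER so that only VINT_DATA remains.
        value &= (1 << (7 * length)) - 1
    return value, length


# The DocType bytes quoted in the text.
data = bytes.fromhex("42 82 88 6d 61 74 72 6f 73 6b 61")
elem_id, id_len = read_vint(data, 0, keep_marker=True)
size, size_len = read_vint(data, id_len)
payload = data[id_len + size_len:id_len + size_len + size]
print(hex(elem_id), size, payload.decode("ascii"))  # 0x4282 8 matroska
```

Calling read_vint(data, 0) without keep_marker returns 0x282, the same ID with the marker bit removed.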
MKV overview
In terms of overall structure, MKV resembles the AVI, ASF, and MP4 file formats. It consists mainly of the following parts:
Header |
Meta Seek Information |
Segment Information |
Track |
Chapters |
Clusters |
Cueing Data |
Attachment |
Tagging |
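Note that this is only a simplified sketch of the file structure. The parts of a real mkv file are not necessarily stored in exactly this layout, so parse according to the specification. Each part is briefly described below:
- The Header contains the EBML version information and the EBML document type (which marks the file as Matroska).
- Meta Seek Information contains index information for locating the other parts of the file (Track Info, Chapters, Tags, Cues, Attachments, and so on). It is optional; if it is absent, the same information can be obtained by scanning the rest of the file.
- Segment Information contains basic information about the whole file, such as the title, along with a unique ID. When the file is one of several linked files, it also holds the ID of the next file.
- Track contains information about each track (audio, video, or subtitles), such as video resolution, audio sampling rate, and codec.
- Chapters lists all the chapters. A chapter is a way of presetting points in the audio/video to jump to.
- Clusters mainly contain the audio and video frames of each track.
- Cueing Data contains all the Cues. Cues are the index for each track, similar to Meta Seek Information but used mainly for seeking to a specific time during playback.
- Attachment allows files of any type to be attached to an MKV file, including images, web pages, and programs.
- Tagging contains the tags related to the file and to each track. These tags resemble ID3 tags in MP3 files and hold information such as writer, singer, and actor.
The structure above is only an overview. EBML elements are in fact organized in levels: according to the specification, an element at Level n can only contain elements at Level n+1. At the top of a Matroska file are the Level 0 elements, chiefly two of them: the EBML Header and the Segment.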
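2. EBML Header
The EBML Header sits at the start of an MKV file. It is one of the Level 0 elements, and its children are the Level 1 elements listed in the table below.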
Element Name | L | EBML ID | Ma | Mu | Rng | Default | T | 1 | 2 | 3 | 4 | W | Description |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
EBML Header | |||||||||||||
EBML | 0 | [1A][45][DF][A3] | * | * | - | - | m | * | * | * | * | * | Set the EBML characteristics of the data to follow. Each EBML document has to start with this. |
EBMLVersion | 1 | [42][86] | * | - | - | 1 | u | * | * | * | * | * | The version of EBML parser used to create the file. |
EBMLReadVersion | 1 | [42][F7] | * | - | - | 1 | u | * | * | * | * | * | The minimum EBML version a parser has to support to read this file. |
EBMLMaxIDLength | 1 | [42][F2] | * | - | - | 4 | u | * | * | * | * | * | The maximum length of the IDs you'll find in this file (4 or less in Matroska). |
EBMLMaxSizeLength | 1 | [42][F3] | * | - | - | 8 | u | * | * | * | * | * | The maximum length of the sizes you'll find in this file (8 or less in Matroska). This does not override the element size indicated at the beginning of an element. Elements that have an indicated size which is larger than what is allowed by EBMLMaxSizeLength shall be considered invalid. |
DocType | 1 | [42][82] | * | - | - | matroska | s | * | * | * | * | * | A string that describes the type of document that follows this EBML header. 'matroska' in our case or 'webm' for webm files. |
DocTypeVersion | 1 | [42][87] | * | - | - | 1 | u | * | * | * | * | * | The version of DocType interpreter used to create the file. |
DocTypeReadVersion | 1 | [42][85] | * | - | - | 1 | u | * | * | * | * | * | The minimum DocType version an interpreter has to support to read this file. |
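First, here is what each column in the table above means:
- Element Name: the name of the element being described.
- L: the Level at which the element appears in EBML. "+" means the element can be nested recursively; "g" means a global element that can appear at any level.
- EBML ID: the bytes of the element ID.
- Ma: mandatory flag. "*" in this column marks the element as mandatory; the specification abbreviates this as »mand.«.
- Mu: multiplicity flag. "*" in this column means the element may appear more than once; the specification abbreviates this as »mult.«.
- Rng: the valid range of the stored value, usually given for integer or float types.
- Default: the default value of the element's payload.
- T: the data type the element holds. The codes are m: Master (variable length, may contain one or more other elements), u: unsigned int, i: signed integer, s: string, 8: UTF-8 string, b: binary, f: float, d: date.
- 1: the element is part of Matroska version 1.
- 2: the element is part of Matroska version 2.
- 3: the element is part of Matroska version 3.
- 4: the element is part of Matroska version 4.
- W: the element is used in WebM.
- Description: a short description of what the element does.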
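An EBML Element ID is itself stored as a vint, so a known ID can be recognized simply by matching its fixed byte sequence, such as 0x1A45DFA3 above. The EBML Header ID at the start of the file is the first signature to check when identifying an MKV file.
When parsing the EBML Header, use DocType to determine the actual container format; for a regular mkv file this field must be "matroska".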
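As a rough illustration of that check, the sketch below verifies the EBML Header ID and then walks the header's children looking for DocType. The helper names (read_vint, iter_elements, sniff_doctype) are made up for this example, the sample bytes are hand-built rather than taken from a real file, and the code assumes well-formed input with no error handling.

```python
EBML_HEADER_ID = 0x1A45DFA3
DOCTYPE_ID = 0x4282


def read_vint(buf, pos, keep_marker=False):
    """Return (value, length) for the vint at buf[pos] (well-formed input assumed)."""
    length = 8 - buf[pos].bit_length() + 1
    value = int.from_bytes(buf[pos:pos + length], "big")
    if not keep_marker:
        value &= (1 << (7 * length)) - 1
    return value, length


def iter_elements(buf, start, end):
    """Yield (element_id, payload) for each element stored in buf[start:end]."""
    pos = start
    while pos < end:
        elem_id, id_len = read_vint(buf, pos, keep_marker=True)
        size, size_len = read_vint(buf, pos + id_len)
        data_start = pos + id_len + size_len
        yield elem_id, buf[data_start:data_start + size]
        pos = data_start + size


def sniff_doctype(buf):
    """Return the DocType string, or None if buf does not start with an EBML Header."""
    elem_id, id_len = read_vint(buf, 0, keep_marker=True)
    if elem_id != EBML_HEADER_ID:
        return None
    size, size_len = read_vint(buf, id_len)
    start = id_len + size_len
    for child_id, payload in iter_elements(buf, start, start + size):
        if child_id == DOCTYPE_ID:
            return payload.rstrip(b"\x00").decode("ascii")
    return "matroska"  # value from the Default column of the table above


# Hand-built header: EBMLVersion = 1 followed by DocType = "matroska".
sample = bytes.fromhex("1a45dfa3 8f 42868101 4282886d6174726f736b61")
print(sniff_doctype(sample))    # matroska
print(sniff_doctype(b"RIFF"))   # None
```

A real demuxer would read the header from the file in chunks and would also compare DocTypeReadVersion against the version it supports.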
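3. Segment
Apart from the EBML Header, everything else in an MKV file belongs to the Segment, which holds the audio/video data and the information describing it. The Segment ID is [18][53][80][67]. It sits at Level 0 and contains all the Level 1 elements, which are described in turn below.
Meta Seek Info
Meta Seek Info is a quick index. It is optional, and when present there is usually only one; if it is absent, the file has to be scanned sequentially to rebuild the same information. Meta Seek Info consists of a SeekHead holding one or more Seek entries, and each Seek entry contains two elements, SeekID and SeekPosition. The specification defines the format as follows:
Element Name | L | EBML ID | Ma | Mu | Rng | Default | T | 1 | 2 | 3 | 4 | W | Description |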
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
SeekHead | 1 | [11][4D][9B][74] | - | * | - | - | m | * | * | * | * | * | Contains the position of other Top-Level Elements. |
Seek | 2 | [4D][BB] | * | * | - | - | m | * | * | * | * | * | Contains a single seek entry to an EBML Element. |
SeekID | 3 | [53][AB] | * | - | - | - | b | * | * | * | * | * | The binary ID corresponding to the Element name. |
SeekPosition | 3 | [53][AC] | * | - | - | - | u | * | * | * | * | * | The position of the Element in the Segment in octets (0 = first level 1 Element). |
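Note that SeekID holds the ID of a Level 1 element, and the position is an offset relative to the start of the Segment. For example, here is the first seek entry parsed from a real file: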
11 4d 9b 74 bb 4d bb 8b 53 ab 84 15 49 a9 66 53 ac 81 40
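Parsing gives SeekID=0x1549a966 and SeekPosition=0x40. Looking the ID up in the Matroska specification shows that it belongs to Segment Info, and the offset added to the Segment start position (the position of the first Level 1 element) is exactly where the Segment Info section is stored.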
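The same bytes can be walked programmatically. The following is a minimal sketch with illustrative helper names (read_vint, parse_seek_entry). It assumes well-formed input and parses only the first Seek entry, since the quoted bytes stop there even though the SeekHead declares a larger size.

```python
SEEK_ID = 0x4DBB           # Seek
SEEK_ID_ID = 0x53AB        # SeekID
SEEK_POSITION_ID = 0x53AC  # SeekPosition


def read_vint(buf, pos, keep_marker=False):
    """Return (value, length) for the vint at buf[pos] (well-formed input assumed)."""
    length = 8 - buf[pos].bit_length() + 1
    value = int.from_bytes(buf[pos:pos + length], "big")
    if not keep_marker:
        value &= (1 << (7 * length)) - 1
    return value, length


def parse_seek_entry(buf, pos):
    """Parse the Seek element at buf[pos]; return (seek_id, seek_position)."""
    elem_id, id_len = read_vint(buf, pos, keep_marker=True)
    if elem_id != SEEK_ID:
        raise ValueError("not a Seek element")
    size, size_len = read_vint(buf, pos + id_len)
    pos += id_len + size_len
    end = pos + size
    seek_id = seek_position = None
    while pos < end:
        child_id, id_len = read_vint(buf, pos, keep_marker=True)
        child_size, size_len = read_vint(buf, pos + id_len)
        start = pos + id_len + size_len
        value = int.from_bytes(buf[start:start + child_size], "big")
        if child_id == SEEK_ID_ID:
            seek_id = value
        elif child_id == SEEK_POSITION_ID:
            seek_position = value
        pos = start + child_size
    return seek_id, seek_position


raw = bytes.fromhex("11 4d 9b 74 bb 4d bb 8b 53 ab 84 15 49 a9 66 53 ac 81 40")
head_id, id_len = read_vint(raw, 0, keep_marker=True)   # SeekHead ID
head_size, size_len = read_vint(raw, id_len)            # SeekHead payload size
seek_id, seek_position = parse_seek_entry(raw, id_len + size_len)
print(hex(head_id), head_size, hex(seek_id), hex(seek_position))
# 0x114d9b74 59 0x1549a966 0x40
```

The absolute file offset of the target element is the file offset of the first byte of the Segment payload plus seek_position.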
Segment Info
The Segment Info section contains information that identifies the file (SegmentUID) as well as the Duration field. The specification defines its structure as follows:
Element Name | L | EBML ID | Ma | Mu | Rng | Default | T | 1 | 2 | 3 | 4 | W | Description |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Info | 1 | [15][49][A9][66] | * | * | - | - | m | * | * | * | * | * | Contains miscellaneous general information and statistics on the file. |
SegmentUID | 2 | [73][A4] | - | - | not 0 | - | b | * | * | * | * | A randomly generated unique ID to identify the current segment between many others (128 bits). | |
SegmentFilename | 2 | [73][84] | - | - | - | - | 8 | * | * | * | * | A filename corresponding to this segment. | |
PrevUID | 2 | [3C][B9][23] | - | - | - | - | b | * | * | * | * | A unique ID to identify the previous chained segment (128 bits). | |
PrevFilename | 2 | [3C][83][AB] | - | - | - | - | 8 | * | * | * | * | An escaped filename corresponding to the previous segment. | |
NextUID | 2 | [3E][B9][23] | - | - | - | - | b | * | * | * | * | A unique ID to identify the next chained segment (128 bits). | |
NextFilename | 2 | [3E][83][BB] | - | - | - | - | 8 | * | * | * | * | An escaped filename corresponding to the next segment. | |
SegmentFamily | 2 | [44][44] | - | * | - | - | b | * | * | * | * | A randomly generated unique ID that all segments related to each other must use (128 bits). | |
ChapterTranslate | 2 | [69][24] | - | * | - | - | m | * | * | * | * | A tuple of corresponding ID used by chapter codecs to represent this segment. | |
ChapterTranslateEditionUID | 3 | [69][FC] | - | * | - | - | u | * | * | * | * | Specify an edition UID on which this correspondence applies. When not specified, it means for all editions found in the segment. | |
ChapterTranslateCodec | 3 | [69][BF] | * | - | - | - | u | * | * | * | * | The chapter codec using this ID (0: Matroska Script, 1: DVD-menu). | |
ChapterTranslateID | 3 | [69][A5] | * | - | - | - | b | * | * | * | * | The binary value used to represent this segment in the chapter codec data. The format depends on the ChapProcessCodecID used. | |
TimecodeScale | 2 | [2A][D7][B1] | * | - | - | 1000000 | u | * | * | * | * | * | Timestamp scale in nanoseconds (1.000.000 means all timestamps in the Segment are expressed in milliseconds). |
Duration | 2 | [44][89] | - | - | > 0 | - | f | * | * | * | * | * | Duration of the segment (based on TimecodeScale). |
DateUTC | 2 | [44][61] | - | - | - | - | d | * | * | * | * | * | Date of the origin of timecode (value 0), i.e. production date. |
Title | 2 | [7B][A9] | - | - | - | - | 8 | * | * | * | * | General name of the segment. | |
MuxingApp | 2 | [4D][80] | * | - | - | - | 8 | * | * | * | * | * | Muxing application or library ("libmatroska-0.4.3"). |
WritingApp | 2 | [57][41] | * | - | - | - | 8 | * | * | * | * | * | Writing application ("mkvmerge-0.3.3"). |
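To turn the Duration field into wall-clock time, multiply it by TimecodeScale, which is expressed in nanoseconds. The sketch below assumes the Duration payload has already been located; the function names and the sample value are invented for illustration. It relies on EBML floats being stored as big-endian IEEE 754 values of 4 or 8 bytes.

```python
import struct


def decode_ebml_float(payload: bytes) -> float:
    """Decode a 4-byte or 8-byte big-endian IEEE 754 float payload."""
    if len(payload) == 4:
        return struct.unpack(">f", payload)[0]
    if len(payload) == 8:
        return struct.unpack(">d", payload)[0]
    raise ValueError("unexpected float size")


def duration_seconds(duration: float, timecode_scale: int = 1_000_000) -> float:
    # Duration counts TimecodeScale units; TimecodeScale is in nanoseconds.
    return duration * timecode_scale / 1_000_000_000


def format_hms(seconds: float) -> str:
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


raw = struct.pack(">d", 5025000.0)  # made-up Duration payload
seconds = duration_seconds(decode_ebml_float(raw))
print(seconds, format_hms(seconds))  # 5025.0 01:23:45
```

With the default TimecodeScale of 1000000, Duration is simply a count of milliseconds.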
Track
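Tracks hold the basic information about the audio and video, such as the codec type, video resolution, and audio sampling rate. Parsing this section yields everything needed to choose the matching decoders and initialize them. Tracks contains at least one TrackEntry, and each TrackEntry describes one track; a TrackEntry includes Name, TrackNumber, TrackType, and other fields. The specification defines the relevant fields as follows:
Element Name | L | EBML ID | Ma | Mu | Rng | Default | T | 1 | 2 | 3 | 4 | W | Description |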
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Tracks | 1 | [16][54][AE][6B] | - | * | - | - | m | * | * | * | * | * | A Top-Level Element of information with many tracks described. |
TrackEntry | 2 | [AE] | * | * | - | - | m | * | * | * | * | * | Describes a track with all Elements. |
TrackNumber | 3 | [D7] | * | - | not 0 | - | u | * | * | * | * | * | The track number as used in the Block Header (using more than 127 tracks is not encouraged, though the design allows an unlimited number). |
TrackUID | 3 | [73][C5] | * | - | not 0 | - | u | * | * | * | * | * | A unique ID to identify the Track. This should be kept the same when making a direct stream copy of the Track to another file. |
TrackType | 3 | [83] | * | - | 1-254 | - | u | * | * | * | * | * | A set of track types coded on 8 bits (1: video, 2: audio, 3: complex, 0x10: logo, 0x11: subtitle, 0x12: buttons, 0x20: control). |
FlagEnabled | 3 | [B9] | * | - | 0-1 | 1 | u | * | * | * | * | Set if the track is usable. (1 bit) | |
FlagDefault | 3 | [88] | * | - | 0-1 | 1 | u | * | * | * | * | * | Set if that track (audio, video or subs) SHOULD be active if no language found matches the user preference. (1 bit) |
FlagForced | 3 | [55][AA] | * | - | 0-1 | 0 | u | * | * | * | * | * | Set if that track MUST be active during playback. There can be many forced track for a kind (audio, video or subs), the player should select the one which language matches the user preference or the default + forced track. Overlay MAY happen between a forced and non-forced track of the same kind. (1 bit) |
FlagLacing | 3 | [9C] | * | - | 0-1 | 1 | u | * | * | * | * | * | Set if the track may contain blocks using lacing. (1 bit) |
MinCache | 3 | [6D][E7] | * | - | - | 0 | u | * | * | * | * | The minimum number of frames a player should be able to cache during playback. If set to 0, the reference pseudo-cache system is not used. | |
MaxCache | 3 | [6D][F8] | - | - | - | - | u | * | * | * | * | The maximum cache size required to store referenced frames in and the current frame. 0 means no cache is needed. | |
DefaultDuration | 3 | [23][E3][83] | - | - | not 0 | - | u | * | * | * | * | * | Number of nanoseconds (not scaled via TimecodeScale) per frame ('frame' in the Matroska sense -- one Element put into a (Simple)Block). |
DefaultDecodedFieldDuration | 3 | [23][4E][7A] | - | - | not 0 | - | u | * | The period in nanoseconds (not scaled by TimecodeScale) between two successive fields at the output of the decoding process (see the notes) | ||||
TrackTimecodeScale | 3 | [23][31][4F] | * | - | > 0 | 1.0 | f | * | * | * | DEPRECATED, DO NOT USE. The scale to apply on this track to work at normal speed in relation with other tracks (mostly used to adjust video speed when the audio length differs). | ||
TrackOffset | 3 | [53][7F] | - | - | - | 0 | i | A value to add to the Block's Timestamp. This can be used to adjust the playback offset of a track. | |||||
MaxBlockAdditionID | 3 | [55][EE] | * | - | - | 0 | u | * | * | * | * | The maximum value of BlockAddID. A value 0 means there is no BlockAdditions for this track. | |
Name | 3 | [53][6E] | - | - | - | - | 8 | * | * | * | * | * | A human-readable track name. |
Language | 3 | [22][B5][9C] | - | - | - | eng | s | * | * | * | * | * | Specifies the language of the track in the Matroska languages form. |
CodecID | 3 | [86] | * | - | - | - | s | * | * | * | * | * | An ID corresponding to the codec, see the codec page for more info. |
CodecPrivate | 3 | [63][A2] | - | - | - | - | b | * | * | * | * | * | Private data only known to the codec. |
CodecName | 3 | [25][86][88] | - | - | - | - | 8 | * | * | * | * | * | A human-readable string specifying the codec. |
AttachmentLink | 3 | [74][46] | - | - | not 0 | - | u | * | * | * | * | The UID of an attachment that is used by this codec. | |
CodecDecodeAll | 3 | [AA] | * | - | 0-1 | 1 | u | * | * | * | The codec can decode potentially damaged data (1 bit). | ||
TrackOverlay | 3 | [6F][AB] | - | * | - | - | u | * | * | * | * | Specify that this track is an overlay track for the Track specified (in the u-integer). That means when this track has a gap (see SilentTracks) the overlay track should be used instead. The order of multiple TrackOverlay matters, the first one is the one that should be used. If not found it should be the second, etc. | |
CodecDelay | 3 | [56][AA] | - | - | - | 0 | u | * | |||||
SeekPreRoll | 3 | [56][BB] | * | - | - | 0 | u | * | |||||
TrackTranslate | 3 | [66][24] | - | * | - | - | m | * | * | * | * | The track identification for the given Chapter Codec. | |
TrackTranslateEditionUID | 4 | [66][FC] | - | * | - | - | u | * | * | * | * | Specify an edition UID on which this translation applies. When not specified, it means for all editions found in the Segment. | |
TrackTranslateCodec | 4 | [66][BF] | * | - | - | - | u | * | * | * | * | The chapter codec using this ID (0: Matroska Script, 1: DVD-menu). | |
TrackTranslateTrackID | 4 | [66][A5] | * | - | - | - | b | * | * | * | * | The binary value used to represent this track in the chapter codec data. The format depends on the ChapProcessCodecID used. | |
Video | 3 | [E0] | - | - | - | - | m | * | * | * | * | * | Video settings. |
FlagInterlaced | 4 | [9A] | * | - | 0-2 | 0 | u | * | * | * | * | A flag to declare is the video is known to be progressive or interlaced and if applicable to declare details about the interlacement. (0: undetermined, 1: interlaced, 2: progressive) | |
FieldOrder | 4 | [9D] | * | - | 0-14 | 2 | u | * | Declare the field ordering of the video. If FlagInterlaced is not set to 1, this Element MUST be ignored. (0: Progressive, 1: Interlaced with top field display first and top field stored first, 2: Undetermined field order, 6: Interlaced with bottom field displayed first and bottom field stored first, 9: Interlaced with bottom field displayed first and top field stored first, 14: Interlaced with top field displayed first and bottom field stored first) | ||||
StereoMode | 4 | [53][B8] | - | - | - | 0 | u | * | * | * | Stereo-3D video mode (0: mono, 1: side by side (left eye is first), 2: top-bottom (right eye is first), 3: top-bottom (left eye is first), 4: checkboard (right is first), 5: checkboard (left is first), 6: row interleaved (right is first), 7: row interleaved (left is first), 8: column interleaved (right is first), 9: column interleaved (left is first), 10: anaglyph (cyan/red), 11: side by side (right eye is first), 12: anaglyph (green/magenta), 13 both eyes laced in one Block (left eye is first), 14 both eyes laced in one Block (right eye is first)) . There are some more details on 3D support in the Specification Notes. | ||
AlphaMode | 4 | [53][C0] | - | - | - | 0 | u | * | * | * | Alpha Video Mode. Presence of this Element indicates that the BlockAdditional Element could contain Alpha data. | ||
PixelWidth | 4 | [B0] | * | - | not 0 | - | u | * | * | * | * | * | Width of the encoded video frames in pixels. |
PixelHeight | 4 | [BA] | * | - | not 0 | - | u | * | * | * | * | * | Height of the encoded video frames in pixels. |
PixelCropBottom | 4 | [54][AA] | - | - | - | 0 | u | * | * | * | * | * | The number of video pixels to remove at the bottom of the image (for HDTV content). |
PixelCropTop | 4 | [54][BB] | - | - | - | 0 | u | * | * | * | * | * | The number of video pixels to remove at the top of the image. |
PixelCropLeft | 4 | [54][CC] | - | - | - | 0 | u | * | * | * | * | * | The number of video pixels to remove on the left of the image. |
PixelCropRight | 4 | [54][DD] | - | - | - | 0 | u | * | * | * | * | * | The number of video pixels to remove on the right of the image. |
DisplayWidth | 4 | [54][B0] | - | - | not 0 | PixelWidth - PixelCropLeft - Pi | u | * | * | * | * | * | Width of the video frames to display. Applies to the video frame after cropping (PixelCrop* Elements). The default value is only valid when DisplayUnit is 0. |
DisplayHeight | 4 | [54][BA] | - | - | not 0 | PixelHeight - PixelCropTop - Pi | u | * | * | * | * | * | Height of the video frames to display. Applies to the video frame after cropping (PixelCrop* Elements). The default value is only valid when DisplayUnit is 0. |
DisplayUnit | 4 | [54][B2] | - | - | - | 0 | u | * | * | * | * | * | How DisplayWidth & DisplayHeight should be interpreted (0: pixels, 1: centimeters, 2: inches, 3: Display Aspect Ratio). |
AspectRatioType | 4 | [54][B3] | - | - | - | 0 | u | * | * | * | * | * | Specify the possible modifications to the aspect ratio (0: free resizing, 1: keep aspect ratio, 2: fixed). |
ColourSpace | 4 | [2E][B5][24] | - | - | - | - | b | * | * | * | * | Same value as in AVI (32 bits). | |
Audio | 3 | [E1] | - | - | - | - | m | * | * | * | * | * | Audio settings. |
SamplingFrequency | 4 | [B5] | * | - | > 0 | 8000.0 | f | * | * | * | * | * | Sampling frequency in Hz. |
OutputSamplingFrequency | 4 | [78][B5] | - | - | > 0 | SamplingFrequency | f | * | * | * | * | * | Real output sampling frequency in Hz (used for SBR techniques). |
Channels | 4 | [9F] | * | - | not 0 | 1 | u | * | * | * | * | * | Numbers of channels in the track. |
BitDepth | 4 | [62][64] | - | - | not 0 | - | u | * | * | * | * | * | Bits per sample, mostly used for PCM. |
ContentEncodings | 3 | [6D][80] | - | - | - | - | m | * | * | * | * | Settings for several content encoding mechanisms like compression or encryption. | |
ContentEncoding | 4 | [62][40] | * | * | - | - | m | * | * | * | * | Settings for one content encoding like compression or encryption. | |
ContentEncodingOrder | 5 | [50][31] | * | - | - | 0 | u | * | * | * | * | Tells when this modification was used during encoding/muxing starting with 0 and counting upwards. The decoder/demuxer has to start with the highest order number it finds and work its way down. This value has to be unique over all ContentEncodingOrder Elements in the Segment. | |
ContentEncodingScope | 5 | [50][32] | * | - | not 0 | 1 | u | * | * | * | * | A bit field that describes which Elements have been modified in this way. Values (big endian) can be OR'ed. Possible values: 1 - all frame contents, 2 - the track's private data, 4 - the next ContentEncoding (next ContentEncodingOrder. Either the data inside ContentCompression and/or ContentEncryption) | |
ContentEncodingType | 5 | [50][33] | * | - | - | 0 | u | * | * | * | * | A value describing what kind of transformation has been done. Possible values: 0 - compression, 1 - encryption | |
ContentCompression | 5 | [50][34] | - | - | - | - | m | * | * | * | * | Settings describing the compression used. Must be present if the value of ContentEncodingType is 0 and absent otherwise. Each block must be decompressable even if no previous block is available in order not to prevent seeking. | |
ContentEncryption | 5 | [50][35] | - | - | - | - | m | * | * | * | * | Settings describing the encryption used. Must be present if the value of ContentEncodingType is 1 and absent otherwise. |
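Once the TrackEntry fields have been read, choosing streams comes down to looking at TrackType and CodecID. The sketch below assumes each TrackEntry has already been parsed into a plain dict keyed by element name; that representation, the function name, and the sample tracks are all invented for this example. The type codes come from the TrackType row above, and the frame rate is derived from DefaultDuration, which is given in nanoseconds per frame.

```python
# TrackType codes as listed in the TrackType row of the table.
TRACK_TYPES = {
    1: "video", 2: "audio", 3: "complex", 0x10: "logo",
    0x11: "subtitle", 0x12: "buttons", 0x20: "control",
}


def describe_track(entry: dict) -> str:
    """Summarize one parsed TrackEntry (hypothetical dict form)."""
    kind = TRACK_TYPES.get(entry["TrackType"], "unknown")
    text = f'#{entry["TrackNumber"]} {kind} {entry["CodecID"]}'
    if kind == "video" and "DefaultDuration" in entry:
        fps = 1_000_000_000 / entry["DefaultDuration"]
        text += f" {fps:.3f} fps"
    return text


# Made-up sample tracks.
tracks = [
    {"TrackNumber": 1, "TrackType": 1, "CodecID": "V_MPEG4/ISO/AVC",
     "DefaultDuration": 41708333},
    {"TrackNumber": 2, "TrackType": 2, "CodecID": "A_AAC"},
    {"TrackNumber": 3, "TrackType": 0x11, "CodecID": "S_TEXT/UTF8"},
]
for entry in tracks:
    print(describe_track(entry))

audio_numbers = [t["TrackNumber"] for t in tracks if t["TrackType"] == 2]
```

The TrackNumber values collected here are the ones that appear in each Block header, so they are what a demuxer uses to route frames to the right decoder.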
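Chapters
Chapters work somewhat like a table of contents for the media file, marking parts such as the opening, the ending, and the build-up. If you are interested in this part, see the Chapter Specifications. The Chapters ID is [10][43][A7][70].
Clusters
The Clusters section holds all the audio and video data and is made up of multiple Cluster elements.
Each Cluster may contain multiple BlockGroups (or SimpleBlocks). According to the table that follows, a BlockGroup wraps a single Block together with information specific to that Block, such as ReferenceBlock. Audio and video data can be interleaved across Blocks, but each Block carries data of one kind only: audio, video, or subtitles.
The specification defines this section as follows:
Element Name | L | EBML ID | Ma | Mu | Rng | Default | T | 1 | 2 | 3 | 4 | W | Description |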
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Cluster | 1 | [1F][43][B6][75] | - | * | - | - | m | * | * | * | * | * | The Top-Level Element containing the (monolithic) Block structure. |
Timecode | 2 | [E7] | * | - | - | - | u | * | * | * | * | * | Absolute timestamp of the cluster (based on TimecodeScale). |
SilentTracks | 2 | [58][54] | - | - | - | - | m | * | * | * | * | The list of tracks that are not used in that part of the stream. It is useful when using overlay tracks on seeking. Then you should decide what track to use. | |
SilentTrackNumber | 3 | [58][D7] | - | * | - | - | u | * | * | * | * | One of the track number that are not used from now on in the stream. It could change later if not specified as silent in a further Cluster. | |
Position | 2 | [A7] | - | - | - | - | u | * | * | * | * | The Position of the Cluster in the Segment (0 in live broadcast streams). It might help to resynchronise offset on damaged streams. | |
PrevSize | 2 | [AB] | - | - | - | - | u | * | * | * | * | * | Size of the previous Cluster, in octets. Can be useful for backward playing. |
SimpleBlock | 2 | [A3] | - | * | - | - | b | * | * | * | * | Similar to Block but without all the extra information, mostly used to reduced overhead when no extra feature is needed. (see SimpleBlock Structure) | |
BlockGroup | 2 | [A0] | - | * | - | - | m | * | * | * | * | * | Basic container of information containing a single Block and information specific to that Block. |
Block | 3 | [A1] | * | - | - | - | b | * | * | * | * | * | Block containing the actual data to be rendered and a timestamp relative to the Cluster Timecode. (see Block Structure) |
BlockAdditions | 3 | [75][A1] | - | - | - | - | m | * | * | * | * | Contain additional blocks to complete the main one. An EBML parser that has no knowledge of the Block structure could still see and use/skip these data. | |
BlockMore | 4 | [A6] | * | * | - | - | m | * | * | * | * | Contain the BlockAdditional and some parameters. | |
BlockAddID | 5 | [EE] | * | - | not 0 | 1 | u | * | * | * | * | An ID to identify the BlockAdditional level. | |
BlockAdditional | 5 | [A5] | * | - | - | - | b | * | * | * | * | Interpreted by the codec as it wishes (using the BlockAddID). | |
BlockDuration | 3 | [9B] | - | - | - | DefaultDuration | u | * | * | * | * | * | The duration of the Block (based on TimecodeScale). This Element is mandatory when DefaultDuration is set for the track (but can be omitted as other default values). When not written and with no DefaultDuration, the value is assumed to be the difference between the timestamp of this Block and the timestamp of the next Block in "display" order (not coding order). This Element can be useful at the end of a Track (as there is not other Block available), or when there is a break in a track like for subtitle tracks. When set to 0 that means the frame is not a keyframe. |
ReferencePriority | 3 | [FA] | * | - | - | 0 | u | * | * | * | * | This frame is referenced and has the specified cache priority. In cache only a frame of the same or higher priority can replace this frame. A value of 0 means the frame is not referenced. | |
ReferenceBlock | 3 | [FB] | - | * | - | - | i | * | * | * | * | * | Timestamp of another frame used as a reference (ie: B or P frame). The timestamp is relative to the block it's attached to. |
Slices | 3 | [8E] | - | - | - | - | m | * | * | * | * | * | Contains slices description. |
TimeSlice | 4 | [E8] | - | * | - | - | m | * | * | * | * | * | Contains extra time information about the data contained in the Block. While there are a few files in the wild with this Element, it is no longer in use and has been deprecated. Being able to interpret this Element is not required for playback. |
LaceNumber | 5 | [CC] | - | - | - | 0 | u | * | * | * | * | * | The reverse number of the frame in the lace (0 is the last frame, 1 is the next to last, etc). While there are a few files in the wild with this Element, it is no longer in use and has been deprecated. Being able to interpret this Element is not required for playback. |
For detailed data parsing, refer to the specification or compare your results against an mkv analysis tool.
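As a rough guide to what that parsing involves, the sketch below reads the fixed header at the start of a Block or SimpleBlock payload and derives the absolute timestamp. The layout used here (a track number stored as a vint, a signed 16-bit big-endian timecode relative to the Cluster Timecode, then one flags byte) is taken from the Block Structure section of the Matroska specification, which this article does not reproduce. The function names and sample bytes are invented, and lacing is ignored.

```python
import struct


def parse_block_header(payload: bytes):
    """Return (track_number, relative_timecode, flags, header_length)."""
    # The track number is a vint with its marker bit cleared.
    length = 8 - payload[0].bit_length() + 1
    track = int.from_bytes(payload[:length], "big") & ((1 << (7 * length)) - 1)
    # Signed 16-bit relative timecode, followed by one flags byte.
    rel_timecode, flags = struct.unpack_from(">hB", payload, length)
    return track, rel_timecode, flags, length + 3


def block_time_ns(cluster_timecode, rel_timecode, timecode_scale=1_000_000):
    """Absolute timestamp in nanoseconds for a Block inside a Cluster."""
    return (cluster_timecode + rel_timecode) * timecode_scale


# Made-up SimpleBlock payload: track 1, +40 ticks, flags 0x80, then frame data.
payload = bytes.fromhex("81 00 28 80") + b"frame-bytes"
track, rel, flags, header_len = parse_block_header(payload)
print(track, rel, hex(flags), payload[header_len:])
print(block_time_ns(1000, rel))  # 1040000000
```

In a SimpleBlock the 0x80 flag bit marks a keyframe; inside a BlockGroup the keyframe status is inferred from the absence of ReferenceBlock instead.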
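Cueing Data
Cueing Data is essentially a keyframe index. Without such an index, seeking and fast forward or rewind are very difficult, because the packets have to be searched one by one. An earlier article covered how the official FLV format handles the keyframe index and how the community has extended it. MKV has an official specification for its index, and that is Cueing Data. The specification defines it as follows:
Element Name | L | EBML ID | Ma | Mu | Rng | Default | T | 1 | 2 | 3 | 4 | W | Description |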
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Cues | 1 | [1C][53][BB][6B] | - | - | - | - | m | * | * | * | * | * | A Top-Level Element to speed seeking access. All entries are local to the Segment. Should be mandatory for non "live" streams. |
CuePoint | 2 | [BB] | * | * | - | - | m | * | * | * | * | * | Contains all information relative to a seek point in the Segment. |
CueTime | 3 | [B3] | * | - | - | - | u | * | * | * | * | * | Absolute timestamp according to the Segment time base. |
CueTrackPositions | 3 | [B7] | * | * | - | - | m | * | * | * | * | * | Contain positions for different tracks corresponding to the timestamp. |
CueTrack | 4 | [F7] | * | - | not 0 | - | u | * | * | * | * | * | The track for which a position is given. |
CueClusterPosition | 4 | [F1] | * | - | - | - | u | * | * | * | * | * | The position of the Cluster containing the required Block. |
CueRelativePosition | 4 | [F0] | - | - | - | - | u | * | The relative position of the referenced block inside the cluster with 0 being the first possible position for an Element inside that cluster. | ||||
CueDuration | 4 | [B2] | - | - | - | - | u | * | The duration of the block according to the Segment time base. If missing the track's DefaultDuration does not apply and no duration information is available in terms of the cues. | ||||
CueBlockNumber | 4 | [53][78] | - | - | not 0 | 1 | u | * | * | * | * | * | Number of the Block in the specified Cluster. |
CueCodecState | 4 | [EA] | - | - | - | 0 | u | * | * | * | The position of the Codec State corresponding to this Cue Element. 0 means that the data is taken from the initial Track Entry. | ||
CueReference | 4 | [DB] | - | * | - | - | m | * | * | * | The Clusters containing the required referenced Blocks. | ||
CueRefTime | 5 | [96] | * | - | - | - | u | * | * | * | Timestamp of the referenced Block. |
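Seeking with the Cues table amounts to finding the last CuePoint at or before the target time and jumping to its Cluster. The sketch below assumes the cues have already been parsed into a list of (CueTime, CueClusterPosition) pairs sorted by time; that list form, the function name, and the sample numbers are invented for illustration.

```python
import bisect


def find_cue(cues, target_seconds, timecode_scale=1_000_000):
    """Return the (cue_time, cluster_position) pair at or before the target."""
    # Convert the target into TimecodeScale units, the unit used by CueTime.
    target = int(target_seconds * 1_000_000_000 / timecode_scale)
    times = [cue_time for cue_time, _ in cues]
    index = bisect.bisect_right(times, target) - 1
    return cues[max(index, 0)]  # clamp targets that precede the first cue


# Made-up cue list: (CueTime, CueClusterPosition), sorted by CueTime.
cues = [(0, 4500), (5000, 812345), (10000, 1650021)]
cue_time, cluster_position = find_cue(cues, 7.2)
print(cue_time, cluster_position)  # 5000 812345
```

CueClusterPosition is relative to the Segment, so the file offset to read from is the offset of the first byte of the Segment payload plus cluster_position. From there the player decodes forward until it reaches the requested timestamp.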
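As for the last two parts, Attachment and Tagging, see the descriptions in the specification. They carry a good deal of metadata, and you can also define many custom entries of your own.
4. Brief answers to the remaining questions
- For a container that supports multiple programs, how do you find the corresponding audio, video, and subtitle streams?
  In the Track section of an MKV file, each TrackEntry is an independent audio, video, or subtitle stream. This tells you which media formats the container holds.
- How do you determine the playback duration of a program in the container?
  The Segment Info section has a Duration field, from which the duration can be read directly.
- Does the MKV container support seeking? What auxiliary information is available for it?
  Yes. MKV keeps its index in the Cues section, and the keyframe index provided there allows fast seeking.
- Where can you find the most authoritative documentation for the format?
  Matroska is open source and its website is https://www.matroska.org/, which also hosts the specification documents.
- Which tools help analyze anomalies or errors in the container?
  A commonly used tool is mkvtoolnix. Many others are listed on the Matroska download page, so you can choose according to your needs.
5. Summary and references
MKV is a relatively complex container format, but once you understand the basic principles, the specification is very clear overall, and a file can be parsed by reading it sequentially.
Main references: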