Introduction To WebM File Structure
June 3, 2010

WebM is the new media-container format announced by Google during the Google I/O conference this year. For a broad overview of WebM, see this post: https://www.permadi.com/blog/2010/05/webm-overview/

WebM is based on the MKV (Matroska) container format, which uses EBML to store information, tagging and organizing data in a hierarchical, tree-like structure. EBML is basically a binary (non-human-readable) counterpart of XML, which is text-based. For more EBML info, see http://ebml.sourceforge.net/
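To make the "binary XML" idea concrete: each EBML element is stored as an element ID, then the size of its data, then the data itself. Both the ID and the size are variable-length integers (VINTs). The number of leading zero bits in the first byte tells how many more bytes follow: a first byte of 0x81 is a complete one-byte value, 0x40 starts a two-byte value, and so on. IDs are conventionally kept with this length-marker bit in place, while sizes have it masked off. The Python sketch below decodes both. `read_vint` and `read_element_header` are hypothetical helper names, not part of any library, and the "unknown size" special value (all value bits set) is not handled.

```python
def read_vint(buf: bytes, pos: int, keep_marker: bool):
    """Decode one EBML variable-length integer at buf[pos].

    Returns (value, position just past the integer). The count of leading
    zero bits in the first byte, plus one, is the total length in bytes.
    """
    first = buf[pos]
    length, mask = 1, 0x80
    while length <= 8 and not (first & mask):
        mask >>= 1
        length += 1
    if length > 8 or pos + length > len(buf):
        raise ValueError(f"invalid or truncated VINT at offset {pos}")
    # Element IDs keep the length-marker bit; data sizes have it removed.
    value = first if keep_marker else first & (mask - 1)
    for byte in buf[pos + 1:pos + length]:
        value = (value << 8) | byte
    return value, pos + length


def read_element_header(buf: bytes, pos: int):
    """Return (element_id, data_size, data_start) for the element at buf[pos]."""
    element_id, pos = read_vint(buf, pos, keep_marker=True)
    data_size, pos = read_vint(buf, pos, keep_marker=False)
    return element_id, data_size, pos


if __name__ == "__main__":
    # 1A 45 DF A3 is the 4-byte ID of the EBML header; 0x9F encodes size 31.
    element_id, size, start = read_element_header(bytes.fromhex("1A45DFA39F"), 0)
    print(hex(element_id), size, start)  # 0x1a45dfa3 31 5
```

Nesting falls out of this encoding naturally: an element's data is simply a sequence of child elements encoded the same way.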

All WebM files must follow a required set of elements to be considered valid (for example, a WebM file without the Header element is not valid). As another example, every WebM file must have at least two Level 0 elements: an EBML element and a Segment element. To illustrate this, let’s dissect a .webm video file that contains one audio track and one video track, with no subtitle track. Several programs can dissect EBML files; the two I used are MKVInfo (http://www.bunkus.org/videotools/mkvtoolnix/doc/mkvinfo.html) and EBML-Viewer (requires Java): http://code.google.com/p/ebml-viewer/.
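The Level 0 requirement can be checked without understanding anything inside the elements, because each element's size lets a reader skip straight over its body. Below is a minimal Python sketch, assuming the whole file fits in memory. `level0_ids` and `has_required_level0` are hypothetical names. The IDs 0x1A45DFA3 and 0x18538067 are the EBML header and Segment IDs from the EBML/Matroska specifications. The VINT decoder is repeated so the block runs on its own.

```python
# Level 0 element IDs defined by the EBML and Matroska specifications.
EBML_HEADER_ID = 0x1A45DFA3
SEGMENT_ID = 0x18538067


def read_vint(buf, pos, keep_marker):
    """Decode an EBML variable-length integer at buf[pos]; return (value, next_pos)."""
    first = buf[pos]
    length, mask = 1, 0x80
    while length <= 8 and not (first & mask):
        mask >>= 1
        length += 1
    if length > 8 or pos + length > len(buf):
        raise ValueError(f"invalid or truncated VINT at offset {pos}")
    value = first if keep_marker else first & (mask - 1)
    for byte in buf[pos + 1:pos + length]:
        value = (value << 8) | byte
    return value, pos + length


def level0_ids(data):
    """Return the IDs of the top-level elements, skipping over their bodies."""
    ids, pos = [], 0
    while pos < len(data):
        element_id, pos = read_vint(data, pos, keep_marker=True)
        size, pos = read_vint(data, pos, keep_marker=False)
        ids.append(element_id)
        # Jump past the body. Unknown-size elements are not handled specially.
        pos += size
    return ids


def has_required_level0(data):
    """True if data starts with an EBML header and also contains a Segment."""
    ids = level0_ids(data)
    return bool(ids) and ids[0] == EBML_HEADER_ID and SEGMENT_ID in ids


if __name__ == "__main__":
    # Empty EBML header followed by a Segment holding two placeholder bytes.
    sample = bytes.fromhex("1A45DFA380" "1853806782" "0000")
    print([hex(i) for i in level0_ids(sample)], has_required_level0(sample))
```

On a real file you would pass the bytes from `open(path, "rb").read()`; a reader for large files would seek past each body instead of loading everything.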

Here’s an example of a WebM file, parsed using MKVViewer GUI:

You can see the two Level 0 elements there: EBML and Segment (yes, "Segment" is an element name). The EBML element tells a parser that the file is a valid EBML file and which EBML version it uses, so that the parser won’t attempt to read an EBML file it does not support.
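Besides the version fields, the EBML header carries a DocType string, which is "webm" for WebM files (plain Matroska files use "matroska"). The sketch below parses the header's children into a dictionary and applies the version check described above: a reader that only understands EBMLReadVersion 1 refuses anything newer. The function names and the `supported_ebml_read_version` parameter are hypothetical; the child element IDs come from the EBML specification.

```python
EBML_HEADER_ID = 0x1A45DFA3
# Children of the EBML header: ID -> (field name, value type).
HEADER_FIELDS = {
    0x4286: ("EBMLVersion", "uint"),
    0x42F7: ("EBMLReadVersion", "uint"),
    0x42F2: ("EBMLMaxIDLength", "uint"),
    0x42F3: ("EBMLMaxSizeLength", "uint"),
    0x4282: ("DocType", "string"),
    0x4287: ("DocTypeVersion", "uint"),
    0x4285: ("DocTypeReadVersion", "uint"),
}


def read_vint(buf, pos, keep_marker):
    """Decode an EBML variable-length integer at buf[pos]; return (value, next_pos)."""
    first = buf[pos]
    length, mask = 1, 0x80
    while length <= 8 and not (first & mask):
        mask >>= 1
        length += 1
    if length > 8 or pos + length > len(buf):
        raise ValueError(f"invalid or truncated VINT at offset {pos}")
    value = first if keep_marker else first & (mask - 1)
    for byte in buf[pos + 1:pos + length]:
        value = (value << 8) | byte
    return value, pos + length


def iter_children(buf):
    """Yield (element_id, data) for each element stored back to back in buf."""
    pos = 0
    while pos < len(buf):
        element_id, pos = read_vint(buf, pos, keep_marker=True)
        size, pos = read_vint(buf, pos, keep_marker=False)
        yield element_id, buf[pos:pos + size]
        pos += size


def parse_ebml_header(data):
    """Parse the EBML header at the start of data into a dict of named fields."""
    element_id, body = next(iter_children(data))  # only the first element
    if element_id != EBML_HEADER_ID:
        raise ValueError("not an EBML file")
    fields = {}
    for child_id, value in iter_children(body):
        if child_id in HEADER_FIELDS:
            name, kind = HEADER_FIELDS[child_id]
            if kind == "string":
                fields[name] = value.rstrip(b"\x00").decode("ascii")
            else:
                fields[name] = int.from_bytes(value, "big")
    return fields


def can_read(fields, supported_ebml_read_version=1):
    """Accept only WebM files whose EBMLReadVersion (default 1) we support."""
    return (fields.get("DocType") == "webm"
            and fields.get("EBMLReadVersion", 1) <= supported_ebml_read_version)


if __name__ == "__main__":
    header = bytes.fromhex("1A45DFA39F 42868101 42F78101 42F28104 42F38108"
                           " 4282847765626D 42878102 42858102")
    fields = parse_ebml_header(header)
    print(fields["DocType"], can_read(fields))
```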

Let’s see the Segment element.

As you can see, the Segment element is a lot more complex, but there are four basic elements that a well-formed WebM file should contain:
Level 1 Element: Meta Seek Information (Seek Head element)

This element holds the positions (byte offsets) of the other Level 1 elements. The KaxCues entry points to the Cues element, which provides seeking/scrubbing support: a player can examine the cues (in WebM, all cues are keyframes) and jump to the nearest one. A Seek entry can also point to another Seek Head, as you can see in the example, where it is denoted by KaxSeekHead.
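Each Seek entry inside the Seek Head pairs a SeekID (the raw ID of the target element, e.g. 0x1C53BB6B for Cues) with a SeekPosition, which is a byte offset measured from the start of the Segment's data rather than from the start of the file. The Python sketch below, with hypothetical function names, turns a Seek Head body into a list of (target, offset) pairs; only the Level 1 IDs listed in `LEVEL1_NAMES` get friendly names.

```python
SEEK_ID = 0x4DBB           # one Seek entry
SEEK_ID_ID = 0x53AB        # SeekID: target element's ID, stored as raw bytes
SEEK_POSITION_ID = 0x53AC  # SeekPosition: offset from the Segment's data start
LEVEL1_NAMES = {
    0x114D9B74: "SeekHead",
    0x1549A966: "Info",
    0x1654AE6B: "Tracks",
    0x1C53BB6B: "Cues",
    0x1F43B675: "Cluster",
}


def read_vint(buf, pos, keep_marker):
    """Decode an EBML variable-length integer at buf[pos]; return (value, next_pos)."""
    first = buf[pos]
    length, mask = 1, 0x80
    while length <= 8 and not (first & mask):
        mask >>= 1
        length += 1
    if length > 8 or pos + length > len(buf):
        raise ValueError(f"invalid or truncated VINT at offset {pos}")
    value = first if keep_marker else first & (mask - 1)
    for byte in buf[pos + 1:pos + length]:
        value = (value << 8) | byte
    return value, pos + length


def iter_children(buf):
    """Yield (element_id, data) for each element stored back to back in buf."""
    pos = 0
    while pos < len(buf):
        element_id, pos = read_vint(buf, pos, keep_marker=True)
        size, pos = read_vint(buf, pos, keep_marker=False)
        yield element_id, buf[pos:pos + size]
        pos += size


def parse_seek_head(body):
    """Return [(target_name, segment_relative_offset), ...] from a SeekHead's data."""
    entries = []
    for element_id, seek in iter_children(body):
        if element_id != SEEK_ID:
            continue
        target = position = None
        for child_id, value in iter_children(seek):
            if child_id == SEEK_ID_ID:
                target = int.from_bytes(value, "big")
            elif child_id == SEEK_POSITION_ID:
                position = int.from_bytes(value, "big")
        if target is not None and position is not None:
            entries.append((LEVEL1_NAMES.get(target, hex(target)), position))
    return entries


def absolute_offset(segment_data_start, position):
    """Convert a SeekPosition into an offset from the start of the file."""
    return segment_data_start + position
```

A player would typically look up the "Cues" entry, convert it with `absolute_offset`, and read the cue points from there when the user scrubs.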

Level 1 Element: Segment Information
This contains top-level information about the segment, such as its duration, MuxingApp, and WritingApp (string identifiers of the applications that created the media).
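Duration is stored as a float counted in timecode ticks, and a separate TimecodeScale element (default 1,000,000 ns, i.e. one millisecond per tick) says how long a tick is. The sketch below, with hypothetical function names, reads those fields plus MuxingApp and WritingApp and converts the duration to seconds.

```python
import struct

TIMECODE_SCALE_ID = 0x2AD7B1  # nanoseconds per timecode tick (default 1,000,000)
DURATION_ID = 0x4489          # float, measured in timecode ticks
MUXING_APP_ID = 0x4D80
WRITING_APP_ID = 0x5741


def read_vint(buf, pos, keep_marker):
    """Decode an EBML variable-length integer at buf[pos]; return (value, next_pos)."""
    first = buf[pos]
    length, mask = 1, 0x80
    while length <= 8 and not (first & mask):
        mask >>= 1
        length += 1
    if length > 8 or pos + length > len(buf):
        raise ValueError(f"invalid or truncated VINT at offset {pos}")
    value = first if keep_marker else first & (mask - 1)
    for byte in buf[pos + 1:pos + length]:
        value = (value << 8) | byte
    return value, pos + length


def iter_children(buf):
    """Yield (element_id, data) for each element stored back to back in buf."""
    pos = 0
    while pos < len(buf):
        element_id, pos = read_vint(buf, pos, keep_marker=True)
        size, pos = read_vint(buf, pos, keep_marker=False)
        yield element_id, buf[pos:pos + size]
        pos += size


def parse_segment_info(body):
    """Decode the interesting children of a Segment Information element."""
    info = {"TimecodeScale": 1_000_000}  # default used when the element is absent
    for element_id, value in iter_children(body):
        if element_id == TIMECODE_SCALE_ID:
            info["TimecodeScale"] = int.from_bytes(value, "big")
        elif element_id == DURATION_ID:
            # EBML floats are big-endian IEEE 754, either 4 or 8 bytes long.
            info["Duration"] = struct.unpack(">f" if len(value) == 4 else ">d", value)[0]
        elif element_id == MUXING_APP_ID:
            info["MuxingApp"] = value.decode("utf-8")
        elif element_id == WRITING_APP_ID:
            info["WritingApp"] = value.decode("utf-8")
    if "Duration" in info:
        info["DurationSeconds"] = info["Duration"] * info["TimecodeScale"] / 1e9
    return info
```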

Level 1 Element: Tracks Information
Contains the track definitions. There is quite a lot here, but most of it is self-explanatory, as the example shows. There are two tracks in this example. A few things are worth pointing out:
- UID is a unique track identifier; no two tracks should have the same UID.
- At the time of writing, the video track Codec ID in a WebM file must be V_VP8, and the audio track Codec ID must be A_VORBIS.
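Both rules above are easy to check mechanically once the TrackEntry elements are decoded. The Python sketch below (hypothetical function names) reads TrackNumber, TrackUID, TrackType (1 = video, 2 = audio) and CodecID from each entry and reports duplicate UIDs or unexpected codecs. `EXPECTED_CODECS` encodes the 2010 rule stated here; later versions of WebM also permit other codecs such as VP9 and Opus, so a current validator would need a wider table.

```python
TRACK_ENTRY_ID = 0xAE
TRACK_NUMBER_ID = 0xD7
TRACK_UID_ID = 0x73C5
TRACK_TYPE_ID = 0x83  # 1 = video, 2 = audio
CODEC_ID_ID = 0x86

# Codec required per track type, as stated in this post (2010 WebM rules).
EXPECTED_CODECS = {1: "V_VP8", 2: "A_VORBIS"}
UINT_FIELDS = {TRACK_NUMBER_ID: "number", TRACK_UID_ID: "uid", TRACK_TYPE_ID: "type"}


def read_vint(buf, pos, keep_marker):
    """Decode an EBML variable-length integer at buf[pos]; return (value, next_pos)."""
    first = buf[pos]
    length, mask = 1, 0x80
    while length <= 8 and not (first & mask):
        mask >>= 1
        length += 1
    if length > 8 or pos + length > len(buf):
        raise ValueError(f"invalid or truncated VINT at offset {pos}")
    value = first if keep_marker else first & (mask - 1)
    for byte in buf[pos + 1:pos + length]:
        value = (value << 8) | byte
    return value, pos + length


def iter_children(buf):
    """Yield (element_id, data) for each element stored back to back in buf."""
    pos = 0
    while pos < len(buf):
        element_id, pos = read_vint(buf, pos, keep_marker=True)
        size, pos = read_vint(buf, pos, keep_marker=False)
        yield element_id, buf[pos:pos + size]
        pos += size


def parse_tracks(body):
    """Turn the data of a Tracks element into a list of per-track dicts."""
    tracks = []
    for element_id, entry in iter_children(body):
        if element_id != TRACK_ENTRY_ID:
            continue
        track = {}
        for child_id, value in iter_children(entry):
            if child_id in UINT_FIELDS:
                track[UINT_FIELDS[child_id]] = int.from_bytes(value, "big")
            elif child_id == CODEC_ID_ID:
                track["codec"] = value.rstrip(b"\x00").decode("ascii")
        tracks.append(track)
    return tracks


def check_tracks(tracks):
    """Return a list of human-readable problems (empty if the tracks look valid)."""
    problems = []
    uids = [t.get("uid") for t in tracks]
    if len(set(uids)) != len(uids):
        problems.append("duplicate TrackUID")
    for t in tracks:
        expected = EXPECTED_CODECS.get(t.get("type"))
        if expected is not None and t.get("codec") != expected:
            problems.append(f"track {t.get('number')}: CodecID {t.get('codec')!r}, expected {expected!r}")
    return problems
```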


These elements are usually followed by the actual track data, organized into Clusters.

Level 1 Element: Clusters

Expanding one of the Clusters reveals blocks of data. Timecodes must always increase. Note that the blocks do not have to be SimpleBlock elements, since WebM also supports BlockGroup elements, each of which wraps a Block.
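Each block starts with a small header: the track number as a size-style VINT, a signed 16-bit timecode relative to the Cluster's own Timecode element, and a flags byte (in a SimpleBlock, 0x80 marks a keyframe and bits 0x06 select the lacing mode). The sketch below, with hypothetical function names, decodes SimpleBlocks and the Block inside each BlockGroup, converts relative timecodes to absolute ones, and applies a simplified, per-track form of the "always increasing" rule (a track's timecode never goes backwards). A Block's flags byte has no keyframe bit, so `keyframe` is left as None for those.

```python
CLUSTER_TIMECODE_ID = 0xE7  # the Cluster's base timecode, in TimecodeScale ticks
SIMPLE_BLOCK_ID = 0xA3
BLOCK_GROUP_ID = 0xA0
BLOCK_ID = 0xA1             # the Block inside a BlockGroup


def read_vint(buf, pos, keep_marker):
    """Decode an EBML variable-length integer at buf[pos]; return (value, next_pos)."""
    first = buf[pos]
    length, mask = 1, 0x80
    while length <= 8 and not (first & mask):
        mask >>= 1
        length += 1
    if length > 8 or pos + length > len(buf):
        raise ValueError(f"invalid or truncated VINT at offset {pos}")
    value = first if keep_marker else first & (mask - 1)
    for byte in buf[pos + 1:pos + length]:
        value = (value << 8) | byte
    return value, pos + length


def iter_children(buf):
    """Yield (element_id, data) for each element stored back to back in buf."""
    pos = 0
    while pos < len(buf):
        element_id, pos = read_vint(buf, pos, keep_marker=True)
        size, pos = read_vint(buf, pos, keep_marker=False)
        yield element_id, buf[pos:pos + size]
        pos += size


def parse_block_header(data, simple):
    """Decode the header shared by SimpleBlock and Block elements."""
    track, pos = read_vint(data, 0, keep_marker=False)  # size-style VINT
    relative = int.from_bytes(data[pos:pos + 2], "big", signed=True)
    flags = data[pos + 2]
    return {
        "track": track,
        "relative": relative,
        # Only SimpleBlock defines the keyframe bit; unknown for a plain Block.
        "keyframe": bool(flags & 0x80) if simple else None,
        "lacing": (flags >> 1) & 0x03,  # 0 none, 1 Xiph, 2 fixed-size, 3 EBML
        "payload_size": len(data) - (pos + 3),  # frame bytes incl. any lace header
    }


def parse_cluster(body):
    """Return the blocks of one Cluster, each with an absolute timecode."""
    base, blocks = 0, []
    for element_id, value in iter_children(body):
        if element_id == CLUSTER_TIMECODE_ID:
            base = int.from_bytes(value, "big")
        elif element_id == SIMPLE_BLOCK_ID:
            blocks.append(parse_block_header(value, simple=True))
        elif element_id == BLOCK_GROUP_ID:
            for child_id, child in iter_children(value):
                if child_id == BLOCK_ID:
                    blocks.append(parse_block_header(child, simple=False))
    for block in blocks:
        block["timecode"] = base + block.pop("relative")
    return blocks


def timecodes_monotonic(blocks):
    """True if no track's timecode ever goes backwards (simplified check)."""
    last = {}
    for block in blocks:
        if block["timecode"] < last.get(block["track"], block["timecode"]):
            return False
        last[block["track"]] = block["timecode"]
    return True
```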