ISO/IEC 14496-12:2020
Information technology — Coding of audio-visual objects — Part 12: ISO base media file format
This document specifies the ISO base media file format, which is a general format forming the basis for a number of other more specific file formats. This format contains the timing, structure, and media information for timed sequences of media data, such as audio-visual presentations.
Foreword
Introduction
1 Scope
2 Normative references
3 Terms, definitions and abbreviated terms
3.1 Terms and definitions
3.2 Abbreviated terms
4 Object-structured file organization
4.1 File structure
4.2 Object structure
4.2.1 Object syntax conventions
4.2.2 Object definitions
4.3 File-type box
4.3.1 Definition
4.3.2 Syntax
4.3.3 Semantics
4.4 Extended type box
4.4.1 Definition
4.4.2 Syntax
4.4.3 Semantics
5 Design considerations
5.1 Usage
5.1.1 Multi-purpose
5.1.2 Interchange
5.1.3 Content creation
5.1.4 Preparation for streaming
5.1.5 Local presentation
5.1.6 Streamed presentation
5.2 Design principles
6 ISO base media file organization
6.1 Presentation structure
6.1.1 Referencing external data
6.1.2 Object structure
6.1.3 Meta data and media data
6.1.4 Track identifiers
6.2 Metadata structure (objects)
6.2.1 Box
6.2.2 Data types and fields
6.2.3 Box order
6.2.4 URIs as type indicators
6.3 Brand identification
6.4 Time structure overview
7 Streaming support
7.1 Handling of streaming protocols
7.2 Protocol ‘hint’ tracks
7.3 Hint track format
8 Box structures
8.1 File structure and general boxes
8.1.1 Media data box
8.1.2 Free space box
8.1.3 Progressive download information box
8.1.4 Identified media data box
8.2 Movie structure
8.2.1 Movie box
8.2.2 Movie header box
8.3 Track structure
8.3.1 Track box
8.3.2 Track header box
8.3.3 Track reference box
8.3.4 Track group box
8.3.5 Track type box
8.4 Track media structure
8.4.1 Media box
8.4.2 Media header box
8.4.3 Handler reference box
8.4.4 Media information box
8.4.5 Media information header boxes
8.4.6 Extended language tag
8.5 Sample tables
8.5.1 Sample table box
8.5.2 Sample description box
8.5.3 Degradation priority box
8.5.4 Sample scale box
8.6 Track time structures
8.6.1 Time to sample boxes
8.6.2 Sync sample box
8.6.3 Shadow sync sample box
8.6.4 Independent and disposable samples box
8.6.5 Edit box
8.6.6 Edit list box
8.7 Track data layout structures
8.7.1 Data information box
8.7.2 Data reference box
8.7.3 Sample size boxes
8.7.4 Sample to chunk box
8.7.5 Chunk offset box
8.7.6 Padding bits box
8.7.7 Sub-sample information box
8.7.8 Sample auxiliary information sizes box
8.7.9 Sample auxiliary information offsets box
8.8 Movie fragments
8.8.1 Movie extends box
8.8.2 Movie extends header box
8.8.3 Track extends box
8.8.4 Movie fragment box
8.8.5 Movie fragment header box
8.8.6 Track fragment box
8.8.7 Track fragment header box
8.8.8 Track fragment run box
8.8.9 Movie fragment random access box
8.8.10 Track fragment random access box
8.8.11 Movie fragment random access offset box
8.8.12 Track fragment decode time box
8.8.13 Level assignment box
8.8.14 Sample auxiliary information in movie fragments
8.8.15 Track Extension Properties box
8.8.16 Alternative startup sequence properties box
8.8.17 Metadata and user data in movie fragments
8.9 Sample group structures
8.9.1 Overview
8.9.2 Sample to group box
8.9.3 Sample group description box
8.9.4 Representation of group structures in movie fragments
8.9.5 Compact sample to group box
8.10 User data
8.10.1 User data box
8.10.2 Copyright box
8.10.3 Track selection box
8.10.4 Track kind
8.11 Metadata support
8.11.1 MetaBox
8.11.2 XML boxes
8.11.3 Item location box
8.11.4 Primary item box
8.11.5 Item protection box
8.11.6 Item information box
8.11.7 Additional metadata container box
8.11.8 Metabox Relation box
8.11.9 URL forms for MetaBoxes
8.11.10 Static metadata
8.11.11 Item data box
8.11.12 Item reference box
8.11.13 Auxiliary video metadata
8.11.14 Item properties box
8.11.15 Brand item property
8.12 Support for protected streams
8.12.1 Overview
8.12.2 Protection scheme information box
8.12.3 Original format box
8.12.4 IPMPInfoBox
8.12.5 IPMP control box
8.12.6 Scheme type box
8.12.7 Scheme information box
8.12.8 Scramble Scheme Information Box
8.13 File delivery format support
8.13.1 Overview
8.13.2 FD item information box
8.13.3 File partition box
8.13.4 FEC reservoir box
8.13.5 FD session group box
8.13.6 Group ID to name box
8.13.7 File reservoir box
8.14 Sub tracks
8.14.1 Overview
8.14.2 Backward compatibility
8.14.3 Sub track box
8.14.4 Sub track information box
8.14.5 Sub track definition box
8.14.6 Sub track sample group box
8.15 Post-decoder requirements on media
8.15.1 General
8.15.2 Restricted sample entry transformation
8.15.3 Restricted scheme information box
8.15.4 Scheme for stereoscopic video arrangements
8.15.5 Compatible scheme type box
8.16 Segments
8.16.1 Overview
8.16.2 Segment type box
8.16.3 Segment index box
8.16.4 Subsegment index box
8.16.5 Producer reference time box
8.17 Support for incomplete tracks
8.17.1 General
8.17.2 Transformation
8.17.3 Complete track information box
8.18 Entity grouping
8.18.1 General
8.18.2 Groups list box
8.18.3 Entity to group box
8.19 Compressed boxes
8.19.1 Overview and processing
8.19.2 Processing model
8.19.3 General syntax
8.19.4 General semantics
8.19.5 Original file-type box
8.19.6 Compressed movie box
8.19.7 Compressed movie fragment box
8.19.8 Compressed segment index box
8.19.9 Compressed subsegment index box
9 Hint track formats
9.1 RTP and SRTP hint track format
9.1.1 Overview
9.1.2 Sample description format
9.1.3 Sample format
9.1.4 SDP information
9.1.5 Statistical information
9.2 ALC/LCT and FLUTE hint track format
9.2.1 Overview
9.2.2 Design principles
9.2.3 Sample description format
9.2.4 Sample format
9.3 MPEG-2 transport hint track format
9.3.1 Overview
9.3.2 Design principles
9.3.3 Sample description format
9.3.4 Sample format
9.3.5 Protected MPEG 2 transport stream hint track
9.4 RTP, RTCP, SRTP and SRTCP reception hint tracks
9.4.1 RTP reception hint track
9.4.2 RTCP reception hint track
9.4.3 SRTP reception hint track
9.4.4 SRTCP reception hint tracks
9.4.5 Protected RTP reception hint track
9.4.6 Recording procedure
9.4.7 Parsing procedure
10 Sample groups
10.1 Random access recovery points
10.1.1 Definition
10.1.2 Syntax
10.1.3 Semantics
10.2 Rate share groups
10.2.1 Overview
10.2.2 Rate share sample group entry
10.2.3 Relationship between tracks
10.2.4 Bitrate allocation
10.3 Alternative startup sequences
10.3.1 Definition
10.3.2 Syntax
10.3.3 Semantics
10.3.4 Examples
10.4 Random access point (RAP) sample group
10.4.1 Definition
10.4.2 Syntax
10.4.3 Semantics
10.5 Temporal level sample group
10.5.1 Definition
10.5.2 Syntax
10.5.3 Semantics
10.6 Stream access point sample group
10.6.1 Definition
10.6.2 Syntax
10.6.3 Semantics
10.7 Sample-to-item sample group
10.7.1 Definition
10.7.2 Syntax
10.7.3 Semantics
10.8 Dependent random access point (DRAP) sample group
10.8.1 Definition
10.8.2 Syntax
10.8.3 Semantics
11 Extensibility
11.1 Objects
11.2 Storage formats
11.3 Derived file formats
12 Media-specific definitions
12.1 Video media
12.1.1 Media handler
12.1.2 Video media header
12.1.3 Sample entry
12.1.4 Pixel aspect ratio and clean aperture
12.1.5 Colour information
12.1.6 Content light level
12.1.7 Mastering display colour volume
12.1.8 Content colour volume
12.2 Audio media
12.2.1 Media handler
12.2.2 Sound media header
12.2.3 Sample entry
12.2.4 Channel layout
12.2.5 Downmix instructions
12.2.6 DRC information
12.2.7 Audio stream loudness
12.3 Metadata media
12.3.1 Media handler
12.3.2 Media header
12.3.3 Sample entry
12.4 Hint media
12.4.1 Media handler
12.4.2 Hint media header
12.4.3 Sample entry
12.5 Text media
12.5.1 Media handler
12.5.2 Media header
12.5.3 Sample entry
12.6 Subtitle media
12.6.1 Media handler
12.6.2 Subtitle media header
12.6.3 Sample entry
12.7 Font media
12.7.1 Media handler
12.7.2 Media header
12.7.3 Sample entry
12.8 Transformed media
12.8.1 General
12.8.2 Multiple transformations for a single transformed media track
12.8.3 Determining the untransformed sample entry type
12.8.4 The 'codecs' MIME parameter for a transformed media track
12.9 Multiplexed timed metadata tracks
12.9.1 General
12.9.2 Overall design
12.9.3 Sample format
12.9.4 Sample entry format
12.9.5 Defined formats
Annex A (informative) Overview of the file format
Annex B (informative) Guidance on deriving from this document
Annex C (normative) Fragment identifiers for ISO base media resources
Annex D (informative) Management of extension code points
Annex E (normative) File format brands
Annex F (normative) MIME type registration of segments
Annex G (informative) URI-labelled metadata forms
Annex H (informative) Processing of RTP streams and reception hint tracks
Annex I (normative) Stream access points
Annex J (informative) Segment index examples
Annex K (normative) Use of IETF RFC 6381 for ISOBMFF files
Bibliography
Related Information
Similar Standards
- BS EN 61937-10:2011
  Digital audio. Interface for non-linear PCM encoded audio bitstreams applying IEC 60958, Non-linear PCM bitstreams according to the MPEG-4 Audio Lossless Coding (ALS) format
- BS EN 61937-5:2006
  Digital audio. Interface for non-linear PCM encoded audio bitstreams applying IEC 60958, Non-linear PCM bitstreams according to the DTS (digital theatre systems) format(s)
- BS EN 61937-6:2006+A1:2014
  Digital audio. Interface for non-linear PCM encoded audio bitstreams applying IEC 60958, Non-linear PCM bitstreams according to the MPEG-2 AAC and MPEG-4 AAC formats
- BS EN 61937-8:2007
  Digital audio. Interface for non-linear PCM encoded audio bitstreams applying IEC 60958, Non-linear PCM bitstreams according to the Windows Media Audio (WMA) Professional format