Author: Slavik Lozben, Veovera Software Organization (VSO)
Contributors: Adobe, Google, Twitch, Jean-Baptiste Kempf (FFmpeg, VideoLAN), pkv (OBS), Dennis Sädtler (OBS), Xavier Hallade (Intel Corporation), Luxoft, SplitmediaLabs Limited (XSplit), Craig Barberich (VSO), Michael Thornburgh
Status: v2-2024-10-22-b1
This section outlines our standardized approach for versioning our specification documentation. Effective versioning ensures consistency, enables users to identify the latest version easily, and facilitates collaboration among team members.
We name the documentation files with a clear identifier and the major version number.
Example:
enhanced-rtmp-v2.pdf
We include a dedicated section or metadata within each document to specify the version details, which include the major version number, date, and stage of development (alpha/beta/release).
Example:
Status: v2-2024-02-26-a1
The format for versioning documents is structured as follows: v<major version>-<YYYY-MM-DD>-<stage><iteration>, where <stage><iteration> abbreviates the development stage and its iteration (e.g., a1 for the first alpha iteration, b1 for the first beta iteration).
This format provides a comprehensive overview of each version’s status and chronological order, facilitating effective tracking and management of the E-RTMP specification development.
We define distinct stages for the development of the E-RTMP protocol specification to indicate its maturity and readiness for implementation. Each stage serves a different purpose for those implementing the protocol:
This document details an alpha version of the enhanced Real-Time Messaging Protocol (a.k.a., E-RTMP) specification, version “*”. As we continue to refine and enhance the protocol, we remain open to implementing necessary updates based on user feedback and further testing. While there is a possibility of introducing breaking changes during the alpha stage, we are committed to maintaining the integrity of the General Availability (GA) versions and strive to ensure they remain free from breaking changes.
We encourage developers, implementers, and stakeholders to actively participate in this development phase. Your feedback, whether it be bug reports, feature suggestions, or usability improvements, is invaluable and can be submitted via new issues in our GitHub repository at <https://github.com/veovera/enhanced-rtmp>. We are committed to transparently communicating updates and changes, ensuring that all stakeholders are informed and involved.
Engaging with the alpha version provides a unique opportunity to influence the final specifications of E-RTMP version “*”. We value your input and look forward to collaborating with you on this exciting journey towards developing a more robust and efficient protocol.
Copyright 2022-2024 Veovera Software Organization
Licensed under the Apache License, Version 2.0 (the “License”);
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
<https://www.apache.org/licenses/LICENSE-2.0>
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an “AS IS” BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here. Definitions below are reproduced from [RFC2119].
Additionally, we add the keyword [DEPRECATED] to the set of keywords above.
In the rapidly evolving media streaming landscape, there is a pressing need to update legacy protocols to align with modern technological standards. The Real-Time Messaging Protocol [RTMP] and Flash Video [FLV] file format, introduced in 2002, have been pivotal and continue to be vital, especially in live broadcasting. Despite RTMP’s widespread use, it has shown signs of aging, particularly in its lack of support for contemporary video codecs (e.g., VP8, VP9, HEVC, AV1) and audio codecs (e.g., Opus, FLAC, AC-3, E-AC-3). Recognizing this, Veovera Software Organization (VSO), in collaboration with industry giants like Adobe, YouTube, and Twitch, and other key stakeholders, has embarked on a mission to rejuvenate RTMP, ensuring it meets the demands of contemporary streaming.
This document details the comprehensive enhancements made to the RTMP and FLV specifications, aimed at revitalizing the technology for current and future media demands. Our strategic approach prioritizes innovation while maintaining backward compatibility, thereby augmenting RTMP’s utility without undermining existing infrastructures. Some of the key advancements include:
The additional audio and video codecs supported by enhanced RTMP are summarized in the following table:
Table: Additional audio and video codecs for E-RTMP
+----------------------------------------------+------------------------------------------------------------+
¦ Additional Audio Codec ¦ Notes ¦
+----------------------------------------------+------------------------------------------------------------+
¦AC-3 ¦AC-3 and E-AC-3 have significantly influenced the surround ¦
+----------------------------------------------+sound market by offering versatile and scalable audio ¦
¦ ¦solutions for both physical and streaming media. Their ¦
¦E-AC-3 ¦balance of complexity and performance makes them enduring ¦
¦ ¦standards in multichannel audio technology. ¦
+----------------------------------------------+------------------------------------------------------------+
¦Opus ¦ ¦
+----------------------------------------------+ ¦
¦FLAC ¦Popular in both hardware and software streaming solutions, ¦
+----------------------------------------------+the [WebCodecs] audio codec registry also includes support ¦
¦AAC (added FOURCC signaling) ¦for these widely used audio formats. ¦
+----------------------------------------------+ ¦
¦MP3 (added FOURCC signaling) ¦ ¦
+----------------------------------------------+------------------------------------------------------------+
¦ Additional Video Codec ¦ ¦
+----------------------------------------------+------------------------------------------------------------+
¦AVC (a.k.a., H.264, added FOURCC signaling) ¦ ¦
+----------------------------------------------+ ¦
¦HEVC (a.k.a., H.265) ¦ ¦
+----------------------------------------------+Popular in both hardware and software streaming solutions, ¦
¦VP8 (WebRTC officially supports this codec)   ¦the [WebCodecs] video codec registry also includes support  ¦
+----------------------------------------------+for these widely used video formats. ¦
¦VP9 ¦ ¦
+----------------------------------------------+ ¦
¦AV1 ¦ ¦
+----------------------------------------------+------------------------------------------------------------+
These strategic enhancements position RTMP as a robust, future-proof standard in the streaming technology arena. Veovera is committed to open collaboration and values community input, believing that protocols and standards should be open and free to foster innovation and create a thriving ecosystem. Companies can capitalize on solutions built around open standards; the more popular and accessible a protocol is, the stronger the foundation for developing compelling solutions. A standard’s popularity fuels adoption, allowing companies to leverage its widespread use. In contrast, fragmentation caused by proprietary protocols hampers industry growth, while open standards empower everyone to innovate freely, creating a healthier marketplace. E-RTMP’s shift toward openness aligns with the principles of open standards, emphasizing its potential to become a foundational technology. We encourage participation in the ongoing development process through our GitHub repository, where you can access detailed documentation, contribute to the project, and share insights to foster a vibrant ecosystem around E-RTMP.
This document describes enhancements to legacy [RTMP] and legacy [FLV], introducing support for new media codecs, HDR capability, and more. A primary objective is to ensure these enhancements do not introduce breaking changes for established clients or the content they stream. As such, legacy RTMP and legacy FLV specifications remain integral to the RTMP ecosystem. While this updated specification aims to minimize redundancy with previous versions, when combined with previous-generation documentation, it provides a comprehensive overview of the RTMP solution. We’ve drawn from several legacy references, which are as follows:
This document employs certain conventions to convey particular meanings and requirements. The following section outlines the notation, terminology, and symbols used throughout to ensure clarity and consistency. These conventions provide insight into the ethos of how the E-RTMP specification has been crafted and should be interpreted.
Pseudocode: Pseudocode has been provided to convey logic on how to interpret the E-RTMP binary format. The code style imitates a cross between TypeScript and C. The pseudocode was written in TypeScript and validated using VSCode to ensure correct syntax and catch any minor typographical errors. Below are some further explanations:
videoCommand = UI8 as VideoCommand
Quotation Marks and Emphasis Guidelines: Ultimately, the context should drive the meaning, but we make an effort to leverage quotation marks and emphasis (i.e., bold) to maintain readability. We aim to avoid syntactic sugar as much as possible to ensure the document remains straightforward, easy to read, scan, and understand. The conventions for using double quotes (“), back quotes (`), and emphasis in this document to ensure clarity and consistency are as follows:
The following data types are used in [RTMP] bitstreams and [FLV] files. FOURCC was introduced to support E-RTMP.
Table: Simple data types
+-------------------------------+-----------------------------------------------------------------+
¦Type ¦Definition ¦
+-------------------------------+-----------------------------------------------------------------+
¦0x... ¦Hexadecimal value ¦
+-------------------------------+-----------------------------------------------------------------+
¦UB[n] ¦Bit field with unsigned n-bit integer, where n is in the range 1 ¦
¦ ¦to 31, excluding 8, 16, 24 ¦
+-------------------------------+-----------------------------------------------------------------+
¦FOURCC ¦Four-character ASCII code, such as "av01", encoded as UI32 ¦
+-------------------------------+-----------------------------------------------------------------+
¦SI8 ¦Signed 8-bit integer ¦
+-------------------------------+-----------------------------------------------------------------+
¦SI16 ¦Signed 16-bit integer ¦
+-------------------------------+-----------------------------------------------------------------+
¦SI24 ¦Signed 24-bit integer ¦
+-------------------------------+-----------------------------------------------------------------+
¦SI32 ¦Signed 32-bit integer ¦
+-------------------------------+-----------------------------------------------------------------+
¦UI8 ¦Unsigned 8-bit integer ¦
+-------------------------------+-----------------------------------------------------------------+
¦UI16 ¦Unsigned 16-bit integer ¦
+-------------------------------+-----------------------------------------------------------------+
¦UI24 ¦Unsigned 24-bit integer ¦
+-------------------------------+-----------------------------------------------------------------+
¦UI32 ¦Unsigned 32-bit integer ¦
+-------------------------------+-----------------------------------------------------------------+
¦xxx[] ¦Array of type xxx. Number of elements to be inferred ¦
+-------------------------------+-----------------------------------------------------------------+
¦xxx[n] ¦Array of n elements of type xxx ¦
+-------------------------------+-----------------------------------------------------------------+
¦[xxx] ¦Array of one element of type xxx ¦
+-------------------------------+-----------------------------------------------------------------+
Note: Unless specifically called out, multi-byte integers SHALL be stored in big-endian byte order.
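To make these types concrete, below is a minimal, non-normative TypeScript sketch of reading them from a byte buffer. The BitReader name and its shape are illustrative assumptions for this example, not part of the specification.

// Non-normative sketch: reading the simple data types from a byte buffer.
class BitReader {
  private bitPos = 0;
  constructor(private readonly bytes: Uint8Array) {}

  // UB[n]: unsigned n-bit integer, read most-significant bit first
  readUB(n: number): number {
    let value = 0;
    for (let i = 0; i < n; i++) {
      const byte = this.bytes[this.bitPos >> 3];
      value = (value << 1) | ((byte >> (7 - (this.bitPos & 7))) & 1);
      this.bitPos++;
    }
    return value >>> 0;
  }

  // Multi-byte integers are big-endian unless specifically called out
  readUI8(): number  { return this.readUB(8); }
  readUI16(): number { return this.readUB(16); }
  readUI24(): number { return this.readUB(24); }
  readUI32(): number { return this.readUB(32); }

  // FOURCC: four ASCII characters (e.g., "av01") encoded as a UI32
  readFourCc(): number { return this.readUI32(); }
}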
Adobe’s Real-Time Messaging Protocol [RTMP] is an application-level protocol designed for the multiplexing and packetizing of multimedia streams (such as audio, video, and interactive content) for transmission over network protocols like TCP. A fundamental feature of RTMP is the Chunk Stream, which facilitates the multiplexing, packetizing, and prioritization of messages, integral to the protocol’s real-time capabilities.
The legacy RTMP specification in Section 6.1 elaborates on the RTMP Message Format, providing precise encoding guidelines for the RTMP message header, inclusive of field widths and byte order. However, this portrayal can be confusing because RTMP messages, when transported over the Chunk Stream, don’t literally conform to the depicted format. An RTMP Message is divided into two principal components: a message virtual header and a message payload. The “virtual” descriptor indicates that while RTMP messages are carried within the RTMP Chunk Stream, their headers are conceptually encoded as Chunk Message Headers. When these are decoded from the RTMP Chunk Stream (the underlying transport layer), the resulting format is to be understood as a virtual header. This abstract representation aligns with the structured format and semantics detailed in the legacy RTMP specification. Detailed next is the format of the message virtual header and some additional related information.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|MessageType ID | Payload length |
| (1 byte) | (3 bytes) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Timestamp |
| (4 bytes) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Stream ID |
| (3 bytes) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
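As a non-normative illustration, the sketch below decodes the virtual header fields diagrammed above; the interface and function names are assumptions for this example.

// Non-normative sketch: decoding the message virtual header fields.
interface MessageVirtualHeader {
  messageTypeId: number; // UI8
  payloadLength: number; // UI24, big-endian
  timestamp: number;     // UI32, big-endian
  streamId: number;      // UI24, big-endian
}

function parseMessageVirtualHeader(view: DataView): MessageVirtualHeader {
  return {
    messageTypeId: view.getUint8(0),
    payloadLength: (view.getUint8(1) << 16) | (view.getUint8(2) << 8) | view.getUint8(3),
    timestamp: view.getUint32(4), // DataView reads big-endian by default
    streamId: (view.getUint8(8) << 16) | (view.getUint8(9) << 8) | view.getUint8(10),
  };
}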
An [FLV] file is a container for AV (Audio and Video) data. The file consists of alternating back-pointers and tags, each accompanied by data related to that tag. Each TagType within an FLV file is unsigned and defined by 5 bits. AUDIODATA has a TagType of 8, and VIDEODATA has a TagType of 9.
Note: Each TagType corresponds directly to the same MessageType ID, defined by UI8, in the [RTMP] specification. This alignment is intentional.
TagType values of 8 or 9 are accompanied by an AudioTagHeader or VideoTagHeader, respectively. While RTMP is commonly associated with FLV, it is important to note that RTMP is a protocol, whereas FLV is a file container format. This distinction is why they were originally defined in separate specifications. This enhancement specification aims to improve both RTMP and FLV.
Below is the AudioTagHeader format for the legacy FLV specification:
Table: FLV specification AudioTagHeader
+------------------+----------------------+----------------------------------------------------------+
¦Field ¦Type ¦Comment ¦
+------------------+----------------------+----------------------------------------------------------+
¦ ¦ ¦Format of SoundData. The following values are defined: ¦
¦ ¦ ¦ 0 = Linear PCM, platform-endian ¦
¦ ¦ ¦ 1 = ADPCM ¦
¦ ¦ ¦ 2 = MP3 ¦
¦ ¦ ¦ 3 = Linear PCM, little-endian ¦
¦ ¦ ¦ 4 = Nellymoser 16 kHz mono ¦
¦ ¦ ¦ 5 = Nellymoser 8 kHz mono ¦
¦ ¦ ¦ 6 = Nellymoser ¦
¦ ¦ ¦ 7 = G.711 A-law logarithmic PCM ¦
¦SoundFormat ¦UB[4] ¦ 8 = G.711 mu-law logarithmic PCM ¦
¦ ¦ ¦ 9 = Reserved ¦
¦ ¦ ¦10 = AAC ¦
¦ ¦ ¦11 = Speex ¦
¦ ¦ ¦12 = Reserved ¦
¦ ¦ ¦13 = Reserved ¦
¦ ¦ ¦14 = MP3 8 kHz ¦
¦ ¦ ¦15 = Device-specific sound ¦
¦ ¦ ¦Formats 7, 8, 14, and 15 are reserved. ¦
¦ ¦ ¦AAC is supported in Flash Player 9,0,115,0 and higher. ¦
¦ ¦ ¦Speex is supported in Flash Player 10 and higher. ¦
+------------------+----------------------+----------------------------------------------------------+
¦ ¦ ¦Sampling rate. The following values are defined: ¦
¦ ¦ ¦0 = 5.5 kHz ¦
¦SoundRate ¦UB[2] ¦1 = 11 kHz ¦
¦ ¦ ¦2 = 22 kHz ¦
¦ ¦ ¦3 = 44 kHz ¦
+------------------+----------------------+----------------------------------------------------------+
¦ ¦ ¦Size of each audio sample. This parameter only pertains to¦
¦ ¦ ¦uncompressed formats. Compressed formats always decode ¦
¦SoundSize ¦UB[1] ¦to 16 bits internally. ¦
¦ ¦ ¦0 = 8-bit samples ¦
¦ ¦ ¦1 = 16-bit samples ¦
+------------------+----------------------+----------------------------------------------------------+
¦SoundType         ¦UB[1]                 ¦Mono or stereo sound. 0 = Mono sound                      ¦
¦ ¦ ¦1 = Stereo sound ¦
+------------------+----------------------+----------------------------------------------------------+
¦AACPacketType ¦IF SoundFormat == 10 ¦The following values are defined: 0 = AAC sequence header ¦
¦ ¦UI8 ¦1 = AAC raw ¦
+------------------+----------------------+----------------------------------------------------------+
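To make the bit layout above concrete, here is a non-normative sketch of unpacking the legacy AudioTagHeader; the function name and returned shape are illustrative.

// Non-normative sketch: unpacking the legacy AudioTagHeader fields.
function parseLegacyAudioTagHeader(bytes: Uint8Array) {
  const soundFormat = (bytes[0] >> 4) & 0x0f; // UB[4], e.g., 2 = MP3, 10 = AAC
  const soundRate   = (bytes[0] >> 2) & 0x03; // UB[2], 0 = 5.5, 1 = 11, 2 = 22, 3 = 44 kHz
  const soundSize   = (bytes[0] >> 1) & 0x01; // UB[1], 0 = 8-bit, 1 = 16-bit
  const soundType   =  bytes[0]       & 0x01; // UB[1], 0 = mono, 1 = stereo

  if (soundFormat === 10) {
    // AAC is followed by AACPacketType (0 = AAC sequence header, 1 = AAC raw)
    const aacPacketType = bytes[1];
    return { soundFormat, soundRate, soundSize, soundType, aacPacketType };
  }
  return { soundFormat, soundRate, soundSize, soundType };
}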
Below is the VideoTagHeader format for the legacy FLV specification:
Table: FLV specification VideoTagHeader
+------------------+----------------------+-----------------------------------------------------------+
¦ Field ¦ Type ¦ Comment ¦
+------------------+----------------------+-----------------------------------------------------------+
¦ ¦ ¦Type of video frame. The following values are defined: ¦
¦ ¦ ¦1 = key frame (for AVC, a seekable frame) ¦
¦Frame Type ¦UB[4] ¦2 = inter frame (for AVC, a non-seekable frame) ¦
¦ ¦ ¦3 = disposable inter frame (H.263 only) ¦
¦ ¦ ¦4 = generated key frame (reserved for server use only) ¦
¦ ¦ ¦5 = video info/command frame ¦
+------------------+----------------------+-----------------------------------------------------------+
¦ ¦ ¦Codec Identifier. The following values are defined: ¦
¦ ¦ ¦2 = Sorenson H.263 ¦
¦ ¦ ¦3 = Screen video ¦
¦CodecID ¦UB[4] ¦4 = On2 VP6 ¦
¦ ¦ ¦5 = On2 VP6 with alpha channel ¦
¦ ¦ ¦6 = Screen video version 2 ¦
¦ ¦ ¦7 = AVC ¦
+------------------+----------------------+-----------------------------------------------------------+
¦ ¦ ¦The following values are defined: ¦
¦ ¦IF CodecID == 7 ¦0 = AVC sequence header ¦
¦AVCPacketType ¦UI8 ¦1 = AVC NALU ¦
¦ ¦ ¦2 = AVC end of sequence (lower level NALU sequence ender is¦
¦ ¦ ¦not REQUIRED or supported) ¦
+------------------+----------------------+-----------------------------------------------------------+
¦ ¦ ¦IF AVCPacketType == 1 ¦
¦ ¦ ¦ Composition time offset ¦
¦ ¦IF CodecID == 7 ¦ELSE ¦
¦CompositionTime ¦SI24 ¦ 0 ¦
¦ ¦ ¦See ISO/IEC 14496-12, 8.15.3 for an explanation of ¦
¦ ¦ ¦composition times. The offset in an FLV file is always in ¦
¦ ¦ ¦milliseconds. ¦
+------------------+----------------------+-----------------------------------------------------------+
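Similarly, a non-normative sketch of unpacking the legacy VideoTagHeader follows; the names are illustrative.

// Non-normative sketch: unpacking the legacy VideoTagHeader fields.
function parseLegacyVideoTagHeader(bytes: Uint8Array) {
  const frameType = (bytes[0] >> 4) & 0x0f; // UB[4], 1 = key frame, 2 = inter frame, ...
  const codecId   =  bytes[0]       & 0x0f; // UB[4], 7 = AVC

  if (codecId === 7) {
    const avcPacketType = bytes[1]; // UI8, 0 = sequence header, 1 = NALU, 2 = end of sequence
    // CompositionTime is a big-endian SI24; shift left then right to sign-extend
    const compositionTime = (((bytes[2] << 16) | (bytes[3] << 8) | bytes[4]) << 8) >> 8;
    return { frameType, codecId, avcPacketType, compositionTime };
  }
  return { frameType, codecId };
}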
Within the following sections, this document provides a comprehensive overview of the enhancements made to [RTMP] and [FLV]. Together, these improvements constitute the enhanced RTMP also known as E-RTMP. These enhancements are discussed in detail, highlighting their impact and benefits.
[FLV] metadata SHALL be encapsulated within a [SCRIPTDATA] segment, which includes a [ScriptTagBody] encoded in the Action Message Format (AMF). Importantly, this metadata SHALL always remain unencrypted, even when the FLV content itself is encrypted. This design choice is essential for allowing various FLV parsers to successfully stream the FLV content and for enabling media players to provide contextual information to the user.
The ScriptTagBody is structured to encapsulate method invocations. It consists of an item containing a method name (e.g., onMetaData) along with a corresponding set of arguments.
To signal FLV metadata, the item within the ScriptTagBody MUST encapsulate the method name onMetaData, along with a single argument of type ECMA array. This array holds metadata properties, the availability of which may vary depending on the software used to create the FLV. Typical onMetaData argument properties include, but are not limited to:
Table: Typical properties found in the onMetaData argument object
+----------------------+----------------------------+-------------------------------------------------------------------------------+
¦Property ¦Type ¦Comment ¦
+----------------------+----------------------------+-------------------------------------------------------------------------------+
¦audiocodecid ¦number ¦Audio codec ID used in the file: See AudioTagHeader of the legacy [FLV] ¦
¦ ¦ ¦specification for available CodecID values. ¦
¦ ¦ ¦ ¦
¦ ¦ ¦ ¦
¦ ¦ ¦When [FourCC] is used to signal the codec, this property is set to a FOURCC ¦
¦ ¦ ¦value. Note: A FOURCC value is big-endian relative to the underlying ASCII ¦
¦ ¦ ¦character sequence (e.g., "Opus" == 0x4F707573 == 1332770163.0). ¦
+----------------------+----------------------------+-------------------------------------------------------------------------------+
¦audiodatarate ¦number ¦Audio bitrate, in kilobits per second ¦
+----------------------+----------------------------+-------------------------------------------------------------------------------+
¦audiodelay ¦number ¦Delay introduced by the audio codec, in seconds ¦
+----------------------+----------------------------+-------------------------------------------------------------------------------+
¦audiosamplerate ¦number ¦Frequency at which the audio stream is replayed ¦
+----------------------+----------------------------+-------------------------------------------------------------------------------+
¦audiosamplesize ¦number ¦Resolution of a single audio sample ¦
+----------------------+----------------------------+-------------------------------------------------------------------------------+
¦canSeekToEnd ¦boolean ¦Indicating the last video frame is a key frame ¦
+----------------------+----------------------------+-------------------------------------------------------------------------------+
¦creationdate ¦string ¦Creation date and time ¦
+----------------------+----------------------------+-------------------------------------------------------------------------------+
¦duration ¦number ¦Total duration of the file, in seconds ¦
+----------------------+----------------------------+-------------------------------------------------------------------------------+
¦filesize ¦number ¦Total size of the file, in bytes ¦
+----------------------+----------------------------+-------------------------------------------------------------------------------+
¦framerate ¦number ¦Number of frames per second ¦
+----------------------+----------------------------+-------------------------------------------------------------------------------+
¦height ¦number ¦Height of the video, in pixels ¦
+----------------------+----------------------------+-------------------------------------------------------------------------------+
¦stereo ¦boolean ¦Indicates stereo audio ¦
+----------------------+----------------------------+-------------------------------------------------------------------------------+
¦videocodecid ¦number ¦Video codec ID used in the file: See VideoTagHeader of the legacy [FLV] ¦
¦ ¦ ¦specification for available CodecID values. ¦
¦ ¦ ¦ ¦
¦ ¦ ¦When [FourCC] is used to signal the codec, this property is set to a FOURCC ¦
¦ ¦ ¦value. Note: A FOURCC value is big-endian relative to the underlying ASCII ¦
¦ ¦ ¦character sequence (e.g., "av01" == 0x61763031 == 1635135537.0). ¦
+----------------------+----------------------------+-------------------------------------------------------------------------------+
¦videodatarate ¦number ¦Video bitrate, in kilobits per second ¦
+----------------------+----------------------------+-------------------------------------------------------------------------------+
¦width ¦number ¦Width of the video, in pixels ¦
+----------------------+----------------------------+-------------------------------------------------------------------------------+
¦audioTrackIdInfoMap ¦Object ¦The audioTrackIdInfoMap and videoTrackIdInfoMap objects are designed to store ¦
+----------------------+ ¦metadata for audio and video tracks respectively. Each object uses a TrackId as¦
¦videoTrackIdInfoMap ¦ ¦a key to map to properties that detail the unique characteristics of each ¦
¦ ¦ ¦individual track, diverging from the default configurations. ¦
¦ ¦ ¦ ¦
¦ ¦ ¦- Key-Value Structure: ¦
¦ ¦ ¦ * Keys: Each TrackId acts as a unique identifier for a specific audio or ¦
¦ ¦ ¦ video track. ¦
¦ ¦ ¦ * Values: Track Objects containing metadata that specify characteristics ¦
¦ ¦ ¦ which deviate from the default track settings. ¦
¦ ¦ ¦- Properties of Each Track Object: ¦
¦ ¦ ¦ * These properties detail non-standard configurations needed for custom ¦
¦ ¦ ¦ handling of the track, facilitating specific adjustments to enhance track ¦
¦ ¦ ¦ performance and quality for varied conditions. ¦
¦ ¦ ¦ * For videoTrackIdInfoMap: ¦
¦ ¦ ¦ + Properties such as width, height, videodatarate, etc. specify video ¦
¦ ¦ ¦ characteristics that differ from standard settings. ¦
¦ ¦ ¦ * For audioTrackIdInfoMap: ¦
¦ ¦ ¦ + Properties such as audiodatarate, channels, etc., define audio ¦
¦ ¦ ¦ characteristics that differ from standard configurations. ¦
¦ ¦ ¦- Purpose: ¦
¦ ¦ ¦ * The purpose of these maps is to specify unique properties for each track, ¦
¦ ¦ ¦ ensuring tailored configurations that optimize performance and quality for ¦
¦ ¦ ¦ specific media content and delivery scenarios. ¦
¦ ¦ ¦This structure provides a framework for detailed customization and control over¦
¦ ¦ ¦the media tracks, ensuring optimal management and delivery across various types¦
¦ ¦ ¦of content and platforms. ¦
¦ ¦ ¦ ¦
¦ ¦ ¦Examples: ¦
¦ ¦ ¦ ¦
¦ ¦ ¦e.g., 1 ¦
¦ ¦ ¦videoTrackIdInfoMap = { ¦
¦ ¦ ¦ 1: { ¦
¦ ¦ ¦ width: 1024, ¦
¦ ¦ ¦ height: 768, ¦
¦ ¦ ¦ videodatarate: 2000, ¦
¦ ¦ ¦ }, ¦
¦ ¦ ¦ ¦
¦ ¦ ¦ 2 : { ¦
¦ ¦ ¦ width: 3840, ¦
¦ ¦ ¦ height: 2160, ¦
¦ ¦ ¦ videodatarate: 30000, ¦
¦ ¦ ¦ }, ¦
¦ ¦ ¦} ¦
¦ ¦ ¦ ¦
¦ ¦ ¦e.g., 2 ¦
¦ ¦ ¦audioTrackIdInfoMap = { ¦
¦ ¦ ¦ 1: { ¦
¦ ¦ ¦ audiodatarate: 256, ¦
¦ ¦ ¦ channels: 2, ¦
¦ ¦ ¦ samplerate: 44100, ¦
¦ ¦ ¦ }, ¦
¦ ¦ ¦ ¦
¦ ¦ ¦ 2: { ¦
¦ ¦ ¦ audiodatarate: 320, ¦
¦ ¦ ¦ channels: 1, ¦
¦ ¦ ¦ samplerate: 22050, ¦
¦ ¦ ¦ }, ¦
¦ ¦ ¦} ¦
+----------------------+----------------------------+-------------------------------------------------------------------------------+
Note:
- The properties audiocodecid and videocodecid have been enhanced to support FOURCC (Four-byte ASCII code) values. These values are interpreted as UI32 (e.g., “av01”).
- The properties audioTrackIdInfoMap and videoTrackIdInfoMap are new.
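For reference, below is a non-normative sketch of computing a FOURCC value as a UI32; it mirrors the makeFourCc helper referenced by the enumerations later in this document, whose exact implementation is an assumption here.

// Non-normative sketch: a FOURCC is the big-endian UI32 of four ASCII characters.
function makeFourCc(fourCc: string): number {
  if (fourCc.length !== 4) throw new Error("FOURCC must be exactly 4 ASCII characters");
  return (
    ((fourCc.charCodeAt(0) << 24) |
     (fourCc.charCodeAt(1) << 16) |
     (fourCc.charCodeAt(2) << 8) |
      fourCc.charCodeAt(3)) >>> 0
  );
}

makeFourCc("av01"); // 0x61763031 == 1635135537 (e.g., videocodecid)
makeFourCc("Opus"); // 0x4F707573 == 1332770163 (e.g., audiocodecid)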
[RTMP] packetizes multimedia streams using a suitable transport protocol, typically a persistent TCP connection. There are instances when a streaming platform may request the streaming client to reconnect, such as:
To accommodate these needs, a NetConnection.Connect.ReconnectRequest status event has been introduced as part of the NetConnection onStatus command.
NetConnection establishes a bidirectional link between a client and a server, allowing for asynchronous Remote Procedure Calls (RPCs). The following commands (a.k.a., predefined RPCs) can be issued via NetConnection:
The onStatus command has been enhanced to include the capability to request a client to reconnect. Servers can issue an onStatus command to prompt clients to adapt to changes in NetConnection status. The structure of this command, as relayed from the server to the client, is outlined below:
Table: Server to client, NetConnection onStatus command
+----------------------+--------------+-----------------------------------------------------------------------------------------------+
¦ Field Name ¦ Type ¦ Description ¦
+----------------------+--------------+-----------------------------------------------------------------------------------------------+
¦Command Name ¦string ¦Name of the command. Set to onStatus ¦
+----------------------+--------------+-----------------------------------------------------------------------------------------------+
¦Transaction ID ¦number ¦Transaction ID set to 0. (i.e., no response needed) ¦
+----------------------+--------------+-----------------------------------------------------------------------------------------------+
¦Command Object ¦null ¦There is no command object for onStatus command. ¦
+----------------------+--------------+-----------------------------------------------------------------------------------------------+
¦Info Object ¦Object ¦An AMF-encoded object, the properties of which are utilized by the onStatus command. The Info ¦
¦ ¦ ¦Object provides information about the status of the current connection. ¦
+----------------------+--------------+-----------------------------------------------------------------------------------------------+
The following is a description of AMF-encoded name-value pairs in the Info Object for the onStatus command when handling reconnect. It MAY contain other properties as appropriate to the client.
Table: Info Object parameter for onStatus command when handling reconnect
+----------------+-----------+-------------------------------------------------------------+------------------------------------------------+
¦ Property ¦ Type ¦ Description ¦ Example Value ¦
+----------------+-----------+-------------------------------------------------------------+------------------------------------------------+
¦tcUrl ¦string ¦Absolute or relative URI reference of the server to which to ¦1. rtmp://foo.mydomain.com:1935/realtimeapp ¦
¦(optional) ¦ ¦reconnect. If not specified, use the tcUrl for the current ¦2. rtmp://127.0.0.1/realtimeapp ¦
¦ ¦ ¦connection. A relative URI reference should be resolved ¦3. //192.0.2.0/realtimeapp ¦
¦ ¦ ¦relative to the tcUrl for the current connection. ¦4. /realtimeapp ¦
+----------------+-----------+-------------------------------------------------------------+------------------------------------------------+
¦code ¦string ¦A string identifying the event that occurred. To reconnect ¦NetConnection.Connect.ReconnectRequest ¦
¦ ¦ ¦code MUST be set to NetConnection.Connect.ReconnectRequest ¦ ¦
+----------------+-----------+-------------------------------------------------------------+------------------------------------------------+
¦description ¦string ¦A string containing human-readable information about the ¦The streaming server is undergoing updates. ¦
¦(optional) ¦ ¦message. Not every information object includes this property.¦ ¦
+----------------+-----------+-------------------------------------------------------------+------------------------------------------------+
¦level ¦string ¦A string indicating the severity of the event. To reconnect ¦status ¦
¦ ¦ ¦the level MUST be set to status. ¦ ¦
+----------------+-----------+-------------------------------------------------------------+------------------------------------------------+
The capability to support the NetConnection.Connect.ReconnectRequest event becomes evident during the initial connect phase. Detailed guidelines for signaling reconnect ability can be found in the Enhancing NetConnection connect Command section.
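As a non-normative illustration, the sketch below shows how a client might act on the Info Object described above. Here, reconnect() is a hypothetical application-specific function, and the WHATWG URL API is used only for illustration (rtmp: URLs may require custom resolution in some runtimes).

// Non-normative sketch: handling NetConnection.Connect.ReconnectRequest.
interface ReconnectInfoObject {
  code: string;         // e.g., "NetConnection.Connect.ReconnectRequest"
  level: string;        // "status" for a reconnect request
  tcUrl?: string;       // optional absolute or relative URI reference
  description?: string;
}

function handleReconnectRequest(info: ReconnectInfoObject, currentTcUrl: string): void {
  if (info.code !== "NetConnection.Connect.ReconnectRequest") return;

  // An absolute tcUrl is used as-is; a relative URI reference is resolved
  // against the tcUrl of the current connection.
  const target = info.tcUrl ? new URL(info.tcUrl, currentTcUrl).toString() : currentTcUrl;

  // Tear down the current connection gracefully, then connect to `target`,
  // e.g., reconnect(target) in an application-specific client.
}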
The server-to-client onStatus command for NetConnection serves a crucial function within the RTMP framework. Though the legacy RTMP specification may not have detailed this command, the goal here is to offer an overview for a better understanding.
Both clients and servers can initiate RPCs at the receiving end, with some RPCs being predefined as commands. onStatus stands out as one such essential command.
When using the onStatus command, the goal is to inform the client about the status of the connection. Each dispatched command message comprises the following elements:
Both the Command Object and the Info Object offer additional context and details for the command. The onStatus command is triggered whenever there’s a status change or an error concerning the NetConnection. To handle this information, you should define a callback function.
// Sample pseudocode for the onStatus callback function
nc.onStatus = function(infoObject) {
  // Handle the status change or error here.
}
infoObject is an AMF-encoded object with properties that provide information about the status of a NetConnection. It contains at least the following three properties, but MAY contain other properties as appropriate to the client.
Table: infoObject for onStatus command
+----------------+-----------+-------------------------------------------------------------+------------------------------------------------+
¦ Property ¦ Type ¦ Description ¦ Example Value ¦
+----------------+-----------+-------------------------------------------------------------+------------------------------------------------+
¦code ¦string ¦A string identifying the event that occurred. ¦NetConnection.Connect.Success ¦
+----------------+-----------+-------------------------------------------------------------+------------------------------------------------+
¦description ¦string ¦A string containing human-readable information about the ¦The connection attempt succeeded. ¦
¦(optional) ¦ ¦message. Not every information object includes this property.¦ ¦
+----------------+-----------+-------------------------------------------------------------+------------------------------------------------+
¦level ¦string ¦There are three established values for level: status, ¦status ¦
¦ ¦ ¦warning, and error. ¦ ¦
+----------------+-----------+-------------------------------------------------------------+------------------------------------------------+
The table below provides examples of code, level, and description property values. Please note that this is not an exhaustive list, and not all entries may apply to every type of client. Additionally, the description property values included are merely illustrative examples; developers are responsible for conveying the appropriate meaning in their specific solutions.
Table: code, level and description values for infoObject used by onStatus
+-----------------------------------------+-------------+-----------------------------------------------------------------------------------------------+
¦ Code ¦ Level ¦ Description ¦
+-----------------------------------------+-------------+-----------------------------------------------------------------------------------------------+
¦NetConnection.Call.Failed ¦error ¦The NetConnection.call() method was not able to invoke the server-side method or command. ¦
+-----------------------------------------+-------------+-----------------------------------------------------------------------------------------------+
¦NetConnection.Connect.AppShutdown ¦error ¦The application has been shut down (for example, if the application is out of memory resources ¦
¦ ¦ ¦and must shut down to prevent the server from crashing) or the server has shut down. ¦
+-----------------------------------------+-------------+-----------------------------------------------------------------------------------------------+
¦NetConnection.Connect.Closed ¦status ¦The connection was closed successfully. ¦
+-----------------------------------------+-------------+-----------------------------------------------------------------------------------------------+
¦NetConnection.Connect.Failed ¦error ¦The connection attempt failed. ¦
+-----------------------------------------+-------------+-----------------------------------------------------------------------------------------------+
¦NetConnection.Connect.Rejected ¦error ¦The client does not have permission to connect to the application. ¦
+-----------------------------------------+-------------+-----------------------------------------------------------------------------------------------+
¦NetConnection.Connect.Success ¦status ¦The connection attempt succeeded. ¦
+-----------------------------------------+-------------+-----------------------------------------------------------------------------------------------+
¦NetConnection.Connect.ReconnectRequest ¦status ¦The server is requesting the client to reconnect. ¦
+-----------------------------------------+-------------+-----------------------------------------------------------------------------------------------+
¦NetConnection.Proxy.NotResponding ¦error ¦The proxy server is not responding. See the ProxyStream class. ¦
+-----------------------------------------+-------------+-----------------------------------------------------------------------------------------------+
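The following non-normative sketch shows one way a client might dispatch on these code values; the handling in each branch is application specific.

// Non-normative sketch: dispatching on infoObject.code values.
function handleStatus(infoObject: { code: string; level: string; description?: string }): void {
  switch (infoObject.code) {
    case "NetConnection.Connect.Success":
      // Connection established; begin publishing or playback.
      break;
    case "NetConnection.Connect.ReconnectRequest":
      // level is "status": the server is asking the client to reconnect.
      break;
    case "NetConnection.Connect.Closed":
      // The connection closed; release any associated resources.
      break;
    case "NetConnection.Call.Failed":
    case "NetConnection.Connect.AppShutdown":
    case "NetConnection.Connect.Failed":
    case "NetConnection.Connect.Rejected":
    case "NetConnection.Proxy.NotResponding":
      // level is "error": surface infoObject.description to logs or the user.
      break;
  }
}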
The AudioTagHeader has been extended to define additional audio codecs, multichannel audio, multitrack capabilities, signaling support, and additional miscellaneous enhancements, while ensuring backward compatibility. This extension is termed the ExAudioTagHeader and is designed to be future-proof, allowing for the definition of additional audio codecs, features, and corresponding signaling.
During the parsing process, the logic MUST handle unexpected or unknown elements gracefully. Specifically, if any critical signaling or flags (e.g., AudioPacketType and AudioFourCc) are not recognized, the system MUST fail in a controlled and predictable manner.
Important: A single audio message for a unique timestamp may include a batch of AudioPacketType values (e.g., multiple TrackId values). When parsing an audio message, the bitstream MUST be processed completely to ensure all payload data has been handled.
Table: Extended AudioTagHeader
+------------------------------------------------------------------------------------+------------------------------------------------------------------------------------+
¦ Description Of Bitstream ¦ Enumerated Types ¦
+------------------------------------------------------------------------------------+------------------------------------------------------------------------------------+
¦soundFormat = UB[4] as SoundFormat ¦enum SoundFormat { ¦
¦ ¦ LPcmPlatformEndian = 0, ¦
¦if (soundFormat != SoundFormat.ExHeader) { ¦ AdPcm = 1, ¦
¦  // See AudioTagHeader of the legacy [FLV] specification for the detailed format   ¦  Mp3                = 2,                                                           ¦
¦ // of the four bits used for soundRate/soundSize/soundType ¦ LPcmLittleEndian = 3, ¦
¦ // ¦ Nellymoser16KMono = 4, ¦
¦ // NOTE: soundRate, soundSize and soundType formats have not changed. ¦ Nellymoser8KMono = 5, ¦
¦ // if (soundFormat == SoundFormat.ExHeader) we switch into FOURCC audio mode ¦ Nellymoser = 6, ¦
¦ // as defined below. This means that soundRate, soundSize and soundType ¦ G711ALaw = 7, ¦
¦ // bits are not interpreted, instead the UB[4] bits are interpreted as an ¦ G711MuLaw = 8, ¦
¦ // AudioPacketType ¦ ExHeader = 9, // new, used to signal FOURCC mode ¦
¦ soundRate = UB[2] ¦ Aac = 10, ¦
¦ soundSize = UB[1] ¦ Speex = 11, ¦
¦ soundType = UB[1] ¦ // 12 - reserved ¦
¦} ¦ // 13 - reserved ¦
¦ ¦ Mp3_8K = 14, ¦
¦ ¦ Native = 15, // Device specific sound ¦
¦ ¦} ¦
+------------------------------------------------------------------------------------+------------------------------------------------------------------------------------+
¦ ExAudioTagHeader Section ¦
¦ Note: ExAudioTagHeader is present if (soundFormat == SoundFormat.ExHeader) ¦
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
¦ Description Of Bitstream ¦ Enumerated Types ¦
+------------------------------------------------------------------------------------+------------------------------------------------------------------------------------+
¦/// ¦enum AudioPacketType { ¦
¦// process ExAudioTagHeader ¦ SequenceStart = 0, ¦
¦// ¦ CodedFrames = 1, ¦
¦processAudioBody = false ¦ ¦
¦if (soundFormat == SoundFormat.ExHeader) { ¦ // RTMP includes a previously undocumented "audio silence" message. ¦
¦ processAudioBody = true ¦ // This silence message is identified when an audio message contains ¦
¦ ¦ // a zero-length payload, or more precisely, an empty audio message ¦
¦ // Interpret UB[4] bits as AudioPacketType instead of sound rate, size, and type. ¦ // without an AudioTagHeader, indicating a period of silence. The ¦
¦ audioPacketType = UB[4] as AudioPacketType // at byte boundary after this read ¦ // action to take after receiving a silence message is system ¦
¦ ¦ // dependent. The semantics of the silence message in the Flash ¦
¦ // Process each ModEx data packet ¦ // Media playback and timing model are as follows: ¦
¦ while (audioPacketType == AudioPacketType.ModEx) { ¦ // ¦
¦ // Determine the size of the packet ModEx data (ranging from 1 to 256 bytes) ¦ // - Ensure all buffered audio data is played out before entering the ¦
¦ modExDataSize = UI8 + 1 ¦ // silence period: ¦
¦ ¦ // Make sure that any audio data currently in the buffer is fully ¦
¦ // If maximum 8-bit size is not sufficient, use a 16-bit value ¦ // processed and played. This ensures a clean transition into the ¦
¦ if (modExDataSize == 256) { ¦ // silence period without cutting off any audio. ¦
¦ modExDataSize = UI16 + 1; ¦ // ¦
¦ } ¦ // - After playing all buffered audio data, flush the audio decoder: ¦
¦ ¦ // Clear the audio decoder to reset its state and prepare it for new ¦
¦ // Fetch the packet ModEx data based on its determined size ¦ // input after the silence period. ¦
¦ modExData = UI8[modExDataSize] ¦ // ¦
¦ ¦ // - During the silence period, the audio clock can't be used as the ¦
¦ // fetch the AudioPacketModExType ¦ // master clock for synchronizing playback: ¦
¦ audioPacketModExType = UB[4] as AudioPacketModExType ¦ // Switch to using the system's wall-clock time to maintain the correct ¦
¦ ¦ // timing for video and other data streams. ¦
¦ // Update audioPacketType ¦ // ¦
¦ audioPacketType = UB[4] as AudioPacketType // at byte boundary after this read ¦ // - Don't wait for audio frames for synchronized A+V playback: ¦
¦ ¦ // Normally, audio frames drive the synchronization of audio and video ¦
¦ if (audioPacketModExType == AudioPacketModExType.TimestampOffsetNano) { ¦ // (A/V) playback. During the silence period, playback should not stall ¦
¦ // This block processes TimestampOffsetNano to enhance RTMP timescale ¦ // waiting for audio frames. Video and other data streams should ¦
¦ // accuracy and compatibility with formats like MP4, M2TS, and Safari's ¦ // continue to play based on the wall-clock time, ensuring smooth ¦
¦ // Media Source Extensions. It ensures precise synchronization without ¦ // playback without audio. ¦
¦ // altering core RTMP timestamps, applying only to the current media ¦ // ¦
¦ // message. These adjustments enhance synchronization and timing ¦ // AudioPacketType.SequenceEnd is to have no less than the same meaning as ¦
¦ // accuracy in media messages while preserving the core RTMP timestamp ¦ // a silence message. While it may seem redundant, we need to introduce ¦
¦ // integrity. ¦ // this enum to ensure we can signal the end of the audio sequence for any ¦
¦ // ¦ // audio track. ¦
¦ // NOTE: ¦ SequenceEnd = 2, ¦
¦ // - 1 millisecond (ms) = 1,000,000 nanoseconds (ns). ¦ ¦
¦ // - Maximum value representable with 20 bits is 1,048,575 ns ¦ // 3 - Reserved ¦
¦ // (just over 1 ms), allowing precise sub-millisecond adjustments. ¦ ¦
¦ // - modExData must be at least 3 bytes, storing values up to 999,999 ns. ¦ MultichannelConfig = 4, ¦
¦ audioTimestampNanoOffset = bytesToUI24(modExData) ¦ ¦
¦ ¦ // Turns on audio multitrack mode ¦
¦ // TODO: Integrate this nanosecond offset into timestamp management ¦ Multitrack = 5, ¦
¦ // to accurately adjust the presentation time. ¦ ¦
¦ } ¦ // 6 - reserved ¦
¦ } ¦ ¦
¦ ¦ // ModEx is a special signal within the AudioPacketType enum that ¦
¦ if (audioPacketType == AudioPacketType.Multitrack) { ¦ // serves to both modify and extend the behavior of the current packet. ¦
¦ isAudioMultitrack = true; ¦ // When this signal is encountered, it indicates the presence of ¦
¦ audioMultitrackType = UB[4] as AvMultitrackType ¦ // additional modifiers or extensions, requiring further processing to ¦
¦ ¦ // adjust or augment the packet's functionality. ModEx can be used to ¦
¦ // Fetch AudioPacketType for all audio tracks in the audio message. ¦ // introduce new capabilities or modify existing ones, such as ¦
¦    // This fetch MUST NOT result in an AudioPacketType.Multitrack                  ¦  // enabling support for high-precision timestamps or other advanced               ¦
¦ audioPacketType = UB[4] as AudioPacketType ¦ // features that enhance the base packet structure. ¦
¦ ¦ ModEx = 7, ¦
¦ if (audioMultitrackType != AvMultitrackType.ManyTracksManyCodecs) { ¦ ¦
¦ // The tracks are encoded with the same codec. Fetch the FOURCC for them ¦ // ... ¦
¦ audioFourCc = FOURCC as AudioFourCc ¦ // 14 - reserved ¦
¦ } ¦ // 15 - reserved ¦
¦ } else { ¦} ¦
¦ audioFourCc = FOURCC as AudioFourCc ¦ ¦
¦ } ¦enum AudioPacketModExType { ¦
¦} ¦ TimestampOffsetNano = 0, ¦
¦ ¦ ¦
¦ ¦ // ... ¦
¦ ¦ // 14 - reserved ¦
¦ ¦ // 15 - reserved ¦
¦ ¦} ¦
¦ ¦ ¦
¦ ¦enum AudioFourCc { ¦
¦ ¦ // ¦
¦ ¦ // Valid FOURCC values for signaling support of audio codecs ¦
¦ ¦ // in the enhanced FourCC pipeline. In this context, support ¦
¦ ¦ // for a FourCC codec MUST be signaled via the enhanced ¦
¦ ¦ // "connect" command. ¦
¦ ¦ // ¦
¦ ¦ ¦
¦ ¦ // AC-3/E-AC-3 - <https://en.wikipedia.org/wiki/Dolby_Digital> ¦
¦ ¦ Ac3 = makeFourCc("ac-3"), ¦
¦ ¦ Eac3 = makeFourCc("ec-3"), ¦
¦ ¦ ¦
¦ ¦ // Opus audio - <https://opus-codec.org/> ¦
¦ ¦ Opus = makeFourCc("Opus"), ¦
¦ ¦ ¦
¦ ¦ // Mp3 audio - <https://en.wikipedia.org/wiki/MP3> ¦
¦ ¦ Mp3 = makeFourCc(".mp3"), ¦
¦ ¦ ¦
¦ ¦ // Free Lossless Audio Codec - <https://xiph.org/flac/format.html> ¦
¦ ¦ Flac = makeFourCc("fLaC"), ¦
¦ ¦ ¦
¦ ¦ // Advanced Audio Coding - <https://en.wikipedia.org/wiki/Advanced_Audio_Coding> ¦
¦ ¦ // The following AAC profiles, denoted by their object types, are supported ¦
¦ ¦ // 1 = main profile ¦
¦ ¦ // 2 = low complexity, a.k.a., LC ¦
¦ ¦ // 5 = high efficiency / scale band replication, a.k.a., HE / SBR ¦
¦ ¦ Aac = makeFourCc("mp4a"), ¦
¦ ¦} ¦
¦ ¦ ¦
¦ ¦enum AvMultitrackType { ¦
¦ ¦ // ¦
¦ ¦ // Used by audio and video pipeline ¦
¦ ¦ // ¦
¦ ¦ ¦
¦ ¦ OneTrack = 0, ¦
¦ ¦ ManyTracks = 1, ¦
¦ ¦ ManyTracksManyCodecs = 2, ¦
¦ ¦ ¦
¦ ¦ // 3 - Reserved ¦
¦ ¦ // ... ¦
¦ ¦ // 15 - Reserved ¦
¦ ¦} ¦
+------------------------------------------------------------------------------------+------------------------------------------------------------------------------------+
¦ ExAudioTagBody Section ¦
¦ Note: This ExAudioTagBody format is signaled by the presence of ExAudioTagHeader ¦
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
¦ Description Of Bitstream ¦ Enumerated Types ¦
+------------------------------------------------------------------------------------+------------------------------------------------------------------------------------+
¦// ¦enum AudioChannelOrder { ¦
¦// process ExAudioTagBody ¦ // ¦
¦// ¦ // Only the channel count is specified, without any further information ¦
¦while (processAudioBody) { ¦ // about the channel order ¦
¦ if (isAudioMultitrack) { ¦ // ¦
¦ if (audioMultitrackType == AvMultitrackType.ManyTracksManyCodecs) { ¦ Unspecified = 0, ¦
¦ // Each track has a codec assigned to it. Fetch the FOURCC for the next track.¦ ¦
¦ audioFourCc = FOURCC as AudioFourCc ¦ // ¦
¦    }                                                                               ¦  // The native channel order (i.e., the channels are in the same order as          ¦
¦                                                                                    ¦  // defined in the AudioChannel enum).                                              ¦
¦ // Track Ordering: ¦ // ¦
¦ // ¦ Native = 1, ¦
¦ // For identifying the highest priority (a.k.a., default track) ¦ ¦
¦ // or highest quality track, it is RECOMMENDED to use trackId ¦ // ¦
¦ // set to zero. For tracks of lesser priority or quality, use ¦ // The channel order does not correspond to any predefined ¦
¦ // multiple instances of trackId with ascending numerical values. ¦ // order and is stored as an explicit map. ¦
¦ // The concept of priority or quality can have multiple ¦ // ¦
¦ // interpretations, including but not limited to bitrate, ¦ Custom = 2 ¦
¦ // resolution, default angle, and language. This recommendation ¦ ¦
¦ // serves as a guideline intended to standardize track numbering ¦ // 3 - Reserved ¦
¦ // across various applications. ¦ // ... ¦
¦ audioTrackId = UI8 ¦ // 15 - reserved ¦
¦ ¦} ¦
¦ if (audioMultitrackType != AvMultitrackType.OneTrack) { ¦ ¦
¦ // The `sizeOfAudioTrack` specifies the size in bytes of the ¦enum AudioChannelMask { ¦
¦ // current track that is being processed. This size starts ¦ // ¦
¦ // counting immediately after the position where the `sizeOfAudioTrack` ¦ // Mask used to indicate which channels are present in the stream. ¦
¦ // value is located. You can use this value as an offset to locate the ¦ // ¦
¦ // next audio track in a multitrack system. The data pointer is ¦ ¦
¦ // positioned immediately after this field. Depending on the MultiTrack ¦ // masks for commonly used speaker configurations ¦
¦      // type, the offset points to either a `fourCc` or a `trackId`.               ¦  // <https://en.wikipedia.org/wiki/Surround_sound#Standard_speaker_channels>       ¦
¦ sizeOfAudioTrack = UI24 ¦ FrontLeft = 0x000001, ¦
¦ } ¦ FrontRight = 0x000002, ¦
¦ } ¦ FrontCenter = 0x000004, ¦
¦ ¦ LowFrequency1 = 0x000008, ¦
¦ if (audioPacketType == AudioPacketType.MultichannelConfig) { ¦ BackLeft = 0x000010, ¦
¦ // ¦ BackRight = 0x000020, ¦
¦ // Specify a speaker for a channel as it appears in the bitstream. ¦ FrontLeftCenter = 0x000040, ¦
¦ // This is needed if the codec is not self-describing for channel mapping ¦ FrontRightCenter = 0x000080, ¦
¦ // ¦ BackCenter = 0x000100, ¦
¦ ¦ SideLeft = 0x000200, ¦
¦ // set audio channel order ¦ SideRight = 0x000400, ¦
¦ audioChannelOrder = UI8 as AudioChannelOrder ¦ TopCenter = 0x000800, ¦
¦ ¦ TopFrontLeft = 0x001000, ¦
¦ // number of channels ¦ TopFrontCenter = 0x002000, ¦
¦ channelCount = UI8 ¦ TopFrontRight = 0x004000, ¦
¦ ¦ TopBackLeft = 0x008000, ¦
¦ if (audioChannelOrder == AudioChannelOrder.Custom) { ¦ TopBackCenter = 0x010000, ¦
¦ // Each entry specifies the speaker layout (see AudioChannel enum above ¦ TopBackRight = 0x020000, ¦
¦ // for layout definition) in the order that it appears in the bitstream. ¦ ¦
¦ // First entry (i.e., index 0) specifies the speaker layout for channel 1. ¦ // Completes 22.2 multichannel audio, as ¦
¦ // Subsequent entries specify the speaker layout for the next channels ¦ // standardized in SMPTE ST2036-2-2008 ¦
¦ // (e.g., second entry for channel 2, third entry for channel 3, etc.). ¦ // see - <https://en.wikipedia.org/wiki/22.2_surround_sound> ¦
¦ audioChannelMapping = UI8[channelCount] as AudioChannel ¦ LowFrequency2 = 0x040000, ¦
¦ } ¦ TopSideLeft = 0x080000, ¦
¦ ¦ TopSideRight = 0x100000, ¦
¦ if (audioChannelOrder == AudioChannelOrder.Native) { ¦ BottomFrontCenter = 0x200000, ¦
¦ // audioChannelFlags indicates which channels are present in the ¦ BottomFrontLeft = 0x400000, ¦
¦ // multi-channel stream. You can perform a Bitwise AND ¦ BottomFrontRight = 0x800000, ¦
¦ // (i.e., audioChannelFlags & AudioChannelMask.xxx) to see if a ¦} ¦
¦ // specific audio channel is present ¦ ¦
¦ audioChannelFlags = UI32 ¦enum AudioChannel { ¦
¦ } ¦ // ¦
¦ } ¦ // Channel mappings enums ¦
¦ ¦ // ¦
¦ if (audioPacketType == AudioPacketType.SequenceEnd) { ¦ ¦
¦ // signals end of sequence ¦ // commonly used speaker configurations ¦
¦ } ¦ // see - <https://en.wikipedia.org/wiki/Surround_sound#Standard_speaker_channels> ¦
¦ ¦ FrontLeft = 0, // i.e., FrontLeft is assigned to channel zero ¦
¦ if (audioPacketType == AudioPacketType.SequenceStart) { ¦ FrontRight, ¦
¦ if (audioFourCc == AudioFourCc.Aac) { ¦ FrontCenter, ¦
¦ // The AAC audio specific config (a.k.a., AacSequenceHeader) is ¦ LowFrequency1, ¦
¦ // defined in ISO/IEC 14496-3. ¦ BackLeft, ¦
¦ aacHeader = [AacSequenceHeader] ¦ BackRight, ¦
¦ } ¦ FrontLeftCenter, ¦
¦ ¦ FrontRightCenter, ¦
¦ if (audioFourCc == AudioFourCc.Flac) { ¦ BackCenter = 8, ¦
¦ // FlacSequenceHeader layout is: ¦ SideLeft, ¦
¦ // ¦ SideRight, ¦
¦ // The bytes 0x66 0x4C 0x61 0x43 ("fLaC" in ASCII) signature ¦ TopCenter, ¦
¦ // ¦ TopFrontLeft, ¦
¦ // Followed by a metadata block (called the STREAMINFO block) as described ¦ TopFrontCenter, ¦
¦ // in section 7 of the FLAC specification. The STREAMINFO block contains ¦ TopFrontRight, ¦
¦ // information about the whole sequence, such as sample rate, number of ¦ TopBackLeft, ¦
¦ // channels, total number of samples, etc. It MUST be present as the first ¦ TopBackCenter = 16, ¦
¦ // metadata block in the sequence. The FLAC audio specific bitstream format ¦ TopBackRight, ¦
¦ // is defined at <https://xiph.org/flac/format.html> ¦ ¦
¦ flacHeader = [FlacSequenceHeader] ¦ // mappings to complete 22.2 multichannel audio, as ¦
¦ } ¦ // standardized in SMPTE ST2036-2-2008 ¦
¦ ¦ // see - <https://en.wikipedia.org/wiki/22.2_surround_sound> ¦
¦ if (audioFourCc == AudioFourCc.Opus) { ¦ LowFrequency2 = 18, ¦
¦ // Opus Sequence header (a.k.a., ID header): ¦ TopSideLeft, ¦
¦ // - The Opus sequence start is also known as the ID header. ¦ TopSideRight, ¦
¦ // - It contains essential information needed to initialize ¦ BottomFrontCenter, ¦
¦ // the decoder and understand the stream format. ¦ BottomFrontLeft, ¦
¦ // - For detailed structure, refer to RFC 7845, Section 5.1: ¦ BottomFrontRight = 23, ¦
¦ // <https://datatracker.ietf.org/doc/html/rfc7845#section-5.1> ¦ ¦
¦ // ¦ // 24 - Reserved ¦
¦ // If the Opus sequence start payload is empty, use the ¦ // ... ¦
¦ // AudioPacketType.MultichannelConfig signal for channel ¦ // 0xfd - reserved ¦
¦ // mapping when present; otherwise, default to mono/stereo mode. ¦ ¦
¦ opusHeader = [OpusSequenceHeader] ¦ // Channel is empty and can be safely skipped. ¦
¦ } ¦ Unused = 0xfe, ¦
¦ } ¦ ¦
¦ ¦ // Channel contains data, but its speaker configuration is unknown. ¦
¦ if (audioPacketType == AudioPacketType.CodedFrames) { ¦ Unknown = 0xff, ¦
¦ if (audioFourCc == AudioFourCc.Ac3 || audioFourCc == AudioFourCc.Eac3) { ¦} ¦
¦ // Body contains audio data as defined by the bitstream syntax ¦ ¦
¦ // in the ATSC standard for Digital Audio Compression (AC-3, E-AC-3) ¦ ¦
¦ ac3Data = [Ac3CodedData] ¦ ¦
¦ } ¦ ¦
¦ ¦ ¦
¦ if (audioFourCc == AudioFourCc.Opus) { ¦ ¦
¦ // Body contains Opus packets. The layout is one Opus ¦ ¦
¦ // packet for each of N different streams, where N is ¦ ¦
¦ // typically one for mono or stereo, but MAY be greater ¦ ¦
¦ // than one for multichannel audio. The value N is ¦ ¦
¦ // specified in the ID header (Opus sequence start) or ¦ ¦
¦ // via the AudioPacketType.MultichannelConfig signal, and ¦ ¦
¦ // is fixed over the entire length of the Opus sequence. ¦ ¦
¦ // The first (N - 1) Opus packets, if any, are packed one ¦ ¦
¦ // after another using the self-delimiting framing from ¦ ¦
¦ // Appendix B of [RFC6716]. The remaining Opus packet is ¦ ¦
¦ // packed at the end of the Ogg packet using the regular, ¦ ¦
¦ // undelimited framing from Section 3 of [RFC6716]. All ¦ ¦
¦ // of the Opus packets in a single audio packet MUST be ¦ ¦
¦ // constrained to have the same duration. ¦ ¦
¦ opusData = [OpusCodedData] ¦ ¦
¦ } ¦ ¦
¦ ¦ ¦
¦ if (audioFourCc == AudioFourCc.Mp3) { ¦ ¦
¦ // An Mp3 audio stream is built up from a succession of smaller ¦ ¦
¦ // parts called frames. Each frame is a data block with its own header ¦ ¦
¦ // and audio information ¦ ¦
¦ mp3Data = [Mp3CodedData] ¦ ¦
¦ } ¦ ¦
¦ ¦ ¦
¦ if (audioFourCc == AudioFourCc.Aac) { ¦ ¦
¦ // The AAC audio specific bitstream format is defined in ISO/IEC 14496-3. ¦ ¦
¦ aacData = [AacCodedData] ¦ ¦
¦ } ¦ ¦
¦ ¦ ¦
¦ if (audioFourCc == AudioFourCc.Flac) { ¦ ¦
¦ // The audio data is composed of one or more audio frames. Each frame ¦ ¦
¦ // consists of a frame header, which contains a sync code and information ¦ ¦
¦ // about the frame, such as the block size, sample rate, number of ¦ ¦
¦ // channels, et cetera. The Flac audio specific bitstream format ¦ ¦
¦ // is defined at <https://xiph.org/flac/format.html> ¦ ¦
¦ flacData = [FlacCodedData] ¦ ¦
¦ } ¦ ¦
¦ } ¦ ¦
¦ ¦ ¦
¦ if ( ¦ ¦
¦ isAudioMultitrack && ¦ ¦
¦ audioMultitrackType != AvMultitrackType.OneTrack && ¦ ¦
¦ positionDataPtrToNextAudioTrack(sizeOfAudioTrack) ¦ ¦
¦ ) { ¦ ¦
¦    // TODO: need to implement positionDataPtrToNextAudioTrack()                    ¦                                                                                    ¦
¦ continue ¦ ¦
¦ } ¦ ¦
¦ ¦ ¦
¦ // done processing audio message ¦ ¦
¦ break ¦ ¦
¦} ¦ ¦
+------------------------------------------------------------------------------------+------------------------------------------------------------------------------------+
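To clarify the TimestampOffsetNano ModEx signal described above, here is a non-normative sketch of applying the nanosecond offset; the function name is illustrative, and the UI24 read mirrors the bytesToUI24 helper in the bitstream description.

// Non-normative sketch: applying AudioPacketModExType.TimestampOffsetNano.
function presentationTimeNs(rtmpTimestampMs: number, modExData: Uint8Array): number {
  // Big-endian UI24 read of the (at least 3-byte) ModEx payload,
  // which stores values up to 999,999 ns.
  const nanoOffset = (modExData[0] << 16) | (modExData[1] << 8) | modExData[2];
  // 1 ms == 1,000,000 ns. The offset refines the presentation time of the
  // current media message only; the core RTMP timestamp is left untouched.
  return rtmpTimestampMs * 1_000_000 + nanoOffset;
}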
The VideoTagHeader has been extended to define additional video codecs, multitrack capabilities, signaling support, and additional miscellaneous enhancements, while ensuring backward compatibility. This extension is termed the ExVideoTagHeader and is designed to be future-proof, allowing for the definition of additional video codecs, features, and corresponding signaling.
During the parsing process, the logic MUST handle unexpected or unknown elements gracefully. Specifically, if any critical signaling or flags (e.g., VideoFrameType, VideoPacketType, or VideoFourCc) are not recognized, the system MUST fail in a controlled and predictable manner.
Important: A single video message for a unique timestamp may include a batch of VideoPacketType values (e.g., multiple TrackId values, Metadata values). When parsing a video message, the bitstream MUST be processed completely to ensure all payload data has been handled.
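As a hedged illustration of the controlled-failure requirement above, a demuxer might validate each enumerated value as it is read and surface a typed error that lets the caller discard the message while keeping the session alive. The names below are assumptions made for this sketch; the value range mirrors the VideoPacketType enumeration in the table that follows.

// Sketch: unknown signal values trigger a controlled, predictable abort.
class UnsupportedSignalError extends Error {}

// Mirrors enum VideoPacketType from the ExVideoTagHeader table below.
function asVideoPacketType(raw: number): number {
  if (raw < 0 || raw > 7) {
    // Controlled failure: the caller can catch this, skip the message,
    // and continue processing the stream.
    throw new UnsupportedSignalError(`unknown VideoPacketType ${raw}`);
  }
  return raw;
}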
Table: Extended VideoTagHeader
+------------------------------------------------------------------------------------+------------------------------------------------------------------------------------+
¦ Description Of Bitstream ¦ Enumerated Types ¦
+------------------------------------------------------------------------------------+------------------------------------------------------------------------------------+
¦// Check if isExVideoHeader flag is set to 1, signaling enhanced RTMP ¦enum VideoFrameType { ¦
¦// video mode. In this case, VideoCodecId's 4-bit unsigned binary (UB[4]) ¦ // 0 - reserved ¦
¦// should not be interpreted as a codec identifier. Instead, these ¦ KeyFrame = 1, // a seekable frame ¦
¦// UB[4] bits should be interpreted as VideoPacketType.                             ¦  InterFrame = 2, // a non-seekable frame                                           ¦
¦isExVideoHeader = UB[1] ¦ DisposableInterFrame = 3, // H.263 only ¦
¦videoFrameType = UB[3] as VideoFrameType ¦ GeneratedKeyFrame = 4, // reserved for server use only ¦
¦ ¦ ¦
¦if (isExVideoHeader == 0) { ¦ // If videoFrameType is not ignored and is set to VideoFrameType.Command, ¦
¦ // Utilize the VideoCodecId values and the bitstream description ¦ // the payload will not contain video data. Instead, (Ex)VideoTagHeader ¦
¦ // as defined in the legacy [FLV] specification. Refer to this ¦ // will be followed by a UI8, representing the following meanings: ¦
¦ // version for the proper implementation details. ¦ // ¦
¦ videoCodecId = UB[4] as VideoCodecId ¦ // 0 = Start of client-side seeking video frame sequence ¦
¦ ¦ // 1 = End of client-side seeking video frame sequence ¦
¦ if (videoFrameType == VideoFrameType.Command) { ¦ // ¦
¦    videoCommand = UI8 as VideoCommand                                              ¦  // frameType is ignored if videoPacketType is VideoPacketType.Metadata            ¦
¦ } ¦ Command = 5, // video info / command frame ¦
¦} ¦ ¦
¦ ¦ // 6 = reserved ¦
¦ ¦ // 7 = reserved ¦
¦ ¦} ¦
¦ ¦ ¦
¦ ¦enum VideoCommand { ¦
¦ ¦ StartSeek = 0, ¦
¦ ¦ EndSeek = 1, ¦
¦ ¦ ¦
¦                                                                                    ¦  // 0x02 = reserved                                                                ¦
¦ ¦ // ... ¦
¦ ¦ // 0xff = reserved ¦
¦ ¦} ¦
¦ ¦ ¦
¦ ¦enum VideoCodecId { ¦
¦ ¦ // These values remain as they were in the legacy [FLV] specification. ¦
¦ ¦ // If the IsExVideoHeader flag is set, we switch into ¦
¦ ¦ // FOURCC video mode defined in the VideoFourCc enumeration. ¦
¦ ¦ // This means that VideoCodecId (UB[4] bits) is not interpreted ¦
¦ ¦ // as a codec identifier. Instead, these UB[4] bits are ¦
¦ ¦ // interpreted as VideoPacketType. ¦
¦ ¦ ¦
¦ ¦ ¦
¦ ¦ // 0 - Reserved ¦
¦ ¦ // 1 - Reserved ¦
¦ ¦ SorensonH263 = 2, ¦
¦ ¦ Screen = 3, ¦
¦ ¦ On2VP6 = 4, ¦
¦ ¦ On2VP6A = 5, // with alpha channel ¦
¦ ¦ ScreenV2 = 6, ¦
¦ ¦ Avc = 7, ¦
¦ ¦ // 8 - Reserved ¦
¦ ¦ // ... ¦
¦ ¦ // 15 - Reserved ¦
¦ ¦} ¦
+------------------------------------------------------------------------------------+------------------------------------------------------------------------------------+
¦ ExVideoTagHeader Section ¦
¦ note: ExVideoTagHeader is present if IsExVideoHeader flag is set. ¦
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
¦ Description Of Bitstream ¦ Enumerated Types ¦
+------------------------------------------------------------------------------------+------------------------------------------------------------------------------------+
¦// ¦enum VideoPacketType { ¦
¦// process ExVideoTagHeader ¦ SequenceStart = 0, ¦
¦// ¦ CodedFrames = 1, ¦
¦processVideoBody = false ¦ SequenceEnd = 2, ¦
¦if (isExVideoHeader == 1) { ¦ ¦
¦ processVideoBody = true ¦ // CompositionTime Offset is implicitly set to zero. This optimization ¦
¦ ¦ // avoids transmitting an SI24 composition time value of zero over the wire. ¦
¦ // Interpret UB[4] bits as VideoPacketType instead of sound rate, size, and type. ¦ // See the ExVideoTagBody section below for corresponding pseudocode. ¦
¦ videoPacketType = UB[4] as VideoPacketType // at byte boundary after this read ¦ CodedFramesX = 3, ¦
¦ ¦ ¦
¦ // Process each ModEx data packet ¦ // ExVideoTagBody does not contain video data. Instead, it contains ¦
¦ while (videoPacketType == VideoPacketType.ModEx) { ¦ // an AMF-encoded metadata. Refer to the Metadata Frame section for ¦
¦ // Determine the size of the packet ModEx data (ranging from 1 to 256 bytes) ¦ // an illustration of its usage. For example, the metadata might include ¦
¦ modExDataSize = UI8 + 1 ¦ // HDR information. This also enables future possibilities for expressing ¦
¦ ¦ // additional metadata meant for subsequent video sequences. ¦
¦ // If maximum 8-bit size is not sufficient, use a 16-bit value ¦ // ¦
¦ if (modExDataSize == 256) { ¦ // If VideoPacketType.Metadata is present, the FrameType flags ¦
¦ modExDataSize = UI16 + 1; ¦ // at the top of this table should be ignored. ¦
¦ } ¦ Metadata = 4, ¦
¦ ¦ ¦
¦ // Fetch the packet ModEx data based on its determined size ¦ // Carriage of bitstream in MPEG-2 TS format ¦
¦ modExData = UI8[modExDataSize] ¦ // ¦
¦                                                                                    ¦  // SequenceStart and MPEG2TSSequenceStart                                         ¦
¦  // Fetch the VideoPacketModExType                                                 ¦  // are mutually exclusive                                                         ¦
¦ videoPacketModExType = UB[4] as VideoPacketModExType ¦ MPEG2TSSequenceStart = 5, ¦
¦ ¦ ¦
¦ // Update videoPacketType ¦ // Turns on video multitrack mode ¦
¦ videoPacketType = UB[4] as VideoPacketType // at byte boundary after this read ¦ Multitrack = 6, ¦
¦ ¦ ¦
¦ if (videoPacketModExType == VideoPacketModExType.TimestampOffsetNano) { ¦ // ModEx is a special signal within the VideoPacketType enum that ¦
¦ // This block processes TimestampOffsetNano to enhance RTMP timescale ¦ // serves to both modify and extend the behavior of the current packet. ¦
¦ // accuracy and compatibility with formats like MP4, M2TS, and Safari's ¦ // When this signal is encountered, it indicates the presence of ¦
¦ // Media Source Extensions. It ensures precise synchronization without ¦ // additional modifiers or extensions, requiring further processing to ¦
¦ // altering core RTMP timestamps, applying only to the current media ¦ // adjust or augment the packet's functionality. ModEx can be used to ¦
¦ // message. These adjustments enhance synchronization and timing ¦ // introduce new capabilities or modify existing ones, such as ¦
¦ // accuracy in media messages while preserving the core RTMP timestamp ¦ // enabling support for high-precision timestamps or other advanced ¦
¦ // integrity. ¦ // features that enhance the base packet structure. ¦
¦ // ¦ ModEx = 7, ¦
¦ // NOTE: ¦ ¦
¦ // - 1 millisecond (ms) = 1,000,000 nanoseconds (ns). ¦ // 8 - Reserved ¦
¦ // - Maximum value representable with 20 bits is 1,048,575 ns ¦ // ... ¦
¦ // (just over 1 ms), allowing precise sub-millisecond adjustments. ¦ // 14 - reserved ¦
¦ // - modExData must be at least 3 bytes, storing values up to 999,999 ns. ¦ // 15 - reserved ¦
¦ videoTimestampNanoOffset = bytesToUI24(modExData) ¦} ¦
¦ ¦ ¦
¦ // TODO: Integrate this nanosecond offset into timestamp management ¦enum VideoPacketModExType { ¦
¦ // to accurately adjust the presentation time. ¦ TimestampOffsetNano = 0, ¦
¦ } ¦ ¦
¦ } ¦ // ... ¦
¦ ¦ // 14 - reserved ¦
¦ if ( ¦ // 15 - reserved ¦
¦ videoPacketType != VideoPacketType.Metadata && ¦} ¦
¦ videoFrameType == VideoFrameType.Command ¦ ¦
¦ ) { ¦enum VideoFourCc { ¦
¦ videoCommand = UI8 as VideoCommand ¦ // ¦
¦ ¦ // Valid FOURCC values for signaling support of video codecs ¦
¦ // ExVideoTagBody has no payload if we got here. ¦ // in the enhanced FourCC pipeline. In this context, support ¦
¦ // Set boolean to not try to process the video body. ¦ // for a FourCC codec MUST be signaled via the enhanced ¦
¦ processVideoBody = false ¦ // "connect" command. ¦
¦ } else if (videoPacketType == VideoPacketType.Multitrack) { ¦ // ¦
¦ isVideoMultitrack = true; ¦ ¦
¦ videoMultitrackType = UB[4] as AvMultitrackType ¦ Vp8 = makeFourCc("vp08"), ¦
¦ ¦ Vp9 = makeFourCc("vp09"), ¦
¦ // Fetch VideoPacketType for all video tracks in the video message. ¦ Av1 = makeFourCc("av01"), ¦
¦    // This fetch MUST NOT result in a VideoPacketType.Multitrack                   ¦  Avc  = makeFourCc("avc1"),                                                        ¦
¦ videoPacketType = UB[4] as VideoPacketType ¦ Hevc = makeFourCc("hvc1"), ¦
¦ ¦} ¦
¦ if (videoMultitrackType != AvMultitrackType.ManyTracksManyCodecs) { ¦ ¦
¦ // The tracks are encoded with the same codec. Fetch the FOURCC for them ¦enum AvMultitrackType { ¦
¦ videoFourCc = FOURCC as VideoFourCc ¦ // ¦
¦ } ¦ // Used by audio and video pipeline ¦
¦ } else { ¦ // ¦
¦ videoFourCc = FOURCC as VideoFourCc ¦ ¦
¦ } ¦ OneTrack = 0, ¦
¦} ¦ ManyTracks = 1, ¦
¦ ¦ ManyTracksManyCodecs = 2, ¦
¦ ¦ ¦
¦ ¦ // 3 - Reserved ¦
¦ ¦ // ... ¦
¦ ¦ // 15 - Reserved ¦
¦ ¦} ¦
+------------------------------------------------------------------------------------+------------------------------------------------------------------------------------+
¦ ExVideoTagBody Section ¦
¦ Note: The ExVideoTagBody format below applies when ExVideoTagHeader is present and videoCommand has not been set (see the VideoFrameType description)                      ¦
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
¦ Description Of Bitstream ¦
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
¦// ¦
¦// process ExVideoTagBody ¦
¦// ¦
¦while (processVideoBody) { ¦
¦ if (isVideoMultitrack) { ¦
¦ if (videoMultitrackType == AvMultitrackType.ManyTracksManyCodecs) { ¦
¦ // Each track has a codec assigned to it. Fetch the FOURCC for the next track. ¦
¦ videoFourCc = FOURCC as VideoFourCc ¦
¦ } ¦
¦ ¦
¦ // Track Ordering: ¦
¦ // ¦
¦ // For identifying the highest priority (a.k.a., default track) ¦
¦ // or highest quality track, it is RECOMMENDED to use trackId ¦
¦ // set to zero. For tracks of lesser priority or quality, use ¦
¦ // multiple instances of trackId with ascending numerical values. ¦
¦ // The concept of priority or quality can have multiple ¦
¦ // interpretations, including but not limited to bitrate, ¦
¦ // resolution, default angle, and language. This recommendation ¦
¦ // serves as a guideline intended to standardize track numbering ¦
¦ // across various applications. ¦
¦ videoTrackId = UI8 ¦
¦ ¦
¦ if (videoMultitrackType != AvMultitrackType.OneTrack) { ¦
¦ // The `sizeOfVideoTrack` specifies the size in bytes of the ¦
¦ // current track that is being processed. This size starts ¦
¦ // counting immediately after the position where the `sizeOfVideoTrack` ¦
¦ // value is located. You can use this value as an offset to locate the ¦
¦ // next video track in a multitrack system. The data pointer is ¦
¦      // positioned immediately after this field. Depending on the multitrack                                                                                               ¦
¦      // type, the offset points to either a `fourCc` or a `trackId`.                                                                                                       ¦
¦ sizeOfVideoTrack = UI24 ¦
¦ } ¦
¦ } ¦
¦ ¦
¦ if (videoPacketType == VideoPacketType.Metadata) { ¦
¦ // The body does not contain video data; instead, it consists of AMF-encoded ¦
¦ // metadata. The metadata is represented by a series of [name, value] pairs. ¦
¦ // Currently, the only defined [name, value] pair is ["colorInfo", Object]. ¦
¦ // See the Metadata Frame section for more details on this object. ¦
¦ // ¦
¦ // For a deeper understanding of the encoding, please refer to the descriptions ¦
¦ // of SCRIPTDATA and SCRIPTDATAVALUE in the FLV file specification. ¦
¦ videoMetadata = [VideoMetadata] ¦
¦ } ¦
¦ ¦
¦ if (videoPacketType == VideoPacketType.SequenceEnd) { ¦
¦ // signals end of sequence ¦
¦ } ¦
¦ ¦
¦ if (videoPacketType == VideoPacketType.SequenceStart) { ¦
¦ if (videoFourCc == VideoFourCc.Vp8) { ¦
¦ // body contains a VP8 configuration record to start the sequence ¦
¦ vp8Header = [VPCodecConfigurationRecord] ¦
¦ } ¦
¦ ¦
¦ if (videoFourCc == VideoFourCc.Vp9) { ¦
¦ // body contains a VP9 configuration record to start the sequence ¦
¦ vp9Header = [VPCodecConfigurationRecord] ¦
¦ } ¦
¦ ¦
¦ if (videoFourCc == VideoFourCc.Av1) { ¦
¦ // body contains a configuration record to start the sequence ¦
¦ av1Header = [AV1CodecConfigurationRecord] ¦
¦ } ¦
¦ ¦
¦ if (videoFourCc == VideoFourCc.Avc) { ¦
¦ // body contains a configuration record to start the sequence. ¦
¦ // See ISO/IEC 14496-15:2019, 5.3.4.1 for the description of ¦
¦ // the AVCDecoderConfigurationRecord. ¦
¦ avcHeader = [AVCDecoderConfigurationRecord] ¦
¦ } ¦
¦ ¦
¦ if (videoFourCc == VideoFourCc.Hevc) { ¦
¦ // body contains a configuration record to start the sequence. ¦
¦ // See ISO/IEC 14496-15:2022, 8.3.3.2 for the description of ¦
¦ // the HEVCDecoderConfigurationRecord. ¦
¦ hevcHeader = [HEVCDecoderConfigurationRecord] ¦
¦ } ¦
¦ } ¦
¦ ¦
¦ if (videoPacketType == VideoPacketType.MPEG2TSSequenceStart) { ¦
¦ if (videoFourCc == VideoFourCc.Av1) { ¦
¦ // body contains a video descriptor to start the sequence ¦
¦ av1Header = [AV1VideoDescriptor] ¦
¦ } ¦
¦ } ¦
¦ ¦
¦ if (videoPacketType == VideoPacketType.CodedFrames) { ¦
¦ if (videoFourCc == VideoFourCc.Vp8) { ¦
¦ // body contains series of coded full frames ¦
¦ vp8CodedData = [Vp8CodedData] ¦
¦ } ¦
¦ ¦
¦ if (videoFourCc == VideoFourCc.Vp9) { ¦
¦ // body contains series of coded full frames ¦
¦ vp9CodedData = [Vp9CodedData] ¦
¦ } ¦
¦ ¦
¦ if (videoFourCc == VideoFourCc.Av1) { ¦
¦ // body contains one or more OBUs representing a single temporal unit ¦
¦ av1CodedData = [Av1CodedData] ¦
¦ } ¦
¦ ¦
¦ if (videoFourCc == VideoFourCc.Avc) { ¦
¦ // See ISO/IEC 14496-12:2015, 8.6.1 for the description of the composition ¦
¦ // time offset. The offset in an FLV file is always in milliseconds. ¦
¦ compositionTimeOffset = SI24 ¦
¦ ¦
¦ // Body contains one or more NALUs; full frames are required ¦
¦ avcCodedData = [AvcCodedData] ¦
¦ } ¦
¦ ¦
¦ if (videoFourCc == VideoFourCc.Hevc) { ¦
¦ // See ISO/IEC 14496-12:2015, 8.6.1 for the description of the composition ¦
¦ // time offset. The offset in an FLV file is always in milliseconds. ¦
¦ compositionTimeOffset = SI24 ¦
¦ ¦
¦ // Body contains one or more NALUs; full frames are required ¦
¦ hevcData = [HevcCodedData] ¦
¦ } ¦
¦ } ¦
¦ ¦
¦ if (videoPacketType == VideoPacketType.CodedFramesX) { ¦
¦ // compositionTimeOffset is implied to equal zero. This is ¦
¦ // an optimization to save putting SI24 value on the wire ¦
¦ ¦
¦ if (videoFourCc == VideoFourCc.Avc) { ¦
¦ // Body contains one or more NALUs; full frames are required ¦
¦ avcCodedData = [AvcCodedData] ¦
¦ } ¦
¦ ¦
¦ if (videoFourCc == VideoFourCc.Hevc) { ¦
¦ // Body contains one or more NALUs; full frames are required ¦
¦ hevcData = [HevcCodedData] ¦
¦ } ¦
¦ } ¦
¦ ¦
¦ if ( ¦
¦ isVideoMultitrack && ¦
¦ videoMultitrackType != AvMultitrackType.OneTrack && ¦
¦ positionDataPtrToNextVideoTrack(sizeOfVideoTrack) ¦
¦ ) { ¦
¦ // TODO: need to implement positionDataPtrToNextVideoTrack() ¦
¦ continue ¦
¦ } ¦
¦ ¦
¦ // done processing video message ¦
¦ break ¦
¦} ¦
+------------------------------------------------------------------------------------+------------------------------------------------------------------------------------+
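The ModEx escape described in the ExVideoTagHeader table can be condensed into a few lines. The sketch below assumes a BitReader helper exposing readUB(bits), readUI8(), readUI16(), and readBytes(n); that interface is an assumption of this illustration, not something defined by this specification.

// Sketch of the ModEx loop from the ExVideoTagHeader table.
interface BitReader {
  readUB(bits: number): number;
  readUI8(): number;
  readUI16(): number;
  readBytes(n: number): Uint8Array;
}

function readModEx(r: BitReader): { packetType: number; nanoOffset?: number } {
  let packetType = r.readUB(4); // VideoPacketType
  let nanoOffset: number | undefined;

  while (packetType === 7 /* VideoPacketType.ModEx */) {
    // The 8-bit size is stored minus one; 255 escapes to a 16-bit size.
    let modExDataSize = r.readUI8() + 1;
    if (modExDataSize === 256) {
      modExDataSize = r.readUI16() + 1;
    }
    const modExData = r.readBytes(modExDataSize);

    const modExType = r.readUB(4); // VideoPacketModExType
    packetType = r.readUB(4);      // updated VideoPacketType

    if (modExType === 0 /* VideoPacketModExType.TimestampOffsetNano */) {
      // bytesToUI24: big-endian, sub-millisecond offset in nanoseconds
      nanoOffset = (modExData[0] << 16) | (modExData[1] << 8) | modExData[2];
    }
  }
  return { packetType, nanoOffset };
}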
To support various types of video metadata, the legacy [FLV] specification has been enhanced. The VideoTagHeader has been extended to define a new VideoPacketType.Metadata (see the ExVideoTagHeader table in the Enhanced Video section) whose payload contains AMF-encoded metadata, represented by a series of [name, value] pairs. For now, the only defined [name, value] pair is [“colorInfo”, Object]. When leveraging VideoPacketType.Metadata to deliver HDR metadata, the metadata MUST be sent prior to the video sequence, scene, or frame that it affects. Each time a new colorInfo object is received, it invalidates and replaces the current one. To reset to the original color state, send colorInfo with a value of Undefined (the RECOMMENDED approach) or an empty object (i.e., {}).
Leveraging a video message to deliver VideoPacketType.Metadata, instead of another [RTMP] message type, is intentional. One benefit is avoiding race conditions between video messages and other RTMP message types. Given this, once the colorInfo object is parsed, the read values MUST be processed in time to affect the first frame of the video section that follows the colorInfo object.
The colorInfo object provides HDR metadata to enable a higher quality image source conforming to BT.2020 (a.k.a., Rec. 2020) standard. The properties of the colorInfo object, which are encoded in an AMF message format, are defined below.
Note:
- For content creators: When it is beneficial to add video hint information (e.g., HDR metadata) to the FLV container, it is RECOMMENDED to add it via VideoPacketType.Metadata. This may be done in addition to (or instead of) encoding the metadata directly into the codec bitstream.
- The object encoding format (i.e., AMF0 or AMF3) is signaled during the connect command.
type ColorInfo = {
colorConfig: {
// number of bits used to record the color channels for each pixel
bitDepth: number, // SHOULD be 8, 10 or 12
//
// colorPrimaries, transferCharacteristics and matrixCoefficients are defined
// in ISO/IEC 23091-4/ITU-T H.273. The values are an index into
// respective tables which are described in "Colour primaries",
// "Transfer characteristics" and "Matrix coefficients" sections.
// It is RECOMMENDED to provide these values.
//
// indicates the chromaticity coordinates of the source color primaries
colorPrimaries: number, // enumeration [0-255]
// opto-electronic transfer characteristic function (e.g., PQ, HLG)
transferCharacteristics: number, // enumeration [0-255]
// matrix coefficients used in deriving luma and chroma signals
matrixCoefficients: number, // enumeration [0-255]
},
hdrCll: {
//
// maximum value of the frame average light level
// (in 1 cd/m2) of the entire playback sequence
//
maxFall: number, // [0.0001-10000]
//
// maximum light level of any single pixel (in 1 cd/m2)
// of the entire playback sequence
//
maxCLL: number, // [0.0001-10000]
},
//
// The hdrMdcv object defines mastering display (i.e., where
// creative work is done during the mastering process) color volume (a.k.a., mdcv)
// metadata which describes primaries, white point and min/max luminance. The
// hdrMdcv object SHOULD be provided.
//
// Specification of the metadata, along with its ranges, adheres to
// ST 2086:2018 - SMPTE Standard (except for minLuminance; see
// comments below)
//
hdrMdcv: {
//
// Mastering display color volume (mdcv) xy Chromaticity Coordinates within CIE
// 1931 color space.
//
// Values SHALL be specified with four decimal places. The x coordinate SHALL
// be in the range [0.0001, 0.7400]. The y coordinate SHALL be
// in the range [0.0001, 0.8400].
//
redX: number,
redY: number,
greenX: number,
greenY: number,
blueX: number,
blueY: number,
whitePointX: number,
whitePointY: number,
//
// max/min display luminance of the mastering display (in 1 cd/m2, i.e., nits)
//
// note: ST 2086:2018 - SMPTE Standard specifies minimum display mastering
// luminance in multiples of 0.0001 cd/m2.
//
// For consistency we specify all values in 1 cd/m2. Given that a
// hypothetical perfect screen has a peak brightness of 10,000 nits
// and a black level of 0.0005 nits, we do not need to switch units
// to 0.0001 cd/m2 to increase resolution on the lower end of the
// minLuminance property. The ranges (in nits) mentioned below suffice
// for the theoretical limits of Mastering Reference Displays and adhere
// to the SMPTE ST 2084 standard (a.k.a., PQ), which is capable of
// representing the full range of luminance levels.
//
maxLuminance: number, // [5-10000]
minLuminance: number, // [0.0001-5]
},
}
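For illustration only, a colorInfo payload for 10-bit BT.2020 content mastered with the PQ transfer function might look like the following. The numeric values are sample values chosen for this sketch (per ISO/IEC 23091-4/ITU-T H.273, 9 signals BT.2020 primaries/matrix and 16 signals PQ); they are not normative.

// Sample colorInfo object (values are illustrative, not normative).
const colorInfo: ColorInfo = {
  colorConfig: {
    bitDepth: 10,
    colorPrimaries: 9,           // BT.2020 primaries
    transferCharacteristics: 16, // PQ (SMPTE ST 2084)
    matrixCoefficients: 9,       // BT.2020 non-constant luminance
  },
  hdrCll: {
    maxFall: 400,  // frame-average light level, cd/m2
    maxCLL: 1000,  // peak pixel light level, cd/m2
  },
  hdrMdcv: {
    // BT.2020 primaries and D65 white point, four decimal places
    redX: 0.7080, redY: 0.2920,
    greenX: 0.1700, greenY: 0.7970,
    blueX: 0.1310, blueY: 0.0460,
    whitePointX: 0.3127, whitePointY: 0.3290,
    maxLuminance: 1000,   // cd/m2
    minLuminance: 0.0001, // cd/m2
  },
};

Sending this object via VideoPacketType.Metadata before the affected video sequence applies it until a new colorInfo arrives or a reset (Undefined or an empty object) is sent.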
Table: Flag values for the videoFunction property
+-----------------------------------------------------+-------------------------------------------------------------------+-------------+
¦ Function Flag ¦ Usage ¦ Value ¦
+-----------------------------------------------------+-------------------------------------------------------------------+-------------+
¦SUPPORT_VID_CLIENT_SEEK ¦Indicates that the client can perform frame-accurate seeks. ¦ 0x0001 ¦
+-----------------------------------------------------+-------------------------------------------------------------------+-------------+
¦SUPPORT_VID_CLIENT_HDR ¦Indicates that the client has support for HDR video. Note: Implies ¦ 0x0002 ¦
¦ ¦support for colorInfo Object within VideoPacketType.Metadata. ¦ ¦
+-----------------------------------------------------+-------------------------------------------------------------------+-------------+
¦SUPPORT_VID_CLIENT_VIDEO_PACKET_TYPE_METADATA ¦Indicates that the client has support for VideoPacketType.Metadata.¦ 0x0004 ¦
¦ ¦See Metadata Frame section for more detail. ¦ ¦
+-----------------------------------------------------+-------------------------------------------------------------------+-------------+
¦SUPPORT_VID_CLIENT_LARGE_SCALE_TILE ¦The large-scale tile allows the decoder to extract only an ¦ ¦
¦ ¦interesting section in a frame without the need to decompress the ¦ ¦
¦ ¦entire frame. Support for this feature is not required and is ¦ 0x0008 ¦
¦ ¦assumed to not be implemented by the client unless this property is¦ ¦
¦ ¦present and set to true. ¦ ¦
+-----------------------------------------------------+-------------------------------------------------------------------+-------------+
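Because these are bit flags, a client advertises several capabilities at once by combining values with a bitwise OR, for example:

// Illustrative only: declare seek, HDR, and metadata support (0x0007).
const SUPPORT_VID_CLIENT_SEEK = 0x0001;
const SUPPORT_VID_CLIENT_HDR = 0x0002;
const SUPPORT_VID_CLIENT_VIDEO_PACKET_TYPE_METADATA = 0x0004;

const videoFunction =
  SUPPORT_VID_CLIENT_SEEK |
  SUPPORT_VID_CLIENT_HDR |
  SUPPORT_VID_CLIENT_VIDEO_PACKET_TYPE_METADATA; // == 0x0007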
E-RTMP has introduced support for multitrack streaming, offering increased flexibility in audio and video streaming through the use of a track index (a.k.a., trackId). This feature allows for the serialization of multiple tracks over a single [RTMP] connection and stream channel.
It’s important to note that multitrack support is designed to augment, not replace, the option of using multiple streams for streaming. While both multiple streams and multitrack can potentially address the same use cases, the choice between them will depend on the specific capabilities of your RTMP implementation and requirements. In certain cases, multitrack may not be the most efficient option.
For example, a multitrack script message can carry its track information within its own payload:

function ScriptMethodName(arg: Object) {
  console.log("Invoking a script with trackId: ", arg.trackId);
}
Multitrack capabilities in E-RTMP offer a wide range of possibilities, from adaptive bitrate streaming to multi-language support. While this document doesn’t prescribe specific encoding rules or manifest metadata, it aims to guide you through the complexities of leveraging multitrack features. Consider various parameters like codecs, frame rates, key frames, sampling rates, and resolutions to meet your unique objectives. Remember, media encoding settings are separate from E-RTMP configurations.
When a client connects to an E-RTMP server, it sends a connect command to the server. The command structure sent from the client to the server contains a Command Object, comprising name-value pairs. This is where the client indicates the audio and video codecs it supports. To declare support for newly defined codecs or other enhancements supported by the client, this name-value pair list must be extended. Below are the descriptions of the new name-value pairs used in the Command Object of the connect command.
Table: New name-value pairs that can be set in the Command Object
+---------------------+---------------------------+-------------------------------------------------------------------+------------------------------------------------------+
¦ Property ¦ Type ¦ Description ¦ Example Value ¦
+---------------------+---------------------------+-------------------------------------------------------------------+------------------------------------------------------+
¦fourCcList ¦Strict Array of strings ¦Used to declare the enhanced list of supported codecs when ¦e.g., 1 ¦
¦ ¦ ¦connecting to the server. The fourCcList property is a strict array¦[ ¦
¦                     ¦                           ¦of dense ordinal indices. Each entry in the array is of string     ¦  "av01", "vp09", "vp08", "hvc1",                     ¦
¦                     ¦                           ¦type, specifically a [FourCC] value (i.e., a string that is a      ¦  "avc1", "ac-3", "ec-3", "Opus",                     ¦
¦                     ¦                           ¦sequence of four bytes), representing a supported audio/video      ¦  ".mp3", "fLaC", "mp4a"                              ¦
¦ ¦ ¦codec. ¦] ¦
¦ ¦ ¦ ¦ ¦
¦ ¦ ¦In the context of E-RTMP, clients capable of receiving any codec ¦e.g., 2 ¦
¦ ¦ ¦(e.g., recorders or forwarders) may set a FourCC value to the ¦[ "*" ] ¦
¦ ¦ ¦wildcard value of "*". ¦ ¦
¦ ¦ ¦ ¦ ¦
¦ ¦ ¦Note: The fourCcList property was introduced in the original ¦ ¦
¦ ¦ ¦E-RTMP. Going forward, it is RECOMMENDED on the client side to ¦ ¦
¦ ¦ ¦switch to using the [audio|video]FourCcInfoMap properties described¦ ¦
¦ ¦ ¦below. On the server side, we RECOMMEND supporting both fourCcList ¦ ¦
¦ ¦ ¦and [audio|video]FourCcInfoMap properties to handle cases where a ¦ ¦
¦ ¦ ¦client has not yet transitioned to using the new properties. ¦ ¦
+---------------------+---------------------------+-------------------------------------------------------------------+------------------------------------------------------+
¦videoFourCcInfoMap, ¦Object ¦The [audio|video]FourCcInfoMap properties are designed to enable ¦e.g., 1 ¦
¦audioFourCcInfoMap ¦ ¦setting capability flags for each supported codec in the context of¦videoFourCcInfoMap = { ¦
¦ ¦ ¦E-RTMP streaming. A FourCC key is a four-character code used to ¦ // can forward any video codec ¦
¦ ¦ ¦specify a video or audio codec. The names of the object properties ¦ "*": FourCcInfoMask.CanForward, ¦
¦ ¦ ¦are strings that correspond to these FourCC keys. Each object ¦ ¦
¦ ¦ ¦property holds a numeric value that represents a set of capability ¦ // can decode, encode, forward (see "*") VP9 codec ¦
¦ ¦ ¦flags. These flags can be combined using a Bitwise OR operation. ¦ "vp09": FourCcInfoMask.CanDecode | ¦
¦ ¦ ¦ ¦ FourCcInfoMask.CanEncode, ¦
¦ ¦ ¦Refer to the enum FourCcInfoMask for the available flags: ¦} ¦
¦ ¦ ¦ ¦ ¦
¦ ¦ ¦enum FourCcInfoMask { ¦e.g., 2 ¦
¦ ¦ ¦ CanDecode = 0x01, ¦audioFourCcInfoMap = { ¦
¦ ¦ ¦ CanEncode = 0x02, ¦ // can forward any audio codec ¦
¦ ¦ ¦ CanForward = 0x04, ¦ "*": FourCcInfoMask.CanForward, ¦
¦ ¦ ¦} ¦ ¦
¦ ¦ ¦ ¦ // can decode, encode, forward (see "*") Opus codec ¦
¦ ¦ ¦Capability flags define specific functionalities, such as the ¦ "Opus": FourCcInfoMask.CanDecode | ¦
¦ ¦ ¦ability to decode, encode, or forward. ¦ FourCcInfoMask.CanEncode, ¦
¦ ¦ ¦ ¦} ¦
¦ ¦ ¦A FourCC key set to the wildcard character "*" acts as a catch-all ¦ ¦
¦ ¦ ¦for any codec. When this wildcard key exists, it overrides the ¦ ¦
¦ ¦ ¦flags set on properties for specific codecs. For example, if the ¦ ¦
¦ ¦ ¦flag for the "*" property is set to FourCcInfoMask.CanForward, all ¦ ¦
¦ ¦ ¦codecs will be forwarded regardless of individual flags set on ¦ ¦
¦ ¦ ¦their specific properties. ¦ ¦
+---------------------+---------------------------+-------------------------------------------------------------------+------------------------------------------------------+
¦capsEx ¦number ¦The value represents capability flags which can be combined via a ¦CapsExMask.Reconnect | CapsExMask.Multitrack ¦
¦ ¦ ¦Bitwise OR to indicate which extended set of capabilities (i.e., ¦ ¦
¦ ¦ ¦beyond the legacy [RTMP] specification) are supported via E-RTMP. ¦ ¦
¦ ¦ ¦See enum CapsExMask for the enumerated values representing the ¦ ¦
¦ ¦ ¦assigned bits. If the extended capabilities are expressed elsewhere¦ ¦
¦ ¦ ¦they will not appear here (e.g., FourCC, HDR or ¦ ¦
¦ ¦ ¦VideoPacketType.Metadata support is not expressed in this ¦ ¦
¦ ¦ ¦property). ¦ ¦
¦ ¦ ¦ ¦ ¦
¦ ¦ ¦When a specific flag is encountered: ¦ ¦
¦ ¦ ¦- The implementation might fully handle the feature by applying the¦ ¦
¦ ¦ ¦appropriate logic. ¦ ¦
¦ ¦ ¦- Alternatively, if full support is not available, the ¦ ¦
¦ ¦ ¦implementation can still parse the bitstream correctly, ensuring ¦ ¦
¦ ¦ ¦graceful degradation. This allows continued operation, even with ¦ ¦
¦ ¦ ¦reduced functionality. ¦ ¦
¦ ¦ ¦ ¦ ¦
¦ ¦ ¦enum CapsExMask { ¦ ¦
¦ ¦ ¦ Reconnect = 0x01, // Support for reconnection ¦ ¦
¦ ¦ ¦ Multitrack = 0x02, // Support for multitrack ¦ ¦
¦ ¦ ¦ ModEx = 0x04, // Can parse ModEx signal ¦ ¦
¦ ¦ ¦ TimestampNanoOffset = 0x08, // Support for nano offset ¦ ¦
¦ ¦ ¦} ¦ ¦
+---------------------+---------------------------+-------------------------------------------------------------------+------------------------------------------------------+
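One plausible reading of the wildcard rule, consistent with example 1 above (where vp09 decodes and encodes via its own entry while forwarding via "*"), is that the "*" flags apply to every codec in addition to any codec-specific flags. A sketch of such a lookup:

// Sketch only: resolve effective capability flags for a FourCC key.
enum FourCcInfoMask {
  CanDecode = 0x01,
  CanEncode = 0x02,
  CanForward = 0x04,
}

function capsFor(infoMap: Record<string, number>, fourCc: string): number {
  // The "*" entry, when present, contributes its flags to every codec.
  return (infoMap["*"] ?? 0) | (infoMap[fourCc] ?? 0);
}

// e.g., capsFor({ "*": FourCcInfoMask.CanForward,
//                 "vp09": FourCcInfoMask.CanDecode | FourCcInfoMask.CanEncode },
//               "vp09") yields CanDecode | CanEncode | CanForward.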
The client declares to the server which enhancements it supports. The server responds with a _result or _error command to indicate whether the response is a result or an error. In the response, the server provides properties within an Object as one of the parameters; this is where the server states its own support for E-RTMP. The server SHOULD state its support via attributes such as videoFourCcInfoMap, capsEx, and similar properties.
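Putting the pieces together, the enhanced portion of a client's Command Object might look like the sketch below. The property values are illustrative only; FourCcInfoMask is reused from the sketch above.

// Sketch of enhanced name-value pairs inside the connect Command Object.
enum CapsExMask {
  Reconnect = 0x01,
  Multitrack = 0x02,
  ModEx = 0x04,
  TimestampNanoOffset = 0x08,
}

const commandObject = {
  // legacy pairs (app, tcUrl, etc.) omitted for brevity
  videoFourCcInfoMap: {
    "*": FourCcInfoMask.CanForward,
    "hvc1": FourCcInfoMask.CanDecode,
    "av01": FourCcInfoMask.CanDecode,
  },
  audioFourCcInfoMap: {
    "Opus": FourCcInfoMask.CanDecode | FourCcInfoMask.CanEncode,
  },
  capsEx: CapsExMask.Reconnect | CapsExMask.Multitrack,
};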
Action Message Format (AMF) is a compact binary format used to serialize SCRIPTDATA. It has two specifications: [AMF0] and [AMF3]. AMF3 improves on AMF0 by optimizing the payload size on the wire. To understand the full scope of these optimizations, please refer to the AMF0 and AMF3 specifications.
Supporting AMF3 in [RTMP] and [FLV] is beneficial due to its optimizations over AMF0. Understanding the ecosystem is crucial before adding AMF3 support to RTMP or FLV.
To enable support for AMF3 in RTMP, the following steps are REQUIRED:
RTMP has had AMF3 as part of its specification for some time now. During the handshake, the client declares whether it has support for AMF3.
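In practice, this declaration is carried in the objectEncoding property of the connect command's Command Object (0 = AMF0, the default; 3 = AMF3), for example:

// Sketch: a client signaling AMF3 support in the connect Command Object.
const connectCommandObject = {
  // ... other connect properties ...
  objectEncoding: 3, // 0 = AMF0 (default), 3 = AMF3
};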
Prior to Y2023, the FLV file format did not have AMF3 as part of its SCRIPTDATA specification. To ensure support for AMF3 in FLV:
Established, pre-E-RTMP specifications state the following:
Unfortunately, the above is incomplete and may be somewhat unclear. To clarify, in addition to the above:
There is no need for a version bump within E-RTMP for either the [RTMP] handshake sequence or the FLV header file version field. All of the enhancements are triggered via the newly defined additions to the bitstream format which don’t break legacy implementations. E-RTMP is self describing in its capabilities.
[AMF0] Adobe Systems Inc., “Action Message Format – AMF 0”, June 2006,
<https://veovera.github.io/enhanced-rtmp/docs/legacy/amf0-file-format-spec.pdf>.
[AMF3] Adobe Systems Inc., “Action Message Format – AMF 3”, June 2006,
<https://veovera.github.io/enhanced-rtmp/docs/legacy/amf3-file-format-spec.pdf>.
[DEPRECATED] Deprecation,
<https://en.wikipedia.org/wiki/Deprecation>.
[FLV] “Adobe Flash Video File Format Specification, Version 10.1”, August 2010,
<https://veovera.github.io/enhanced-rtmp/docs/legacy/video-file-format-v10-1-spec.pdf>.
[FourCC] A sequence of four bytes (typically ASCII) used to uniquely identify data formats,
<https://en.wikipedia.org/wiki/FourCC>.
[LEGACY] Legacy specifications for the RTMP solution,
<https://veovera.github.io/enhanced-rtmp/docs/legacy/>.
[RFC2119] Bradner, S., “Key words for use in RFCs to Indicate Requirement Levels”, BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>.
[RFC6716] Valin, JM., Vos, K., and T. Terriberry, “Definition of the Opus Audio Codec”, RFC 6716, DOI 10.17487/RFC6716, September 2012,
<https://www.rfc-editor.org/info/rfc6716>.
[RFC8174] Leiba, B., “Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words”, BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017,
<https://www.rfc-editor.org/info/rfc8174>.
[RTMP] Parmar, H., Ed. and M. Thornburgh, Ed., “Adobe’s Real Time Messaging Protocol”, December 2012,
<https://veovera.github.io/enhanced-rtmp/docs/legacy/rtmp-v1-0-spec.pdf>.
[SCRIPTDATA] “Adobe Flash Video File Format Specification, Version 10.1”, SCRIPTDATA, August 2010,
<https://veovera.org/docs/legacy/video-file-format-v10-1-spec.pdf#page=80>.
[SCRIPTDATAVALUE] “Adobe Flash Video File Format Specification, Version 10.1”, SCRIPTDATAVALUE, August 2010,
<https://veovera.org/docs/legacy/video-file-format-v10-1-spec.pdf#page=80>.
[WebCodecs] W3C, “WebCodecs”,
<https://www.w3.org/TR/webcodecs/>.
The revision history section of this document provides a clear and concise record of significant changes throughout its development phases, such as alpha, beta, and release.
This approach keeps the revision history in the specification document focused, relevant, and easy to navigate, while the complete history of all changes remains securely stored and accessible in GitHub.
Table: Revision history
+----------------------+----------------------------------------------------------------------------------------+
¦ Document Revision History ¦
+---------------------------------------------------------------------------------------------------------------+
¦ Date ¦ Comments ¦
+----------------------+----------------------------------------------------------------------------------------+
¦ v2-2024-10-18-b1 ¦ 1. Added Version Stage Definitions Section - Alpha, Beta, Release ¦
¦                      ¦ 2. Moved the E-RTMP v2 specification into the Beta stage.                              ¦
+----------------------+----------------------------------------------------------------------------------------+
¦ v2-...-b* ¦ 1. See GitHub for revision history. ¦
+----------------------+----------------------------------------------------------------------------------------+