December, 2004
MP4LIVE INTERNALS
Program Control
Media Flows and Nodes
Media Sources
Media Codecs
Media Frames
Media Sinks
List of Media Sources
List of Codecs
List of Media Sinks
This document provides an overview of the internals of the mp4live application for those who intend to modify mp4live. See the README for information on using mp4live.
Program Control

The control flow of mp4live is easiest to understand by first considering the no-GUI (--headless) mode of operation. In this mode, mp4live reads a configuration file and then creates a "media flow" that uses this configuration. The media flow is started, which causes the appropriate threads of execution to be started. The main program thread then sleeps for the desired duration and, upon waking, tells the media flow to stop. Program execution then ends.

When the mp4live GUI is active, the main program thread runs the GUI code. In this case, GUI actions cause the configuration information to be changed and the media flow to be started and stopped.
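In outline, headless operation amounts to the following sketch. The class and method names are illustrative stand-ins, not the actual mp4live API; only the sequence of events is the point.

    // Illustrative sketch of the --headless control flow; the real mp4live
    // classes differ, but the sequence of events is the same.
    #include <cstdio>
    #include <cstdlib>
    #include <unistd.h>

    class CMediaFlowSketch {
    public:
        bool ReadConfigFile(const char* fileName) { return fileName != nullptr; } // stub
        void Start() { /* spawn the media node threads */ }
        void Stop()  { /* signal the threads to stop, then join them */ }
    };

    int main(int argc, char** argv)
    {
        const char* configFile = argc > 1 ? argv[1] : "mp4live.config";
        int durationSecs = argc > 2 ? atoi(argv[2]) : 60;

        CMediaFlowSketch flow;
        if (!flow.ReadConfigFile(configFile)) {
            fprintf(stderr, "cannot read %s\n", configFile);
            return 1;
        }
        flow.Start();          // media node threads begin running
        sleep(durationSecs);   // main thread idles for the desired duration
        flow.Stop();           // wake up and tell the media flow to stop
        return 0;
    }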
The "media flow" (media_flow.h) is the top level concept that organizes the processing activities of mp4live. A media flow is a collection of media nodes (media_node.h), with forwarding rules between the nodes. Each media node is a thread within the mp4live process. Each thread has a message queue which can be used to control the threads. Currently, messages are typically only used for starting and stopping the threads, and notifying media sinks when new media frames from the media sources have become available. Other coordination between threads can be achieved via the shared configuration information data.
A "media source" (media_source.h) is a media node that acquires raw media frames and processes them according to the target output(s) of the current flow configuration. Currently, a media source may produce either audio, video, or both. It may produce multiple media frames for an input frame. For example, a video source may generate a reconstructed YUV video frame, and an MPEG-4 encoded video frame.
Since much of the media processing is shared regardless of the details of media acquisition, the base media source class (media_source.cpp) contains the central code to encode media and maintain timing and synchronization. The generic media processing is somewhat over-engineered at present to allow for transcoding scenarios where the source is pre-existing encoded media instead of a capture device that can be configured to match the desired output.
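Schematically, the per-frame processing for video looks something like the sketch below. All names are invented; the shared logic actually lives in media_source.cpp.

    // Per-frame sketch for a video source: one captured image can yield
    // several output frames, e.g. a raw YUV frame for local preview plus
    // an MPEG-4 encoded frame. Illustrative only.
    #include <cstdint>

    struct RawImage { void* data; uint32_t len; };

    class CVideoSourceSketch {
    public:
        void ProcessVideoFrame(const RawImage& image, uint64_t timestamp) {
            if (m_previewEnabled) {
                ForwardYuvFrame(image, timestamp);   // reconstructed YUV out
            }
            if (m_encodingEnabled) {
                RawImage encoded = Encode(image);    // invoke the media codec
                ForwardEncodedFrame(encoded, timestamp);
            }
        }
    private:
        bool m_previewEnabled = true;
        bool m_encodingEnabled = true;
        RawImage Encode(const RawImage& in) { return in; }      // stub
        void ForwardYuvFrame(const RawImage&, uint64_t) {}      // stub
        void ForwardEncodedFrame(const RawImage&, uint64_t) {}  // stub
    };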
The generic process for audio includes:
The generic process for video includes:
Media Codecs

There are two defined types of media codecs (encoders): audio and video. These are defined in audio_encoder.h and video_encoder.h as abstract classes that provide simple, generalized interfaces to the encoder libraries. The media codec classes are used by the media sources to invoke the media encoders, transforming raw, uncompressed media into encoded media.
Each supported media codec derives a class from the appropriate abstract class, and provides code to map the generic interface to that provided by the codec library.
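For example, a new video encoder wrapper might look roughly like the skeleton below. The base class shown is a stand-in; the real abstract interface in video_encoder.h declares more methods and different signatures.

    // Skeleton of a codec wrapper class; purely illustrative.
    #include <cstdint>

    class CVideoEncoderSketch {
    public:
        virtual ~CVideoEncoderSketch() {}
        virtual bool Init() = 0;
        // Encode one raw YUV12 image into a compressed bitstream.
        virtual bool EncodeImage(const uint8_t* yuvImage,
                                 uint8_t* out, uint32_t* outLen) = 0;
        virtual void Stop() = 0;
    };

    // A concrete wrapper maps the generic interface onto one codec library.
    class CMyCodecVideoEncoder : public CVideoEncoderSketch {
    public:
        bool Init() override {
            // call the codec library's create/configure entry points here
            return true;
        }
        bool EncodeImage(const uint8_t* yuvImage,
                         uint8_t* out, uint32_t* outLen) override {
            // hand the raw image to the codec library and copy back the
            // compressed bitstream and its length
            (void)yuvImage; (void)out;
            *outLen = 0;
            return true;
        }
        void Stop() override {
            // release the codec library's state
        }
    };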
Each encoder type also has a number of calls that set the variables needed for RTP transmission or for saving to an mp4 file. These are likewise defined in audio_encoder.h and video_encoder.h.

New encoders can be added by writing a new video encoder class and adding an entry to video_encoder_tables.cpp, or by writing a new audio encoder class and adding an entry to audio_encoder_tables.cpp. You will also need to add the appropriate hooks to each routine in audio_encoder.cpp and video_encoder.cpp.
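In spirit, those hooks dispatch on the configured encoder name, roughly as follows (continuing the skeleton above; the real routines take the live configuration rather than a bare name, so treat this as a hedged sketch only):

    // Hedged sketch of a creation hook in the style of video_encoder.cpp.
    #include <strings.h>   // strcasecmp

    CVideoEncoderSketch* VideoEncoderCreateSketch(const char* encoderName)
    {
        if (strcasecmp(encoderName, "mycodec") == 0) {
            return new CMyCodecVideoEncoder();
        }
        // ... one branch per supported encoder ...
        return nullptr;   // unknown encoder name
    }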
Media Frames

The output(s) of a media source are "media frames" (media_frame.h). A media frame is a reference-counted structure that contains: a pointer to malloc'ed media data, the media data length, the media type, the timestamp of the media frame, the media frame duration (used for audio only), and the timescale of the media frame duration (ticks per second). If you create your own source, it is imperative that its timestamps be synchronized with those of the other sources.

A media source constructs one or more media frames during its processing of each acquired media frame. Each of these output media frames is sent in a message to all the registered sinks of the source. Note that if there are N sinks, N messages are created that all point to one media frame, i.e. only one copy of the media data exists. As each media sink "frees" the media frame, the reference count is decremented; when the reference count reaches zero, the media data is freed and the media frame is destroyed.
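A sketch of this reference-counting scheme (invented names; see media_frame.h for the real structure):

    // Reference-counted media frame sketch. The count starts at 1 for the
    // creator; the forwarding function adds one reference per sink and
    // then drops the creator's reference.
    #include <atomic>
    #include <cstdint>
    #include <cstdlib>

    enum MediaFrameType { PCMAUDIOFRAME, YUVVIDEOFRAME, MPEG4VIDEOFRAME };

    class CMediaFrameSketch {
    public:
        CMediaFrameSketch(MediaFrameType type, void* data, uint32_t dataLen,
                          uint64_t timestamp, uint32_t duration,
                          uint32_t timescale)
            : m_type(type), m_data(data), m_dataLen(dataLen),
              m_timestamp(timestamp), m_duration(duration),
              m_timescale(timescale), m_refCount(1) {}

        MediaFrameType GetType() const { return m_type; }

        void AddReference() { ++m_refCount; }

        void RemoveReference() {
            if (--m_refCount == 0) {
                free(m_data);   // the single copy of the media data
                delete this;
            }
        }

    private:
        ~CMediaFrameSketch() {}   // destroyed only via RemoveReference()
        MediaFrameType m_type;
        void*     m_data;        // malloc'ed media data
        uint32_t  m_dataLen;
        uint64_t  m_timestamp;   // synchronized across sources
        uint32_t  m_duration;    // audio only
        uint32_t  m_timescale;   // ticks per second for the duration
        std::atomic<int> m_refCount;
    };

    class CMediaSinkSketch {
    public:
        virtual void EnqueueFrame(CMediaFrameSketch* frame) = 0;
    };

    // N sinks get N messages that all point at the one frame.
    void ForwardFrameSketch(CMediaFrameSketch* frame,
                            CMediaSinkSketch* sinks[], int numSinks)
    {
        for (int i = 0; i < numSinks; i++) {
            frame->AddReference();        // one reference per sink message
            sinks[i]->EnqueueFrame(frame);
        }
        frame->RemoveReference();         // drop the source's own reference
    }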
A "media sink" (media_sink.h) is a media node that receives media frames from a media source, and delivers them to a destination device such as a file or network socket. To date media sinks do very little processing of the received media frames. They exist so that there are independent threads that can buffer media frames and block on writes to destination devices. I.e. it would be very bad if the media sources couldn't keep up with the capture devices which in general can't be "paused" without dropping media. Note that while a media sink is blocked on a write to a destination device, memory is being consumed as newly processed media frames pile up in its receive queue. If the media sink can't catch up with the real-time media flow eventually, then memory exhaustion will occur. Let the developer beware! This is a very real concern if you are planning on adding a sink that will use TCP to a remote system as the destination device. A write timeout should be used to avert an ugly program exit.
Note currently, a registered sink receives all the outputted types of media frame from a media source. The media sink is expected to check the type of the received media frame, and reject (free) any types that it does not want.
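Continuing the frame sketch above, a sink's receive handling might look like this (hypothetical; real sinks derive from the class in media_sink.h):

    // A sink receives every frame type its source emits, so it must free
    // the types it does not handle. Illustrative only.
    void OnReceivedFrame(CMediaFrameSketch* frame)
    {
        if (frame->GetType() != MPEG4VIDEOFRAME) {
            frame->RemoveReference();   // reject: not a type this sink wants
            return;
        }
        // ... write the frame to the destination device (may block) ...
        frame->RemoveReference();       // done with this sink's reference
    }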
List of Media Sources

V4L  | video_v4l_source.cpp  | Acquires YUV12 images from a V4L (Video For Linux) device
V4L2 | video_v4l2_source.cpp | Acquires YUV12 images from a V4L2 (Video For Linux 2) device (recommended)
OSS  | audio_oss_source.cpp  | Acquires PCM16 audio samples from an OSS (Open Sound System) device
self | loop_source.cpp       | Acquires raw YUV or PCM from the main process
List of Codecs

Lame     -> MP3 - audio_lame.cpp, installed from lame version 3.92 or later
FAAC     -> AAC - audio_faac.cpp, installed from faac version 1.21 or later
ffmpeg   -> MPEG layer 2, AMR-NB, AMR-WB - installed from ffmpeg version 0.4.7 or later
Xvid     -> MPEG-4 - video_xvid.cpp, mpeg4ip/lib/xvid, or from xvidcore version 0.9.2 or later
Xvid 1.0 -> MPEG-4 - video_xvid10.cpp, from xvidcore version 1.0 RC3 or later
ffmpeg   -> MPEG-4, MPEG-2, H.263 - video_ffmpeg.cpp, installed from ffmpeg version 0.4.7 or later (see the main README for installation instructions)
List of Media Sinks

MP4 | file_mp4_recorder.cpp | Writes media frames to an mp4 file
RTP | rtp_transmitter.cpp | Transmits media frames via RTP/UDP/IP, implementing media-specific RTP payloads as defined in IETF RFCs. An adjunct to the RTP transmitter is the SDP file writer (sdp_file.cpp), which constructs an SDP file that can be used to tune into the RTP streams (an example appears after this table).
SDL Previewer | video_sdl_preview.cpp | Displays video frames on the local display via the SDL multi-platform library
Raw Sink | file_raw_sink.cpp | Writes raw media frames (YUV12 and PCM16) to a local named pipe, enabling the capture devices to be shared between mp4live and another application
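For reference, the kind of session description sdp_file.cpp generates looks roughly like the abbreviated example below. The addresses, ports, and payload numbers are illustrative, and a real file carries additional attribute lines (e.g. fmtp codec configuration).

    v=0
    o=- 1102012800 1102012800 IN IP4 192.168.1.10
    s=mp4live broadcast
    c=IN IP4 224.1.2.3/127
    t=0 0
    m=audio 32770 RTP/AVP 97
    a=rtpmap:97 mpeg4-generic/44100/2
    m=video 32768 RTP/AVP 96
    a=rtpmap:96 MP4V-ES/90000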