BIFS Tutorial - Part III: Including Media
In this part we will learn how to add textures (image, video) and sound to a scene. To do so, we need to understand the notion of media object in MPEG-4 and the associated Object Descriptor, which is used to declare visual (image or video) or audio media in an MPEG-4 scene.
The MPEG-4 standard is defined at ISO/IEC as the "Coding of audio-visual objects" standard. This has little to do with object-oriented software; it means that each media (an image, a video, a sound) is considered as an object that can be used and reused and interactively placed in the scene, independently of its coding format (a video object can be encoded as an MPEG-1, MPEG-2 or MPEG-4 video stream, but also with coding techniques developed outside the MPEG committee). This makes it possible to modify the scene (e.g., the layout of media objects) without re-encoding the media object, or to re-encode the media object at a lower bitrate without changing the scene. The only constraint on media objects as "known" by the scene is their base type (audio or visual).
A Media Object is described by a set of information transmitted through an object descriptor. This information can be split into two distinct sets: on the one hand, information needed by the scene to access and understand the media structure (identifier, time sub-structure or segments); on the other hand, information needed by the terminal to receive, decode and synchronize the media with other objects.
The identifier of a media object is a binary number called ObjectDescriptorID. The XMT representation is slightly different: the binary identifier is called binaryID, while objectDescriptorID holds a textual identifier that makes the document easier to read. The binary number is used to reference objects in the scene graph. Within a scene, no two objects can have the same binary identifier.
The most important notion of a media object is that its data is transported in one or several streams, called ElementaryStreams in MPEG-4. Some of these streams are media data streams (a video stream) and some are meta-data streams (a stream carrying cryptographic data to unlock the object, or description data such as MPEG-7 or MPEG-4 Object Content Information). An object is usually composed of a single media stream, but it may be composed of several: scalable coding (a first stream at low video resolution and an enhancement stream for high quality) is one example, alternate coding (one stream per language, one stream per bandwidth, ...) is another.
Each elementary stream is described by a descriptor called ES_Descriptor. An ES_Descriptor also has a binary identifier (ES_ID, called binaryID in XMT) that is unique in the presentation. This identifier is used to access the stream (local storage in MP4, real-time streaming through RTSP, ...). The ES_Descriptor also stores the decoder configuration (DecoderConfigDescriptor) and the synchronization configuration (SLConfigDescriptor). The DecoderConfigDescriptor indicates at least the type of media (streamType) and the type of coding (objectTypeIndication). It may also carry information that depends on the coding type used (video resolution, audio channels and sampling rate, etc.) as binary data in the decSpecificInfo descriptor. We won't go into the details of the SLConfigDescriptor for now; we will simply use it with the '<predefined value="2"/>' element, as this is the mandatory form for storage in MP4 files.
Here is a simple ObjectDescriptor syntax in XMT:
<ObjectDescriptor objectDescriptorID="TextualIdentifier" binaryID="10">
  <Descr>
    <esDescr>
      <ES_Descriptor ES_ID="StreamtextualIdentifier" binaryID="3">
        <decConfigDescr>
          <DecoderConfigDescriptor objectTypeIndication="MPEG4Visual" streamType="Visual" >
            <decSpecificInfo>
              <DecoderSpecificInfo src="data:application/octet-string,%15%08"/>
            </decSpecificInfo>
          </DecoderConfigDescriptor>
        </decConfigDescr>
        <slConfigDescr>
          <SLConfigDescriptor><predefined value="2"/></SLConfigDescriptor>
        </slConfigDescr>
      </ES_Descriptor>
    </esDescr>
  </Descr>
</ObjectDescriptor>
The list of media types currently supported in the MPEG-4 standard is available here.
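To make the structure of this descriptor concrete, here is a minimal Python sketch that reads it with the standard library and pulls out the fields discussed so far. The helper name `describe_od` and the layout of the returned dictionary are illustrative assumptions, not part of any MPEG-4 tool; the XMT fragment is the one from this section.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote_to_bytes

# The ObjectDescriptor example from this section, unchanged apart from indentation.
XMT_OD = """
<ObjectDescriptor objectDescriptorID="TextualIdentifier" binaryID="10">
  <Descr>
    <esDescr>
      <ES_Descriptor ES_ID="StreamtextualIdentifier" binaryID="3">
        <decConfigDescr>
          <DecoderConfigDescriptor objectTypeIndication="MPEG4Visual" streamType="Visual">
            <decSpecificInfo>
              <DecoderSpecificInfo src="data:application/octet-string,%15%08"/>
            </decSpecificInfo>
          </DecoderConfigDescriptor>
        </decConfigDescr>
        <slConfigDescr>
          <SLConfigDescriptor><predefined value="2"/></SLConfigDescriptor>
        </slConfigDescr>
      </ES_Descriptor>
    </esDescr>
  </Descr>
</ObjectDescriptor>
"""


def describe_od(xmt_text):
    """Hypothetical helper: summarize one ObjectDescriptor and its streams."""
    od = ET.fromstring(xmt_text)
    streams = []
    for es in od.iter("ES_Descriptor"):
        dec = es.find("decConfigDescr/DecoderConfigDescriptor")
        info = dec.find("decSpecificInfo/DecoderSpecificInfo")
        raw = b""
        if info is not None:
            # The decoder-specific bytes are percent-encoded after the comma of the data: URL.
            raw = unquote_to_bytes(info.get("src").split(",", 1)[1])
        predefined = es.find("slConfigDescr/SLConfigDescriptor/predefined")
        streams.append({
            "es_name": es.get("ES_ID"),
            "es_binary_id": int(es.get("binaryID")),
            "stream_type": dec.get("streamType"),
            "object_type": dec.get("objectTypeIndication"),
            "decoder_specific_info": raw,
            "sl_predefined": int(predefined.get("value")),
        })
    return {
        "od_name": od.get("objectDescriptorID"),
        "od_binary_id": int(od.get("binaryID")),
        "streams": streams,
    }


summary = describe_od(XMT_OD)
print(summary["od_name"], summary["od_binary_id"])
for stream in summary["streams"]:
    print(stream)
```

The sketch only reads attributes; it does not validate the descriptor against the standard.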
Just as there are BIFS commands to modify the scene, there are OD commands to modify the set of media objects currently available in the presentation, allowing object descriptors and their stream descriptors to be added, updated or removed.
These commands are packed in Access Units and assigned CompositionTimeStamps, which defines an OD stream just as we have done with BIFS. It is then possible to replace a video with a new one at any time.
Here is an example of an OD command in XMT:
<ObjectDescriptorUpdate>
  <OD>
    <ObjectDescriptor objectDescriptorID="NewTextualIdentifier" binaryID="11">....</ObjectDescriptor>
  </OD>
</ObjectDescriptorUpdate>
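The effect of timed OD commands can be sketched in a few lines of Python. This is a simplified model, not a decoder: the `OdCommand` tuple, the `apply_until` helper, the timestamps and the file names are all hypothetical. It only illustrates how an ObjectDescriptorUpdate that carries an already-known binaryID replaces the object the scene sees under that identifier.

```python
from typing import NamedTuple


class OdCommand(NamedTuple):
    """Hypothetical stand-in for one ObjectDescriptorUpdate command."""
    cts: int        # composition timestamp of the access unit carrying the command
    binary_id: int  # binaryID of the ObjectDescriptor being added or replaced
    source: str     # media the descriptor points to (illustrative file names)


def apply_until(commands, now):
    """Return the binaryID -> media mapping after all commands with cts <= now."""
    objects = {}
    for cmd in sorted(commands, key=lambda c: c.cts):
        if cmd.cts > now:
            break
        # An update with an already-known binaryID replaces the previous descriptor.
        objects[cmd.binary_id] = cmd.source
    return objects


od_stream = [
    OdCommand(cts=0, binary_id=10, source="first_video.media"),
    OdCommand(cts=5000, binary_id=10, source="second_video.media"),
    OdCommand(cts=5000, binary_id=11, source="extra_image.jpg"),
]

print(apply_until(od_stream, now=0))
print(apply_until(od_stream, now=5000))
```

Nodes in the scene keep referring to identifier 10 throughout; only the descriptor behind it changes, which is why the scene itself does not need to be modified.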
In order to bootstrap an MPEG-4 presentation, the terminal must know a few things: where the BIFS stream containing the scene is, whether there is an OD stream describing the objects used by the scene (and where it is), whether the terminal is able to understand all coding tools used in the presentation (profiles and levels), and so on. This information is carried in a special descriptor called InitialObjectDescriptor. The InitialObjectDescriptor is an extension of the ObjectDescriptor and must be placed in the Header element of the XMT document.
<InitialObjectDescriptor>
  <Descr>
    <esDescr>
      <ES_Descriptor ES_ID="BIFSStream" binaryID="1">... ...</ES_Descriptor>
      <ES_Descriptor ES_ID="ODStream" binaryID="2">... ... </ES_Descriptor>
    </esDescr>
  </Descr>
</InitialObjectDescriptor>
Through the Object Descriptor Framework, media inclusion in the scene is fairly simple: the binary identifier is used in URL fields of nodes interfacing with media objects. Here are some of these nodes:
BIFS Node | Description |
Anchor | Hyperlinking between MPEG-4 scenes or to external web pages |
AnimationStream | Playback of a BIFS object (command or anim) |
AudioClip | Playback of an audio object |
AudioSource | Playback of an audio object |
Background2D | Display of background images |
ImageTexture | Reference a still image for texturing |
Inline | Includes an (external) MPEG-4 scene in the main scene |
InputSensor | Receives generic interactions (keyboard, mouse, joystick ...) |
MediaBuffer | Buffering of a media object |
MediaControl | Controls playback of a media object |
MediaSensor | Retrieves information during playback of a media object |
MovieTexture | Reference a video object for texturing |
The XMT syntax used to reference an ObjectDescriptor is:
url="'binaryID'".
Let's see some real-world examples before you get lost...
Step 1 : Declare the OD stream:
<ES_Descriptor ES_ID="ObjectDescriptorStream" binaryID="2">
  <decConfigDescr>
    <DecoderConfigDescriptor objectTypeIndication="MPEG4Systems1" streamType="ObjectDescriptor"/>
  </decConfigDescr>
  <slConfigDescr>
    <SLConfigDescriptor><predefined value="2"/></SLConfigDescriptor>
  </slConfigDescr>
</ES_Descriptor>
Step 2 : Adding the ObjectDescriptor for the image:
<ObjectDescriptorUpdate>
  <OD>
    <ObjectDescriptor objectDescriptorID="JPEGImage" binaryID="10">
      <Descr>
        <esDescr>
          <ES_Descriptor ES_ID="JPEGStream" binaryID="3">
            <decConfigDescr>
              <DecoderConfigDescriptor objectTypeIndication="JPEG" streamType="Visual"/>
            </decConfigDescr>
            <slConfigDescr>
              <SLConfigDescriptor><predefined value="2"/></SLConfigDescriptor>
            </slConfigDescr>
            <StreamSource url="logo_enst.jpg"/>
          </ES_Descriptor>
        </esDescr>
      </Descr>
    </ObjectDescriptor>
  </OD>
</ObjectDescriptorUpdate>
Note that the streamType is "Visual" and the objectTypeIndication is "JPEG". The StreamSource element tells the encoder where the media is located.
Step 3 : Apply a texture:
<Shape>
  <geometry>... ... </geometry>
  <appearance>
    <Appearance>
      <texture><ImageTexture url="'10'"/></texture>
      <material><Material2D filled="true"/></material>
    </Appearance>
  </appearance>
</Shape>
The url field indicates '10' which is the binaryID assigned to the ObjectDescriptor of the image.
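Because the link between the scene and the media is just a number, a mismatch between a url field and the declared binaryID values is easy to introduce. The following Python sketch checks this on a cut-down version of the example. The `unresolved_urls` helper and the `<doc>` wrapper element are hypothetical, the node list is limited to the three media nodes used in this part, and the geometry and stream descriptors are left out.

```python
import xml.etree.ElementTree as ET

# Cut-down version of the image example; <doc> is only a wrapper for this sketch.
DOCUMENT = """
<doc>
  <ObjectDescriptorUpdate>
    <OD>
      <ObjectDescriptor objectDescriptorID="JPEGImage" binaryID="10"/>
    </OD>
  </ObjectDescriptorUpdate>
  <Shape>
    <appearance>
      <Appearance>
        <texture><ImageTexture url="'10'"/></texture>
      </Appearance>
    </appearance>
  </Shape>
</doc>
"""

# Only the media nodes used in this part of the tutorial.
MEDIA_NODES = ("ImageTexture", "MovieTexture", "AudioSource")


def unresolved_urls(xml_text):
    """Return (node name, id) pairs whose url matches no declared ObjectDescriptor."""
    root = ET.fromstring(xml_text)
    declared = [int(od.get("binaryID")) for od in root.iter("ObjectDescriptor")]
    if len(declared) != len(set(declared)):
        # No two objects in a scene may share a binary identifier.
        raise ValueError("duplicate ObjectDescriptor binaryID")
    missing = []
    for node in root.iter():
        if node.tag in MEDIA_NODES and node.get("url"):
            # The attribute value is written as '10', so strip the inner quotes.
            od_id = int(node.get("url").strip("'\" "))
            if od_id not in declared:
                missing.append((node.tag, od_id))
    return missing


print(unresolved_urls(DOCUMENT))
```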
NOTE: The special geometry node Bitmap can be used to display the texture; it defines an implicit rectangle whose size is the size of the texture in pixels. It should be the preferred way to display images or video, since texture mapping is usually a more complex operation than straight pixel blitting.
Let's now add a video to the scene. We will use the NHNT import format, which is supported by most available MPEG-4 tools. More information on NHNT is available here. The NHNT files for the video (the descriptor in Step 2 references stouffer.media) must be placed in the same directory as the XMT document.
Step 1 : Adding the OD stream:
See above.
Step 2 : Adding the OD for the video object:
<ES_Descriptor ES_ID="VideoStream" binaryID="3">
  <decConfigDescr>
    <DecoderConfigDescriptor objectTypeIndication="MPEG4Visual" streamType="Visual"/>
  </decConfigDescr>
  <slConfigDescr>
    <SLConfigDescriptor><predefined value="2"/></SLConfigDescriptor>
  </slConfigDescr>
  <StreamSource url="stouffer.media"/>
</ES_Descriptor>
Step 3 : Apply the texture:
<Shape>
  <geometry>... ... </geometry>
  <appearance>
    <Appearance>
      <texture><MovieTexture url="'10'"/></texture>
      <material><Material2D filled="true"/></material>
    </Appearance>
  </appearance>
</Shape>
The final scene is video1.xmt, video1.bt, video1.mp4.
We will now add some sound to the scene. As above, we will use the NHNT import format and use AAC audio.
Step 1 : Adding the OD stream:
See above.
Step 2 : Adding the OD for the audio object:
<ES_Descriptor ES_ID="AudioStream" binaryID="4">
  <decConfigDescr>
    <DecoderConfigDescriptor objectTypeIndication="MPEG4Audio" streamType="Audio"/>
  </decConfigDescr>
  <slConfigDescr>
    <SLConfigDescriptor><predefined value="2"/></SLConfigDescriptor>
  </slConfigDescr>
  <StreamSource url="audio.media"/>
</ES_Descriptor>
Step 3 : Adding the sound in the scene:
<Sound2D>
  <source><AudioSource url="'10'"/></source>
</Sound2D>
The final scene is audio1.xmt, audio1.bt, audio1.mp4.
Exercise 12 : Use the JPEG example and OD commands to replace the image with another one.
Exercise 13 : Write a simple scene with a background image instead of a background color.
In this part we have seen how the object descriptor framework is used with the BIFS framework to mix synthetic and natural media. We still need to understand how synchronization between objects works in order to author richer content, but you can already author a simple multimedia presentation with sound, images, video and synthetic objects, à la DVD :)