BIFS Tutorial - Part III

Including Media

In this part we will learn how to add textures (image, video) and sound to a scene. To do so, we have to understand the notion of a media object in MPEG-4 and the associated Object Descriptor, which is used to declare visual (image or video) or audio media in an MPEG-4 scene.


MPEG-4 and Media Objects:

The MPEG-4 standard is defined at ISO/IEC as the "Coding of audio-visual objects" standard. This is not related to object-oriented software coding. What it means is that each media (an image, a video, a sound) is considered as an object that can be used and reused and interactively placed in the scene, independently of the coding format of the media: a video object can be encoded as an MPEG-1, MPEG-2 or MPEG-4 video stream, but also with coding techniques developed outside the MPEG committee. This makes it possible to modify the scene (e.g. the layout of media objects) without re-encoding the media object, or to re-encode the media object at a lower bitrate without changing the scene. The only constraint on media objects as "known" by the scene is their base type (audio or visual).


The Object Descriptor Framework:

A media object is described by a set of information transmitted through an object descriptor. This information can be split into two distinct sets: on the one hand, information needed by the scene to access and understand the media structure (identifier, time sub-structure or segments); on the other hand, information needed by the terminal to receive, decode and synchronize the media with other objects.

The identifier of a media object is a binary number called ObjectDescriptorID. The XMT representation is slightly different: the binary identifier is carried by the binaryID attribute, while the objectDescriptorID attribute holds a textual identifier that makes the document easier to read. The binary number is used to reference objects in the scene graph. Within a scene, no two objects can have the same binary identifier.

The most important notion of a media object is that its data is transported in one or several streams, called ElementaryStreams in MPEG-4. Some of these streams are media data streams (a video stream) and some are meta-data streams (a stream carrying cryptographic data to unlock the object, or description data such as MPEG-7 or MPEG-4 Object Content Information). Usually an object is composed of a single media stream, but there are cases where an object is composed of several media streams: scalable coding (a first stream at low video resolution and an enhancement stream for high quality) is one example, alternate coding (one stream per language, one stream per bandwidth, ...) is another.

Each elementary stream is described by a descriptor called ES_Descriptor. An ES_Descriptor also has a binary identifier (ES_ID, or binaryID in XMT) that is unique in the presentation. This identifier is used to access the stream (local storage in MP4, real-time streaming through RTSP, ...). The ES_Descriptor also stores the decoder configuration (DecoderConfigDescriptor) and the synchronization configuration (SLConfigDescriptor).

The DecoderConfigDescriptor indicates at least the type of media (streamType) and the type of coding (objectTypeIndication). It may also contain information that depends on the coding type used (video resolution, audio channels and sampling rate, etc.) as binary data in the decSpecificInfo descriptor. We won't go into the details of the SLConfigDescriptor for now; we will just use it with the '<predefined value="2"/>' element, as this is the mandatory form for storage in MP4 files.

Here is a simple ObjectDescriptor syntax in XMT:

<ObjectDescriptor objectDescriptorID="TextualIdentifier" binaryID="10">
  <Descr>
    <esDescr>
      <ES_Descriptor ES_ID="StreamtextualIdentifier" binaryID="3">
        <decConfigDescr>
          <DecoderConfigDescriptor objectTypeIndication="MPEG4Visual" streamType="Visual">
            <decSpecificInfo>
              <DecoderSpecificInfo src="data:application/octet-string,%15%08"/>
            </decSpecificInfo>
          </DecoderConfigDescriptor>
        </decConfigDescr>
        <slConfigDescr>
          <SLConfigDescriptor><predefined value="2"/></SLConfigDescriptor>
        </slConfigDescr>
      </ES_Descriptor>
    </esDescr>
  </Descr>
</ObjectDescriptor>

The list of media types currently supported in the MPEG-4 standard is available here.


Dynamic Modification of ODs:

As there are BIFS commands to modify the scene, there are OD commands to modify the current set of media objects available in the presentation, allowing object descriptors to be added, updated or removed.

These commands are packed in Access Units and assigned CompositionTimeStamps, which makes it possible to define an OD stream as we have done with BIFS. A video can then be replaced by a new one at any time.

Here is an example of an OD command in XMT:

<ObjectDescriptorUpdate>
  <OD>
    <ObjectDescriptor objectDescriptorID="NewTextualIdentifier" binaryID="11">....</ObjectDescriptor>
  </OD>
</ObjectDescriptorUpdate>


The Initial Object Descriptor (IOD) :

In order to bootstrap an MPEG-4 presentation, the terminal must know a few things: where the BIFS stream containing the scene is, whether there is an OD stream describing the objects used by the scene (and where it is), whether the terminal is able to understand all coding tools used in the presentation (profiles and levels), and so on. This information is carried in a special descriptor called InitialObjectDescriptor. The InitialObjectDescriptor is an extension of the ObjectDescriptor and must be placed in the Header element of the XMT document.

<InitialObjectDescriptor>
  <Descr>
    <esDescr>
      <ES_Descriptor ES_ID="BIFSStream" binaryID="1">... ...</ES_Descriptor>
      <ES_Descriptor ES_ID="ODStream" binaryID="2">... ...</ES_Descriptor>
    </esDescr>
  </Descr>
</InitialObjectDescriptor>
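
To show where this descriptor sits, here is a minimal skeleton of an XMT document. It is only a sketch: it assumes the usual XMT-A layout, with a root XMT-A element containing a Header and a Body, and the XML comments stand for content that is not detailed here.

<XMT-A>
  <Header>
    <InitialObjectDescriptor>
      <!-- ES_Descriptors of the BIFS stream and of the OD stream, as above -->
    </InitialObjectDescriptor>
  </Header>
  <Body>
    <!-- BIFS commands (the scene) and OD commands (the media objects) -->
  </Body>
</XMT-A>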


Usage of ODs within the scene:

Through the Object Descriptor Framework, media inclusion in the scene is fairly simple: the binary identifier is used in URL fields of nodes interfacing with media objects. Here are some of these nodes:

BIFS Node          Description

Anchor             Hyperlinking between MPEG-4 scenes or to external web pages
AnimationStream    Playback of a BIFS object (command or anim)
AudioClip          Playback of an audio object
AudioSource        Playback of an audio object
Background2D       Display of background images
ImageTexture       Reference a still image for texturing
Inline             Includes an (external) MPEG-4 scene in the main scene
InputSensor        Receives generic interactions (keyboard, mouse, joystick ...)
MediaBuffer        Buffering of a media object
MediaControl       Controls playback of a media object
MediaSensor        Retrieves information during playback of a media object
MovieTexture       Reference a video object for texturing

The XMT syntax used to reference an ObjectDescriptor is:

url="'binaryID'".

Let's see some real-world examples before you get lost...


Adding an Image (JPEG):

We will take the following JPEG image (198x197 pixels) and use it as a texture. This will also introduce some new nodes.

Step 1 : Declare the OD stream:

As seen above, we need an ObjectDescriptor to describe the image we're planning to use. This descriptor will be added to the scene via an OD command, hence the need to declare an OD stream in the InitialObjectDescriptor:

<ES_Descriptor ES_ID="ObjectDescriptorStream" binaryID="2">
  <decConfigDescr>
    <DecoderConfigDescriptor objectTypeIndication="MPEG4Systems1" streamType="ObjectDescriptor"/>
  </decConfigDescr>
  <slConfigDescr>
    <SLConfigDescriptor><predefined value="2"/></SLConfigDescriptor>
  </slConfigDescr>
</ES_Descriptor>

Step 2 : Adding the ObjectDescriptor for the image:

We now need to write the OD command that will add the ObjectDescriptor of the image to the scene. Since we want the image to be displayed upon loading, we will send this command at time t=0s. As we have seen in the previous part, the body element behaves as a par element with a default timing at 0s. Therefore we only have to add an ObjectDescriptorUpdate in the body element. If we wanted to insert the image at time 10 s, the same rules as for BIFS would apply: a par element with begin attribute at 10 would have to be inserted before the ObjectDescriptorUpdate command. The command itself contains a list of ObjectDescriptors to be added or updated:

<ObjectDescriptorUpdate>
  <OD>
    <ObjectDescriptor objectDescriptorID="JPEGImage" binaryID="10">
      <Descr>
        <esDescr>
          <ES_Descriptor ES_ID="JPEGStream" binaryID="3">
            <decConfigDescr>
              <DecoderConfigDescriptor objectTypeIndication="JPEG" streamType="Visual"/>
            </decConfigDescr>
            <slConfigDescr>
              <SLConfigDescriptor><predefined value="2"/></SLConfigDescriptor>
            </slConfigDescr>
            <StreamSource url="logo_enst.jpg"/>
          </ES_Descriptor>
        </esDescr>
      </Descr>
    </ObjectDescriptor>
  </OD>
</ObjectDescriptorUpdate>

Note that the streamType is "Visual" and the objectTypeIndication is "JPEG". The StreamSource element tells the encoder where to find the media.
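
The delayed insertion mentioned in Step 2 could be sketched as follows. This assumes the par element and begin attribute behave as described for BIFS commands in the previous part; the XML comment stands for the descriptor content already shown in Step 2.

<par begin="10">
  <ObjectDescriptorUpdate>
    <OD>
      <ObjectDescriptor objectDescriptorID="JPEGImage" binaryID="10">
        <!-- same Descr / esDescr / ES_Descriptor content as in Step 2 -->
      </ObjectDescriptor>
    </OD>
  </ObjectDescriptorUpdate>
</par>

With this timing, the image object would only become available to the scene 10 seconds into the presentation.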

Step 3 : Apply a texture:

The final step is to declare in the scene where the texture should be used. A texture can be applied to any geometry node except IndexedLineSet and IndexedLineSet2D. In order to add the texture, we must fill the texture field of the Appearance node attached to the target geometry. By default this field is NULL, which means no texture applies. In 2D, texture mapping is done by stretching the texture to the enclosing rectangle of the geometry and matching bottom-left corners; any texture transformations are then applied.

The XMT syntax used to declare a texture is:

<Shape>
  <geometry>... ... </geometry>
  <appearance>
    <Appearance>
      <texture><ImageTexture url="'10'"/></texture>
      <material><Material2D filled="true"/></material>
    </Appearance>
  </appearance>
</Shape>

The url field indicates '10' which is the binaryID assigned to the ObjectDescriptor of the image.

NOTE: The special geometry node Bitmap can be used to display the texture; it defines an implicit rectangle whose size is the size of the texture in pixels. It should be the preferred way to display images or video, since texture mapping is usually a more complex operation than straight pixel blitting.
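
As a sketch of the Bitmap approach described in the note, the shape from Step 3 could be written as follows. It assumes the same ObjectDescriptor with binaryID 10 and leaves all Bitmap fields at their default values.

<Shape>
  <geometry><Bitmap/></geometry>
  <appearance>
    <Appearance>
      <texture><ImageTexture url="'10'"/></texture>
    </Appearance>
  </appearance>
</Shape>

No explicit geometry size is given here, since the rectangle is derived from the texture itself.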


Adding a video:

Let's now add a video to the scene. We will use the NHNT import format, which is supported by most available MPEG-4 tools. More information on NHNT is available here. The NHNT files for the video are needed and must be placed in the same directory as the XMT document.

Step 1 : Adding the OD stream:

See above.

Step 2 : Adding the OD for the video object:

Same as above, the ES_Descriptor is now:

<ES_Descriptor ES_ID="VideoStream" binaryID="3">
  <decConfigDescr>
    <DecoderConfigDescriptor objectTypeIndication="MPEG4Visual" streamType="Visual"/>
  </decConfigDescr>
  <slConfigDescr>
    <SLConfigDescriptor><predefined value="2"/></SLConfigDescriptor>
  </slConfigDescr>
  <StreamSource url="stouffer.media"/>
</ES_Descriptor>
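
Put together, the complete OD command for the video might look like the sketch below. It reuses the wrapper from the JPEG example; the objectDescriptorID "VideoObject" is a hypothetical label chosen for illustration, and binaryID 10 is assumed because the texture in the next step references '10'.

<ObjectDescriptorUpdate>
  <OD>
    <ObjectDescriptor objectDescriptorID="VideoObject" binaryID="10">
      <Descr>
        <esDescr>
          <ES_Descriptor ES_ID="VideoStream" binaryID="3">
            <decConfigDescr>
              <DecoderConfigDescriptor objectTypeIndication="MPEG4Visual" streamType="Visual"/>
            </decConfigDescr>
            <slConfigDescr>
              <SLConfigDescriptor><predefined value="2"/></SLConfigDescriptor>
            </slConfigDescr>
            <StreamSource url="stouffer.media"/>
          </ES_Descriptor>
        </esDescr>
      </Descr>
    </ObjectDescriptor>
  </OD>
</ObjectDescriptorUpdate>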

Step 3 : Apply the texture:

Same as above, except the ImageTexture node is now a MovieTexture node.

<Shape>
  <geometry>... ... </geometry>
  <appearance>
    <Appearance>
      <texture><MovieTexture url="'10'"/></texture>
      <material><Material2D filled="true"/></material>
    </Appearance>
  </appearance>
</Shape>

The final scene is video1.xmt, video1.bt, video1.mp4.


Adding sound:

We will now add some sound to the scene. As above, we will use the NHNT import format and use AAC audio.

Step 1 : Adding the OD stream:

See above.

Step 2 : Adding the OD for the audio object:

Same as above, the ES_Descriptor is now:

<ES_Descriptor ES_ID="AudioStream" binaryID="4">
  <decConfigDescr>
    <DecoderConfigDescriptor objectTypeIndication="MPEG4Audio" streamType="Audio"/>
  </decConfigDescr>
  <slConfigDescr>
    <SLConfigDescriptor><predefined value="2"/></SLConfigDescriptor>
  </slConfigDescr>
  <StreamSource url="audio.media"/>
</ES_Descriptor>

Step 3 : Adding the sound in the scene:

Add the following nodes in the OrderedGroup.children field:

<Sound2D>
  <source><AudioSource url="'10'"/></source>
</Sound2D>
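
If the sound is added to a scene that already contains the video object, the two ObjectDescriptors need different binary identifiers, since no two objects in a scene can share one. The sketch below shows one possible arrangement; the identifiers "VideoObject", "AudioObject" and the binaryID 20 are hypothetical values chosen for illustration, and the XML comments stand for the ES_Descriptors shown earlier.

<ObjectDescriptorUpdate>
  <OD>
    <ObjectDescriptor objectDescriptorID="VideoObject" binaryID="10">
      <!-- Descr / esDescr with the video ES_Descriptor (binaryID 3) -->
    </ObjectDescriptor>
    <ObjectDescriptor objectDescriptorID="AudioObject" binaryID="20">
      <!-- Descr / esDescr with the audio ES_Descriptor (binaryID 4) -->
    </ObjectDescriptor>
  </OD>
</ObjectDescriptorUpdate>

<Sound2D>
  <source><AudioSource url="'20'"/></source>
</Sound2D>

The MovieTexture keeps referencing '10', while the AudioSource now points to the audio object through '20'.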

The final scene is audio1.xmt, audio1.bt, audio1.mp4.


Exercises :

Exercise 12 : Use the JPEG example and OD commands to replace the image with another one.

Exercise 13 : Write a simple scene with a background image instead of a background color.


Conclusion :

In this part we have seen how the object descriptor framework is used with the BIFS framework to mix synthetic and natural media. We still need to understand how synchronization between objects works for richer authoring, but you can already author a simple multimedia presentation with sound, images, video and synthetic objects, à la DVD :)




Last Modified: 02/04/2005
Cyril Concolato & Jean Le Feuvre © 2002-2005