XML.com: XML From the Inside Out


A Realist's SMIL Manifesto

May 29, 2002

Realist: One who is inclined to physical evidence or pragmatism. -- From the Realist Manifesto (1920), written by constructivist authors and brothers Antoine Pevsner & Naum Gabo

The Synchronized Multimedia Integration Language, SMIL, has a less-than-stellar past but a very interesting future. SMIL 2.0 recaptures the simplicity and practicality of declarative synchronization of media introduced by version 1.0, while adding modularization and content-related features much missed in the early version.

The goal of this two-part series is to illustrate best practices and creative uses of SMIL 2.0; in particular the creation of guided-reading documents which push the boundaries of Web narrative technology by combining classic layout and design practices with television-like effects.

The present article deals with the problem of enhancing video inexpensively and dynamically with SMIL 1.0, and it assumes no prior knowledge of SMIL 1.0. It covers the current state of SMIL; the structure and syntax of the language, with examples; and SMIL 1.0's strengths and flaws. It is meant to get you up to speed with the last three years of SMIL, while the next article will show you what is ahead in the coming years, and how SMIL can be a player in improving narrative technology on the Web. (You can download the example files I use in this article, but be warned: they are about 4 MB.)

The State of SMIL

The SMIL project started in 1998 and then, after initial enthusiasm in multimedia circles developing kiosks and similar applications, virtually disappeared from people's attention, in favor of other technologies. With the August, 2001 release of SMIL 2.0, the buzz is starting to return, but SMIL suffers from two main problems: confusion about terminology and the lack of business or artistic orientation in current literature.

Confusion about terminology and versioning

Keeping up with version numbers in commercial multimedia packages is simple; the relevant entities are the "editor" and "player", the versions of which are usually the same, and they are either "beta" or "release". For technical and bureaucratic reasons, things with SMIL are not so simple. First of all, SMIL 2.0 is technically not just a language but a collection of reusable modules (animation, layout, synchronization) which can be independently implemented and used in other languages. Second, as a W3C recommendation, the status of SMIL at any point includes less well-known markers like "Candidate Recommendation" or "Note", which generally do not make the situation any clearer to the intended SMIL public.

In SMIL, elements and attributes are grouped into independent bundles called modules; for example, the layout and region elements are in the Layout module, and the animateColor and animateMotion elements are in the Animation module. SMIL modules can be grouped into a language, called a profile. There are two SMIL profiles, "SMIL 2.0 language profile" and a simplified version, "SMIL 2.0 basic profile", designed for small devices. Both are supersets of the original SMIL 1.0 language.

Modules are designed to be reusable as parts of other XML vocabularies, so vendors or other standards initiatives may decide to implement only parts of SMIL. Examples of this practice include the marriage of XHTML and the SMIL timing module and declarative animation in SVG, implemented by IE6 and Adobe SVG Viewer 2+ respectively. As far as direct SMIL support is concerned, there are a number of SMIL 2.0 players in the making (see side box) but most of the available players still use SMIL 1.0. The examples of the SMIL 2.0 language profile discussed in this article work on SMIL 1.0 players, except where noted.

The other big impediment to popularizing SMIL is the nature of the current literature, which for the most part contains a descriptive overview of each module, its elements and attributes, with occasional examples of a zooming square or a photo slideshow. This documentation pattern doesn't address the communication potential of SMIL or its contribution to the media. It's certainly not going to convince any manager to invest in SMIL development or a creative developer to learn SMIL. The key to popularizing SMIL is to emphasize its potential to expand the possibilities of a media-rich Web, rather than its strictly technical superiority.

The Process

Whether you use SMIL 1.0 or 2.0, the steps involved in creating a presentation are invariably the same. For SMIL 1.0 (hereafter, "SMIL") they are the following:

  • Create an XML document and include the appropriate namespace. The root element is smil, and its children are head and body.
  • In the head element, code the layout of the areas where content can be inserted
  • In the body element, code the references to the content to be inserted; specify where, when, and for how long each element is shown.
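
As a rough illustration of these three steps, here is a minimal skeleton. It is only a sketch: the file name clip.mpg, the region id "main", and the 320x240 size are hypothetical, chosen just to make the structure visible. The namespace URI is the one defined for the SMIL 2.0 language profile; as far as I know, SMIL 1.0 itself defines no namespace, so a strictly 1.0 document would simply omit the xmlns attribute.

```xml
<?xml version="1.0"?>
<!-- Minimal SMIL skeleton. File name, region id, and sizes are hypothetical. -->
<smil xmlns="http://www.w3.org/2001/SMIL20/Language">
  <head>
    <layout>
      <!-- Overall presentation area -->
      <root-layout width="320" height="240"/>
      <!-- One named area where content can be inserted -->
      <region id="main" left="0" top="0" width="320" height="240"/>
    </layout>
  </head>
  <body>
    <!-- What to show (src), where (region), when (begin), and for how long (dur) -->
    <video src="clip.mpg" region="main" begin="0s" dur="10s"/>
  </body>
</smil>
```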

The Problem: Late and Localized Annotations

When you watch even the simplest television show you're watching images composed of several layers of content: the actual video filmed with a camera, the logo of the channel in a corner, annotations (in the case of Figure 1, the name of the band and the song), etc. Some networks add even more layers of content, providing extra data about, say, the drivers in a NASCAR race or trivia about the band in a music video.

Figure 1. Images are composed by superimposing layers

The problem with traditional TV is that all the layers get merged before they are shipped to everyone's television, where people get one flat image. Media like Digital HDTV and the Web using SMIL can keep track of the different components of a presentation. Thus, they can avoid merging layers early by deciding at presentation time to hide or to show content, depending on user preferences or other factors and constraints. For example, a DVD can show or hide captions with script notes synchronized with the movie, at the user's will.

Showing and hiding extra layers of content is just the tip of the iceberg; using SMIL you can position and synchronize any media on top of your video without ever having to decide on a final merge of your pieces. Furthermore, you can combine SMIL with dynamic content and customize and localize your layers, opening new opportunities for information, entertainment, and publicity.
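
One way to sketch this kind of late, per-viewer decision is with the switch element and the test attributes that SMIL 1.0 defines (system-language and system-captions). The fragment below is an illustration under assumptions, not the code of this project: all file names are hypothetical, and the region names are borrowed from Table 1 further on. How well a given player honors each test attribute is something you would need to check in its documentation.

```xml
<par>
  <!-- The footage itself is never altered -->
  <video src="fight.mpg" region="video" dur="30s"/>

  <!-- The player picks the first child it can accept:
       Spanish trivia for Spanish-language viewers, English otherwise -->
  <switch>
    <text src="trivia-es.txt" region="comment" dur="30s" system-language="es"/>
    <text src="trivia-en.txt" region="comment" dur="30s"/>
  </switch>

  <!-- Shown only when the viewer has asked for captions -->
  <text src="captions.txt" region="caption" dur="30s" system-captions="on"/>
</par>
```

Because the layers stay separate until presentation time, adding a third language is a matter of adding one more child to the switch, with no change to the video.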

The Project: Annotating Boxing Footage

What we want from this project is a solution for visually annotating videos, adding layers of content with data dependent on the locale and preferences of each viewer, without having to alter the video itself. This is a very desirable feature for many media sites, which want to inexpensively add dynamic content to their video for publicity and business purposes.

The steps to create an annotated video include

  1. deciding what video and what kind of annotation data we want;
  2. creating a layout for the annotated video: figuring out which region serves which purpose, the size and position of the regions;
  3. deciding the sequence and duration of events; and
  4. modifying the source of the annotations so that they can be localized and customized.

Each step involves not only technical knowledge about the SMIL language, but effective design ideas, which make the difference between a nice experiment and an effective tool.

The Video

The video we will annotate is a portion of a boxing match between Jake La Motta and Sugar Ray Robinson in 1951. I picked this clip because it is small, and because sports feeds are a realistic example of video that can be served by dynamic annotations.

We want to add three kinds of annotation: opening titles, boxers' statistics, and associated trivia. Figure 2 shows a snapshot of the final result versus the original video.

Figure 2. The naked video vs. the Final Result


Layout is the process of arranging elements in a space. Effective layout directs the attention of the user, guiding her through the hierarchy of elements. Layout is accomplished in a variety of ways, like providing a sense of depth, creating contrast between elements, or intuitively sequencing elements.

Directing the viewer's attention to different elements involved in a video is a lot easier than in static graphics because elements can pop up and disappear from the screen. However, important style notions are relevant for our example, especially the notions of regularity, recognition, and depth. Table 1 shows the layout regions for our content, the code necessary to implement them, and their rationale.

Layout Areas Code
     <root-layout id="video"

     <region id="comment" left="10" top="9" 
     width="34" height="29" z-index="1"/>

     <region id="stats" left="105" top="14" 
     width="43" height="75" z-index="1"/>

     <region id="title" left="12" top="99" 
     width="113" height="15" z-index="1"/>

     <region id="caption" left="29" top="90" 
     width="102" height="20" z-index="2"/>

   <!-- Not shown -->

Using a total area no bigger than the video itself promotes the reusability of the annotated video, because we don't have to make compromises or assumptions about the background color of the area not covered by the footage.

Rhythm is important in a layout because it helps the user recognize and classify information. With SMIL annotations, nothing is easier than achieving regularity by consistently showing related information in the same places. We use totally different areas for Tips and Statistics.

Banking on well-known practices is often convenient. Titles and people's names at the bottom are instantly recognized by users, and so are white-on-black captions centered at the bottom, on top of all else.

Table 1. Video Annotation Layout

I've kept the code compatible with SMIL 1.0 because there are very few players for SMIL 2.0 and the ideas introduced here are the same in SMIL 1.0 and 2.0.
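
To see how the fragments of Table 1 fit together, here is a sketch of them assembled into a complete head. The four region elements are copied from the table. The root-layout size is an assumption on my part: the regions reach at most 148 pixels across (105 + 43) and 114 pixels down (99 + 15), so I use 160x120 as a plausible value that contains them all, not as the actual size of the clip.

```xml
<smil>
  <head>
    <layout>
      <!-- ASSUMPTION: 160x120 is a guess that encloses every region below -->
      <root-layout id="video" width="160" height="120"/>

      <!-- Tips and statistics live in clearly separate areas -->
      <region id="comment" left="10" top="9"
              width="34" height="29" z-index="1"/>
      <region id="stats" left="105" top="14"
              width="43" height="75" z-index="1"/>

      <!-- Title and caption overlap near the bottom; the caption's
           higher z-index puts it on top of all else -->
      <region id="title" left="12" top="99"
              width="113" height="15" z-index="1"/>
      <region id="caption" left="29" top="90"
              width="102" height="20" z-index="2"/>
    </layout>
  </head>
  <body>
    <!-- Media elements referencing the regions go here -->
  </body>
</smil>
```

The title occupies rows 99 to 114 and the caption rows 90 to 110, with overlapping columns, which is why the two regions need different z-index values.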

Adding and Grouping Elements

The first elements we want to add to our presentation are the opening credits, which are two simple GIF files. What we want, as specified in the timeline of Figure 3, is for each GIF to appear for 3 seconds, one after the other. To achieve this we reference the media using img elements, and we group them in a seq (for sequence) element, as shown in Listing 1.

     <!-- Layout exactly as in Table 1 -->
     <seq>
       <!-- Each credit is shown for 3 seconds, one after the other -->
       <img src="Intro-Names.gif" region="video" dur="3s"/>
       <img src="Intro-Date.gif" region="video" dur="3s"/>
     </seq>

Figure 3. Timeline for credits

Listing 1. Showing the credits in a sequence

As you can see, specifying a sequence in SMIL is very intuitive. Before getting into more sophisticated ways of specifying synchronization, the prior question is what media types you can synchronize. The media element tags are:

  • img : JPEG or GIF images work on all current players. See the documentation of your player for details. GIF89 transparency is supported in any current player, non-interlaced GIF preferred in RealPlayer.
  • video: MPEG, AVI, RealVideo and other formats for motion clips must be included using this element. The support for different video formats is especially dependent on the player.
  • text: Static text. HTML is not supported in any SMIL 1.0 player.
  • audio: Audio clips including WAV and AU. Also covers streaming audio such as RealAudio.
  • animation: Animation clips. The types supported are especially player-dependent and limited (don't really expect Flash and Mojo support in standalone players). 
  • ref: Any clip not covered by other elements but supported by the player
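
To make the list concrete, here is a small sketch that plays several of these media elements together inside a par (parallel) group. The file names are hypothetical and the region names are taken from Table 1; whether a given player accepts each format is exactly the kind of detail you would have to verify against its documentation.

```xml
<par>
  <!-- Continuous media: the footage and a separate audio track -->
  <video src="fight.mpg" region="video"/>
  <audio src="commentary.wav"/>

  <!-- Discrete media: each starts at its own offset and stays up for dur -->
  <text src="trivia.txt" region="comment" begin="2s" dur="5s"/>
  <img src="stats.gif" region="stats" begin="4s" dur="6s"/>
</par>
```

Note that the audio element takes no region, since it has nothing to display.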

It is important to realize that the existence of an explicit tag does not mean that every SMIL player supports that media type. The incomplete support for some media types in many players is one of the reasons for the slow adoption of SMIL. For example, you cannot see through the transparent areas of a GIF file or include HTML as a media element in any of the current SMIL 1.0 players and support is only partial in SMIL 2.0 players presently.

RealPlayer and QuickTime include extra elements for including vendor-specific "smart text" for effects like tickers and basic formatting. Unless you have to produce SMIL 1.0 specifically for either platform, you should avoid such extensions for the sake of portability.
