
COP4932: DESIGNING INTERACTIVE ENVIRONMENTS
Jeff Cornett's Project: Synchronized Animation of Audio Narration
Objective:
Project Parameters:
This is not a project in the traditional sense. In other words, it
won't require a grandiose effort, but it will be different from your other
assignments in that you will be defining it, and it must be experimental,
involving the evaluation of some technology. This evaluation has to have
a hypothesis, an experimental design, an implementation to gather data,
an analysis, a final report that includes your conclusions, and a poster
display that summarizes the report. This poster will be on display on Thursday,
December 4. Teams for this assignment will normally have 2 or 3 members.
Larger teams or individual efforts require a good argument.
You must have a proposal to me by Thursday, November 6. This proposal
has to include the names of team members, an abstract describing what you
are proposing to analyze, and a brief first cut description of the experiments
you intend to carry out.
An example of a project might be the analysis of texture map rendering,
with the hypothesis that texture maps whose sizes are powers of two render
faster in VRML browsers. An experiment would obviously need to include
tests of many different sizes of textures. It would also need to include
several browsers and VRML plug-ins. For example, you might use Netscape
4.03 with Cosmo 2.0 and WorldView, and Internet Explorer 4.0 with the same
plug-ins.
A very different example might be the analysis of Java's RMI, with the
hypothesis that RMI solutions are significantly slower than socket solutions
using serialized objects. An experiment would obviously need to include
tests of different numbers and kinds of parameters being passed. A fair
comparison is hard if results are not required, but you shouldn't ignore
the fact that many applications do not need results. Of course, a separate
thread could be used with RMI to create the appearance of asynchronous
communication.
Proposals Due: November 6, Thursday, Week #12.
Papers and Poster Presentations Due: December 4, Thursday, Week #16.
Project Proposal:
Researcher: Jeff Cornett
The purpose of this experiment is to study how well VRML can synchronize
the timed display of animation with an audio stream narration. An example
would be to animate the play-by-play of a crew race. Given a six-minute
continuous audio narration, how well can VRML graphically illustrate movement
over a race course and the relative position of the crews to coincide with
the audio narration?
The hypothesis is that this can be done reliably only if an accurate
timer can be implemented to synchronize graphics at specific points
within a real-time audio stream. I am also concerned that a 15 megabyte
WAV audio file will take too long to download as an internet application.
If necessary, an alternative would be to break up the audio into pieces
that could then be triggered separately to coincide with graphic illustrations.
The experimental design would consist of implementing a VRML environment
for studying the animation of a crew race audio narration. I can
then explore methods of triggering graphics within VRML after first starting
an audio narration. VRML includes a timing mechanism with linear interpolators
that might form the basis for doing this accurately. I must verify
that other simultaneous PC functions or processor speeds do not slow down
the animation clock (the real time audio play should be unaffected).
If these timing methods prove unreliable or impractical, I would try other
approaches to synchronize graphics with audio narration such as through
a user interface, or perhaps resorting to breaking up the narration into
pieces.
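The triggering design described above can be sketched in VRML 2.0. This is a minimal illustration, not the project's actual source: the node names and the audio file name are made up, and clicking the shape starts both the narration and a race clock from the same touch time so they share a common time base.

```vrml
#VRML V2.0 utf8
# Sketch: one click starts the narration and the animation clock together.
DEF START_BALL Transform {
  children [
    DEF TOUCH TouchSensor {}
    Shape {
      appearance Appearance {
        material Material { diffuseColor 0 1 0 }   # the "green ball"
      }
      geometry Sphere { radius 0.5 }
    }
  ]
}

Sound {
  source DEF NARRATION AudioClip {
    url "race_narration.auz"    # hypothetical file name
  }
  minFront 100 maxFront 1000    # keep the narration audible over the course
}

# One clock paces the whole six-minute race.
DEF RACE_CLOCK TimeSensor { cycleInterval 360 }

ROUTE TOUCH.touchTime TO NARRATION.set_startTime
ROUTE TOUCH.touchTime TO RACE_CLOCK.set_startTime
```

Because both nodes receive the same touchTime, the audio stream and the animation clock begin at the same instant, which is the precondition for the synchronization the hypothesis is about.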
Research Results:
The following link is the VRML product of my "feature discovery" research:
(Please be patient for roughly 10 minutes as the audio file downloads
to your machine. Click the green ball to start the race after all
downloading is complete.)
The 1971 IRA Final in VRML 3D with Audio Narration
The following link is a second example from the previous day's second-chance
"repechage" qualifying heat. (Please be patient for roughly 10 minutes
as the audio file downloads to your machine. Click the green ball
to start the race after all downloading is complete.)
The 1971 IRA Repechage #2 in VRML 3D with Audio Narration
To learn more about the historical context for this race, link to the
following website:
The 1971 IRA Final in Text and Pictures
The following summarizes my technology research findings:
- Compressing a 16 meg WAV file: America Online limits users to a
maximum of 2 meg of storage per user ID. Therefore, loading this
file as a single audio stream requires reducing it to
less than 2 meg. I converted the WAV file to AU format, resulting
in a 2.7 meg file -- smaller, but still too large for AOL. Next,
using the GZIP utility, I was able to compress this file down to 1.8 meg
-- small enough to store in one of the five user IDs associated with a
single AOL account.
- Audio file load time: VRML will load an AU or AUZ audio file very
quickly from the local drive. It takes only four or five seconds
to load these files -- regardless of whether they are in AU format or compressed
AUZ format. The decompression is automatic within VRML and
does not seem to slow down the loading process at all. Over the internet
with a 24K connection, it takes about 10 minutes to load a 1.8 meg
AUZ file after starting VRML. Loading an uncompressed 2.7 meg AU
file from my UCF website takes about 12 minutes.
- Synchronized animation: The PositionInterpolator feature of VRML
worked extremely well for pacing each of six crew shells against a 6 minute
race narration. Running other applications such as Java sort threads,
MS Office applications, or other browsers did not seem to affect the ultimate
timing of shell movement. VRML tracks in real time against the PC's
internal clock. If there are animation delays, VRML catches up by
jumping the crews forward to where they belong. I was able to trick
VRML into animation distortions by changing the internal clock
setting ahead by a minute during a race using the Windows Control Panel.
This made the shells jump forward in time to a future location, but the
crews still finished the race in the same order and position as specified.
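The real-time pacing behavior can be sketched as follows. The coordinates and node names here are illustrative (a 2000 meter course along the x axis), not the project's actual values; the point is that the TimeSensor's fraction_changed output runs against the PC clock, so the interpolator always places the shell where it belongs at the current moment, which is why dropped frames produce a catch-up jump rather than a cumulative lag.

```vrml
#VRML V2.0 utf8
# Sketch: a race clock drives a shell along the course in real time.
DEF SHELL Transform {
  children Shape { geometry Box { size 12 0.5 1 } }   # stand-in for a shell
}

DEF RACE_CLOCK TimeSensor {
  cycleInterval 360    # a six-minute race
}

# key holds fractions of the race; keyValue holds course positions.
DEF SHELL_PATH PositionInterpolator {
  key      [ 0, 0.25, 0.5, 0.75, 1 ]
  keyValue [ 0 0 0,  500 0 0,  1000 0 0,  1500 0 0,  2000 0 0 ]
}

ROUTE RACE_CLOCK.fraction_changed TO SHELL_PATH.set_fraction
ROUTE SHELL_PATH.value_changed TO SHELL.set_translation
```

Evenly spaced keys, as here, give a constant speed; uneven spacing is what encodes a crew speeding up or falling back.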
- Audio interruptions: In the course of testing animation timing, I
discovered that you can throw off the audio timing by creating a pause
in the audio playback. By performing a disk-intensive operation,
you can cause the audio to momentarily stop streaming. When it restarts,
it continues where it left off. This causes the audio to lag behind the
real-time animation of the PositionInterpolator. Thus, it is possible to
cause the audio and animation to get out of sync by interrupting the audio.
However, unless you are deliberately trying to, these processes stay synchronized
accurately.
- Digitized audio: I asked our audio technician at work (Time Warner
Full Service Network) to digitize my crew race audio narrations.
Concerned about storage size, he digitized the narration in a manner that
slightly compressed the real-time audio duration. A race narration
that should last about 6 minutes and 20 seconds was recorded so as to reduce
"empty audio space," resulting in a narration that lasted only 5 minutes
and 53 seconds. Thus, when calibrating VRML's position interpolators,
one must time the animation to be synchronized with the digital narration
-- not the actual race narration as played from a cassette tape player.
This could also distort the actual pacing of crews down the race course,
but the elimination of this empty audio seems to be fairly evenly distributed
over the course, judging from realistically proportional 500 meter splits.
The compressed race narration seems to represent the actual historical
race fairly well, although the total race length is reduced in time.
- PositionInterpolator calibration: From a practical standpoint, the
most difficult part of this synchronization process is the effort to precisely
calibrate the PositionInterpolator animation. The original narration
was of a live crew race being broadcast to the fans on shore.
The announcer generally reports when 500 meter markers are reached and
the order of the crews. He may periodically recount the relative
position of crews based on how many shell lengths behind they are.
He also announces which crews seem to be gaining on other crews.
All told, the announcer reports a combination of race locations, crew orders,
shell separations, and indications of relative velocity. VRML's PositionInterpolator
uses linear interpolation over distances and time. To
represent the many changing race conditions with a minimum of interpolation
points, I found it useful to construct an Excel spreadsheet to calibrate
the race parameters to feed into VRML. The spreadsheet also calculated
the crew speeds, locations, and spreads so that these could be explicitly
evaluated for accuracy and realism. The ultimate test was whether
the VRML animation seemed to accurately portray the events being narrated.
If not, adjustments would be made in the spreadsheet model and the results
entered into the VRML program. As a further test of race calibration,
I caused one of the crews to "jump the start" -- not historically accurate
for this race, but a useful race calibration experiment. I also caused
the crews to coast to a stop at the end of the race, rather than stopping abruptly.
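The calibration arithmetic reduces to converting the narration's split times into key fractions. The split times below are hypothetical, not taken from the actual race; they only show the conversion for a 353-second (5:53) recording of a 2000 meter course.

```vrml
# Hypothetical calibration: suppose the narration places a crew's 500 m
# splits at 85 s, 175 s, and 268 s, with the finish at 353 s of a
# 353-second recording. Dividing each split time by the total duration
# gives the key fractions; the 500 m marks give the keyValue positions.
# Linear interpolation between keys then reproduces the reported pacing.
DEF CREW1_PATH PositionInterpolator {
  key      [ 0, 0.241, 0.496, 0.759, 1 ]   # 85/353, 175/353, 268/353
  keyValue [ 0 0 0, 500 0 0, 1000 0 0, 1500 0 0, 2000 0 0 ]
}
```

A spreadsheet row per split makes it easy to sanity-check the implied speed of each segment (500 m divided by the segment's duration) before transcribing the numbers into the VRML file.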
- Viewing angles: VRML offers a "virtual reality" experience.
By constructing various viewpoints around the race course, the VRML viewer
can experience the race from the perspective of an announcer
or race judge following the race, of fans at various points on shore, or
even from an aerial view. By sliding along behind a shell while continuously
colliding with it, you can also experience the view from the coxswain's
seat of a particular crew. This technique is a poor man's animated
viewpoint, but works nearly as well as a fully programmed animated viewpoint.
The VRML experience also provides realistic distortions of crew positions
when viewing the course at an angle other than 90 degrees. The most
accurate viewpoint is the aerial view. Any other view can make it
difficult to accurately interpret relative crew positions and speeds.
These types of distortions also explain why events the race narrator
reported may not actually have been accurate. For example,
in the last 500 meters of the championship race, the announcer reported
that Cornell regained a full-length lead after Washington had closed to within a half
length. This was not true from the superior viewing angle I
had as the Cornell coxswain. As the crews raised their stroke rates
toward a sprint, Washington closed on our lead each time they raised their
stroke rate. When we took our stroke up, we held them but never regained
any lost distance. I chose to animate this part of the race as I remember
it. Part of the true virtual reality experience is that the race narrator
is sometimes inaccurate as he tries to describe the race from
a launch located behind the trailing crew.
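The fixed viewpoints described above can be declared directly in the scene. The positions and descriptions below are illustrative for a 2000 meter course laid out along the x axis, not the project's actual camera placements:

```vrml
#VRML V2.0 utf8
# Sketch: fixed viewpoints the viewer can jump between from the
# browser's viewpoint list.
Viewpoint {
  description "Aerial view"
  position    1000 400 0
  orientation 1 0 0 -1.5708   # rotate -90 degrees about x: look straight down
}
Viewpoint {
  description "Fans at the 1000 m mark"
  position    1000 2 60
}
Viewpoint {
  description "Finish-line judge"
  position    2000 3 40
}
```

The aerial view looks down the course's perpendicular, which is why it is the only viewpoint free of the angular distortions discussed above.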
Link to Jeff's course homepage
Send feedback to Jeff Cornett -- This page last updated November 30, 1997