
COP4932: DESIGNING INTERACTIVE ENVIRONMENTS
Jeff Cornett's Project: Synchronized Animation of Audio Narration
Objective:
Project Parameters:
This is not a project in the traditional sense. In other words, it
won't require a grandiose effort, but it will be different from your other
assignments in that you will be defining it, and it must be experimental,
involving the evaluation of some technology. This evaluation has to have
a hypothesis, an experimental design, an implementation to gather data,
an analysis, a final report that includes your conclusions, and a poster
display that summarizes the report. This poster will be on display on Thursday,
December 4. Teams for this assignment will normally have 2 or 3 members.
Larger teams or individual efforts require a good argument.
You must have a proposal to me by Thursday, November 6. This proposal
has to include the names of team members, an abstract describing what you
are proposing to analyze, and a brief first cut description of the experiments
you intend to carry out.
An example of a project might be the analysis of texture map rendering,
with the hypothesis that texture maps whose sizes are powers of two render
faster in VRML browsers. An experiment would obviously need to include
tests of many different sizes of textures. It would also need to include
several browsers and VRML plug-ins. For example, you might use Netscape
4.03 with Cosmo 2.0 and WorldView, and Internet Explorer 4.0 with the same
plug-ins.
A very different example might be the analysis of Java's RMI, with the
hypothesis that RMI solutions are significantly slower than socket solutions
using serialized objects. An experiment would obviously need to include
tests of different numbers and kinds of parameters being passed. A fair
comparison is hard if results are not required, but you shouldn't ignore
the fact that many applications do not need results. Of course, a separate
thread could be used with RMI to create the appearance of asynchronous
communication.
Proposals Due: November 6, Thursday, Week #12.
Papers and Poster Presentations Due: December 4, Thursday, Week #16.
Project Proposal:
Researcher: Jeff Cornett
The purpose of this experiment is to study how well VRML can synchronize
the timed display of animation with an audio stream narration. An example
would be to animate the play-by-play of a crew race. Given a six-minute
continuous audio narration, how well can VRML graphically illustrate movement
over a race course and the relative position of the crews to coincide with
the audio narration?
The hypothesis is that this can be done reliably only if an accurate
timer can be implemented to synchronize graphics at specific points
within a real-time audio stream. I am also concerned that a 15 megabyte
WAV audio file will take too long to download as an internet application.
If necessary, an alternative would be to break up the audio into pieces
that could then be triggered separately to coincide with graphic illustrations.
The experimental design would consist of implementing a VRML environment
for studying the animation of a crew race audio narration. I can
then explore methods of triggering graphics within VRML after first starting
an audio narration. VRML includes a timing mechanism with linear interpolators
that might form the basis for doing this accurately. I must verify
that other simultaneous PC functions or processor speeds do not slow down
the animation clock (the real time audio play should be unaffected).
If these timing methods prove unreliable or impractical, I would try other
approaches to synchronize graphics with audio narration such as through
a user interface, or perhaps resorting to breaking up the narration into
pieces.
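The triggering design described above can be sketched in VRML 2.0. This is a minimal illustration, not the project's actual source: the node names and the audio file name are made up, and clicking the shape starts both the narration and a race clock from the same touch time so they share a common time base.

```vrml
#VRML V2.0 utf8
# Sketch: one click starts the narration and the animation clock together.
DEF START_BALL Transform {
  children [
    DEF TOUCH TouchSensor {}
    Shape {
      appearance Appearance {
        material Material { diffuseColor 0 1 0 }   # the "green ball"
      }
      geometry Sphere { radius 0.5 }
    }
  ]
}

Sound {
  source DEF NARRATION AudioClip {
    url "race_narration.auz"    # hypothetical file name
  }
  minFront 100 maxFront 1000    # keep the narration audible over the course
}

# One clock paces the whole six-minute race.
DEF RACE_CLOCK TimeSensor { cycleInterval 360 }

ROUTE TOUCH.touchTime TO NARRATION.set_startTime
ROUTE TOUCH.touchTime TO RACE_CLOCK.set_startTime
```

Because both nodes receive the same touchTime, the audio stream and the animation clock begin at the same instant, which is the precondition for the synchronization the hypothesis is about.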
Research Results:
The following link is the VRML product of my "feature discovery" research:
(Please be patient for roughly 10 minutes as the audio file downloads
to your machine. Click the green ball to start the race after all
downloading is complete.)
The 1971 IRA Final in VRML 3D with Audio Narration
The following link is a second example from the previous day's second-chance
"repechage" qualifying heat. (Please be patient for roughly 10 minutes
as the audio file downloads to your machine. Click the green ball
to start the race after all downloading is complete.)
The 1971 IRA Repechage #2 in VRML 3D with Audio Narration
To learn more about the historical context for this race, link to the
following website:
The 1971 IRA Final in Text and Pictures
The following summarizes my technology research findings:
- Compressing a 16 meg WAV file: America Online limits users to a
maximum of 2 meg of storage per user ID. Therefore, loading this
file as a single audio stream requires reducing it to
less than 2 meg. I converted the WAV file to AU format, resulting
in a 2.7 meg file -- smaller, but still too large for AOL. Next,
using the GZIP utility, I was able to compress this file down to 1.8 meg
-- small enough to store in one of the five user IDs associated with a
single AOL account.
- Audio file load time: VRML will load an AU or AUZ audio file very
quickly from the local drive. It takes only four or five seconds
to load these files -- regardless of whether they are in AU format or compressed
AUZ format. The decompression is automatic within VRML and
does not seem to slow down the loading process at all. Over the internet
with a 24K connection, it takes about 10 minutes to load a 1.8 meg
AUZ file after starting VRML. Loading an uncompressed 2.7 meg AU
file from my UCF website takes about 12 minutes.
- Synchronized animation: The PositionInterpolator feature of VRML
worked extremely well for pacing each of six crew shells against a 6 minute
race narration. Running other applications such as Java sort threads,
MS Office applications, or other browsers did not seem to affect the ultimate
timing of shell movement. VRML tracks in real time against the PC's
internal clock. If there are animation delays, VRML catches up by
jumping the crews forward to where they belong. I was able to trick
VRML into animation distortions by changing the internal clock
setting ahead by a minute during a race using the Windows Control Panel.
This made the shells jump forward in time to a future location, but the
crews still finished the race in the same order and position as specified.
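The real-time pacing behavior can be sketched as follows. The coordinates and node names here are illustrative (a 2000 meter course along the x axis), not the project's actual values; the point is that the TimeSensor's fraction_changed output runs against the PC clock, so the interpolator always places the shell where it belongs at the current moment, which is why dropped frames produce a catch-up jump rather than a cumulative lag.

```vrml
#VRML V2.0 utf8
# Sketch: a race clock drives a shell along the course in real time.
DEF SHELL Transform {
  children Shape { geometry Box { size 12 0.5 1 } }   # stand-in for a shell
}

DEF RACE_CLOCK TimeSensor {
  cycleInterval 360    # a six-minute race
}

# key holds fractions of the race; keyValue holds course positions.
DEF SHELL_PATH PositionInterpolator {
  key      [ 0, 0.25, 0.5, 0.75, 1 ]
  keyValue [ 0 0 0,  500 0 0,  1000 0 0,  1500 0 0,  2000 0 0 ]
}

ROUTE RACE_CLOCK.fraction_changed TO SHELL_PATH.set_fraction
ROUTE SHELL_PATH.value_changed TO SHELL.set_translation
```

Evenly spaced keys, as here, give a constant speed; uneven spacing is what encodes a crew speeding up or falling back.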
- Audio interruptions: In the course of testing animation timing, I
discovered that you can throw off the audio timing by creating a pause
in the audio playback. By performing a disk-intensive operation,
you can cause the audio to momentarily stop streaming. When it restarts,
it continues where it left off. This causes the audio to lag behind the
real-time animation of the PositionInterpolator. Thus, it is possible to
cause the audio and animation to get out of sync by interrupting the audio.
However, unless you are deliberately trying to, these processes stay synchronized
accurately.
- Digitized audio: I asked our audio technician at work (Time Warner
Full Service Network) to digitize my crew race audio narrations.
Concerned about storage size, he digitized the narration in a manner that
slightly compressed the real-time audio duration. A race narration
that should last about 6 minutes and 20 seconds was recorded so as to reduce
"empty audio space," resulting in a narration that lasted only 5 minutes
and 53 seconds. Thus, when calibrating VRML's position interpolators,
one must time the animation to be synchronized with the digital narration
-- not the actual race narration as played from a cassette tape player.
This could also distort the actual pacing of crews down the race course,
but the elimination of this empty audio seems to be fairly evenly distributed
over the course, judging from realistically proportional 500 meter splits.
The compressed race narration seems to represent the actual historical
race fairly well, although the total race length is reduced in time.
- PositionInterpolator calibration: From a practical standpoint, the
most difficult part of this synchronization process is the effort to precisely
calibrate the PositionInterpolator animation. The original narration
was of a live crew race being broadcast to the fans on shore.
The announcer generally reports when 500 meter markers are reached and
the order of the crews. He may periodically recount the relative
position of crews based on how many shell lengths behind they are.
He also announces which crews seem to be gaining on other crews.
All told, the announcer reports a combination of race locations, crew orders,
shell separations, and indications of relative velocity. VRML's PositionInterpolator
uses linear interpolation over distances and time. To
represent the many changing race conditions with a minimum of interpolation
points, I found it useful to construct an Excel spreadsheet to calibrate
the race parameters to feed into VRML. The spreadsheet also calculated
the crew speeds, locations, and spreads so that these could be explicitly
evaluated for accuracy and realism. The ultimate test was whether
the VRML animation seemed to accurately portray the events being narrated.
If not, adjustments would be made in the spreadsheet model and the results
entered into the VRML program. As a further test of race calibration,
I caused one of the crews to "jump the start" -- not historically accurate
for this race, but a useful race calibration experiment. I also caused
the crews to coast to a stop at the end of the race, rather than stopping abruptly.
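The calibration arithmetic reduces to converting the narration's split times into key fractions. The split times below are hypothetical, not taken from the actual race; they only show the conversion for a 353-second (5:53) recording of a 2000 meter course.

```vrml
# Hypothetical calibration: suppose the narration places a crew's 500 m
# splits at 85 s, 175 s, and 268 s, with the finish at 353 s of a
# 353-second recording. Dividing each split time by the total duration
# gives the key fractions; the 500 m marks give the keyValue positions.
# Linear interpolation between keys then reproduces the reported pacing.
DEF CREW1_PATH PositionInterpolator {
  key      [ 0, 0.241, 0.496, 0.759, 1 ]   # 85/353, 175/353, 268/353
  keyValue [ 0 0 0, 500 0 0, 1000 0 0, 1500 0 0, 2000 0 0 ]
}
```

A spreadsheet row per split makes it easy to sanity-check the implied speed of each segment (500 m divided by the segment's duration) before transcribing the numbers into the VRML file.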
- Viewing angles: VRML offers a "virtual reality" experience.
By constructing various viewpoints around the race course, the VRML viewer
can experience the race from the perspective of an announcer
or race judge following the race, of fans at various points on shore, or
even from an aerial view. By sliding along behind a shell while continuously
colliding with it, you can also experience the view from the coxswain's
seat of a particular crew. This technique is a poor man's animated
viewpoint, but works nearly as well as a fully programmed animated viewpoint.
The VRML experience also provides realistic distortions of crew positions
when viewing the course at an angle other than 90 degrees. The most
accurate viewpoint is the aerial view. Any other view can make it
difficult to accurately interpret relative crew positions and speeds.
These types of distortions also explain why events the race narrator
reported may not actually have been accurate. For example,
in the last 500 meters of the championship race, the announcer reported
that Cornell regained a full-length lead after Washington had closed to within a half
length. This was not true from the superior viewing angle I
had as the Cornell coxswain. As the crews raised their stroke rates
toward a sprint, Washington closed on our lead each time they raised their
stroke rate. When we took our stroke up, we held them but never regained
any lost distance. I chose to animate this part of the race as I remember
it. Part of the true virtual reality experience is that the race narrator
is sometimes inaccurate as he tries to describe the race from
a launch located behind the trailing crew.
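The fixed viewpoints described above can be declared directly in the scene. The positions and descriptions below are illustrative for a 2000 meter course laid out along the x axis, not the project's actual camera placements:

```vrml
#VRML V2.0 utf8
# Sketch: fixed viewpoints the viewer can jump between from the
# browser's viewpoint list.
Viewpoint {
  description "Aerial view"
  position    1000 400 0
  orientation 1 0 0 -1.5708   # rotate -90 degrees about x: look straight down
}
Viewpoint {
  description "Fans at the 1000 m mark"
  position    1000 2 60
}
Viewpoint {
  description "Finish-line judge"
  position    2000 3 40
}
```

The aerial view looks down the course's perpendicular, which is why it is the only viewpoint free of the angular distortions discussed above.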
Link to Jeff's course homepage
Send feedback to Jeff Cornett -- This page last updated November 30, 1997