Multimodal Maps: Demonstration
In order to view the videos included on this page, you must have an AVI
viewer on your system that has been made available to your Web Browser.
You may obtain public domain versions of AVI viewers for your platform at a
variety of sites.
Make sure to configure your Web Browser to launch the viewer when it
encounters .AVI video. For instance, on a UNIX system using Xanim as your
viewer, you would add the following line to your ~/.mailcap file:
# for video, use xanim app. +Ae enables audio option
video/*; xanim +Ae %s
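To make the effect of that entry concrete, here is a minimal sketch of how a browser might resolve it. It is a simplified illustration, not a full mailcap implementation: the helper names (parse_mailcap, matches, viewer_command), the file name demo.avi, and the use of video/x-msvideo as the MIME type for AVI are assumptions made for the example.

```python
import shlex

def parse_mailcap(text):
    """Parse mailcap text into (mime_pattern, command_template) pairs."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = [field.strip() for field in line.split(";")]
        if len(fields) >= 2:
            entries.append((fields[0].lower(), fields[1]))
    return entries

def matches(pattern, mime_type):
    """Return True if a pattern such as 'video/*' covers the given MIME type."""
    ptype, _, psub = pattern.partition("/")
    mtype, _, msub = mime_type.lower().partition("/")
    return ptype == mtype and psub in ("*", "", msub)

def viewer_command(entries, mime_type, filename):
    """Return the command for the first matching entry, or None if none match."""
    for pattern, template in entries:
        if matches(pattern, mime_type):
            # %s in the template stands for the name of the downloaded file
            return template.replace("%s", shlex.quote(filename))
    return None

MAILCAP = """\
# for video, use xanim app. +Ae enables audio option
video/*; xanim +Ae %s
"""

entries = parse_mailcap(MAILCAP)
command = viewer_command(entries, "video/x-msvideo", "demo.avi")
print(command)
```

Because the entry uses the wildcard video/*, the same line would also be selected for other video types, while non-video types fall through unmatched.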
Introduction
What you are about to see is an unedited demonstration of
a synergistic pen / voice interface to a distributed set
of databases. This system is implemented on top of the
Open Agent Architecture, a framework for allowing a society
of software agents to cooperate and communicate in order
to accomplish tasks for the user.
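As a rough illustration of what cooperation among agents can look like, the sketch below routes requests to whichever agent has declared a matching capability. Every name in it (Facilitator, register, solve, the two stub agents, and the placeholder records) is hypothetical and chosen for the example; it is not the Open Agent Architecture's actual interface.

```python
class Facilitator:
    """Hypothetical broker that routes a request to an agent offering it."""

    def __init__(self):
        self._providers = {}  # capability name -> list of handler callables

    def register(self, capability, handler):
        """An agent declares that it can handle the named capability."""
        self._providers.setdefault(capability, []).append(handler)

    def solve(self, capability, **params):
        """Forward the request to the first agent registered for it."""
        handlers = self._providers.get(capability, [])
        if not handlers:
            raise LookupError(f"no agent offers {capability!r}")
        return handlers[0](**params)

# Stub agents: a real system would wrap a recognizer and a database here.
def speech_agent(audio):
    # Placeholder: pretend the audio was recognized as this phrase.
    return "show hotels"

def database_agent(category):
    # Placeholder records, invented purely for illustration.
    records = {"hotels": ["Hotel A", "Hotel B"]}
    return records.get(category, [])

facilitator = Facilitator()
facilitator.register("recognize_speech", speech_agent)
facilitator.register("query_database", database_agent)

# The interface agent chains two requests without knowing who serves them.
text = facilitator.solve("recognize_speech", audio=b"...")
category = text.split()[-1]
results = facilitator.solve("query_database", category=category)
print(text, results)
```

The point of the indirection is that the interface never names a specific agent, so an agent can be replaced or moved to another machine without changing the caller.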
Interface
The user interface provides the following components:
- Title Bar : direction indicator, scale, and current
application and location.
- Map View : the user can write or draw directly on the map
surface. Overlays and icons can be dynamically added to the
map in response to requests by the user. The window only contains
a partial view of a much larger map surface.
- Command Line : handwritten or verbal commands recognized by the
system are displayed here as feedback.
- Feedback windows : The first window displays the current
viewport location and displayed icon locations. A second window
lists current filtering constraints during search. Finally, textual
messages appear in the message area.
The map functionality, interface design, and classes of input data of
the system presented here are based on a design by Oviatt and Cohen,
used by them in a Wizard-of-Oz simulation system designed to explore
complex interactions of modalities (Oviatt: "Multimodal Interactive Maps: Designing for Human Performance", forthcoming journal publication).
The agent-based architecture used to realize Oviatt and Cohen's design is
new, as is its application to travel planning.
Multimodal Interactions
Commands may be entered using handwriting, gesture or speech,
and these modalities may be used together simultaneously.
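One simple way to picture simultaneous use of modalities is to pair each spoken or handwritten command with the pen gesture closest to it in time. The sketch below does only that; the InputEvent type, the fuse function, the two-second window, and the sample events are all assumptions made for illustration and do not describe how the demonstrated system actually combines its inputs.

```python
from dataclasses import dataclass

@dataclass
class InputEvent:
    modality: str    # "speech", "handwriting", or "gesture"
    time: float      # seconds since the start of the session
    content: object  # recognized text, or (x, y) map coordinates for a gesture

def fuse(events, window=2.0):
    """Attach to each command the nearest gesture within `window` seconds."""
    gestures = [e for e in events if e.modality == "gesture"]
    commands = [e for e in events if e.modality in ("speech", "handwriting")]
    fused = []
    for cmd in commands:
        nearby = [g for g in gestures if abs(g.time - cmd.time) <= window]
        target = min(nearby, key=lambda g: abs(g.time - cmd.time), default=None)
        fused.append({
            "command": cmd.content,
            "location": target.content if target else None,
        })
    return fused

events = [
    InputEvent("speech", 1.0, "show hotels here"),
    InputEvent("gesture", 1.4, (120, 80)),       # pen tap on the map
    InputEvent("handwriting", 10.0, "zoom in"),  # no gesture nearby
]
fused = fuse(events)
print(fused)
```

Here the spoken request picks up the location of the pen tap made 0.4 seconds later, while the handwritten command, which has no gesture within the window, is left without a location.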
Video illustrating combined modalities:
Summary
- A natural combination of handwriting, speech and gesture may
be used when making queries. State-of-the-art natural language
and speech recognition systems are integrated using agent
technology.
- The map interface can access existing databases and knowledge
sources including the World Wide Web.
- The distributed agent architecture allows computationally intensive
tasks such as speech recognition, natural language processing and database
access to be offloaded to powerful server machines. The interface
can run on both a desktop machine and a handheld PDA.
- Multimedia output including video can be produced if supported by the
interface machine.
Acknowledgments
The work reported here would not have been possible without
the inspiration of Sharon Oviatt and Phil Cohen, under whose direction
we worked for a year on a project (NSF Grant No. IRI-9213472)
in which the combination of modalities
contained in the interface presented here was crystallized and studied
via simulations. Neither they nor their sponsors, of course, are
responsible for the work presented here.
Adam Cheyer <cheyer@ai.sri.com>
June 1995