InfoWiz: An Animated Voice Interactive Information System
InfoWiz Project
The InfoWiz project is centered around the idea of putting an
interactive kiosk into the lobby of SRI.
People who have a few minutes to spend should be able to
learn something about SRI, enjoy themselves, and walk away
with a good feeling of having seen something interesting and unusual.
One of the design decisions of the project has been to use speech recognition
as the main form of user input to the system. Using a mouse to
navigate through scrollbars and web links might be unfamiliar
to many visitors to SRI. Using a touchscreen is possible, but, hey,
let's aim high. If we are successful in building a natural
spoken language interface, this should contribute to the user's
overall impression of ease of use, advanced technology, and general
intelligence of the system. If we are not successful, we can always
put back a touch screen.
In order to encourage spoken interaction with the system, we
added an animated character, a cartoon wizard, who attempts
to engage the user in conversations about SRI. As Don Nielson (to
whom this system is dedicated) used to say, talking to a computer
without an animated persona who can speak back is like talking to a
wall. Initially, we did our own animations based on a popular
clip-art figure. In later incarnations of the
interface, we adopted Microsoft Agent's Merlin character to provide
the graphics for the system.
The InfoWiz Kiosk
Interactions with the Wizard
After an audio and visual welcome from the InfoWiz Wizard, the user is
presented with a screen containing a web browser and the Wizard himself.
The Wizard's job is to instruct, advise,
suggest and demonstrate; he will take you on tours of the InfoWiz
information space or the complete set of SRI's web pages, and he
can answer questions you might have along the way.
The Wizard is able to cross window boundaries, and interact with
any part of the display, flying and pointing with his magic wand
to various places on the screen.
All audio interactions with the system occur through a telephone next
to the kiosk display. Users can make natural spoken requests about
information presented on the current web page or about general topics
related to SRI.
The Wizard is capable of answering questions and
providing supplementary information (videos, etc.) about the
InfoSpace. When the user is looking at web pages outside of
the Wizard's own domain (the InfoSpace), the InfoWiz can scroll pages
or help navigate links but, of course, does not have any
special supplementary information about these pages.
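The distinction between pages inside and outside the InfoSpace can be pictured with a small Python sketch. This is only an illustration under assumed names: the `INFOSPACE` table, the page paths, and the `available_help` function are hypothetical and do not come from the InfoWiz implementation.

```python
from typing import Dict, List

# Hypothetical knowledge base: InfoSpace page -> supplementary items.
# The page path and item name are made up for illustration.
INFOSPACE: Dict[str, List[str]] = {
    "infowiz/overview.html": ["overview_video"],
}

# Help the Wizard can offer on any web page.
NAVIGATION: List[str] = ["scroll", "follow_link"]


def available_help(page: str) -> List[str]:
    """Return the kinds of help the Wizard can offer for a page."""
    if page in INFOSPACE:
        # Inside the InfoSpace: navigation, answers, and supplementary items.
        return NAVIGATION + ["answer_question"] + INFOSPACE[page]
    # Outside the InfoSpace: navigation help only.
    return list(NAVIGATION)


inside = available_help("infowiz/overview.html")
outside = available_help("elsewhere/page.html")
```

Here `inside` lists navigation, question answering, and the supplementary video, while `outside` contains only the two navigation actions.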
InfoWiz Kiosk Screen
The Challenge
The biggest challenge of this project is to create a system that
appears intelligent, easy to use, and not overly constraining,
to an untrained, not necessarily technically aware person who
walks in off the street. A potential user may have only a few
minutes to spend with the system, so it must be intuitively and
immediately obvious what to do.
The Approach
Our approach is as follows:
- Provide interaction styles that are familiar even to people who have
never seen Microsoft Windows or a web browser:
a telephone as an input device, and
a human character with whom to converse.
- Combine numerous state-of-the-art artificial intelligence
technologies in a flexible and incremental fashion. Allow the system
to be workable soon, but easily expandable in its capabilities.
- Find a way to allow easy updating of the content, vocabulary and
knowledge required by the system to appear intelligent.
The system is implemented on top of the
Open Agent Architecture™ (OAA™), which provides the means for bringing
together the component technologies in a plug-and-play manner. The OAA
is an open and distributed system; components can be written in different
programming languages, and be distributed over multiple computers.
In addition, technologies can be added incrementally -- we can
start out with a simple natural language processing component, and then
as the system is used and user input analyzed in more detail, we can
replace the prototype natural language agent with our most capable
system.
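The incremental, plug-and-play idea described above can be illustrated with a minimal Python sketch. It is not the OAA API: the `Facilitator` class, its `register` and `solve` methods, the capability name `parse_utterance`, and the two toy language agents are all hypothetical names invented here. The sketch only shows how a broker that routes requests by capability lets a prototype component be replaced by a more capable one without changing the caller.

```python
from typing import Callable, Dict


class Facilitator:
    """Hypothetical broker that routes requests by capability name.

    An illustrative stand-in, not the real OAA interface.
    """

    def __init__(self) -> None:
        self._providers: Dict[str, Callable[[str], dict]] = {}

    def register(self, capability: str, handler: Callable[[str], dict]) -> None:
        """Register (or replace) the agent that provides a capability."""
        self._providers[capability] = handler

    def solve(self, capability: str, request: str) -> dict:
        """Send a request to whichever agent currently provides the capability."""
        if capability not in self._providers:
            raise LookupError(f"no agent provides {capability!r}")
        return self._providers[capability](request)


def keyword_nl_agent(utterance: str) -> dict:
    """Prototype language agent: spots a single keyword."""
    words = utterance.lower().split()
    action = "scroll" if "scroll" in words else "unknown"
    return {"agent": "keyword", "action": action}


def pattern_nl_agent(utterance: str) -> dict:
    """Slightly more capable agent: also extracts a scroll direction."""
    words = utterance.lower().split()
    if "scroll" in words:
        direction = "up" if "up" in words else "down"
        return {"agent": "pattern", "action": "scroll", "direction": direction}
    return {"agent": "pattern", "action": "unknown"}


facilitator = Facilitator()

# Start with the simple prototype agent.
facilitator.register("parse_utterance", keyword_nl_agent)
first = facilitator.solve("parse_utterance", "Please scroll up")

# Later, swap in the more capable agent; the calling code is unchanged.
facilitator.register("parse_utterance", pattern_nl_agent)
second = facilitator.solve("parse_utterance", "Please scroll up")
```

The caller asks for a capability rather than a specific component, which is what makes the replacement invisible to the rest of the system in this toy version.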
The Technologies
A number of different technologies are integrated using the OAA to
create the InfoWiz Kiosk. The plug-and-play nature of OAA has allowed
us to experiment with various versions and vendors for the component
technologies. The basic elements, as pictured below, are
animation graphics, speech recognition, natural language
interpretation, dialog management, text-to-speech, and knowledge
about the InfoSpace.
InfoWiz Architecture
- Speech Recognition is provided by technology created in SRI's
STAR Laboratory and commercialized
by Nuance Communications, a spinoff company. We have also tried IBM's
ViaVoice product in its place.
- Natural Language processing is handled by a mixture of Nuance's NL API
and SRI's DCG-NL parser. Other more powerful systems may be brought in
later as needed.
- Wizard animation is handled by Microsoft Agent.
Under UNIX, we used DFKI's PPP Animated Presentation Agent,
which provided both graphics and a powerful multimedia generation
system based on planning technology.
We also used, for a time, a Java-based animation package called the
Gamelet toolkit, written by Mark Tachi.
- We have written agents enabling us to incorporate either Netscape
or Internet Explorer as the content viewer. A previous incarnation of
the InfoWiz project made use of a modified variation of
NCSA's Mosaic version 2.6.
- Currently, for text output, we use text-to-speech provided as
part of Microsoft Agent. In previous versions,
the Wizard's voice was composed either of sequenced audio files played
by the PlayWav agent or of text-to-speech generated by Entropic's TrueTalk
system. Combining recorded utterances limits what the Wizard can say, but
sounds much better. The system adapts dynamically to the current set of
agents connected to the network, using whichever output modalities are
available.
- To author, test and manage the knowledge that the InfoWiz has
about the InfoSpace, we created our own knowledge management tools and
a dialog processor to help guide the animations and verbal responses
of the character.
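The dynamic choice of output modality mentioned in the text-to-speech item above might look roughly like the following Python sketch. The agent labels `"playwav"` and `"tts"`, the clip table, the file names, and the `choose_output` function are assumptions made for illustration. The preference for recorded audio over synthesized speech follows the remark that recordings sound better but limit what the Wizard can say.

```python
from typing import Dict, List, Optional, Set, Tuple, Union

# Hypothetical table of recorded utterances: text -> sequence of audio files.
# The file names are invented for this example.
RECORDED_CLIPS: Dict[str, List[str]] = {
    "Welcome to SRI": ["welcome.wav", "sri.wav"],
}

Output = Tuple[str, Union[str, List[str]]]


def choose_output(
    utterance: str,
    connected_agents: Set[str],
    clips: Dict[str, List[str]],
) -> Optional[Output]:
    """Pick an output modality from the agents that are currently connected."""
    # Prefer recorded audio when a player is connected and the clips exist.
    if "playwav" in connected_agents and utterance in clips:
        return ("playwav", clips[utterance])
    # Otherwise fall back to synthesized speech if a TTS agent is connected.
    if "tts" in connected_agents:
        return ("tts", utterance)
    # No suitable output agent is available.
    return None


recorded = choose_output("Welcome to SRI", {"playwav", "tts"}, RECORDED_CLIPS)
synthesized = choose_output("Hello there", {"playwav", "tts"}, RECORDED_CLIPS)
unavailable = choose_output("Hello there", {"playwav"}, RECORDED_CLIPS)
```

In this sketch the decision is made per utterance, so the same session can mix recorded and synthesized speech depending on what has been recorded and which agents are connected.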
Since these technologies are brought together by the
Open Agent Architecture,
it is probable that other OAA-enabled agents will be brought to bear on
this project. Existing agents which may be considered for this project
include:
- Small Vision Module (SVM) stereo camera and algorithms for
detecting whether people are near the kiosk.
- Gemini Natural Language Understanding System,
our most robust NL parser for building
spoken language systems.
CommandTalk is an example of a project that first
started using the DCG-NL agent and later upgraded to Gemini as the domain
became better understood.
- Telephone control, e.g., User: "I'm here to see Adam Cheyer", Wizard: "Let me call him for you..."
- FASTUS: a system for extracting information from free text, which might be
applied to extracting knowledge from ever-changing web pages.
- Robots: SRI's family
of mobile robots has been integrated into the OAA (an OAA-coordinated
multiple-robot approach recently won the Office Navigation task of the
AAAI Robot Contest). How about: User: "I'm here to see Adam Cheyer",
Wizard: "Let me send someone to show you the way there..."
If you are interested in the evolution of this project, see the
InfoWiz Project Timeline.