Open Agent Architecture (OAA) FAQ
The Open Agent Architecture is a framework in which a community of
software agents running on distributed machines can work together on
tasks assigned by human or non-human participants in the community.
Distributed cooperation and high-level communication are two ideas
central to the foundation of the OAA.
In the Open Agent Architecture, an agent is defined as being software
which conforms to the communication and functional standards imposed by
the OAA. So that multiple agents can cooperate,
OAA agents delegate and receive work requests through a Facilitator agent.
These requests are expressed in an Interagent
Communication Language common to all agents. In addition, agents
possess a common set of functionalities, such as the ability to install
local or remote triggers.
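The delegation described above can be pictured with a small sketch. It is not OAA code and does not use ICL syntax: the `Facilitator` class, its `register` and `solve` methods, and the `add` and `greet` service names are all hypothetical, chosen only to show agents declaring services and a facilitator routing requests to them.

```python
from typing import Callable, Dict, List


class Facilitator:
    """Toy stand-in for a facilitator: routes requests to registered agents.

    Hypothetical illustration only; the real OAA Facilitator and ICL differ.
    """

    def __init__(self) -> None:
        # Maps a service name to the handlers of every agent offering it.
        self._providers: Dict[str, List[Callable[..., object]]] = {}

    def register(self, service: str, handler: Callable[..., object]) -> None:
        """An agent declares that it can handle requests for `service`."""
        self._providers.setdefault(service, []).append(handler)

    def solve(self, service: str, *args: object) -> List[object]:
        """Delegate a request to every provider and collect the answers."""
        return [handler(*args) for handler in self._providers.get(service, [])]


facilitator = Facilitator()
# Two hypothetical agents register the services they can provide.
facilitator.register("add", lambda a, b: a + b)
facilitator.register("greet", lambda name: f"hello, {name}")

# A requesting agent never names the provider; it only states the request.
print(facilitator.solve("add", 2, 3))
print(facilitator.solve("greet", "OAA"))
print(facilitator.solve("unknown"))  # no provider registered: empty list
```

The point of the sketch is the indirection: the requester states what it wants, and the facilitator decides who answers.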
The OAA is useful for building complex systems in which there are many
components, and in which flexibility and extensibility are important.
It is probably true that if you have a frozen system, this system could be
built without using an agent architecture; however, if you want a system
that is adaptable, and can easily be extended (incrementally), agent architectures
offer a new paradigm of solutions.
At SRI, we are using the OAA as a way of integrating commercial (or legacy)
systems with artificial intelligence technologies, such as planning, speech
recognition, natural language, etc.
The ability of humans to communicate easily and naturally with
the community of agents is an important design requirement of the OAA.
The OAA has been used to implement a number of systems where intelligent
agents provide services to humans. These include:
- ADT: A set of Agent Development Tools is being created to simplify the
task of implementing new OAA agents and applications.
- Multimodal Maps:
The Open Agent Architecture is being used to implement
synergistic multimodal interfaces to distributed data sources.
- CommandTalk: a spoken-language interface that will allow commanders
to use natural, spoken English commands to control simulated forces.
- Automated Office:
Control your office from remote locations -- agents
provide access and monitoring of your calendar, email, or database
applications over the telephone, a laptop, or PDA.
In our opinion, at least two characteristics distinguish distributed agents from distributed objects:
- Whereas programs accessing distributed objects must use low-level remote procedure calls or method
invocation, agent communication takes place at a higher, declarative level. The language used by
the agents to communicate must be powerful enough to describe natural language requests, so that humans
can communicate easily with other agents.
- Whereas distributed objects are generally passive sets of data and library routines, distributed agents
can ideally participate actively in a process. Agents may have their own goals and may reason about the
interactions of other agents, proactively providing information when appropriate. In addition, agents
can receive or make requests to install triggers or monitors of various kinds, keeping track of events
for another agent or human peer.
Distribute and coordinate tasks among client agents.
Domain independent. Specific knowledge encoded in meta-agents or
mediator agents.
Multiple facilitators. Facilitators remain lightweight.
ICL
TCP-IP
Local or remote triggers. Data triggers, Event triggers, Time triggers,
Test triggers
The one agent that is used in every agent application is the Facilitator agent.
In addition, a growing number of support agents exist that are domain independent
and can be useful as participants in a given application domain. Examples of such
include:
- Natural Language Agent (three currently exist, with varying characteristics and functionality)
- Speech Recognition Agent
- Text-To-Speech Agent
- Database Agent
- Phone Agent (to permit mobile access to speech recognition)
Agent libraries and tools exist to create agents in the following programming
languages:
- Prolog
- C
- Lisp
- Microsoft Visual Basic
- Borland Delphi
- Sun's JAVA
A new agent can be created in a few minutes to a few days or longer, depending on how easy
it is to access the API of the functional core of the system an agent is interfacing with.
This assumes the programmer is familiar with OAA programming concepts and functionality.
To create a new agent, an agent developer usually performs the following
tasks.
- Specify what services an agent can provide and what services an
agent needs, expressed in the syntax of the Interagent Communication Language (ICL).
- Write the code for the agent, implementing the capabilities declared
solvable by the agent.
- Document the agent's functionalities, usually in Internet-readable
form so as to provide easy access to the documentation through
Web browsers and Internet search tools.
- Add vocabulary for the agent to the appropriate service agents such
as Nuance Speech Recognition or Gemini Natural Language agents.
- Include the agent as a community member for agent-based systems
which might make use of its services.
The Agent Development Tools (ADT) attempt to guide an agent programmer through
the steps required for developing an agent, automating the process as much
as possible.
Currently, OAA agents run on Sun SPARCs (SunOS 4.1.3, Solaris 2.0), SGI (IRIX),
PCs and Wintel PDAs (Microsoft Windows 3.1, Windows for Workgroups, Windows 95, PenWindows).
In addition, user interface agents can be written using JAVA or HTML,
providing platform-independent access to agent applications.
Currently, the Facilitator agent, which is required for all agent systems, runs only on
UNIX machines, so at least one of these machines is required for OAA development and deployment.
Telephone control
To allow a phone agent to receive incoming phone calls, interpret touch
tone input, accept speech input for recognition, and make outgoing
calls on behalf of the user, we use
Product: Computerfone, Model CF-4
Suncoast Systems, Inc.
3100 McCormick St., Box 22
Pensacola, FL 32514
(904) 478-6477
(904) 476-1875 (fax) attn: Neal Collier
Cabling:
The Computerfone has a DB-9 Male connector and is a DCE device
(that is, it uses the same cabling as a modem would).
Text-To-Speech
To provide audio information to the user, we generate messages on PC
interfaces using Creative Labs' Monologue For Windows, and on Suns or SGIs,
we use Entropic's TrueTalk.
Product: TrueTalk
Entropic Research Laboratory, Inc.
600 Pennsylvania Ave., SE suite 202
Washington DC 20003
(202) 547-1420
or contact
Tom Veatch
(415) 322-6329
tv@sprex.com
Product: Monologue For Windows
Creative Labs, Inc.
1901 McCarthy Boulevard
Milpitas CA 95035
(408) 428-6600
(408) 428-6633 (fax)
Handwriting Recognition
For Handwriting Recognition, any PenWindows-compatible recognizer
will work, but we prefer the recognizer from:
Product: Handwriter for Windows
Communication Intelligence Corporation (CIC)
275 Shoreline Dr, Suite 520
Redwood City, CA 94065-1413
(415) 802-7888
Fax: (415) 802-7777
Can be bought for about $200-250 from the following resellers:
CDW (800) 495-4239, part #53808
Delware (800) 449-3355, part #379301
Tiger Direct (800) 888-4437, part #C-62-1000A
PCZone (800) 248-0800, part #w-179-09
CompUSA (800) 266-7872, part #114746
(retail stores in many cities)
There is a Japanese version that is currently available
for the Macintosh. CIC's Japanese office is:
3 + 5276 + 9900 (and 9901)
Speech Recognition
We have developed an agent to interface with the large vocabulary, continuous
speech, speaker independent recognizer from:
Product: Nuance Speech Recognition Toolkit
Nuance Communications
333 Ravenswood, Bld 110
Menlo Park, CA 94025
(415) 614-8254
(415) 462-8201 (fax)
Attn: Troy Kamphuis
troyk@coronacorp.com
Network TCP/IP Software
Agent software should work on any winsock-compliant TCP/IP stack.
We have tested our software using TCP/IP stacks from Distinct,
Netmanage's Chameleon, Frontier Technologies' SuperTCP, and
Microsoft Windows95's stack.
Product: TCPIP Runtime #TCP-103
Distinct Corporation
12901 Saratoga Avenue
Saratoga, CA 95070
(408) 366-8933
Office Systems
A number of office-related systems are accessible through the OAA. These
include UNIX mail, calendar, Internet news and web applications as well
as X.500, Prolog and Oracle database applications.
Simple OAA systems can run on a Sun SPARCstation 1 with 16 MB of memory;
however, some OAA-integrated technologies such as speech recognition or
text-to-speech require more. To run a full OAA system incorporating these
components, we recommend a Sun SPARCstation 20 with 64 MB of memory or a
similarly equipped SGI.
Similarly, OAA user interface agents have been run on a 25 MHz 486
portable computer with 4 MB of memory, but we would recommend a more
powerful system, such as a 50 MHz 486 machine with at least 8 MB,
especially if the system is running Windows 95 instead of Windows 3.1.
A. The OAA group has focused on bringing together advanced computer
technologies that have so far existed only in isolation and, by combining these
technologies, making computers smarter, easier, and friendlier to use for the average person.
Our future directions will continue along this path, towards this ultimate goal,
according to priorities set by our clients.
B. We believe that many interesting technologies have already been integrated into the OAA framework,
and there are still many yet to incorporate. In addition, a number of systems
have been built to demonstrate the capacities of these technologies. However, it is clear
that improvements in technology will continue for many years and we hope that the OAA will continue
to be part of this development.
Problem:
The command "startit" fails with the following output:
% ./startit -noexpect ../office.config
Warning: locale not supported by Xlib, locale set to C
Warning: X locale modifiers not supported, using default
Segmentation fault (core dumped)
Solution:
The problem is that Start-It is not finding the nls/ directory.
That directory is normally installed on Sun machines; a copy
exists on the AIC machines in
/home/trestle4/OAA/demo/nls.
Make sure to setenv XNLSPATH
to the nls directory.
Problem:
Start-It is having trouble executing programs or agents on either
local or remote machines.
Solution:
Make sure your environment supports the following check-list:
- Start-It currently only supports a "csh" or "tcsh" user environment.
It does not work when run from "sh".
- You must be able to rsh FROM the machine startit is running on,
TO every machine which you will run things on, ==> INCLUDING the
"from" machine itself (check the /etc/hosts.equiv file for this)
- Make sure any information that needs to be in the user's environment,
e.g. the $path variable, gets set in the portion of the user's .cshrc
that also runs for non-interactive shells (i.e. outside of any checks for "$?prompt",
or more explicitly, where "$?prompt" is allowed to be false).
Problem:
The colors for Start-It are not correct.
Solution:
Start-It's colors are defined in an X resource file called Startit.
You may copy this file to one of your local directories and modify the
colors to your liking. To make sure that Start-It uses your modified
version of the resource file, set the environment variable
XUSERFILESEARCHPATH to include the directory where your file has been
stored.
Example:
setenv XUSERFILESEARCHPATH ./%N:/home/trestle4/OAA/demo/%
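To see which files such a search path names, here is a minimal sketch of the lookup order. It assumes only the standard X Toolkit convention that %N is replaced by the application class name (Startit here) and that entries are separated by colons; other % substitutions are ignored. The `candidate_files` helper and the second directory are hypothetical.

```python
from typing import List


def candidate_files(search_path: str, class_name: str) -> List[str]:
    """Expand %N in each colon-separated entry, preserving search order."""
    return [entry.replace("%N", class_name)
            for entry in search_path.split(":") if entry]


# Hypothetical path: the current directory first, then a shared directory.
paths = candidate_files("./%N:/usr/local/lib/X11/app-defaults/%N", "Startit")
print(paths)
```

The first existing file in that list wins, so putting your own directory first makes your modified copy take precedence.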
You should set up your microphone, optional preamp, and then
test the whole setup using /usr/demo/SOUND/soundtool. You
are striving for a good clean (flat) signal during silence
and a clean active one during speaking. Set the gain control
on your preamp or the volume level in soundtool appropriately.
You should also play back your recorded test to make sure that
it sounds clear and not distorted.
Be aware that there are two input sources on a Sun (mic and line)
but that these are mutually exclusive (i.e., it's a toggle).
If soundtool doesn't appear to hear you, this is probably the
problem.
Once you have found ideal setting values for your audio
configuration, you probably will want to make them permanent
with respect to Nuance by storing them in ~/nuance-resources.
This is described in a later section.
Setup environment
The following must be executed before doing anything with Nuance:
setenv NUANCE /home/YourNuancePath
source $NUANCE/SETUP
Create a grammar
Grammar files always end in a '.grammar' extension.
Remember that speech grammars are made up of NONTERMINALS
(which start with a capital letter) and combinations of
terminal vocabulary words.
NON_TERM (a b) # means a AND b (in that order)
NON_TERM2 [a b] # means a OR b
NON_TERM3 ?a # means optionally a
NON_TERM4 *a # means 0 or more occurrences of a
NON_TERM5 +a # means 1 or more occurrences of a
Non-terminals can only refer to non-terminals that have been
defined above them in the grammar file (this enforces the regular
grammar constraint).
Special Non-Terminals starting with a '.' are called top-level
Non-Terminals and provide the starting point of the tree
to be recognized.
Here's a simple test grammar, in file 'test.grammar'.
Statement (testing [one two three])
Question (is this a ?great test)
.TOP [Statement Question]
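To check which phrases a small grammar like this accepts before compiling it, a rough sketch such as the following can enumerate them. This is not a Nuance tool: the parser and the `load` and `expand` helpers are hypothetical, and it handles only ( ), [ ] and ? as described above (* and + are left out because they make the phrase list infinite).

```python
import re
from typing import Dict, List, Tuple

# A token is one bracket or "?", or a run of other non-space characters.
TOKEN = re.compile(r"[()\[\]?]|[^\s()\[\]?]+")


def parse(tokens: List[str], pos: int) -> Tuple[tuple, int]:
    """Parse one expression at tokens[pos]; return (tree, next position)."""
    tok = tokens[pos]
    if tok == "?":
        inner, pos = parse(tokens, pos + 1)
        return ("opt", inner), pos
    if tok in ("(", "["):
        close, kind = (")", "seq") if tok == "(" else ("]", "alt")
        items = []
        pos += 1
        while tokens[pos] != close:
            item, pos = parse(tokens, pos)
            items.append(item)
        return (kind, items), pos + 1
    return ("sym", tok), pos + 1


def load(text: str) -> Dict[str, tuple]:
    """Read 'Name expression' lines into a dict of parse trees."""
    rules = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            name, body = line.split(None, 1)
            rules[name], _ = parse(TOKEN.findall(body), 0)
    return rules


def expand(tree: tuple, rules: Dict[str, tuple]) -> List[List[str]]:
    """Return every word sequence the tree can produce."""
    kind, value = tree
    if kind == "sym":
        # A name defined as a rule is a nonterminal; anything else is a word.
        return expand(rules[value], rules) if value in rules else [[value]]
    if kind == "opt":
        return [[]] + expand(value, rules)
    if kind == "alt":
        return [p for item in value for p in expand(item, rules)]
    phrases: List[List[str]] = [[]]  # "seq": one choice from each item, in order
    for item in value:
        phrases = [a + b for a in phrases for b in expand(item, rules)]
    return phrases


GRAMMAR = """
Statement (testing [one two three])
Question (is this a ?great test)
.TOP [Statement Question]
"""
rules = load(GRAMMAR)
phrases = [" ".join(words) for words in expand(rules[".TOP"], rules)]
for phrase in phrases:
    print(phrase)
```

For the test grammar this lists five phrases: the three "testing ..." statements and the question with and without "great".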
Compile the grammar
To compile the grammar, type:
nuance-compile test ptm6
test is the basename for your grammar file, ptm6 is one of the
model sets that can be used (different model sets possess different
characteristics of robustness/speed).
If a word in your grammar file is unknown in Nuance's default
dictionary, the compile will print an error and put the word
into a file called 'test.missing'. You should then create a file
called 'test.dictionary' where you give the pronunciation for all
missing words in 'test.grammar'. You must specify the pronunciation
using phonemes listed in the Nuance manual (or in the ADT
users manual). An easy way to figure out the pronunciations for
a word is to look up pronunciations for words that sound similar
using the command 'pronounce word'.
After a successful compilation, Nuance creates a directory named
'test' containing all the required files for speech recognition.
This directory is now called a 'package'.
Nuance provides two tools for testing recognition: sample-application
(a text-based tester) and Xapp (an XWindows-based one).
To test your grammar, try:
sample-application -package ./test
OR
Xapp -package ./test
Sample phrases from our test grammar above might be:
testing one
is this a great test
is this a test
testing two
If you have problems (for example, sample-application doesn't appear to
hear you when you speak), you should know that starting Nuance will
set all audio parameters to their default values as defined in the
file $NUANCE/data/nuance-resources.defaults. You can override the
default values by putting a nuance-resources file in your home
directory, with values appropriate to your audio setup. Here is
my nuance-resources file which sets some audio parameters such as
volume, input source, etc:
audio.InputSource line # may also be mic
audio.InputVolume 200 # range 0-255
audio.OutputVolume 200 # range 0-255
client.WriteWaveforms FALSE # don't save utterances
ep.EndSeconds 0.75 # end of speech silence wait
rec.BacktraceFinalsOnly TRUE # only allow complete sentences
config.ServerDebugWindow TRUE # show server trace window
Nuance has many parameters, which are listed in their manual.
Parameters may also be given at runtime for any Nuance-enabled
program as command line arguments: this is useful for testing
values to find the best one.
Example: sample-application -package test audio.InputVolume=150
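The way file values and command-line values combine can be sketched as follows. This is only an illustration, not Nuance code: the `parse_resources` and `apply_overrides` helpers are hypothetical, the file format is inferred from the example above ("name value", with "#" starting a comment), and we assume that a name=value argument takes precedence over the file.

```python
from typing import Dict, List


def parse_resources(text: str) -> Dict[str, str]:
    """Parse 'name value  # comment' lines into a dict of strings."""
    params: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comment
        if not line:
            continue
        name, value = line.split(None, 1)
        params[name] = value.strip()
    return params


def apply_overrides(params: Dict[str, str], args: List[str]) -> Dict[str, str]:
    """Return a copy of params updated by any 'name=value' arguments."""
    merged = dict(params)
    for arg in args:
        if "=" in arg:  # other arguments, such as -package, are ignored here
            name, value = arg.split("=", 1)
            merged[name] = value
    return merged


RESOURCES = """\
audio.InputSource line          # may also be mic
audio.InputVolume 200           # range 0-255
ep.EndSeconds 0.75              # end of speech silence wait
"""

defaults = parse_resources(RESOURCES)
effective = apply_overrides(
    defaults, ["-package", "test", "audio.InputVolume=150"]
)
print(effective)
```

Once a value tried this way works well, copy it into your nuance-resources file to make it permanent.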
If you have made it to this step, you should have no problem with
the next one. SRX is just an OAA-enabled Nuance client application,
very similar to the code in Xapp.
First, of course, run a Facilitator.
Start srx using
srx -rf srx.rf -package ./test
If Nuance complains about an incompatible version number, then the
version of srx that you are using is not compatible with the version
of Nuance defined by your $NUANCE variable. Either:
- obtain a version of srx that is compiled with the library
files used by your current version of $NUANCE, or
- obtain a version of Nuance that is compatible with your
version of srx.
srx opens up an XWindows display which lets you test locally.
Clicking on the NextGrammar button will cycle through all the
top-level nonterminals in your grammar file. When you see one you
like, press enter in the Do: textfield.
Pressing the clickToTalk button will start recognition. Abort
will abort recognition. You should set Trace to be On in the
Edit menu.
You can also test speech recognition by sending speech solvables
from another agent. For instance, from the XWindows OAA interface
agent, you could try solve(recognize(X),[]) to start recognition.
Problem
I want to add new user bitmaps to the Office Assistant user
interface on the PC. Also, how can I change the default user,
which always appears to be "Adam Cheyer"?
Solution
You must edit the file /windows/interfac.ini.
[Configuration]
Use Mailtalk Agent=0
NumUsers=5
CurrentUser=1
[User Data]
LastName1=Cheyer
FirstName1=Adam
Picture1=C:\AGENT\BITMAPS\CHEYER.BMP
LastName2=Moran
FirstName2=Douglas
Picture2=C:\AGENT\BITMAPS\MORAN.BMP
LastName3=Martin
FirstName3=David
Picture3=C:\AGENT\BITMAPS\MARTIN.BMP
LastName4=Perrault
FirstName4=Raymond
Picture4=C:\AGENT\BITMAPS\PERRAULT.BMP
LastName5=Julia
FirstName5=Luc
Picture5=C:\AGENT\BITMAPS\JULIA.BMP
LastName6=
FirstName6=
Picture6=
LastName7=
FirstName7=
Picture7=
LastName8=
FirstName8=
Picture8=
You can specify your own users and their photos, as well as which
user is the default user (CurrentUser=1).
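If you would rather script the change than edit the file by hand, the following sketch shows one way to do it with Python's standard configparser module. It rests on assumptions read off the sample file: that NumUsers counts the filled slots and that the next user goes in slot NumUsers+1. The `add_user` helper, the user "Jane Doe" and her bitmap path are hypothetical, and we have not checked whether the interface accepts more than the eight slots shown.

```python
import configparser
import io


def add_user(ini_text: str, first: str, last: str, picture: str,
             make_default: bool = False) -> str:
    """Return ini_text with one more user filled into the next slot."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key capitalisation such as "NumUsers"
    parser.read_string(ini_text)
    config, users = parser["Configuration"], parser["User Data"]

    slot = int(config["NumUsers"]) + 1  # assumption: slots are filled in order
    users[f"LastName{slot}"] = last
    users[f"FirstName{slot}"] = first
    users[f"Picture{slot}"] = picture
    config["NumUsers"] = str(slot)
    if make_default:
        config["CurrentUser"] = str(slot)

    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)  # write key=value
    return out.getvalue()


SAMPLE = """\
[Configuration]
Use Mailtalk Agent=0
NumUsers=1
CurrentUser=1
[User Data]
LastName1=Cheyer
FirstName1=Adam
Picture1=C:\\AGENT\\BITMAPS\\CHEYER.BMP
LastName2=
FirstName2=
Picture2=
"""

updated = add_user(SAMPLE, "Jane", "Doe",
                   "C:\\AGENT\\BITMAPS\\DOE.BMP", make_default=True)
print(updated)
```

Keep a backup of interfac.ini before writing the result back, since the rewritten file may differ from the original in blank lines.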
Problem
The computer (especially a Sun workstation) doesn't "talk" to the Computerfone.
Solution 1
The manual is initially confusing on whether the Computerfone is
DTE (terminal) or DCE (modem):
"whatever cable your host would normally use to connect it
to a modem or terminal would probably be adequate"
(page 2-12 of User Guide of 1-95).
Later on, it is documented as DCE (page 2-13),
which is what you would expect since it is similar to a modem.
To connect it to a Sun Workstation (DTE),
we use an AT Serial Modem Cable, DB9 Female to DB25 Male.
If you don't know if your cable is a modem cable or null modem cable,
try inserting a null modem into the connection chain
(modem + null-modem = null-modem; null-modem + null-modem = modem).
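The cable arithmetic above is just a parity count: each null-modem segment swaps the wiring once, so an odd number of them gives a null-modem link and an even number gives a straight modem link. A small sketch, with hypothetical helper names, and using the usual serial rule that a DTE-to-DCE connection wants a straight link while two devices of the same kind want a crossed one:

```python
from typing import List


def effective_wiring(segments: List[str]) -> str:
    """Combine cable/adapter segments into 'modem' or 'null-modem'."""
    crossings = sum(1 for kind in segments if kind == "null-modem")
    return "null-modem" if crossings % 2 else "modem"


def chain_suits(end_a: str, end_b: str, segments: List[str]) -> bool:
    """True if the chain matches what the two device types need."""
    wanted = "modem" if {end_a, end_b} == {"DTE", "DCE"} else "null-modem"
    return effective_wiring(segments) == wanted


# Sun workstation (DTE) to Computerfone (DCE):
print(effective_wiring(["modem", "null-modem"]))
print(effective_wiring(["null-modem", "null-modem"]))
print(chain_suits("DTE", "DCE", ["modem"]))
print(chain_suits("DTE", "DCE", ["null-modem"]))
```

So if an unknown cable fails on its own but works with one null-modem adapter added, the cable itself was a null-modem cable.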
Solution 2
On Sun workstations where TTY ports A and B are combined in a single DB-25
connector, port B does NOT have modem control.
If you are using port B, you need to set the DIP jumper
in your Computerfone
to use XON/XOFF handshaking.
At the time of this writing, this was DIP G.
Solution 3
Many serial devices work in either RS-232 or the newer RS-423 mode.
The TTY ports on newer
Sun Workstations are coming configured as RS-423,
resettable to RS-232 by jumpers on the main board.
The Computerfone Manual says it has an RS-232 interface,
and on more than one occasion, switching the jumpers in a Sun to RS-232
fixed a problem with the Sun not talking to the Computerfone.
However, we have also had Computerfones work on Suns where the jumpers
were still set to RS-423.
For the Sun SPARCstation 10,
see "Changing the Serial Port Jumpers"
(pages 60-61)
in the "Desktop SPARC Hardware Owner's Guide--December 1992".
For the Sun SPARCstation 20,
see "Changing the RS423/232 Jumpers"
(chapter 6; pages 24-27)
in the "SPARCstation 20 Installation Guide".
In both the SS-10 and SS-20, the pair of jumpers that control this selection
are labeled J0801 and J0802
and are located very close to the rear panel
(requires some dexterity to switch the jumpers).
For the Sun Ultra, ****???