The attain_lex_tool_agent (ALTA) is meant as a tool to help
developers create lexical entries for words to be used by the
attain_nl_agent. These words can be words specific to the
application, or they can be additional definitions for words that
already exist in the attain_nl_agent or in other agents.
To use ALTA, first make sure that a facilitator and the attain_nl_agent
are running. Then start ALTA from the command line and follow the
directions at the prompt. To add entries to the attain_nl_agent
temporarily for testing, use the "a" command at the highest-level
prompt. To save entries to a file for further editing or use (e.g. by
the vocabulary_agent), use the "s" command at the highest-level
prompt.
Creating the correct lexical entries for new vocabulary words by hand
can be a bit tricky, so ALTA is designed to help you with this
task. If the number of new words or definitions is small, ALTA can be
used to define all the words. If the number of new words is large, it
can be used to define a few model words, and the rest of the words
can be created separately (e.g. using a text editor) based on those
models.
The agent works by asking the user a series of questions about how the
word being defined will be used in the application. Most words have a
wide range of uses, yet only some of those will be appropriate for a
given application. For example, in the context of this document,
"agent" refers to a piece of software, not a person who makes deals
for a movie star. It is very important always to keep in mind the
particular uses of the word in the context of the application and not
get distracted by other uses.
-Entering the word
There are four types of information that ALTA will try to ascertain,
corresponding to the four types of information in a maximally
specified lexical entry. The first type of information is simply the
word (or phrase) itself. This is in fact the first thing ALTA will ask
for. It is important to give the "citation" or "dictionary" form of
the word, that is, the form that would be listed in a dictionary. Do
not enter other forms of the word (see below). For example, define
"goose", not "geese"; "run", not "ran"; "go", not "went".
If the word is already defined in the attain_nl_agent, ALTA will show
you the definitions. You can either accept those, or define your own
new entries for the word.
-The category of the word
The second type of information is the category or "part of speech"
(e.g. noun or verb). In some cases, the determination is
straightforward. However, for nouns, names, and verbs, ALTA will ask
some questions to verify the category and/or to make some finer
distinctions. For example, some verbs usually occur with a preposition
(e.g. "search for something"). See below for more information
concerning verbs.
The questions usually ask you to consider whether the word might be
used in a particular example sentence. It is very IMPORTANT to
consider those sentences on their own, without mentally adding
anything else. For example, in defining the verb "prefer", ALTA will
ask:
    Consider this sentence:
        They always prefer.
    If this is fine AS IS or is nonsense, type y.
    If there is another problem, type n.
    Please answer y or n, or o to start over for "prefer":
The sentence "They always prefer" is not fine AS IS: you have to
"prefer" something; you can't just "prefer". Since the problem is
grammatical rather than a matter of nonsense, the answer here is n.
It is important to distinguish sentences that are grammatically
incorrect from ones that are nonsense. A nonsense sentence is one
that may not make sense, but which does not have grammatical
problems. A nonsense sentence might be said in a story or a poem, but
not in your application. An ungrammatical sentence would never be
said, not even in a poem. Here's an example of a nonsense sentence
which is grammatically correct:
    The round triangle slept through the night.
There are (at least) two nonsensical things about that sentence:
triangles aren't round and they don't sleep. However, we can
understand what this *would* mean if triangles were round and
slept. Here's an example of a nonsense sentence which is not
grammatically correct:
    The colorless red birds goes.
This sentence is nonsense since "colorless" and "red" are
contradictory, but it is also ungrammatical since it should be "birds
go" (or "bird goes").
While ALTA tries not to give you nonsense sentences, it may happen. On
the other hand, finding out which sentences containing your word are
ungrammatical is an important part of how ALTA determines the category
for your word.
Not all of the grammatical distinctions ALTA checks for are relevant
for the current ATTAIN grammar, but the procedure is there for future
iterations of the grammar. In some instances, ALTA will determine that
an additional categorization would probably lead to better accuracy,
and so will suggest an additional entry. You are always able to delete
or ignore unwanted entries.
-The forms of the word
The third type of information ALTA collects is the different forms of
the word. For example, nouns have singular and plural forms
(e.g. goose/geese), and verbs have four "tense" forms besides the
citation form (e.g. "go" has "goes", "going", "went", "gone"). Some
words may have variants that different people might use. For example,
some people say "mother-in-laws" as the plural of "mother-in-law"
while other people say "mothers-in-law". If you think your users might
use different variants, ALTA allows you to specify them.
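
The relationship between a citation form, its other forms, and any
variants can be pictured with a small Python sketch. This is only an
illustration: the WordForms class, the form labels ("plural", "3sg",
and so on), and the citation_index helper are hypothetical names made
up here, and they are not ALTA's prompts or its saved file format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class WordForms:
    """Hypothetical record of one word's forms (not ALTA's file format)."""
    citation: str  # dictionary form, e.g. "goose"
    # Assumed layout: form label -> one or more accepted variants.
    forms: Dict[str, List[str]] = field(default_factory=dict)

    def all_forms(self) -> Set[str]:
        """Every surface form a user might type, including the citation form."""
        found = {self.citation}
        for variants in self.forms.values():
            found.update(variants)
        return found


goose = WordForms("goose", {"plural": ["geese"]})
go = WordForms("go", {"3sg": ["goes"], "ing": ["going"],
                      "past": ["went"], "pastpart": ["gone"]})
# Both plural variants are listed so that either one is recognized.
mil = WordForms("mother-in-law",
                {"plural": ["mothers-in-law", "mother-in-laws"]})


def citation_index(entries):
    """Map each surface form back to the citation form it belongs to."""
    return {form: e.citation for e in entries for form in e.all_forms()}


index = citation_index([goose, go, mil])
```

With such an index, a form like "geese" or "mother-in-laws" leads back
to the single citation form under which the word was defined.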
-The translation of the word
The fourth type of information ALTA collects is the translation,
interpretation or "meaning" of the word. The default translation,
which is what is used if you do not specify a "special" translation,
is simply the word itself. However, it is often useful to specify that
several words have the same translation. For example, "manager" and
"boss" might both get translated as "manager"; or "Michael Jordan",
"MJ", and "Jordan" might all get translated as "MJordan". You would
define separate entries for each word, but they would all have the
same translation and be equivalent for the purposes of natural
language understanding.
This technique allows for normalization of names and other words, and
facilitates database lookups, since the user can use any (defined)
variant without having to know which one actually occurs in the
database.
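
As a rough sketch of why shared translations help with lookups,
consider the following Python fragment. The TRANSLATIONS table, the
RECORDS "database", and the translate and lookup functions are all
hypothetical and exist only for illustration; the attain_nl_agent's
actual data structures are not described in this document.

```python
# Hypothetical translation table: each defined word or phrase maps to
# its translation. The words and translations are the ones used as
# examples in the text above.
TRANSLATIONS = {
    "boss": "manager",
    "Michael Jordan": "MJordan",
    "MJ": "MJordan",
    "Jordan": "MJordan",
}

# Hypothetical database that stores only the normalized key.
RECORDS = {"MJordan": "placeholder record"}


def translate(word):
    """Return the special translation if one is defined, else the word itself."""
    return TRANSLATIONS.get(word, word)


def lookup(word):
    """Find a record using whichever defined variant the user typed."""
    return RECORDS.get(translate(word))


for name in ("Michael Jordan", "MJ", "Jordan"):
    print(name, "->", translate(name), "->", lookup(name))
```

Because "manager" has no special translation in this sketch, it falls
back to the default and translates to itself, which matches the
default behavior described above.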
-A note about multiple words
Sometimes you will want to define an entry that contains more than one
word. Names are one example (e.g. "Michael Jordan"), as are certain
verbs (e.g. "search for"). In all cases EXCEPT verbs, enter the whole
phrase when ALTA prompts you for the word to define. For verbs, enter
just the verb. For example, to define "Michael Jordan":
    Please enter the word you want to define followed by <return>.
    (remember to use the form of the word you'd find in a
    dictionary): Michael Jordan
To define "search for":
    Please enter the word you want to define followed by <return>.
    (remember to use the form of the word you'd find in a
    dictionary): search
-More about verbs
Verbs are by far the trickiest type of word to define, so ALTA tries
to give you lots of help. One common stumbling block is the use of
other words that change the verb's meaning.
For example, we've talked about "search for". Notice that searching
for something is very different from searching someplace (e.g. the
Web). Many English verbs have a different meaning depending on
whether they occur with other words, usually "prepositions", and on
which ones. Here are a few examples:
    watch vs watch out vs watch out for
    shut  vs shut up vs shut out vs shut down
    talk  vs talk to someone vs talk about something
    look  vs look for vs look at
          vs look something up
          vs look up to someone
ALTA will ask a series of questions to determine whether the verb
occurs with one or two prepositions, and what kind they are. Notice
the difference between these examples:
    "I ran up a big bill" is the same as
    "I ran a big bill up"
    "I ran up a big hill" is NOT the same as
    "I ran a big hill up"
(Note that "I ran a big hill up" is ungrammatical.)
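
The comparison above amounts to building the same sentence in two
word orders and judging each one on its own. The short Python sketch
below generates such pairs; the particle_frames function is a
hypothetical helper written for this illustration and is not how ALTA
produces its questions. Deciding whether each sentence is grammatical
is still up to you.

```python
def particle_frames(subject, verb_past, word, obj):
    """Build the two word orders to compare for a verb plus another word."""
    adjacent = f"{subject} {verb_past} {word} {obj}"  # "I ran up a big bill"
    shifted = f"{subject} {verb_past} {obj} {word}"   # "I ran a big bill up"
    return adjacent, shifted


for obj in ("a big bill", "a big hill"):
    adjacent, shifted = particle_frames("I", "ran", "up", obj)
    print(f'"{adjacent}" vs "{shifted}"')
```

If both orders are acceptable and mean the same thing, as with "bill",
the extra word belongs with the verb. If the second order is
ungrammatical, as with "hill", the word is an ordinary preposition
that goes with the following noun phrase.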
When in doubt, guess. You can then test the completed lexical entry to
see if the word behaves as you think it should. If not, come back to
ALTA and try a different set of answers. We've found that with a
little practice (just a few of these complex verbs), people do get the
hang of it.