ATTAIN lexical tool agent (ALTA)

  • Description
  • Quick start
  • Using ALTA
  • More details


    Description


    The attain_lex_tool_agent (ALTA) is meant as a tool to help
    developers create lexical entries for words to be used by the
    attain_nl_agent. These words can be words specific to the
    application, or they can be additional definitions for words that
    already exist in the attain_nl_agent or in other agents.
     

    Quick start


    To use ALTA, first make sure that a facilitator and the attain_nl_agent
    are running. Then start ALTA from the command line. Follow the
    directions at the prompt. Entries can be temporarily added to the
    attain_nl_agent for testing by using the "a" command at the
    highest-level prompt. Entries can be saved to a file for further
    editing or use (e.g. by the vocabulary_agent) by using the "s" command
    at the highest-level prompt.

    Using ALTA


    Creating the correct lexical entries for new vocabulary words by hand
    can be a bit tricky, so ALTA is designed to help you with this
    task. If the number of new words or definitions is small, ALTA can be
    used to define all the words. If the number of new words is large, it
    can be used to define a few model words, and the rest of the words
    could be created separately (e.g. using a text editor) based on those
    models.

    The agent works by asking the user a series of questions about how the
    word being defined will be used in the application. Most words have a
    wide range of uses, yet only some of those will be appropriate for a
    given application. For example, in the context of this document,
    "agent" refers to a piece of software, not a person who makes deals
    for a movie star. It is very important always to keep in mind the
    particular uses of the word in the context of the application and not
    get distracted by other uses.

    -Entering the word

    There are four types of information that ALTA will try to ascertain,
    corresponding to the four types of information in a maximally
    specified lexical entry. The first type of information is simply the
    word (or phrase) itself. This in fact is the first thing ALTA will ask
    for. It is important to give the "citation" or "dictionary" form of
    the word. This is the form that would be given in the dictionary. Do
    not enter other forms of the word (see below). For example, define "goose",
    not "geese"; "run", not "ran"; "go", not "went"; etc.

    If the word is already defined in the attain_nl_agent, ALTA will show
    you the definitions. You can either accept those, or define your own
    new entries for the word.

    -The category of the word

    The second type of information is the category or "part of speech"
    (e.g. noun, verb, etc). In some cases, the determination is
    straightforward. However, for nouns, names, and verbs, ALTA will ask
    some questions to verify the category and/or to make some finer
    distinctions. For example, some verbs usually occur with a preposition
    (e.g. "search for something"). See below for more information
    concerning verbs.

    The questions usually ask you to consider whether the word might be
    used in a particular example sentence. It is very IMPORTANT to
    consider those sentences on their own, without mentally adding
    anything else. For example, in defining the verb "prefer", ALTA will
    ask:

        Consider this sentence:

           They always prefer.

        If this is fine AS IS or is nonsense, type y.
        If there is another problem, type n.
        Please answer y or n, or o to start over for "prefer":

    The sentence "They always prefer" is not fine AS IS. You have to
    "prefer" something, you can't just "prefer".

    It is important to distinguish sentences that are grammatically
    incorrect from ones that are merely nonsense. A nonsense
    sentence is one that may not make sense, but which does not have
    grammatical problems. A nonsense sentence might be said in a story or
    a poem, but not in your application. An ungrammatical sentence would
    never be said, not even in a poem. Here's an example of a nonsense
    sentence which is grammatically correct:

       The round triangle slept through the night.

    There are (at least) two nonsensical things about that sentence:
    triangles aren't round and they don't sleep. However, we can
    understand what this *would* mean if triangles were round and
    slept. Here's an example of a nonsense sentence which is not
    grammatically correct:

       The colorless red birds goes.

    This sentence is nonsense since "colorless" and "red" are
    contradictory, but it is also ungrammatical since it should be "birds
    go" (or "bird goes").

    While ALTA tries not to give you nonsense sentences, it may happen. On
    the other hand, finding out which sentences containing your word are
    ungrammatical is an important part of how ALTA determines the category
    for your word.

    Not all of the grammatical distinctions ALTA checks for are relevant
    for the current ATTAIN grammar, but the procedure is there for future
    iterations of the grammar. In some instances, ALTA will determine that
    an additional categorization would probably lead to better accuracy,
    and so will suggest an additional entry. You are always able to delete
    or ignore unwanted entries.

    -The forms of the word

    The third type of information ALTA collects is the different forms of
    the word. For example, nouns have singular and plural forms
    (e.g. goose/geese) and verbs have four additional "tense" forms
    (e.g. "go" has "goes", "going", "went", "gone"). Some words may have
    variants that different people might use. For example, some people say
    "mother-in-laws" as the plural of "mother-in-law" while other people
    say "mothers-in-law". If you think your users might use different
    variants, ALTA allows you to specify them.
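
    As a rough illustration, the forms collected in this step can be
    pictured as a table from a citation form to its other forms, with
    room for variants. The Python sketch below is an assumption made
    for illustration only: the WordForms class, its field names, and
    the form labels are hypothetical and do not reflect the format that
    ALTA or the attain_nl_agent actually uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class WordForms:
    """Hypothetical record: a citation form plus its other forms."""
    citation: str
    # Maps a form label (hypothetical names) to one or more variants.
    forms: Dict[str, List[str]] = field(default_factory=dict)

    def surface_forms(self) -> Set[str]:
        """Return every form of the word, including the citation form."""
        result = {self.citation}
        for variants in self.forms.values():
            result.update(variants)
        return result


def citation_form(surface: str, entries: List[WordForms]) -> Optional[str]:
    """Find the citation form that a given surface form belongs to."""
    for entry in entries:
        if surface in entry.surface_forms():
            return entry.citation
    return None


ENTRIES = [
    WordForms("goose", {"plural": ["geese"]}),
    WordForms("go", {
        "third_singular": ["goes"],
        "present_participle": ["going"],
        "past": ["went"],
        "past_participle": ["gone"],
    }),
    # Two plural variants that different users might type.
    WordForms("mother-in-law", {"plural": ["mothers-in-law", "mother-in-laws"]}),
]

print(citation_form("geese", ENTRIES))
print(citation_form("mother-in-laws", ENTRIES))
```

    Listing both plural variants under one citation form means either
    spelling leads back to the same entry.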

    -The translation of the word

    The fourth type of information ALTA collects is the translation,
    interpretation or "meaning" of the word. The default translation,
    which is what is used if you do not specify a "special" translation,
    is simply the word itself. However, it is often useful to specify that
    several words have the same translation. For example, "manager" and
    "boss" might both get translated as "manager"; or "Michael Jordan",
    "MJ", and "Jordan" might all get translated as "MJordan". You would
    define separate entries for each word, but they would all have the
    same translation and be equivalent for the purposes of natural
    language understanding.

    This technique allows for normalization of names and other words, and
    facilitates database lookups, since the user can use any (defined)
    variant without having to know which one actually occurs in the
    database.
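
    The normalization idea can be sketched in a few lines of Python.
    This is a minimal sketch under stated assumptions, not the
    attain_nl_agent's implementation: the TRANSLATIONS table, the
    DATABASE contents, and the function names are all hypothetical.
    Only the example words and translations come from the text above.

```python
from typing import Optional

# Hypothetical translation table: one entry per separately defined word.
# Words without a "special" translation are simply absent.
TRANSLATIONS = {
    "boss": "manager",
    "Michael Jordan": "MJordan",
    "MJ": "MJordan",
    "Jordan": "MJordan",
}

# Hypothetical database keyed by the normalized translation.
DATABASE = {"MJordan": "record stored under MJordan"}


def translate(word: str) -> str:
    """Return the special translation, or the word itself by default."""
    return TRANSLATIONS.get(word, word)


def lookup(word: str) -> Optional[str]:
    """Look up a record using any defined variant of a name."""
    return DATABASE.get(translate(word))


print(translate("boss"))     # special translation
print(translate("manager"))  # default: the word itself
print(lookup("MJ") == lookup("Michael Jordan"))
```

    Because every variant maps to one key, the database only needs to
    store a single spelling.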
     

    More details


    -A note about multiple words

    Sometimes you will want to define an entry that contains more than one
    word. Names are one example (e.g. "Michael Jordan"), as are certain
    verbs (e.g. "search for"). In all cases EXCEPT verbs, enter the whole
    phrase when ALTA prompts you for the word to define. For verbs, enter
    just the verb. For example, to define "Michael Jordan":
     

         Please enter the word you want to define followed by <return>.
         (remember to use the form of the word you'd find in a
         dictionary):  Michael Jordan
     

    To define "search for":

         Please enter the word you want to define followed by <return>.
         (remember to use the form of the word you'd find in a
         dictionary):  search
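
    The rule above (whole phrase for everything except verbs, just the
    verb for verbs) can be summarized in Python. This is only a sketch:
    the function word_to_enter and its is_verb flag are hypothetical,
    and it assumes the verb is the first word of the phrase, as in
    "search for".

```python
def word_to_enter(phrase: str, is_verb: bool) -> str:
    """Return what to type at ALTA's "word to define" prompt."""
    # Collapse stray whitespace so "search   for" behaves like "search for".
    words = phrase.split()
    if not words:
        return ""
    if is_verb:
        # Assumption: the verb itself is the first word of the phrase.
        return words[0]
    return " ".join(words)


print(word_to_enter("Michael Jordan", is_verb=False))
print(word_to_enter("search for", is_verb=True))
```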

    -More about verbs

    Verbs are by far the trickiest type of word to define, so ALTA tries
    to give you lots of help. One common stumbling block is the use of
    other words that change the verb's meaning.

    For example, we've talked about "search for". Notice that searching
    for something is very different from searching someplace (e.g. the
    Web). There are many, many verbs in English which have a different
    meaning depending on whether they occur with other words, usually
    "prepositions", and on which ones. Here are a few examples:

        watch    vs    watch out    vs    watch out for
        shut     vs    shut up      vs    shut out    vs    shut down
        talk     vs    talk to someone    vs    talk about something
        look     vs    look for     vs    look at
                 vs    look something up    vs    look up to someone

    ALTA will ask a series of questions to determine whether the verb
    occurs with one or two prepositions, and what kind they are. Notice
    the difference between these examples:

         "I ran up a big bill" is the same as  "I ran a big bill up"
         "I ran up a big hill" is NOT the same as  "I ran a big hill up"

    (Note that "I ran a big hill up" is ungrammatical.)
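
    The comparison in these examples amounts to moving the extra word
    after the object and checking whether the sentence is still
    grammatical. The Python sketch below only builds the two word
    orders for you to judge. The function word_orders and its parameter
    names are hypothetical, and ALTA's own questions may be worded
    differently.

```python
from typing import Tuple


def word_orders(subject: str, verb: str, extra: str, obj: str) -> Tuple[str, str]:
    """Build the sentence with the extra word before and after the object."""
    before_object = f"{subject} {verb} {extra} {obj}"
    after_object = f"{subject} {verb} {obj} {extra}"
    return before_object, after_object


for obj in ("a big bill", "a big hill"):
    first, second = word_orders("I", "ran", "up", obj)
    print(f'"{first}"  vs  "{second}"')
```

    If the second sentence of a pair is still grammatical and means the
    same thing, the verb behaves like "ran up" in "ran up a big bill".
    If it is not, the verb behaves like "ran up" in "ran up a big hill".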

    When in doubt, guess. You can then test the completed lexical entry to
    see if the word behaves as you think it should. If not, come back to
    ALTA and try a different set of answers. We've found that with a
    little practice (just a few of these complex verbs), people do get the
    hang of it.