As noted in the previous section, direct manipulation and natural language appear to be highly complementary modalities. It is therefore not surprising that a number of multimodal systems combine the two.
Notable among such systems is Cohen's Shoptalk system [[6]], a prototype manufacturing and decision-support system that aids in tasks such as quality assurance monitoring and production scheduling. The natural language module of Shoptalk is based on the Chat-85 natural language system [[26]] and is particularly good at handling time, tense, and temporal reasoning.
A number of systems have focused on combining the speed of speech with the reference provided by direct manipulation of a mouse pointer. Such systems include the XTRA system [[1]], CUBRICON [[15]], the PAC-Amodeus model [[16]], and TAPAGE [[9], [12]].
XTRA and CUBRICON both combine complex spoken input with mouse clicks, using several knowledge sources for reference identification. CUBRICON's domain is a map-based task, making it similar to the application developed in this paper. However, the two differ in that CUBRICON can use direct manipulation only to indicate a specific item, whereas our system offers a richer mix of modalities by adding both gestural and written language as input modalities.
PAC-Amodeus systems such as VoicePaint and Notebook allow the user to synergistically combine vocal and mouse-click commands when interacting with notes or graphical objects. However, due in part to the selected domains, the natural language input is very simple, generally of the style ``Insert a note here.''
TAPAGE is another system that allows true synergistic combination of spoken input with direct manipulation. As with PAC-Amodeus, TAPAGE's domain calls for only simple linguistic input. However, TAPAGE uses a pen-based interface instead of a mouse, allowing gestural commands. TAPAGE, selected as one of the ``building blocks'' for our map application, will be described in more detail in section 4.2.
Other pertinent work regarding the simultaneous combination of hand gestures and gaze can be found in [[2], [13]].