The goal of the Multimodal Map application is to explore natural ways of communicating with a community of agents. Inspired by the way a professor instructs students at a blackboard, through combinations of drawing, writing, speaking, gesturing, circling, underlining, and so forth, the Multimodal Map provides an interactive interface on which the user may draw, write, or speak. In a travel planning domain (Figure 5), available information includes hotel, restaurant, and tourist-site data retrieved by distributed software agents from commercial Internet sites. The types of user interactions and multimodal issues handled by the application can be illustrated by a brief scenario from [Cheyer et al. 1998], featuring working examples taken from the current system.
Sara is planning a business trip to San Francisco, but would like to schedule some activities for the weekend while she is there. She turns on her laptop PC, executes a map application, and selects San Francisco.

2.1 [Speaking] Where is downtown? Map scrolls to appropriate area.
2.2 [Speaking and drawing region] Show me all hotels near here. Icons representing hotels appear.
2.3 [Writes on a hotel] Info? A textual description (price, attributes, etc.) appears.
2.4 [Speaking] I only want hotels with a pool. Some hotels disappear.
2.5 [Draws a crossout on a hotel that is too close to a highway] Hotel disappears.
2.6 [Speaking and circling] Show me a photo of this hotel. Photo appears.
2.7 [Points to another hotel] Photo appears.
2.8 [Speaking] Price of the other hotel? Price appears for previous hotel.
2.9 [Speaking and drawing an arrow] Scroll down. Display adjusted.
2.10 [Speaking and drawing an arrow toward a hotel] What is the distance from this hotel to Fisherman's Wharf? Distance displayed.
2.11 [Pointing to another place and speaking] And the distance to here? Distance displayed.

Sara decides she could use some human advice. She picks up the phone, calls Bob, her travel agent, and writes Start collaboration to synchronize his display with hers. At this point, both are presented with identical maps, and the input and actions of one will be remotely seen by the other.

3.1 [Sara speaks and circles two hotels] Bob, I'm trying to choose between these two hotels. Any opinions?
3.2 [Bob draws an arrow, speaks, and points] Well, this area is really nice to visit. You can walk there from this hotel. Map scrolls to indicated area. Hotel selected.
3.3 [Sara speaks] Do you think I should visit Alcatraz?
3.4 [Bob speaks] Map, show video of Alcatraz. Video appears.
3.5 [Bob speaks] Yes, Alcatraz is a lot of fun.
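Steps 2.6 through 2.8 of the scenario hinge on reference resolution: a deictic phrase ("this hotel") paired with a circling or pointing gesture binds to the gestured object, while an anaphoric phrase with no gesture ("the other hotel") is resolved against recently mentioned objects. The following sketch illustrates that behavior with a simple salience stack; all names are hypothetical and this is not the system's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class DialogueContext:
    """Tracks recently mentioned map objects so later references can resolve."""
    salient: list = field(default_factory=list)  # most recently mentioned last

    def mention(self, obj):
        # Re-mentioning an object moves it to the top of the salience stack.
        if obj in self.salient:
            self.salient.remove(obj)
        self.salient.append(obj)

    def resolve(self, phrase, gesture=None):
        # A deictic phrase accompanied by a pointing/circling gesture
        # binds directly to the gestured object (steps 2.6 and 2.7).
        if gesture is not None:
            self.mention(gesture)
            return gesture
        # "the other X" skips the most recent referent (step 2.8).
        if phrase.startswith("the other") and len(self.salient) >= 2:
            return self.salient[-2]
        # A bare anaphor defaults to the most salient object, if any.
        return self.salient[-1] if self.salient else None

ctx = DialogueContext()
ctx.resolve("this hotel", gesture="Hotel Majestic")  # step 2.6: circled hotel
ctx.resolve("that hotel", gesture="Hotel Union")     # step 2.7: pointed hotel
ctx.resolve("the other hotel")                       # step 2.8: "Hotel Majestic"
```

The same stack would also support step 2.11, where a new pointing gesture supplies the referent for "here" while the prior hotel remains available for later mention.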
For this system, the main research focus is how to generate the most appropriate interpretation of the incoming streams of multimodal input. Besides providing a user interface to a dynamic set of distributed agents, the application is itself built on an agent framework: the OAA coordinates competition and cooperation among information sources, which work in parallel to resolve the ambiguities that arise at every level of the interpretation process:
This list is by no means exhaustive. Examples of other resolution methods include spatial reasoning (``the hotel between Fisherman's Wharf and Lombard Street'') and user preferences (``near my favorite restaurant'').
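The competition-and-cooperation style of resolution described above can be sketched as a set of specialist agents that each score candidate interpretations in parallel, with a facilitator keeping the highest-scoring one. This is a minimal illustration under assumed names and scores, not the OAA's actual API or the system's real resolution strategy.

```python
def gesture_agent(utterance, gesture):
    # A circling or pointing gesture strongly suggests its target as referent.
    if gesture:
        return 0.9, f"{utterance} -> {gesture}"
    return 0.0, None

def anaphora_agent(utterance, history):
    # With no gesture, a deictic/anaphoric word falls back on the
    # most recently mentioned object.
    if history and any(w in utterance for w in ("it", "this", "that")):
        return 0.6, f"{utterance} -> {history[-1]}"
    return 0.0, None

def interpret(utterance, gesture=None, history=()):
    """Run the specialist agents in 'competition' and keep the best score."""
    candidates = [
        gesture_agent(utterance, gesture),
        anaphora_agent(utterance, list(history)),
    ]
    score, meaning = max(candidates, key=lambda c: c[0])
    return meaning if score > 0 else None

interpret("show a photo of this hotel", gesture="Hotel A")
# gesture evidence (0.9) outscores anaphora, so the circled hotel wins
interpret("price of it", history=["Hotel B"])
# no gesture, so the anaphora agent resolves against dialogue history
```

A real deployment would run such agents concurrently and add further specialists, e.g. for the spatial reasoning and user-preference examples mentioned above.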
The implementation of the Multimodal Map application exploits several features of the OAA: