[ pdf | bib | abstract ]
|
Freitag, D., Chow, E., Kalmar, P., Muezzinoglu, T., and Niekrasz, J. (2012).
A corpus of online discussions for research into linguistic memes.
In Proc. 7th Web as Corpus Workshop (WAC-7).
|
[ pdf | bib | abstract ]
|
Niekrasz, J. (2012).
Toward Summarization of Communicative Activities in Spoken Conversation.
PhD thesis, University of Edinburgh.
|
[ pdf | bib | abstract ]
|
Niekrasz, J. and Moore, J. (2010).
Annotating participant reference in English spoken conversation.
In Proc. of the Fourth Linguistic Annotation Workshop (LAW IV).
|
[ pdf | bib | abstract ]
|
Tur, G., Stolcke, A., Voss, L., Peters, S., Hakkani-Tür, D., Dowding, J., Favre, B., Fernández, R., Frampton, M., Frandsen, M., Frederickson, C., Graciarena, M., Kintzing, D., Leveque, K., Mason, S., Niekrasz, J., Purver, M., Riedhammer, K., Shriberg, E., Tien, J., Vergyri, D., and Yang, F. (2010).
CALO meeting assistance system.
IEEE Transactions on Audio, Speech, and Language Processing.
The CALO Meeting Assistant (MA) provides for distributed meeting capture, annotation, automatic transcription and semantic analysis of multiparty meetings, and is part of the larger CALO personal assistant system.
This paper presents the CALO-MA architecture and its speech recognition and understanding components, which include real-time and offline speech transcription, dialog act segmentation and tagging, topic identification and segmentation, question-answer pair identification, action item recognition, decision extraction, and summarization.
|
[ pdf | bib | abstract ]
|
Niekrasz, J. and Moore, J. (2009).
Participant subjectivity and involvement as a basis for discourse segmentation.
In Proceedings of the SIGDIAL 2009 Conference, pages 54-61.
Best Student Paper Award Nominee.
We propose a framework for analyzing episodic conversational activities in terms of expressed relationships between the participants and utterance content. We test the hypothesis that linguistic features which express such properties, e.g. tense, aspect, and person deixis, are a useful basis for automatic intentional discourse segmentation.
We present a novel algorithm and test our hypothesis on a set of intentionally segmented conversational monologues. Our algorithm performs better than a simple baseline and as well as or better than well-known lexical-semantic segmentation methods.
|
[ pdf | bib | abstract ]
|
Ehlen, P., Purver, M., Niekrasz, J., Lee, K., and Peters, S. (2008).
Meeting adjourned: Off-line learning interfaces for automatic meeting understanding.
In Proceedings of the 2008 International Conference on Intelligent User Interfaces.
Upcoming technologies will automatically identify and extract certain types of general information from meetings, such as topics and the tasks people agree to do. We explore interfaces for presenting this information to users after a meeting is completed, using two post-meeting interfaces that display information from topics and action items respectively. These interfaces also provide an excellent forum for obtaining user feedback about the performance of classification algorithms, allowing the system to learn and improve with time. We describe how we manage the delicate balance of obtaining necessary feedback without overburdening users. We also evaluate the effectiveness of feedback from one interface on improvement of future action item detection.
|
[ poster | bib | abstract ]
|
Niekrasz, J. (2008).
Using participant deixis in conversational NLP.
In Proceedings of the Workshop on Machine Learning and Multimodal Interaction (Student Poster Session).
Types of person deixis like referring, addressing, and the expression of personal attitudes are bound to communicative projects - intersubjective social activities around which conversation is locally organized. NLP technologies must not ignore this because it is necessary for the useful and meaningful characterization of dialogues. Progress will have direct benefits on summarisation, segmentation, indexing, and general understanding of conversations. The following examples demonstrate this.
|
[ pdf | bib | abstract ]
|
Tur, G., Stolcke, A., Voss, L., Dowding, J., Favre, B., Fernández, R., Frampton, M., Frandsen, M., Frederickson, C., Graciarena, M., Hakkani-Tür, D., Kintzing, D., Leveque, K., Mason, S., Niekrasz, J., Peters, S., Purver, M., Riedhammer, K., Shriberg, E., Tien, J., Vergyri, D., and Yang, F. (2008).
The CALO meeting speech recognition and understanding system.
In Proceedings of the 2008 IEEE Workshop on Spoken Language Technology.
The CALO Meeting Assistant provides for distributed meeting capture, annotation, automatic transcription and semantic analysis of multiparty meetings, and is part of the larger CALO personal assistant system. This paper summarizes the CALO-MA architecture and its speech recognition and understanding components, which include real-time and offline speech transcription, dialog act segmentation and tagging, question-answer pair identification, action item recognition, decision extraction, and summarization.
|
[ pdf | poster | bib | abstract ]
|
Gupta, S., Niekrasz, J., Purver, M., and Jurafsky, D. (2007).
Resolving "you" in multi-party dialog.
In Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue.
This paper presents experiments into the resolution of "you" in multi-party dialog, dividing this process into three tasks: distinguishing between generic and referential uses; distinguishing between singular and plural reference; and identifying the referred-to addressee(s).
First we perform a multi-corpus experiment into referentiality detection, achieving an accuracy of 73.8% on multi-party data. Our next experiment deals with singular vs. plural reference, achieving an accuracy of 71.4%. Our last experiment is on the task of addressee identification for referential "you" utterances, achieving an accuracy of 67% without the use of visual information; the output of the first two experiments is shown to help.
|
[ pdf | poster | bib | abstract ]
|
Purver, M., Dowding, J., Niekrasz, J., Ehlen, P., and Noorbaloochi, S. (2007).
Detecting and summarizing action items in multi-party dialogue.
In Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue.
This paper addresses the problem of identifying action items discussed in open-domain conversational speech, and does so in two stages: firstly, detecting the subdialogues in which action items are proposed, discussed and committed to; and secondly, extracting the phrases that accurately capture or summarize the tasks they involve. While the detection problem is hard, we show that by taking account of dialogue structure we can achieve reasonable accuracy. We then describe a semantic parser that identifies potential summarizing phrases, and show that for some task properties these can be more informative than plain utterance transcriptions.
|
[ pdf | poster | bib | abstract ]
|
Ehlen, P. and The CALO Team (2007).
Multimodal meeting capture and understanding with The CALO Meeting Assistant.
In Proceedings of the 2007 Machine Learning and Multimodal Interaction Workshop (MLMI).
Demo.
The CALO Meeting Assistant is a multimodal meeting assistant technology that integrates speech, gestures, and multimodal data collected from multiparty interactions during meetings. Using machine learning and robust discourse processing, it provides a rich, browsable record of a meeting.
|
[ pdf | bib | abstract ]
|
Voss, L., Ehlen, P., and The DARPA CALO Meeting Assistant Project Team (2007).
The CALO Meeting Assistant.
In Proceedings of the 2007 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT).
Demo.
The CALO Meeting Assistant is an integrated, multimodal meeting assistant technology that captures speech, gestures, and multimodal data from multiparty interactions during meetings, and uses machine learning and robust discourse processing to provide a rich, browsable record of a meeting.
|
[ pdf | bib | abstract ]
|
Ehlen, P., Purver, M., and Niekrasz, J. (2007).
A meeting browser that learns.
In Interaction Challenges for Intelligent Assistants: Papers from the 2007 AAAI Spring Symposium: Technical Report SS-07-04, pages 33-40. AAAI Press.
We present a system for extracting useful information from multi-party meetings and presenting the results to users via a browser. Users can view automatically extracted discussion topics and action items, initially seeing high-level descriptions, but with the ability to click through to meeting audio and video. Users can also add value by defining and searching for new topics and editing, correcting, deleting, or confirming action items. These feedback actions are used as implicit supervision by the understanding agents, retraining classifier models for improved or user-tailored performance.
|
[ pdf | bib | abstract ]
|
Gruenstein, A., Niekrasz, J., and Purver, M. (2007).
Meeting structure annotation: Annotations collected with a general purpose toolkit.
In Dybkjaer, L. and Minker, W., editors, Recent Trends in Discourse and Dialogue, Text, Speech and Language Technology series.
Springer-Verlag.
We describe a generic set of tools for representing, annotating, and analyzing multi-party discourse, including: an ontology of multimodal discourse, a programming interface for that ontology, and NOMOS - a flexible and extensible toolkit for browsing and annotating discourse.
We describe applications built using the NOMOS framework to facilitate a real annotation task, as well as for visualizing and adjusting features for machine learning tasks. We then present a set of hierarchical topic segmentations and action item subdialogues collected over 56 meetings from the ICSI and ISL meeting corpora using our tools. These annotations are designed to support research towards automatic meeting understanding.
|
[ pdf | bib | abstract ]
|
Purver, M., Niekrasz, J., and Ehlen, P. (2007).
Automatic annotation of dialogue structure from simple user interaction.
In Popescu-Belis, A., Renals, S., and Bourlard, H., editors, Machine Learning for Multimodal Interaction: Fourth International Workshop, MLMI 2007, Brno, Czech Republic, Revised Selected Papers, volume 4892 of Lecture Notes in Computer Science, pages 44-59. Springer-Verlag.
Previously, we presented a method for automatic detection of action items from natural conversation. This method relies on supervised classification techniques that are trained on data annotated according to a hierarchical notion of dialogue structure; data which are expensive and time-consuming to produce. Subsequently, we presented a meeting browser which allows users to view a set of automatically-produced action item summaries and give feedback on their accuracy. In this paper, we investigate methods of using this kind of feedback as implicit supervision, in order to bypass the costly annotation process and enable machine learning through use. We investigate, through the transformation of human annotations into hypothetical idealized user interactions, the relative utility of various modes of user interaction as well as various techniques for automatically producing training instances from interaction. We show that performance improvements are possible from interaction alone, even with interfaces that present very low cognitive load to users.
|
[ pdf | bib | abstract ]
|
Chaudhri, V. K., Cheyer, A., Guili, R., Jarrold, B., Myers, K. L., and Niekrasz, J. (2006).
A case study in engineering a knowledge base for an intelligent personal assistant.
In Proceedings of the Semantic Desktop and Social Semantic Collaboration Workshop (SemDesk).
We present a case study in engineering a knowledge base to meet the requirements of an intelligent personal assistant. The assistant is designed to function as part of a semantic desktop application, with the goal of helping a user manage and organize his information as well as supporting the user in performing tasks. We describe the knowledge base development process, the knowledge engineering challenges we faced in the process and our solutions to them, and important lessons learned during the process.
|
[ pdf | bib | abstract ]
|
Ehlen, P., Laidebeure, S., Niekrasz, J., Purver, M., Dowding, J., and Peters, S. (2006).
Browsing meetings: Automatic understanding, presentation and feedback for multi-party conversations.
In Proceedings of the 10th Workshop on the Semantics and Pragmatics of Dialogue, pages 173-174.
We present a system for extracting useful information from multi-party meetings and presenting the results to users via a browser. Users can view automatically extracted discussion topics and action items, initially seeing high-level descriptions, but with the ability to click through to meeting audio and video. Users can also add value: new topics can be defined and searched for, and action items can be edited or corrected, deleted or confirmed. These feedback actions are used as implicit supervision by the understanding agents, retraining classifier models for improved or user-tailored performance.
|
[ pdf | bib | abstract ]
|
Purver, M., Ehlen, P., and Niekrasz, J. (2006).
Shallow discourse structure for action item detection.
In Proceedings of the 2006 HLT-NAACL Workshop on Analyzing Conversations in Text and Speech, pages 31-34.
We investigate automatic action item detection from transcripts of multi-party meetings. Unlike previous work (Gruenstein et al., 2005), we use a new hierarchical annotation scheme based on the roles utterances play in the action item assignment process, and propose an approach to automatic detection that promises improved classification accuracy while enabling the extraction of useful information for summarization and reporting.
|
[ pdf | bib | abstract ]
|
Niekrasz, J. and Gruenstein, A. (2006).
NOMOS: A Semantic Web software framework for annotation of multimodal corpora.
In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC).
We present NOMOS, an open-source software framework for annotation, processing, and analysis of multimodal corpora. NOMOS is designed for use by annotators, corpus developers, and corpus consumers, emphasizing configurability for a variety of specific annotation tasks. Its features include synchronized multi-channel audio and video playback, compatibility with several corpora, platform independence, and mixed display of temporal, non-temporal, and relational information. We describe NOMOS from two perspectives. First, we present its software architecture, highlighting its principal difference from comparable systems: its use of an OWL-based semantic annotation back-end which provides automatic inference capabilities and a well-defined method for layering datasets.
Second, we describe how the system is used. For corpus development and annotation we present a typical use scenario involving the creation of a schema and specialization of the user interface. For processing and analysis we describe the GUI- and Java-based methods available, including a GUI for query construction and execution, and an automatically generated schema-conforming Java API for processing of annotations.
Additionally, we present some specific annotation and research tasks for which NOMOS has been specialized and used, including topic segmentation and decision-point annotation of meetings.
|
[ pdf | bib | abstract ]
|
Niekrasz, J. and Purver, M. (2006).
A multimodal discourse ontology for meeting understanding.
In Renals, S. and Bengio, S., editors, Machine Learning for Multimodal Interaction: Second International Workshop, MLMI 2005, Edinburgh, UK, July 11-13, 2005, Revised Selected Papers, volume 3869 of Lecture Notes in Computer Science, pages 162-173. Springer.
In this paper, we present a multimodal discourse ontology that serves as a knowledge representation and annotation framework for the discourse understanding component of an artificial personal office assistant.
The ontology models components of natural language, multimodal communication, multi-party dialogue structure, meeting structure, and the physical and temporal aspects of human communication. We compare our models to those from the research literature and from similar applications.
We also highlight some annotations which have been made in conformance with the ontology as well as some algorithms which have been trained on these data and suggest elements of the ontology that may be of immediate interest for further annotation by human or automated means.
|
[ pdf | bib | abstract ]
|
Purver, M., Ehlen, P., and Niekrasz, J. (2006).
Detecting action items in multi-party meetings: Annotation and initial experiments.
In Renals, S., Bengio, S., and Fiscus, J., editors, Machine Learning for Multimodal Interaction: Third International Workshop, MLMI 2006, Bethesda, MD, USA, May 1-4, 2006, Revised Selected Papers, volume 4299 of Lecture Notes in Computer Science, pages 200-211. Springer.
This paper presents the results of initial investigation and experiments into automatic action item detection from transcripts of multi-party human-human meetings. We start from our previous flat action item annotations, and show that automatic classification performance is limited. We then describe a new hierarchical annotation schema based on the roles utterances play in the action item assignment process, and propose a corresponding approach to automatic detection that promises improved classification accuracy while also enabling the extraction of useful information for summarization and reporting.
|
[ pdf | bib | abstract ]
|
Gruenstein, A., Niekrasz, J., and Purver, M. (2005).
Meeting structure annotation: Data and tools.
In Proceedings of the 6th SIGdial Workshop on Discourse and Dialogue, pages 117-127.
We present a set of annotations of hierarchical topic segmentations and action item subdialogues collected over 65 meetings from the ICSI and ISL meeting corpora, designed to support automatic meeting understanding and analysis. We describe an architecture for representing, annotating, and analyzing multi-party discourse, including: an ontology of multimodal discourse, a programming interface for that ontology, and an audiovisual toolkit which facilitates browsing and annotating discourse, as well as visualizing and adjusting features for machine learning tasks.
|
[ pdf | bib | abstract ]
|
Pallotta, V., Niekrasz, J., and Purver, M. (2005).
Collaborative and argumentative models of natural discussions.
In Proceedings of the 5th Workshop on Computational Models of Natural Argument (CMNA).
We report in this paper experiences and insights resulting from the first two years of work in two similar projects on meeting tracking and understanding. The projects are the DARPA-funded CALO project and the Swiss National research project IM2. The findings from these two projects have been shared and compared in order to come up with a joint ontology as a model for argumentative discussions in meetings.
We highlight the complexity of the problem in modeling interaction and discourse in argumentative discussions and we propose a solution based on the construction of a specific knowledge base.
|
[ pdf | bib | abstract ]
|
Purver, M., Niekrasz, J., and Peters, S. (2005).
Ontology-based multi-party meeting understanding.
In Proceedings of the 2005 CHI Workshop on The Virtuality Continuum Revisited.
This paper describes current and planned research efforts towards developing multimodal discourse understanding for an automated personal office assistant. The research is undertaken as part of a project called The Cognitive Agent that Learns and Organizes (CALO) (see http://www.ai.sri.com/project/CALO). The CALO assistant is intended to aid users both personally and as a group in performing office-related tasks such as coordinating schedules, providing relevant information for completing tasks, making a record of meetings, and assisting in fulfilling decisions.
|
[ pdf | bib | abstract ]
|
Niekrasz, J., Purver, M., Dowding, J., and Peters, S. (2005).
Ontology-based discourse understanding for a persistent meeting assistant.
In Persistent Assistants: Living and Working with AI: Papers from the 2005 AAAI Spring Symposium: Technical Report SS-05-05, pages 26-33. AAAI Press.
In this paper, we present research toward ontology-based understanding of discourse in meetings and describe an ontology of multimodal discourse designed for this purpose. We investigate its application in an integrated but modular architecture which uses semantically annotated knowledge of communicative meeting activity as well as discourse subject matter.
We highlight how this approach assists in improving system performance over time and supports understanding in a changing and persistent environment. We also describe current and future plans for ontology-driven robust natural language understanding in the presence of the highly ambiguous and errorful input typical of the meeting domain.
|
[ pdf | bib | abstract ]
|
Cheng, H., Bratt, H., Mishra, R., Shriberg, E., Upson, S., Chen, J., Weng, F., Peters, S., Cavedon, L., and Niekrasz, J. (2004).
A Wizard of Oz framework for collecting spoken human-computer dialogs.
In Proceedings of the 8th International Conference on Spoken Language Processing (INTERSPEECH - ICSLP), pages 2269-2272.
This paper describes a data collection process aimed at gathering human-computer dialogs in high-stress or "busy" domains where the user is concentrating on tasks other than the conversation, for example, when driving a car. Designing spoken dialog interfaces for such domains is extremely challenging and the data collected will help us improve the dialog system interface and performance, understand how humans perform these tasks with respect to stressful situations, and obtain speech utterances for extracting prosodic features. This paper describes the experimental design for collecting speech data in a simulated driving environment.
|
[ pdf | bib | abstract ]
|
Kaiser, E., Demirdjian, D., Gruenstein, A., Li, X., Niekrasz, J., Wesson, M., and Kumar, S. (2004).
A multimodal learning interface for sketch, speak and point creation of a schedule chart.
In Proceedings of the 6th International Conference on Multimodal Interfaces (ICMI), pages 329-330. ACM Press.
We present a video demonstration of an agent-based test bed application for ongoing research into multi-user, multimodal, computer-assisted meetings. The system tracks a two-person scheduling meeting: one person standing at a touch-sensitive whiteboard creating a Gantt chart, while another person looks on in view of a calibrated stereo camera. The stereo camera performs real-time, untethered, vision-based tracking of the onlooker's head, torso and limb movements, which in turn are routed to a 3D-gesture recognition agent. Using speech, 3D deictic gesture and 2D object de-referencing, the system is able to track the onlooker's suggestion to move a specific milestone.
The system also has a speech recognition agent capable of recognizing out-of-vocabulary (OOV) words as phonetic sequences. Thus when a user at the whiteboard speaks an OOV label name for a chart constituent while also writing it, the OOV speech is combined with letter sequences hypothesized by the handwriting recognizer to yield an orthography, pronunciation and semantics for the new label. These are then learned dynamically by the system and become immediately available for future recognition.
|
[ pdf | bib | abstract ]
|
Niekrasz, J., Gruenstein, A., and Cavedon, L. (2004).
Multi-human dialogue understanding for assisting artifact-producing meetings.
In Proceedings of the 20th International Conference on Computational Linguistics (COLING), pages 432-438.
In this paper we present the dialogue understanding components of an architecture for assisting multi-human conversations in artifact-producing meetings: meetings in which tangible products such as project planning charts are created. Novel aspects of our system include multimodal ambiguity resolution, modular ontology-driven artifact manipulation, and a meeting browser for use during and after meetings. We describe the software architecture and demonstrate the system using an example multimodal dialogue.
|
[ pdf | bib | abstract ]
|
Gruenstein, A., Cavedon, L., Niekrasz, J., Widdows, D., and Peters, S. (2004).
Managing uncertainty in dialogue information state for real time understanding of multi-human meeting dialogues.
In Proceedings of the 8th Workshop on the Semantics and Pragmatics of Dialogue (SEMDIAL), pages 152-153.
We are concerned with tracking and understanding dialogue between multiple human participants, specifically in meetings, in such a way that the dialogue system does not intervene. In this scenario, the system is not able to provide feedback on whether or not it has understood, and is unable to ask for clarification or ambiguity resolution. Our ultimate aim is to model human-human dialogue (to the extent that it is feasible) in real-time, providing useful services (e.g. relevant document retrieval) and answering queries about the dialogue state and history (e.g. what action items do we have so far?). Our approach has been to extend our existing dialogue system, based on the information-state update approach, which supports a rich semantic interpretation of multi-utterance constructions, to cope with the added uncertainty inherent in two-person meetings in which the participants speak, point, and draw on a whiteboard.
|