AIC Seminar Series
Learning to Extract Proteins and their Interactions from Biomedical Text
Raymond J. Mooney, University of Texas at Austin
Notice: hosted by Sugato Basu
Date: Friday July 28, 2006 at 14:30
Location: EJ228

Automatically extracting information from biomedical text holds the
promise of easily consolidating large amounts of biological knowledge
in computer-accessible form. This strategy is particularly attractive
for extracting data on human genes from the 11 million abstracts in
Medline. We have developed and evaluated a variety of learned
information-extraction systems for identifying human proteins and
their interactions in Medline abstracts. We will present our current
best results on identifying names of human proteins using Conditional
Random Fields and Relational Markov Networks. We will also present
our current best results on identifying interactions between proteins
using a Support Vector Machine with an underlying string
kernel. Finally, we will summarize results from a recent large-scale
application of our techniques, in which we mined 753,459 Medline
abstracts to extract a database of 6,580 interactions between 3,737
human proteins. By merging this extracted data with existing
databases, we have constructed what is, to our knowledge, the largest
database of known human protein interactions, containing 31,609
interactions among 7,748 proteins.
Joint work with Razvan Bunescu, Edward Marcotte, Ruifang Ge, Rohit
Kate, Yuk-Wah Wong, and Arun Ramani.

Bio for Raymond J. Mooney

Raymond J. Mooney is a Professor in the Department of Computer Sciences at the
University of Texas at Austin. He received his Ph.D. in 1988 from the University
of Illinois at Urbana-Champaign. He is an author of over 100 published research
papers, primarily in the area of machine learning. He is program co-chair for
the 2006 National Conference on Artificial Intelligence, a recent general chair
of the 2005 Human Language Technology Conference and Conference on Empirical
Methods in Natural Language Processing, a former co-chair of the 1990
International Conference on Machine Learning, a former editor of the Machine
Learning journal, and a Fellow of the American Association for Artificial
Intelligence. His recent research has focused on learning for natural-language
processing, text mining, statistical relational learning, semi-supervised
learning, bioinformatics, and autonomic computing.