SRI International, 333 Ravenswood Avenue, Menlo Park, CA 94025-3493. SRI International is a nonprofit corporation.

Publication Details

Discovery of Numerous Specific Topics via Term Co-Occurrence Analysis

by Madani, O. and Yu, J.

Technical Note 569
Institution: AI Center, SRI International
Address: 569
March 2011.

Abstract

We describe techniques for constructing term co-occurrence graphs and explore an application of such graphs to the discovery of tens of thousands of fine-grained topics, that is, topics that are specific rather than broad. In our work, a topic corresponds to a small dense subgraph. We discover topics by randomized local searches (constrained random walks) initiated at each term (node) in the graph. The mined topics are highly interpretable and reveal the different meanings of a term in the corpus. We explore document tagging via the induced topics and demonstrate the information-theoretic utility of the topics when they are used as features in supervised learning. With SVM classification, such features yield consistent improvements in text classification accuracy over the standard bag-of-words tf-idf representation, even at high training proportions, where improving over tf-idf is difficult. We investigate the effect of various options and parameters on classification accuracy, including the window size used during graph construction and variants of the tagging strategy. Finally, we explain how a layered, pyramidal view of the term distribution helps in understanding the algorithms and in visualizing and interpreting the topics.
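The abstract describes the pipeline only at a high level. Below is a minimal, hypothetical sketch of the three steps (graph construction, local search for dense term sets, and document tagging). It is not the authors' algorithm. It assumes pre-tokenized documents and counts term pairs that fall within a sliding window of `window` tokens. From each seed term, it grows a candidate topic with a random walk that admits a new term only if that term is linked to at least a fraction `min_frac` of the current topic. It then tags a document with every topic that shares at least `min_overlap` terms with it. All function names and parameters (`build_cooccurrence_graph`, `local_search`, `discover_topics`, `tag_document`, `min_frac`, `min_overlap`, and so on) are illustrative assumptions.

```python
import random
from collections import defaultdict


def build_cooccurrence_graph(docs, window=3, min_count=1):
    """Hypothetical sketch: count unordered term pairs within `window` consecutive tokens."""
    counts = defaultdict(int)
    for tokens in docs:
        for i, term in enumerate(tokens):
            # Pair each token with the next (window - 1) tokens.
            for other in tokens[i + 1 : i + window]:
                if other != term:
                    counts[tuple(sorted((term, other)))] += 1
    graph = defaultdict(dict)
    for (a, b), c in counts.items():
        if c >= min_count:  # drop rare pairs
            graph[a][b] = c
            graph[b][a] = c
    return {term: dict(nbrs) for term, nbrs in graph.items()}


def local_search(graph, seed, max_size=4, min_frac=0.75, steps=50, rng=None):
    """Grow a dense term set from `seed` by a constrained random walk (assumed scheme)."""
    rng = rng or random.Random(0)
    if seed not in graph:
        return frozenset({seed})
    topic, current = {seed}, seed
    for _ in range(steps):
        if len(topic) >= max_size:
            break
        # Step to a neighbor of the current node, weighted by co-occurrence count.
        neighbors = list(graph[current])
        weights = [graph[current][n] for n in neighbors]
        nxt = rng.choices(neighbors, weights=weights, k=1)[0]
        if nxt in topic:
            current = nxt
            continue
        # Constraint: admit only terms linked to at least min_frac of the topic.
        links = sum(1 for t in topic if nxt in graph[t])
        if links >= min_frac * len(topic):
            topic.add(nxt)
            current = nxt
        else:
            # Rejected: restart the walk from a topic member (sorted for determinism).
            current = rng.choice(sorted(topic))
    return frozenset(topic)


def discover_topics(graph, **kwargs):
    """Run a local search from every term and keep the distinct multi-term results."""
    topics = set()
    for term in sorted(graph):
        topic = local_search(graph, term, **kwargs)
        if len(topic) >= 2:
            topics.add(topic)
    return sorted(topics, key=sorted)


def tag_document(tokens, topics, min_overlap=2):
    """Return indices of topics sharing at least `min_overlap` terms with the document."""
    present = set(tokens)
    return [i for i, topic in enumerate(topics) if len(topic & present) >= min_overlap]


if __name__ == "__main__":
    # Toy corpus in which "java" has two senses (coffee and programming).
    docs = [
        "java coffee espresso beans".split(),
        "espresso coffee java roast".split(),
        "java python code compiler".split(),
        "python java compiler code".split(),
    ]
    g = build_cooccurrence_graph(docs, window=3)
    topics = discover_topics(g, max_size=4, min_frac=0.75)
    for topic in topics:
        print(sorted(topic))
    print(tag_document("java compiler code".split(), topics))
```

With `max_size=4` and `min_frac=0.75`, every admitted term must be linked to all current members, so each returned topic is a small clique. A looser `min_frac` would allow sparser, larger topics. The tag indices could then be appended to a document's tf-idf vector as extra binary features before training a classifier such as an SVM, which is the kind of use the abstract evaluates.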

Associated Projects

CALO

Cognitive Assistant that Learns and Organizes

As part of DARPA’s Personalized Assistant that Learns (PAL) program, SRI and its team members are developing a next-generation Cognitive Assistant that Learns and Organizes (CALO).

AIC Personnel

Madani, Omid: Senior Computer Scientist
Yu, Jiye: Computer Scientist

SRI International
©2014 SRI International