Discovery of Numerous Specific Topics via Term Co-Occurrence Analysis
by Madani, O. and Yu, J.
Technical Note 569
Institution: AI Center, SRI International
March 2011.
We describe techniques for constructing term co-occurrence graphs and explore an application of such graphs to the discovery of tens of thousands of fine-grained topics, that is, topics that are specific rather than broad. In our work, a topic corresponds to a small dense subgraph. We discover topics by randomized local searches (constrained random walks) initiated at each term (node) in the graph. The mined topics are highly interpretable and reveal the different meanings of a term in the corpus. We explore document tagging via the induced topics and demonstrate the information-theoretic utility of the topics when they are used as features in supervised learning. With SVM classification, such features lead to consistent improvements in text classification accuracy over the standard bag-of-words tf-idf representation, even at high training proportions, where it is difficult to improve over the tf-idf representation. We investigate the effect of various options and parameters on classification accuracy, including the window size used during graph construction and variants of a tagging strategy. We explain how a layered pyramidal view of the term distribution helps in understanding the algorithms and in visualizing and interpreting the topics.
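To make the graph-construction step concrete, here is a minimal sketch of building a term co-occurrence graph with a sliding window. It is an illustration under stated assumptions, not the procedure from the technical note: the function name `cooccurrence_graph`, the raw-count edge weights, the whitespace tokenization, and the exact meaning of `window` (a span of consecutive tokens that includes the term itself) are all hypothetical choices made for this example.

```python
from collections import Counter, defaultdict


def cooccurrence_graph(docs, window=3):
    """Build an undirected co-occurrence graph from tokenized documents.

    Returns {term: Counter(neighbor -> count)}. Two terms are linked each time
    they appear within the same span of `window` consecutive tokens.
    Raw counts are an assumption; the note may weight edges differently.
    """
    graph = defaultdict(Counter)
    for tokens in docs:
        for i, term in enumerate(tokens):
            # Pair the term with the tokens that follow it inside the window.
            for other in tokens[i + 1 : i + window]:
                if other != term:
                    graph[term][other] += 1
                    graph[other][term] += 1
    return graph


# Toy corpus (hypothetical): "bank" appears in two different senses.
docs = [
    "the bank raised interest rates".split(),
    "the river bank flooded".split(),
]
graph = cooccurrence_graph(docs, window=3)

# Neighbors of "bank", most frequent first.
print(graph["bank"].most_common())
```

In this toy corpus the neighbors of "bank" mix a financial context ("raised", "interest") with a geographic one ("river", "flooded"). A larger `window` links more distant terms and yields a denser graph, which is one reason window size is a natural parameter to vary.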
Cognitive Assistant that Learns and Organizes
As part of DARPA's Personalized Assistant that Learns (PAL) program, SRI and team members are working on developing a next-generation "Cognitive Agent that Learns and Organizes" (CALO).
Name | Title
---|---
Madani, Omid | Senior Computer Scientist
Yu, Jiye | Computer Scientist