%0 Report
%@ 569
%A Madani, O. and Yu, J.
%T Discovery of Numerous Specific Topics via Term Co-Occurrence Analysis
%C 569
%I AI Center, SRI International
%D 2011
%K co-occurrence graphs, association graphs, topic mining, semicliques, graph algorithms, local search, randomized search, feature induction, feature augmentation
%X We describe techniques for the construction of term co-occurrence graphs and explore an application of such graphs to the discovery of tens of thousands of fine-grained (that is, specific rather than broad) topics. In our work, a topic corresponds to a small dense subgraph. We discover topics by randomized local searches (constrained random walks) initiated at each term (node) in the graph. The mined topics are highly interpretable and reveal the different meanings of a term in the corpus. We explore document tagging via the induced topics, and demonstrate the information-theoretic utility of the topics when they are used as features in supervised learning. Such features lead to consistent improvements in text classification accuracy over the standard bag-of-words tfidf representation, using SVM classification, even at high training proportions, where it is difficult to improve over the tfidf representation. We investigate the effect of various options and parameters, including window size during graph construction and variants of a tagging strategy, on the accuracy of classification. We explain how a layered pyramidal view of the term distribution helps in understanding the algorithms and in visualizing and interpreting the topics.
%U http://www.ai.sri.com/pubs/files/1822.pdf
