AIC Seminar Series
Bayesian Topic Models
Thomas Griffiths, University of California, Berkeley
Notice: hosted by Sugato Basu
Date: Thursday, September 21, 2006 at 16:00
Location: EJ228
Electronic documents provide vast amounts of information, but need to be organized in a way that lets people use that information. Topic models provide one way of approaching this problem, automatically identifying the "topics" that appear in a collection of documents, and indicating the extent to which each document reflects each topic. I will summarize the basic ideas behind one such model, Latent Dirichlet Allocation (Blei, Ng, & Jordan, 2003), and use this model to describe how tools from Bayesian statistics can be useful in statistical natural language processing. In particular, I will introduce a simple algorithm for identifying topics from documents, based on Markov chain Monte Carlo, and show how this approach makes it easy to extend the basic topic model to incorporate syntax, model the interests of authors, infer topic hierarchies, and pick out topically coherent segments of dialogue.
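As a rough illustration of the kind of Markov chain Monte Carlo algorithm the abstract mentions, the following is a minimal sketch of a collapsed Gibbs sampler for Latent Dirichlet Allocation. It is not taken from the talk: the function name `gibbs_lda`, the hyperparameter defaults `alpha` and `beta`, the iteration count, and the representation of documents as lists of integer word ids are all assumptions made for this example.

```python
import random


def gibbs_lda(docs, num_topics, vocab_size, alpha=0.1, beta=0.01,
              iters=200, seed=0):
    """Collapsed Gibbs sampling sketch for LDA (illustrative assumptions only).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (doc_topic, topic_word) count tables from the final sweep.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                 # n(d, k)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]   # n(k, w)
    topic_total = [0] * num_topics                               # n(k)
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_d.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1

                # Unnormalized conditional probability of each topic for
                # this token, given all other assignments.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]

                # Resample the topic and restore the counts.
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    return doc_topic, topic_word
```

Under these assumptions, a call such as `gibbs_lda([[0, 1, 0, 1], [2, 3, 2, 3]], num_topics=2, vocab_size=4)` returns per-document topic counts and per-topic word counts. Adding `alpha` or `beta` to those counts and normalizing gives estimates of the extent to which each document reflects each topic and of each topic's word distribution.
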
Tom Griffiths is an Assistant Professor of Psychology and Cognitive Science at UC Berkeley, having just joined the faculty this summer. His research explores connections between human and machine learning, using ideas from statistics and artificial intelligence to try to understand how people solve the challenging computational problems they encounter in everyday life. He received his PhD in Psychology from Stanford University in 2005, and taught in the Department of Cognitive and Linguistic Sciences at Brown University before moving to UC Berkeley.