Learning when Concepts Abound
Omid Madani, Yahoo! Research
Notice: hosted by David Israel
Date: 2007-09-12 at 11:00
Location: EJ228 (SRI E building)
Categorization is fundamental to intelligence. Without categories (concepts or classes), every experience would be brand new, and one couldn't make sense of one's world. A system also requires numerous categories for increased intelligence. In today's applications, topical categories in text categorization and object classes in vision can easily exceed tens of thousands. Recommendation, personalization, image tagging, and many other tasks can benefit from scalable many-class learning. One may then ask: How can a system efficiently learn to efficiently categorize in the presence of myriad categories?
I will develop the problem and present efficient online learning algorithms that show promise. In several text categorization and prediction tasks, we have observed that the algorithms train on hundreds of thousands of instances and thousands of classes in minutes. Other methods, such as one-versus-rest and taxonomy-based methods using support vector machines, can take hours or days and consume substantially more memory. Categorization accuracies remain competitive. The core idea is to learn an index, that is, a sparse weighted bipartite graph that maps each feature to a small number of categories. No taxonomy or separate feature reduction step is required.
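To make the index idea concrete, here is a minimal sketch in Python. It is not the algorithm presented in the talk: the class name SparseIndex, the mistake-driven additive update, the pruning rule, and the parameters max_categories_per_feature and learning_rate are all assumptions chosen for illustration. It shows only the general shape of a sparse weighted feature-to-category graph that is updated online and keeps a bounded number of categories per feature.

```python
from collections import defaultdict


class SparseIndex:
    """Hypothetical sparse index: each feature points to a few weighted categories."""

    def __init__(self, max_categories_per_feature=3, learning_rate=1.0):
        # Both parameters are illustrative assumptions, not values from the talk.
        self.max_categories_per_feature = max_categories_per_feature
        self.learning_rate = learning_rate
        # edges[feature][category] = weight (the weighted bipartite graph)
        self.edges = defaultdict(dict)

    def scores(self, features):
        """Sum edge weights over the active features of one instance."""
        totals = defaultdict(float)
        for feature, value in features.items():
            for category, weight in self.edges.get(feature, {}).items():
                totals[category] += value * weight
        return totals

    def predict(self, features):
        """Return the highest-scoring category, or None if no edge is active."""
        totals = self.scores(features)
        if not totals:
            return None
        return max(totals, key=totals.get)

    def update(self, features, true_category):
        """Assumed mistake-driven update: strengthen edges to the true category."""
        if self.predict(features) == true_category:
            return
        for feature, value in features.items():
            out = self.edges[feature]
            out[true_category] = out.get(true_category, 0.0) + self.learning_rate * value
            # Keep the graph sparse: drop the weakest edge when over the limit.
            if len(out) > self.max_categories_per_feature:
                weakest = min(out, key=out.get)
                del out[weakest]


# Toy usage with made-up bag-of-words instances.
index = SparseIndex(max_categories_per_feature=2)
training_data = [
    ({"goal": 1.0, "match": 1.0}, "sports"),
    ({"vote": 1.0, "senate": 1.0}, "politics"),
    ({"goal": 1.0, "team": 1.0}, "sports"),
]
for features, category in training_data:
    index.update(features, category)

print(index.predict({"goal": 1.0}))
print(index.predict({"senate": 1.0}))
```

In this sketch, prediction touches only the categories reachable from the active features, so its cost depends on the number of features in an instance and the per-feature limit rather than on the total number of categories. That is the property that makes an index attractive when categories number in the thousands.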
The realization that efficient large-scale many-class supervised learning is a good possibility is suggestive. Given time, I hope to touch on my work on the following question: How may a system efficiently acquire millions of inter-related categories on its own? This latter work is in the context of prediction, currently in text, akin to language modeling.
In collaboration with: Michael Connor, Wiley Greiner, Jian Huang, David Kempe, and Mohammad Salavatipour.
Omid Madani is a senior research scientist at Yahoo! Research. He earned a PhD in computer science from the University of Washington in 2000 (thesis topic: computational complexity of Markov decision processes) and was a post-doctoral fellow at the University of Alberta from 2001 to 2003, during which time he was awarded the Alberta Ingenuity Associateship.
Omid is interested in all aspects of intelligence. His current research includes themes under large-scale learning and autonomous learning systems. He has successfully applied his research to applications in information retrieval.
Please arrive at least 10 minutes early in order to sign in and be escorted to the conference room. SRI is located at 333 Ravenswood Avenue in Menlo Park. Visitors may park in the visitors' lot in front of Building E and should follow the instructions by the lobby phone to be escorted to the meeting room. Detailed directions to SRI, as well as maps, are available from the Visiting AIC web page.