AIC Seminar Series
Machine Learning of Large Datasets and applications to Biological Systems
Notice: Hosted by Jeffrey Davitz
Date: 2005-11-18 at 10:30
Location: EJ228 (Directions)
In this talk I will describe my work in algorithms, data structures, and user interfaces for learning Bayesian Networks from large datasets. This work was motivated by a need to analyze biological systems.
Learning from data is a hard problem. In particular, learning Bayesian Network structure from data is NP-hard, so heuristic search is necessary to find good models. There are two approaches to improving heuristic search: 1) increasing the speed of model evaluation, so that more models can be searched in a given time, and 2) using better heuristics to generate higher-quality models early in the search.
I will present the AD+Tree and Queue Learning. The AD+Tree is a data structure that caches counts from the dataset efficiently, enabling fast evaluation of larger models. Queue Learning is an algorithm for learning Bayesian Network structure that can produce better models early in the search than existing techniques when applied to large datasets.
I will conclude with an example of the application of these techniques to gene expression data analysis.
Please arrive at least 10 minutes early as you will need to sign in by
following instructions by the lobby phone at Building E. SRI is located
at 333 Ravenswood Avenue in Menlo Park. Visitors may park in the parking
lots off Fourth Street. Detailed directions to SRI, as well as maps, are
available from the Visiting AIC web page.
There are two entrances to SRI International located on Ravenswood Ave.
Please check the Building E entrance signage.