AIC Seminar Series
Machine Learning of Large Datasets and applications to Biological Systems
Notice: Hosted by Jeffrey Davitz
Date: Friday, November 18, 2005 at 10:30 am
Location: EJ228
In this talk I will describe my work in algorithms, data structures, and user interfaces for learning Bayesian Networks from large datasets. This work was motivated by a need to analyze biological systems.
Learning from data is a hard problem. In particular, learning Bayesian Network structure from data is NP-hard, so heuristic search is necessary to find good models. There are two approaches to improving heuristic search: 1) increasing the speed of model evaluation, so that more models can be searched in a given time, and 2) using better heuristics to generate higher-quality models early in the search.
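As a rough illustration of heuristic structure search, the sketch below runs a greedy hill climb that keeps adding the single edge that most improves a BIC-style score. It is a generic textbook baseline, not the speaker's method; the names `family_score`, `creates_cycle`, and `hill_climb`, the choice of score, and the add-edge-only move set are all assumptions made for this example.

```python
import math
from collections import Counter
from itertools import product


def family_score(data, child, parents):
    """BIC-style score of one node given its parent set (discrete data)."""
    n = len(data)
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    # Maximum-likelihood log-likelihood of the child given its parents.
    loglik = sum(c * math.log(c / parent_counts[pa]) for (pa, _), c in joint.items())
    child_arity = len({row[child] for row in data})
    n_params = (child_arity - 1) * len(parent_counts)
    return loglik - 0.5 * math.log(n) * n_params


def creates_cycle(parents, frm, to):
    """True if adding the edge frm -> to would create a directed cycle."""
    # A cycle appears exactly when `to` is already an ancestor of `frm`.
    stack, seen = [frm], set()
    while stack:
        node = stack.pop()
        if node == to:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False


def hill_climb(data, n_vars):
    """Greedy search: repeatedly add the best-scoring edge until none helps."""
    parents = {v: [] for v in range(n_vars)}
    scores = {v: family_score(data, v, []) for v in range(n_vars)}
    while True:
        best_gain, best_edge = 1e-9, None
        for frm, to in product(range(n_vars), repeat=2):
            if frm == to or frm in parents[to] or creates_cycle(parents, frm, to):
                continue
            # The score decomposes by node, so only the child's family is rescored.
            gain = family_score(data, to, parents[to] + [frm]) - scores[to]
            if gain > best_gain:
                best_gain, best_edge = gain, (frm, to)
        if best_edge is None:
            return parents
        frm, to = best_edge
        parents[to].append(frm)
        scores[to] += best_gain


# Toy data: variable 1 copies variable 0, variable 2 is independent of both.
data = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)] * 10
result = hill_climb(data, 3)
```

Every candidate edge triggers a fresh pass over the data in `family_score`, which is why faster model evaluation translates directly into more models searched per unit time.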
I will present the AD+Tree and Queue Learning. The AD+Tree is a data structure that caches counts from the dataset efficiently, enabling fast evaluation of larger models. Queue Learning is an algorithm for learning Bayesian Network structure that can produce better models early in the search than existing techniques when applied to large datasets.
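To give a feel for why caching counts speeds up model evaluation, here is a minimal sketch of a memoized contingency-table cache. It is deliberately much simpler than the AD+Tree and does not reproduce its structure; the class name `CountCache` and its methods are hypothetical, chosen only for this example.

```python
from collections import Counter


class CountCache:
    """Memoizes contingency tables so repeated queries avoid rescanning the data."""

    def __init__(self, data):
        self.data = data
        self._tables = {}
        self.scans = 0  # number of full passes made over the dataset

    def counts(self, variables):
        """Contingency table over the given variable indices, built at most once."""
        key = tuple(sorted(variables))
        if key not in self._tables:
            self.scans += 1
            self._tables[key] = Counter(
                tuple(row[v] for v in key) for row in self.data
            )
        return self._tables[key]

    def count(self, assignment):
        """Number of rows matching a {variable_index: value} assignment."""
        key = tuple(sorted(assignment))
        return self.counts(key)[tuple(assignment[v] for v in key)]


data = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
cache = CountCache(data)
both_zero = cache.count({0: 0, 1: 0})   # first query scans the data
mismatch = cache.count({1: 0, 0: 1})    # same variable set, served from the cache
```

A scoring function for structure search would ask such a cache for counts instead of scanning the dataset for every candidate model; storing those counts compactly for large datasets is the harder problem a dedicated data structure has to solve.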
I will conclude with an example of the application of these techniques to gene expression data analysis.
Please arrive at least 10 minutes early, as you will need to sign in by
following the instructions by the lobby phone at Building E (or call Wilma
Lenz at 650 859 4904, or Vicenta Lopez at 650 859 5750). SRI is
located at 333 Ravenswood Avenue in Menlo Park. Visitors may park in the
parking lots off Fourth Street. Detailed directions to SRI, as well as maps,
are available from the Visiting AIC web page.
There are two entrances to SRI International located on Ravenswood Ave.
Please check the Building E entrance signage.
©2017 SRI International 333 Ravenswood Avenue, Menlo Park, CA 94025-3493