SRI International. 333 Ravenswood Avenue. Menlo Park, CA 94025-3493. SRI International is a nonprofit corporation.

AIC Seminar Series

Scalable Partitioning and Exploration of Chemical Spaces

Debojyoti Dutta, University of Southern California

Date:  Wednesday, March 15th 2006 at 1:00pm

Location:  EK255


Due to technological advances, the amount of biochemical data has grown rapidly. Two examples of such large collections are small-molecule libraries and collections of peptide mass spectra. There is an urgent need to design algorithms that efficiently learn from, mine, and interpret this ever-increasing corpus of data. In this talk, I will first present a framework that applies locality-sensitive hashing (LSH) to scale traditional applications such as classification, partitioning, and outlier detection to large chemical data sets without a significant loss of accuracy. We hash chemical descriptors so that points close to each other in the descriptor space are also close to each other in the hashed space. Using this data structure, one can perform approximate nearest-neighbor searches very quickly, in sub-linear time. We validate the accuracy and performance of our framework on three real data sets ranging in size from 4,337 to 249,071 molecules. Results indicate that identifying nearest neighbors with the LSH algorithm is at least two orders of magnitude faster than the traditional k-nearest-neighbor method and is over 94% accurate for most query parameters. Next, I will discuss my ongoing work on prefiltering large small-molecule databases with ensemble learning techniques to obtain new inhibitors for drug targets. Finally, I will present a brief summary of my work on mining mass spectrometry data, as well as my previous work on low-state schemes to fight selfish and malicious traffic within a network.
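
For readers unfamiliar with LSH, the hashing idea in the abstract can be sketched roughly as follows. This is an illustration only, not the speaker's implementation: the abstract does not say which hash family is used, so the sketch assumes random-hyperplane hashing (which keeps vectors with a small angle between them in the same bucket), and the class name HyperplaneLSH and its parameters n_bits, n_tables, and seed are hypothetical.

```python
import math
import random
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class HyperplaneLSH:
    """Hypothetical random-hyperplane LSH index for descriptor vectors."""

    def __init__(self, dim, n_bits=8, n_tables=10, seed=0):
        rng = random.Random(seed)
        # Each hash table owns n_bits random hyperplanes through the origin.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
            for _ in range(n_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.points = []

    def _key(self, planes, vec):
        # One bit per hyperplane: the side of the plane the vector falls on.
        return tuple(
            sum(p * v for p, v in zip(plane, vec)) >= 0.0 for plane in planes
        )

    def add(self, vec):
        """Store a vector in every table and return its index."""
        idx = len(self.points)
        self.points.append(list(vec))
        for planes, table in zip(self.planes, self.tables):
            table[self._key(planes, vec)].append(idx)
        return idx

    def candidates(self, vec):
        """Union of the buckets the query falls into, one bucket per table."""
        found = set()
        for planes, table in zip(self.planes, self.tables):
            found.update(table.get(self._key(planes, vec), ()))
        return found

    def query(self, vec, k=1):
        """Approximate k nearest neighbors: rank only the bucket candidates."""
        ranked = sorted(
            self.candidates(vec),
            key=lambda i: cosine(self.points[i], vec),
            reverse=True,
        )
        return ranked[:k]
```

A query computes exact similarities only for the points that share a bucket with it in at least one table, which is where the speed-up over an exhaustive k-nearest-neighbor scan comes from. Using more bits per table makes buckets smaller and queries faster, while using more tables lowers the chance of missing a true neighbor.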

   Bio for Debojyoti Dutta

Debojyoti graduated from IIT Kharagpur with a B.Tech. in Computer Science in 1999. He received his PhD in Computer Science at USC/ISI with Ashish Goel (now at Stanford) as his advisor. Since completing his PhD in summer 2004, he has been a postdoc in the Department of Computational Biology (Ting Chen’s group). His interests include large-scale machine learning and data mining for proteomics and chemoinformatics, as well as networking.

   Note for Visitors to SRI

Please arrive at least 10 minutes early, as you will need to sign in by following the instructions by the lobby phone at Building E (or by calling Wilma Lenz at 650 859 4904 or Eunice Tseng at 650 859 2799). SRI is located at 333 Ravenswood Avenue in Menlo Park. Visitors may park in the parking lots off Fourth Street. Detailed directions to SRI, as well as maps, are available from the Visiting AIC web page. SRI International has two entrances on Ravenswood Ave.; please check the signage for the Building E entrance.
