AIC Seminar Series
AveBoost2: Boosting for Noisy Data
Nikunj C. Oza, NASA Ames Research Center
Date: Tuesday October 05, 2004 at 16:00
Location: EJ228
AdaBoost is a well-known ensemble learning algorithm that constructs its base models in sequence. To create each base model, AdaBoost constructs a distribution over the training examples. This distribution, represented as a vector, is constructed with the goal of making the next base model's mistakes uncorrelated with those of the previous base model. We previously developed an algorithm, AveBoost, that first constructed a distribution the same way as AdaBoost but then averaged it with the previous models' distributions to create the next base model's distribution. Our experiments demonstrated the superior accuracy of this approach. In this paper, we slightly revise our algorithm to obtain non-trivial theoretical results: bounds on the training error and the generalization error (the difference between training and test error). Our averaging process has a regularizing effect, which leads to a worse training error bound for our algorithm than for AdaBoost but a better generalization error bound. This leads us to suspect that our new algorithm works better than AdaBoost on noisy data. For this paper, we experimented with the data that we used previously, both as originally supplied and with added label noise, in which some of the examples have their original labels changed at random. Our algorithm's experimental performance improvement over AdaBoost is even greater on the noisy data than on the original data.
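
The averaging idea in the abstract can be illustrated with a small sketch. The abstract does not give the exact update rule, so the code below is an assumption rather than the speaker's algorithm: `adaboost_update` uses a common normalized form of AdaBoost's reweighting, and `averaged_update` (a hypothetical name) takes an equal-weight running average of that result with the current distribution, treating the current distribution as a summary of `t` earlier ones.

```python
def adaboost_update(d, mistakes):
    """Reweight a distribution so misclassified examples carry half the total weight.

    d        -- list of example weights summing to 1
    mistakes -- list of booleans, True where the current base model erred
    """
    eps = sum(w for w, m in zip(d, mistakes) if m)  # weighted training error
    if not 0.0 < eps < 1.0:
        raise ValueError("weighted error must lie strictly between 0 and 1")
    return [
        w / (2.0 * eps) if m else w / (2.0 * (1.0 - eps))
        for w, m in zip(d, mistakes)
    ]


def averaged_update(d, mistakes, t):
    """Average the AdaBoost-style distribution with the current one.

    Assumption (not from the abstract): d stands for t earlier distributions,
    each weighted equally, so the new distribution is (t * d + c) / (t + 1).
    """
    c = adaboost_update(d, mistakes)
    return [(t * w + cw) / (t + 1) for w, cw in zip(d, c)]


# Four examples, uniform start, first example misclassified.
d = [0.25, 0.25, 0.25, 0.25]
mistakes = [True, False, False, False]
boosted = adaboost_update(d, mistakes)
averaged = averaged_update(d, mistakes, t=1)
```

In this toy case the misclassified example's weight rises from 0.25 to 0.5 under the AdaBoost-style update but only to 0.375 under the averaged update. This smaller shift is one way to picture the regularizing effect the abstract describes, and why a mislabeled example would be emphasized less.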
Nikunj C. Oza received his B.S. in Mathematics with Computer Science from the Massachusetts Institute of Technology (MIT) in 1994. He received his M.S. (in 1998) and Ph.D. (in 2001) in Computer Science from the University of California at Berkeley. He then joined NASA Ames Research Center as a research scientist and is a member of the Data Mining and Complex Adaptive Systems group at NASA. His research interests include machine learning (especially ensemble learning and online learning), data mining, fault detection, and satellite image understanding.
Please arrive at least 10 minutes early in order to sign in and be escorted to the conference room. SRI is located at 333 Ravenswood Avenue in Menlo Park. Visitors may park in the visitors lot in front of Building E, and should follow the instructions by the lobby phone to be escorted to the meeting room. Detailed directions to SRI, as well as maps, are available from the Visiting AIC web page.