CSE 575 AdaBoost, Intuition
This post shares some of my own thoughts on the concept of AdaBoost. Note that it is not going to be especially rigorous; rather, I'd like to build up some intuition.
First things first, there is this wonderful AdaBoost guide by some fine gentlemen from CMU that you can refer to anytime: CMU 10701 Lecture 10
Next, I will go through the AdaBoost question from Homework Assignment 2 to demonstrate the step-by-step procedure AdaBoost follows. In the following demo I will show how to proceed in Iteration 1 and Iteration 2.
A Brief Walkthrough
Recall we have the following 1-D training dataset and the sole weak classifier C. Also recall each sample is assigned an initial weight of $w_i^{(1)} = 1/N$, where $N$ is the number of training samples.
Iteration t=1
(1) The initial threshold $\theta_1$, which minimizes the current weighted training error, is given to you. At this $\theta_1$, the current optimal weak classifier $h_1$ is the decision stump that splits the data at $\theta_1$, predicting one class on each side of the cut.
(2) The minimal weighted training error achieved by $h_1$ on the dataset is

$$\epsilon_1 = \sum_{i=1}^{N} w_i^{(1)} \, I\big(h_1(x_i) \neq y_i\big),$$

where $I(\cdot)$ is the indicator function.
(3) So the coefficient for $h_1$ is

$$\alpha_1 = \frac{1}{2}\ln\frac{1-\epsilon_1}{\epsilon_1}.$$
(4) We can now update the samples' weights to produce $w_i^{(2)}$:

$$w_i^{(2)} = \frac{w_i^{(1)}\exp\!\big(-\alpha_1\, y_i\, h_1(x_i)\big)}{Z_1},$$

where $Z_1$ is the normalization factor that makes the new weights sum to 1.
(5) And hence at the current step, the linearly combined classifier $f$ is

$$f(x) = \alpha_1 h_1(x).$$
The ensemble classifier $H(x) = \operatorname{sign}\big(f(x)\big)$ has an overall accuracy of 0.7.
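To make these five steps concrete, here is a minimal NumPy sketch of Iteration 1. The homework's actual dataset and the given threshold are not reproduced in this post, so the arrays `x`, `y` and the value `theta1` below are hypothetical placeholders; only the formulas mirror the steps above.

```python
import numpy as np

# Hypothetical 1-D dataset (the real homework data is not reproduced here).
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
y = np.array([1, 1, 1, -1, -1, -1, 1, 1, -1, -1])   # labels in {-1, +1}
N = len(x)

# Each sample starts with an equal weight of 1/N.
w = np.full(N, 1.0 / N)

# (1) A given threshold theta1 and the corresponding decision stump h1.
theta1 = 2.5                                   # placeholder value
h1 = np.where(x < theta1, 1, -1)               # predict +1 left of the cut, -1 right

# (2) Weighted training error: total weight of the misclassified samples.
eps1 = np.sum(w * (h1 != y))

# (3) Coefficient for h1.
alpha1 = 0.5 * np.log((1 - eps1) / eps1)

# (4) Update the weights and renormalize so they sum to 1.
w = w * np.exp(-alpha1 * y * h1)
w = w / w.sum()

# (5) Linearly combined classifier f(x) = alpha1 * h1(x); the ensemble
# prediction is its sign.
f = alpha1 * h1
accuracy = np.mean(np.sign(f) == y)
print(eps1, alpha1, accuracy)
```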
Iteration t=2
(1) Now we have to pick the threshold on our own. Since there are 11 'cuts' that can divide our 1-D training set, we can try them all and find where the weighted training error is at its minimum, then fix a convenient value of $\theta_2$ at that cut (a code sketch of this search follows at the end of this iteration).
Steps (2) through (4) proceed in the same fashion as in Iteration t=1, yielding the error $\epsilon_2$, the coefficient $\alpha_2$, and the updated weights $w_i^{(3)}$.
(5) And hence at the current step, the linearly combined classifier $f$ is now

$$f(x) = \alpha_1 h_1(x) + \alpha_2 h_2(x).$$
The ensemble classifier still has an overall accuracy of 0.7.
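Since the threshold is now chosen by exhaustive search, here is a small sketch of that search under the same hypothetical setup. The helper `best_stump` is not from the homework; it simply tries every cut (and both stump orientations, which may be more general than the single weak classifier C) and returns the one with the smallest weighted error.

```python
import numpy as np

def best_stump(x, y, w):
    """Try every cut between (and outside) the sorted points and both stump
    orientations; return the threshold, orientation, and weighted error of
    the stump with the smallest weighted training error."""
    xs = np.sort(x)
    # Candidate thresholds: one below all points, one between each pair, one above.
    cuts = np.concatenate(([xs[0] - 1], (xs[:-1] + xs[1:]) / 2, [xs[-1] + 1]))
    best = None
    for theta in cuts:
        for side in (+1, -1):                     # which side of the cut predicts +1
            pred = np.where(x < theta, side, -side)
            err = np.sum(w * (pred != y))
            if best is None or err < best[2]:
                best = (theta, side, err)
    return best

# Example with the hypothetical data from the Iteration-1 sketch; the weights
# here are placeholders for the current weights w_i^{(2)}.
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
y = np.array([1, 1, 1, -1, -1, -1, 1, 1, -1, -1])
w = np.full(10, 0.1)
theta2, side2, eps2 = best_stump(x, y, w)
alpha2 = 0.5 * np.log((1 - eps2) / eps2)
print(theta2, side2, eps2, alpha2)
```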
Iterations will continue until the ensemble classifier $H$ reaches 100% training accuracy, which AdaBoost guarantees as long as every weak classifier does better than random guessing. The proof can be found either in our class slides or in the aforementioned CMU lecture notes on page 22.
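For completeness, here is a hedged end-to-end sketch of this loop on the same hypothetical data: it repeatedly picks the lowest-error stump, reweights, and stops once the ensemble classifies the training set perfectly (assuming each stump it finds beats random guessing, with `max_iters` as a safety cap).

```python
import numpy as np

def adaboost_stumps(x, y, max_iters=50):
    """Boost 1-D decision stumps until the ensemble fits the training set
    (or max_iters is hit). Returns a list of (theta, side, alpha) triples."""
    N = len(x)
    w = np.full(N, 1.0 / N)               # uniform initial weights
    f = np.zeros(N)                       # running ensemble score on the training set
    ensemble = []
    xs = np.sort(x)
    cuts = np.concatenate(([xs[0] - 1], (xs[:-1] + xs[1:]) / 2, [xs[-1] + 1]))
    for _ in range(max_iters):
        # Exhaustively pick the stump (cut + side) with the smallest weighted error.
        theta, side, eps = min(
            ((t, s, np.sum(w * (np.where(x < t, s, -s) != y)))
             for t in cuts for s in (+1, -1)),
            key=lambda c: c[2],
        )
        eps = np.clip(eps, 1e-12, 1 - 1e-12)   # guard against log(0)
        alpha = 0.5 * np.log((1 - eps) / eps)
        pred = np.where(x < theta, side, -side)
        ensemble.append((theta, side, alpha))
        # Reweight: misclassified samples are amplified, correct ones shrunk.
        w = w * np.exp(-alpha * y * pred)
        w = w / w.sum()
        f += alpha * pred
        if np.all(np.sign(f) == y):       # 100% training accuracy reached
            break
    return ensemble

# Same hypothetical data as before (not the homework dataset):
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
y = np.array([1, 1, 1, -1, -1, -1, 1, 1, -1, -1])
print(len(adaboost_stumps(x, y)), "weak learners used")
```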
Amplifying the Misclassified
Take notice from Step (4) above: when you update the weight on each sample, you are equivalently applying the following case-by-case rule:

$$w_i^{(t+1)} = \frac{w_i^{(t)}}{Z_t} \times \begin{cases} e^{-\alpha_t} & \text{if } h_t(x_i) = y_i,\\ e^{\alpha_t} & \text{if } h_t(x_i) \neq y_i.\end{cases}$$
From this we can tell that the weights of the samples misclassified by $h_t$ are amplified, while those of the correctly classified ones are shrunk. Intuitively, you may treat the misclassified samples as being $e^{2\alpha_t} = \frac{1-\epsilon_t}{\epsilon_t}$ times as important as the correctly classified ones in the following iteration.
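A quick numerical check of that factor, using a hypothetical weighted error of $\epsilon_t = 0.2$: before normalization, a misclassified sample's weight is multiplied by $e^{\alpha_t}$ and a correct one's by $e^{-\alpha_t}$, so their relative importance differs by $e^{2\alpha_t} = \frac{1-\epsilon_t}{\epsilon_t}$.

```python
import numpy as np

eps = 0.2                                  # hypothetical weighted error
alpha = 0.5 * np.log((1 - eps) / eps)      # the AdaBoost coefficient

# Ratio of the update factors for a misclassified vs. a correctly classified
# sample (the normalizer Z_t cancels out of the ratio):
print(np.exp(alpha) / np.exp(-alpha))      # e^{2*alpha} = 4 (up to rounding)
print((1 - eps) / eps)                     # also 4
```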