[mlpack] GSoC 2014 idea of AdaBoost implementation (MingJun Liu)

Ryan Curtin gth671b at mail.gatech.edu
Fri Feb 28 15:34:54 EST 2014


On Thu, Feb 27, 2014 at 09:56:09AM +0800, Mj Liu wrote:
> Hi all,
>     I'm from the University of Science and Technology of China, and am
> currently working as an intern at the Chinese Academy of Sciences. My
> major is software engineering. I would like to join GSoC and apply for
> the AdaBoost part. Though I am new here, I am very interested in the
> mlpack project.
>     Udit Saxena provided several good suggestions about the weak
> learners on the mailing list for the AdaBoost project. I also have
> several questions about how AdaBoost can be integrated into the mlpack
> project, and what kind of API should be provided to users. I think
> several problems should be settled before work on the project begins:
>     - What should the architecture (API) be? How would users call the
> AdaBoost method? Could users define their own learner and call it like
> AdaBoost(data, function learner()), or AdaBoost(data,
> function inner-learner())?

Hi MingJun,

The API of the AdaBoost implementation is an open question, but it
should definitely be as similar as possible to existing mlpack methods
(found in src/mlpack/methods/).  I have spent some time thinking about
the best way to do this.  mlpack does not use inheritance and I would
prefer for things to remain that way.  Instead, template parameters are
used (this is called policy-based design).  So for AdaBoost, I would
think some API like this might be the way to go:

template<
  typename WeakClassifier1Type,
  typename WeakClassifier2Type,
  typename WeakClassifier3Type
  // ...and so on, for however many weak learners are used; with C++11
  // this could simply be template<typename... WeakClassifierTypes>.
>
class AdaBoost;

Then AdaBoost::Classify() could be used to actually perform the boosted
classification.  It would be assumed that each WeakClassifierXType class
implemented a Classify() method of its own (and potentially a Train()
method).

This is preferable to function pointers because it is what's done in the
rest of mlpack.
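
To make the implied weak learner interface concrete, here is a minimal
sketch; the DecisionStump name and the exact signatures are hypothetical,
and nothing here is settled mlpack API:

class DecisionStump
{
 public:
  // Learn from weighted training data; the weights come from AdaBoost's
  // reweighting of the points at each boosting round.
  void Train(const arma::mat& data,
             const arma::Row<size_t>& labels,
             const arma::rowvec& weights);

  // Fill 'predictions' with one label per column (point) of 'data'.
  void Classify(const arma::mat& data, arma::Row<size_t>& predictions);
};

Assuming the variadic form mentioned above, the boosted classifier would
then be instantiated and used like this:

AdaBoost<DecisionStump> ab(trainingData, labels);
ab.Classify(testData, predictions);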

>     - And how do we control the number of iterations, the error
> tolerance, or the initialization of user-defined learners? Maybe the
> user sets just one of MAX_ITERATION_STEP or PRECISION_ERROR, and
> mlpack provides defaults for both.

I would imagine that these could just be parameters to the AdaBoost
constructor, and could be configurable by the user.  See the other
mlpack methods for an idea of what I mean.
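
As a sketch only (the parameter names and default values below are
illustrative, not decided API):

template<typename... WeakClassifierTypes>
class AdaBoost
{
 public:
  // maxIterations plays the role of MAX_ITERATION_STEP and tolerance
  // plays the role of PRECISION_ERROR; both have sensible defaults, so
  // the user can set either one, both, or neither.
  AdaBoost(const arma::mat& data,
           const arma::Row<size_t>& labels,
           const size_t maxIterations = 100,
           const double tolerance = 1e-6);
};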

>     - As for the weak learners, I think the Linear Perceptron (LP) and
> Multi-LP should be included, because users may need these to build
> their new algorithms.

Yes, this project should definitely include the implementation of a few
weak learners, because mlpack does not really have any at the moment.
These weak learners should be implemented as efficiently as possible,
and we can verify this by using the benchmarking system (
https://www.github.com/zoq/benchmarks ) to compare against other
implementations.

>     - Since there are so many variants of AdaBoost [
> http://www.site.uottawa.ca/~stan/csi5387/boost-tut-ppr.pdf] [
> http://colt2008.cs.helsinki.fi/papers/26-Shwartz.pdf], experiments
> should be run to decide which variants should be kept in the mlpack
> project.

Yes; better yet, the AdaBoost class should be templatized in such a way
that the user can easily implement their own variant of AdaBoost.
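
One possible way to do that, again only a sketch with hypothetical
names, is to make the reweighting rule itself a template policy:

template<typename WeightUpdateRule, typename... WeakClassifierTypes>
class AdaBoost;

// A variant like AdaBoost.M1 would then only supply the update step.
class AdaBoostM1Update
{
 public:
  // Reweight the points given this round's weighted error and the weak
  // learner's predictions.
  static void Update(arma::rowvec& weights,
                     const double error,
                     const arma::Row<size_t>& predictions,
                     const arma::Row<size_t>& labels);
};

With that, a user's custom variant is just another class implementing
Update(), passed as the first template argument.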

-- 
Ryan Curtin    | "This room is green."
ryan at ratml.org |   - Kazan

