[mlpack-svn] [MLPACK] #349: Softmax Regression Module
MLPACK Trac
trac at coffeetalk-1.cc.gatech.edu
Tue Jul 22 09:22:59 EDT 2014
#349: Softmax Regression Module
----------------------------+-----------------------------------------------
  Reporter:  siddharth.950  |      Owner:
      Type:  enhancement    |     Status:  new
  Priority:  major          |  Milestone:
 Component:  mlpack         | Resolution:
  Keywords:                 |   Blocking:
Blocked By:                 |
----------------------------+-----------------------------------------------
Comment (by rcurtin):
So, I'm going to try to keep my discussion about softmax regression here,
since I seem to immediately forget everything that is said in IRC. This
will give me something better to refer to...
As an update on this ticket, here is a snippet from the IRC logs:
{{{
18:04 < oldbeardo> while testing today I used the same Gaussians as in the
Logistic Regression test
18:04 < oldbeardo> they had base points as "1.0 1.0 1.0" and "9.0 9.0 9.0"
18:05 < oldbeardo> using that dataset, I was getting an accuracy of 52%
18:06 < oldbeardo> so I worked out the math, turns out Softmax does not
give the Logistic cost function when num_classes=2
18:06 < oldbeardo> I did this mentally so I may be incorrect
18:06 < naywhayare> I thought that it worked out to be the same... hang
on, let me look up that site that said it was the same
18:07 < oldbeardo> the point I'm making is that Softmax has a bias towards
features points with a higher norm
18:08 < naywhayare> can you explain why that is? I'm trying to understand
18:08 < oldbeardo> okay
18:09 < oldbeardo> the probability for a class is exp(lin_j) /
sum(exp(lin_i))
18:10 < oldbeardo> if you take num_classes = 2 it becomes exp(lin_0) /
(exp(lin_0) + exp(lin_1))
18:10 < oldbeardo> which is 1 / (1 + exp(lin_1 - lin_0))
18:10 < naywhayare> lin_j = \theta_j^T * x?
18:11 < naywhayare> just to be sure we are on the right page
18:11 < oldbeardo> yes
18:11 < naywhayare> ok
18:11 < oldbeardo> now this is not the same as sigmoid(lin_0)
18:12 < oldbeardo> so, the learned weights are in favour of the class
which has higher norm for its data points
18:13 < oldbeardo> at least that's what I inferred from the printed
probabilities
}}}
(This can be found at http://www.mlpack.org/irc/mlpack.20140516.html ).
The specific bits I want to point out are these comments:
{{{
18:09 < oldbeardo> the probability for a class is exp(lin_j) /
sum(exp(lin_i))
18:10 < oldbeardo> if you take num_classes = 2 it becomes exp(lin_0) /
(exp(lin_0) + exp(lin_1))
18:10 < oldbeardo> which is 1 / (1 + exp(lin_1 - lin_0))
18:11 < oldbeardo> now this is not the same as sigmoid(lin_0)
18:12 < oldbeardo> so, the learned weights are in favour of the class
which has higher norm for its data points
}}}
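Those rewriting steps do check out; here's a quick standalone C++ check (not
mlpack code -- the lin_j values are just made-up numbers) showing that
exp(lin_0) / (exp(lin_0) + exp(lin_1)) and 1 / (1 + exp(lin_1 - lin_0)) agree,
and that both differ from sigmoid(lin_0) whenever lin_1 != 0:
{{{
#include <cmath>
#include <cstdio>

int main()
{
  // Made-up linear terms lin_j = \theta_j^T x for a single point x.
  const double lin0 = 2.3;
  const double lin1 = 0.7;

  // Two-class softmax probability of class 0.
  const double softmax0 = std::exp(lin0) / (std::exp(lin0) + std::exp(lin1));

  // The same expression rewritten as in the IRC log.
  const double rewritten = 1.0 / (1.0 + std::exp(lin1 - lin0));

  // Plain sigmoid(lin_0), which ignores \theta_1 entirely.
  const double sigmoid0 = 1.0 / (1.0 + std::exp(-lin0));

  // softmax0 and rewritten agree; sigmoid0 differs unless lin1 == 0.
  std::printf("softmax0  = %f\nrewritten = %f\nsigmoid0  = %f\n",
              softmax0, rewritten, sigmoid0);
}
}}}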
[http://ufldl.stanford.edu/wiki/index.php/Softmax_Regression#Properties_of_softmax_regression_parameterization
Softmax regression is overparameterized], so we can use this to show that
logistic and softmax regression actually are equivalent in the two-class
setting: subtracting \theta_1 from both parameter vectors doesn't change any
of the probabilities, and afterwards the class-0 probability is
1 / (1 + exp(-(\theta_0 - \theta_1)^T x)) = sigmoid((\theta_0 - \theta_1)^T x),
which is just logistic regression with parameter \theta_0 - \theta_1. I've
attached an image containing my whiteboard derivation. Can you take a look
at it and tell me if you agree or if I have done something wrong?
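Numerically, that shift looks like this (again just a standalone sketch with
made-up parameters, not the derivation in the attached image):
{{{
#include <cmath>
#include <cstdio>

// Dot product of two 3-dimensional vectors.
double Dot(const double a[3], const double b[3])
{
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

int main()
{
  // Made-up parameters and a made-up point (3 dimensions, 2 classes).
  const double theta0[3] = {  0.4, -1.2, 0.9 };
  const double theta1[3] = { -0.3,  0.5, 1.1 };
  const double x[3]      = {  2.0,  1.0, 3.0 };

  // Softmax probability of class 0 with the original parameters.
  const double lin0 = Dot(theta0, x), lin1 = Dot(theta1, x);
  const double p0 = std::exp(lin0) / (std::exp(lin0) + std::exp(lin1));

  // Subtract theta1 from both parameter vectors; the second one becomes
  // the zero vector, so its linear term is 0.
  const double shifted0[3] = { theta0[0] - theta1[0],
                               theta0[1] - theta1[1],
                               theta0[2] - theta1[2] };
  const double slin0 = Dot(shifted0, x);
  const double p0Shifted = std::exp(slin0) / (std::exp(slin0) + 1.0);

  // Logistic regression with parameter (theta0 - theta1).
  const double logistic = 1.0 / (1.0 + std::exp(-slin0));

  // All three values agree.
  std::printf("softmax  = %f\nshifted  = %f\nlogistic = %f\n",
              p0, p0Shifted, logistic);
}
}}}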
If the derivation is right, then on that test set with two Gaussians that
are far apart, we should get much higher accuracy than 52%.
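As a rough sanity check of that expectation, here's a tiny standalone sketch
(not the actual test code -- it classifies with the hand-picked parameter
vector \theta_0 - \theta_1 = mu_0 - mu_1 instead of a trained model) showing
that Gaussians around "1.0 1.0 1.0" and "9.0 9.0 9.0" are essentially
perfectly separable by the logistic-equivalent rule:
{{{
#include <cstdio>
#include <random>

int main()
{
  std::mt19937 rng(42);
  std::normal_distribution<double> noise(0.0, 1.0);

  // Hand-picked (not trained) parameters: per-dimension weight mu0 - mu1,
  // with the bias putting the decision boundary midway between the means.
  const double mu0 = 1.0, mu1 = 9.0;
  const double theta = mu0 - mu1;
  const double bias = -3.0 * theta * (mu0 + mu1) / 2.0;

  const int points = 1000;
  int correct = 0;
  for (int i = 0; i < points; ++i)
  {
    const bool classZero = (i % 2 == 0);
    const double mean = classZero ? mu0 : mu1;

    // Linear term theta^T x + bias for a 3-d point drawn around the mean.
    double lin = bias;
    for (int d = 0; d < 3; ++d)
      lin += theta * (mean + noise(rng));

    // Predict class 0 when sigmoid(lin) > 0.5, i.e. when lin > 0.
    if ((lin > 0.0) == classZero)
      ++correct;
  }

  std::printf("accuracy: %.1f%%\n", 100.0 * correct / points);
}
}}}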
--
Ticket URL: <http://trac.research.cc.gatech.edu/fastlab/ticket/349#comment:1>
MLPACK <www.fast-lab.org>
MLPACK is an intuitive, fast, and scalable C++ machine learning library developed at Georgia Tech.