[mlpack-svn] [MLPACK] #349: Softmax Regression Module
MLPACK Trac
trac at coffeetalk-1.cc.gatech.edu
Tue Jul 22 09:22:59 EDT 2014
#349: Softmax Regression Module
----------------------------+-----------------------------------------------
  Reporter:  siddharth.950  |      Owner:
      Type:  enhancement    |     Status:  new
  Priority:  major          |  Milestone:
 Component:  mlpack         | Resolution:
  Keywords:                 |   Blocking:
Blocked By:                 |
----------------------------+-----------------------------------------------
Comment (by rcurtin):
So, I'm going to try to keep my discussion about softmax regression here,
since I seem to immediately forget everything that is said in IRC. This
will give me something better to refer to...
As an update on this ticket, here is a snippet from the IRC logs:
{{{
18:04 < oldbeardo> while testing today I used the same Gaussians as in the
Logistic Regression test
18:04 < oldbeardo> they had base points as "1.0 1.0 1.0" and "9.0 9.0 9.0"
18:05 < oldbeardo> using that dataset, I was getting an accuracy of 52%
18:06 < oldbeardo> so I worked out the math, turns out Softmax does not
give the Logistic cost function when num_classes=2
18:06 < oldbeardo> I did this mentally so I may be incorrect
18:06 < naywhayare> I thought that it worked out to be the same... hang
on, let me look up that site that said it was the same
18:07 < oldbeardo> the point I'm making is that Softmax has a bias towards
features points with a higher norm
18:08 < naywhayare> can you explain why that is? I'm trying to understand
18:08 < oldbeardo> okay
18:09 < oldbeardo> the probability for a class is exp(lin_j) /
sum(exp(lin_i))
18:10 < oldbeardo> if you take num_classes = 2 it becomes exp(lin_0) /
(exp(lin_0) + exp(lin_1))
18:10 < oldbeardo> which is 1 / (1 + exp(lin_1 - lin_0))
18:10 < naywhayare> lin_j = \theta_j^T * x?
18:11 < naywhayare> just to be sure we are on the right page
18:11 < oldbeardo> yes
18:11 < naywhayare> ok
18:11 < oldbeardo> now this is not the same as sigmoid(lin_0)
18:12 < oldbeardo> so, the learned weights are in favour of the class
which has higher norm for its data points
18:13 < oldbeardo> at least that's what I inferred from the printed
probabilities
}}}
(This can be found at http://www.mlpack.org/irc/mlpack.20140516.html ).
The specific bits I want to point out are these comments:
{{{
18:09 < oldbeardo> the probability for a class is exp(lin_j) /
sum(exp(lin_i))
18:10 < oldbeardo> if you take num_classes = 2 it becomes exp(lin_0) /
(exp(lin_0) + exp(lin_1))
18:10 < oldbeardo> which is 1 / (1 + exp(lin_1 - lin_0))
18:11 < oldbeardo> now this is not the same as sigmoid(lin_0)
18:12 < oldbeardo> so, the learned weights are in favour of the class
which has higher norm for its data points
}}}
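Those rewriting steps do check out; here's a quick standalone C++ check (not
mlpack code -- the lin_j values are just made-up numbers) showing that
exp(lin_0) / (exp(lin_0) + exp(lin_1)) and 1 / (1 + exp(lin_1 - lin_0)) agree,
and that both differ from sigmoid(lin_0) whenever lin_1 != 0:
{{{
#include <cmath>
#include <cstdio>

int main()
{
  // Made-up linear terms lin_j = \theta_j^T x for a single point x.
  const double lin0 = 2.3;
  const double lin1 = 0.7;

  // Two-class softmax probability of class 0.
  const double softmax0 = std::exp(lin0) / (std::exp(lin0) + std::exp(lin1));

  // The same expression rewritten as in the IRC log.
  const double rewritten = 1.0 / (1.0 + std::exp(lin1 - lin0));

  // Plain sigmoid(lin_0), which ignores \theta_1 entirely.
  const double sigmoid0 = 1.0 / (1.0 + std::exp(-lin0));

  // softmax0 and rewritten agree; sigmoid0 differs unless lin1 == 0.
  std::printf("softmax0  = %f\nrewritten = %f\nsigmoid0  = %f\n",
              softmax0, rewritten, sigmoid0);
}
}}}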
[http://ufldl.stanford.edu/wiki/index.php/Softmax_Regression#Properties_of_softmax_regression_parameterization
Softmax regression is overparameterized], so we can use this to show that
logistic and softmax regression actually are equivalent in the two-class
setting: subtracting \theta_1 from both parameter vectors doesn't change any
of the probabilities, and afterwards the class-0 probability is
1 / (1 + exp(-(\theta_0 - \theta_1)^T x)) = sigmoid((\theta_0 - \theta_1)^T x),
which is just logistic regression with parameter \theta_0 - \theta_1. I've
attached an image containing my whiteboard derivation. Can you take a look
at it and tell me if you agree or if I have done something wrong?
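Numerically, that shift looks like this (again just a standalone sketch with
made-up parameters, not the derivation in the attached image):
{{{
#include <cmath>
#include <cstdio>

// Dot product of two 3-dimensional vectors.
double Dot(const double a[3], const double b[3])
{
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

int main()
{
  // Made-up parameters and a made-up point (3 dimensions, 2 classes).
  const double theta0[3] = {  0.4, -1.2, 0.9 };
  const double theta1[3] = { -0.3,  0.5, 1.1 };
  const double x[3]      = {  2.0,  1.0, 3.0 };

  // Softmax probability of class 0 with the original parameters.
  const double lin0 = Dot(theta0, x), lin1 = Dot(theta1, x);
  const double p0 = std::exp(lin0) / (std::exp(lin0) + std::exp(lin1));

  // Subtract theta1 from both parameter vectors; the second one becomes
  // the zero vector, so its linear term is 0.
  const double shifted0[3] = { theta0[0] - theta1[0],
                               theta0[1] - theta1[1],
                               theta0[2] - theta1[2] };
  const double slin0 = Dot(shifted0, x);
  const double p0Shifted = std::exp(slin0) / (std::exp(slin0) + 1.0);

  // Logistic regression with parameter (theta0 - theta1).
  const double logistic = 1.0 / (1.0 + std::exp(-slin0));

  // All three values agree.
  std::printf("softmax  = %f\nshifted  = %f\nlogistic = %f\n",
              p0, p0Shifted, logistic);
}
}}}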
If the derivation is right, then on that test set with two Gaussians that
are far apart, we should get much higher accuracy than 52%.
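As a rough sanity check of that expectation, here's a tiny standalone sketch
(not the actual test code -- it classifies with the hand-picked parameter
vector \theta_0 - \theta_1 = mu_0 - mu_1 instead of a trained model) showing
that Gaussians around "1.0 1.0 1.0" and "9.0 9.0 9.0" are essentially
perfectly separable by the logistic-equivalent rule:
{{{
#include <cstdio>
#include <random>

int main()
{
  std::mt19937 rng(42);
  std::normal_distribution<double> noise(0.0, 1.0);

  // Hand-picked (not trained) parameters: per-dimension weight mu0 - mu1,
  // with the bias putting the decision boundary midway between the means.
  const double mu0 = 1.0, mu1 = 9.0;
  const double theta = mu0 - mu1;
  const double bias = -3.0 * theta * (mu0 + mu1) / 2.0;

  const int points = 1000;
  int correct = 0;
  for (int i = 0; i < points; ++i)
  {
    const bool classZero = (i % 2 == 0);
    const double mean = classZero ? mu0 : mu1;

    // Linear term theta^T x + bias for a 3-d point drawn around the mean.
    double lin = bias;
    for (int d = 0; d < 3; ++d)
      lin += theta * (mean + noise(rng));

    // Predict class 0 when sigmoid(lin) > 0.5, i.e. when lin > 0.
    if ((lin > 0.0) == classZero)
      ++correct;
  }

  std::printf("accuracy: %.1f%%\n", 100.0 * correct / points);
}
}}}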
--
Ticket URL: <http://trac.research.cc.gatech.edu/fastlab/ticket/349#comment:1>
MLPACK <www.fast-lab.org>
MLPACK is an intuitive, fast, and scalable C++ machine learning library developed at Georgia Tech.