[mlpack] sparse coding test examples in mlpack

Jianyu Huang hjyahead at gmail.com
Wed Jun 10 20:00:48 EDT 2015


Hi Ryan & Nishant,

Thank you for clarifying these points.

1. Thanks for that. It helps!
3. OK, so I guess there are some corner cases which mlpack cannot handle.
Is it related to numerical stability, especially for ill-conditioned
matrix inputs?

4. I still maintain that mlpack doesn't implement the feature-sign search
algorithm.
In the sparse_coding_impl.hpp file,

template<typename DictionaryInitializer>
void SparseCoding<DictionaryInitializer>::OptimizeCode()

only invokes the lars.Regress() function. Looking through lars.Regress(),
it appears to be a traditional Cholesky-based implementation of the
LARS-Lasso algorithm, rather than the feature-sign algorithm from Honglak
Lee's "Efficient sparse coding algorithms" (NIPS 2006) paper. Correct me
if I am wrong.
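
(A crude way to double-check, for anyone curious: grep the relevant source
directories for any mention of the feature-sign algorithm, e.g.

  grep -ri "feature" src/mlpack/methods/sparse_coding/ src/mlpack/methods/lars/

and see what turns up beyond comments and citations.)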

5.
How can I use a parallel BLAS when configuring and building mlpack? I only
see the CMake configuration page here:
http://www.mlpack.org/doxygen.php?doc=build.html
so I don't know how to add compiler flags like "mkl=parallel", etc.
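
For what it's worth, the kind of thing I have in mind is sketched below
(the paths and MKL library names here are guesses on my part, and depend
on the MKL version and layout):

  cmake -D CMAKE_BUILD_TYPE=Release \
        -D CMAKE_CXX_FLAGS="-I/opt/intel/mkl/include" \
        -D CMAKE_EXE_LINKER_FLAGS="-L/opt/intel/mkl/lib/intel64 -lmkl_rt -lpthread -lm" \
        ../

Would something along these lines make mlpack (or rather Armadillo
underneath it) pick up the parallel MKL?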

Thank you!
Jianyu



On Mon, Jun 8, 2015 at 7:32 PM, Ryan Curtin <ryan at ratml.org> wrote:

> On Fri, Jun 05, 2015 at 05:42:12PM -0700, Jianyu Huang wrote:
> > Hi Ryan,
> >
> > Thanks so much for the reply! It really helps!
> >
> > 1.
> > The output I am getting is like the following:
> > [DEBUG] RA Search  [0x7ffe1d9e5740]
> > [DEBUG]   Reference Set: 40x40
> > [DEBUG]   Metric:
> > [DEBUG]     LMetric [0x7ffe1d9e58a4]
> > [DEBUG]       Power: 2
> > [DEBUG]       TakeRoot: false
> > [DEBUG] Sparse Coding  [0x7ffe1d9e5780]
> > [DEBUG]   Data: 40x40
> > [DEBUG]   Atoms: 3
> > [DEBUG]   Lambda 1: 0.1
> > [DEBUG]   Lambda 2: 0
> >
> > But just curious, what is the "40x40" input data shown in the summary
> > part?
>
> Ah, this is output from src/mlpack/tests/to_string_test.cpp, which just
> makes sure that the ToString() method works for every mlpack class.  You
> can ignore the output, and the 40x40 dataset is just a random dataset.
>
> > 3.
> > Thanks! But just be curious, if I set the data as some random matrix like
> > 1,0,0,0
> > 0,3,0,0
> > 3,0,1,0
> > 0,4,0,0
> > 0,0,5,0
> > 0,0,3,7
> >
> > and I run "./sparse_coding -i data_bak2.csv -k 6 -l 1 -d dict.csv -c
> > codes.csv -n 10 -v" multiple times.
> >
> > Sometimes I can get output smoothly, but sometimes I get the following
> > error:
> >
> >
> -------------------------------------------------------------------------------------------------------
> > [DEBUG] Newton Method iteration 49:
> > [DEBUG]   Gradient norm: 1.94598.
> > [DEBUG]   Improvement: 0.
> > [INFO ]   Objective value: 27.9256.
> > [INFO ] Performing coding step...
> > [DEBUG] Optimization at point 0.
> > [INFO ]   Sparsity level: 22.2222%.
> > [INFO ]   Objective value: 20.6886 (improvement 1.79769e+308).
> > [INFO ] Iteration 2 of 10.
> > [INFO ] Performing dictionary step...
> > [WARN ] There are 1 inactive atoms. They will be re-initialized randomly.
> > [DEBUG] Solving Dual via Newton's Method.
> >
> > error: solve(): solution not found
> >
> > terminate called after throwing an instance of 'std::runtime_error'
> >   what():  solve(): solution not found
> > Aborted (core dumped)
> >
> >
> ------------------------------------------------------------------------------------------------------
> >
> > Do you have any insights about what is wrong here?
>
> I didn't write the sparse coding module, so I've CC'ed Nishant (the
> author) here to see if he has any insights.  To me, it looks like one of
> these two systems is failing to be solved:
>
> sparse_coding_impl.hpp:216 -- arma::mat matAInvZXT = solve(A, codesXT);
> sparse_coding_impl.hpp:223 -- arma::vec searchDirection = -solve(hessian, gradient);
>
> You can make the behavior deterministic by setting the random seed using
> the --seed option.
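>
> For example, to reproduce a failing run deterministically (the seed value
> itself is arbitrary):
>
>   ./sparse_coding -i data_bak2.csv -k 6 -l 1 -d dict.csv -c codes.csv \
>     -n 10 -v --seed 1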
>
> > 4.
> > It looks like mlpack only implements a naive way to solve sparse coding,
> > i.e., using a Cholesky-based implementation of the LARS-Lasso algorithm
> > for the sparse coding step, and Newton's iterative method to solve the
> > Lagrange dual. So mlpack doesn't actually implement the feature-sign
> > search algorithm of Honglak Lee's "Efficient sparse coding algorithms"
> > (NIPS 2006) paper. Am I wrong here? Also, for online sparse coding, the
> > algorithm in Julien Mairal's "Online Dictionary Learning for Sparse
> > Coding" (ICML 2009) paper looks more efficient, and it is the one
> > scikit-learn adopts. Do you have plans to add those sparse coding
> > approaches?
>
> My understanding was that the mlpack method does implement the
> feature-sign search algorithm -- that is what the code references.
> Perhaps Nishant can elaborate?
>
> > 5.
> > I also noticed the parallel performance of sparse coding in mlpack. When
> > I run the command-line interface ("./sparse_coding ..."), it looks like
> > only one core is utilized. But when I run the API code, it looks like all
> > four cores of my CPU are utilized. Searching the whole package, I didn't
> > see any "openmp" or "pthread" keywords, so my guess is that the
> > performance benefit comes from a parallel MKL/BLAS. Am I wrong here? Do
> > you have any idea why I get different parallel performance for the CLI
> > and the API?
>
> This would have to do with your BLAS implementation, yes.  It looks like
> you are linking against a parallel BLAS when you use the mlpack API, but
> you have not used a parallel BLAS when you configured and built mlpack.
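>
> One way to check which BLAS each binary actually resolves at runtime is
> something like:
>
>   ldd sparse_coding | grep -i -e blas -e mkl -e lapack
>
> If the CLI program pulls in the reference BLAS while your own program
> pulls in MKL, that would explain the difference.  (With MKL, the thread
> count can also be controlled at runtime via the MKL_NUM_THREADS
> environment variable.)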
>
> --
> Ryan Curtin    | "Weeee!"
> ryan at ratml.org |   - Bobby
>