[mlpack] sparse coding test examples in mlpack

Ryan Curtin ryan at ratml.org
Thu Jun 4 11:21:39 EDT 2015


On Wed, Jun 03, 2015 at 08:52:11AM -0700, Jianyu Huang wrote:
> Hi all,
> 
> I am new to sparse coding, and I am trying to use mlpack to test sparse
> coding. I have successfully installed mlpack on an Ubuntu 14.04 64-bit machine.

Hi Jianyu,

I'll do my best to answer the questions. :)

> 1.
> 
> I run "make test". So if I understand correctly, I am running
> "bin/mlpack_test" actually. I check the DEBUG/WARN log output for the test.
> In the beginning, it shows "Running 402 test cases...". In almost the end,
> I find "Sparse Coding" is examined in this test. It shows the Data is
> 40x40, Atoms is 3, Lambda 1 is 0.1 and Lambda 2 is 0.
> 
> So what data set is tested here? Where can I get this “40x40” data?

You are correct -- 'make test' runs the 'bin/mlpack_test' program.
However, I'm not sure where the output you are referring to comes from.
Can you paste the exact output you are getting?  Then maybe I will be
able to figure it out.  As far as I know, though, the sparse coding tests
use the mnist_first250_training_4s_and_9s.arm dataset.

> 2.
> 
> I copy src/mlpack/tests/sparse_coding_test.cpp to a separate cpp file, and
> try to remove all macros depending on boost/test/unit_test. So the new test
> cpp file looks like:
> 
> 
> 
> int main() {
>   double lambda1 = 0.1;
>   uword nAtoms = 25;
> 
>   mat X;
>   X.load("mnist_first250_training_4s_and_9s.arm");
>   uword nPoints = X.n_cols;
> 
>   // Normalize each point since these are images.
>   for (uword i = 0; i < nPoints; ++i) {
>     X.col(i) /= norm(X.col(i), 2);
>   }
> 
>   SparseCoding<> sc(X, nAtoms, lambda1);
>   sc.OptimizeCode();
> 
>   mat D = sc.Dictionary();
>   mat Z = sc.Codes();
> 
>   for (uword i = 0; i < nPoints; ++i)
>   {
>     vec errCorr = trans(D) * (D * Z.unsafe_col(i) - X.unsafe_col(i));
>     SCVerifyCorrectness(Z.unsafe_col(i), errCorr, lambda1);
>   }
> }
> 
> However, it shows the following error:
> -------------------------------------------------------------------------------------------
> Mat::load(): couldn't read mnist_first250_training_4s_and_9s.arm
> error: Mat::col(): index out of bounds
> terminate called after throwing an instance of 'std::logic_error'
>   what():  Mat::col(): index out of bounds
> Aborted (core dumped)
> -------------------------------------------------------------------------------------------
> 
> I checked the data “mnist_first250_training_4s_and_9s.arm” is only 3M, so
> it should not exceed the “4 billion elements” restriction for Armadillo
> without the “ARMA_64BIT_WORD” configuration. Do you have any idea why
> this error happens?

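You probably don't have the dataset file in the working directory of
your program.  The first line of the error output ("Mat::load(): couldn't
read ...") says the load failed, so X is left empty and the later
Mat::col() error is most likely just a consequence of that.  It is not
related to Armadillo's element limit; copy the file into the directory
you run the program from (or give load() the full path).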
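
If you want to catch this before Armadillo does, here is a minimal sketch
of a check you could run first.  IsReadable() is a hypothetical helper I
made up for illustration (it is not part of mlpack or Armadillo), and it
only uses the standard library.  Note that Armadillo's load() also returns
false on failure, so checking that return value works too.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper (not part of mlpack or Armadillo): returns true if
// `path` can be opened for reading.  A relative path is resolved against
// the current working directory of the running program.
bool IsReadable(const std::string& path)
{
  std::ifstream in(path.c_str(), std::ios::binary);
  return in.good();
}
```

Calling IsReadable("mnist_first250_training_4s_and_9s.arm") at the top of
main() and exiting with a message when it returns false should make the
missing-file case obvious.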

> Also, how can I visualize/show the data in
> “mnist_first250_training_4s_and_9s.arm”? I don’t know what the data
> looks like.

This file is stored in Armadillo's binary format to save space.  You
could convert it to CSV with the following simple program and then use
whatever tools you like to inspect it:

----
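#include <mlpack/core.hpp>

using namespace mlpack;

int main() {
  arma::mat X;
  // Armadillo's load() returns false if the file cannot be read.
  if (!X.load("mnist_first250_training_4s_and_9s.arm"))
    return 1;

  // data::Save() picks the output format from the file extension.
  data::Save("mnist_first250_training_4s_and_9s.csv", X);
}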
----

> 3.
> 
> The command line interface for sparse_coding is as follows,
> 
> $ sparse_coding -i data.csv -k 200 -l 0.1 -d dict.csv -c codes.csv
> 
> Could you give me an example of data.csv? Sorry I don't know what the input
> of sparse coding should look like.

data.csv should be a comma-separated values file where each row
represents one observation/point and each column represents one
feature/dimension.  As an example, here's the first ten lines of
LCDM_q.csv, which is a dataset containing 3-dimensional objects
collected from the Sloan Digital Sky Survey:

$ head ~/datasets/LCDM_q.csv 
73.1708,100.713,8.93208
66.1034,33.8976,66.7139
73.5393,130.28,55.328
99.751,43.9025,99.2587
98.783,79.4761,78.0526
23.808,81.3255,12.287
89.1525,90.8523,37.9072
74.1535,68.5934,9.56997
44.0054,79.9218,0.937222
29.7863,132.141,59.5095

I hope this is helpful; if there's anything I've written that's unclear,
I'm happy to elaborate.

Thanks,

Ryan

-- 
Ryan Curtin    | "Hungry."
ryan at ratml.org |   - Sphinx

