[mlpack-svn] r10838 - in mlpack/trunk/doc: . guide

fastlab-svn at coffeetalk-1.cc.gatech.edu
Fri Dec 16 01:34:38 EST 2011


Author: rcurtin
Date: 2011-12-16 01:34:38 -0500 (Fri, 16 Dec 2011)
New Revision: 10838

Added:
   mlpack/trunk/doc/guide/
   mlpack/trunk/doc/guide/build.hpp
   mlpack/trunk/doc/guide/iodoc.hpp
   mlpack/trunk/doc/guide/matrices.hpp
   mlpack/trunk/doc/guide/sample.hpp
   mlpack/trunk/doc/guide/timer.hpp
Log:
Add some useful documentation.


Added: mlpack/trunk/doc/guide/build.hpp
===================================================================
--- mlpack/trunk/doc/guide/build.hpp	                        (rev 0)
+++ mlpack/trunk/doc/guide/build.hpp	2011-12-16 06:34:38 UTC (rev 10838)
@@ -0,0 +1,103 @@
+/*! @page build Building MLPACK From Source
+
+@section buildintro Introduction
+
+MLPACK uses CMake as a build system and allows several flexible build
+configuration options.  One can consult any of numerous CMake tutorials for
+further documentation, but this tutorial should be enough to get MLPACK built
+and installed.
+
+@section builddir Creating Build Directory
+
+Once the MLPACK source is unpacked, you should create a build directory.
+
+@code
+$ cd mlpack-1.0.0
+$ mkdir build
+@endcode
+
+The directory can have any name, not just 'build', but 'build' is a sensible
+choice.
+
+@section dep Dependencies of MLPACK
+
+MLPACK depends on the following libraries, which need to be installed on the
+system and have headers present:
+
+ - LAPACK
+ - pthreads
+ - Armadillo >= 2.4.0
+ - LibXML2
+ - Boost (math_c99, program_options, unit_test_framework)
+
+@section config Configuring CMake
+
+Running CMake is the equivalent of running `./configure` with autotools.  If you
+run CMake with no options, it will configure the project to build with debugging
+symbols and profiling information:
+
+@code
+$ cd build
+$ cmake ../
+@endcode
+
+You can specify options to compile without debugging symbols and profiling
+information (i.e. to build code that runs as fast as possible):
+
+@code
+$ cd build
+$ cmake -D DEBUG=OFF -D PROFILE=OFF ../
+@endcode
+
+The full list of options MLPACK supports:
+
+ - DEBUG=(ON/OFF): compile with debugging symbols (default ON)
+ - PROFILE=(ON/OFF): compile with profiling symbols (default ON)
+ - ARMA_EXTRA_DEBUG=(ON/OFF): compile with extra Armadillo debugging symbols
+       (default OFF)
+
+Each option can be specified to CMake with the '-D' flag.  Other tools can also
+be used to configure CMake, but those are not documented here.
+
+@section buildlib Building MLPACK
+
+Once CMake is configured, building the library is as simple as typing 'make'.
+This will build all library components as well as 'mlpack_test'.
+
+@code
+$ make
+Scanning dependencies of target mlpack
+[  1%] Building CXX object src/mlpack/CMakeFiles/mlpack.dir/core/optimizers/aug_lagrangian/aug_lagrangian_test_functions.cpp.o
+<...>
+@endcode
+
+You can specify individual components which you want to build, if you do not
+want to build everything in the library:
+
+@code
+$ make pca allknn allkfn
+@endcode
+
+If the build fails and you cannot figure out why, register an account on Trac
+and submit a ticket; the MLPACK developers will quickly help you figure it
+out:
+
+http://mlpack.org/
+
+Alternatively, MLPACK help can be found in IRC at \#mlpack on irc.freenode.net.
+
+@section install Installing MLPACK
+
+If you wish to install MLPACK to /usr/include/mlpack/, /usr/lib/, and
+/usr/bin/ once it has built, make sure you have root privileges (or write
+permissions to those three directories), and simply type
+
+@code
+# make install
+@endcode
+
+You can now run the executables by name; you can link against MLPACK with
+-lmlpack, and the MLPACK headers are found in /usr/include/mlpack/.
+
+*/

Added: mlpack/trunk/doc/guide/iodoc.hpp
===================================================================
--- mlpack/trunk/doc/guide/iodoc.hpp	                        (rev 0)
+++ mlpack/trunk/doc/guide/iodoc.hpp	2011-12-16 06:34:38 UTC (rev 10838)
@@ -0,0 +1,138 @@
+/*! @page iodoc MLPACK Input and Output
+
+@section iointro Introduction
+
+MLPACK provides the following:
+
+ - mlpack::Log, for debugging / informational / warning / fatal output
+ - mlpack::CLI, for parsing command line options
+
+Each of those classes is well-documented, and that documentation should be
+consulted for further reference.
+
+@section simplelog Simple Logging Example
+
+MLPACK has four logging levels:
+
+ - Log::Debug
+ - Log::Info
+ - Log::Warn
+ - Log::Fatal
+
+Output to Log::Debug is not shown (and incurs no performance penalty) when
+MLPACK is compiled without debugging symbols.  Output to Log::Info is only shown
+when the program is run with the --verbose (or -v) flag.  Log::Warn is always
+shown, and Log::Fatal is always shown and halts the program once a newline is
+sent to it.
+
+Here is a simple example, and its output:
+
+@code
+#include <mlpack/core.hpp>
+
+using namespace mlpack;
+
+int main(int argc, char** argv)
+{
+  // Parse the command line; this is what handles the --verbose flag.
+  CLI::ParseCommandLine(argc, argv);
+
+  Log::Debug << "Compiled with debugging symbols." << std::endl;
+
+  Log::Info << "Some test informational output." << std::endl;
+
+  Log::Warn << "A warning!" << std::endl;
+
+  Log::Fatal << "Program has crashed." << std::endl;
+
+  Log::Warn << "Made it!" << std::endl;
+}
+@endcode
+
+With debugging symbols and --verbose, the following is shown:
+
+@code
+$ ./main --verbose
+[DEBUG] Compiled with debugging symbols.
+[INFO ] Some test informational output.
+[WARN ] A warning!
+[FATAL] Program has crashed.
+@endcode
+
+The last warning is not reached, because Log::Fatal terminates the program.
+
+Without debugging symbols and without --verbose, the following is shown:
+
+@code
+$ ./main
+[WARN ] A warning!
+[FATAL] Program has crashed.
+@endcode
+
+These four log levels are useful both for providing informational output and
+for debugging your MLPACK program.
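To make the behavior described above concrete, here is a small, self-contained sketch of a prefixed log stream. It is a toy model, not MLPACK's implementation: the class name ToyLogStream is hypothetical, visibility is a plain constructor flag (standing in for a debug build or --verbose), and "halting" on a fatal line is modeled by throwing std::runtime_error right after the line is printed.

```cpp
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical toy log level (not MLPACK's code): buffers text, prints each
// complete line with a prefix if enabled, and throws after a fatal line.
class ToyLogStream
{
 public:
  ToyLogStream(std::ostream& dest, const std::string& prefix, bool enabled,
               bool fatal = false)
      : dest_(dest), prefix_(prefix), enabled_(enabled), fatal_(fatal) { }

  // Format any streamable value and add it to the pending line.
  template<typename T>
  ToyLogStream& operator<<(const T& value)
  {
    std::ostringstream formatted;
    formatted << value;
    Append(formatted.str());
    return *this;
  }

  // Handle manipulators such as std::endl (which writes a newline).
  ToyLogStream& operator<<(std::ostream& (*manip)(std::ostream&))
  {
    std::ostringstream formatted;
    manip(formatted);
    Append(formatted.str());
    return *this;
  }

 private:
  // Emit every complete line currently held in the pending buffer.
  void Append(const std::string& text)
  {
    pending_ += text;
    std::string::size_type newline;
    while ((newline = pending_.find('\n')) != std::string::npos)
    {
      const std::string line = pending_.substr(0, newline);
      pending_.erase(0, newline + 1);
      if (enabled_)
        dest_ << prefix_ << line << '\n';
      if (fatal_)
        throw std::runtime_error("fatal: " + line);  // models halting
    }
  }

  std::ostream& dest_;
  std::string prefix_;
  bool enabled_;
  bool fatal_;
  std::string pending_;
};
```

Nothing is printed until a newline arrives, which is why, in this model, a fatal message stops the program only once its line is complete.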
+
+@section simplecli Simple CLI Example
+
+Through the mlpack::CLI object, command-line parameters can be easily added
+with the PROGRAM_INFO, PARAM_INT, PARAM_DOUBLE, PARAM_STRING, and PARAM_FLAG
+macros (and required variants such as PARAM_STRING_REQ).
+
+Here is a sample use of those macros, extracted from methods/pca/pca_main.cpp.
+
+@code
+#include <mlpack/core.hpp>
+
+// Document program.
+PROGRAM_INFO("Principal Components Analysis", "This program performs principal "
+    "components analysis on the given dataset.  It will transform the data "
+    "onto its principal components, optionally performing dimensionality "
+    "reduction by ignoring the principal components with the smallest "
+    "eigenvalues.");
+
+// Parameters for program.
+PARAM_STRING_REQ("input_file", "Input dataset to perform PCA on.", "");
+PARAM_STRING_REQ("output_file", "Output dataset to perform PCA on.", "");
+PARAM_INT("new_dimensionality", "Desired dimensionality of output dataset.",
+    "", 0);
+
+using namespace mlpack;
+
+int main(int argc, char** argv)
+{
+  // Parse the command line.
+  CLI::ParseCommandLine(argc, argv);
+
+  // ... (rest of the program) ...
+}
+@endcode
+
+Documentation is automatically generated using those macros, and when the
+program is run with --help the following is displayed:
+
+@code
+$ pca --help
+Principal Components Analysis
+
+  This program performs principal components analysis on the given dataset.  It
+  will transform the data onto its principal components, optionally performing
+  dimensionality reduction by ignoring the principal components with the
+  smallest eigenvalues.
+
+Required options:
+
+  --input_file [string]         Input dataset to perform PCA on.
+  --output_file [string]        Output dataset to perform PCA on.
+
+Options:
+
+  --help (-h)                   Default help info.
+  --info [string]               Get help on a specific module or option.
+                                Default value ''.
+  --new_dimensionality [int]    Desired dimensionality of output dataset.
+                                Default value 0.
+  --verbose (-v)                Display informational messages and the full list
+                                of parameters and timers at the end of
+                                execution.
+@endcode
+
+Consult the mlpack::CLI documentation for complete details.
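As a rough illustration of what the PARAM_* macros and CLI::ParseCommandLine() accomplish together (registering options with descriptions and defaults, then checking that required ones were given), here is a hypothetical toy registry. ToyCLI and its methods are invented for this sketch; it only handles `--name value` pairs and stores values as strings, with no flags, short aliases, type conversion, or help generation.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// One registered option (hypothetical; not MLPACK's internal structure).
struct ToyParam
{
  std::string description;
  std::string value;  // holds the default until overridden on the command line
  bool required;
  bool seen;
};

class ToyCLI
{
 public:
  // Roughly the role played by a PARAM_* macro: register name, help, default.
  void Add(const std::string& name, const std::string& description,
           const std::string& defaultValue, bool required)
  {
    params[name] = ToyParam{description, defaultValue, required, false};
  }

  // Parse "--name value" pairs; args[0] is the program name.
  void Parse(const std::vector<std::string>& args)
  {
    for (std::size_t i = 1; i < args.size(); ++i)
    {
      if (args[i].compare(0, 2, "--") != 0)
        throw std::invalid_argument("unexpected argument: " + args[i]);
      const std::string name = args[i].substr(2);
      auto it = params.find(name);
      if (it == params.end())
        throw std::invalid_argument("unknown option: --" + name);
      if (i + 1 >= args.size())
        throw std::invalid_argument("missing value for --" + name);
      it->second.value = args[++i];
      it->second.seen = true;
    }
    // Required options must have been supplied.
    for (const auto& p : params)
      if (p.second.required && !p.second.seen)
        throw std::invalid_argument("missing required option --" + p.first);
  }

  const std::string& Get(const std::string& name) const
  {
    return params.at(name).value;
  }

 private:
  std::map<std::string, ToyParam> params;
};
```

For example, after registering input_file (required) and new_dimensionality (default "0"), parsing `pca --input_file data.csv` leaves new_dimensionality at its default.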
+
+*/

Added: mlpack/trunk/doc/guide/matrices.hpp
===================================================================
--- mlpack/trunk/doc/guide/matrices.hpp	                        (rev 0)
+++ mlpack/trunk/doc/guide/matrices.hpp	2011-12-16 06:34:38 UTC (rev 10838)
@@ -0,0 +1,70 @@
+/*! @page matrices Matrices in MLPACK
+
+@section matintro Introduction
+
+MLPACK uses Armadillo matrices for matrix support.  Armadillo is a fast C++
+matrix library that uses advanced template techniques to provide efficient
+matrix operations.
+
+Documentation on Armadillo can be found on their website:
+
+http://arma.sourceforge.net/docs.html
+
+However, there are a few caveats specific to how MLPACK uses Armadillo.
+
+@section format Column-wise Matrices
+
+Armadillo matrices are stored in a column-major format; this means that in
+memory, each column is stored contiguously.
+
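+This means that, for the vast majority of machine learning methods, it is faster
+to store observations as columns and dimensions as rows: most methods work on
+one observation at a time, and each observation is then contiguous in memory.
+This is counter to most standard machine learning texts!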
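A minimal sketch of the column-major layout, independent of Armadillo: the element in row i, column j of a matrix with n_rows rows is stored at offset i + j * n_rows, so one observation (one column) is a contiguous block. The helper names below are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Offset of element (row, col) in a column-major buffer with nRows rows.
inline std::size_t ColMajorIndex(std::size_t row, std::size_t col,
                                 std::size_t nRows)
{
  return row + col * nRows;
}

// With observations stored as columns, observation `col` occupies the
// contiguous range [col * nRows, (col + 1) * nRows); return a copy of it.
std::vector<double> Observation(const std::vector<double>& data,
                                std::size_t nRows, std::size_t col)
{
  const auto first = data.begin() + static_cast<std::ptrdiff_t>(col * nRows);
  return std::vector<double>(first, first + static_cast<std::ptrdiff_t>(nRows));
}
```

For a 2 x 3 matrix stored as {1, 2, 3, 4, 5, 6}, the second observation is {3, 4}, read from two adjacent entries.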
+
+The major implications of this are for linear algebra.  For instance, the
+covariance of a centered data matrix whose rows are observations is typically
+written (up to a normalizing factor of @f$ 1/n @f$) as
+
+@f$ C = X^T X @f$
+
+but for a column-wise matrix, where each column is an observation, it is
+
+@f$ C = X X^T @f$
+
+This is very important to keep in mind!  If your MLPACK code is not working,
+the orientation of your matrices may be the reason.
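For illustration, the column-wise formula can be written out with plain loops. This sketch uses a hypothetical function name and plain std::vector storage rather than Armadillo; it assumes the data is already centered and has at least one observation, and it scales by 1/n, as the covariance program on the sample programs page does.

```cpp
#include <cstddef>
#include <vector>

// Covariance C = X X^T / n of centered data X (d x n, one observation per
// column, column-major). C(a, b) sums X(a, j) * X(b, j) over observations j,
// so each observation is read as one contiguous column. Returns d x d,
// column-major. Assumes n > 0 and d > 0.
std::vector<double> Covariance(const std::vector<double>& X,
                               std::size_t d, std::size_t n)
{
  std::vector<double> C(d * d, 0.0);
  for (std::size_t j = 0; j < n; ++j)
  {
    const double* x = &X[j * d];  // observation j is contiguous
    for (std::size_t b = 0; b < d; ++b)
      for (std::size_t a = 0; a < d; ++a)
        C[a + b * d] += x[a] * x[b];
  }
  for (double& c : C)
    c /= static_cast<double>(n);
  return C;
}
```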
+
+@section loading Loading Matrices
+
+MLPACK provides data::Load() and data::Save() functions, which should be used
+instead of Armadillo's loading and saving functions.
+
+Most machine learning data is stored in row-major format; a CSV, for example,
+will generally have one observation per line and each column will correspond to
+a dimension.
+
+The data::Load() and data::Save() functions transpose the matrix when loading
+and saving, meaning that the following CSV:
+
+@code
+$ cat data.csv
+3,3,3,3,0
+3,4,4,3,0
+3,4,4,3,0
+3,3,4,3,0
+3,6,4,3,0
+2,4,4,3,0
+2,4,4,1,0
+3,3,3,2,0
+3,4,4,2,0
+3,4,4,2,0
+3,3,4,2,0
+3,6,4,2,0
+2,4,4,2,0
+@endcode
+
+is loaded as a matrix with 5 rows and 13 columns, not 13 rows and 5 columns as
+the CSV is written.
+
+This is important to remember!
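The transposition can be sketched without MLPACK: each CSV line (one observation) is appended as one column of a column-major buffer, so a file with n lines of d values becomes a d x n matrix. This is a simplified stand-in for data::Load(), not its actual code; ColMajorMatrix and LoadCsvTransposed are hypothetical names, and only plain comma-separated numbers are handled.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// A d x n matrix in column-major storage (hypothetical helper type).
struct ColMajorMatrix
{
  std::size_t nRows = 0;
  std::size_t nCols = 0;
  std::vector<double> values;
};

// Each CSV line becomes one column: its fields fill that column top to bottom.
ColMajorMatrix LoadCsvTransposed(const std::string& csv)
{
  ColMajorMatrix m;
  std::istringstream lines(csv);
  std::string line;
  while (std::getline(lines, line))
  {
    if (line.empty())
      continue;
    std::istringstream fields(line);
    std::string field;
    std::size_t count = 0;
    while (std::getline(fields, field, ','))
    {
      m.values.push_back(std::stod(field));
      ++count;
    }
    if (m.nCols == 0)
      m.nRows = count;  // the first line fixes the dimensionality
    else if (count != m.nRows)
      throw std::runtime_error("inconsistent number of fields");
    ++m.nCols;
  }
  return m;
}
```

Applied to the 13-line file above, this sketch would produce nRows == 5 and nCols == 13.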
+
+*/

Added: mlpack/trunk/doc/guide/sample.hpp
===================================================================
--- mlpack/trunk/doc/guide/sample.hpp	                        (rev 0)
+++ mlpack/trunk/doc/guide/sample.hpp	2011-12-16 06:34:38 UTC (rev 10838)
@@ -0,0 +1,93 @@
+/*! @page sample Simple Sample MLPACK Programs
+
+@section sampleintro Introduction
+
+This page contains several simple MLPACK examples, in increasing order of
+complexity.
+
+@section covariance Covariance Computation
+
+A simple program to compute the covariance of a data matrix ("data.csv"),
+assuming that the data is already centered, and save it to a file ("cov.csv").
+
+@code
+// Includes all relevant components of MLPACK.
+#include <mlpack/core.hpp>
+
+// Convenience.
+using namespace mlpack;
+
+int main()
+{
+  // First, load the data.
+  arma::mat data;
+  // Use data::Load() which transposes the matrix.
+  data::Load("data.csv", data, true);
+
+  // Now compute the covariance.  We assume that the data is already centered.
+  // Remember, because the matrix is column-major, the covariance operation is
+  // transposed.
+  arma::mat cov = data * trans(data) / data.n_cols;
+
+  // Save the output.
+  data::Save("cov.csv", cov, true);
+}
+@endcode
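The program above assumes centered data. If your data is not centered, one way to center it is to subtract each dimension's mean (a row mean, since observations are columns) from every observation. This is a plain C++ sketch with a hypothetical function name, operating on a column-major buffer rather than an arma::mat; it assumes at least one observation.

```cpp
#include <cstddef>
#include <vector>

// Center d x n column-major data (one observation per column) in place by
// subtracting the mean of each dimension (row). Assumes n > 0.
void CenterRows(std::vector<double>& X, std::size_t d, std::size_t n)
{
  std::vector<double> mean(d, 0.0);
  for (std::size_t j = 0; j < n; ++j)
    for (std::size_t i = 0; i < d; ++i)
      mean[i] += X[i + j * d];
  for (double& m : mean)
    m /= static_cast<double>(n);
  for (std::size_t j = 0; j < n; ++j)
    for (std::size_t i = 0; i < d; ++i)
      X[i + j * d] -= mean[i];
}
```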
+
+@section nn Nearest Neighbor
+
+This simple program uses the mlpack::neighbor::NeighborSearch object to find the
+nearest neighbor of each point in a dataset using the L1 metric, and then prints
+the index of each neighbor and its distance with Log::Info (visible when the
+program is run with --verbose).
+
+@code
+#include <mlpack/core.hpp>
+#include <mlpack/methods/neighbor_search/neighbor_search.hpp>
+
+using namespace mlpack;
+using namespace mlpack::neighbor; // NeighborSearch and NearestNeighborSort
+using namespace mlpack::metric; // ManhattanDistance
+
+int main(int argc, char** argv)
+{
+  // Parse the command line so that --verbose enables Log::Info output.
+  CLI::ParseCommandLine(argc, argv);
+
+  // Load the data from data.csv (hard-coded).
+  arma::mat data;
+  data::Load("data.csv", data, true);
+
+  // Use templates to specify that we want a NeighborSearch object which uses
+  // the Manhattan distance.
+  NeighborSearch<NearestNeighborSort, ManhattanDistance> nn(data);
+
+  // Create the objects we will store the results in.  Search() fills one
+  // column per point, with k rows (here k = 1).
+  arma::Mat<size_t> neighbors;
+  arma::mat distances; // We need to store the distances too.
+
+  // Compute the nearest neighbor of each point.
+  nn.Search(1, neighbors, distances);
+
+  // Write each neighbor and distance using Log.
+  for (size_t i = 0; i < neighbors.n_cols; ++i)
+  {
+    Log::Info << "Nearest neighbor of point " << i << " is point "
+        << neighbors(0, i) << " and the distance is " << distances(0, i)
+        << ".\n";
+  }
+}
+@endcode
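To check results on small inputs, a brute-force version of the same computation can help. This sketch is an O(n^2) reference, not the tree-based algorithm NeighborSearch uses; the function name is hypothetical, the data is a plain column-major buffer, and it assumes a point should not be reported as its own neighbor.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// For each point (column) of a d x n column-major matrix, find the nearest
// other point under the Manhattan (L1) distance by checking every pair.
void BruteForceNearestL1(const std::vector<double>& X, std::size_t d,
                         std::size_t n, std::vector<std::size_t>& neighbors,
                         std::vector<double>& distances)
{
  neighbors.assign(n, 0);
  distances.assign(n, std::numeric_limits<double>::infinity());
  for (std::size_t q = 0; q < n; ++q)
  {
    for (std::size_t r = 0; r < n; ++r)
    {
      if (r == q)
        continue;  // skip the query point itself
      double dist = 0.0;
      for (std::size_t i = 0; i < d; ++i)
        dist += std::fabs(X[i + q * d] - X[i + r * d]);
      if (dist < distances[q])
      {
        distances[q] = dist;
        neighbors[q] = r;
      }
    }
  }
}
```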
+
+@section other Other examples
+
+For more complex examples, it is useful to refer to the source code of the main
+executables:
+
+ - methods/neighbor_search/allknn_main.cpp
+ - methods/neighbor_search/allkfn_main.cpp
+ - methods/emst/emst_main.cpp
+ - methods/radical/radical_main.cpp
+ - methods/nca/nca_main.cpp
+ - methods/naive_bayes/nbc_main.cpp
+ - methods/pca/pca_main.cpp
+ - methods/lars/lars_main.cpp
+ - methods/linear_regression/linear_regression_main.cpp
+ - methods/gmm/gmm_main.cpp
+ - methods/kmeans/kmeans_main.cpp
+
+*/

Added: mlpack/trunk/doc/guide/timer.hpp
===================================================================
--- mlpack/trunk/doc/guide/timer.hpp	                        (rev 0)
+++ mlpack/trunk/doc/guide/timer.hpp	2011-12-16 06:34:38 UTC (rev 10838)
@@ -0,0 +1,59 @@
+/*! @page timer MLPACK Timers
+
+@section timerintro Introduction
+
+MLPACK provides a simple timer interface for the timing of machine learning
+methods.  The results of any timers used during the program are displayed at
+the end of execution by the mlpack::CLI object when --verbose is given:
+
+@code
+$ allknn -i=data.csv -k 5 -v
+<...>
+[INFO ] Program timers:
+[INFO ]   computing_neighbors: 0.044764s
+[INFO ]   total_time: 0.061249s
+[INFO ]   tree_building: 0.003075s
+@endcode
+
+@section usingtimer Timer API
+
+The mlpack::Timer class provides three simple methods:
+
+@code
+void Timer::Start(const char* name);
+void Timer::Stop(const char* name);
+timeval Timer::Get(const char* name);
+@endcode
+
+Each timer is given a name, and is referenced by that name.
+
+A "total_time" timer is run by default for each MLPACK program.
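The idea behind named timers can be sketched with std::chrono: a map from name to accumulated duration, updated by each start/stop pair. This is a hypothetical illustration, not MLPACK's implementation; ToyTimers is an invented name, it returns a std::chrono duration rather than the timeval shown above, and throwing on a mismatched Start()/Stop() is a choice made for this sketch.

```cpp
#include <chrono>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical named-timer registry; elapsed time accumulates across
// Start()/Stop() pairs with the same name.
class ToyTimers
{
 public:
  using Clock = std::chrono::steady_clock;

  void Start(const std::string& name)
  {
    Entry& e = timers[name];  // creates the timer on first use
    if (e.running)
      throw std::logic_error("timer already running: " + name);
    e.running = true;
    e.started = Clock::now();
  }

  void Stop(const std::string& name)
  {
    auto it = timers.find(name);
    if (it == timers.end() || !it->second.running)
      throw std::logic_error("timer not running: " + name);
    it->second.total += Clock::now() - it->second.started;
    it->second.running = false;
  }

  Clock::duration Get(const std::string& name) const
  {
    return timers.at(name).total;
  }

 private:
  struct Entry
  {
    bool running = false;
    Clock::time_point started;
    Clock::duration total = Clock::duration::zero();
  };
  std::map<std::string, Entry> timers;
};
```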
+
+@section example Timer Example
+
+Below is a very simple example of timer usage in code.
+
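+@code
+#include <mlpack/core.hpp>
+
+using namespace mlpack;
+
+// Placeholder for the work to be timed; replace it with your own code.
+void DoSomeStuff()
+{
+  arma::mat a = arma::randu<arma::mat>(200, 200);
+  arma::mat b = a * a;
+  Log::Debug << "Sum of entries: " << arma::accu(b) << std::endl;
+}
+
+int main(int argc, char** argv)
+{
+  CLI::ParseCommandLine(argc, argv);
+
+  // Start a timer.
+  Timer::Start("some_timer");
+
+  // Do some things.
+  DoSomeStuff();
+
+  // Stop the timer.
+  Timer::Stop("some_timer");
+}
+@endcode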
+
+If this executable is run with the --verbose flag, the time that "some_timer"
+ran for is shown at the end of execution.
+
+*/



