[mlpack-git] mlpack-1.0.x: Backport documentation from r17014 (surprised this wasn't in 1.0.10). (d7e6025)

Thu Mar 5 22:06:42 EST 2015

Repository : https://github.com/mlpack/mlpack

On branch  : mlpack-1.0.x
Link       : https://github.com/mlpack/mlpack/compare/904762495c039e345beba14c1142fd719b3bd50e...f94823c800ad6f7266995c700b1b630d5ffdcf40

>---------------------------------------------------------------

commit d7e6025017974425716be1d3c6fcb36efa8b1d41
Author: Ryan Curtin <ryan at ratml.org>
Date:   Sun Dec 7 19:26:16 2014 +0000

    Backport documentation from r17014 (surprised this wasn't in 1.0.10).


>---------------------------------------------------------------

d7e6025017974425716be1d3c6fcb36efa8b1d41
 doc/tutorials/amf/amf.txt                          | 205 +++++++++++++++++++++
 doc/tutorials/tutorials.txt                        |   2 +-
 src/mlpack/methods/amf/amf.hpp                     |   6 +-
 .../methods/amf/init_rules/random_acol_init.hpp    |  12 +-
 .../complete_incremental_termination.hpp           |   5 +
 .../incomplete_incremental_termination.hpp         |   5 +
 src/mlpack/methods/amf/update_rules/nmf_als.hpp    |  19 +-
 .../methods/amf/update_rules/nmf_mult_dist.hpp     |  14 +-
 .../methods/amf/update_rules/nmf_mult_div.hpp      |  24 +--
 9 files changed, 254 insertions(+), 38 deletions(-)

diff --git a/doc/tutorials/amf/amf.txt b/doc/tutorials/amf/amf.txt
new file mode 100644
index 0000000..6b4bf9a
--- /dev/null
+++ b/doc/tutorials/amf/amf.txt
@@ -0,0 +1,205 @@
+/*!
+
+ at file amf.txt
+ at author Sumedh Ghaisas
+ at brief Tutorial for how to use the AMF class.
+
+ at page amftutorial Alternating Matrix Factorization tutorial.
+
+ at section intro_amftut Introduction
+
+Alternating Matrix Factorization
+
+Alternating matrix factorization decomposes matrx V in the form \f$ V \approx WH \f$ 
+where W is called the basis matrix and H is called the encoding matrix. V is 
+taken to be of size n x m and the obtained W is n x r and H is r x m. The size 
+r is called the rank of the factorization. Factorization is done by alternately
+calculating W and H respectively while holding the other matrix constant.
+
+\b mlpack provides:
+
+ - a \ref amf_amftut "simple C++ interface" to perform Alternating Matrix Factorization
+
+ at section toc_amftut Table of Contents
+
+A list of all the sections this tutorial contains.
+
+ - \ref intro_amftut
+ - \ref toc_amftut
+ - \ref amf_amftut
+   - \ref t_policy_amftut
+   - \ref init_rule_amftut
+   - \ref update_rule_amftut
+   - \ref nmf_amftut
+   - \ref svd_amftut
+ - \ref further_doc_amftut
+
+ at section amf_amftut The 'AMF' class
+
+The AMF class is templatized with 3 parameters; the first contains the policy 
+used to determine when the algorithm has converged; the second contains the 
+initialization rule for the W and H matrix; the last contains the update rule 
+to be used during each iteration. This templatization allows the user to try 
+various update rules, initialization rules, and termination policies (including 
+ones not supplied with MLPACK) for factorization.
+
+The class provides the following method that performs factorization
+ at code
+template<typename MatType> double Apply(const MatType& V,
+                                        const size_t r,
+                                        arma::mat& W,
+                                        arma::mat& H);
+ at endcode
+
+ at subsection t_policy_amftut Using different termination policies
+
+The AMF implementation comes with different termination policies to support many
+implemented algorithms. Every termination policy implements the following method 
+which returns the status of convergence.
+ at code
+bool IsConverged(arma::mat& W, arma::mat& H)
+ at endcode
+
+list of all the termination policies 
+
+ - \ref mlpack::amf::SimpleResidueTermination
+ - \ref mlpack::amf::SimpleToleranceTermination
+ - \ref mlpack::amf::ValidationRMSETermination
+ 
+In SimpleResidueTermination, termination decision depends on two factors, value 
+of residue and number of iteration. If the current value of residue drops below 
+the threshold or the number of iterations goes beyond the threshold, positive 
+termination signal is passed to AMF. 
+
+In SimpleToleranceTermination, termination criterion is met when increase in 
+residue value drops below the given tolerance. To accomodate spikes, certain 
+number of successive residue drops are accepted. Secondary termination criterion 
+terminates algorithm when iteration count goes beyond the threshold. 
+
+ValidationRMSETermination divids the data into 2 sets, training set and 
+validation set. Entries of validation set are nullifed in the input matrix. 
+Termination criterion is met when increase in validation set RMSe value drops 
+below the given tolerance. To accomodate spikes certain number of successive 
+validation RMSE drops are accepted. This upper imit on successive drops can be 
+adjusted with reverseStepCount. Secondary termination criterion terminates 
+algorithm when iteration count goes above the threshold. Though this termination
+policy is better measure of convergence than the above 2 termination policies,
+it may cause a overhead in performance.
+
+On the other hand \ref mlpack::amf::CompleteIncrementalTermination 
+"CompleteIncrementalTermination" and \ref mlpack::amf::IncompleteIncrementalTermination
+are just wrapper classes for other termination policies. These policies are used
+when AMF is applied with \ref mlpack::amf::SVDCompleteIncrementalLearning 
+"SVDCompleteIncrementalLearning" and \ref mlpack::amf::SVDIncompleteIncrementalLearning
+"SVDIncompleteIncrementalLearning" respectively. 
+
+ at subsection init_rule_amftut Using different initialization policies
+
+The AMF class comes with 2 initialization policies 
+ - \ref mlpack::amf::RandomInitialization "RandomInitialization"
+ - \ref mlpack::amf::RandomAcolInitialization "RandomAcolInitialization"
+ 
+RandomInitialization initializes matrices W and H with random uniform distribution
+while RandomAcolInitialization initializes the W matrix by averaging p randomly 
+chosen columns of V.  In case of RandomAcolIntialization, p is a template parameter.
+
+To implement their own initialization policy, users need to define the following 
+function in their class.
+ at code
+template<typename MatType>
+inline static void Initialize(const MatType& V,
+                              const size_t r,
+                              arma::mat& W,
+                              arma::mat& H)
+ at endcode
+
+ at subsection update_rule_amftut Using different update rules
+
+AMF supports following update rules
+ - \ref mlpack::amf::NMFALSUpdate "AMFALSUpdate"
+ - \ref mlpack::amf::NMFMultiplicativeDistanceUpdate "NMFMultiplicativeDistanceUpdate"
+ - \ref mlpack::amf::NMFMultiplicativeDivergenceUpdate "NMFMultiplicativeDivergenceUpdate"
+ - \ref mlpack::amf::SVDBatchLearning "SVDBatchLearning"
+ - \ref mlpack::amf::SVDIncompleteIncrementalLearning "SVDIncompleteIncrementalLearning"
+ - \ref mlpack::amf::SVDCompleteIncrementalLearning "SVDCompleteIncrementalLearning"
+ 
+Non-Negative Matrix factorization can be achieved with NMFALSUpdate, 
+NMFMultiplicativeDivergenceUpdate or NMFMultiplicativeDivergenceUpdate. 
+NMFALSUpdate implements simple Alternating Least Square optimization while 
+the other rules implement algorithms given in paper 'Algorithms for Non-negative 
+Matrix Factorization'.
+
+The remaining update rules perform Singular Value Decomposition of matrix V.
+This SVD factorization is optimized for the use by Collaborative Filtering. This
+use of SVD factorizers for Collaborative Filtering is described in the paper
+'A Guide to singular Value Decomposition' by Chih-Chao Ma. For further details 
+about the algorithms refer to the respective class documentation.
+
+ at subsection nmf_amftut Using Non-Negative Matrix Factorization with AMF
+
+The use of AMF for Non-Negative Matrix factorization is simple. The AMF module 
+defines \ref mlpack::amf::NMFALSFactorizer "NMFALSFactorizer" which can be used 
+directly without knowing the internal structure of AMF. For example -
+
+ at code
+#include <iostream>
+#include <mlpack/core.hpp>
+#include <mlpack/methods/amf/amf.hpp>
+
+using namespace std;
+using namespace arma;
+using namespace mlpack::amf;
+
+int main()
+{
+  NMFALSFactorizer nmf;
+  mat W, H;
+  mat V = randu<mat>(100, 100);
+  double residue = Apply(V, W, H);
+  reeturn 1;
+}
+ at endcode
+
+NMFALSFactorizer uses SimpleResidueTermination which is most prefered with 
+Non-Negative Matrix factorizers. Initialization of W and H in NMFALSFactorizer
+is random. The Apply function returns the residue obtained by comparing the 
+constructed matrix W * H with the original matrix V.
+
+ at subsection svd_amftut Using Singular Value Decomposition with AMF
+
+AMF implementation supports following SVD factorizers 
+ - \ref mlpack::amf::SVDBatchFactorizer "SVDBatchFactorizer"
+ - \ref mlpack::amf::SparseSVDBatchFactorizer "SparseSVDBatchFactorizer"
+ - \ref mlpack::amf::SVDIncompleteIncrementalFactorizer "SVDIncompleteIncrementalFactorizer"
+ - \ref mlpack::amf::SparseSVDIncompleteIncrementalFactorizer "SparseSVDIncompleteIncrementalFactorizer"
+ - \ref mlpack::amf::SVDCompleteIncrementalFactorizer "SVDCompleteIncrementalFactorizer"
+ - \ref mlpack::amf::SparseSVDCompleteIncrementalFactorizer "SparseSVDCompleteIncrementalFactorizer"
+ 
+The sparse version of factorizers can be used with Armadillo's sparse matrix 
+support. These  specialized implementations boost runtime performance when the 
+matrix to be factorized is relatively sparse.
+
+ at code
+#include <mlpack/core.hpp>
+#include <mlpack/methods/amf/amf.hpp>
+
+using namespace std;
+using namespace arma;
+using namespace mlpack::amf;
+
+int main()
+{
+  sp_mat V = randu<sp_mat>(100,100);
+  mat W, H;
+  
+  SparseSVDBatchFactorizer svd;
+  double residue = svd.Apply(V, W, H);
+}
+ at endcode   
+
+ at section further_doc_amftut Further documentation
+
+For further documentation on the AMF class, consult the \ref mlpack::amf::AMF 
+"complete API documentation".
+
+*/
diff --git a/doc/tutorials/tutorials.txt b/doc/tutorials/tutorials.txt
index 760cb76..3579146 100644
--- a/doc/tutorials/tutorials.txt
+++ b/doc/tutorials/tutorials.txt
@@ -31,5 +31,5 @@ examples and progress to complex, extensible uses.
  - \ref kmtutorial
  - \ref fmkstutorial
  - \ref emst_tutorial
-
+ - \ref amftutorial
 */
diff --git a/src/mlpack/methods/amf/amf.hpp b/src/mlpack/methods/amf/amf.hpp
index 1049bb3..6b92ca8 100644
--- a/src/mlpack/methods/amf/amf.hpp
+++ b/src/mlpack/methods/amf/amf.hpp
@@ -4,6 +4,8 @@
  * @author Mohan Rajendran
  * @author Ryan Curtin
  *
+ * Alternating Matrix Factorization
+ *
  * The AMF (alternating matrix factorization) class, from which more commonly
  * known techniques such as incremental SVD, NMF, and batch-learning SVD can be
  * derived.
@@ -32,7 +34,7 @@
 #include <mlpack/methods/amf/termination_policies/simple_residue_termination.hpp>
 
 namespace mlpack {
-namespace amf {
+namespace amf /** Alternating Matrix Factorization **/ {
 
 /**
  * This class implements AMF (alternating matrix factorization) on the given
@@ -69,7 +71,7 @@ namespace amf {
  * @tparam UpdateRule The update rule for calculating W and H matrix at each
  *     iteration.
  *
- * @see NMF_MultiplicativeDistanceUpdate
+ * @see NMFMultiplicativeDistanceUpdate, SimpleResidueTermination
  */
 template<typename TerminationPolicyType = SimpleResidueTermination,
          typename InitializationRuleType = RandomInitialization,
diff --git a/src/mlpack/methods/amf/init_rules/random_acol_init.hpp b/src/mlpack/methods/amf/init_rules/random_acol_init.hpp
index fef05a7..380b5fd 100644
--- a/src/mlpack/methods/amf/init_rules/random_acol_init.hpp
+++ b/src/mlpack/methods/amf/init_rules/random_acol_init.hpp
@@ -2,11 +2,7 @@
  * @file random_acol_init.hpp
  * @author Mohan Rajendran
  *
- * Intialization rule for Non-Negative Matrix Factorization. This simple
- * initialization is performed by the random Acol initialization introduced in
- * the paper 'Algorithms, Initializations and Convergence' by Langville et al.
- * This method sets each of the columns of W by averaging p randomly chosen
- * columns of V.
+ * Intialization rule for Alternating Matrix Factorization. 
  *
  * This file is part of MLPACK 1.0.10.
  *
@@ -32,9 +28,11 @@ namespace mlpack {
 namespace amf {
 
 /**
- * This class initializes the W matrix of the NMF algorithm by averaging p
+ * This class initializes the W matrix of the AMF algorithm by averaging p
  * randomly chosen columns of V.  In this case, p is a template parameter.  H is
- * then set randomly.
+ * then set randomly This simple initialization is performed by the random 
+ * Acol initialization introduced in the paper 'Algorithms, Initializations and 
+ * Convergence' by Langville et al.
  *
  * @tparam The number of random columns to average for each column of W.
  */
diff --git a/src/mlpack/methods/amf/termination_policies/complete_incremental_termination.hpp b/src/mlpack/methods/amf/termination_policies/complete_incremental_termination.hpp
index 1b16ec1..6ab81f4 100644
--- a/src/mlpack/methods/amf/termination_policies/complete_incremental_termination.hpp
+++ b/src/mlpack/methods/amf/termination_policies/complete_incremental_termination.hpp
@@ -31,6 +31,11 @@ template <class TerminationPolicy>
 class CompleteIncrementalTermination
 {
  public:
+  /**
+   * Empty constructor
+   *
+   * @param t_policy object of wrapped class.
+   */
   CompleteIncrementalTermination(TerminationPolicy t_policy = TerminationPolicy())
             : t_policy(t_policy) {}
 
diff --git a/src/mlpack/methods/amf/termination_policies/incomplete_incremental_termination.hpp b/src/mlpack/methods/amf/termination_policies/incomplete_incremental_termination.hpp
index d596ba5..4021600 100644
--- a/src/mlpack/methods/amf/termination_policies/incomplete_incremental_termination.hpp
+++ b/src/mlpack/methods/amf/termination_policies/incomplete_incremental_termination.hpp
@@ -29,6 +29,11 @@ template <class TerminationPolicy>
 class IncompleteIncrementalTermination
 {
  public:
+  /**
+   * Empty constructor
+   *
+   * @param t_policy object of wrapped class.
+   */
   IncompleteIncrementalTermination(TerminationPolicy t_policy = TerminationPolicy())
             : t_policy(t_policy) {}
 
diff --git a/src/mlpack/methods/amf/update_rules/nmf_als.hpp b/src/mlpack/methods/amf/update_rules/nmf_als.hpp
index ba5ce32..17395e4 100644
--- a/src/mlpack/methods/amf/update_rules/nmf_als.hpp
+++ b/src/mlpack/methods/amf/update_rules/nmf_als.hpp
@@ -2,13 +2,7 @@
  * @file nmf_als.hpp
  * @author Mohan Rajendran
  *
- * Update rules for the Non-negative Matrix Factorization. This follows a method
- * titled 'Alternating Least Squares' described in the paper 'Positive Matrix
- * Factorization: A Non-negative Factor Model with Optimal Utilization of
- * Error Estimates of Data Values' by P. Paatero and U. Tapper. It uses least
- * squares projection formula to reduce the error value of
- * \f$ \sqrt{\sum_i \sum_j(V-WH)^2} \f$ by alternately calculating W and H
- * respectively while holding the other matrix constant.
+ * Update rules for the Non-negative Matrix Factorization. 
  *
  * This file is part of MLPACK 1.0.10.
  *
@@ -34,12 +28,17 @@ namespace mlpack {
 namespace amf {
 
 /**
- * The alternating least square update rules of matrices W and H.
+ * This class implements a method titled 'Alternating Least Squares' described 
+ * in the paper 'Positive Matrix Factorization: A Non-negative Factor Model with 
+ * Optimal Utilization of Error Estimates of Data Values' by P Paatero and 
+ * U Tapper. It uses least squares projection formula to reduce the error 
+ * value of \f$ \sqrt{\sum_i \sum_j(V-WH)^2} \f$ by alternately calculating W 
+ * and H respectively while holding the other matrix constant.
  */
 class NMFALSUpdate
 {
  public:
-  // Empty constructor required for the UpdateRule template.
+  //! Empty constructor required for the UpdateRule template.
   NMFALSUpdate() { }
 
   template<typename MatType>
@@ -108,7 +107,7 @@ class NMFALSUpdate
       }
     }
   }
-};
+}; // class NMFALSUpdate
 
 }; // namespace amf
 }; // namespace mlpack
diff --git a/src/mlpack/methods/amf/update_rules/nmf_mult_dist.hpp b/src/mlpack/methods/amf/update_rules/nmf_mult_dist.hpp
index b0ed160..73a64b5 100644
--- a/src/mlpack/methods/amf/update_rules/nmf_mult_dist.hpp
+++ b/src/mlpack/methods/amf/update_rules/nmf_mult_dist.hpp
@@ -2,12 +2,7 @@
  * @file nmf_mult_dist.hpp
  * @author Mohan Rajendran
  *
- * Update rules for the Non-negative Matrix Factorization. This follows a method
- * described in the paper 'Algorithms for Non-negative Matrix Factorization'
- * by D. D. Lee and H. S. Seung. This is a multiplicative rule that ensures
- * that the Frobenius norm \f$ \sqrt{\sum_i \sum_j(V-WH)^2} \f$ is
- * non-increasing between subsequent iterations. Both of the update rules
- * for W and H are defined in this file.
+ * Update rules for the Non-negative Matrix Factorization.
  *
  * This file is part of MLPACK 1.0.10.
  *
@@ -33,7 +28,12 @@ namespace mlpack {
 namespace amf {
 
 /**
- * The multiplicative distance update rules for matrices W and H.
+ * The multiplicative distance update rules for matrices W and H. This follows 
+ * a method described in the paper 'Algorithms for Non-negative Matrix Factorization'
+ * by D. D. Lee and H. S. Seung. This is a multiplicative rule that ensures
+ * that the Frobenius norm \f$ \sqrt{\sum_i \sum_j(V-WH)^2} \f$ is
+ * non-increasing between subsequent iterations. Both of the update rules
+ * for W and H are defined in this file.
  */
 class NMFMultiplicativeDistanceUpdate
 {
diff --git a/src/mlpack/methods/amf/update_rules/nmf_mult_div.hpp b/src/mlpack/methods/amf/update_rules/nmf_mult_div.hpp
index b09f3ff..c4ecb97 100644
--- a/src/mlpack/methods/amf/update_rules/nmf_mult_div.hpp
+++ b/src/mlpack/methods/amf/update_rules/nmf_mult_div.hpp
@@ -2,17 +2,7 @@
  * @file mult_div_update_rules.hpp
  * @author Mohan Rajendran
  *
- * Update rules for the Non-negative Matrix Factorization. This follows a method
- * described in the paper 'Algorithms for Non-negative Matrix Factorization'
- * by D. D. Lee and H. S. Seung. This is a multiplicative rule that ensures
- * that the Kullbackâ€“Leibler divergence
- * \f$ \sum_i \sum_j (V_{ij} log\frac{V_{ij}}{(WH)_{ij}}-V_{ij}+(WH)_{ij}) \f$is
- * non-increasing between subsequent iterations. Both of the update rules
- * for W and H are defined in this file.
- *
- * This set of update rules is not meant to work with sparse matrices.  Using
- * sparse matrices often causes NaNs in the output, so other choices of update
- * rules are better in that situation.
+ * Update rules for the Non-negative Matrix Factorization. 
  *
  * This file is part of MLPACK 1.0.10.
  *
@@ -37,6 +27,18 @@
 namespace mlpack {
 namespace amf {
 
+/**
+ * This follows a method described in the paper 'Algorithms for Non-negative 
+ * Matrix Factorization' by D. D. Lee and H. S. Seung. This is a multiplicative 
+ * rule that ensures that the Kullbackâ€“Leibler divergence
+ * \f$ \sum_i \sum_j (V_{ij} log\frac{V_{ij}}{(WH)_{ij}}-V_{ij}+(WH)_{ij}) \f$
+ * is non-increasing between subsequent iterations. Both of the update rules
+ * for W and H are defined in this file.
+ *
+ * This set of update rules is not meant to work with sparse matrices.  Using
+ * sparse matrices often causes NaNs in the output, so other choices of update
+ * rules are better in that situation.
+ */
 class NMFMultiplicativeDivergenceUpdate
 {
  public: