# [mlpack-svn] r12939 - in mlpack/trunk/doc/tutorials: . emst

fastlab-svn at coffeetalk-1.cc.gatech.edu fastlab-svn at coffeetalk-1.cc.gatech.edu
Mon Jun 4 12:01:30 EDT 2012

Author: march
Date: 2012-06-04 12:01:30 -0400 (Mon, 04 Jun 2012)
New Revision: 12939

mlpack/trunk/doc/tutorials/emst/
mlpack/trunk/doc/tutorials/emst/emst.txt
Log:
wrote EMST tutorial

===================================================================
--- mlpack/trunk/doc/tutorials/emst/emst.txt	                        (rev 0)
+++ mlpack/trunk/doc/tutorials/emst/emst.txt	2012-06-04 16:01:30 UTC (rev 12939)
@@ -0,0 +1,121 @@
+/*!
+
+ at file emst.txt
+ at author Bill March
+ at brief Tutorial for the Euclidean Minimum Spanning Tree algorithm.
+
+ at page emst_tutorial EMST Tutorial
+
+ at section intro_emsttut Introduction
+
+The Euclidean Minimum Spanning Tree problem is widely used in machine learning
+and data mining applications.  Given a set \f$S\f$ of points in \f$\mathbb{R}^d\f$,
+our task is to compute lowest weight spanning tree in the complete graph on \f$S\f$
+with edge weights given by the Euclidean distance between points.
+
+Among other applications, the EMST can be used to compute hierarchical clusterings
+of data.  A \emph{single-linkage clustering} can be obtained from the EMST by deleting
+all edges longer than a given cluster length.  This technique is also referred to as a \emph{Friends-of-Friends} clustering in the astronomy literature.
+
+MLPACK includes an implementation of \emph{Dual-Tree Boruvka} on \f$kd\f$-trees, the empirically and
+theoretically fastest EMST algorithm.  For more details, see March, \emph{et al.}, \emph{Euclidean Minimum Spanning Tree: Algorithm, Analysis, and Applications}, in KDD, 2010.  An implementation on cover trees is forthcoming.
+
+\b mlpack provides:
+
+ - a \ref cli_emsttut "simple command-line executable" to compute the EMST of a given data set
+ - a \ref class_emsttut "simple C++ interface" to compute the EMST
+
+
+A list of all the sections this tutorial contains.
+
+ - \ref intro_emsttut
+ - \ref toc_emsttut
+ - \ref cli_emsttut
+ - \ref dtb_emsttut
+ - \ref further_doc_emsttut
+
+ at section cli_emsttut Command-Line 'EMST'
+
+The emst executable in \b mlpack will compute the EMST of a given set of points and store the resulting edge list to a file.
+
+The output file contains an edge list representation of the MST in an \f$n-1 \times 3 \f$ matrix, where the first and second columns are labels of points and the third column is the edge weight.  The edges are sorted in order of increasing weight.
+
+Below are several examples of simple usage (and the resultant output).  The '-v'
+option is used so that verbose output is given.  Further documentation on each
+individual option can be found by typing
+
+ at code
+$emst --help + at endcode + + at code +$ emst --input_file=dataset.csv --output_file=edge_list.csv -v
+[INFO ] Data read, building tree.
+[INFO ] Tree built, running algorithm.
+[INFO ] 4 edges found so far.
+[INFO ] 5 edges found so far.
+[INFO ] Total squared length: 1002.45
+[INFO ] Saving CSV data to 'edge_list.csv'.
+[INFO ]
+[INFO ] Execution parameters:
+[INFO ]   help: false
+[INFO ]   info: ""
+[INFO ]   input_file: dataset.csv
+[INFO ]   leaf_size: 1
+[INFO ]   naive: false
+[INFO ]   output_file: edge_list.csv
+[INFO ]   verbose: true
+[INFO ]
+[INFO ] Program timers:
+[INFO ]   emst/mst_computation: 0.000179s
+[INFO ]   emst/tree_building: 0.000061s
+[INFO ]   total_time: 0.052641s
+ at endcode
+
+The code performs at most \f$\log N\f$ iterations for \f$N\f$ data points.  It will print an update on the number of MST edges found after each iteration.
+Convenient program timers are given for different parts of the calculation at
+the bottom of the output, as well as the parameters the simulation was run with.
+
+ at code
+$cat dataset.csv +0, 0 +1, 1 +3, 3 +0.5, 0 +1000, 0 +1001, 0 + +$ cat edge_list.csv
+0.0000000000e+00,3.0000000000e+00,5.0000000000e-01
+4.0000000000e+00,5.0000000000e+00,1.0000000000e+00
+1.0000000000e+00,3.0000000000e+00,1.1180339887e+00
+1.0000000000e+00,2.0000000000e+00,2.8284271247e+00
+2.0000000000e+00,4.0000000000e+00,9.9700451353e+02
+ at endcode
+
+The input points are labeled 0-5.  The output tells us that the MST connects point 0 to point 3, point 4 to point 5, point 1 to point 3, point 1 to point 2, and point 2 to point 4, with the corresponding edge weights given in the third column.  The total squared length of the MST is also given in the verbose output.
+
+Note that it is also possible to compute the EMST using a naive (\f$O(N^2)\f$) algorithm for timing and comparison purposes.
+
+ at section dtb_emsttut The 'DualTreeBoruvka' class
+
+The 'DualTreeBoruvka' class contains our implementation of the Dual-Tree Boruvka algorithm.
+
+The class has two constructors: the first takes the data set, constructs the \f$kd\f$-tree, and computes the MST.  The second takes data set and an already constructed tree.
+
+The class provides one method that performs the MST computation:
+ at code
+void ComputeMST(const arma::mat& results);
+ at endcode
+
+This method stores the computed MST in the matrix results in the format given above.
+
+ at subsection further_doc_emsttut Further documentation
+
+For further documentation on the DualTreeBoruvka class, consult the
+\ref mlpack::emst::DualTreeBoruvka "complete API documentation".
+
+*/