[mlpack-svn] r12939 - in mlpack/trunk/doc/tutorials: . emst

fastlab-svn at coffeetalk-1.cc.gatech.edu fastlab-svn at coffeetalk-1.cc.gatech.edu
Mon Jun 4 12:01:30 EDT 2012

Author: march
Date: 2012-06-04 12:01:30 -0400 (Mon, 04 Jun 2012)
New Revision: 12939

wrote EMST tutorial

Added: mlpack/trunk/doc/tutorials/emst/emst.txt
--- mlpack/trunk/doc/tutorials/emst/emst.txt	                        (rev 0)
+++ mlpack/trunk/doc/tutorials/emst/emst.txt	2012-06-04 16:01:30 UTC (rev 12939)
@@ -0,0 +1,121 @@
+ at file emst.txt
+ at author Bill March
+ at brief Tutorial for the Euclidean Minimum Spanning Tree algorithm.
+ at page emst_tutorial EMST Tutorial 
+ at section intro_emsttut Introduction
+The Euclidean Minimum Spanning Tree problem is widely used in machine learning 
+and data mining applications.  Given a set \f$S\f$ of points in \f$\mathbb{R}^d\f$,
+our task is to compute lowest weight spanning tree in the complete graph on \f$S\f$
+with edge weights given by the Euclidean distance between points. 
+Among other applications, the EMST can be used to compute hierarchical clusterings
+of data.  A \emph{single-linkage clustering} can be obtained from the EMST by deleting
+all edges longer than a given cluster length.  This technique is also referred to as a \emph{Friends-of-Friends} clustering in the astronomy literature.
+MLPACK includes an implementation of \emph{Dual-Tree Boruvka} on \f$kd\f$-trees, the empirically and 
+theoretically fastest EMST algorithm.  For more details, see March, \emph{et al.}, \emph{Euclidean Minimum Spanning Tree: Algorithm, Analysis, and Applications}, in KDD, 2010.  An implementation on cover trees is forthcoming.
+\b mlpack provides:
+ - a \ref cli_emsttut "simple command-line executable" to compute the EMST of a given data set
+ - a \ref class_emsttut "simple C++ interface" to compute the EMST
+ at section toc_emsttut Table of Contents
+A list of all the sections this tutorial contains.
+ - \ref intro_emsttut
+ - \ref toc_emsttut
+ - \ref cli_emsttut
+ - \ref dtb_emsttut
+ - \ref further_doc_emsttut
+ at section cli_emsttut Command-Line 'EMST'
+The emst executable in \b mlpack will compute the EMST of a given set of points and store the resulting edge list to a file.  
+The output file contains an edge list representation of the MST in an \f$n-1 \times 3 \f$ matrix, where the first and second columns are labels of points and the third column is the edge weight.  The edges are sorted in order of increasing weight.  
+Below are several examples of simple usage (and the resultant output).  The '-v'
+option is used so that verbose output is given.  Further documentation on each
+individual option can be found by typing
+ at code
+$ emst --help
+ at endcode
+ at code
+$ emst --input_file=dataset.csv --output_file=edge_list.csv -v
+[INFO ] Reading in data.
+[INFO ] Loading 'dataset.csv' as CSV data.
+[INFO ] Data read, building tree.
+[INFO ] Tree built, running algorithm.
+[INFO ] 4 edges found so far.
+[INFO ] 5 edges found so far.
+[INFO ] Total squared length: 1002.45
+[INFO ] Saving CSV data to 'edge_list.csv'.
+[INFO ] 
+[INFO ] Execution parameters:
+[INFO ]   help: false
+[INFO ]   info: ""
+[INFO ]   input_file: dataset.csv
+[INFO ]   leaf_size: 1
+[INFO ]   naive: false
+[INFO ]   output_file: edge_list.csv
+[INFO ]   verbose: true
+[INFO ] 
+[INFO ] Program timers:
+[INFO ]   emst/mst_computation: 0.000179s
+[INFO ]   emst/tree_building: 0.000061s
+[INFO ]   total_time: 0.052641s
+ at endcode
+The code performs at most \f$\log N\f$ iterations for \f$N\f$ data points.  It will print an update on the number of MST edges found after each iteration.  
+Convenient program timers are given for different parts of the calculation at
+the bottom of the output, as well as the parameters the simulation was run with.
+ at code
+$ cat dataset.csv
+0, 0
+1, 1
+3, 3
+0.5, 0
+1000, 0
+1001, 0
+$ cat edge_list.csv
+ at endcode
+The input points are labeled 0-5.  The output tells us that the MST connects point 0 to point 3, point 4 to point 5, point 1 to point 3, point 1 to point 2, and point 2 to point 4, with the corresponding edge weights given in the third column.  The total squared length of the MST is also given in the verbose output.
+Note that it is also possible to compute the EMST using a naive (\f$O(N^2)\f$) algorithm for timing and comparison purposes.  
+ at section dtb_emsttut The 'DualTreeBoruvka' class
+The 'DualTreeBoruvka' class contains our implementation of the Dual-Tree Boruvka algorithm. 
+The class has two constructors: the first takes the data set, constructs the \f$kd\f$-tree, and computes the MST.  The second takes data set and an already constructed tree.  
+The class provides one method that performs the MST computation:
+ at code
+void ComputeMST(const arma::mat& results);
+ at endcode
+This method stores the computed MST in the matrix results in the format given above. 
+ at subsection further_doc_emsttut Further documentation
+For further documentation on the DualTreeBoruvka class, consult the
+\ref mlpack::emst::DualTreeBoruvka "complete API documentation".

More information about the mlpack-svn mailing list