[mlpack-git] master: Update documentation. (79bf330)
gitdub at mlpack.org
Thu Apr 14 10:11:39 EDT 2016
Repository : https://github.com/mlpack/mlpack
On branch : master
Link : https://github.com/mlpack/mlpack/compare/f73d89833d45e0870d97c133ad55094f494c8061...08ffa1b0c6d0a9fa05e2eb3dc9a993ea7fa97d54
>---------------------------------------------------------------
commit 79bf33090e229961a2bd90333bc42e9de69688d3
Author: Ryan Curtin <ryan at ratml.org>
Date: Wed Apr 13 11:03:20 2016 -0400
Update documentation.
>---------------------------------------------------------------
79bf33090e229961a2bd90333bc42e9de69688d3
doc/guide/formats.hpp | 21 ++++++++++++---------
1 file changed, 12 insertions(+), 9 deletions(-)
diff --git a/doc/guide/formats.hpp b/doc/guide/formats.hpp
index 0ce8702..236fce0 100644
--- a/doc/guide/formats.hpp
+++ b/doc/guide/formats.hpp
@@ -29,15 +29,15 @@ following file types:
- Armadillo binary, denoted by .bin
- Raw binary, denoted by .bin \b "(note: this will be loaded as"
\b "one-dimensional data, which is likely not what is desired.)"
- - HDF5, denoted by .hdf, .hdf5, .h5, or .he5 \b "(note: HDF5 must be enabled"
- \b "in the Armadillo configuration)"
- - ARFF, denoted by .arff \b "(note: this is not supported by all mlpack"
- \b "command-line programs"; see \ref formatinfo )
+ - HDF5, denoted by .hdf, .hdf5, .h5, or .he5 (<b>note: HDF5 must be enabled"
+ in the Armadillo configuration</b>)
+ - ARFF, denoted by .arff (<b>note: this is not supported by all mlpack"
+ command-line programs </b>; see \ref formatcat )
-Datasets that are loaded by mlpack should be stored with \b "one row for "
-\b "one point" and \b "one column for one dimension". Therefore, a dataset with
-three two-dimensional points \f$(0, 1)\f$, \f$(3, 1)\f$, and \f$(5, -5)\f$ would
-be stored in a csv file as:
+Datasets that are loaded by mlpack should be stored with <b>one row for
+one point</b> and <b>one column for one dimension</b>. Therefore, a dataset
+with three two-dimensional points \f$(0, 1)\f$, \f$(3, 1)\f$, and \f$(5, -5)\f$
+would be stored in a csv file as:
\code
0, 1
@@ -107,7 +107,6 @@ but also as categorical data (i.e. with numeric but unordered categories). This
support is useful for, e.g., decision trees and other models that support
categorical features.
-
In some machine learning situations, such as, e.g., decision trees, categorical
data can be used. Categorical data might look like this (in CSV format):
@@ -142,6 +141,10 @@ $ mlpack_hoeffding_tree -t dataset.csv -l dataset.labels.csv -v
...
\endcode
+Currently, only the \c mlpack_hoeffding_tree program supports loading
+categorical data, and this is also the only program that supports loading an
+ARFF dataset.
+
@section formatcatcpp Categorical features and C++
When writing C++, loading categorical data is slightly more tricky: the mappings