[mlpack-svn] r10784 - in mlpack/trunk/src/mlpack/methods/hmm: . distributions
fastlab-svn at coffeetalk-1.cc.gatech.edu
Wed Dec 14 08:48:54 EST 2011
Author: rcurtin
Date: 2011-12-14 08:48:54 -0500 (Wed, 14 Dec 2011)
New Revision: 10784
Removed:
mlpack/trunk/src/mlpack/methods/hmm/README
Modified:
mlpack/trunk/src/mlpack/methods/hmm/distributions/discrete_distribution.hpp
mlpack/trunk/src/mlpack/methods/hmm/distributions/gaussian_distribution.hpp
mlpack/trunk/src/mlpack/methods/hmm/hmm.hpp
Log:
Update documentation for HMMs. Documentation for this class should now be complete.
Deleted: mlpack/trunk/src/mlpack/methods/hmm/README
===================================================================
--- mlpack/trunk/src/mlpack/methods/hmm/README 2011-12-14 13:36:56 UTC (rev 10783)
+++ mlpack/trunk/src/mlpack/methods/hmm/README 2011-12-14 13:48:54 UTC (rev 10784)
@@ -1,195 +0,0 @@
-This file contains usage information for the HMM package of FASTLIB.
-
-0. File format
-==============
-There are 3 file types used by the HMM package: one describing an HMM profile,
-one storing data sequences, and one storing state sequences. For compactness
-and human readability, all of them are text files consisting of several
-matrices and vectors separated by lines beginning with a '%' character (these
-lines can also be used for notes and comments). Matrices are stored
-column-wise (i.e. each line is a column). Numbers can be separated by blank
-spaces or commas.
-
-0.1. HMM profile
-================
-
-The library implements 3 types of HMM: discrete, Gaussian, and mixture of
-Gaussians. Each type has a different profile format.
-
-Discrete HMM
-============
-
-The profile of a discrete HMM contains two matrices: the transition
-probabilities and the emission probabilities. For example:
-
-% Example of a discrete HMM profile
-% transmission (2 states)
-0.9 0.05
-0.1 0.95
-% emission (2 states x 6 symbols)
-0.166 0.1
-0.166 0.1
-0.166 0.1
-0.166 0.1
-0.166 0.1
-0.17 0.5
-
-Gaussian HMM
-============
-
-The profile of a Gaussian HMM contains the transition matrix and the Gaussian
-distribution (mean and covariance) of every state. For example:
-
-% Example of a gaussian HMM profile
-% transmission (2 states)
-0.9 0.05
-0.1 0.95
-% mean - state 0
-0 0
-% covariance - state 0
-1.0 0.1
-0.0 1.0
-% mean - state 1
-2 2
-% covariance - state 1
-1.0 0.0
-0.1 1.0
-
-Mixture of Gaussian HMM
-=======================
-
-The profile of a mixture-of-Gaussians HMM contains the transition matrix and
-the Gaussian mixture of every state. Each mixture contains a prior
-probability vector and the mean and covariance of every component. For
-example:
-
-% Example of a mixture of gaussian HMM profile
-% transmission
-0.9 0.05
-0.1 0.95
-% prior - state 0
-0.5 0.5
-% mean 0 - state 0
-0 0
-% cov 0 - state 0
-1.0 0.0
-0.0 1.0
-% mean 1 - state 0
-0 5
-% cov 1 - state 0
-1.0 0.0
-0.0 1.0
-% prior - state 1
-0.5 0.5
-% mean 0 - state 1
-5 0
-% cov 0 - state 1
-1.0 0.0
-0.0 1.0
-% mean 1 - state 1
-5 5
-% cov 1 - state 1
-1.0 0.0
-0.0 1.0
-
-0.2. Data sequences
-===================
-
-Discrete HMM
-============
-
-Discrete sequences are vectors separated by lines beginning with '%'.
-For example:
-
-% total 6 symbols
-% sequence 1
-1,2,3,4,5,0,2,3,4,5
-% sequence 2
-3,2,0,2,3,4,5,0,3,4
-
-Gaussian and Mixture of Gaussian HMM
-====================================
-
-The data sequences are matrices separated by lines beginning with '%'. Each
-line is the observation at one time step.
-
-0.3. State sequences
-====================
-
-State sequences are stored in the same way as discrete data sequences.
-
-1. Generate a random sequence from an HMM
-=========================================
-
-Usage:
- generate --type={discrete|gaussian|mixture} OPTIONS
-[OPTIONS]
- --profile=file : file containing the HMM profile
- --length=NUM : sequence length
- --lenmax=NUM : maximum sequence length, default = length
- --numseq=NUM : number of sequences
- --seqfile=file : output file for generated sequences
- --statefile=file : output file for generated state sequences
-
-2. Calculate log-likelihood of sequences (Forward procedure)
-============================================================
-
-Usage:
- loglik --type={discrete|gaussian|mixture} OPTIONS
-[OPTIONS]
- --profile=file : file containing the HMM profile
- --seqfile=file : file containing the input sequences
- --logfile=file : output file for the log-likelihood of the sequences
-
-3. Compute the most probable state sequence (Viterbi algorithm)
-===============================================================
-
-Usage:
- viterbi --type={discrete|gaussian|mixture} OPTIONS
-[OPTIONS]
- --profile=file : file containing the HMM profile
- --seqfile=file : file containing the input sequences
- --statefile=file : output file for the state sequences
-
-4. Training/Estimating HMM parameters
-=====================================
-
-Usage:
- train --type={discrete|gaussian|mixture} OPTIONS
-[OPTIONS]
- --algorithm={baumwelch|viterbi} : algorithm used for training, default Baum-Welch
- --seqfile=file : file containing the input sequences
- --guess=file : file containing the initial guess of the HMM profile
- --numstate=NUM : number of states; required if no guess profile is specified
- --profile=file : output file for the estimated HMM profile
- --maxiter=NUM : maximum number of iterations, default=500
- --tolerance=NUM : error tolerance on log-likelihood, default=1e-3
-
-5. Examples
-===========
-
-To generate 20 data sequences of length 100 from a discrete HMM stored in
-'pro.dis' and save the data sequences and state sequences in 'seq.dis.out' and
-'state.dis.out':
-
-./generate --type=discrete --profile=pro.dis --length=100 --numseq=20 \
- --seqfile=seq.dis.out --statefile=state.dis.out
-
-To calculate the log-likelihood of the sequences in 'seq.dis.out' according
-to the HMM stored in 'pro.dis' and save the results in 'loglik.dis.out':
-
-./loglik --type=discrete --profile=pro.dis --seqfile=seq.dis.out \
- --logfile=loglik.dis.out
-
-To compute the most probable state sequences for the sequences in
-'seq.dis.out' according to the HMM stored in 'pro.dis' and save the results in
-'state.viterbi.dis.out':
-
-./viterbi --type=discrete --profile=pro.dis --seqfile=seq.dis.out \
- --statefile=state.viterbi.dis.out
-
-To estimate the model parameters with the Baum-Welch algorithm, using the
-training data in 'seq.dis.out' and the starting guess in 'pro.dis', and save
-the estimated profile in 'pro.dis.out':
-
-./train --type=discrete --seqfile=seq.dis.out --guess=pro.dis \
- --profile=pro.dis.out
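The deleted README's `loglik` tool computed sequence log-likelihoods with the forward procedure. A minimal C++ sketch of the scaled forward algorithm for a discrete HMM follows. It assumes hypothetical plain `std::vector` matrices (not FASTLIB or mlpack types), column-stochastic matrices following the T(i, j) convention documented in hmm.hpp rather than the README's file layout, and an explicit initial-state vector, which the README's profile format does not contain.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical plain-matrix type: Matrix[row][column].
using Matrix = std::vector<std::vector<double>>;

// Log-likelihood of a sequence of discrete symbols under an HMM, computed
// with the scaled forward algorithm. Assumed (column-stochastic) conventions:
//   transition[i][j] = P(next state is i | current state is j)
//   emission[k][j]   = P(symbol k is emitted | state is j)
//   initial[j]       = P(first state is j)
double ForwardLogLikelihood(const Matrix& transition, const Matrix& emission,
                            const std::vector<double>& initial,
                            const std::vector<std::size_t>& sequence)
{
  const std::size_t numStates = initial.size();
  std::vector<double> alpha(numStates), next(numStates);
  double logLikelihood = 0.0;

  for (std::size_t t = 0; t < sequence.size(); ++t)
  {
    const std::size_t symbol = sequence[t];
    assert(symbol < emission.size());

    double scale = 0.0;
    for (std::size_t i = 0; i < numStates; ++i)
    {
      // Probability mass arriving in state i at time t.
      double arriving = 0.0;
      if (t == 0)
        arriving = initial[i];
      else
        for (std::size_t j = 0; j < numStates; ++j)
          arriving += transition[i][j] * alpha[j];

      next[i] = arriving * emission[symbol][i];
      scale += next[i];
    }

    // The sequence is impossible under this model.
    if (scale == 0.0)
      return -std::numeric_limits<double>::infinity();

    // Normalize alpha to sum to 1 so long sequences do not underflow; the
    // product of the scale factors is the likelihood of the sequence.
    for (std::size_t i = 0; i < numStates; ++i)
      alpha[i] = next[i] / scale;
    logLikelihood += std::log(scale);
  }
  return logLikelihood;
}
```

Accumulating the logs of the per-step scale factors, instead of multiplying raw probabilities, is what keeps the result finite for the long sequences the README's `--length` option can produce.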
Modified: mlpack/trunk/src/mlpack/methods/hmm/distributions/discrete_distribution.hpp
===================================================================
--- mlpack/trunk/src/mlpack/methods/hmm/distributions/discrete_distribution.hpp 2011-12-14 13:36:56 UTC (rev 10783)
+++ mlpack/trunk/src/mlpack/methods/hmm/distributions/discrete_distribution.hpp 2011-12-14 13:48:54 UTC (rev 10784)
@@ -11,7 +11,7 @@
#include <mlpack/core.hpp>
namespace mlpack {
-namespace distribution {
+namespace distribution /** Probability distributions. */ {
/**
* A discrete distribution where the only observations are discrete
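DiscreteDistribution is meant for observations that are non-negative integers (symbols). As an illustration only, here is a hypothetical `SimpleDiscreteDistribution` (not mlpack's class, whose full interface this diff does not show) that stores one probability per symbol and estimates them by relative frequency:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical discrete emission distribution over the symbols
// {0, ..., numSymbols - 1}; an illustration, not mlpack's DiscreteDistribution.
class SimpleDiscreteDistribution
{
 public:
  // Start from the uniform distribution.
  explicit SimpleDiscreteDistribution(std::size_t numSymbols)
      : probabilities(numSymbols, 1.0 / numSymbols) {}

  // Probability of observing the given symbol (0 for unknown symbols).
  double Probability(std::size_t symbol) const
  {
    return symbol < probabilities.size() ? probabilities[symbol] : 0.0;
  }

  // Maximum-likelihood estimate: each symbol's probability becomes its
  // relative frequency among the observations.
  void Estimate(const std::vector<std::size_t>& observations)
  {
    if (observations.empty())
      return; // Nothing to learn from; keep the current probabilities.

    std::vector<double> counts(probabilities.size(), 0.0);
    for (std::size_t symbol : observations)
    {
      assert(symbol < counts.size());
      counts[symbol] += 1.0;
    }
    for (std::size_t s = 0; s < counts.size(); ++s)
      probabilities[s] = counts[s] / observations.size();
  }

 private:
  std::vector<double> probabilities;
};
```

Baum-Welch training would need a weighted variant of `Estimate()`, in which each observation contributes its posterior state probability rather than a count of 1.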
Modified: mlpack/trunk/src/mlpack/methods/hmm/distributions/gaussian_distribution.hpp
===================================================================
--- mlpack/trunk/src/mlpack/methods/hmm/distributions/gaussian_distribution.hpp 2011-12-14 13:36:56 UTC (rev 10783)
+++ mlpack/trunk/src/mlpack/methods/hmm/distributions/gaussian_distribution.hpp 2011-12-14 13:48:54 UTC (rev 10784)
@@ -14,6 +14,9 @@
namespace mlpack {
namespace distribution {
+/**
+ * A single multivariate Gaussian distribution.
+ */
class GaussianDistribution
{
private:
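GaussianDistribution is documented as a single multivariate Gaussian. Evaluating its density is what an HMM needs from each state's emission distribution. A minimal sketch of log N(x; mean, covariance) in plain C++ follows; the function name and `std::vector` types are hypothetical, not mlpack's API. It uses a Cholesky factorization so that neither the inverse nor the determinant of the covariance is formed explicitly.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Log-density of x under a multivariate Gaussian N(mean, covariance).
// Assumes covariance is symmetric positive definite; only its lower triangle
// (covariance[i][j] with j <= i) is read. Uses covariance = L * L^T.
double GaussianLogDensity(const std::vector<double>& x,
                          const std::vector<double>& mean,
                          const std::vector<std::vector<double>>& covariance)
{
  const std::size_t d = x.size();
  assert(mean.size() == d && covariance.size() == d);
  const double pi = std::acos(-1.0);

  // Cholesky factorization: lower-triangular L.
  std::vector<std::vector<double>> L(d, std::vector<double>(d, 0.0));
  for (std::size_t i = 0; i < d; ++i)
    for (std::size_t j = 0; j <= i; ++j)
    {
      double sum = covariance[i][j];
      for (std::size_t k = 0; k < j; ++k)
        sum -= L[i][k] * L[j][k];
      if (i == j)
      {
        assert(sum > 0.0); // Otherwise covariance is not positive definite.
        L[i][i] = std::sqrt(sum);
      }
      else
      {
        L[i][j] = sum / L[j][j];
      }
    }

  // Forward substitution solves L z = x - mean. Then the squared Mahalanobis
  // distance is z^T z and log det(covariance) = 2 * sum_i log L[i][i].
  std::vector<double> z(d);
  double mahalanobis = 0.0;
  double logDet = 0.0;
  for (std::size_t i = 0; i < d; ++i)
  {
    double sum = x[i] - mean[i];
    for (std::size_t k = 0; k < i; ++k)
      sum -= L[i][k] * z[k];
    z[i] = sum / L[i][i];
    mahalanobis += z[i] * z[i];
    logDet += 2.0 * std::log(L[i][i]);
  }

  return -0.5 * (d * std::log(2.0 * pi) + logDet + mahalanobis);
}
```

Refactoring the covariance on every call is wasteful when the same state's density is evaluated at many time steps; a real implementation would likely cache the factor per state.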
Modified: mlpack/trunk/src/mlpack/methods/hmm/hmm.hpp
===================================================================
--- mlpack/trunk/src/mlpack/methods/hmm/hmm.hpp 2011-12-14 13:36:56 UTC (rev 10783)
+++ mlpack/trunk/src/mlpack/methods/hmm/hmm.hpp 2011-12-14 13:48:54 UTC (rev 10784)
@@ -12,7 +12,7 @@
#include "distributions/discrete_distribution.hpp"
namespace mlpack {
-namespace hmm {
+namespace hmm /** Hidden Markov Models. */ {
/**
* A class that represents a Hidden Markov Model with an arbitrary type of
@@ -47,7 +47,33 @@
* See the mlpack::distribution::DiscreteDistribution class for an example. One
* would use the DiscreteDistribution class when the observations are
* non-negative integers. Other distributions could be Gaussians, a mixture of
- * Gaussians (GMM), or any other probability distribution.
+ * Gaussians (GMM), or any other probability distribution implementing the
+ * four Distribution functions.
+ *
+ * Usage of the HMM class generally involves either training an HMM or loading
+ * an already-known HMM and taking probability measurements of sequences.
+ * Example code for supervised training of a Gaussian HMM (that is, where the
+ * emission output distribution is a single Gaussian for each hidden state) is
+ * given below.
+ *
+ * @code
+ * // One matrix per sequence; each column is an observation.
+ * extern std::vector<arma::mat> observations;
+ * // The hidden state of each observation, one vector per sequence.
+ * extern std::vector<arma::Col<size_t> > states;
+ * // Create an untrained HMM with 5 hidden states and default (N(0, 1))
+ * // Gaussian distributions with the dimensionality of the dataset.
+ * HMM<GaussianDistribution> hmm(5,
+ *     GaussianDistribution(observations[0].n_rows));
+ *
+ * // Train the HMM (the labels could be omitted to perform unsupervised
+ * // training).
+ * hmm.Train(observations, states);
+ * @endcode
+ *
+ * Once initialized, the HMM can evaluate the probability of a certain sequence
+ * (with LogLikelihood()), predict the most likely sequence of hidden states
+ * (with Predict()), generate a sequence (with Generate()), or estimate the
+ * probabilities of each state for a sequence of observations (with Estimate()).
+ *
+ * @tparam Distribution Type of emission distribution for this HMM.
*/
template<typename Distribution = distribution::DiscreteDistribution>
class HMM
@@ -71,7 +97,7 @@
/**
* Create the Hidden Markov Model with the given transition matrix and the
- * given emission probability matrix.
+ * given emission distributions.
*
* The transition matrix should be such that T(i, j) is the probability of
* transition to state i from state j. The columns of the matrix should sum
@@ -81,7 +107,7 @@
* emission i while in state j. The columns of the matrix should sum to 1.
*
* @param transition Transition matrix.
- * @param emission Emission probability matrix.
+ * @param emission Emission distributions.
*/
HMM(const arma::mat& transition, const std::vector<Distribution>& emission);
@@ -90,6 +116,12 @@
* unlabeled observations. Instead of giving a guess transition and emission
* matrix here, do that in the constructor.
*
+ * @note Train() can be called multiple times with different sequences; each
+ * time it is called, it uses the current parameters of the HMM as a starting
+ * point for training.
+ *
* @param dataSeq Vector of observation sequences.
*/
void Train(const std::vector<arma::mat>& dataSeq);
@@ -98,6 +130,12 @@
* Train the model using the given labeled observations; the transition and
* emission matrices are directly estimated.
*
+ * @note Train() can be called multiple times with different sequences; each
+ * time it is called, it uses the current parameters of the HMM as a starting
+ * point for training.
+ *
* @param dataSeq Vector of observation sequences.
* @param stateSeq Vector of state sequences, corresponding to each
* observation.
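For the labeled Train() overload, the documentation says the transition and emission matrices are estimated directly. A minimal sketch of the transition half, assuming hypothetical plain `std::vector` types (not mlpack's signature) and the convention stated in hmm.hpp that T(i, j) is the probability of moving to state i from state j, so each column sums to 1:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Maximum-likelihood transition matrix from labeled (fully observed) state
// sequences. Convention assumed here: transition[i][j] = P(next state i |
// current state j), so every column sums to 1. A state that is never left
// keeps an all-zero column.
std::vector<std::vector<double>> EstimateTransitions(
    const std::vector<std::vector<std::size_t>>& stateSequences,
    std::size_t numStates)
{
  std::vector<std::vector<double>> transition(
      numStates, std::vector<double>(numStates, 0.0));

  // Count every observed step from state seq[t - 1] to state seq[t].
  for (const std::vector<std::size_t>& seq : stateSequences)
    for (std::size_t t = 1; t < seq.size(); ++t)
    {
      assert(seq[t] < numStates && seq[t - 1] < numStates);
      transition[seq[t]][seq[t - 1]] += 1.0;
    }

  // Normalize each column (each "from" state) into a probability vector.
  for (std::size_t j = 0; j < numStates; ++j)
  {
    double total = 0.0;
    for (std::size_t i = 0; i < numStates; ++i)
      total += transition[i][j];
    if (total > 0.0)
      for (std::size_t i = 0; i < numStates; ++i)
        transition[i][j] /= total;
  }
  return transition;
}
```

The emission half would follow the same pattern: collect the observations labeled with each state and fit that state's distribution to them, for example by relative frequency for discrete symbols.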