[mlpack-svn] [MLPACK] #163: Decide on "basic type" of observation for MLPACK
MLPACK Trac
trac at coffeetalk-1.cc.gatech.edu
Tue Nov 22 21:49:04 EST 2011
#163: Decide on "basic type" of observation for MLPACK
----------------------+-----------------------------------------------------
Reporter: rcurtin | Owner:
Type: wishlist | Status: new
Priority: blocker | Milestone: MLPACK 1.0
Component: MLPACK | Keywords: mlpack observation type
Blocking: 132 | Blocked By:
----------------------+-----------------------------------------------------
Ticket #132 seems like a contrary step to this, but the more I have thought
about the goals of that ticket (and run into problems of my own), the less
sure I am that it is the right direction to take.
The problem, simply stated here, is this:
'''MLPACK methods work on observations, generally vectors of doubles.
Should MLPACK only allow vectors of doubles (i.e. a matrix) as datasets,
or should MLPACK allow more arbitrary observation types, like vectors of
size_t or similar?'''
Here is what I see as the advantages and disadvantages of restricting all
datasets to be of type arma::mat:
Advantages:
* '''Simpler user interface''' (```method(arma::mat&)``` not
```method<type>(type&)```)
* '''Faster calculations''': if we are always using matrices, we can use
matrix computations instead of looping over a ```std::vector<type>```
object which is holding our observations.
* '''Faster compiles''' because the template engine does not need to do
as much work.
* '''Easier testing''' because we don't need to consider arbitrary types
in our test cases.
Disadvantages:
* '''Lower generalizability''' because a user has to fit their data to
our scheme, regardless of the real type of their data.
* '''More error checking required''', as in the following example:
the Discrete HMM takes observations of type size_t (i.e. integer data
sequences); if we force that into the arma::vec observation scheme, then
when we load our data from a file, we have to check that the sequence
contains only integer data. In addition, we can't write ```if (value == 3)```
but must instead write ```if (fabs(value - 3) < 1e-5)``` or something like that.
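The extra checking described in the Discrete HMM example might look roughly
like the sketch below. This is an illustration under stated assumptions, not
existing MLPACK code: ```ApproxEqual``` and ```ToDiscrete``` are hypothetical
helper names, a plain ```std::vector<double>``` stands in for arma::vec, and
the 1e-5 tolerance is simply the value mentioned above.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: true when two doubles agree within a tolerance.
// This replaces an exact test such as (value == 3).
bool ApproxEqual(double a, double b, double tol = 1e-5) {
  return std::fabs(a - b) < tol;
}

// Hypothetical helper: convert a sequence that was loaded as doubles into
// size_t symbols, rejecting negative, non-integer, NaN, or infinite values.
std::vector<std::size_t> ToDiscrete(const std::vector<double>& sequence) {
  std::vector<std::size_t> symbols;
  symbols.reserve(sequence.size());
  for (double value : sequence) {
    const double rounded = std::round(value);
    if (rounded < 0.0 || !ApproxEqual(value, rounded))
      throw std::invalid_argument("observation is not a non-negative integer");
    symbols.push_back(static_cast<std::size_t>(rounded));
  }
  return symbols;
}
```

With a native size_t observation type, neither helper would be needed; with a
doubles-only dataset, something like ```ToDiscrete``` has to run once at load
time, and every later comparison has to go through a tolerance check.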
This is an open question, and I am sure I have not addressed it from all
perspectives.
--
Ticket URL: <http://trac.research.cc.gatech.edu/fastlab/ticket/163>
MLPACK <www.fast-lab.org>
MLPACK is an intuitive, fast, and scalable C++ machine learning library developed by the FASTLAB at Georgia Tech under Dr. Alex Gray.