[mlpack-svn] [MLPACK] #163: Decide on "basic type" of observation for MLPACK

MLPACK Trac trac at coffeetalk-1.cc.gatech.edu
Tue Nov 22 21:49:04 EST 2011


#163: Decide on "basic type" of observation for MLPACK
----------------------+-----------------------------------------------------
 Reporter:  rcurtin   |        Owner:                         
     Type:  wishlist  |       Status:  new                    
 Priority:  blocker   |    Milestone:  MLPACK 1.0             
Component:  MLPACK    |     Keywords:  mlpack observation type
 Blocking:  132       |   Blocked By:                         
----------------------+-----------------------------------------------------
 This ticket may seem to run contrary to ticket #132, but the more I have
 thought about the goals of that ticket (and run into problems of my own),
 the less sure I am that it is the right direction to take.

 The problem, simply stated here, is this:

 '''MLPACK methods work on observations, generally vectors of doubles.
 Should MLPACK only allow vectors of doubles (i.e. a matrix) as datasets,
 or should MLPACK allow more arbitrary observation types, like vectors of
 size_t or similar?'''

 Here is what I see as the advantages and disadvantages of restricting all
 datasets to be of type arma::mat:

 Advantages:

  * '''Simpler user interface''' (```method(arma::mat&)``` not
 ```method<type>(type&)```)
  * '''Faster calculations''': if we are always using matrices, we can use
 matrix computations instead of looping over a ```std::vector<type>```
 object which is holding our observations.
  * '''Faster compiles''' because the template engine does not need to do
 as much work.
  * '''Easier testing''' because we don't need to consider arbitrary types
 in our test cases.
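
 To make the interface difference concrete, here is a rough sketch of the
 two styles.  It is only an illustration: ```Matrix``` is a standard-library
 stand-in for arma::mat so that the example has no dependencies, and
 ```CountMatchesFixed()``` and ```CountMatchesGeneric()``` are hypothetical
 names, not existing MLPACK functions.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for arma::mat in this sketch (an assumption, so that only the
// standard library is needed): each inner vector is one observation.
using Matrix = std::vector<std::vector<double>>;

// Fixed-type interface: every caller passes a dataset of double vectors.
// Hypothetical function; counts observations exactly equal to 'query'.
std::size_t CountMatchesFixed(const Matrix& data,
                              const std::vector<double>& query)
{
  std::size_t count = 0;
  for (const std::vector<double>& observation : data)
    if (observation == query)
      ++count;
  return count;
}

// Templated interface: any observation type that provides operator== is
// accepted, at the cost of a template parameter in the signature.
template<typename ObservationType>
std::size_t CountMatchesGeneric(const std::vector<ObservationType>& data,
                                const ObservationType& query)
{
  std::size_t count = 0;
  for (const ObservationType& observation : data)
    if (observation == query)
      ++count;
  return count;
}
```

 With the templated version, a sequence of size_t states can be passed
 directly (```CountMatchesGeneric(states, std::size_t(3))```), whereas the
 fixed version would require converting that sequence to doubles first.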

 Disadvantages:

  * '''Lower generalizability''' because a user has to fit their data to
 our scheme, regardless of the real type of their data.
  * '''Requires more error checking''', such as in the following example:
 the Discrete HMM takes observations of type size_t (i.e. integer data
 sequences); if we force that into the arma::vec observation scheme, then
 when we load our data from a file, we have to check that the sequence only
 contains integer data.  In addition, we can't write ```if (value == 3)```
 but instead need ```if (fabs(value - 3) < 1e-5)``` or something like that.
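
 As a rough sketch of the extra checking this would involve, assuming the
 loaded sequence arrives as doubles: the helpers below are hypothetical (not
 part of MLPACK), ```std::vector<double>``` stands in for arma::vec so the
 example needs only the standard library, and the 1e-5 tolerance is simply
 the value used in the example above.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: tolerance-based replacement for (a == b) on doubles.
bool ApproxEqual(const double a, const double b, const double tolerance = 1e-5)
{
  return std::fabs(a - b) < tolerance;
}

// Hypothetical helper: true if 'value' is within 'tolerance' of a
// non-negative whole number, i.e. usable as a discrete state index.
bool IsDiscreteState(const double value, const double tolerance = 1e-5)
{
  return value > -tolerance &&
         std::fabs(value - std::round(value)) < tolerance;
}

// Hypothetical helper: convert a double-valued sequence into size_t states.
// Returns false and leaves 'states' empty if any entry is not a valid state.
bool ToDiscreteSequence(const std::vector<double>& sequence,
                        std::vector<std::size_t>& states)
{
  states.clear();
  for (const double value : sequence)
  {
    if (!IsDiscreteState(value))
    {
      states.clear();
      return false;
    }
    states.push_back(static_cast<std::size_t>(std::round(value)));
  }
  return true;
}
```

 If observations were stored as size_t in the first place, none of these
 helpers would be needed and ```if (value == 3)``` would be correct as
 written.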

 This is an open question, and I am sure I have not addressed it from all
 perspectives.

-- 
Ticket URL: <http://trac.research.cc.gatech.edu/fastlab/ticket/163>
MLPACK <www.fast-lab.org>
MLPACK is an intuitive, fast, and scalable C++ machine learning library developed by the FASTLAB at Georgia Tech under Dr. Alex Gray.

