[mlpack-svn] [MLPACK] #293: RASearchRules::MinimumSamplesReqd() seems to fail when k > 1

MLPACK Trac trac at coffeetalk-1.cc.gatech.edu
Tue Jun 25 12:14:11 EDT 2013


#293: RASearchRules::MinimumSamplesReqd() seems to fail when k > 1
---------------------+------------------------------------------------------
 Reporter:  rcurtin  |        Owner:  pram                                          
     Type:  defect   |       Status:  new                                           
 Priority:  major    |    Milestone:  mlpack 1.0.7                                  
Component:  mlpack   |     Keywords:  allkrann, rank-approximate, minimumsamplesreqd
 Blocking:           |   Blocked By:                                                
---------------------+------------------------------------------------------
 When I run RASearch<> with k > 1 using the code below, the call never
 returns; it gets caught in an infinite loop:

 {{{
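 // 5-dimensional dataset with 2500 points (one point per column).
 arma::mat dataset(5, 2500);
 dataset.randn();

 arma::Mat<size_t> neighbors;
 arma::mat distances;

 RASearch<> allkrann(dataset);
 // k = 5; this call never returns.
 allkrann.Search(5, neighbors, distances);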
 }}}

 Some inspection reveals that it is getting stuck in the loop at
 ra_search_rules_impl.hpp:111.

 Each iteration of that loop computes the probability of success for the
 current number of samples, compares it with the desired probability of
 success (alpha), and adjusts the number of samples accordingly.

 When k > 1, the number of samples appears to converge to the number of
 points in the dataset, but the computed probability of success still falls
 short of alpha, so the loop never terminates.  This does not seem right,
 because the probability of success should be 1 when all of the points in
 the dataset are being sampled.
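
 As a rough illustration of a search that cannot get stuck, the sketch
 below finds the smallest sufficient sample count by bisection.  It is not
 the code in ra_search_rules_impl.hpp: MinSamplesSketch() and the
 successProbability callback are hypothetical names, and the sketch assumes
 the probability is non-decreasing in the number of samples.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical helper, not mlpack's implementation.
// Returns the smallest m in [k, n] with successProbability(m) >= alpha,
// assuming successProbability is non-decreasing in m.  If no m in that
// range reaches alpha, the result is n (sample every point).
std::size_t MinSamplesSketch(
    const std::size_t n,
    const std::size_t k,
    const double alpha,
    const std::function<double(std::size_t)>& successProbability)
{
  assert(k >= 1 && alpha <= 1.0);

  // Cannot sample more points than the dataset holds.
  if (k >= n)
    return n;

  std::size_t lo = k; // Smallest candidate sample count.
  std::size_t hi = n; // Largest candidate sample count.

  // The interval [lo, hi] strictly shrinks on every iteration, so the loop
  // terminates regardless of the values the callback returns.
  while (lo < hi)
  {
    const std::size_t mid = lo + (hi - lo) / 2;
    if (successProbability(mid) >= alpha)
      hi = mid;      // mid is sufficient; try fewer samples.
    else
      lo = mid + 1;  // mid is insufficient; need more samples.
  }

  return lo;
}
```

 Because the search interval shrinks on every step, the loop ends after
 roughly log2(n) iterations, and the result never exceeds n even if the
 probability function returns a value below alpha at m == n.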

 It would probably be a good idea to write a test in allkrann_test.cpp that
 tests SuccessProbability() and perhaps MinimumSamplesReqd() (maybe that
 one is a bit harder to test and less relevant) to ensure that later
 changes do not reintroduce this bug.
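
 The properties such a test might check can be sketched without the
 library.  The code below uses a stand-in model, which is an assumption and
 not mlpack's formula: the probability that at least k of m points drawn
 without replacement from n fall among the t best-ranked points (a
 hypergeometric upper tail, with k <= t <= n).  SuccessProbabilitySketch()
 and IsNonDecreasingInM() are hypothetical names; a real test would call
 the library's own SuccessProbability() instead.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>

// Natural log of the binomial coefficient C(a, b), for b <= a.
double LogChoose(const std::size_t a, const std::size_t b)
{
  return std::lgamma(a + 1.0) - std::lgamma(b + 1.0) -
      std::lgamma(a - b + 1.0);
}

// Stand-in model (assumption, not mlpack's formula): probability that at
// least k of m points sampled without replacement from n points lie among
// the t best-ranked points.  Requires k <= t <= n and m <= n.
double SuccessProbabilitySketch(const std::size_t n,
                                const std::size_t k,
                                const std::size_t m,
                                const std::size_t t)
{
  double p = 0.0;
  const std::size_t upper = std::min(m, t);
  for (std::size_t j = k; j <= upper; ++j)
  {
    // Skip impossible outcomes: the m - j remaining samples must fit
    // among the n - t points outside the top t.
    if (m - j > n - t)
      continue;

    p += std::exp(LogChoose(t, j) + LogChoose(n - t, m - j) -
        LogChoose(n, m));
  }

  return std::min(p, 1.0);
}

// Checks that the probability never decreases as more points are sampled.
bool IsNonDecreasingInM(const std::size_t n,
                        const std::size_t k,
                        const std::size_t t)
{
  double previous = 0.0;
  for (std::size_t m = k; m <= n; ++m)
  {
    const double current = SuccessProbabilitySketch(n, k, m, t);
    if (current < previous - 1e-9)
      return false;
    previous = current;
  }

  return true;
}
```

 Under this model, two checks correspond to the behaviour described in
 this ticket: the probability at m == n should be 1 (within a small
 tolerance), and the probability should never decrease as m grows.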

 Also, the 'return (m + 1)' line at the end of MinimumSamplesReqd() is
 probably incorrect when m == n (that is, when m is equal to the number of
 points in the dataset) because you can't sample more points than exist in
 the dataset.
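
 One possible guard, shown only as a sketch: cap the returned value at the
 number of points.  ClampedSampleCount() is a hypothetical helper, and
 whether capping or removing the '+ 1' is the right fix depends on what the
 loop guarantees about m.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: m is the sample count found by the search, n is the
// number of points in the dataset.  Returns m + 1, but never more than n.
std::size_t ClampedSampleCount(const std::size_t m, const std::size_t n)
{
  return std::min(m + 1, n);
}
```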

-- 
Ticket URL: <http://trac.research.cc.gatech.edu/fastlab/ticket/293>
MLPACK <www.fast-lab.org>
MLPACK is an intuitive, fast, and scalable C++ machine learning library developed at Georgia Tech.

