[mlpack-svn] [MLPACK] #293: RASearchRules::MinimumSamplesReqd() seems to fail when k > 1

MLPACK Trac trac at coffeetalk-1.cc.gatech.edu
Fri Jul 5 16:23:16 EDT 2013


#293: RASearchRules::MinimumSamplesReqd() seems to fail when k > 1
-------------------------------------------------------------+--------------
  Reporter:  rcurtin                                         |        Owner:  pram        
      Type:  defect                                          |       Status:  new         
  Priority:  major                                           |    Milestone:  mlpack 1.0.7
 Component:  mlpack                                          |   Resolution:              
  Keywords:  allkrann, rank-approximate, minimumsamplesreqd  |     Blocking:              
Blocked By:                                                  |  
-------------------------------------------------------------+--------------

Comment (by pram):

 Replying to [ticket:293 rcurtin]:
 > When I attempt to run RASearch<> with k > 1 using the code below, it
 gets caught in a loop:
 >
 > {{{
 > arma::mat dataset(5, 2500);
 > dataset.randn();
 >
 > arma::Mat<size_t> neighbors;
 > arma::mat distances;
 >
 > RASearch<> allkrann(dataset);
 > allkrann.Search(5, neighbors, distances);
 > }}}
 >
 > Some inspection reveals that it is getting stuck in the loop at
 ra_search_rules_impl.hpp:111.

 The way that rank approximation for kNN is set up in the code is that for
 a given 'tau', k neighbors are sought from the top 'ceiling(tau * #points
 / 100)' neighbors. The issue with your sample code is that the default
 level of approximation is 0.1%, which amounts to looking for 5 neighbors
 from the top 3 neighbors (which is impossible). Hence the infinite loop.

 Initially I had a sanity check for this in 'allkrann_main.cpp', but I
 added a similar check in 'ra_search_impl.hpp' in the Search() function
 (Lines 162-169 in revision 15423), and in 'ra_search_rules_impl.hpp' in
 the MinimumSamplesReqd() function (Lines 100-110 in revision 15423). This
 should "fix" the problem.

 >
 > In that loop, the probability of success is calculated and then compared
 with the desired probability of success (alpha), and the number of samples
 is updated to find the right number of samples.
 >
 > When k > 1, it seems as though the number of samples required is
 converging to the number of points in the dataset, but then the
 probability of success is not high enough, so it does not terminate.  This
 does not seem right, because the probability of success should be 1 when
 all of the points in the dataset are being sampled.
 >
 > It would probably be a good idea to write a test in allkrann_test.cpp
 that tested SuccessProbability() and perhaps MinimumSamplesReqd() (maybe
 that one is a bit harder to test and less relevant) to ensure that changes
 after the fix to this bug do not break it again.
 >

 I can write tests for these functions if you want. They should be fairly
 straightforward. I just need to pick a few examples, work them out by hand
 and add them. Let me know if you want me to do it.

 > Also, the 'return (m + 1)' line at the end of MinimumSamplesReqd() is
 probably incorrect when m == n (that is, when m is equal to the number of
 points in the dataset) because you can't sample more points than exist in
 the dataset.

 This was meant mostly to enhance the success probability (we would rather
 overestimate than underestimate). The hope is that usually the required
 number of samples is much less than the total number of points in the set
 (m << n). Hence incrementing m to m + 1 is not too bad. However, if m ==
 n, we are already in trouble. But I did make the fix to make sure that the
 MinimumSamplesReqd() never returns more than n (Line 158 in
 ra_search_rules_impl.hpp).

 I have changed the status to 'fixed'. Let me know if this is not
 appropriate.

-- 
Ticket URL: <http://trac.research.cc.gatech.edu/fastlab/ticket/293#comment:2>
MLPACK <www.fast-lab.org>
MLPACK is an intuitive, fast, and scalable C++ machine learning library developed at Georgia Tech.


More information about the mlpack-svn mailing list