[mlpack-svn] [MLPACK] #293: RASearchRules::MinimumSamplesReqd() seems to fail when k > 1
MLPACK Trac
trac at coffeetalk-1.cc.gatech.edu
Tue Jun 25 12:14:11 EDT 2013
#293: RASearchRules::MinimumSamplesReqd() seems to fail when k > 1
---------------------+------------------------------------------------------
 Reporter:  rcurtin  |      Owner:  pram
     Type:  defect   |     Status:  new
 Priority:  major    |  Milestone:  mlpack 1.0.7
Component:  mlpack   |   Keywords:  allkrann, rank-approximate, minimumsamplesreqd
 Blocking:           | Blocked By:
---------------------+------------------------------------------------------
When I run RASearch<> with k > 1 using the code below, it gets caught in
an infinite loop:
{{{
// 2500 random points in 5 dimensions (one point per column).
arma::mat dataset(5, 2500);
dataset.randn();
arma::Mat<size_t> neighbors;
arma::mat distances;
RASearch<> allkrann(dataset);
// k = 5: this call never returns.
allkrann.Search(5, neighbors, distances);
}}}
Some inspection reveals that it is getting stuck in the loop at
ra_search_rules_impl.hpp:111.
In that loop, the probability of success is calculated and then compared
with the desired probability of success (alpha), and the number of samples
is updated to find the right number of samples.
When k > 1, the number of samples required appears to converge to the
number of points in the dataset, but the computed probability of success
is still below alpha, so the loop never terminates. This does not seem
right, because the probability of success should be 1 when all of the
points in the dataset are being sampled.
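
To make the expected boundary behavior concrete, here is a minimal,
self-contained sketch. It is not mlpack's SuccessProbability(); it assumes a
plain model in which m points are drawn uniformly without replacement from n,
and success means the sample contains at least k of the t best-ranked points.
The names LogChoose and SampleSuccessProbability, and the value of t used in
the note after the block, are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Log of the binomial coefficient C(a, b), computed with lgamma so that
// large arguments do not overflow. Requires b <= a.
double LogChoose(const std::size_t a, const std::size_t b)
{
  return std::lgamma(a + 1.0) - std::lgamma(b + 1.0) -
      std::lgamma(a - b + 1.0);
}

// Hypothetical model (NOT mlpack's SuccessProbability()): probability that m
// points sampled without replacement from n contain at least k of the t
// best-ranked points. This is the upper tail of a hypergeometric
// distribution.
double SampleSuccessProbability(const std::size_t n, const std::size_t k,
                                const std::size_t m, const std::size_t t)
{
  assert(m <= n && t <= n);

  const std::size_t upper = std::min(m, t);
  double p = 0.0;
  for (std::size_t i = k; i <= upper; ++i)
  {
    // Skip impossible outcomes: the remaining m - i sampled points must all
    // come from the n - t points outside the top t.
    if (m - i > n - t)
      continue;

    p += std::exp(LogChoose(t, i) + LogChoose(n - t, m - i) -
        LogChoose(n, m));
  }

  // Guard against rounding pushing the sum slightly above 1.
  return std::min(p, 1.0);
}
```

Under this model, SampleSuccessProbability(2500, 5, 2500, 125) is 1 because
sampling every point always captures the top t, and any m < k gives 0. A
regression test could check those two boundaries, plus monotonicity in m,
against the real SuccessProbability().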
It would probably be a good idea to write a test in allkrann_test.cpp that
tests SuccessProbability() and perhaps MinimumSamplesReqd() (the latter may
be harder to test and less relevant), so that changes made after this bug
is fixed do not reintroduce it.
Also, the 'return (m + 1)' line at the end of MinimumSamplesReqd() is
probably incorrect when m == n (that is, when m is equal to the number of
points in the dataset) because you can't sample more points than exist in
the dataset.
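
One way to guarantee both termination and a result no larger than n is to
search only within [1, n]. The sketch below is a generic bisection, not the
loop in ra_search_rules_impl.hpp (which may be structured differently). It
assumes the success probability is non-decreasing in m; BoundedMinimumSamples
and SingleNeighborProbability are hypothetical names, and the k = 1 formula is
used only as a simple stand-in for exercising the search.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>

// Illustrative stand-in for a success probability: chance that at least one
// of m independent uniform draws lands in the top t of n points.
double SingleNeighborProbability(const std::size_t n, const std::size_t t,
                                 const std::size_t m)
{
  const double eps = static_cast<double>(t) / static_cast<double>(n);
  return 1.0 - std::pow(1.0 - eps, static_cast<double>(m));
}

// Smallest m in [1, n] with prob(m) >= alpha, assuming prob is
// non-decreasing in m. If no sample size reaches alpha, this returns n
// (sample everything); it can never return n + 1.
std::size_t BoundedMinimumSamples(
    const std::size_t n,
    const double alpha,
    const std::function<double(std::size_t)>& prob)
{
  if (n == 0)
    return 0;

  std::size_t lo = 1;
  std::size_t hi = n;
  // The interval [lo, hi] shrinks on every iteration, so the loop always
  // terminates, even if prob() is wrong and never reaches alpha.
  while (lo < hi)
  {
    const std::size_t mid = lo + (hi - lo) / 2;
    if (prob(mid) >= alpha)
      hi = mid;
    else
      lo = mid + 1;
  }

  return lo;
}
```

With this shape, a faulty probability function can only make the search fall
back to sampling all n points; it cannot make it hang or ask for more points
than exist.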
--
Ticket URL: <http://trac.research.cc.gatech.edu/fastlab/ticket/293>
MLPACK <www.fast-lab.org>
MLPACK is an intuitive, fast, and scalable C++ machine learning library developed at Georgia Tech.