[mlpack-git] [mlpack/mlpack] Refactor for faster assembly of secondHashTable. (#675)
Ryan Curtin
notifications at github.com
Sun Jun 5 14:26:05 EDT 2016
Oh, right, I did not think about the fact that different buckets have different numbers of points in them! Now that I think about it, perhaps `std::vector<size_t>*` is the right way to go (or actually maybe `std::vector<arma::Col<size_t>>`).
I think that we can have the best of both worlds if we do it like this:
* Use `std::vector<arma::Col<size_t>>` to represent `secondHashTable`. This also avoids manual memory allocation, which is good: I am pretty sure your code had a subtle bug where a user could construct the `LSHSearch` object without training it, but the destructor would still try to delete the `std::vector<size_t>*` object, which would cause a crash.
* Before filling `secondHashTable`, calculate the size of each bin (the code I wrote does this), truncating each size to at most `bucketSize`. Then we can allocate exactly the right length for each `arma::Col<size_t>` (and exactly the right number of `arma::Col<size_t>`s), and then fill them the way your code does.
* When the object is constructed, if `bucketSize = 0`, set `bucketSize = referenceSet.n_cols`.
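A minimal C++ sketch of the two-pass construction described in the list above, under some assumptions: the second-level bin of every reference point has already been computed into a hypothetical `binOf` vector, `BuildSecondHashTable` is a hypothetical helper name (not mlpack's API), and `std::vector<size_t>` stands in for `arma::Col<size_t>` so the example builds without Armadillo.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for arma::Col<size_t>, so this compiles without Armadillo.
using Bucket = std::vector<size_t>;

// Hypothetical helper. binOf[i] is the (already computed) second-level bin of
// reference point i; numBins is the number of bins; bucketSize caps each bin.
std::vector<Bucket> BuildSecondHashTable(const std::vector<size_t>& binOf,
                                         const size_t numBins,
                                         size_t bucketSize)
{
  // bucketSize == 0 means "no cap": use the number of reference points,
  // mirroring bucketSize = referenceSet.n_cols.
  if (bucketSize == 0)
    bucketSize = binOf.size();

  // Pass 1: count the points in each bin, truncating each count to bucketSize.
  std::vector<size_t> binSizes(numBins, 0);
  for (const size_t bin : binOf)
    if (binSizes[bin] < bucketSize)
      ++binSizes[bin];

  // Allocate exactly numBins buckets, each with exactly the counted length.
  std::vector<Bucket> table(numBins);
  for (size_t b = 0; b < numBins; ++b)
    table[b].resize(binSizes[b]);

  // Pass 2: fill each bucket in point order, skipping points past the cap.
  std::vector<size_t> filled(numBins, 0);
  for (size_t i = 0; i < binOf.size(); ++i)
  {
    const size_t bin = binOf[i];
    if (filled[bin] < binSizes[bin])
      table[bin][filled[bin]++] = i;
  }
  return table;
}
```

Because the table is held by value in this sketch, an object that is constructed but never trained simply holds an empty vector, and the destructor has nothing to `delete` by hand.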
What do you think? Would this work? We would have to modify the serialization again, but I don't think we need to increment the version from 1 to 2, because we have not released mlpack with the earlier serialization change (the change from `std::vector<arma::mat>` to `arma::cube`). I was going to try to release mlpack 2.0.2 today, but if we are going to change the serialization again, I will wait; otherwise we will end up with more-complex-than-necessary legacy code to handle. :)
> I can't see your changes any more because there's something wrong with the commits
Yes, a force push reset the repository to the state it was in about 20 days ago, but I restored the current state earlier today. It seems the PR interface has not been updated, though, so it still shows far more commits than it should.
---
Reply to this email directly or view it on GitHub:
https://github.com/mlpack/mlpack/pull/675#issuecomment-223828754