[mlpack-git] (blog) master: Adds Yannis Second Week post (fb751ba)

gitdub at mlpack.org gitdub at mlpack.org
Mon Jun 6 04:26:46 EDT 2016


Repository : https://github.com/mlpack/blog
On branch  : master
Link       : https://github.com/mlpack/blog/compare/11a2760bafba4ed5a6453700addfb6a640c34deb...1d87acfcce540d0f63b98c8f285cf7714e843735

>---------------------------------------------------------------

commit fb751badd56b1e9010e3ae654b9cac357c915220
Author: Yannis Mentekidis <mentekid at gmail.com>
Date:   Mon Jun 6 11:26:46 2016 +0300

    Adds Yannis Second Week post


>---------------------------------------------------------------

fb751badd56b1e9010e3ae654b9cac357c915220
 content/blog/YannisWeekTwo.md | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/content/blog/YannisWeekTwo.md b/content/blog/YannisWeekTwo.md
new file mode 100644
index 0000000..ceabedf
--- /dev/null
+++ b/content/blog/YannisWeekTwo.md
@@ -0,0 +1,17 @@
+Title: LSH optimizations, modifications, and benchmarking
+Date: 2016-06-06 20:20:20
+Tags: gsoc, lsh, multiprobe
+Author: Yannis Mentekidis
+
+I began this week by debugging the multiprobe implementation I discussed in my previous post. The algorithm is quite complicated, so I wanted to make sure it runs correctly before moving on to benchmarking and optimization.
+
+Sure enough, I found a lot of minor bugs that the tests I had written didn't catch. That worried me, so I decided to write better tests: my idea was to add some simple deterministic test cases.
+
+To do that, I needed to improve access to the LSHSearch object's projection tables, which are randomly generated. To have deterministic tests, you need to be able to specify the tables yourself instead of letting the object generate random ones for you.
+
+While modifying the LSHSearch code to do that, Ryan and I also decided to make a few other changes, namely:
+
+ * Change the data structure that stores the projection tables from a std::vector to an arma::cube. Each slice of the cube is a projection table. This conserves memory and simplifies the code.
+ * Change the implementation of the second-level hashing. In the current version, an arma::Mat<size_t> table is created where each row corresponds to a hash bucket and stores the indices of the points hashed to that bucket. This is inefficient, both because the default secondHashSize is pretty large and because the number of points in each bucket might be uneven, so the resulting table is quite sparse. After some demo code and discussion, we decided on a solution to these two problems.
+
+So, with LSHSearch more transparent, easier to test, and more efficient, we are now ready to benchmark single- and multiprobe LSH, see what we can optimize in the multiprobe code, and then move on to parallelization. All this will start today, so stay tuned :D
\ No newline at end of file
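
The sparsity problem described in the post's second bullet can be illustrated with a small standalone sketch. It is not mlpack code and it does not show the solution the post refers to. It only counts how many cells a dense "one row per bucket" table would allocate compared with how many actually hold a point index. `TableStats`, `DenseTableStats`, and `bucketOfPoint` are hypothetical names made up for this example, and the table width is assumed to equal the size of the fullest bucket.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary type for this sketch; not part of mlpack.
struct TableStats
{
  std::size_t denseCells;   // Cells a dense (buckets x widest bucket) table allocates.
  std::size_t usedCells;    // Cells that actually hold a point index.
  std::size_t usedBuckets;  // Buckets that received at least one point.
};

// bucketOfPoint[i] is the second-level bucket that point i hashed to.
// Assumption: the dense table has secondHashSize rows and as many columns
// as the fullest bucket needs.
TableStats DenseTableStats(const std::vector<std::size_t>& bucketOfPoint,
                           const std::size_t secondHashSize)
{
  // Count how many points land in each bucket.
  std::vector<std::size_t> counts(secondHashSize, 0);
  for (const std::size_t bucket : bucketOfPoint)
  {
    assert(bucket < secondHashSize);
    ++counts[bucket];
  }

  // Find the fullest bucket and the number of non-empty buckets.
  std::size_t widest = 0;
  std::size_t usedBuckets = 0;
  for (const std::size_t count : counts)
  {
    if (count > widest)
      widest = count;
    if (count > 0)
      ++usedBuckets;
  }

  return { secondHashSize * widest, bucketOfPoint.size(), usedBuckets };
}
```

For example, with 1000 buckets and five points, four of which hash to bucket 0 and one to bucket 7, the dense table allocates 4000 cells to store 5 indices, and only 2 of the 1000 rows are used at all. Both effects named in the post show up here: a large bucket count and uneven bucket sizes.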



