[mlpack-git] (blog) master: Adds Yannis Week Six (2ca010c)

Tue Jul 5 04:50:04 EDT 2016

Repository : https://github.com/mlpack/blog
On branch  : master
Link       : https://github.com/mlpack/blog/compare/ce320f7086d287369a5bf9b59a6884152ea1b128...2ca010c0c24ac83e283baf38841fb716d8d70bd7

>---------------------------------------------------------------

commit 2ca010c0c24ac83e283baf38841fb716d8d70bd7
Author: Yannis Mentekidis <mentekid at gmail.com>
Date:   Tue Jul 5 09:50:04 2016 +0100

    Adds Yannis Week Six


>---------------------------------------------------------------

2ca010c0c24ac83e283baf38841fb716d8d70bd7
 content/blog/YannisWeekSix.md | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/content/blog/YannisWeekSix.md b/content/blog/YannisWeekSix.md
new file mode 100644
index 0000000..383bfde
--- /dev/null
+++ b/content/blog/YannisWeekSix.md
@@ -0,0 +1,18 @@
+Title: Multiprobe LSH Finalized, LSH Parallelization
+Date: 2016-07-05 20:20:20
+Tags: gsoc, lsh, multiprobe, parallel
+Author: Yannis Mentekidis
+
+I started this week by making the final changes to Multiprobe LSH - I fixed the style issues, corrected some bugs, and implemented or re-implemented some tests.
+
+As I've discussed before, LSH, and by extension Multiprobe LSH, is quite a hard algorithm to test, because it is randomized and furthermore doesn't give any practically usable guarantees on the rate of its error. 
+
+So while fixing the documentation and style, I ran into a few bugs in the code. What was worse, I found bugs in my testing code that made it too tolerant of mistakes. So, I re-implemented all the deterministic tests for Multiprobe LSH.
+
+Before the end of the week, Ryan merged my Pull Request to the master branch, so now Multiprobe LSH is available for users of the github build.
+
+I continued my week by trying to implement parallel versions of `arma::find()` and `arma::unique()`. I failed miserably, for two reasons: (a) because these functions are aggressively optimized, and (b) because there's a big overhead on lock contention and shared memory coherency, so big that the parallel version ends up being slower than the sequential one.
+
+In the end, we are probably going to go only with the first and simpler parallelization of the querying process - where an n-core machine can process n queries in parallel, which is the only thing I have attempted that actually showed any improvement.
+
+In related news, I merged the Multiprobe code into the OpenMP code branch, so AppVeyor could build the Pull Request and tell us if it fails in other platforms. Things look good, which is good news :)
\ No newline at end of file