[mlpack-git] (blog) master: Yannis Mentekidis, Week 1 (628246a)

gitdub at mlpack.org gitdub at mlpack.org
Sun May 29 10:16:37 EDT 2016


Repository : https://github.com/mlpack/blog
On branch  : master
Link       : https://github.com/mlpack/blog/compare/e91cf33d9a3d8ea262560ad25e08021bc68d60aa...628246a565e60833277249a7aa256e91aac5b4f7

>---------------------------------------------------------------

commit 628246a565e60833277249a7aa256e91aac5b4f7
Author: Yannis Mentekidis <mentekid at gmail.com>
Date:   Sun May 29 17:16:37 2016 +0300

    Yannis Mentekidis, Week 1


>---------------------------------------------------------------

628246a565e60833277249a7aa256e91aac5b4f7
 content/blog/YannisWeekOne.md | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/content/blog/YannisWeekOne.md b/content/blog/YannisWeekOne.md
new file mode 100644
index 0000000..3a8fa4f
--- /dev/null
+++ b/content/blog/YannisWeekOne.md
@@ -0,0 +1,20 @@
+Title: Implementation of Multiprobe LSH
+Date: 2016-05-29 20:20:20
+Tags: gsoc, lsh, multiprobe
+Author: Yannis Mentekidis
+
+
+This summer my goal is to improve various features of mlpack's current Locality Sensitive Hashing (LSH) implementation, making it faster, smarter, and easier to use. LSH is an approximate nearest neighbors algorithm that uses hashing to greatly reduce the number of points that need to be examined.
+
+I had been looking forward to GSoC week 1 for a while. I began with the implementation of [Multiprobe LSH](http://dl.acm.org/citation.cfm?id=1325958), an algorithm that improves on classic LSH by identifying more hash buckets in each table where a query's neighboring points might be. The algorithm makes better use of the tables created by LSH, so fewer tables need to be created, which makes the search take less time and memory.
+
+The implementation required modifying the LSH code and the corresponding mlpack test cases.
+
+The new parameter, the number of additional probing bins, is now accessible to users both as a command line argument (-T) and via a new parameter in LSHSearch.Search().
+
+
+Another mini-feature I implemented, LSHSearch.ComputeRecall(), takes two Armadillo matrices and computes the recall (the percentage of neighbors found correctly by LSH). This is also accessible from the command line program by using the -t switch to specify a "truth file" - a file of real neighbors.
+
+Using these two features, a user should be able to reduce the number of tables used by LSH and still get recall that is as good (or better!) by increasing the number of additional probing bins.
+
+I am making documentation, testing and style changes and will be opening a pull request in the next few days.
\ No newline at end of file
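
The recall measure described in the post above (the share of true neighbors that LSH also finds) can be illustrated with a small standalone sketch. This is not mlpack's LSHSearch.ComputeRecall(): the function name `compute_recall`, the list-of-lists input format, and the choice to compare neighbors as sets per query (ignoring their order) are all assumptions made for illustration.

```python
def compute_recall(found_neighbors, true_neighbors):
    """Return the fraction of true neighbors that the approximate search found.

    Hypothetical helper, not part of mlpack. Each argument is a list with one
    entry per query point; each entry is a list of neighbor indices.
    Assumption: neighbors are compared as sets, so their order is ignored.
    """
    if len(found_neighbors) != len(true_neighbors):
        raise ValueError("both inputs must cover the same number of queries")

    hits = 0   # true neighbors that the approximate search also returned
    total = 0  # all true neighbors across all queries
    for found, truth in zip(found_neighbors, true_neighbors):
        truth_set = set(truth)
        hits += len(truth_set.intersection(found))
        total += len(truth_set)

    if total == 0:
        raise ValueError("no true neighbors were given")
    return hits / total


# Two queries, three neighbors each: the approximate search misses one
# true neighbor per query, so 4 of the 6 true neighbors are recovered.
approximate = [[1, 2, 3], [4, 5, 9]]
exact = [[1, 2, 7], [4, 5, 6]]
recall = compute_recall(approximate, exact)
```

With a measure like this, one could compare a run that uses many tables against a run that uses fewer tables and more additional probing bins, and check whether the recall holds up.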



