[mlpack-git] (blog) master: Lozhnikov, week 8. (471f931)

Mon Jul 18 16:49:06 EDT 2016

Repository : https://github.com/mlpack/blog
On branch  : master
Link       : https://github.com/mlpack/blog/compare/4678f6acd72427988c28b3275f8a67ef28c464e8...471f931e36ef68961685ce1fa8a3f7591da15bc6

>---------------------------------------------------------------

commit 471f931e36ef68961685ce1fa8a3f7591da15bc6
Author: Mikhail Lozhnikov <lozhnikovma at gmail.com>
Date:   Mon Jul 18 23:49:06 2016 +0300

    Lozhnikov, week 8.


>---------------------------------------------------------------

471f931e36ef68961685ce1fa8a3f7591da15bc6
 content/blog/LozhnikovWeekEight.md | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/content/blog/LozhnikovWeekEight.md b/content/blog/LozhnikovWeekEight.md
new file mode 100644
index 0000000..1d2ab4d
--- /dev/null
+++ b/content/blog/LozhnikovWeekEight.md
@@ -0,0 +1,12 @@
+Title: Implementation of tree types : Week 8
+Date: 2016-07-18 23:00:00
+Tags: gsoc, space trees, Vantage point tree, random projection tree
+Author: Mikhail Lozhnikov
+
+Last week I continued working on the vantage point tree. Since the first point of each left subtree of the vantage point tree is the centroid of its bound and the right subtree does not contain the centroid at all, I added a new method (`IsFirstPointCentroid()`) to the TreeType API. This check should allow dual and single tree algorithms to use some optimizations.
+
+Moreover, I implemented two types of the random projection tree: `RPTreeMax` and `RPTreeMean` (these tree types are based on `BinarySpaceTree`). Right now, these tree types use `HRectBound`, I implemented only the split algorithm.
+
+The `RPTreeMax` split divides a node by a random hyperplane. The position of the hyperplane may differ from the median by a random deviation. The `RPTreeMean` split divides a node by a random hyperplane if the diameter is less or equal to the average distance between points multiplied by a constant.
+
+My implementation is sligthly different from the original algorithms. I use a fixed number of random dataset points in order to calculate the median and the average distance between points. And since the algorithm that the paper introduces in order to find the deviation from the median is weird (one of two resulting nodes may be empty), I shrank the bounds of this deviation.