[mlpack-git] master, mlpack-1.0.x: Don't select points from clusters of size 1. (5bea54e)

gitdub at big.cc.gt.atl.ga.us
Thu Mar 5 21:47:38 EST 2015


Repository : https://github.com/mlpack/mlpack

On branches: master,mlpack-1.0.x
Link       : https://github.com/mlpack/mlpack/compare/904762495c039e345beba14c1142fd719b3bd50e...f94823c800ad6f7266995c700b1b630d5ffdcf40

>---------------------------------------------------------------

commit 5bea54e8423de4ba7c61f7afd801dce8027c0358
Author: Ryan Curtin <ryan at ratml.org>
Date:   Fri May 23 16:03:41 2014 +0000

    Don't select points from clusters of size 1.


>---------------------------------------------------------------

5bea54e8423de4ba7c61f7afd801dce8027c0358
 src/mlpack/methods/kmeans/max_variance_new_cluster_impl.hpp | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/mlpack/methods/kmeans/max_variance_new_cluster_impl.hpp b/src/mlpack/methods/kmeans/max_variance_new_cluster_impl.hpp
index e8408a4..c97ef71 100644
--- a/src/mlpack/methods/kmeans/max_variance_new_cluster_impl.hpp
+++ b/src/mlpack/methods/kmeans/max_variance_new_cluster_impl.hpp
@@ -37,8 +37,12 @@ size_t MaxVarianceNewCluster::EmptyCluster(const MatType& data,
   }
 
   // Divide by the number of points in the cluster to produce the variance.
+  // Although a -nan will occur here for the empty cluster(s), this doesn't
+  // matter because variances.max() won't pick it up.  If the number of points
+  // in the cluster is 1, we ensure that cluster is not selected by forcing the
+  // variance to 0.
   for (size_t i = 0; i < clusterCounts.n_elem; ++i)
-    variances[i] /= clusterCounts[i];
+    variances[i] /= (clusterCounts[i] == 1) ? DBL_MAX : clusterCounts[i];
 
   // Now find the cluster with maximum variance.
   arma::uword maxVarCluster;
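
For readers who want to see the effect of the changed divisor in isolation, the following is a minimal standalone sketch, not the mlpack implementation. It assumes plain `std::vector` inputs instead of Armadillo types, and the helper name `MaxVarianceCluster` and its parameters are hypothetical. It also assumes an empty cluster carries a summed squared distance of exactly 0, so that the division yields NaN as the patch comment describes.

```cpp
#include <cfloat>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of mlpack): returns the index of the cluster
// with the largest per-point variance.  sums[i] is the summed squared
// distance of cluster i's points to its centroid; counts[i] is the number of
// points in cluster i.
std::size_t MaxVarianceCluster(const std::vector<double>& sums,
                               const std::vector<std::size_t>& counts)
{
  std::size_t best = 0;
  double bestVariance = -1.0;
  for (std::size_t i = 0; i < sums.size(); ++i)
  {
    // Same idea as the patch: a cluster of size 1 is divided by DBL_MAX, so
    // its variance becomes vanishingly small and it is not selected.
    const double divisor = (counts[i] == 1) ?
        DBL_MAX : static_cast<double>(counts[i]);

    // For an empty cluster (sum 0, count 0) this is 0 / 0, which is NaN.
    const double variance = sums[i] / divisor;

    // Any comparison with NaN is false, so empty clusters are never chosen.
    if (variance > bestVariance)
    {
      best = i;
      bestVariance = variance;
    }
  }
  return best;
}
```

With sums `{0, 5, 12, 100}` and counts `{0, 1, 4, 1}`, the sketch picks cluster 2 (variance 3). Without the `DBL_MAX` divisor, the singleton cluster 3 would have reported a variance of 100 and been chosen instead.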


