[mlpack] Question about KMeans benchmark

Thu Jul 17 11:27:14 EDT 2014

On Thu, Jul 17, 2014 at 11:14 AM, Liu Liu <lliu at stern.nyu.edu> wrote:

> Hi Ryan,
>
> I am interested in using KMeans in MLPACK for my research. I have
> several questions about the KMeans benchmark on your website.
>
> 1) What are the datasets? How large are they (# of items, # of features)?
>

Ah, I should have read the benchmark results more carefully. I have found
the dataset names now.

> 2) Is the result based on a single run or multiple runs? Matlab has a
> parameter to run KMeans multiple times and choose the best run as the
> final result.
> 3) Do you use the Bradley-Fayyad "refined start" when testing KMeans for
> the benchmark?
> 4) How do you select the other parameters for each dataset? The results
> only show the # of clusters.
>
> Regarding how to select a good starting point, you mentioned on the website
> that there are multiple strategies for choosing initial points effectively
> and MLPACK implements some of these, notably the Bradley-Fayyad algorithm.
> Have you tried other initialization methods, e.g., KMeans++
> <http://en.wikipedia.org/wiki/K-means%2B%2B> or XMeans
> <http://www.cs.cmu.edu/~dpelleg/download/xmeans.pdf>, or compared their
> performance?
>
> Thank you!
>
> btw, I really like the project, the coding style and the nice documentation.
> Thank you for making it available to us!!
>
> Best,
> Liu
>
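
To make questions 2 and the initialization question concrete, here is a
minimal sketch of the two ideas mentioned above: k-means++ seeding, and
running KMeans several times and keeping the lowest-cost run. It is pure
Python with plain Lloyd iterations, and it is not MLPACK's or Matlab's
implementation. The names sq_dist, kmeanspp_seeds, lloyd and best_of_n are
hypothetical and chosen only for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seeds(points, k, rng):
    """k-means++ seeding: each new center is sampled with probability
    proportional to its squared distance to the nearest chosen center."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        total = sum(d2)
        if total == 0:
            # Every point coincides with a center; fall back to uniform.
            centers.append(rng.choice(points))
            continue
        r = rng.uniform(0, total)
        acc = 0.0
        chosen = None
        for p, w in zip(points, d2):
            acc += w
            if w > 0:
                chosen = p  # remember the last point with nonzero weight
                if acc >= r:
                    break
        centers.append(chosen)
    return centers


def lloyd(points, centers, max_iter=100):
    """Plain Lloyd iterations; returns (centers, within-cluster cost)."""
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    cost = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, cost


def best_of_n(points, k, restarts=10, seed=0):
    """Run seeded KMeans `restarts` times and keep the lowest-cost run."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        centers, cost = lloyd(points, kmeanspp_seeds(points, k, rng))
        if best is None or cost < best[1]:
            best = (centers, cost)
    return best
```

Calling best_of_n(points, k) on a list of equal-length tuples returns the
centers and cost of the best restart. Whether a benchmark reports a single
run or the best of several changes both the timing and the final cost,
which is why the distinction matters for the comparison.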