[mlpack-git] [mlpack/mlpack] LSHSearch Parallelization (#700)

Mon Jun 27 06:17:01 EDT 2016

I have removed the OpenMP header dependency completely. Here's how this works at the moment:

- Run CMake with `-DHAS_OMP=TRUE` using an OpenMP-enabled compiler, then compile `mlpack_lsh`.
- Running `mlpack_lsh` on large query datasets (100+ queries) will now be significantly faster. Loading the data and building the hash tables should take the same time as before; only the neighbor computations are parallelized.
- Setting the environment variable `OMP_NESTED` to `TRUE` and running on single-query datasets should also give a speedup, though a smaller one than on multi-query datasets (more on that below).
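The query-level parallelization described above can be sketched roughly as follows. This is not the actual `LSHSearch` code: `NearestCandidate` and `SearchAll` are hypothetical names, the per-query search is a brute-force stand-in for the hash-table lookup, and plain `std::vector` replaces Armadillo so the sketch is self-contained. The point is that each query writes only its own output slot, so a bare pragma is enough and no `<omp.h>` include is required.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in for the per-query neighbor search: brute-force
// 1-nearest-neighbor by squared Euclidean distance. In mlpack_lsh the
// candidates would come from the hash tables instead.
std::size_t NearestCandidate(const std::vector<double>& query,
                             const std::vector<std::vector<double>>& refs)
{
  assert(!refs.empty());
  std::size_t best = 0;
  double bestDist = std::numeric_limits<double>::max();
  for (std::size_t r = 0; r < refs.size(); ++r)
  {
    double d = 0.0;
    for (std::size_t k = 0; k < query.size(); ++k)
    {
      const double diff = query[k] - refs[r][k];
      d += diff * diff;
    }
    if (d < bestDist) { bestDist = d; best = r; }
  }
  return best;
}

// Queries are independent and each iteration writes only result[i], so the
// outer loop is parallelized with a bare pragma. A compiler without OpenMP
// (or built without -fopenmp) ignores the pragma and runs serially with
// identical results.
std::vector<std::size_t> SearchAll(
    const std::vector<std::vector<double>>& queries,
    const std::vector<std::vector<double>>& refs)
{
  std::vector<std::size_t> result(queries.size());
  // Signed loop index for compatibility with OpenMP 2.0 compilers.
  const long n = static_cast<long>(queries.size());
  #pragma omp parallel for
  for (long i = 0; i < n; ++i)
  {
    const std::size_t q = static_cast<std::size_t>(i);
    result[q] = NearestCandidate(queries[q], refs);
  }
  return result;
}
```

Because the output vector is preallocated and indexed by query, no locking or reduction is needed, which is why this level of parallelism gives the largest speedup on many-query inputs.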

In theory, this should be completely transparent on systems without OpenMP. I've had some trouble getting the Travis build to pass, and I'm working on fixing it.

**On the per-query parallelization:**
Parallel for loops won't really help here, because these loops are small and already fast. The real bottlenecks are `arma::find` and `arma::unique`. We could write our own parallel versions of these functions, which should give a significant speedup, but it would bloat the code somewhat (especially `unique`, which requires sorting). I've added placeholder code to show how that would work: right now it just calls `arma::unique`, but it could call our own function instead.
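A minimal sketch of what hand-written replacements might look like, assuming plain `std::vector<std::size_t>` inputs rather than Armadillo types. `ChunkedFind` and `UniqueSorted` are hypothetical names, not mlpack functions. `ChunkedFind` parallelizes an equality `find` by letting each fixed-size chunk collect its matches independently and then concatenating the chunks in order, so the result matches a serial scan. `UniqueSorted` is only a placeholder with the same contract as `arma::unique` on a vector (sorted ascending, duplicates removed); it still uses a serial sort, which is exactly the part that would bloat the code if parallelized.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical parallel equivalent of finding all indices i with x[i] == value.
// Each chunk fills its own buffer, so threads never share a container; the
// buffers are concatenated in chunk order, giving ascending indices.
std::vector<std::size_t> ChunkedFind(const std::vector<std::size_t>& x,
                                     std::size_t value,
                                     std::size_t numChunks)
{
  assert(numChunks > 0);
  const std::size_t chunkSize = (x.size() + numChunks - 1) / numChunks;
  std::vector<std::vector<std::size_t>> partial(numChunks);

  const long chunks = static_cast<long>(numChunks);
  #pragma omp parallel for
  for (long c = 0; c < chunks; ++c)
  {
    const std::size_t chunk = static_cast<std::size_t>(c);
    const std::size_t begin = chunk * chunkSize;
    const std::size_t end = std::min(begin + chunkSize, x.size());
    for (std::size_t i = begin; i < end; ++i)
      if (x[i] == value)
        partial[chunk].push_back(i);
  }

  std::vector<std::size_t> out;
  for (const std::vector<std::size_t>& p : partial)
    out.insert(out.end(), p.begin(), p.end());
  return out;
}

// Placeholder for a future parallel unique: returns the sorted, de-duplicated
// elements using a serial sort. A parallel sort could be swapped in here
// without changing any caller.
std::vector<std::size_t> UniqueSorted(std::vector<std::size_t> v)
{
  std::sort(v.begin(), v.end());
  v.erase(std::unique(v.begin(), v.end()), v.end());
  return v;
}
```

Keeping the placeholder behind its own function is what lets the serial version ship now and a parallel one replace it later without touching the search loop.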

I think the code needed for thread accounting would be too confusing and might not give the desired results in the end; as we said, parallelism should be as transparent as possible. Instead, per-query parallelization only kicks in if the user enables `OMP_NESTED`. The reason is that per-query parallelization happens inside an already-parallel region, and by default OpenMP doesn't let threads spawn their own threads. That makes sense: accidentally spawning 1000 threads would not be good.
If users enable nested parallelism, I'd say they probably know what they are doing, and we should let them.
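The two-level layout described above might look roughly like this. It is a sketch under simplifying assumptions, not the PR's code: `NearestPerQuery` is a hypothetical name, points are 1-D, and each query comes with its own candidate list. The inner pragma opens a nested parallel region; with the usual default settings that region is run by a team of one thread, and it only gets extra threads when nested parallelism is enabled (for example with `OMP_NESTED=TRUE`).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical outer loop over queries plus an inner loop over one query's
// candidates. Without nested parallelism enabled, the inner region is
// serialized, so the default never oversubscribes the machine.
std::vector<std::size_t> NearestPerQuery(
    const std::vector<double>& queries,                  // 1-D query points
    const std::vector<std::vector<double>>& candidates)  // candidates[i] for query i
{
  assert(queries.size() == candidates.size());
  std::vector<std::size_t> best(queries.size(), 0);
  const long nq = static_cast<long>(queries.size());

  #pragma omp parallel for
  for (long i = 0; i < nq; ++i)
  {
    const std::size_t q = static_cast<std::size_t>(i);
    const std::vector<double>& cand = candidates[q];
    std::vector<double> dist(cand.size());
    const long nc = static_cast<long>(cand.size());

    // Nested region: each iteration writes only its own dist slot.
    #pragma omp parallel for
    for (long j = 0; j < nc; ++j)
    {
      const std::size_t c = static_cast<std::size_t>(j);
      const double d = cand[c] - queries[q];
      dist[c] = d * d;
    }

    // Serial argmin over this query's candidates.
    for (std::size_t c = 1; c < dist.size(); ++c)
      if (dist[c] < dist[best[q]])
        best[q] = c;
  }
  return best;
}
```

With nesting disabled, only the outer loop runs in parallel, which matches the multi-query behavior; a single-query run only benefits from the inner pragma once the user opts in.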

---
https://github.com/mlpack/mlpack/pull/700#issuecomment-228707130