[mlpack-git] [mlpack/mlpack] LSHSearch Parallelization (#700)

Yannis Mentekidis notifications at github.com
Fri Jul 8 05:05:36 EDT 2016


> @@ -811,9 +841,18 @@ void LSHSearch<SortPolicy>::Search(const arma::mat& querySet,
>  
>    Timer::Start("computing_neighbors");
>  
> -  // Go through every query point sequentially.
> -  for (size_t i = 0; i < querySet.n_cols; i++)
> +  // Parallelization to process more than one query at a time.
> +  // use as many threads possible but not more than allowed number
> +  size_t numThreadsUsed = maxThreads;
> +  #pragma omp parallel for \
> +    num_threads ( numThreadsUsed )\
> +    shared(avgIndicesReturned, resultingNeighbors, distances) \
> +    schedule(dynamic)

> Is the dynamic schedule the right one to use here? My understanding was that the dynamic schedule had more overhead. In this case it seems like the default static schedule would be just fine.
The problem with static scheduling is that it leaves no room for load balancing. Queries get candidate sets of unequal size, so with static scheduling some threads will finish their chunks quickly and then sit idle. With dynamic scheduling, the OpenMP runtime (not the compiler) hands the next iteration to whichever thread becomes free, so threads that finish early pick up more work.
Because of this, I went with dynamic without trying static first. I can try static and see if there's a difference.
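
To make the comparison concrete, here is a minimal sketch of the two schedules on a loop with uneven per-iteration cost. It is not the LSHSearch code: `ProcessQuery`, `SearchDynamic`, and `SearchStatic` are hypothetical names, and the candidate counts are stand-ins for the unequal candidate sets mentioned above. Both versions should produce identical results; only the way iterations are assigned to threads differs.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-query work: the cost grows with the
// number of candidates the query has to examine.
double ProcessQuery(const std::size_t numCandidates)
{
  double acc = 0.0;
  for (std::size_t c = 0; c < numCandidates; ++c)
    acc += static_cast<double>(c % 7);
  return acc;
}

// schedule(dynamic): a thread asks the OpenMP runtime for the next iteration
// as soon as it finishes its current one, so cheap queries do not leave a
// thread idle while another thread works through expensive ones.
std::vector<double> SearchDynamic(const std::vector<std::size_t>& candidates)
{
  std::vector<double> results(candidates.size());
  // A signed loop index keeps the loop valid for older OpenMP versions.
  const long long n = static_cast<long long>(candidates.size());
  #pragma omp parallel for schedule(dynamic)
  for (long long i = 0; i < n; ++i)
    results[i] = ProcessQuery(candidates[i]);
  return results;
}

// schedule(static): iterations are split into fixed chunks up front, which
// has less scheduling overhead but cannot rebalance uneven work.
std::vector<double> SearchStatic(const std::vector<std::size_t>& candidates)
{
  std::vector<double> results(candidates.size());
  const long long n = static_cast<long long>(candidates.size());
  #pragma omp parallel for schedule(static)
  for (long long i = 0; i < n; ++i)
    results[i] = ProcessQuery(candidates[i]);
  return results;
}
```

Timing both functions on a vector whose entries vary widely (for example, a few very large counts among many small ones) would show whether the extra overhead of the dynamic schedule is paid back by better balance. Without OpenMP enabled at compile time the pragmas are ignored and both functions run serially.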

> I think the num_threads() call here will effectively set the number of threads used to the value of the environment variable OMP_NUM_THREADS, but that would be the same as the default anyway if you hadn't set num_threads(). So it makes me think that the maxThreads member is unnecessary (and the other supporting functionality).

Yes, I think I can simplify the code further now that we're not doing nested parallelism.
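
As a rough illustration of the point about `num_threads()`, the sketch below compares the team size of a parallel region with and without the clause. `DefaultTeamSize` and `TeamSizeWithClause` are hypothetical helper names, not part of mlpack. Without the clause, the runtime picks its default team size (typically taken from `OMP_NUM_THREADS` when that variable is set); with the clause, the argument is a request that the runtime may satisfy with fewer threads.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Team size of a parallel region with no num_threads() clause.
// Returns 1 when the code is compiled without OpenMP support.
int DefaultTeamSize()
{
  int size = 1;
  #pragma omp parallel
  {
    #ifdef _OPENMP
    // Only one thread writes the shared variable.
    #pragma omp single
    size = omp_get_num_threads();
    #endif
  }
  return size;
}

// Team size of a parallel region that requests `requested` threads.
// `requested` is assumed to be positive.
int TeamSizeWithClause(const int requested)
{
  int size = 1;
  #pragma omp parallel num_threads(requested)
  {
    #ifdef _OPENMP
    #pragma omp single
    size = omp_get_num_threads();
    #endif
  }
  return size;
}
```

If a member like `maxThreads` always holds the runtime's default, then `TeamSizeWithClause(maxThreads)` and `DefaultTeamSize()` would be expected to agree, which is the reason the clause and its supporting code could be dropped.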

---
Reply to this email directly or view it on GitHub:
https://github.com/mlpack/mlpack/pull/700/files/c4c8ff950be8a06e06084764f188095c650b7a60#r70045473