[mlpack-svn] [MLPACK] #365: allknn is several times slower in 1.0.9 than 1.0.8

MLPACK Trac trac at coffeetalk-1.cc.gatech.edu
Wed Aug 20 22:07:31 EDT 2014


#365: allknn is several times slower in 1.0.9 than 1.0.8
---------------------+------------------------------------------------------
 Reporter:  rcurtin  |        Owner:                                                           
     Type:  defect   |       Status:  new                                                      
 Priority:  blocker  |    Milestone:                                                           
Component:  mlpack   |     Keywords:  allknn, 1.0.8, 1.0.9, dual-tree traversal, traversal info
 Blocking:           |   Blocked By:                                                           
---------------------+------------------------------------------------------
 I have isolated the problem to the traversal info commit, r16226.
 Pre-r16226:

 {{{
 :[ ~/work/mlpack-trunk/build ]: 5
 :[ ryan @ trevelyan ]: $ bin/allknn -r ~/datasets/covertype.csv -k 3 -d d.csv -n n.csv -v
 [INFO ] Loading '/home/ryan/datasets/covertype.csv' as CSV data.  Size is 54 x 581012.
 [INFO ] Loaded reference data from '/home/ryan/datasets/covertype.csv' (54 x 581012).
 [INFO ] Building reference tree...
 [INFO ] Trees built.
 [INFO ] Computing 3 nearest neighbors...
 [INFO ] 4084108 node combinations were visited.
 [INFO ] 8206310 node combinations were scored.
 [INFO ] 33333021 base cases were calculated.
 [INFO ] Neighbors computed.
 [INFO ] Re-mapping indices...
 [INFO ] Saving CSV data to 'd.csv'.
 [INFO ] Saving CSV data to 'n.csv'.
 [INFO ]
 [INFO ] Execution parameters:
 [INFO ]   cover_tree: false
 [INFO ]   distances_file: d.csv
 [INFO ]   help: false
 [INFO ]   info: ""
 [INFO ]   k: 3
 [INFO ]   leaf_size: 20
 [INFO ]   naive: false
 [INFO ]   neighbors_file: n.csv
 [INFO ]   query_file: ""
 [INFO ]   random_basis: false
 [INFO ]   reference_file: /home/ryan/datasets/covertype.csv
 [INFO ]   seed: 0
 [INFO ]   single_mode: false
 [INFO ]   verbose: true
 [INFO ]   version: false
 [INFO ]
 [INFO ] Program timers:
 [INFO ]   computing_neighbors: 9.526934s
 [INFO ]   loading_data: 9.164315s
 [INFO ]   saving_data: 2.093499s
 [INFO ]   total_time: 30.320291s
 [INFO ]   tree_building: 9.419669s
 }}}

 r16226:

 {{{
 :[ ~/work/mlpack-trunk/build ]: 3
 :[ ryan @ trevelyan ]: $ bin/allknn -r ~/datasets/covertype.csv -k 3 -d d.csv -n n.csv -v
 [INFO ] Loading '/home/ryan/datasets/covertype.csv' as CSV data.  Size is 54 x 581012.
 [INFO ] Loaded reference data from '/home/ryan/datasets/covertype.csv' (54 x 581012).
 [INFO ] Building reference tree...
 [INFO ] Trees built.
 [INFO ] Computing 3 nearest neighbors...
 [INFO ] 111957318 node combinations were scored.
 [INFO ] 32719779 base cases were calculated.
 [INFO ] Neighbors computed.
 [INFO ] Re-mapping indices...
 [INFO ] Saving CSV data to 'd.csv'.
 [INFO ] Saving CSV data to 'n.csv'.
 [INFO ]
 [INFO ] Execution parameters:
 [INFO ]   cover_tree: false
 [INFO ]   distances_file: d.csv
 [INFO ]   help: false
 [INFO ]   info: ""
 [INFO ]   k: 3
 [INFO ]   leaf_size: 20
 [INFO ]   naive: false
 [INFO ]   neighbors_file: n.csv
 [INFO ]   query_file: ""
 [INFO ]   random_basis: false
 [INFO ]   reference_file: /home/ryan/datasets/covertype.csv
 [INFO ]   seed: 0
 [INFO ]   single_mode: false
 [INFO ]   verbose: true
 [INFO ]   version: false
 [INFO ]
 [INFO ] Program timers:
 [INFO ]   computing_neighbors: 41.315907s
 [INFO ]   loading_data: 9.201931s
 [INFO ]   saving_data: 2.106239s
 [INFO ]   total_time: 62.342061s (1 mins, 2.3secs)
 [INFO ]   tree_building: 9.596633s
 }}}
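
 To put numbers on the regression, the timer lines from the two logs above
 can be compared directly. This is only a rough sketch for reading the logs,
 not part of mlpack: the helper name parse_timers and the regular expression
 are assumptions made for illustration, and the timer values are copied
 verbatim from the output above.

```python
import re

# Timer lines copied from the two verbose logs above.
PRE_R16226 = """\
[INFO ]   computing_neighbors: 9.526934s
[INFO ]   loading_data: 9.164315s
[INFO ]   saving_data: 2.093499s
[INFO ]   total_time: 30.320291s
[INFO ]   tree_building: 9.419669s
"""
R16226 = """\
[INFO ]   computing_neighbors: 41.315907s
[INFO ]   loading_data: 9.201931s
[INFO ]   saving_data: 2.106239s
[INFO ]   total_time: 62.342061s (1 mins, 2.3secs)
[INFO ]   tree_building: 9.596633s
"""

# Matches "[INFO ]   <timer_name>: <seconds>s"; any trailing text is ignored.
TIMER_RE = re.compile(r"^\[INFO \]\s+(\w+): ([0-9.]+)s", re.MULTILINE)


def parse_timers(log):
    """Hypothetical helper: map each timer name to its value in seconds."""
    return {name: float(seconds) for name, seconds in TIMER_RE.findall(log)}


before = parse_timers(PRE_R16226)
after = parse_timers(R16226)

# Ratio of each timer after r16226 to the same timer before it.
slowdown = {name: after[name] / before[name] for name in before}
for name in sorted(slowdown):
    print(f"{name}: {before[name]:.2f}s -> {after[name]:.2f}s "
          f"({slowdown[name]:.2f}x)")

# Ratio of scored node combinations, also taken from the logs above.
scored_ratio = 111957318 / 8206310
print(f"node combinations scored: {scored_ratio:.1f}x")
```

 On these numbers computing_neighbors is roughly 4.3 times slower with
 r16226, while loading_data, saving_data, and tree_building are essentially
 unchanged, and about 13.6 times as many node combinations are scored. That
 is consistent with the slowdown being confined to the dual-tree traversal.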

 This is a serious problem given mlpack's focus on dual-tree algorithms and
 the fact that I advertise mlpack to everyone as "fast nearest neighbor
 search".  Once this ticket is fixed, the fix should be applied on top of
 1.0.9 and released as 1.0.10, along with any other easy backports.

-- 
Ticket URL: <http://trac.research.cc.gatech.edu/fastlab/ticket/365>
MLPACK <www.fast-lab.org>
MLPACK is an intuitive, fast, and scalable C++ machine learning library developed at Georgia Tech.

