[mlpack-svn] [MLPACK] #365: allknn is several times slower in 1.0.9 than 1.0.8
MLPACK Trac
trac at coffeetalk-1.cc.gatech.edu
Wed Aug 20 22:07:31 EDT 2014
#365: allknn is several times slower in 1.0.9 than 1.0.8
---------------------+------------------------------------------------------
Reporter: rcurtin | Owner:
Type: defect | Status: new
Priority: blocker | Milestone:
Component: mlpack | Keywords: allknn, 1.0.8, 1.0.9, dual-tree traversal, traversal info
Blocking: | Blocked By:
---------------------+------------------------------------------------------
I have isolated the problem to the traversal info commit, r16226.
Pre-r16226:
{{{
:[ ~/work/mlpack-trunk/build ]: 5
:[ ryan @ trevelyan ]: $ bin/allknn -r ~/datasets/covertype.csv -k 3 -d d.csv -n n.csv -v
[INFO ] Loading '/home/ryan/datasets/covertype.csv' as CSV data. Size is 54 x 581012.
[INFO ] Loaded reference data from '/home/ryan/datasets/covertype.csv' (54 x 581012).
[INFO ] Building reference tree...
[INFO ] Trees built.
[INFO ] Computing 3 nearest neighbors...
[INFO ] 4084108 node combinations were visited.
[INFO ] 8206310 node combinations were scored.
[INFO ] 33333021 base cases were calculated.
[INFO ] Neighbors computed.
[INFO ] Re-mapping indices...
[INFO ] Saving CSV data to 'd.csv'.
[INFO ] Saving CSV data to 'n.csv'.
[INFO ]
[INFO ] Execution parameters:
[INFO ] cover_tree: false
[INFO ] distances_file: d.csv
[INFO ] help: false
[INFO ] info: ""
[INFO ] k: 3
[INFO ] leaf_size: 20
[INFO ] naive: false
[INFO ] neighbors_file: n.csv
[INFO ] query_file: ""
[INFO ] random_basis: false
[INFO ] reference_file: /home/ryan/datasets/covertype.csv
[INFO ] seed: 0
[INFO ] single_mode: false
[INFO ] verbose: true
[INFO ] version: false
[INFO ]
[INFO ] Program timers:
[INFO ] computing_neighbors: 9.526934s
[INFO ] loading_data: 9.164315s
[INFO ] saving_data: 2.093499s
[INFO ] total_time: 30.320291s
[INFO ] tree_building: 9.419669s
}}}
r16226:
{{{
:[ ~/work/mlpack-trunk/build ]: 3
:[ ryan @ trevelyan ]: $ bin/allknn -r ~/datasets/covertype.csv -k 3 -d d.csv -n n.csv -v
[INFO ] Loading '/home/ryan/datasets/covertype.csv' as CSV data. Size is 54 x 581012.
[INFO ] Loaded reference data from '/home/ryan/datasets/covertype.csv' (54 x 581012).
[INFO ] Building reference tree...
[INFO ] Trees built.
[INFO ] Computing 3 nearest neighbors...
[INFO ] 111957318 node combinations were scored.
[INFO ] 32719779 base cases were calculated.
[INFO ] Neighbors computed.
[INFO ] Re-mapping indices...
[INFO ] Saving CSV data to 'd.csv'.
[INFO ] Saving CSV data to 'n.csv'.
[INFO ]
[INFO ] Execution parameters:
[INFO ] cover_tree: false
[INFO ] distances_file: d.csv
[INFO ] help: false
[INFO ] info: ""
[INFO ] k: 3
[INFO ] leaf_size: 20
[INFO ] naive: false
[INFO ] neighbors_file: n.csv
[INFO ] query_file: ""
[INFO ] random_basis: false
[INFO ] reference_file: /home/ryan/datasets/covertype.csv
[INFO ] seed: 0
[INFO ] single_mode: false
[INFO ] verbose: true
[INFO ] version: false
[INFO ]
[INFO ] Program timers:
[INFO ] computing_neighbors: 41.315907s
[INFO ] loading_data: 9.201931s
[INFO ] saving_data: 2.106239s
[INFO ] total_time: 62.342061s (1 mins, 2.3secs)
[INFO ] tree_building: 9.596633s
}}}
This is a serious issue, given mlpack's focus on dual-tree algorithms and
the fact that I advertise mlpack to everyone as "fast nearest neighbor
search". Once this ticket is fixed, the fix should be applied on top of
1.0.9 and released as 1.0.10, along with any other easy backports.
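
For a rough sense of scale, the figures in the two logs above can be compared directly. The following is a small Python sketch, not part of mlpack; the dictionary keys and the `ratios` helper are names made up for illustration, and the numbers are copied from the logs as printed.

```python
# Figures copied from the two allknn runs above (covertype.csv, k = 3).
# Key names are illustrative; they are not mlpack identifiers.
before = {  # pre-r16226
    "computing_neighbors_s": 9.526934,
    "total_time_s": 30.320291,
    "scored": 8206310,
    "base_cases": 33333021,
}
after = {  # r16226
    "computing_neighbors_s": 41.315907,
    "total_time_s": 62.342061,
    "scored": 111957318,
    "base_cases": 32719779,
}


def ratios(old, new):
    """Return new/old for every quantity present in the old run."""
    return {key: new[key] / old[key] for key in old}


slowdown = ratios(before, after)
for key, value in slowdown.items():
    print(f"{key}: {value:.2f}x")
```

By these numbers the neighbor search itself takes a bit over four times as long and scores more than thirteen times as many node combinations, while the number of base cases is nearly unchanged. That would be consistent with the extra time going into scoring node combinations rather than into base cases, though the logs alone do not establish the cause.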
--
Ticket URL: <http://trac.research.cc.gatech.edu/fastlab/ticket/365>
MLPACK <www.fast-lab.org>
MLPACK is an intuitive, fast, and scalable C++ machine learning library developed at Georgia Tech.