<p>I looked through this on the plane today and I think the biggest issue is providing a way for users to pass <code>oldFromNew</code> to the <code>NeighborSearch</code> class.  I disagree with this... a long explanation follows...</p>


<p>I believe that there are two ways one can use the <code>NeighborSearch</code> class: either you can pass in a dataset (possibly by reference where it may be copied, possibly by move constructor where it won't), and then the <code>NeighborSearch</code> class is responsible for the tree and everything associated with it.  Otherwise, you pass in a pre-constructed tree and the <code>NeighborSearch</code> class is not responsible for it.  One of the key aspects of "responsibility" here is that if the <code>NeighborSearch</code> class is responsible, then it must handle the <code>oldFromNew</code> mappings that <em>might</em> exist if <code>RearrangesDataset == true</code>.  But <code>RearrangesDataset</code> is not always true, so there are only some cases where we need it.</p>


<p>Thus, if we allow the user to make <code>NeighborSearch</code> take control of the tree, this necessitates that <em>sometimes</em>, they will also have to pass <code>oldFromNew</code> in <em>with</em> the tree; but not always.  We have no way to make a clean API that only has the right overload available depending on the <code>TreeType</code>.  (Yes, we can do SFINAE so only one is accepted, but an API with lots of <code>enable_if</code>s that is presented to the end user can be very confusing.  Armadillo handles this by basically hiding all of its internal details, but I don't like that strategy.)</p>


<p>So there is another option: we can hold the mappings inside of the <code>TreeType</code> and do away with <code>RearrangesDataset</code> entirely.  But this is problematic also: it means that every time we call <code>Point()</code>, we must apply the mapping... this will cause unacceptable slowdowns for anything that uses <code>BinarySpaceTree</code>.  So that's not really an option we can pursue.</p>


<p>There is a final option: do some crazy template overloading in order to hold <code>oldFromNew</code> inside of <code>BinarySpaceTree</code> but avoid de-mapping every time <code>Point()</code> is called.  I don't think that's a great idea though (and I'm not even sure it would work).  In addition it would increase the size of the <code>BinarySpaceTree</code> class which is something I'd like to try to avoid if possible.</p>


<p>The policy I have dealt with so far has worked like this: if the user passes in a dataset, then <code>NeighborSearch</code> manages the tree and the mappings, and the results you get are what you expect.  But if the user passes in the tree, then it is up to them to reverse the mappings.  This is potentially useful because it offers some speedup in the situation where the user has their own tree, but, say, aren't concerned with the ordering of their points and when their points are remapped, they don't care about the new ordering.</p>


<p>In my view, we can't know whether or not the user truly wants the points to be unmapped when they pass a tree, and I want to avoid the extra complexity of an API with a bunch of <code>Train()</code> overloads where some take a mappings vector and some don't.  So in my opinion, we can solve this by not allowing an overload like <code>Train(TreeType&amp;&amp;)</code> and instead only allowing <code>Train(TreeType&amp;)</code> (implying that the <code>NeighborSearch</code> object does not own the mappings or tree and is not responsible for unmapping the results when <code>Search()</code> is called).  If we only allow <code>Train(TreeType&amp;)</code>, we don't own the tree, so even if the tree does map points, we do not need to do the unmapping ourselves.</p>


<p>I hope my very long comment makes some amount of sense.  I am open to the concept of an eventual redesign, but I believe we must strongly pursue keeping the API simple enough for the average user but flexible enough for an advanced user to be able to get what they need done.  In this case having fewer overloads of <code>Train()</code> keeps it simple, but if an advanced user wants to do something tricky with a custom-built tree that maps points, then they can also figure out how to map the points back to their original indices.</p>


<p style="font-size:small;-webkit-text-size-adjust:none;color:#666;">&mdash;<br />You are receiving this because you are subscribed to this thread.<br />Reply to this email directly, <a href="https://github.com/mlpack/mlpack/pull/765#issuecomment-241224203">view it on GitHub</a>, or <a href="https://github.com/notifications/unsubscribe-auth/AJ4bFMe0aeb_Cf5ZD3JAviNcP0VLeQwQks5qh26xgaJpZM4JoBeZ">mute the thread</a>.<img alt="" height="1" src="https://github.com/notifications/beacon/AJ4bFPGxd3ukLwKDU8DbPIis672bALldks5qh26xgaJpZM4JoBeZ.gif" width="1" /></p>

<div itemscope itemtype="http://schema.org/EmailMessage">

<div itemprop="action" itemscope itemtype="http://schema.org/ViewAction">

  <link itemprop="url" href="https://github.com/mlpack/mlpack/pull/765#issuecomment-241224203"></link>

  <meta itemprop="name" content="View Pull Request"></meta>

</div>

<meta itemprop="description" content="View this Pull Request on GitHub"></meta>

</div>


<script type="application/json" data-scope="inboxmarkup">{"api_version":"1.0","publisher":{"api_key":"05dde50f1d1a384dd78767c55493e4bb","name":"GitHub"},"entity":{"external_key":"github/mlpack/mlpack","title":"mlpack/mlpack","subtitle":"GitHub repository","main_image_url":"https://cloud.githubusercontent.com/assets/143418/17495839/a5054eac-5d88-11e6-95fc-7290892c7bb5.png","avatar_image_url":"https://cloud.githubusercontent.com/assets/143418/15842166/7c72db34-2c0b-11e6-9aed-b52498112777.png","action":{"name":"Open in GitHub","url":"https://github.com/mlpack/mlpack"}},"updates":{"snippets":[{"icon":"PERSON","message":"@rcurtin in #765: I looked through this on the plane today and I think the biggest issue is providing a way for users to pass `oldFromNew` to the `NeighborSearch` class.  I disagree with this... a long explanation follows...\r\n\r\nI believe that there are two ways one can use the `NeighborSearch` class: either you can pass in a dataset (possibly by reference where it may be copied, possibly by move constructor where it won't), and then the `NeighborSearch` class is responsible for the tree and everything associated with it.  Otherwise, you pass in a pre-constructed tree and the `NeighborSearch` class is not responsible for it.  One of the key aspects of \"responsibility\" here is that if the `NeighborSearch` class is responsible, then it must handle the `oldFromNew` mappings that _might_ exist if `RearrangesDataset == true`.  But `RearrangesDataset` is not always true, so there are only some cases where we need it.\r\n\r\nThus, if we allow the user to make `NeighborSearch` take control of the tree, this necessitates that *sometimes*, they will also have to pass `oldFromNew` in _with_ the tree; but not always.  We have no way to make a clean API that only has the right overload available depending on the `TreeType`.  (Yes, we can do SFINAE so only one is accepted, but an API with lots of `enable_if`s that is presented to the end user can be very confusing.  Armadillo handles this by basically hiding all of its internal details, but I don't like that strategy.)\r\n\r\nSo there is another option: we can hold the mappings inside of the `TreeType` and do away with `RearrangesDataset` entirely.  But this is problematic also: it means that every time we call `Point()`, we must apply the mapping... this will cause unacceptable slowdowns for anything that uses `BinarySpaceTree`.  So that's not really an option we can pursue.\r\n\r\nThere is a final option: do some crazy template overloading in order to hold `oldFromNew` inside of `BinarySpaceTree` but avoid de-mapping every time `Point()` is called.  I don't think that's a great idea though (and I'm not even sure it would work).  In addition it would increase the size of the `BinarySpaceTree` class which is something I'd like to try to avoid if possible.\r\n\r\nThe policy I have dealt with so far has worked like this: if the user passes in a dataset, then `NeighborSearch` manages the tree and the mappings, and the results you get are what you expect.  But if the user passes in the tree, then it is up to them to reverse the mappings.  This is potentially useful because it offers some speedup in the situation where the user has their own tree, but, say, aren't concerned with the ordering of their points and when their points are remapped, they don't care about the new ordering.\r\n\r\nIn my view, we can't know whether or not the user truly wants the points to be unmapped when they pass a tree, and I want to avoid the extra complexity of an API with a bunch of `Train()` overloads where some take a mappings vector and some don't.  So in my opinion, we can solve this by not allowing an overload like `Train(TreeType\u0026\u0026)` and instead only allowing `Train(TreeType\u0026)` (implying that the `NeighborSearch` object does not own the mappings or tree and is not responsible for unmapping the results when `Search()` is called).  If we only allow `Train(TreeType\u0026)`, we don't own the tree, so even if the tree does map points, we do not need to do the unmapping ourselves.\r\n\r\nI hope my very long comment makes some amount of sense.  I am open to the concept of an eventual redesign, but I believe we must strongly pursue keeping the API simple enough for the average user but flexible enough for an advanced user to be able to get what they need done.  In this case having fewer overloads of `Train()` keeps it simple, but if an advanced user wants to do something tricky with a custom-built tree that maps points, then they can also figure out how to map the points back to their original indices."}],"action":{"name":"View Pull Request","url":"https://github.com/mlpack/mlpack/pull/765#issuecomment-241224203"}}}</script>