<p>The current <code>maps</code> object in DatasetMapper can be described as a<br>
<code>map<dimension, pair<bimap<string, MappedType>, numMappings>></code> (<code>numMappings</code> usually being a numeric primitive type).</p>
<p>I think this map can be split into two simpler parts:</p>
<pre><code>// MapType = map<dimension, bimap<string, MappedType>>;
MapType maps;
size_t numMappings;
</code></pre>
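<p>To make the split concrete, here is a minimal sketch of how <code>maps</code> and <code>numMappings</code> could sit side by side. It is only an illustration under assumptions: <code>SimpleBimap</code> and <code>SimpleMapper</code> are hypothetical names, the bimap is approximated with two standard hash maps, new values are numbered per dimension, and <code>numMappings</code> is treated as a total over all dimensions.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Stand-in for bimap<string, MappedType>: two hash maps kept in sync.
// (Hypothetical helper, used only to keep the sketch dependency-free.)
template<typename MappedType>
struct SimpleBimap
{
  std::unordered_map<std::string, MappedType> forward;
  std::unordered_map<MappedType, std::string> backward;
};

template<typename MappedType = std::size_t>
class SimpleMapper  // hypothetical name, not an mlpack class
{
 public:
  // MapType = map<dimension, bimap<string, MappedType>>
  using MapType = std::unordered_map<std::size_t, SimpleBimap<MappedType>>;

  // Return the numeric value for `str` in `dimension`, creating it if needed.
  MappedType MapString(const std::string& str, const std::size_t dimension)
  {
    SimpleBimap<MappedType>& bimap = maps[dimension];
    const auto it = bimap.forward.find(str);
    if (it != bimap.forward.end())
      return it->second;

    // Assumption: new values are numbered 0, 1, 2, ... within each dimension.
    const MappedType value = static_cast<MappedType>(bimap.forward.size());
    bimap.forward.emplace(str, value);
    bimap.backward.emplace(value, str);
    ++numMappings;
    return value;
  }

  // Reverse lookup; the value must have been produced by MapString().
  const std::string& UnmapString(const MappedType value,
                                 const std::size_t dimension) const
  {
    const auto dimIt = maps.find(dimension);
    assert(dimIt != maps.end());
    return dimIt->second.backward.at(value);
  }

  std::size_t NumMappings() const { return numMappings; }

 private:
  MapType maps;
  std::size_t numMappings = 0;  // total across all dimensions (assumption)
};
```

<p>With this layout the per-dimension count can be read from the size of each bimap, so the counter no longer has to live inside the map's value type.</p>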
<p>For validation and imputation purposes, we could have a second mapper (I will call it <code>invalidMaps</code> for now), which looks like:</p>
<pre><code>// InvalidMapType = map<string, std::pair<dimension, point>>
InvalidMapType invalidMaps;
size_t numInvalidMappings;
</code></pre>
<p><code>invalidMaps</code> and <code>maps</code> serve two different purposes.<br>
<code>maps</code> is used as usual (mapping categorical features to numeric values).<br>
<code>invalidMaps</code> is a temporary holder for future imputation. Both the x and y coordinates (dimension and point) have to be stored in order to track the invalid values, since every invalid value is turned into NaN.</p>
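<p>The role of <code>invalidMaps</code> could be sketched roughly as below. This is an illustration under assumptions, not a proposed implementation: <code>InvalidTracker</code>, <code>MarkInvalid</code> and <code>Impute</code> are hypothetical names, the data is a toy row-per-dimension matrix, and a <code>std::multimap</code> is used instead of a plain map so that the same invalid string seen at several positions keeps every (dimension, point) pair.</p>

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Rows are dimensions, columns are points (hypothetical toy layout).
using Matrix = std::vector<std::vector<double>>;

class InvalidTracker  // hypothetical name, not an mlpack class
{
 public:
  // InvalidMapType = map<string, pair<dimension, point>>; a multimap is used
  // here (an assumption) so a repeated token keeps every position.
  using InvalidMapType =
      std::multimap<std::string, std::pair<std::size_t, std::size_t>>;

  // Remember where `token` was seen and overwrite the cell with NaN.
  void MarkInvalid(Matrix& data, const std::string& token,
                   const std::size_t dimension, const std::size_t point)
  {
    assert(dimension < data.size() && point < data[dimension].size());
    invalidMaps.emplace(token, std::make_pair(dimension, point));
    data[dimension][point] = std::numeric_limits<double>::quiet_NaN();
    ++numInvalidMappings;
  }

  // Replace every cell recorded for `token` with `value`, then forget them.
  // Returns the number of cells that were replaced.
  std::size_t Impute(Matrix& data, const std::string& token, const double value)
  {
    const auto range = invalidMaps.equal_range(token);
    std::size_t replaced = 0;
    for (auto it = range.first; it != range.second; ++it)
    {
      data[it->second.first][it->second.second] = value;
      ++replaced;
    }
    invalidMaps.erase(range.first, range.second);
    numInvalidMappings -= replaced;
    return replaced;
  }

  std::size_t NumInvalidMappings() const { return numInvalidMappings; }

 private:
  InvalidMapType invalidMaps;
  std::size_t numInvalidMappings = 0;
};
```

<p>Because the positions are stored separately, the data matrix only ever holds numbers or NaN, and the imputation step can decide later what to write back.</p>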
<p>Ultimately, I think this would let us get by with a single mapping policy instead of many.<br>
What do you think of this idea?</p>