[mlpack-git] [mlpack/mlpack] Simpler Mapping Object for DatasetMapper; (#758)

Keon Kim notifications at github.com
Sun Aug 7 23:09:21 EDT 2016


The current `maps` object in DatasetMapper can be described as a map of the form
`map<dimension, pair<bimap<string, MappedType>, numMappings>>` (with numMappings usually being a numeric primitive type).

I think this map can be simplified into two parts:
```
// MapType = maps<dimension, bimap<string, MappedType>>;
MapType maps;
size_t numMappings;
```
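
To make the two-part layout concrete, here is a minimal sketch, not the actual DatasetMapper code. `SimpleBiMap` and `SimpleMapper` are hypothetical names, and two hash maps stand in for the bimap so the snippet needs no external library. It also assumes the single `numMappings` is a total over all dimensions; a per-dimension count can still be read from the size of that dimension's bimap.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for bimap<string, MappedType>: two hash maps kept in
// sync, one per lookup direction.
template<typename MappedType>
struct SimpleBiMap
{
  std::unordered_map<std::string, MappedType> forward;
  std::unordered_map<MappedType, std::string> backward;
};

// Sketch of the proposed layout: one bimap per dimension plus a single
// counter, instead of pairing a counter with every bimap.
template<typename MappedType = std::size_t>
class SimpleMapper
{
 public:
  using MapType = std::unordered_map<std::size_t, SimpleBiMap<MappedType>>;

  // Return the numeric value for `str` in `dimension`, creating a new mapping
  // if the string has not been seen in that dimension before.
  MappedType MapString(const std::string& str, const std::size_t dimension)
  {
    SimpleBiMap<MappedType>& bimap = maps[dimension];
    auto it = bimap.forward.find(str);
    if (it != bimap.forward.end())
      return it->second;

    // New values are assigned in order of first appearance: 0, 1, 2, ...
    const MappedType value = static_cast<MappedType>(bimap.forward.size());
    bimap.forward.emplace(str, value);
    bimap.backward.emplace(value, str);
    ++numMappings;
    return value;
  }

  // Reverse lookup; throws std::out_of_range if the mapping does not exist.
  const std::string& UnmapString(const MappedType value,
                                 const std::size_t dimension) const
  {
    return maps.at(dimension).backward.at(value);
  }

  // Total number of mappings over all dimensions (assumed meaning).
  std::size_t NumMappings() const { return numMappings; }

 private:
  MapType maps;
  std::size_t numMappings = 0;
};
```

With this sketch, mapping "red" and then "blue" in dimension 0 yields 0 and 1, and mapping "red" again returns the existing value without touching the counter.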

For validation and imputation purposes, we could have another mapper (I will call it invalidMaps for now), which looks like:
```
// InvalidMapType = maps<string, std::pair<dimension, point>> 
InvalidMapType invalidMaps;
size_t numInvalidMappings;
```
invalidMaps and maps serve two different purposes.
maps is used as usual (mapping a categorical feature to a numeric feature).
invalidMaps is used as a temporary holder for future imputation. Both coordinates (dimension and point) have to be stored in order to track the invalid values, since every invalid value is turned into NaN.
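
As a rough sketch of how invalidMaps could record those coordinates (again hypothetical, not existing mlpack code): `InvalidTracker` and `MarkInvalid` are made-up names. One deliberate deviation from the type written above: a multimap is used here, on the assumption that the same invalid string (for example "?") can occur at several positions and each occurrence needs to be tracked.

```cpp
#include <cstddef>
#include <limits>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical holder for invalid values awaiting imputation.
class InvalidTracker
{
 public:
  // (dimension, point) coordinates of one invalid entry.
  using Location = std::pair<std::size_t, std::size_t>;
  // Assumption: a multimap, so repeated invalid strings keep every location.
  using InvalidMapType = std::unordered_multimap<std::string, Location>;

  // Remember where the invalid token was found and return the NaN that
  // should be stored in its place in the numeric matrix.
  double MarkInvalid(const std::string& token,
                     const std::size_t dimension,
                     const std::size_t point)
  {
    invalidMaps.emplace(token, Location(dimension, point));
    ++numInvalidMappings;
    return std::numeric_limits<double>::quiet_NaN();
  }

  const InvalidMapType& InvalidMaps() const { return invalidMaps; }
  std::size_t NumInvalidMappings() const { return numInvalidMappings; }

 private:
  InvalidMapType invalidMaps;
  std::size_t numInvalidMappings = 0;
};
```

A later imputation step could then iterate over `InvalidMaps()` and overwrite the NaN at each stored (dimension, point) location.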

Ultimately, I think this would let us use a single mapping policy instead of many.
What do you think of this idea?


