[mlpack-git] [mlpack/mlpack] Simpler Mapping Object for DatasetMapper; (#758)

Keon Kim notifications at github.com
Sun Aug 7 23:09:21 EDT 2016

The current `maps` object in DatasetMapper can be described as
`map<dimension, pair<bimap<string, MappedType>, numMappings>>` (numMappings usually being a numeric primitive type).

I think this map can be split into two simpler parts:
// MapType = maps<dimension, bimap<string, MappedType>>;
MapType maps;
size_t numMappings;

For validation and imputation purposes, we could have another mapper (I will call it invalidMaps for now), which looks like:
// InvalidMapType = maps<string, std::pair<dimension, point>> 
InvalidMapType invalidMaps;
size_t numInvalidMappings;
invalidMaps and maps serve two different purposes.
maps is used as usual (mapping a categorical feature to a numeric feature).
invalidMaps is used as a temporary holder for future imputation. Both coordinates (dimension and point) have to be stored in order to track the invalid values, since every invalid value is turned into NaN.
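
A minimal sketch of the two-part layout described above, not mlpack's actual DatasetMapper. Several things here are assumptions made only for illustration: the class name `SimpleMapperSketch`, the method names, `double` as the MappedType, a pair of `std::unordered_map`s standing in for `boost::bimap`, and an `std::unordered_multimap` for invalidMaps so that the same invalid token can be recorded at more than one (dimension, point) position.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for boost::bimap<std::string, MappedType>:
// two hash maps kept in sync, so the sketch has no Boost dependency.
struct BimapSketch
{
  std::unordered_map<std::string, double> forward;
  std::unordered_map<double, std::string> reverse;
};

// Hypothetical illustration of the proposed members; names are assumptions.
class SimpleMapperSketch
{
 public:
  // Map a categorical string in the given dimension to a numeric value,
  // assigning the next unused value the first time the string is seen.
  double MapString(const std::string& s, const std::size_t dimension)
  {
    BimapSketch& b = maps[dimension];
    const auto it = b.forward.find(s);
    if (it != b.forward.end())
      return it->second;

    const double value = static_cast<double>(b.forward.size());
    b.forward.emplace(s, value);
    b.reverse.emplace(value, s);
    // Both directions must always hold the same number of entries.
    assert(b.forward.size() == b.reverse.size());
    ++numMappings;
    return value;
  }

  // Remember where an invalid token occurred so it can be imputed later,
  // and return the NaN that would be stored in the data matrix.
  double MarkInvalid(const std::string& token,
                     const std::size_t dimension,
                     const std::size_t point)
  {
    invalidMaps.emplace(token, std::make_pair(dimension, point));
    ++numInvalidMappings;
    return std::numeric_limits<double>::quiet_NaN();
  }

  std::size_t NumMappings() const { return numMappings; }
  std::size_t NumInvalidMappings() const { return numInvalidMappings; }

 private:
  // MapType = map<dimension, bimap<string, MappedType>>
  std::unordered_map<std::size_t, BimapSketch> maps;
  std::size_t numMappings = 0;

  // InvalidMapType = map<string, pair<dimension, point>>
  std::unordered_multimap<std::string,
                          std::pair<std::size_t, std::size_t>> invalidMaps;
  std::size_t numInvalidMappings = 0;
};
```

In this sketch `numMappings` counts mappings across all dimensions, following the single `size_t` in the proposal; a per-dimension count could still be recovered from the size of each dimension's bimap.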

Ultimately, I think this would let us use a single mapping policy instead of many.
What do you think of this idea?
