<p>Data mapping is a good idea, but it seems to be a different scheme that requires a prepared DatasetInfo variable. <br>
data::Load has several overloads; the one I am referring to is the one mentioned above:</p>
<div class="highlight highlight-source-c++"><pre><span class="pl-k">template</span><<span class="pl-k">typename</span> eT>
<span class="pl-k">bool</span> <span class="pl-en">Load</span>(<span class="pl-k">const</span> std::string& filename,
arma::Mat<eT>& matrix,
<span class="pl-k">const</span> <span class="pl-k">bool</span> fatal,
<span class="pl-k">const</span> <span class="pl-k">bool</span> transpose)</pre></div>
<p>This function is a powerful implementation, but when the file's loadType != arma::hdf5_binary, we use arma::Mat::load, and that is where the trouble starts. Armadillo's load method does little optimization for CSV or raw text input (csv_ascii / raw_ascii): it is built on C++ std iostream, which tends to be slower than C-style stdio for bulk numeric parsing.<br>
Calling std::ios::sync_with_stdio(false) may make this a little faster, although that setting only affects the standard streams (std::cin, std::cout and so on), not file streams.</p>
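<p>To illustrate the C-style approach, here is a minimal sketch that parses one delimited line with std::strtod instead of operator>>. The name parse_line_strtod is hypothetical and is not part of mlpack or Armadillo; whether it is actually faster on a given platform would need to be measured.</p>

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper (not part of mlpack or Armadillo): parse one delimited
// line of numbers with std::strtod rather than stream extraction.
// Returns false if a field cannot be parsed as a number or is followed by
// anything other than the delimiter or the end of the line.
inline bool parse_line_strtod(const std::string& line,
                              const char delim,
                              std::vector<double>& out)
{
  out.clear();
  const char* p = line.c_str();
  while (true)
  {
    char* end = nullptr;
    const double value = std::strtod(p, &end);
    if (end == p)
      return false;  // Nothing was consumed: empty or non-numeric field.
    out.push_back(value);
    if (*end == '\0')
      return true;   // Reached the end of the line.
    if (*end != delim)
      return false;  // Unexpected characters after the number.
    p = end + 1;     // Step over the delimiter.
  }
}
```

<p>Note that std::strtod skips leading whitespace and follows the current C locale for the decimal point, so this sketch assumes the default "C" locale.</p>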
<p>For my project, I used a crude workaround as a contingency plan:</p>
<div class="highlight highlight-source-c++"><pre> <span class="pl-k">bool</span> success;
<span class="pl-k">switch</span>(loadType) {
<span class="pl-k">case</span> arma::hdf5_binary:
success = matrix.<span class="pl-c1">load</span>(filename, loadType);
<span class="pl-k">break</span>;
<span class="pl-k">case</span> arma::csv_ascii:
<span class="pl-k">case</span> arma::raw_ascii:
success = <span class="pl-c1">stream_to_matrix</span>(stream, matrix);
<span class="pl-k">break</span>;
<span class="pl-k">default</span>:
success = matrix.<span class="pl-c1">load</span>(stream, loadType);
<span class="pl-k">break</span>;
}</pre></div>
<p>And a bad temporary solution:</p>
<div class="highlight highlight-source-c++"><pre><span class="pl-k">template</span><<span class="pl-k">typename</span> IStream, <span class="pl-k">typename</span> eT>
<span class="pl-k">inline</span> <span class="pl-k">bool</span> <span class="pl-en">stream_to_matrix</span>(IStream& stream, arma::Mat<eT>& matrix) {
stream.<span class="pl-c1">clear</span>();
stream.<span class="pl-c1">seekg</span>(<span class="pl-c1">0</span>, std::ios::beg);
<span class="pl-k">if</span> (!stream.<span class="pl-c1">good</span>() || stream.<span class="pl-c1">eof</span>() || stream.<span class="pl-c1">fail</span>()) {
<span class="pl-k">return</span> <span class="pl-c1">false</span>;
}
std::string line;
arma::uword ncol = <span class="pl-c1">0</span>, nrow = <span class="pl-c1">0</span>;
<span class="pl-c1">std::getline</span>(stream, line);
stream.<span class="pl-c1">clear</span>();
stream.<span class="pl-c1">seekg</span>(<span class="pl-c1">0</span>, std::ios::beg);
<span class="pl-k">if</span> (line.<span class="pl-c1">empty</span>()) {
<span class="pl-k">return</span> <span class="pl-c1">false</span>;
}
<span class="pl-c1">boost::trim</span>(line);
<span class="pl-k">if</span> (<span class="pl-c1">boost::ends_with</span>(line, <span class="pl-s"><span class="pl-pds">"</span>,<span class="pl-pds">"</span></span>)) {
line.<span class="pl-c1">pop_back</span>();
}
<span class="pl-k">char</span> delim = <span class="pl-s"><span class="pl-pds">'</span>,<span class="pl-pds">'</span></span>;
ncol = <span class="pl-c1">std::count</span>(line.<span class="pl-c1">begin</span>(), line.<span class="pl-c1">end</span>(), <span class="pl-s"><span class="pl-pds">'</span>,<span class="pl-pds">'</span></span>); <span class="pl-c">// csv</span>
<span class="pl-k">if</span> (<span class="pl-c1">0</span> == ncol) {
ncol = <span class="pl-c1">std::count</span>(line.<span class="pl-c1">begin</span>(), line.<span class="pl-c1">end</span>(), <span class="pl-s"><span class="pl-pds">'</span><span class="pl-cce">\t</span><span class="pl-pds">'</span></span>); <span class="pl-c">// tsv</span>
delim = <span class="pl-s"><span class="pl-pds">'</span><span class="pl-cce">\t</span><span class="pl-pds">'</span></span>;
<span class="pl-k">if</span> (<span class="pl-c1">0</span> == ncol) {
ncol = <span class="pl-c1">std::count</span>(line.<span class="pl-c1">begin</span>(), line.<span class="pl-c1">end</span>(), <span class="pl-s"><span class="pl-pds">'</span> <span class="pl-pds">'</span></span>); <span class="pl-c">// txt</span>
delim = <span class="pl-s"><span class="pl-pds">'</span> <span class="pl-pds">'</span></span>;
}
}
<span class="pl-k">if</span> (<span class="pl-c1">0</span> == ncol) {
ncol = <span class="pl-c1">1</span>;
} <span class="pl-k">else</span> {
ncol += <span class="pl-c1">1</span>;
}
<span class="pl-k">while</span> (!stream.<span class="pl-c1">eof</span>() && stream.<span class="pl-c1">good</span>()) {
<span class="pl-c1">std::getline</span>(stream, line);
<span class="pl-k">if</span> (line.<span class="pl-c1">empty</span>()) {
<span class="pl-k">break</span>;
}
++nrow;
}
stream.<span class="pl-c1">clear</span>();
stream.<span class="pl-c1">seekg</span>(<span class="pl-c1">0</span>, std::ios::beg);
matrix.<span class="pl-c1">resize</span>(nrow, ncol);
std::vector<<span class="pl-k">const</span> <span class="pl-k">char</span>*> seps;
arma::uword i = <span class="pl-c1">0</span>;
std::cout << <span class="pl-s"><span class="pl-pds">"</span>...<span class="pl-pds">"</span></span> << std::endl;
<span class="pl-k">while</span> (!stream.<span class="pl-c1">eof</span>() && stream.<span class="pl-c1">good</span>()) {
<span class="pl-c1">std::getline</span>(stream, line);
<span class="pl-c1">boost::trim</span>(line);
<span class="pl-k">if</span> (line.<span class="pl-c1">empty</span>()) {
<span class="pl-k">break</span>;
}
<span class="pl-k">if</span> (<span class="pl-c1">cstyle_tokenize</span>(line, delim, seps) != ncol) {
Log::Warn << <span class="pl-s"><span class="pl-pds">"</span>Error line <span class="pl-pds">"</span></span> << i << <span class="pl-s"><span class="pl-pds">"</span>: <span class="pl-pds">"</span></span> << line << std::endl;
<span class="pl-k">return</span> <span class="pl-c1">false</span>;
}
<span class="pl-k">for</span> (arma::uword j = <span class="pl-c1">0</span>; j < ncol; ++j) {
<span class="pl-c1">assert</span>(seps[j] && *seps[j]);
<span class="pl-k">char</span>* end = <span class="pl-v">nullptr</span>;
<span class="pl-c1">matrix</span>(i, j) = <span class="pl-c1">std::strtod</span>(seps[j], &end);
<span class="pl-c">// Validate every field, not only the last one; this assumes that</span>
<span class="pl-c">// cstyle_tokenize null-terminates each token.</span>
<span class="pl-k">if</span> (end == seps[j] || *end) {
Log::Warn << <span class="pl-s"><span class="pl-pds">"</span>Error line <span class="pl-pds">"</span></span> << i << <span class="pl-s"><span class="pl-pds">"</span>: <span class="pl-pds">"</span></span> << line << std::endl;
<span class="pl-k">return</span> <span class="pl-c1">false</span>;
}
}
++i;
<span class="pl-k">if</span> (!(i % <span class="pl-c1">1000000</span>)) {
Log::Warn << i << <span class="pl-s"><span class="pl-pds">"</span> lines parsed<span class="pl-pds">"</span></span> << std::endl;
}
}
<span class="pl-k">if</span> (i != nrow) {
<span class="pl-k">return</span> <span class="pl-c1">false</span>;
}
<span class="pl-k">return</span> <span class="pl-c1">true</span>;
}</pre></div>
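<p>The cstyle_tokenize helper called above is not shown in this comment. Purely as an assumption about what it might look like, here is a sketch with a guessed signature that splits the line in place and records a pointer to each field:</p>

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the cstyle_tokenize() helper used above; the
// real helper is not shown, so this signature and behavior are guesses.
// Splits `line` in place: every delimiter is overwritten with '\0' and
// `seps` receives a pointer to the start of each field.
// Returns the number of fields found (0 for an empty line).
inline std::size_t cstyle_tokenize(std::string& line,
                                   const char delim,
                                   std::vector<const char*>& seps)
{
  seps.clear();
  if (line.empty())
    return 0;

  char* base = &line[0];
  seps.push_back(base);
  for (std::size_t i = 0; i < line.size(); ++i)
  {
    if (base[i] == delim)
    {
      base[i] = '\0';                 // Terminate the current field.
      seps.push_back(base + i + 1);   // The next field starts after it.
    }
  }
  return seps.size();
}
```

<p>The pointers in seps refer to the line's own buffer, so they are only valid until the line is modified again. Consecutive delimiters produce empty fields in this sketch, which the assert in the parsing loop would catch.</p>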
<p>I will try the proposal in <a href="https://github.com/mlpack/mlpack/pull/681" class="issue-link js-issue-link" data-url="https://github.com/mlpack/mlpack/issues/681" data-id="158662064" data-error-text="Failed to load issue title" data-permission-text="Issue title is private">#681</a> and the other dev branches, thanks :)</p>