[mlpack-svn] [MLPACK] #300: allknn fails for mnist8m dataset

MLPACK Trac trac at coffeetalk-1.cc.gatech.edu
Wed Aug 14 23:29:17 EDT 2013


#300: allknn fails for mnist8m dataset
----------------------+-----------------------------------------------------
  Reporter:  rozyang  |        Owner:  rcurtin 
      Type:  defect   |       Status:  accepted
  Priority:  major    |    Milestone:          
 Component:  mlpack   |   Resolution:          
  Keywords:           |     Blocking:          
Blocked By:           |  
----------------------+-----------------------------------------------------

Comment (by rozyang):

 The idx file format is described on the MNIST website:

 http://yann.lecun.com/exdb/mnist/

 I used Matlab for the conversion:

 X = readidx_uint8('train8m-images-idx3-ubyte',8100000, 8100000);
 csvwrite('mnist8m.csv', X);

 The first function is given below. The second function is a Matlab built-
 in. I have checked the variable 'X' as well as the content of mnist8m.csv.
 The rows are indeed valid handwritten digit images.

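 function [arr1,arr2] = readidx_uint8(FILENAME,t1,t2)
 % Records 1..t1 go to arr1 and records t1+1..t2 go to arr2.
 % Each record is stored as one column.

 fid = fopen(FILENAME,'r','b');      % 'b': header integers are big-endian
 magic = fread(fid, 1, 'int32');
 if magic==2051                      % image file: count, rows, columns
     num = fread(fid, 1, 'int32');
     ndim(1) = fread(fid, 1, 'int32');
     ndim(2) = fread(fid, 1, 'int32');
     a = ndim(1)*ndim(2);
 elseif magic==2049                  % label file: count only
     num = fread(fid, 1, 'int32');
     a=1;
 else
     fclose(fid);
     error('unknown magic number');  % stop here: 'a' would be undefined below
 end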
 arr1 = uint8(zeros(a,t1));
 arr2 = uint8(zeros(a,t2-t1));
 for i=1:t1
     arr1(:,i) = fread(fid, a, 'uint8');
 end
 for i=t1+1:t2
     arr2(:,i-t1) = fread(fid, a, 'uint8');
 end

 fclose(fid);
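
 As a quick sanity check before converting, the header of the idx file can
 be compared against the file size on disk. This is only a sketch: it
 assumes the image-file layout described on the MNIST page (four big-endian
 int32 header fields followed by one byte per pixel), and the variable
 names are illustrative.

```matlab
% Sketch only: compare an idx image file's header with its size on disk.
fname = 'train8m-images-idx3-ubyte';
fid = fopen(fname, 'r', 'b');              % header integers are big-endian
if fid < 0
    error('cannot open %s', fname);
end
hdr = fread(fid, 4, 'int32');              % magic, count, rows, columns
fclose(fid);
if hdr(1) ~= 2051
    error('not an idx image file (magic = %d)', hdr(1));
end
info = dir(fname);                         % info.bytes is the file size
expected = 16 + hdr(2) * hdr(3) * hdr(4);  % 16-byte header + 1 byte per pixel
fprintf('%d images of %dx%d; expected %d bytes, found %d\n', ...
        hdr(2), hdr(3), hdr(4), expected, info.bytes);
```

 If the two byte counts differ, the file is truncated or is not in the
 assumed layout, and the arrays returned by readidx_uint8 should not be
 trusted.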

-- 
Ticket URL: <http://trac.research.cc.gatech.edu/fastlab/ticket/300#comment:2>
MLPACK <www.fast-lab.org>
MLPACK is an intuitive, fast, and scalable C++ machine learning library developed at Georgia Tech.

