[mlpack] Google Summer of Code project proposal

Ryan Curtin ryan at ratml.org
Wed Mar 23 15:01:26 EDT 2016


On Tue, Mar 22, 2016 at 10:49:05PM +0000, Leinoff, Alexander wrote:
> Hi!
> If you have time, I’d like to get your feedback on a project idea and
> proposal for the google summer of code. My name is Alexander Leinoff (
> I made a couple of commits yesterday ), and I’m in my final year as an
> undergraduate studying Computer Engineering at The University of Iowa.
> I’d like to submit a project proposal that’s a little off of the
> books, although I think it will end up covering the “More Diverse
> Build Slaves” and possibly the “Profiling for further optimization”
> Summer of Code Ideas. My main goal will be to set up a robust and
> constant cross-platform testing framework, with an online dashboard
> presentation using cmake, ctest, and cdash, such as the one used by
> ITK: https://open.cdash.org/index.php?project=Insight . I’d also like
> to implement a SuperBuild environment with cmake to automatically
> build project dependencies. I think your project could really use a
> more understandable and easy to use (and visualize) testing process,
> and I’d like to help you implement it! Let me know if this is
> something I should be pursuing for the Summer of Code project, or if I
> should be focusing more on the listed ideas as they are stated. I’ve
> shared a draft of my proposal via the Google Summer of Code submission
> page, it still needs some work, but please check it out and let me
> know what you think. Any feedback is greatly appreciated!

Hi Alex,

Thanks for your contributions over the past couple of days.

I'm perfectly fine with proposals for projects that are not what's
written on the Ideas page.

The dashboard for ITK looks pretty nice.  Currently we use Jenkins, set
up at http://big.mlpack.org:7780/ (it is on a slow connection,
unfortunately).  It used to be that we had more systems set up, but
since I finished my Ph.D. I no longer have the resources to set all
those systems up:

http://ratml.org/misc_img/build_farm_new.jpg

Now I am at Symantec, and they can support mlpack, but with different
hardware.  Let me describe what I have access to:

 * "masterblaster", 2x Xeon E5-2699v3 (72 cores) with 256GB RAM and
   3-4TB storage
 * "big", "samedi", "cabbie", "dambala", and "shoeshine": HP i5 desktops
   with 8-16GB of RAM; older desktops from ~2010
 * 3 unnamed Sun SPARC Enterprise T5220s, each with 64 cores and I think
   128GB of RAM?  They've never been powered on, so I need to set them
   up.  I found them in a closet; they were going to be thrown away.

I should be able to get all of these set up in a way that they are
externally accessible by the beginning of the summer.  (Right now, some
of these can only be accessed internally.)

The five desktops are part of the old build farm.  Two of them, cabbie
and shoeshine, are used for benchmarking using the benchmarking system
that Marcus built in GSoC 2013 and Anand improved in GSoC 2014:

https://github.com/zoq/benchmarks
http://www.mlpack.org/benchmarks.html

One of my interests has always been to have Jenkins build mlpack against
all versions of its dependencies and run the tests, to try and find
subtle bugs.  I don't know how well Jenkins will play with CTest and
CDash, which are products I've never used.  The ITK dashboard you linked
to looks nice; Jenkins can give similar output.  (I wouldn't be
surprised if both can be used in tandem.)
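
To make the "all versions of its dependencies" idea concrete, here is a
minimal sketch of how the list of build configurations could be
enumerated before handing each one to a CI job.  Everything in it is an
assumption for illustration: the version labels are placeholders rather
than real releases, and the job-naming scheme is hypothetical, not
something Jenkins or mlpack prescribes.

```python
from itertools import product

# Hypothetical dependency versions: placeholder labels, not a list of
# releases that mlpack actually supports.
DEPENDENCY_VERSIONS = {
    "armadillo": ["A1", "A2"],
    "boost": ["B1", "B2", "B3"],
}


def build_matrix(versions):
    """Return one dict per combination of dependency versions."""
    names = sorted(versions)
    return [
        dict(zip(names, combo))
        for combo in product(*(versions[name] for name in names))
    ]


def job_name(config):
    """Build a readable label for one configuration (hypothetical scheme)."""
    parts = [f"{name}-{version}" for name, version in sorted(config.items())]
    return "mlpack-" + "-".join(parts)


matrix = build_matrix(DEPENDENCY_VERSIONS)
for config in matrix:
    print(job_name(config))
```

The number of configurations is the product of the version counts, so
the matrix grows quickly as dependencies are added; that is part of why
it matters how the available hardware is used.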

So we should definitely work out the details, but we can also change
some things around after the proposal deadline if necessary.  I'll take
a look over your proposal when I have a chance (next day or two?), but I
will be looking for how we can work out the following things:

 * how can we utilize the hardware that we already have?

 * can we automate the benchmarking process better, and integrate it
   properly with the benchmarking system Marcus built?

 * what changes will need to be made to the mlpack codebase to support
   your project?

 * how can we present the information gathered by the automatic build
   system in a concise and manageable way?

Anyway, it may not be possible to answer all these questions, but we
should at least try.  I think you are absolutely correct that mlpack
could use a better CI infrastructure, so I am excited to see what you
can put together for a proposal.

Thanks for getting in touch!  Let me know if I can clarify anything.

Ryan

-- 
Ryan Curtin    | "The enemy cannot press a button... if you have
ryan at ratml.org | disabled his hand." - Sgt. Zim

