[mlpack-git] [mlpack] [Proposal] Develop a scalable Finetune class to fine tune the parameters of deep network (#458)

stereomatchingkiss notifications at github.com
Tue Oct 6 04:53:30 EDT 2015

>I think that std::vector<arma::mat*> is a little bit awkward; could you possibly instead take the instantiated network as the input and extract/modify the parameters from there?

There are some problems here:
1 : Neither the softmax nor the stacked autoencoder class exposes its input data, but during fine-tuning the input needs to be updated frequently. Passing a pointer is the least intrusive solution I can think of for now (not a reference, because you cannot form a reference to a reference). If we want to pass in the instantiated network, we have to let users access the input data.

2 : The SoftmaxRegression class does not expose its parameters. For the parameters, though, passing std::vector<arma::mat const*> would be better, because I do not know when I should update them; that should be done through another API after the whole training process has finished.

3 : Passing the instantiated network would make the implementation a little more complex: we would need a compile-time for loop over the layers, which means template metaprogramming (TMP).

Example:
    auto networkTuple = std::forward_as_tuple(sae1, sae2);
    finetune(networkTuple, softmax);
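To make the "compile-time for loop" concrete, here is a minimal sketch of visiting every layer in a tuple. It assumes C++14 (std::index_sequence and generic lambdas). LayerA, LayerB, HiddenSize(), ForEachLayer() and HiddenSizes() are hypothetical names used only for illustration; none of them is part of mlpack.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-ins for two different layer types (not mlpack classes).
struct LayerA { std::size_t HiddenSize() const { return 4; } };
struct LayerB { std::size_t HiddenSize() const { return 2; } };

// Calls f on each tuple element in order. The index pack is expanded at
// compile time, so every element may have a different type.
template<typename Tuple, typename F, std::size_t... I>
void ForEachLayerImpl(Tuple& layers, F& f, std::index_sequence<I...>)
{
  // A braced initializer list is evaluated left to right.
  int expand[] = { 0, (f(std::get<I>(layers)), 0)... };
  (void) expand;
}

template<typename Tuple, typename F>
void ForEachLayer(Tuple&& layers, F f)
{
  constexpr std::size_t n =
      std::tuple_size<typename std::decay<Tuple>::type>::value;
  ForEachLayerImpl(layers, f, std::make_index_sequence<n>{});
}

// Collects the hidden size of every layer in the tuple.
template<typename Tuple>
std::vector<std::size_t> HiddenSizes(Tuple&& layers)
{
  std::vector<std::size_t> sizes;
  ForEachLayer(layers, [&sizes](const auto& layer)
  {
    sizes.push_back(layer.HiddenSize());
  });
  return sizes;
}

// Usage, mirroring the networkTuple example above.
inline void Example()
{
  LayerA sae1;
  LayerB sae2;
  auto networkTuple = std::forward_as_tuple(sae1, sae2);
  const std::vector<std::size_t> sizes = HiddenSizes(networkTuple);
  assert(sizes.size() == 2 && sizes[0] == 4 && sizes[1] == 2);
}
```

A fine-tuning routine that takes the instantiated network would need this kind of helper for every pass over the layers, which is the extra complexity mentioned in point 3.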

>What does the function Deriv(arma::mat const&, arma::mat&) do? How is that different from Gradient()?

The details are described in the UFLDL tutorial (http://deeplearning.stanford.edu/wiki/index.php/Fine-tuning_Stacked_AEs).

Gradient() is the derivative with respect to the last layer of the fine-tuned network (the J term in step 2).
Deriv() calculates the derivative used in the backpropagation step of the neural network (the f'(z) term in step 3).
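
For reference, the two backpropagation steps mentioned here can be written out as follows. This is a sketch in the UFLDL notation; the last line assumes sigmoid activations, which is what the a(1 - a) form of f'(z) corresponds to.

```latex
\begin{align*}
\delta^{(n_l)} &= -\left(\nabla_{a^{(n_l)}} J\right) \bullet f'\!\left(z^{(n_l)}\right)
  && \text{(step 2, output layer)} \\
\delta^{(l)} &= \left(\left(W^{(l)}\right)^{T} \delta^{(l+1)}\right) \bullet f'\!\left(z^{(l)}\right),
  \quad l = n_l - 1, \ldots, 2
  && \text{(step 3, hidden layers)} \\
f'\!\left(z^{(l)}\right) &= a^{(l)} \bullet \left(1 - a^{(l)}\right)
  && \text{(sigmoid activation)}
\end{align*}
```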

Example : 

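    class SoftmaxFineTune
    {
     public:
        // Derivative with respect to the last layer of the fine-tuned network.
        template<typename T>
        static void Gradient(arma::mat const &input,
                             arma::mat const &weights,
                             T const &model,
                             arma::mat &gradient)
        {
            gradient = ((weights.t() * model.Probabilities()) %
                        (input % (1 - input))) / input.n_cols;
        }

        // Backpropagation term f'(z), assuming input holds sigmoid activations.
        static void Deriv(arma::mat const &input, arma::mat &output)
        {
            output = input % (1 - input);
        }
    };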

If the last layer should always be softmax, then we could remove this template parameter.

>So this way, the EvaluateWithGradient() function would be optional, but when supplied it could 
>accelerate the computation. What do you think of this idea?

This could save one calculation, but unless the probabilities are cached you still have to recalculate them during fine-tuning (if you use SoftmaxRegression as the last layer): one calculation is done by Evaluate(), one by Gradient(), and another by the fine-tuning step (when computing the last derivative term). Could we just let SoftmaxRegression cache the probabilities and give users read-only access to them?

      //! Gets the probabilities.
      const arma::mat& Probabilities() const { return probabilities; }
      //! Probability matrix
      arma::mat probabilities;

When you call the Evaluate function, use the data member rather than creating a temporary variable.

When you call the Gradient function, do not recalculate; just reuse the cached probabilities. The probabilities depend only on the training data and the weights (parameters), and the weights are not updated before Gradient is called, so caching should be safe (unless some optimization algorithm does not call Evaluate before calling Gradient). I tried this and ran the unit tests; they pass.
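
The caching idea can be sketched in isolation as follows. This is a simplified, hypothetical stand-in that works on a single score vector with std::vector rather than arma::mat; CachedSoftmax and Computations() are made-up names for illustration and this is not the actual SoftmaxRegression code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical, simplified softmax model that caches its probabilities.
class CachedSoftmax
{
 public:
  // Returns the negative log-likelihood of `target` and stores the
  // probabilities in the data member instead of a temporary.
  double Evaluate(const std::vector<double>& scores, std::size_t target)
  {
    assert(!scores.empty() && target < scores.size());
    // Subtract the maximum score for numerical stability.
    const double maxScore = *std::max_element(scores.begin(), scores.end());
    probabilities.resize(scores.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < scores.size(); ++i)
    {
      probabilities[i] = std::exp(scores[i] - maxScore);
      sum += probabilities[i];
    }
    for (double& p : probabilities)
      p /= sum;
    ++computations;
    return -std::log(probabilities[target]);
  }

  // Gradient of the loss with respect to the scores: p - onehot(target).
  // Reuses the cached probabilities, so Evaluate() must be called first.
  void Gradient(std::size_t target, std::vector<double>& gradient) const
  {
    assert(!probabilities.empty() && target < probabilities.size());
    gradient = probabilities;
    gradient[target] -= 1.0;
  }

  //! Gets the cached probabilities (read only).
  const std::vector<double>& Probabilities() const { return probabilities; }

  //! Number of times the probabilities were actually computed.
  std::size_t Computations() const { return computations; }

 private:
  std::vector<double> probabilities;
  std::size_t computations = 0;
};
```

After one Evaluate() followed by one Gradient(), Computations() is 1, and a fine-tuning step could read Probabilities() without triggering a third computation. The assert in Gradient() marks the caveat above: an optimizer that calls Gradient() without a preceding Evaluate() would break this scheme.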

>I don't know if there are already any plans for fine tuning.

I hope there are; this is quite a useful tool.

Thanks for your help :).
