Quick Theano Question [SOLVED]


Quick Theano Question [SOLVED]

Postby pogrmman » Sat Jun 03, 2017 8:14 pm UTC

Hi everybody,

I've been messing around with Theano a bit for machine learning, and have been loving it.
I've got a pretty basic neural network class working well -- I've trained it on various examples and gotten good results.
For instance, I just ran a stacked autoencoder on the abalone dataset, and without tuning the hyperparameters for final training I got around 59% accuracy on a 6-class classification scheme (the site reports 55-65% performance on a 3-class scheme, so I'm happy with this). I'm going to play around with the hyperparameters to try to do better.

My biggest question is how people implement minibatch gradient descent in Theano. I've just been using plain stochastic gradient descent, training on one example at a time during each epoch, and I'd love to speed it up. I'm not sure how to apply the parameter updates after a minibatch is done -- it seems like I'd have to run all the examples in the batch through the net at once, and I'm not sure how to do that without breaking things. I'm also planning on adding momentum, but that looks a lot easier -- I just need shared Theano variables for the velocities, and need to update them along with everything else.
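
For what it's worth, here's roughly the shape of the momentum update I have in mind -- an untested sketch, not code from my class. I'd actually need one shared velocity variable per parameter (shaped like that parameter) rather than a single shared scalar, and the names (params, gradients, rate, momentum) are placeholders:

Code: Select all

import numpy
import theano

def momentum_updates(params, gradients, rate=0.01, momentum=0.9):
    """Build a Theano updates list for classical momentum.

    v <- momentum * v - rate * grad
    p <- p + v
    """
    updates = []
    for param, grad in zip(params, gradients):
        # One zero-initialized velocity buffer per parameter, same shape
        velocity = theano.shared(numpy.zeros_like(param.get_value()))
        new_velocity = momentum * velocity - rate * grad
        updates.append((velocity, new_velocity))
        updates.append((param, param + new_velocity))
    return updates

If I'm reading the docs right, that list could just go into the updates argument of theano.function, the same way I pass self._updates now.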

Here is the relevant section of my code. The layers are their own objects stored in self.layers; their make_output method builds up a symbolic expression for the output (not a compiled Theano function -- I was skimming my code too fast when I first described it) and binds it to a method called output on the layer object.

Code: Select all

    def _set_cost(self, type: "cost type"):
        """Set the _costfunc attribute
       
        Usage:
        _set_cost(type)
       
        Arguments:
        type -- the cost function type
       
        This method may throw a NotImplementedError if the kind of cost function
        has not yet been implemented.
       
        This method is used by __init__ to set the cost type appropriately.
        Not intended to be accessed publicly.
        """
        if type == "categorical crossentropy":
            self._costfunc = tensor.nnet.categorical_crossentropy
        elif type == "binary crossentropy":
            self._costfunc = tensor.nnet.binary_crossentropy
        elif type == "quadratic":
            self._costfunc = quadratic_cost
        else:
            raise NotImplementedError("The cost type " + type +
                                      " is unimplemented.")

    def _build_forwardprop(self):
        """Compile a theano function for forwardpropagation
       
        Usage:
        _build_forwardprop()
       
        This method is used by __init__ to create the forwardprop method.
        Not intended to be accessed publicly.
        """
        # Make theano symbols for input and output
        self._inpt = tensor.fmatrix("inpt")
        self._otpt = tensor.fmatrix("otpt")
        self.layers[0].make_output(self._inpt)
        for layer in self.layers:
            if layer.id != 0:
                layer.make_output(self.layers[layer.id - 1].output)
        self._output = self.layers[-1].output
        # Compile forwardprop method
        self.forwardprop = function(inputs = [self._inpt],
                                    outputs = self._output,
                                    allow_input_downcast = True)
       
    def _build_backprop(self, rate: "float", reg_coeff: "float"):
        """Compile a theano function for backpropagation
       
        Usage:
        _build_backprop(rate, reg_coeff)
       
        Arguments:
        rate -- The learning rate for the network.
        reg_coeff -- The L2 regularization coefficient.
       
        This method is used by __init__ to create the backprop method.
        Not intended to be accessed publicly.
        """
        # L2 regularization expression: sum of squared weights across layers
        regularize = 0
        for layer in self.layers:
            regularize += (layer.weights ** 2).sum()
        self.cost = (tensor.mean(self._costfunc(self._output, self._otpt)) +
                     (reg_coeff * regularize))
        self.params = []
        for layer in self.layers:
            self.params.append(layer.params[0])
            self.params.append(layer.params[1])
        self._gradients = tensor.grad(cost = self.cost, wrt = self.params)
        self._updates = []
        for grad, param in zip(self._gradients, self.params):
            self._updates.append([param, param - (rate * grad)])
        # Compile backprop method
        self.backprop = function(inputs = [self._inpt, self._otpt],
                                 outputs = self.cost,
                                 updates = self._updates,
                                 allow_input_downcast = True)
   
    def train(self, data: "list of lists",
                    epochs: "integer"):
        """Train the neural network using SGD
       
        Usage:
        train(data, epochs)
       
        Arguments:
        data -- A list of training examples of the form
                [[data], [intended output]].
        epochs -- The number of epochs to train for.
       
        This method updates the weights and biases of the network using the
        backprop method.
        """
        for i in range(0, epochs):
            random.shuffle(data)
            for item in data:
                self.backprop([item[0]], [item[1]])
Last edited by pogrmman on Tue Jun 06, 2017 3:13 pm UTC, edited 1 time in total.


Re: Quick Theano Question [SOLVED]

Postby pogrmman » Tue Jun 06, 2017 2:34 pm UTC

Well, sorry to bump my own post, but I'm a complete doofus.

Adding minibatches isn't going to be hard at all -- I realized that I can just pass in a nested list of examples, and get out an array that contains the results for each example.

I don't even have to change any code!
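
Concretely, none of the compiled Theano functions need to change -- the training loop just has to slice the shuffled data into chunks and pass each chunk to backprop in a single call, since the inputs are already matrices. Roughly this sketch (batch_size is a new argument, not something in the class above):

Code: Select all

    def train(self, data: "list of lists", epochs: "integer",
                    batch_size: "integer" = 32):
        """Train the neural network using minibatch gradient descent."""
        for i in range(epochs):
            random.shuffle(data)
            for start in range(0, len(data), batch_size):
                batch = data[start:start + batch_size]
                inputs = [item[0] for item in batch]
                targets = [item[1] for item in batch]
                # One backprop call updates the weights on the whole batch
                self.backprop(inputs, targets)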

Also, having added momentum and early stopping, I've realized that I can get really good results without having to resort to minibatch training.
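
The early stopping is nothing fancy -- roughly the pattern below, sketched with placeholder names (net, data, valid_inputs, valid_targets, patience) rather than copied from my code. It assumes net.params is the list of shared weight/bias variables, like self.params in the class above, so the best parameters can be snapshotted and restored:

Code: Select all

import numpy
import random

def train_with_early_stopping(net, data, valid_inputs, valid_targets,
                              max_epochs=1000, patience=20):
    """Stop once validation error hasn't improved for `patience` epochs."""
    best_error = float("inf")
    best_params = [p.get_value() for p in net.params]
    since_best = 0
    for epoch in range(max_epochs):
        random.shuffle(data)
        for item in data:
            net.backprop([item[0]], [item[1]])
        # Validation error: fraction of held-out examples classified wrong
        predictions = net.forwardprop(valid_inputs)
        error = numpy.mean(numpy.argmax(predictions, axis=1) !=
                           numpy.argmax(valid_targets, axis=1))
        if error < best_error:
            best_error = error
            best_params = [p.get_value() for p in net.params]
            since_best = 0
        else:
            since_best += 1
            if since_best >= patience:
                break
    # Roll back to the best parameters seen during training
    for param, value in zip(net.params, best_params):
        param.set_value(value)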

With these two things, on the abalone dataset, I've gotten up to 69% classification accuracy on the test set. Considering that the top-performing classifier in this paper got 67% on this dataset, I'm very happy.

It'll be interesting to see how minibatches affect it.

EDIT: Minibatches have been added, so I'm marking this as solved. You can see my code here if you're interested.

EDIT 2: In terms of performance, I'm finding that with minibatches it is much easier to get to around 70% classification accuracy on the abalone dataset. Before, a lot of the networks I trained would get stuck in a particular local minimum at around 48% accuracy. I've trained 5 networks in a row up to near 70% -- before, it was more like 1 in 10 that would hit that performance.

