That's what I thought too! But according to my friends on the Google Brain team, unsupervised pretraining is now thought to be an irrelevant detour.
In 2006, Hinton introduced greedy layer-wise pretraining, which was intended to solve the problem of backpropagation getting stuck in poor local optima. The theory was that you'd pretrain to find a good initial set of connection weights, then apply backprop to "fine-tune" discriminatively. And the theory seemed correct since the experimental results were good:
http://www.cs.toronto.edu/~hinton/absps/fastnc.pdf http://machinelearning.wustl.edu/mlpapers/paper_files/NIPS20...
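(To make the recipe concrete, here's a rough numpy sketch of the idea as I understand it, with a tied-weight autoencoder standing in as the per-layer learner. It's only meant to show the shape of the procedure, not Hinton's actual code or any library's API.)

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def pretrain_layer(X, n_hidden, lr=0.1, epochs=50, rng=np.random):
        # Train one autoencoder layer: learn W so the hidden code
        # sigmoid(X @ W + b_h) can reconstruct X. Plain squared error
        # and batch gradient descent, just for illustration.
        n_visible = X.shape[1]
        W = 0.01 * rng.randn(n_visible, n_hidden)
        b_h = np.zeros(n_hidden)
        b_v = np.zeros(n_visible)
        for _ in range(epochs):
            H = sigmoid(X @ W + b_h)           # encode
            X_hat = sigmoid(H @ W.T + b_v)     # decode (tied weights)
            d_out = (X_hat - X) * X_hat * (1 - X_hat)
            d_hid = (d_out @ W) * H * (1 - H)
            W -= lr * (X.T @ d_hid + d_out.T @ H) / len(X)
            b_v -= lr * d_out.mean(axis=0)
            b_h -= lr * d_hid.mean(axis=0)
        return W, b_h

    def greedy_pretrain(X, layer_sizes):
        # Stack layers one at a time: each new layer is trained, without
        # labels, on the representation produced by the layers below it.
        # The resulting weights then initialize the full network before
        # supervised backprop "fine-tuning".
        weights, inputs = [], X
        for n_hidden in layer_sizes:
            W, b = pretrain_layer(inputs, n_hidden)
            weights.append((W, b))
            inputs = sigmoid(inputs @ W + b)
        return weights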
Does pretraining truly help solve the problem of poor local optima? In 2010, some empirical studies suggested the answer was yes: http://machinelearning.wustl.edu/mlpapers/paper_files/AISTAT...

But that same year, a student in Geoff Hinton's lab discovered that if you added information about the 2nd derivatives of the loss function to backpropagation ("Hessian-free optimization"), you could skip pretraining and get the same or better results:
http://machinelearning.wustl.edu/mlpapers/paper_files/icml20...
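(The core trick, as I understand it, is that you never build the full Hessian, only Hessian-vector products that get fed to an inner conjugate-gradient loop. Toy sketch below; IIRC the paper's actual method uses the R-operator / Gauss-Newton products plus damping, so treat this as a cartoon rather than the real implementation.)

    import numpy as np

    def hessian_vector_product(grad_fn, w, v, eps=1e-4):
        # Central-difference approximation of H @ v: the Hessian is never
        # formed explicitly, only its product with a direction v, which is
        # all that conjugate gradient needs to choose an update.
        return (grad_fn(w + eps * v) - grad_fn(w - eps * v)) / (2 * eps)

    # Tiny check on a quadratic loss L(w) = 0.5 * w @ A @ w, whose Hessian is A:
    A = np.array([[3.0, 1.0], [1.0, 2.0]])
    grad = lambda w: A @ w
    v = np.array([1.0, -1.0])
    print(hessian_vector_product(grad, np.zeros(2), v))  # ~ A @ v = [2., -1.]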
And around 2012, a bunch of researchers reported that you don't even need 2nd-derivative information. You just have to initialize the neural net properly. Apparently, all the most recent results in speech recognition just use standard backpropagation with no unsupervised pretraining. (Although people are still trying more complex variants of unsupervised pretraining algorithms, often involving multiple types of layers in the neural network.)
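(By "properly" I believe people mostly mean scaling the random initial weights to each layer's size so activations and gradients neither blow up nor die out across layers. A minimal numpy sketch of that kind of heuristic, my reading of the Glorot & Bengio 2010 "normalized initialization", so take the exact constant with a grain of salt:)

    import numpy as np

    def init_layer(n_in, n_out, rng=np.random):
        # Scale the random weights by the layer's fan-in/fan-out so that
        # activations and gradients keep roughly constant variance from
        # layer to layer.
        limit = np.sqrt(6.0 / (n_in + n_out))
        W = rng.uniform(-limit, limit, size=(n_in, n_out))
        b = np.zeros(n_out)
        return W, b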
So now, after seven years of work, we're back where we started: the plain ol' backpropagation algorithm from 1974 worked all along.
This whole topic is really interesting to me from a history of science perspective. What other old, discarded ideas from the past might be ripe, now that we have millions of times more data and computation?
Yes, this is really interesting. I haven't read those other papers yet (definitely plan to now, thanks for the links), but Bengio's latest paper on denoising autoencoders from earlier this year (http://arxiv.org/abs/1305.6663) still uses unsupervised pretraining. The Theano implementation I run experiments with uses it as well (though that code could be a year or two old).
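(The denoising trick itself is small; here's my own rough numpy sketch of the objective, assuming binary-ish inputs and tied weights, not the Theano tutorial code:)

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def dae_loss(x, W, b_h, b_v, corruption=0.3, rng=np.random):
        # Denoising autoencoder objective: zero out a random subset of
        # the input dimensions, encode the corrupted input, then score
        # how well the decoder reconstructs the *clean* input.
        mask = rng.binomial(1, 1.0 - corruption, size=x.shape)
        x_tilde = x * mask
        h = sigmoid(x_tilde @ W + b_h)        # encoder
        x_hat = sigmoid(h @ W.T + b_v)        # tied-weight decoder
        eps = 1e-9                            # numerical safety for the log
        return -np.mean(x * np.log(x_hat + eps) + (1 - x) * np.log(1 - x_hat + eps))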
Definitely going to be researching this more throughout the year.
Very interesting. I was not aware that unsupervised pretraining was a distant second to the availability of data and FLOPs. So really, deep learning is essentially the same old MLP of recent peasant-like status (the 90's): stacks of backpropagating perceptrons with the ancient logistic regression on top, now with more stacking! This makes sense.
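(Concretely, that whole stack is just this at forward-pass time; a throwaway numpy sketch, not tied to any particular library:)

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def softmax(z):
        e = np.exp(z - z.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    def mlp_forward(x, hidden_layers, W_out, b_out):
        # "Stacking" in a nutshell: each hidden layer is a linear map plus
        # a squashing nonlinearity, and the output layer is ordinary
        # multinomial logistic regression over whatever representation the
        # stack produced.
        h = x
        for W, b in hidden_layers:
            h = sigmoid(h @ W + b)
        return softmax(h @ W_out + b_out)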
Machine learning is really just a form of non-human scripting. After all, every ML system running on a PC is either Turing-equivalent or less. An analogy would be something that tries to generate the minimal set of regular expressions (matching non-deterministically) that cover the given examples. The advantage of an ML model over a collection of regexes is that many interesting problems are amenable to calculus (optimization) or counting (probability, integration, etc.).
So like good notation, the stacking allows more complicated things to be said more compactly. But more complicated things need more explanation and more thinking to understand.
> And around 2012, a bunch of researchers reported that you don't even need 2nd-derivative information. You just have to initialize the neural net properly.
This sounds very interesting. How do you properly initialize the weights? Do you have a link to a paper about this?