A while ago I posted a comment here on HN which got a number of upvotes but no answer. Could you maybe take a stab at it?
I've followed the developments in neural networks somewhat, but have never applied deep learning so far. This seems like a good place to ask a couple of questions I've been having for a while.
1. When does it make sense to apply deep learning? Could it potentially be applied successfully to any difficult problem given enough data? Could it also be good at the types of problems that Random Forests and Gradient Boosting Machines are traditionally good at, versus the problems that SVMs are traditionally good at (computer vision, NLP)? [1]
2. How much data is enough?
3. What degree of tuning is required to make it work? Are we at the point yet where deep learning works more or less out of the box?
4. Is it fair to say that dropout and maxout always work better in practice? [2]
5. What is the computational effort? How long does it take, e.g., to classify an ImageNet image (on a CPU / GPU)? How long does it take to train a model like that?
6. How on earth does this fit into memory? Say in ImageNet you have (256 pixels * 256 pixels) * (10,000 classes) * 4 bytes ~= 2.4 GB, for an NN without any hidden layers.
[1] I am overgeneralizing somewhat, I know. It's my way of avoiding overfitting.

[2] My lunch today was free.
Sure, I'll give it a shot -- feel free to email me if you have further questions, email is in my profile.
1. I think it makes sense to try them for any classification, regression, or feature extraction problem. They don't work all the time: sometimes you really don't need the extra depth (one hidden layer can be fine), and they can be pretty slow to train (even with a GPU). I've also seen people try to build their own, implement it wrong, get bad results, then complain NNs don't work. So test for yourself; just make sure you're not doing it wrong.
2. It really depends. More is almost always better.
3. Training a bunch of models using Bayesian optimization to optimize the model hyperparameters (so you don't have to pick them) and putting the last few in an ensemble and averaging results is pretty close to out of the box. This is the workflow we use with ersatz.
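That workflow isn't shown here, but as a rough sketch of the idea (with plain random search standing in for full Bayesian optimization, and a hypothetical user-supplied train_and_score function doing the actual training):

```python
import random

def search_and_ensemble(train_and_score, n_trials=20, ensemble_size=3, seed=0):
    """Try many hyperparameter settings, keep the best few models,
    and average their predictions.

    `train_and_score` is a hypothetical function you supply: it takes a
    hyperparameter dict and returns (model, validation_score)."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        params = {
            "learning_rate": 10 ** rng.uniform(-4, -1),  # log-uniform draw
            "hidden_units": rng.choice([128, 256, 512, 1024]),
            "dropout": rng.uniform(0.2, 0.7),
        }
        model, score = train_and_score(params)
        trials.append((score, model, params))
    trials.sort(key=lambda t: t[0], reverse=True)
    best = trials[:ensemble_size]  # "the last few" best models

    def predict(x):
        # average the ensemble members' predictions elementwise
        preds = [model.predict(x) for _, model, _ in best]
        return [sum(col) / len(preds) for col in zip(*preds)]

    return predict, [params for _, _, params in best]
```

A real Bayesian optimizer would use the scores seen so far to decide which hyperparameters to try next, instead of drawing them blindly, but the ensemble-and-average step is the same.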
4. Despite lunches not being free, you should probably use dropout. It's ridiculously good at preventing overfitting, but it can take longer to train (although there's been some work with "fast dropout" to speed it up).
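The mechanism itself is tiny. Here's a minimal sketch of one common formulation ("inverted" dropout, which scales at train time so test time needs no change; function and argument names are made up):

```python
import numpy as np

def dropout_forward(activations, p_drop=0.5, train=True, rng=None):
    """Zero each unit with probability p_drop during training, and scale
    the survivors by 1/(1 - p_drop) so the expected activation is
    unchanged. At test time the layer is a no-op."""
    if not train or p_drop == 0.0:
        return activations
    rng = rng if rng is not None else np.random.default_rng(0)
    mask = rng.random(activations.shape) >= p_drop  # True = unit survives
    return activations * mask / (1.0 - p_drop)
```

The randomness is why it fights overfitting (each mini-batch effectively trains a different thinned network) and also why training can take longer to converge.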
5. GPU gets you a ~40x speedup over CPU. So if you're using CPU and I'm using GPU, I can do in 1 day what would take you a month and a half. And then I might train for a week or more on GPU (I think the ImageNet models were trained for a week or two, but I'm not sure how many GPUs were used). Otherwise, computational effort varies.
6. You use mini-batches: you load as many samples as will fit in GPU memory (along with the model params) and then split those into smaller batches, rotating the "large batch" periodically. Neural networks can keep taking in new data and updating their model (online learning), so they're particularly attractive for very large data sets.
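So the ~2.4 GB from question 6 never has to be resident all at once on the training-data side; only one chunk plus the model parameters do. A sketch of the two-level batching (names are made up, and np.asarray stands in for the host-to-GPU copy):

```python
import numpy as np

def minibatches(data_on_disk, chunk_size, batch_size):
    """Two-level batching: load one large chunk into (GPU) memory at a
    time, iterate over it in small mini-batches, then rotate to the
    next chunk once it's exhausted."""
    for start in range(0, len(data_on_disk), chunk_size):
        # this copy is where the real version would transfer to the GPU
        chunk = np.asarray(data_on_disk[start:start + chunk_size])
        for b in range(0, len(chunk), batch_size):
            yield chunk[b:b + batch_size]
```

Each yielded mini-batch drives one gradient update, which is also what makes the online-learning setting natural: new data just becomes the next chunk.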
General points: use a GPU, don't build your own implementation except as an academic exercise, use dropout, and test empirically on your own data. And check out Bayesian optimization of hyperparameters; I'm becoming more and more convinced it's better at picking them than human experts anyway.