Deep learning is progressing rapidly. There is a new interesting research paper every other week. This is a list of essential deep learning research by categories.
- Efficient Backprop. Paper on back propagation, the sauce of neural networks by Yann Lecun from AT&T l in 98 – http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf
- Gradient Base Learning Applied Document Recognition. Paper described how to do recognition of hand written characters. Also describes LeNet-5 (the original convolution neural network) by Yann Lecun. https://pdfs.semanticscholar.org/d3f5/87797f95e1864c54b80bc98e957da6746e27.pdf
Convolution Neural Networks (CNN)
These are the recent advances for CNN, original was Lecun-5 in the 98 paper mentioned above .
- AlexNet brought back neural network revolution by winning ImageNet competition- https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks
- VGG Net a very deep NN – https://arxiv.org/abs/1409.1556.pdf
- GoogleNet – https://arxiv.org/abs/1409.4842
- Residual Neural Network – The deepest Neural network 152 layers – https://arxiv.org/abs/1512.03385
Finding a bounding box around different objects is harder than simply classifying an image. This a class of image localization and detection problems.
- Faster RCNN (there is also Fast RCNN and RCNN, Faster is incremental improvement over all), one of the best – https://arxiv.org/abs/1506.01497
- YOLO, supposed to be the most efficient – https://arxiv.org/abs/1612.08242
Generative Adversarial Neural Networks
One of the hottest areas of research. This is a class of algorithms where 2 neural networks collaborate to generate e.g. realistic images. One network produces fake images (faker), and the other network learns to decipher fake from real (detective). Both networks compete with each other and try to be good at their jobs, till the faker is so good that it can generate realistic images. Fake it till you make it!
- Generative Adversarial Neural Networks – https://arxiv.org/abs/1406.2661
- CycleGAN – Change doodles to real images https://arxiv.org/abs/1703.10593
- Conditional GAN – Control the output of GAN by classes https://arxiv.org/abs/1411.1784
Semi Supervised Learning
Getting labeled data is expensive, while unlabeled data is abundant. Techniques to use little bit of training data and lots of unlabeled data.
- Stacked What Where Auto encoders – https://arxiv.org/abs/1506.02351
- Ladder Networks – https://arxiv.org/abs/1507.02672
- Pseudo Labels – http://deeplearning.net/wp-content/uploads/2013/03/pseudo_label_final.pdf
- Surrogate Class – http://papers.nips.cc/paper/5548-discriminative-unsupervised-feature-learning-with-convolutional-neural-networks.pdf
Visual Question Answering / Reasoning
Research on being able to ask question on images. e.g. asking if there are there more blue balls than yellow about an image.
- Inferring and Executing Programs For Visual Reasoning – https://arxiv.org/abs/1705.03633
- Relation Networks From Deep Mind, generic NN component that can be used on visual and text QA systems – https://arxiv.org/abs/1706.01427.pdf
Being able to take a picture and a style image e.g. a painting, and redraw the picture in the painting style. See my blog on painting like Picaso.
- Neural Artistic Style – https://arxiv.org/abs/1508.06576
Recurrent Neural Networks (RNN)
- LSTM – http://dl.acm.org/citation.cfm?id=1246450 Blog explaining LSTM – http://colah.github.io/posts/2015-08-Understanding-LSTMs/
- GRU – https://arxiv.org/abs/1502.02367
This is area of unsupervised learning. An auto encoder is a neural network that tries to recreate the original image. e.g. give it any picture and it will try to recreate the same image. Why would anyone want to do that. The neural network tries to learn a condensed representation of images given that there are commonalities. Auto encoders can be used to pre train a neural network with unlabeled data.
- Lecture Notes Sparse Auto encoders – https://web.stanford.edu/class/cs294a/sparseAutoencoder.pdf
Visualizing High Dimensional Data
- Reading Text in the Wild – https://www.robots.ox.ac.uk/~vgg/publications/2016/Jaderberg16/jaderberg16.pdf
- Neural Programmer Interpreters, learn to program for simple tasks – https://arxiv.org/abs/1511.06279
- Visual Interaction Networks – Deep mind paper to learn to predict physical future of objects from a few frames – https://arxiv.org/abs/1706.01433