Podcast interview with Sam Charrington about machine learning and my talk at Google Next
Gave a talk at Google Next 2018 about machine learning.
The talk covers using the following for applied machine learning in the context of media and news publishing.
- Google NLP
- Google Sound
Gave a talk about BQML along with Abhishek, product manager of AI at Google, on the democratization of AI via BigQuery.
Google launches Google News initiative to promote quality journalism via technology
Case study, featuring my work using Google Cloud, Machine Learning and BigQuery
Have been reading research on recommendation engines, specifically work that can be used to make better news/blog recommendations.
Links to work in this area, including open source code.
Fundamental Building Blocks
- Convert a document or paragraph into a vector representation Doc2Vec https://arxiv.org/pdf/1405.4053.pdf
- Using LSTM/GRU to represent sentences; works better than Doc2Vec for information retrieval tasks. Deep Sentence Embedding Using Long Short-Term Memory Networks: Analysis and Application to Information Retrieval https://arxiv.org/pdf/1502.06922.pdf
- Survey of Deep Recommendation Engines. Good starting point https://arxiv.org/pdf/1707.07435.pdf
- Google Deep and Wide https://arxiv.org/pdf/1606.07792.pdf
- Multitask Recommender System Using GRU https://arxiv.org/pdf/1609.02116.pdf
- DeepFM, no need for feature engineering as in Google Deep and Wide https://arxiv.org/abs/1703.04247
- Multi-Rate Deep Learning for Temporal Recommendation. Uses multiple time scales and user features; trained using DSSM (Deep Structured Semantic Model). http://sonyis.me/paperpdf/spr209-song_sigir16.pdf
- YouTube Recommendation System https://pdfs.semanticscholar.org/bcdb/4da4a05f0e7bc17d1600f3a91a338cd7ffd3.pdf
- Session Based Recommendation System https://arxiv.org/pdf/1511.06939.pdf . Only uses sequence of content, and not the content itself.
- Subreddit recommendation. RNN based, does not use content. https://cole-maclean.github.io/blog/RNN-Based-Subreddit-Recommender-System
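Once documents are turned into vectors (the first building block above), a simple content-based recommender can rank other articles by how close their vectors are to the one just read. The following is a minimal sketch of that idea, not the method of any paper listed here. The article ids and the 3-dimensional vectors are made up for illustration; real vectors would come from a trained model such as Doc2Vec or an LSTM sentence encoder. The function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def recommend(read_id, doc_vectors, top_k=2):
    """Return the ids of the top_k documents most similar to the one just read."""
    query = doc_vectors[read_id]
    scored = [
        (cosine_similarity(query, vec), doc_id)
        for doc_id, vec in doc_vectors.items()
        if doc_id != read_id
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:top_k]]


# Hypothetical embeddings; not taken from any real model.
doc_vectors = {
    "election-results": [0.9, 0.1, 0.0],
    "senate-vote": [0.8, 0.2, 0.1],
    "playoff-recap": [0.1, 0.9, 0.2],
    "transfer-rumors": [0.0, 0.8, 0.3],
}

print(recommend("election-results", doc_vectors, top_k=1))
```

The session-based approaches in the list take the opposite route: they ignore content vectors and learn only from the sequence of items a reader visited.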
Classification on iOS
Just ran my first deep learning model with the camera app example. Pretty good image recognition!!
Detection on iOS
The next level is object detection, i.e. drawing a bounding box around each detected object.
Deep learning is progressing rapidly. There is a new interesting research paper every other week. This is a list of essential deep learning research by categories.
- Efficient Backprop. Paper on back propagation, the secret sauce of neural networks, by Yann LeCun from AT&T in 1998 – http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf
- Gradient-Based Learning Applied to Document Recognition. Paper describing how to recognize handwritten characters. Also describes LeNet-5 (the original convolutional neural network) by Yann LeCun. https://pdfs.semanticscholar.org/d3f5/87797f95e1864c54b80bc98e957da6746e27.pdf
Convolutional Neural Networks (CNN)
These are the recent advances for CNNs; the original was LeNet-5 in the 1998 paper mentioned above.
- AlexNet brought back the neural network revolution by winning the ImageNet competition – https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks
- VGG Net a very deep NN – https://arxiv.org/abs/1409.1556.pdf
- GoogLeNet – https://arxiv.org/abs/1409.4842
- Residual Neural Network (ResNet) – the deepest of the networks listed here, at 152 layers – https://arxiv.org/abs/1512.03385
Finding a bounding box around different objects is harder than simply classifying an image. This is a class of image localization and detection problems.
- Faster RCNN (there is also Fast RCNN and RCNN, Faster is incremental improvement over all), one of the best – https://arxiv.org/abs/1506.01497
- YOLO, supposed to be the most efficient – https://arxiv.org/abs/1612.08242
Generative Adversarial Neural Networks
One of the hottest areas of research. This is a class of algorithms where two neural networks are trained against each other to generate, e.g., realistic images. One network produces fake images (faker), and the other network learns to tell fake from real (detective). Both networks compete with each other and try to get better at their jobs, until the faker is so good that it can generate realistic images. Fake it till you make it!
- Generative Adversarial Neural Networks – https://arxiv.org/abs/1406.2661
- CycleGAN – Change doodles to real images https://arxiv.org/abs/1703.10593
- Conditional GAN – Control the output of GAN by classes https://arxiv.org/abs/1411.1784
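The faker/detective competition described above comes down to two opposing loss functions. The sketch below shows only those losses, computed on made-up detective scores; it leaves out the networks and the training loop. The function names are hypothetical. The generator loss uses the commonly used non-saturating form (the faker maximizes the log of the detective's score on fakes), which is an assumption about which variant to show.

```python
import math


def discriminator_loss(real_scores, fake_scores):
    """Binary cross-entropy for the detective.

    Scores are probabilities in (0, 1) that an image is real.
    The detective wants scores near 1 on real images and near 0 on fakes.
    """
    real_term = sum(-math.log(s) for s in real_scores) / len(real_scores)
    fake_term = sum(-math.log(1.0 - s) for s in fake_scores) / len(fake_scores)
    return real_term + fake_term


def generator_loss(fake_scores):
    """Non-saturating loss for the faker: it wants the detective to score fakes near 1."""
    return sum(-math.log(s) for s in fake_scores) / len(fake_scores)


# Made-up scores. Early on the detective easily spots the fakes.
early_fake_scores = [0.1, 0.2]
# Later the detective can no longer tell the difference.
late_fake_scores = [0.5, 0.5]
real_scores = [0.9, 0.8]

print("faker loss early:", generator_loss(early_fake_scores))
print("faker loss late: ", generator_loss(late_fake_scores))
print("detective loss early:", discriminator_loss(real_scores, early_fake_scores))
print("detective loss late: ", discriminator_loss(real_scores, late_fake_scores))
```

As the fakes improve, the faker's loss goes down and the detective's loss goes up, which is the tug of war that drives training.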
Semi Supervised Learning
Getting labeled data is expensive, while unlabeled data is abundant. These techniques use a small amount of labeled training data together with lots of unlabeled data.
- Stacked What Where Auto encoders – https://arxiv.org/abs/1506.02351
- Ladder Networks – https://arxiv.org/abs/1507.02672
- Pseudo Labels – http://deeplearning.net/wp-content/uploads/2013/03/pseudo_label_final.pdf
- Surrogate Class – http://papers.nips.cc/paper/5548-discriminative-unsupervised-feature-learning-with-convolutional-neural-networks.pdf
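To make the pseudo-label idea concrete: a model trained on the small labeled set predicts on unlabeled examples, and its own predictions are then reused as training labels. The sketch below keeps only predictions above a confidence threshold. The threshold is an assumption of this sketch, not necessarily the recipe in the linked paper. The probabilities are made up and the function name is hypothetical.

```python
def pseudo_label(probabilities, threshold=0.9):
    """Return (example_index, predicted_class) pairs for confident predictions.

    probabilities: one list of class probabilities per unlabeled example,
    as produced by a model trained on the labeled data.
    """
    selected = []
    for index, probs in enumerate(probabilities):
        # The predicted class is the one with the highest probability.
        label = max(range(len(probs)), key=lambda c: probs[c])
        if probs[label] >= threshold:
            selected.append((index, label))
    return selected


# Made-up model outputs for three unlabeled examples and three classes.
predictions = [
    [0.97, 0.02, 0.01],  # confident: class 0
    [0.40, 0.35, 0.25],  # not confident: skipped
    [0.05, 0.03, 0.92],  # confident: class 2
]

print(pseudo_label(predictions))
```

The selected pairs would then be added to the labeled set for the next round of training.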
Visual Question Answering / Reasoning
Research on being able to ask questions about images, e.g. asking whether there are more blue balls than yellow balls in an image.
- Inferring and Executing Programs For Visual Reasoning – https://arxiv.org/abs/1705.03633
- Relation Networks from DeepMind, a generic NN component that can be used in visual and text QA systems – https://arxiv.org/abs/1706.01427.pdf
Being able to take a picture and a style image, e.g. a painting, and redraw the picture in the painting style. See my blog on painting like Picasso.
- Neural Artistic Style – https://arxiv.org/abs/1508.06576
Recurrent Neural Networks (RNN)
- LSTM – http://dl.acm.org/citation.cfm?id=1246450 Blog explaining LSTM – http://colah.github.io/posts/2015-08-Understanding-LSTMs/
- GRU – https://arxiv.org/abs/1502.02367
This is an area of unsupervised learning. An auto encoder is a neural network that tries to recreate its input, e.g. give it any picture and it will try to recreate the same image. Why would anyone want to do that? To recreate images, the neural network has to learn a condensed representation of them, which is possible because images share commonalities. Auto encoders can be used to pre-train a neural network with unlabeled data.
- Lecture Notes Sparse Auto encoders – https://web.stanford.edu/class/cs294a/sparseAutoencoder.pdf
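The auto encoder idea above can be written as a single training objective. The notation is introduced here for illustration and is not taken from the lecture notes: f is the encoder with parameters theta, g is the decoder with parameters phi, h is the condensed representation, and squared error is assumed as the reconstruction loss.

```latex
h = f_\theta(x), \qquad \hat{x} = g_\phi(h), \qquad
\mathcal{L}(\theta, \phi) = \frac{1}{N} \sum_{i=1}^{N}
  \left\lVert x_i - g_\phi\bigl(f_\theta(x_i)\bigr) \right\rVert^2
```

No labels appear in the loss, which is why unlabeled data is enough. When h has fewer dimensions than x, a low loss is only reachable by keeping what the images have in common.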
Visualizing High Dimensional Data
- Reading Text in the Wild – https://www.robots.ox.ac.uk/~vgg/publications/2016/Jaderberg16/jaderberg16.pdf
- Neural Programmer Interpreters, learn to program for simple tasks – https://arxiv.org/abs/1511.06279
- Visual Interaction Networks – DeepMind paper on learning to predict the physical future of objects from a few frames – https://arxiv.org/abs/1706.01433
Released CatGan code. This was done as the last assignment for the NYU Deep Learning course, taught by Yann LeCun. It is a conditional GAN that can be trained to generate 4 different types of cats, i.e. white, golden, black and mix.
The following is output conditioned on golden cats. My favorite one is the 3rd from the right in the first row. Every time the GAN is run it will generate unique cats like these. For more cats visit the GitHub page.
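One common way to condition a GAN on a class is to append a one-hot class vector to the random noise that is fed to the generator. The sketch below shows only that input construction. It is an illustration of the general idea and not necessarily how the released CatGan code does it; the function names and the noise size of 8 are assumptions.

```python
import random

# The four cat types mentioned above.
CAT_TYPES = ["white", "golden", "black", "mix"]


def one_hot(cat_type):
    """Encode a cat type as a vector with a single 1.0 at its position."""
    if cat_type not in CAT_TYPES:
        raise ValueError("unknown cat type: " + cat_type)
    return [1.0 if name == cat_type else 0.0 for name in CAT_TYPES]


def generator_input(cat_type, noise_dim=8, rng=random):
    """Build the generator input: fresh Gaussian noise followed by the class code."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return noise + one_hot(cat_type)


vector = generator_input("golden")
print(len(vector))   # noise_dim + number of cat types
print(vector[-4:])   # the class code for golden
```

The noise part changes on every call, which is why each run produces different cats, while the class code stays fixed for the requested type.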
I have released IPython tutorial notebooks for neural networks using PyTorch. PyTorch is a Python implementation of Torch released by Facebook. It is what is being used in the Deep Learning course that I am taking at NYU, taught by Professor Yann LeCun.
This uses the autograd feature that is unique to PyTorch and Torch (not available in TensorFlow). This is a PyTorch version of the cs231n case study http://cs231n.github.io/neural-networks-case-study/
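For readers new to autograd, the idea is that each operation records how its result was computed, so gradients can be pushed back through the recorded graph automatically. The toy below illustrates that idea for scalars in plain Python. It is not PyTorch's API or implementation; the `Value` class and its methods are hypothetical names used only for this sketch.

```python
class Value:
    """A scalar that remembers how it was computed so gradients can flow back."""

    def __init__(self, data, parents=(), grad_fns=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        # One function per parent: maps the gradient of this node
        # to that parent's share of the gradient.
        self._grad_fns = grad_fns

    def __add__(self, other):
        return Value(self.data + other.data, (self, other),
                     (lambda g: g, lambda g: g))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other),
                     (lambda g: g * other.data, lambda g: g * self.data))

    def backward(self):
        """Run reverse-mode differentiation starting from this node."""
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Process each node only after every node that depends on it.
        for node in reversed(order):
            for parent, grad_fn in zip(node._parents, node._grad_fns):
                parent.grad += grad_fn(node.grad)


# y = w * x + b, so dy/dw = x, dy/dx = w, dy/db = 1
x, w, b = Value(3.0), Value(2.0), Value(1.0)
y = w * x + b
y.backward()
print(y.data, w.grad, x.grad, b.grad)
```

With a real autograd library, a network's forward pass is written as ordinary code and the backward pass comes for free in the same way.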
Have been researching the available options for taking a deep learning course while living in NY/NJ. I have already taken most of the free content: cs231n, the Coursera machine learning course, and Udacity. Looking into either NYU or Stanford for an official course for Winter 2017.
Free online courses
- Deep Learning Convolutional Neural Networks, Stanford, http://cs231n.stanford.edu/
- Deep Learning Natural Language Processing, Stanford http://cs224d.stanford.edu/syllabus.html
- Deep Learning Udacity, https://www.udacity.com/course/deep-learning--ud730
- Deep Learning, NYU, http://cilvr.cs.nyu.edu/doku.php?id=deeplearning:slides:start
- Deep Learning Natural Language Processing, Stanford Online, http://scpd.stanford.edu/search/publicCourseSearchDetails.do?method=load&courseId=11754
- Self Driving Car Engineer, Udacity, https://www.udacity.com/drive
- Introduction to Deep Learning, Princeton, https://www.cs.princeton.edu/courses/archive/spring16/cos495/
Notes from ICML 2016 Held in New York
Attended the biggest machine learning conference ever, by number of participants and papers. There is red-hot interest in deep learning and reinforcement learning. Great advancements in vision (Microsoft deep residual networks, with neural networks up to 1000 layers deep), speech to text (Baidu Deep Speech 2), and reinforcement learning (the DeepMind A3C algorithm, where an AI player learns to explore and play in a 3D Labyrinth maze, from the folks who developed AlphaGo). Image captioning and understanding is getting even more sophisticated (dense captioning work by Fei-Fei Li and team). Language understanding is still lagging and needs a breakthrough; however, a couple of papers from Metamind about question answering on text, and especially on images, seemed promising.
Active areas that need more digging
- Memory / attention
- Ways to teach machines with less data. Currently deep learning is data hungry and needs lots of annotated data
- Understanding the story in an image (Dr. Fei-Fei Li's work)
- Text understanding, lags image and speech
My personal conclusion is that there is still a long way to go towards the goal of strong AI. Though AlphaGo (the DeepMind system that beat human champions at Go) and DeepQ are great strides in AI, these systems only learn by intuition encoded in neural network weights, backed by huge compute resources, and this learning seems to be different from the way humans learn. A true AI system should be able to use the same architecture for driving a car, learning to play chess, learning a new language, or cooking. I feel that if breakthroughs are not made in a few more years, there could be another AI winter coming. At the same time, it also feels like we are almost there in the quest for true AI!
- Metamind acquired by Salesforce. Worth watching the Salesforce conference announcements to see how they intend to use deep learning technologies.
- NVidia and NYU partner to develop end to end neural network for autonomous cars
- Clarifai – NY based startup for image captioning. Interesting use cases for CMS and for accessibility.
- Netflix – Patterns for machine learning. Netflix uses Time Machine, an interesting architecture for training models using production data.
- Maluuba – Up-and-coming Canadian startup that specializes in natural language processing. Claimed that their results are better than Google/Facebook.
Reading List For Papers presented
My synthesized list to read over
- Deep Speech 2 : End-to-End Speech Recognition in English and Mandarin
- Pixel Recurrent Neural Networks (Best paper award)
- Dueling Network Architectures for Deep Reinforcement Learning (Best paper award)
- Control of Memory, Active Perception, and Action in Minecraft
- Ask Me Anything: Dynamic Memory Networks for Natural Language Processing (Metamind Question Answering)
- Dynamic Memory Networks for Visual and Textual Question Answering (Metamind on question answering on images, e.g. asking of an image "what sport is he playing?")
- Asynchronous Methods for Deep Reinforcement Learning
- Learning Simple Algorithms from Examples
Important List for Papers Referenced From Previous Conferences
- Teaching Machines to Read and Comprehend.
- Neural Turing Machines.
- End-To-End Memory Networks
- Playing Atari with Deep Reinforcement Learning
- Dr. Fei-Fei Li (Stanford) after her keynote. Her work on image captioning was covered in the NYTimes. Interesting talk about dense captioning, her latest work on understanding the story in an image.
- Yann LeCun (NYU) after his workshop discussion. Asked about meta thinking, learning to think. Also asked if he will be teaching the deep learning course at NYU next spring, which he confirmed.
- David Silver (Google DeepMind). Excellent tutorial on deep reinforcement learning, covering agents that learnt to play arcade games just from raw pixel data, and AlphaGo. Asked him what the limitations are, and he told me that the challenges are in robotics, where decisions have to be made quicker, and with rewards that are far in the future, e.g. needle-in-the-haystack rewards.
- Richard Socher (Metamind CEO; Metamind was bought by Salesforce). Chat at the poster session about his paper on question answering on text and images. Am curious to know how Salesforce intends to use deep learning. Wonder if SugarCrm is diving into machine learning.
- Matthew Zeiler (Clarifai CEO). Met at the Intrepid after party. Clarifai provides an API for image analysis. Discussion on interesting use cases for the news industry.
- Justin Basilico (Machine Learning, Netflix). Movie recommendations, including which rows and positions the movies appear in, are all driven by machine learning. Netflix has a catalog of machine learning design patterns. Discussion about the Time Machine design pattern.
- Adam Trischler (Maluuba Researcher). Talk about question answering systems. Maluuba is a Canadian startup that will soon release products, and claims to have better results than Facebook and Google on public datasets.
- Howard Mansell (Facebook AI). Chat about Torch usage in Facebook. The talk was about how Torch is a deep learning tool for research.
- James Zhang (Bloomberg Machine Learning Researcher). Discussion about how to use news in time series prediction.
- Yan Xu (SAS). Talk about how deep learning can be used in marketing automation. SAS is working on predictive modeling.