ICML 2016 – International Conference on Machine Learning Notes

Notes from ICML 2016 Held in New York

David Silver (DeepMind), Yoshua Bengio (Univ. of Montreal)

Summary

Attended the biggest machine learning conference ever, in both number of participants and number of papers. There was red-hot interest in deep learning and reinforcement learning. Great advancements in vision (Microsoft's deep residual networks, neural networks around 1,000 layers deep), speech to text (Baidu's Deep Speech 2), and reinforcement learning (DeepMind's A3C algorithm, where an AI player learns to explore and play in a 3D Labyrinth maze, from the folks who developed AlphaGo). Image captioning and understanding are getting even more sophisticated (dense captioning work by Fei-Fei Li and team). Language understanding is still lagging and needs a breakthrough; however, a couple of papers from MetaMind about question answering systems on text, and especially on images, seemed promising.

Active areas that need more digging

  • Memory/attention
  • Ways to teach machines with less data. Deep learning is currently data hungry and needs lots of annotated data
  • Understanding the story in an image (Dr. Fei-Fei Li's work)
  • Text understanding, which lags behind image and speech

My personal conclusion is that there is still a long way to go towards the goal of strong AI. Though AlphaGo (the DeepMind system that beat top human players at Go) and DeepQ are great strides in AI, these systems only learn by intuition encoded in neural network weights, backed by huge compute resources, and this learning seems to be different from the way humans learn. A true AI system should be able to use the same architecture and apply it to driving a car, learning to play chess, learning a new language, or cooking. I feel that if breakthroughs are not made in the next few years, there could be another AI winter coming. At the same time, it also feels like we are almost there in the quest for true AI!

Industry

  • MetaMind acquired by Salesforce. Should be watching the Salesforce conference announcements to see how they intend to use deep learning technologies.
  • NVIDIA and NYU partner to develop an end-to-end neural network for autonomous cars.
  • Clarifai – NY-based startup for image captioning. Interesting use cases for CMS and for accessibility.
  • Netflix – Patterns for machine learning. Netflix uses Time Machine, an interesting architecture to train models using production data.
  • Maluuba – Upcoming Canadian startup that specializes in natural language processing. Claimed that their results are better than Google's and Facebook's.

Reading List For Papers presented

All papers presented at ICML 2016

My synthesized list to read over

Important List for Papers Referenced From Previous Conferences

People Met

  • Dr. Fei-Fei Li (Stanford), after her keynote. Her work on image captioning was covered in the NYTimes. Interesting talk about dense captioning, her latest work on understanding the story in an image.
  • Yann LeCun (NYU), after his workshop discussion. Asked him about meta-thinking, i.e. learning to think. Also asked if he will be teaching the deep learning course at NYU next spring, which he confirmed.
  • David Silver (Google DeepMind). Excellent tutorial on deep reinforcement learning, covering agents that learned to play arcade games from raw pixel data alone, and AlphaGo. Asked him what the limitations are, and he told me that the challenges are in robotics, where decisions have to be made more quickly, and in rewards that are far in the future, e.g. needle-in-a-haystack rewards.
  • Richard Socher (MetaMind CEO, acquired by Salesforce). Chat at the poster session about his paper on a question answering system for text and images. Am curious to know how Salesforce intends to use deep learning. Wonder if SugarCRM is diving into machine learning.
  • Matthew Zeiler (Clarifai CEO). Met at the Intrepid after-party. Clarifai provides an API for image analysis. Discussion on interesting use cases for the news industry.
  • Justin Basilico (Machine Learning, Netflix). Movie recommendations, which rows and positions a movie appears in, etc. are all driven by machine learning. Netflix has a catalog of machine learning design patterns. Discussion about the Time Machine design pattern.
  • Adam Trischler (Maluuba Researcher). Talk about question answering systems. Maluuba is a Canadian startup that will soon release products, and claims to have better results than Facebook and Google on public datasets.
  • Howard Mansell (Facebook AI). Chat about Torch usage at Facebook. The talk was about how Torch is a deep learning tool for research.
  • James Zhang (Bloomberg Machine Learning Researcher). Discussion about how to use news in time series prediction.
  • Yan Xu (SAS). Talk about how deep learning can be used in marketing automation. SAS is working on predictive modeling.

Pictures

David Silver, DeepMind

Yann LeCun at ICML 2016 Workshop

Dr Fei-Fei Li Keynote

Google Arm Robot

Google Arm Robot Researcher (the Google IO one)

Google Vision API Quick Test on Gym Schedule

While driving back with next week's gym schedule, a startup idea struck me. Wouldn't it be nice if I could just snap a picture of the gym schedule and have all the classes added to my calendar? Even though the schedule is available online, I felt it would be convenient for some people to add it to their calendar with a snap. I had also been looking into the Google Vision API that Google released at the recently concluded GCP Next, and thought this would be a nice test of the API's usability.

In order to use the Google Cloud API, one needs to sign up for the free trial (needs a credit card). To get started with the Vision API, here is a quick tutorial. So I tried it out. I took a snap of the gym schedule and uploaded the image to a Cloud Platform storage bucket. Then I modified the image URL in the request to point to my uploaded image, and changed the request type to TEXT_DETECTION. The request took less than a second to complete and returned a JSON response.

The JSON response is divided into 2 sections. The first has a list of all the text it has detected. The second has the coordinates of the bounding rectangle for each word it has detected. I was specifically looking at whether it detected the Bodypump class at 7:30pm on Thursday, Apr 21. My hope was that, using the coordinates of the Bodypump sections, I could determine the date from the column header and the time from the row header, using some geometry/math.

  • It was not able to detect all the times ending in p.m. in the row headers.
  • It did detect the Bodypump blobs even though they were in white font on a black background, which I felt was smart work by the API.
  • It was not able to detect the smaller text under Bodypump, e.g. Studio 1, 45 mins, Nancy.
  • It was able to detect some variations of the fonts. Notice it detects Fitness, but not the 24 Hour at the top of the page.

The Google Vision API does a nice job detecting some of the text. But the accuracy is not good enough to base my app idea on, i.e. capture the text, use the coordinates to find the time and date for the class, and automatically add it to my calendar. I will have to wait until the API is more accurate.
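
The geometry idea can be sketched roughly as follows. This is a minimal sketch, assuming the date headers (columns) and time headers (rows) were all detected reliably, which was not the case in my test. The `Box` type, the helper names, and all coordinates below are made up for illustration; they are not taken from the actual response.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Box:
    """A detected word with its axis-aligned bounding rectangle (pixels)."""
    text: str
    left: int
    top: int
    right: int
    bottom: int

    @property
    def center_x(self) -> float:
        return (self.left + self.right) / 2

    @property
    def center_y(self) -> float:
        return (self.top + self.bottom) / 2


def column_header_for(cell: Box, headers: List[Box]) -> Optional[Box]:
    """Pick the date header whose horizontal center is closest to the cell."""
    return min(headers, key=lambda h: abs(h.center_x - cell.center_x), default=None)


def row_header_for(cell: Box, headers: List[Box]) -> Optional[Box]:
    """Pick the time header whose vertical center is closest to the cell."""
    return min(headers, key=lambda h: abs(h.center_y - cell.center_y), default=None)


# Hypothetical coordinates, invented for illustration only.
date_headers = [
    Box("wed apr 20", 800, 200, 1000, 240),
    Box("thu apr 21", 1050, 200, 1250, 240),
]
time_headers = [
    Box("6:30pm", 60, 1000, 160, 1040),
    Box("7:30pm", 60, 1100, 160, 1140),
]
bodypump = Box("BODYPUMP", 1070, 1095, 1230, 1145)

day = column_header_for(bodypump, date_headers)
slot = row_header_for(bodypump, time_headers)
print(day.text, slot.text)  # thu apr 21 7:30pm
```

Matching on the nearest center keeps the sketch short; a real schedule with merged cells or skewed photos would likely need something more robust.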

See schedule image, request and response below.

24ScheduleMine

The following is the API request

POST https://vision.googleapis.com/v1/images:annotate?key={YOUR_API_KEY}
{
 "requests": [
  {
   "features": [
    {
     "type": "TEXT_DETECTION"
    }
   ],
   "image": {
    "source": {
     "gcsImageUri": "gs://emailclassification/24ScheduleMine.JPG"
    }
   }
  }
 ]
}

The following is the API response

{
 "responses": [
  {
   "textAnnotations": [
    {
     "locale": "en",
     "description": "FITITESS\nHOUR\nGROUP X\nweek of april 18, 2016\nmon apr 18\ntue apr-19\nwed apr 20\nthu apr 21\nfri apr 22\nsat apr 23\nsun apr 24\n5:30am\n8:00am\n9:00am\nBODYPUMP\nBODYCOMBAT BODYPUMP BODYPUMP\nBODYCOMBAT\n0 M\n9:30am\nPop\n30 M\nBODYPUMP\nZUMBA\n10:00am\nOZVMSA\nBODYFLOW\nZUMBA\nStudio\nZUMBA\nBODY FLOW\n11:00am\nBODYPUMP\nPLYO\nGRIT STRENGTH 30M\nStudeo 2\nBODYPUMP\nchedule\n8 AM and 9 PM only. The\nmembers may attend ca\ny Aq\nd Zumba Gold\nd by S\n",
     "boundingPoly": {
      "vertices": [
       {
        "x": 66,
        "y": 142
       },
       {
        "x": 1962,
        "y": 142
       },
       {
        "x": 1962,
        "y": 1449
       },
       {
        "x": 66,
        "y": 1449
       }
      ]
     }
    },
    {
     "description": "FITITESS",
     "boundingPoly": {
      "vertices": [
       {
        "x": 66,
        "y": 921
       },
       {
        "x": 67,
        "y": 535
       },
       {
        "x": 143,
        "y": 535
       },
       {
        "x": 142,
        "y": 921
       }
      ]
     }
    },
    ...
   ]
  }
 ]
}
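
To pull the words and their rectangles out of a response like this, something along the following lines could be used. It is a minimal sketch, assuming the response is available as a JSON string; `word_boxes` is a hypothetical helper name, and the embedded sample is trimmed down to the first word entry of the response shown here. The first `textAnnotations` entry holds the full detected text, so it is skipped.

```python
import json

# Trimmed sample: the full-text entry (shortened) plus the first word entry.
SAMPLE = """
{
 "responses": [
  {
   "textAnnotations": [
    {
     "locale": "en",
     "description": "FITITESS\\nHOUR\\n",
     "boundingPoly": {"vertices": [{"x": 66, "y": 142}, {"x": 1962, "y": 142},
                                   {"x": 1962, "y": 1449}, {"x": 66, "y": 1449}]}
    },
    {
     "description": "FITITESS",
     "boundingPoly": {"vertices": [{"x": 66, "y": 921}, {"x": 67, "y": 535},
                                   {"x": 143, "y": 535}, {"x": 142, "y": 921}]}
    }
   ]
  }
 ]
}
"""


def word_boxes(response_text):
    """Return (word, left, top, right, bottom) for each detected word."""
    annotations = json.loads(response_text)["responses"][0].get("textAnnotations", [])
    boxes = []
    # Skip the first entry: it is the full text block, not a single word.
    for ann in annotations[1:]:
        vertices = ann["boundingPoly"]["vertices"]
        # Assumption: a vertex with a missing coordinate is treated as 0.
        xs = [v.get("x", 0) for v in vertices]
        ys = [v.get("y", 0) for v in vertices]
        boxes.append((ann["description"], min(xs), min(ys), max(xs), max(ys)))
    return boxes


for box in word_boxes(SAMPLE):
    print(box)  # ('FITITESS', 66, 535, 143, 921)
```

Note that the box for this word is much taller than it is wide, so the vertices are reduced to a plain left/top/right/bottom rectangle rather than assumed to start at the top-left corner.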

Full API Response Text File

Deep Learning – My Painting Picasso Style – Neural Algorithm of Artistic Style

 

 Naveed Portrait Deep Art

appendsigns

Weekend project reading through this interesting paper. The gist of the paper is that it is possible to extract the content of one image and the style of another, and produce a third image with the content and style mixed. This is enabled by deep convolutional neural networks. Convolutional networks learn features in a hierarchy, where lower layers learn features such as line segments, and each layer above learns higher abstractions, such as a nose, a face, or a scenery. The example above is my own school picture plus a Picasso artwork, which produced a fascinating painting of me.
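
To make the content/style split a little more concrete: in the paper, style is captured by correlations between the feature maps of a layer (a Gram matrix), while content is compared on the feature maps directly. The following is a toy sketch in plain Python, assuming tiny hand-written feature maps in place of real network activations. The function names and numbers are made up for illustration, and the normalization constants from the paper are omitted.

```python
def gram_matrix(features):
    """features: list of feature maps, each flattened to a list of floats.
    Entry (i, j) is the dot product of feature maps i and j."""
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in features]
            for fi in features]


def content_loss(generated, content):
    """Squared difference between the feature maps themselves."""
    return sum((g - c) ** 2
               for fg, fc in zip(generated, content)
               for g, c in zip(fg, fc))


def style_loss(generated, style):
    """Squared difference between the Gram matrices."""
    gg, gs = gram_matrix(generated), gram_matrix(style)
    return sum((x - y) ** 2
               for row_g, row_s in zip(gg, gs)
               for x, y in zip(row_g, row_s))


# Two made-up "feature maps" with three positions each.
style = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
# Same values with two positions swapped: layout differs, correlations do not.
shuffled = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

print(style_loss(shuffled, style))    # 0.0
print(content_loss(shuffled, style))  # 4.0
```

Swapping positions leaves the Gram matrix unchanged, which is why it behaves like a texture or style descriptor rather than a description of where things are in the image.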

The paper is available here, A Neural Algorithm of Artistic Style

The source code for it is available here GitHub Source

The fastest way to get started on a Mac or PC is to use a Docker image with precompiled dependencies. Running on the CPU, it took about 10 hours to generate the masterpiece. On a GPU-based machine it would take 15 or so minutes.

Get Docker and start the Docker terminal. Raise the memory of the Docker VirtualBox machine to a higher value such as 8 GB: stop the virtual machine and change its system settings in VirtualBox.

docker pull kchentw/neural-style

docker run -i -t kchentw/neural-style /bin/bash

# To exit the shell without terminating the container, press CTRL+P, then CTRL+Q

docker ps  # shows the container ID

docker cp picaso.jpg <container-id>:/tmp

docker cp kid.jpg <container-id>:/tmp

docker attach <container-id>

cd ~/neural-style

th neural_style.lua -gpu -1 -style_image /tmp/picaso.jpg -content_image /tmp/kid.jpg

Wait for about 10 or so hours and it will produce out.png, the Picasso masterpiece!

Sublime Text 3 is my favorite text editor

http://www.sublimetext.com/

This text editor is by far my favorite! It picks up where TextMate left off. It has the ease of use of a simple text editor, and can be enhanced to mimic a full-fledged IDE. It has great plugin support. I especially love the SublimeREPL plugin, which makes scripting and REPL programming very easy (easier than Emacs). I have tried the REPL with Python, Octave, and shell, and it works really well. Some useful plugins are

  • SublimeREPL – for REPL integration with Python, shell, Octave, OCaml, Scheme, etc.
  • GoSublime – for Go development
  • Package Control – the first plugin to install, to manage all other plugins
  • Git – can run most git commands from Sublime
  • Terminal – opens the path of the file in xterm

Command+P or Command+Shift+P gives you most of the power to run your plugins or to search.

 

Lahore Winters

In memory of my favorite pastime and hobby when I was a kid and teenager. Basant was the one festival I would wait for all year long, marking days on my calendar. I recently discovered that Basant is celebrated in Brooklyn, and I went to celebrate it. Basant is the festival that celebrates the end of winter and the beginning of spring.

IMG_2420.JPG

At JavaOne 2013

Went to JavaOne 2013, held in San Francisco. It was a great learning experience. However, I felt that the JavaOne website for scheduling conference sessions was a bit outdated. Java seems to be an aging language, though it is the most used. As the architect for Java said, Java is not dead yet, referencing the new features of Java 8, i.e. closures and streams.

Some of the stuff I learnt

  • Java 8 closures and streams
  • Programming with Java on Raspberry Pi. See my project mentioned on the Java site, called Domotix.
  • MongoDB
  • Elasticsearch
  • New features for Spring
  • Introduction to R

Also saw Maroon 5 at the Oracle party at Treasure Island.

With the humanoid robot Nao at the conference. Nao is the humanoid platform for the RoboCup tournament.
IMG_0936