Category Archives: Technology

Deep Learning Image Recognition and Detection on iOS Camera Using TensorFlow

Classification on iOS

Just ran my first deep learning model with the camera app example. Pretty good image recognition!

Detection on iOS

The next level is object detection, i.e. drawing a bounding box around each detected object.

Image classification iOS camera using deep learning

image detection using deep learning put a bounding box


Naveed’s favorite Deep Learning papers

Deep learning is progressing rapidly. There is an interesting new research paper every other week. This is a list of essential deep learning research, by category.



Convolution Neural Networks (CNN)

These are the recent advances in CNNs. The original was LeNet-5, from LeCun's 1998 paper mentioned above.

Image Detection

Finding a bounding box around different objects is harder than simply classifying an image. This is a class of image localization and detection problems.

Generative Adversarial Neural Networks

One of the hottest areas of research. This is a class of algorithms where two neural networks are trained against each other to generate, e.g., realistic images. One network produces fake images (the faker), and the other network learns to tell fake from real (the detective). Both networks compete with each other and try to get better at their jobs, until the faker is so good that it can generate realistic images. Fake it till you make it!

Semi Supervised Learning

Getting labeled data is expensive, while unlabeled data is abundant. These techniques use a small amount of labeled training data together with lots of unlabeled data.

Visual Question Answering / Reasoning

Research on being able to ask questions about images, e.g. asking whether there are more blue balls than yellow ones in an image.

Neural Style

Being able to take a picture and a style image, e.g. a painting, and redraw the picture in the style of the painting. See my blog post on painting like Picasso.

Recurrent Neural Networks (RNN)


Auto Encoders

This is an area of unsupervised learning. An auto encoder is a neural network that tries to recreate its input, e.g. give it any picture and it will try to reproduce the same image. Why would anyone want to do that? Because the network is forced to learn a condensed representation of the images, exploiting what they have in common. Auto encoders can be used to pre-train a neural network with unlabeled data.
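To make the "condensed representation" idea concrete, here is a minimal sketch in plain Python (no deep learning library). It is a toy under stated assumptions, not how image auto encoders are built in practice: the "images" are 2-D points on a line, the network is linear, and the function names `train_autoencoder` and `reconstruction_error` are made up for this illustration.

```python
import random

def train_autoencoder(data, steps=5000, lr=0.02, seed=0):
    """Train a linear 2 -> 1 -> 2 autoencoder with stochastic gradient descent."""
    rng = random.Random(seed)
    enc = [rng.uniform(-0.5, 0.5) for _ in range(2)]  # encoder weights
    dec = [rng.uniform(-0.5, 0.5) for _ in range(2)]  # decoder weights
    for _ in range(steps):
        x = rng.choice(data)
        code = enc[0] * x[0] + enc[1] * x[1]            # condensed representation
        err = [dec[i] * code - x[i] for i in range(2)]  # reconstruction minus input
        # Gradient of the squared error with respect to the code.
        d_code = 2.0 * (err[0] * dec[0] + err[1] * dec[1])
        for i in range(2):
            dec[i] -= lr * 2.0 * err[i] * code
            enc[i] -= lr * d_code * x[i]
    return enc, dec

def reconstruction_error(data, enc, dec):
    """Mean squared reconstruction error over the data set."""
    total = 0.0
    for x in data:
        code = enc[0] * x[0] + enc[1] * x[1]
        total += sum((dec[i] * code - x[i]) ** 2 for i in range(2))
    return total / len(data)

# Toy "images": 2-D points that all lie on the line y = 2x.
data = [(t, 2.0 * t) for t in (-1.0, -0.5, 0.5, 1.0)]
untrained = train_autoencoder(data, steps=0)
trained = train_autoencoder(data)
print("error before training:", reconstruction_error(data, *untrained))
print("error after training: ", reconstruction_error(data, *trained))
```

Because every point lies on one line, a single number is enough to describe it, and that is what the one-unit code learns. No labels are used anywhere.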

Visualizing High Dimensional Data

Text Recognition

Neural Programming

Neural Physics

CatGAN – Cat Faces Generative Adversarial Networks Conditional GAN Using Pytorch

Released the CatGAN code. This was done as the last assignment for the NYU Deep Learning course, taught by Yann LeCun. This is a conditional GAN, and it can be trained to generate 4 different types of cats, i.e. white, golden, black and mixed.

The following is output conditioned on golden cats. My favorite one is the 3rd from the right in the first row. Every time the GAN is run it will generate unique cats like these. For more cats visit the GitHub page.

Golden Cats from CatGAN




PyTorch Deep Learning Neural Network and Chain Rule Tutorial

I have released IPython tutorial notebooks for neural networks using PyTorch. PyTorch is a Python implementation of Torch, released by Facebook. This is what is being used in the Deep Learning course that I am taking at NYU, taught by professor Yann LeCun.

This uses the autograd feature of PyTorch and Torch, which builds the computation graph dynamically as the code runs (TensorFlow at the time required the graph to be defined up front). This is a PyTorch version of cs231n.
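To show what the chain rule does here, the following is a minimal sketch in plain Python. It is not PyTorch and it is not taken from the notebooks: the tiny one-unit network and the function names are assumptions chosen for illustration. It derives the gradients by hand with the chain rule and checks them against a numerical estimate.

```python
import math

def forward(w1, w2, x):
    """Tiny network: h = tanh(w1 * x), y = w2 * h."""
    h = math.tanh(w1 * x)
    y = w2 * h
    return h, y

def loss_and_grads(w1, w2, x, target):
    """Squared-error loss and its gradients, derived with the chain rule."""
    h, y = forward(w1, w2, x)
    loss = (y - target) ** 2
    dloss_dy = 2.0 * (y - target)      # d(loss)/dy
    dloss_dw2 = dloss_dy * h           # dy/dw2 = h
    dloss_dh = dloss_dy * w2           # dy/dh = w2
    dh_dz = 1.0 - h ** 2               # derivative of tanh(z), with z = w1 * x
    dloss_dw1 = dloss_dh * dh_dz * x   # dz/dw1 = x
    return loss, dloss_dw1, dloss_dw2

def numerical_grad(f, v, eps=1e-6):
    """Central-difference estimate of df/dv, used only as a sanity check."""
    return (f(v + eps) - f(v - eps)) / (2.0 * eps)

if __name__ == "__main__":
    w1, w2, x, target = 0.5, -1.5, 2.0, 1.0
    loss, g1, g2 = loss_and_grads(w1, w2, x, target)
    n1 = numerical_grad(lambda v: loss_and_grads(v, w2, x, target)[0], w1)
    n2 = numerical_grad(lambda v: loss_and_grads(w1, v, x, target)[0], w2)
    print(f"loss={loss:.4f}")
    print(f"dL/dw1 analytic={g1:.6f} numeric={n1:.6f}")
    print(f"dL/dw2 analytic={g2:.6f} numeric={n2:.6f}")
```

An autograd engine records these local derivatives automatically during the forward pass, so the backward step does not have to be written by hand.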


Deep Learning Courses Free / Paid

I have been researching the available options for taking a deep learning course while living in NY/NJ. I have already taken most of the free content: cs231n, the Coursera machine learning course, and Udacity. I am looking into either NYU or Stanford for an official course for Winter 2017.

Free online courses

Paid courses



ICML 2016 – International Conference for Machine Learning Notes

Notes from ICML 2016 Held in New York

David Silver (Deep Mind), Yoshua Bengio (Univ of Montreal)



Attended the biggest machine learning conference ever, in number of participants and papers. There is red hot interest in deep learning and reinforcement learning. Great advancements in vision (Microsoft's deep residual networks, with neural networks up to 1000 layers deep), speech to text (Baidu Deep Speech 2), and reinforcement learning (DeepMind's A3C algorithm, where an AI player learns to explore and play in a 3D Labyrinth maze, from the folks who developed AlphaGo). Image captioning and understanding are getting even more sophisticated (dense captioning work by Fei-Fei Li and team). Language understanding is still lagging and needs a breakthrough; however, a couple of papers from MetaMind about question answering on text, and especially on images, seemed promising.

Active areas that need more digging

  • Memory / attention
  • Ways to teach machines with less data. Currently deep learning is data hungry and needs lots of annotated data
  • Understanding the story in an image (Dr. Fei-Fei Li's work)
  • Text understanding, which lags image and speech

My personal conclusion is that there is still a long way to go towards the goal of strong AI. Though AlphaGo (the DeepMind system that beat a top human Go player) and Deep Q-Networks are great strides in AI, these systems only learn by intuition encoded in neural network weights, backed by huge compute resources, and this learning seems to be different from the way humans learn. A true AI system should be able to apply the same architecture to driving a car, playing chess, learning a new language, or cooking. I feel that if breakthroughs are not made in a few more years, there could be another AI winter coming. Yet at the same time it feels like we are almost there in the quest for true AI!


  • MetaMind was acquired by Salesforce. Worth watching the Salesforce conference announcements to see how they intend to use deep learning technologies.
  • NVidia and NYU partner to develop an end-to-end neural network for autonomous cars
  • Clarifai – NY-based startup for image captioning. Interesting use cases for CMS and for accessibility.
  • Netflix – Patterns for machine learning. Netflix uses Time Machine, an interesting architecture for training models using production data.
  • Maluuba – Up-and-coming Canadian startup that specializes in natural language processing. Claimed that their results are better than Google's and Facebook's.

Reading List For Papers presented

All papers presented at ICML 2016

My synthesized list to read over

Important List for Papers Referenced From Previous Conferences

People Met

  • Dr. Fei-Fei Li (Stanford), after her keynote. Her work on image captioning has been covered in the NYTimes. Interesting talk about dense captioning, her latest work on understanding the story in an image.
  • Yann LeCun (NYU). After his workshop discussion, I asked about meta thinking, i.e. learning to think. Also asked if he will be teaching the deep learning course at NYU next spring, which he confirmed.
  • David Silver (Google DeepMind). Excellent tutorial on deep reinforcement learning, covering the system that learnt to play arcade games just from raw pixel data, and AlphaGo. I asked him what the limitations are, and he told me that the challenges are in robotics, where decisions have to be made quicker, and in rewards that are far in the future, e.g. needle-in-a-haystack rewards.
  • Richard Socher (MetaMind CEO; MetaMind was bought by Salesforce). Chat at the poster session about his paper on question answering on text and images. Am curious to know how Salesforce intends to use deep learning. Wonder if SugarCRM is diving into machine learning.
  • Matthew Zeiler (Clarifai CEO). Met at the Intrepid after-party. Clarifai provides an API for image analysis. Discussion on interesting use cases for the news industry.
  • Justin Basilico (Machine Learning, Netflix). Movie recommendations, which rows and positions a movie appears in etc., are all driven by machine learning. Netflix has a catalog of machine learning design patterns. Discussion about the Time Machine design pattern.
  • Adam Trischler (Maluuba researcher). Talked about question answering systems. Maluuba is a Canadian startup that will soon release products, and claims to have better results than Facebook and Google on public datasets.
  • Howard Mansell (Facebook AI). Chat about Torch usage at Facebook. The talk was about how Torch is a deep learning tool for research.
  • James Zhang (Bloomberg machine learning researcher). Discussion about how to use news in time series prediction.
  • Yan Xu (SAS). Talked about how deep learning can be used in marketing automation. SAS is working on predictive modeling.


David Silver Deep Mind


Yann LeCun at ICML 2016 Workshop

Dr Fei-Fei Li Keynote


Google Arm Robot

Google Arm Robot Researcher (the Google IO one)

Google Vision API Quick Test on Gym Schedule

While driving back with next week's gym schedule, a startup idea struck me. Wouldn’t it be nice if I could just snap a picture of the gym schedule and have all the classes added to my calendar? Even though the schedule is available online, I felt it would be convenient for some people to add it to their calendar with a snap. I had also been looking into the Google Vision API, which Google released at the recently concluded GCP Next, and thought this would be a nice test of the API's usability.

To use the Google Cloud APIs, one needs to sign up for the free trial (which requires a credit card). To get started with the Vision API, here is a quick tutorial.

So I tried it out. I took a snap of the gym schedule and uploaded the image to a Cloud Platform storage bucket. Then I changed the image URL in the request to point to my uploaded image, and changed the request type to TEXT_DETECTION. The request took less than a second to complete and returned a JSON response. The response is divided into two sections. The first is a list of all the text detected. The second has the coordinates of the bounding rectangle for each word detected.

I was specifically looking to see whether it detected the Bodypump class at 7:30pm on Thursday, apr-21. My hope was that, using the coordinates of the Bodypump box, I could determine the date from the column header and the time from the row header with some geometry/math.

  • It was not able to detect all the times ending in p.m. in the row headers.
  • It did detect the Bodypump blobs even though they were in white font on a black background, which I felt was smart work by the API.
  • It was not able to detect the smaller text under Bodypump, e.g. Studio 1, 45 mins, Nancy.
  • It was able to detect some variations of the fonts. Notice it detects Fitness, but not the 24 Hour at the top of the page.

The Google Vision API does a nice job detecting some of the text, but the accuracy is not good enough to base my app idea on, i.e. capture the text, use the coordinates to find the time and date for each class, and automatically add it to my calendar. I will have to wait until the API is more accurate.
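For reference, the geometry idea can be sketched in a few lines of Python, assuming every header is detected. This is only an illustration: the helper names (`center`, `nearest_header`, `box`), the simplified annotation shape, and all coordinates are hypothetical, not taken from the real response. The sketch picks the day header closest to the class box horizontally and the time header closest to it vertically.

```python
def center(vertices):
    """Return the (x, y) center of a bounding polygon given as a list of dicts."""
    xs = [v["x"] for v in vertices]
    ys = [v["y"] for v in vertices]
    return sum(xs) / len(xs), sum(ys) / len(ys)

def nearest_header(target, headers, axis):
    """Pick the header whose center is closest to the target along one axis.

    axis=0 compares x (column headers), axis=1 compares y (row headers).
    """
    t = center(target["vertices"])[axis]
    return min(headers, key=lambda h: abs(center(h["vertices"])[axis] - t))

def box(text, x0, y0, x1, y1):
    """Build a simplified annotation from the corners of a rectangle."""
    return {
        "text": text,
        "vertices": [
            {"x": x0, "y": y0}, {"x": x1, "y": y0},
            {"x": x1, "y": y1}, {"x": x0, "y": y1},
        ],
    }

# Hypothetical coordinates, for illustration only.
day_headers = [box("wed apr 20", 700, 200, 900, 240),
               box("thu apr 21", 950, 200, 1150, 240)]
time_headers = [box("6:30pm", 60, 1000, 180, 1040),
                box("7:30pm", 60, 1100, 180, 1140)]
bodypump = box("BODYPUMP", 960, 1095, 1140, 1145)

day = nearest_header(bodypump, day_headers, axis=0)["text"]
slot = nearest_header(bodypump, time_headers, axis=1)["text"]
print(day, slot)  # thu apr 21 7:30pm
```

The lookup is only as good as the detected headers: with the p.m. row headers missing from the response, the nearest time header would be the wrong one.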

See schedule image, request and response below.



The following is the API request

{
 "requests": [{
   "features": [{"type": "TEXT_DETECTION"}],
   "image": {
    "source": {"gcsImageUri": "gs://emailclassification/24ScheduleMine.JPG"}
   }
 }]
}

The following is the API response

{
 "responses": [{
   "textAnnotations": [
    {
     "locale": "en",
     "description": "FITITESS\nHOUR\nGROUP X\nweek of april 18, 2016\nmon apr 18\ntue apr-19\nwed apr 20\nthu apr 21\nfri apr 22\nsat apr 23\nsun apr 24\n5:30am\n8:00am\n9:00am\nBODYPUMP\nBODYCOMBAT BODYPUMP BODYPUMP\nBODYCOMBAT\n0 M\n9:30am\nPop\n30 M\nBODYPUMP\nZUMBA\n10:00am\nOZVMSA\nBODYFLOW\nZUMBA\nStudio\nZUMBA\nBODY FLOW\n11:00am\nBODYPUMP\nPLYO\nGRIT STRENGTH 30M\nStudeo 2\nBODYPUMP\nchedule\n8 AM and 9 PM only. The\nmembers may attend ca\ny Aq\nd Zumba Gold\nd by S\n",
     "boundingPoly": {
      "vertices": [
        {"x": 66, "y": 142},
        {"x": 1962, "y": 142},
        {"x": 1962, "y": 1449},
        {"x": 66, "y": 1449}
      ]
     }
    },
    {
     "description": "FITITESS",
     "boundingPoly": {
      "vertices": [
        {"x": 66, "y": 921},
        {"x": 67, "y": 535},
        {"x": 143, "y": 535},
        {"x": 142, "y": 921}
      ]
     }
    }
   ]
 }]
}


Full API Response Text File

Deep Learning – My Painting, Picasso Style – Neural Algorithm of Artistic Style


 Naveed Portrait Deep Art


A weekend project, reading through this interesting paper. The gist of the paper is that it is possible to extract the content of one image and the style of another, and produce a third image with the content and style mixed. This is enabled by deep convolutional neural networks. Convolutional networks learn features in a hierarchy, where lower layers learn features such as line segments, and each layer above learns higher abstractions, such as a nose, a face or a scenery. In the example above, my own school picture plus a Picasso artwork produced a fascinating painting of me.
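In the paper, content is compared through the feature maps of a layer, while style is compared through the correlations between those feature maps (a Gram matrix). Here is a minimal sketch of that distinction in plain Python. The feature values are made up, the function names are hypothetical, the paper's normalization constants are left out, and this is not the code from the repository.

```python
def gram_matrix(features):
    """Gram matrix of feature maps.

    `features` is a list of channels, each a flat list of activations.
    Entry (i, j) is the dot product of channel i with channel j.
    """
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in features]
            for fi in features]

def style_distance(feats_a, feats_b):
    """Sum of squared differences between two Gram matrices."""
    ga, gb = gram_matrix(feats_a), gram_matrix(feats_b)
    return sum((x - y) ** 2
               for row_a, row_b in zip(ga, gb)
               for x, y in zip(row_a, row_b))

def content_distance(feats_a, feats_b):
    """Sum of squared differences between the raw feature maps."""
    return sum((x - y) ** 2
               for ch_a, ch_b in zip(feats_a, feats_b)
               for x, y in zip(ch_a, ch_b))

# Two channels over four spatial positions (made-up numbers).
image = [[1.0, 0.0, 2.0, 0.0],
         [0.0, 1.0, 0.0, 2.0]]
# Same activations at shuffled positions: layout changes, correlations do not.
shuffled = [[0.0, 2.0, 0.0, 1.0],
            [2.0, 0.0, 1.0, 0.0]]

print(style_distance(image, shuffled))    # 0.0
print(content_distance(image, shuffled))  # 20.0
```

The Gram matrix ignores where things are in the image, which is why it behaves like "style". The algorithm adjusts the pixels of a new image until its content distance to the photo and its style distance to the painting are both small.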

The paper is available here, A Neural Algorithm of Artistic Style

The source code for it is available here GitHub Source

The fastest way to get started on a Mac or PC is to use a Docker image with precompiled dependencies. It took about 10 hours to generate the masterpiece on the CPU. On a GPU-based machine it would take 15 or so minutes.

Get Docker and start the Docker terminal. Raise the memory of the Docker VirtualBox machine to a higher number such as 8GB: stop the virtual machine, then change its system settings in VirtualBox.

docker pull kchentw/neural-style

docker run -i -t kchentw/neural-style /bin/bash

# To exit the shell without terminating the container, press CTRL+P then CTRL+Q

docker ps  # will show container id

docker cp picaso.jpg <container_id>:/tmp

docker cp kid.jpg <container_id>:/tmp

docker attach <container_id>

cd ~/neural-style

th neural_style.lua -gpu -1 -style_image /tmp/picaso.jpg -content_image /tmp/kid.jpg

Wait for about 10 or so hours and it will produce out.png, the Picasso masterpiece!

Sublime Text 3 is my favorite text editor

This text editor is by far my favorite! It picks up where TextMate left off. It has the ease of use of a simple text editor, and can be extended to mimic a full-fledged IDE. It has great plugin support. I especially love the SublimeREPL plugin, which makes REPL-style scripting very easy (easier than Emacs). I have tried the REPL with Python, Octave and shell, and it works really well. Some useful plugins are

  • SublimeREPL – for REPL integration with Python, shell, Octave, OCaml, Scheme etc.
  • GoSublime – for GoLang development
  • Package Control – the first plugin to install, to manage all other plugins
  • Git – Can do most git commands from Sublime
  • Terminal – Open the path of a file in xterm

Command+P or Command+Shift+P gives you the power to search, or to run most of your plugins.


At JavaOne 2013

Went to JavaOne 2013, held in San Francisco. It was a great learning experience. However, I felt that the JavaOne website for scheduling conference sessions was a bit outdated. Java seems to be an aging language, though still the most used. As the architect for Java said, Java is not dead yet, referencing the new features of Java 8, i.e. closures and streams.

Some of the stuff I learnt

  • Java 8 closures and streams
  • Programming with Java on Raspberry Pi. See my project mentioned on the Java site, called Domotix.
  • MongoDB
  • Elasticsearch
  • New features for Spring
  • Introduction to R

Also saw Maroon 5 at the Oracle party at Treasure Island.

With the humanoid robot Nao at the conference. Nao is the humanoid platform for the RoboCup tournament.

Synology NAS – my home cloud

Recently I purchased a Synology NAS Server DS213. This is to archive all my digital content

  • Pictures
  • Documents
  • Videos
  • Music

into a central repository. The NAS is 

  • Accessible from all my devices: Mac, iPhone, iPad, Xbox. Running 24×7
  • Durable and redundant: it uses the Synology RAID system with 2 x 4TB of disk space.
  • Extra redundancy with external drives taking backups of documents and pictures: 1 weekly backup and 1 monthly backup. The monthly drive is stored in a separate place.
  • VPN server, though I have disabled it for now. This allows me to VPN into my home network and, for example, access my local machines, or watch YouTube on networks that block it 😛
  • Picture and video server.
  • Git over SSH server.
  • Time Machine backup
  • Can SSH in and program in Python, and optionally install Java on the server.
  • iPhone/iPad apps to look at pictures and videos

I have put all my images there, and digitized all my home video tapes and put them there too. My kids now enjoy watching their childhood videos, which had been sitting in drawers for about a decade, on their iPads.

Future Projects of the NAS

  • Digitize all the older paper pictures and documents, and store them on the NAS.
  • Explore the Asterisk server and make it a home-based VoIP server.
  • Could make it a web and wiki server, but am hesitant to expose it to the internet.
  • Integrate the home security camera system.