<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: habibc</title>
    <description>The latest articles on Forem by habibc (@habibc).</description>
    <link>https://forem.com/habibc</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F33715%2F122a5921-bc5c-4853-bd50-304e8c18d32f.jpeg</url>
      <title>Forem: habibc</title>
      <link>https://forem.com/habibc</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/habibc"/>
    <language>en</language>
    <item>
      <title>One thousand captcha photos organized with a neural network</title>
      <dc:creator>habibc</dc:creator>
      <pubDate>Fri, 18 Aug 2017 17:22:15 +0000</pubDate>
      <link>https://forem.com/clarifai/one-thousand-captcha-photos-organized-with-a-neural-network</link>
      <guid>https://forem.com/clarifai/one-thousand-captcha-photos-organized-with-a-neural-network</guid>
      <description>&lt;p&gt;In this post, we’ll dive deeper into organizing photos by visual similarity in three steps: embedding via a neural net, further dimension reduction via t-SNE, and snapping things to a grid by solving an assignment problem. Then we’ll walk you through doing this yourself by calling one of our endpoints on your own Clarifai application.&lt;/p&gt;

&lt;p&gt;The image below shows 1024 of the captcha photos used in “I’m not a human: Breaking the Google reCAPTCHA” by Sivakorn, Polakis, and Keromytis, arranged on a 32×32 grid in such a way that visually-similar photos appear in close proximity to each other on the grid.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/http%3A%2F%2Fblog.clarifai.com%2Fwp-content%2Fuploads%2F2017%2F08%2F1000captcha.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/http%3A%2F%2Fblog.clarifai.com%2Fwp-content%2Fuploads%2F2017%2F08%2F1000captcha.jpeg" alt="1000captcha"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;How did we do this?&lt;/h2&gt;

&lt;p&gt;To get from the collection of captcha photos to the grid above we take three steps: embedding via a neural net, further dimension reduction via t-SNE, and finally snapping things to a grid by solving an assignment problem. Images are naturally very high-dimensional objects: even a “small” 224×224 image requires 224*224*3 = 150,528 RGB values. When represented naively as huge vectors of pixels, visually-similar images may have enormous vector distances between them. For example, a left/right flip generates a visually-similar image but can easily leave every pixel in the flipped version with an entirely different value from the original.&lt;/p&gt;
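&lt;p&gt;&lt;em&gt;To make the flip example concrete, here is a quick NumPy sketch (a tiny 8×8 gradient stands in for a real photo):&lt;/em&gt;&lt;/p&gt;

```python
import numpy as np

# A tiny stand-in "image": a left-to-right gradient, 8x8, one channel.
img = np.tile(np.linspace(0.0, 1.0, 8), (8, 1))

# A horizontal flip leaves the image visually similar (just mirrored) ...
flipped = img[:, ::-1]

# ... yet the raw pixel vectors end up far apart: the distance between
# the original and its flip exceeds the norm of the image itself.
pixel_distance = np.linalg.norm(img.ravel() - flipped.ravel())
print(pixel_distance)
print(np.linalg.norm(img.ravel()))
```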

&lt;p&gt;&lt;em&gt;Remark: Code for all of this is available here: &lt;a href="https://github.com/Clarifai/public-notebooks/blob/master/gridded_tsne_blog_public.ipynb" rel="noopener noreferrer"&gt;https://github.com/Clarifai/public-notebooks/blob/master/gridded_tsne_blog_public.ipynb&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs3.amazonaws.com%2Fimtagco%2Fblog%2F2x2captcha.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs3.amazonaws.com%2Fimtagco%2Fblog%2F2x2captcha.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;Step 1: Reducing from 150,528 to 1024 dimensions with a neural net&lt;/h3&gt;

&lt;p&gt;Our photos begin as 224x224x3 arrays of RGB values. We pass each image through an existing pre-trained neural network, Clarifai’s &lt;a href="https://developer.clarifai.com/models/general-embedding-image-recognition-model/bbb5f41425b8468d9b7a554ff10f8581" rel="noopener noreferrer"&gt;general embedding model&lt;/a&gt;, which gives us the activations from one of the top layers of the net. Using the higher layers of a neural net provides representations of our images which are rich in semantic information – the vectors of visually similar images will be close to each other in the 1024-dimensional space.&lt;/p&gt;
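&lt;p&gt;&lt;em&gt;For illustration, a sketch (not official client code) of what a predict call against that embedding model could look like using only the standard library; the authorization-header format and request-body shape are assumptions based on the v2 API, and the request is built but never sent:&lt;/em&gt;&lt;/p&gt;

```python
import json
import urllib.request

# Model id taken from the general embedding model link above.
MODEL_ID = "bbb5f41425b8468d9b7a554ff10f8581"
API_KEY = "YOUR_API_KEY"  # placeholder -- substitute your own Clarifai key

def build_embed_request(image_url):
    """Build (but do not send) a predict request against the embedding model."""
    body = {"inputs": [{"data": {"image": {"url": image_url}}}]}
    return urllib.request.Request(
        "https://api.clarifai.com/v2/models/%s/outputs" % MODEL_ID,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": "Key " + API_KEY,
                 "Content-Type": "application/json"},
        method="POST",
    )

req = build_embed_request("https://example.com/captcha.jpg")
# urllib.request.urlopen(req) would return JSON containing the 1024-D embedding.
```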

&lt;h3&gt;Step 2: Reducing from 1024 to 2 dimensions with t-SNE&lt;/h3&gt;

&lt;p&gt;In order to bring things down to a space where we can start plotting, we must reduce dimensions again. We have lots of options here. Some examples:&lt;/p&gt;

&lt;h4&gt;Inductive methods for embedding learning&lt;/h4&gt;

&lt;p&gt;Techniques such as the remarkably hard-to-Google &lt;a href="http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf" rel="noopener noreferrer"&gt;DrLIM&lt;/a&gt; (Dimensionality Reduction by Learning an Invariant Mapping) or Siamese networks with triplet losses learn a function that can embed new images into fewer dimensions without any additional retraining. These techniques perform extremely well on benchmark datasets and are a great fit for online systems which must index previously-unseen images. For our application, however, we only need to reduce a fixed set of vectors to 2D in one large, slow step.&lt;/p&gt;

&lt;h4&gt;Transductive methods for dimensionality reduction&lt;/h4&gt;

&lt;p&gt;Rather than learning a function which can map new points to a few dimensions, we can attack our problem more directly by learning a mapping from the high-dimensional space to 2D which preserves distances in the high-dimensional space as much as possible. Several techniques are available: &lt;a href="https://distill.pub/2016/misread-tsne/" rel="noopener noreferrer"&gt;t-SNE&lt;/a&gt; and &lt;a href="https://github.com/lferry007/LargeVis" rel="noopener noreferrer"&gt;LargeVis&lt;/a&gt;, to name a few. Other methods, such as PCA, are not optimized for distance preservation or visualization and tend to produce less interesting plots. t-SNE can produce very interesting plots even during convergence (cf. this demonstration by &lt;a href="https://twitter.com/genekogan" rel="noopener noreferrer"&gt;@genekogan&lt;/a&gt; &lt;a href="https://vimeo.com/191187346" rel="noopener noreferrer"&gt;here&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;We use t-SNE to map our 1024D vectors down to 2D and generate the first entry in the above grid. Recall that our high-dimensional space here consists of 1024D vector embeddings from a neural net, so proximal vectors should correspond to visually similar photos. Without the neural net, t-SNE would be a poor choice, as distances between the initial 224x224x3 pixel vectors are uninteresting.&lt;/p&gt;
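&lt;p&gt;&lt;em&gt;A minimal t-SNE sketch using scikit-learn; random 16-D vectors drawn from two clusters stand in for the 1024-D embeddings, and all names are illustrative:&lt;/em&gt;&lt;/p&gt;

```python
import numpy as np
from sklearn.manifold import TSNE  # scikit-learn

# Stand-in for the 1024-D neural-net embeddings: 50 vectors in 16-D,
# drawn from two clusters so there is some structure worth preserving.
rng = np.random.RandomState(0)
embeddings = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(25, 16)),
    rng.normal(loc=1.0, scale=0.1, size=(25, 16)),
])

# Map down to 2-D; perplexity must stay below the number of points.
points_2d = TSNE(n_components=2, perplexity=10, init="random",
                 random_state=0).fit_transform(embeddings)
print(points_2d.shape)  # (50, 2)
```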

&lt;h3&gt;Step 3: Snapping to a grid with the Jonker-Volgenant algorithm&lt;/h3&gt;

&lt;p&gt;One problem with t-SNE’d embeddings is that if we displayed the images directly over their corresponding 2D points we’d be left with swaths of empty white space and crowded regions where images overlap each other. We remedy this by building a 32×32 grid and moving the t-SNE’d points to the grid in such a way that total distance traveled is optimal.&lt;/p&gt;

&lt;p&gt;It turns out that this operation can be made incredibly sophisticated. There is an entire field of mathematics, &lt;a href="https://en.wikipedia.org/wiki/Transportation_theory_(mathematics)" rel="noopener noreferrer"&gt;transportation theory&lt;/a&gt;, concerned with solutions to problems in optimal transport under various circumstances. For example, if one’s goal is to minimize the sum of the squares of all distances traveled rather than simply the sum of the distances traveled (i.e. the l2 Monge-Kantorovich mass transfer problem), an optimal mapping can be found by recasting the assignment problem as one in computational fluid dynamics and &lt;a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.7.6791&amp;amp;rep=rep1&amp;amp;type=pdf" rel="noopener noreferrer"&gt;solving the corresponding PDEs&lt;/a&gt;. &lt;a href="https://en.wikipedia.org/wiki/C%C3%A9dric_Villani" rel="noopener noreferrer"&gt;Cédric Villani&lt;/a&gt;, who won a Fields Medal in 2010, wrote a great &lt;a href="http://cedricvillani.org/wp-content/uploads/2012/08/preprint-1.pdf" rel="noopener noreferrer"&gt;book&lt;/a&gt; on optimal transportation theory which is worth taking a look at when you get tired of corporate machine learning blogs.&lt;/p&gt;

&lt;p&gt;In our setting, we just want the t-SNE’d points to snap to the grid in a way that looks visually appealing and is as simple as possible. Thus, we search for a mapping that minimizes the sum of the distances traveled via a &lt;a href="https://en.wikipedia.org/wiki/Assignment_problem" rel="noopener noreferrer"&gt;linear assignment problem&lt;/a&gt;. The textbook solution here is the &lt;a href="https://en.wikipedia.org/wiki/Hungarian_algorithm" rel="noopener noreferrer"&gt;Hungarian algorithm&lt;/a&gt;; however, this can also be solved quite easily and much faster using &lt;a href="https://blog.sourced.tech/post/lapjv/" rel="noopener noreferrer"&gt;Jonker-Volgenant&lt;/a&gt; and &lt;a href="https://github.com/src-d/lapjv" rel="noopener noreferrer"&gt;open source tools&lt;/a&gt;.&lt;/p&gt;
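&lt;p&gt;&lt;em&gt;A minimal sketch of the snapping step using SciPy’s &lt;code&gt;linear_sum_assignment&lt;/code&gt; (which, since SciPy 1.4, is itself based on a modified Jonker-Volgenant algorithm); 16 random points and a 4×4 grid stand in for the real 1024 points and 32×32 grid:&lt;/em&gt;&lt;/p&gt;

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.RandomState(0)
points = rng.rand(16, 2)  # stand-in for 16 t-SNE'd points in the unit square

# Build a 4x4 grid of target cells spanning the unit square.
side = 4
xs, ys = np.meshgrid(np.linspace(0, 1, side), np.linspace(0, 1, side))
grid = np.column_stack([xs.ravel(), ys.ravel()])

# Cost matrix: distance from every point to every grid cell.
cost = np.sqrt(((points[:, None, :] - grid[None, :, :]) ** 2).sum(axis=2))

# Solve the assignment: point i snaps to grid cell cols[i], minimizing
# the total distance traveled.
rows, cols = linear_sum_assignment(cost)
snapped = grid[cols]
print(sorted(cols.tolist()) == list(range(16)))  # True: every cell used once
```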

&lt;h2&gt;How easy can we make this?&lt;/h2&gt;

&lt;p&gt;Pretty easy. In addition to the notebook listed above, we’ve also set up an API endpoint that will generate an image similar to the one above for an existing Clarifai application. Here we assume you have already created an application by visiting &lt;a href="https://clarifai.com/developer/account/applications/" rel="noopener noreferrer"&gt;https://clarifai.com/developer/account/applications&lt;/a&gt; and added your favorite images to it by calling the resource &lt;em&gt;&lt;a href="https://api.clarifai.com/v2/inputs" rel="noopener noreferrer"&gt;https://api.clarifai.com/v2/inputs&lt;/a&gt;&lt;/em&gt;. Then all you have to do is this:&lt;/p&gt;

&lt;h3&gt;Step 1: Kick off an asynchronous gridded t-SNE visualization&lt;/h3&gt;

&lt;p&gt;Since generating a visualization takes a while, the work happens asynchronously. We kick one off by calling&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;POST https://api.clarifai.com/v2/visualizations/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
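&lt;p&gt;&lt;em&gt;Using only the Python standard library, that kickoff call might be sketched as follows; the empty JSON body and the &lt;code&gt;Key&lt;/code&gt; authorization header are assumptions (consult the API docs for required fields), and the request is built but not sent:&lt;/em&gt;&lt;/p&gt;

```python
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder -- substitute your own Clarifai key

# Build (but do not send) the request that kicks off a visualization.
request = urllib.request.Request(
    "https://api.clarifai.com/v2/visualizations/",
    data=b"{}",  # assumed empty JSON body; check the docs for required fields
    headers={"Authorization": "Key " + API_KEY,
             "Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(request) would return the pending-visualization JSON.
```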



&lt;p&gt;You should get a response like the one below, informing you that a “pending” visualization is scheduled to be computed.&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "output": {
        "id": "ca69f34d53c742e1b4a1b71d7b4b4586",
        ...
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note the id &lt;em&gt;ca69f34d53c742e1b4a1b71d7b4b4586&lt;/em&gt;. We will use that id to get the visualization we just kicked off.&lt;/p&gt;

&lt;h3&gt;Step 2: Check to see if the visualization is done&lt;/h3&gt;

&lt;p&gt;Call&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GET /v2/visualizations/ca69f34d53c742e1b4a1b71d7b4b4586
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The returned visualization will be “pending” for a while, but eventually we should get a response like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "output": {
        "data": {
            "image": {
                "url": "https://s3.amazonaws.com/clarifai-visualization/gridded-tsne/staging/your-visualization.jpg"
            }
        },
        ...
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
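&lt;p&gt;&lt;em&gt;The two calls above amount to a simple polling loop. A minimal sketch, with a stand-in &lt;code&gt;fetch&lt;/code&gt; callable in place of the real authenticated HTTP GET so the control flow runs without network access:&lt;/em&gt;&lt;/p&gt;

```python
import time

def poll_visualization(fetch, viz_id, interval=1.0, max_tries=30):
    """Poll GET /v2/visualizations/{viz_id} until an image URL appears.

    `fetch` is any callable mapping a path to a parsed JSON dict; in real
    use it would wrap an authenticated HTTP GET against the Clarifai API.
    """
    for _ in range(max_tries):
        output = fetch("/v2/visualizations/" + viz_id)["output"]
        image = output.get("data", {}).get("image")
        if image:
            return image["url"]   # the gridded t-SNE visualization
        time.sleep(interval)      # still pending; wait and retry
    raise TimeoutError("visualization still pending after %d tries" % max_tries)

# Fake fetch: pending twice, then done -- mimicking the responses shown above.
responses = iter([
    {"output": {"status": "pending"}},
    {"output": {"status": "pending"}},
    {"output": {"data": {"image": {"url": "https://s3.amazonaws.com/"
        "clarifai-visualization/gridded-tsne/staging/your-visualization.jpg"}}}},
])
url = poll_visualization(lambda path: next(responses),
                         "ca69f34d53c742e1b4a1b71d7b4b4586", interval=0.0)
print(url)
```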



&lt;p&gt;At last, the &lt;code&gt;output.data.image.url&lt;/code&gt; contains your gridded t-SNE visualization.&lt;/p&gt;

&lt;p&gt;If you have any questions about the post, you can reach out to &lt;a href="mailto:hackers@clarifai.com"&gt;hackers@clarifai.com&lt;/a&gt;. Also send us your t-SNE visualizations if you want them shared!&lt;/p&gt;

</description>
      <category>engineering</category>
      <category>neuralnetwork</category>
      <category>imagerecognition</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
