<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: IgorSusmelj</title>
    <description>The latest articles on Forem by IgorSusmelj (@igorsusmelj).</description>
    <link>https://forem.com/igorsusmelj</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F344770%2Fa5333efd-feeb-420d-9f64-f02396c86f97.jpeg</url>
      <title>Forem: IgorSusmelj</title>
      <link>https://forem.com/igorsusmelj</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/igorsusmelj"/>
    <language>en</language>
    <item>
      <title>RustyNum Follow-Up: Fresh Insights and Ongoing Development</title>
      <dc:creator>IgorSusmelj</dc:creator>
      <pubDate>Sun, 16 Feb 2025 20:38:17 +0000</pubDate>
      <link>https://forem.com/igorsusmelj/rustynum-follow-up-fresh-insights-and-ongoing-development-18f9</link>
      <guid>https://forem.com/igorsusmelj/rustynum-follow-up-fresh-insights-and-ongoing-development-18f9</guid>
      <description>&lt;p&gt;Hey Dev Community!&lt;/p&gt;

&lt;p&gt;As a follow-up to my previous introduction to &lt;a href="https://github.com/IgorSusmelj/rustynum" rel="noopener noreferrer"&gt;RustyNum&lt;/a&gt;, I want to share a developer-focused update on what I’ve been working on over the last few weeks. RustyNum, as you might recall, is my lightweight, Rust-powered alternative to NumPy, published on GitHub under the MIT license. It uses Rust’s portable SIMD features for faster numerical computations while staying small (around 300 kB for the Python wheel). In this post, I’ll share a few insights gained during development, point out where RustyNum really helps, and highlight recent additions to the documentation and tutorials.&lt;/p&gt;

&lt;h2&gt;
  
  
  Brief Recap
&lt;/h2&gt;

&lt;p&gt;If you missed the initial announcement, RustyNum focuses on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High performance using Rust’s SIMD&lt;/li&gt;
&lt;li&gt;Memory safety in Rust, without GC overhead&lt;/li&gt;
&lt;li&gt;Small distribution size (much smaller than NumPy wheels)&lt;/li&gt;
&lt;li&gt;NumPy-like interface to reduce friction for Python users&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a more detailed overview, head over to the &lt;a href="https://rustynum.com/" rel="noopener noreferrer"&gt;official RustyNum website&lt;/a&gt; or check out &lt;a href="https://dev.to/igorsusmelj/building-rustynum-crafting-a-numpy-alternative-with-rust-and-python-48ad"&gt;my previous post on dev.to&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Developer’s Perspective: What’s New?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Working with Matrix Operations&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I’ve spent a good chunk of time ensuring matrix operations feel familiar. Being able to do something like matrix-vector or matrix-matrix multiplication with minimal code changes from NumPy was a primary goal. A highlight is the &lt;code&gt;.dot()&lt;/code&gt; function and the &lt;code&gt;@&lt;/code&gt; operator, which both support these operations.&lt;/p&gt;

&lt;p&gt;Check out the dedicated tutorial:&lt;br&gt;
&lt;a href="https://rustynum.com/tutorials/better-matrix-operations/" rel="noopener noreferrer"&gt;Better Matrix Operations with RustyNum&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here’s a quick snippet:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;rustynum&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;rnp&lt;/span&gt;

&lt;span class="n"&gt;matrix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rnp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;NumArray&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="err"&gt;“&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;&lt;span class="err"&gt;”&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;vector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rnp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;NumArray&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="err"&gt;“&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;&lt;span class="err"&gt;”&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Use the dot function
&lt;/span&gt;&lt;span class="n"&gt;result_vec&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;matrix&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;“&lt;/span&gt;&lt;span class="n"&gt;Matrix&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;Vector&lt;/span&gt; &lt;span class="n"&gt;Multiplication&lt;/span&gt; &lt;span class="n"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="err"&gt;”&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result_vec&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It’s neat to see how close this is to NumPy’s workflow. Benchmarks suggest RustyNum can often handle these tasks at speeds comparable to, and sometimes faster than, NumPy on smaller or medium-sized datasets. For very large matrices, I’m still optimizing the approach.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Speeding Up Common Analytics Tasks&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Apart from matrix multiplications, I’ve kept refining operations like &lt;code&gt;mean()&lt;/code&gt;, &lt;code&gt;min()&lt;/code&gt;, &lt;code&gt;max()&lt;/code&gt;, and &lt;code&gt;dot()&lt;/code&gt;. These straightforward methods are prime candidates for SIMD acceleration. There’s also a &lt;a href="https://rustynum.com/tutorials/replacing-numpy-for-faster-analytics/" rel="noopener noreferrer"&gt;tutorial on how to replace specific NumPy calls with RustyNum for analytics&lt;/a&gt;, which might be useful if you’re bottlenecked by Python loops.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;rustynum&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;rnp&lt;/span&gt;

&lt;span class="c1"&gt;# Generate test data
&lt;/span&gt;
&lt;span class="n"&gt;data_np&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1_000_000&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;data_rn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rnp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;NumArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data_np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tolist&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="err"&gt;“&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;&lt;span class="err"&gt;”&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# NumPy approach
&lt;/span&gt;&lt;span class="n"&gt;mean_np&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data_np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# RustyNum approach
&lt;/span&gt;&lt;span class="n"&gt;mean_rn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data_rn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;item&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;“&lt;/span&gt;&lt;span class="n"&gt;NumPy&lt;/span&gt; &lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="err"&gt;”&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mean_np&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;“&lt;/span&gt;&lt;span class="n"&gt;RustyNum&lt;/span&gt; &lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="err"&gt;”&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mean_rn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Python overhead can sometimes offset the raw Rust speed, but in many cases, RustyNum still shows advantages.&lt;/p&gt;
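&lt;p&gt;To make that overhead concrete, here’s a quick way to observe the fixed per-call cost. (This uses NumPy alone, since the effect is general and doesn’t require RustyNum; exact timings will vary by machine.)&lt;/p&gt;

```python
import timeit

import numpy as np

small = np.random.rand(100).astype(np.float32)
large = np.random.rand(1_000_000).astype(np.float32)

# Average per-call time for mean() on a tiny vs. a large array.
t_small = timeit.timeit(small.mean, number=10_000) / 10_000
t_large = timeit.timeit(large.mean, number=100) / 100

# The small array holds 10,000x less data, but the call is nowhere near
# 10,000x faster: fixed per-call overhead dominates at small sizes.
print(f"mean() on 100 elements:       {t_small * 1e6:.2f} us")
print(f"mean() on 1,000,000 elements: {t_large * 1e6:.2f} us")
```

The same fixed cost applies when calling into Rust from Python, which is why batching work into fewer, larger calls usually pays off.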

&lt;h2&gt;
  
  
  New Tutorials: Real-World Examples
&lt;/h2&gt;

&lt;p&gt;One of the best ways to see RustyNum in action is through practical examples. I’ve added several new tutorials with real-world coding scenarios:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Better Matrix Operations&lt;/strong&gt; – Focus on dot products, matrix-vector, and matrix-matrix tasks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Replacing Core NumPy Calls&lt;/strong&gt; – Demonstrates how to switch from NumPy’s mean, min, dot to RustyNum.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streamlining ML Preprocessing&lt;/strong&gt; – Explores scaling, normalization, and feature engineering for machine learning.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The last tutorial is a personal favorite. &lt;a href="https://rustynum.com/tutorials/streamlining-machine-learning-preprocessing/" rel="noopener noreferrer"&gt;It covers the typical data transformations you’d do in a machine learning pipeline—just swapping out NumPy calls for RustyNum&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Check out a snippet of scaling code from that guide:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;min_max_scale&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;col_mins&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;col_maxes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;col_idx&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
        &lt;span class="n"&gt;col_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="n"&gt;col_idx&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;col_mins&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;col_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="n"&gt;col_maxes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;col_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

    &lt;span class="n"&gt;scaled_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;col_idx&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
        &lt;span class="n"&gt;col_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="n"&gt;col_idx&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;numerator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;col_data&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;col_mins&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;col_idx&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;denominator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;col_maxes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;col_idx&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;col_mins&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;col_idx&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;
        &lt;span class="n"&gt;scaled_col&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;numerator&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;denominator&lt;/span&gt;
        &lt;span class="n"&gt;scaled_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scaled_col&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tolist&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;rnp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;concatenate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;rnp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;NumArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;float32&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;scaled_data&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It’s a small snippet, but it shows how RustyNum can do row/column manipulations quite effectively. After scaling, you can still feed the data into your favorite machine learning frameworks. The overhead of converting RustyNum arrays back into NumPy or direct arrays is minimal compared to the cost of big model training steps.&lt;/p&gt;
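&lt;p&gt;To put a rough number on that conversion cost, here’s a small sanity check. (NumPy-only, since the round trip goes through plain Python lists either way; treat the timing as illustrative.)&lt;/p&gt;

```python
import time

import numpy as np

data = np.random.rand(1000, 10).astype(np.float32)

# Round-trip: ndarray -> nested Python lists -> ndarray, mimicking how a
# RustyNum result would be handed back to NumPy for model training.
start = time.perf_counter()
as_lists = data.tolist()
restored = np.array(as_lists, dtype=np.float32)
elapsed = time.perf_counter() - start

# The values survive the round trip unchanged.
assert np.allclose(data, restored)
print(f"Round trip for a 1000x10 float32 array: {elapsed * 1e3:.2f} ms")
```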

&lt;h2&gt;
  
  
  Ongoing Work
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Large Matrix Optimizations&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I’ve noticed that for very large matrices (like 10k×10k), RustyNum’s current code paths aren’t yet fully optimized compared to NumPy. This area remains an active project. RustyNum is still young, and I’m hoping to introduce further parallelization or block-based multiplication techniques for better large-scale performance.&lt;/p&gt;
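&lt;p&gt;For the curious, the tiling idea can be sketched in plain NumPy. (This is only an illustration of block-based multiplication for cache locality, not RustyNum’s actual Rust implementation.)&lt;/p&gt;

```python
import numpy as np

def blocked_matmul(a: np.ndarray, b: np.ndarray, block: int = 64) -> np.ndarray:
    """Multiply a @ b tile by tile so each working set stays cache-resident."""
    n, k = a.shape
    k2, m = b.shape
    assert k == k2, "inner dimensions must match"
    out = np.zeros((n, m), dtype=a.dtype)
    for i in range(0, n, block):
        for j in range(0, m, block):
            for p in range(0, k, block):
                # Accumulate one output tile from a row-panel of a
                # and a column-panel of b.
                out[i:i + block, j:j + block] += (
                    a[i:i + block, p:p + block] @ b[p:p + block, j:j + block]
                )
    return out

a = np.random.rand(256, 256).astype(np.float32)
b = np.random.rand(256, 256).astype(np.float32)

# Matches the direct product up to float32 accumulation-order differences.
assert np.allclose(blocked_matmul(a, b), a @ b, atol=1e-3)
```

Real implementations additionally parallelize across tiles and tune the block size to the cache hierarchy, which is the direction I’m exploring for RustyNum.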

&lt;p&gt;&lt;strong&gt;2. Expanded Data Types&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;RustyNum supports float32 and float64 well, plus some integer types. I’m considering strengthening integer support for data science tasks such as indexing and small transformations. More advanced data types (e.g., complex numbers) might appear further down the line if the community needs them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Documentation and API Enhancements&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The docs site at &lt;a href="https://rustynum.com/" rel="noopener noreferrer"&gt;rustynum.com&lt;/a&gt; has an API reference and a roadmap. I’m continuously adding to it. If you spot anything missing or if you have a specific use case in mind, feel free to open a GitHub issue or submit a pull request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. The Big Goal of RustyNum&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;RustyNum is, at its core, a learning exercise in combining Rust and Python. Since I work with machine learning every day, I’d love for RustyNum to replace part of my daily NumPy routines, and we’re slowly getting there. I’ve started adding more and more methods aimed at integrating RustyNum into ML pipelines.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Code Example: ML Integration
&lt;/h2&gt;

&lt;p&gt;To demonstrate how RustyNum fits into a data pipeline, here’s a condensed example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;rustynum&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;rnp&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.linear_model&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LogisticRegression&lt;/span&gt;

&lt;span class="c1"&gt;# 1) Create synthetic data in NumPy
&lt;/span&gt;&lt;span class="n"&gt;train_np&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;labels_np&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 2) Convert to RustyNum for fast scaling
&lt;/span&gt;&lt;span class="n"&gt;train_rn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rnp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;NumArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;train_np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flatten&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;tolist&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;float32&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Basic scaling (compute min and max per column)
&lt;/span&gt;&lt;span class="n"&gt;scaled_rn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;col_idx&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;train_rn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
    &lt;span class="n"&gt;col_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;train_rn&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="n"&gt;col_idx&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;mn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;col_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;mx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;col_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;rng&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mx&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;mn&lt;/span&gt; &lt;span class="nf"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mx&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;mn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;
    &lt;span class="n"&gt;scaled_col&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;col_data&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;mn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;rng&lt;/span&gt;
    &lt;span class="n"&gt;scaled_rn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scaled_col&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tolist&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="n"&gt;train_scaled_rn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rnp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;concatenate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;rnp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;NumArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;float32&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;scaled_rn&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3) Convert back to NumPy for scikit-learn
&lt;/span&gt;&lt;span class="n"&gt;train_scaled_np&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;train_scaled_rn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tolist&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 4) Train a logistic regression model
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LogisticRegression&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;train_scaled_np&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;labels_np&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Model Coefficients:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;coef_&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This script highlights that RustyNum can handle data transformations with a Pythonic feel, after which you can pass the arrays into other libraries.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;It’s been fun to expand RustyNum’s features and see how well Rust can integrate with Python for high-performance tasks. The recent tutorials are a window into how RustyNum might replace parts of NumPy in data science or ML tasks, especially when smaller array sizes or mid-range tasks are involved.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Check out the tutorials at rustynum.com&lt;/li&gt;
&lt;li&gt;Contribute or report issues on GitHub&lt;/li&gt;
&lt;li&gt;Share feedback if there’s a feature you’d love to see&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Thanks for tuning in to this developer-focused update, and I look forward to hearing how RustyNum helps you in your own projects!&lt;/p&gt;

&lt;p&gt;Happy Coding!&lt;br&gt;
Igor&lt;/p&gt;

</description>
      <category>rust</category>
      <category>python</category>
      <category>opensource</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Building RustyNum: a NumPy Alternative with Rust and Python</title>
      <dc:creator>IgorSusmelj</dc:creator>
      <pubDate>Sun, 22 Sep 2024 14:14:20 +0000</pubDate>
      <link>https://forem.com/igorsusmelj/building-rustynum-crafting-a-numpy-alternative-with-rust-and-python-48ad</link>
      <guid>https://forem.com/igorsusmelj/building-rustynum-crafting-a-numpy-alternative-with-rust-and-python-48ad</guid>
      <description>&lt;p&gt;Hey Dev Community!&lt;/p&gt;

&lt;p&gt;I wanted to share a side project I’ve been working on called &lt;a href="https://github.com/IgorSusmelj/rustynum" rel="noopener noreferrer"&gt;RustyNum&lt;/a&gt;. As someone who uses NumPy daily for data processing and scientific computing, I often wondered how challenging it would be to create a similar library from scratch using Rust and Python. This curiosity sparked the development of RustyNum—a lightweight alternative to NumPy that leverages Rust’s powerful features.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is RustyNum?
&lt;/h2&gt;

&lt;p&gt;RustyNum combines the speed and memory safety of Rust with the simplicity and flexibility of Python. One of the standout features is that &lt;a href="https://doc.rust-lang.org/std/simd/index.html" rel="noopener noreferrer"&gt;it uses Rust’s portable SIMD&lt;/a&gt; (Single Instruction, Multiple Data) feature, which allows RustyNum to optimize computations across different CPU architectures seamlessly. This means you can achieve high-performance array manipulations without leaving the Python ecosystem. I also wanted to learn how to build a library from scratch, so RustyNum doesn’t rely on any third-party dependencies.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why RustyNum?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Performance Boost: By utilizing Rust’s portable SIMD, RustyNum can handle performance-critical tasks more efficiently than traditional Python libraries.&lt;/li&gt;
&lt;li&gt;Memory Safety: Rust ensures memory safety without a garbage collector, reducing the risk of memory leaks and segmentation faults.&lt;/li&gt;
&lt;li&gt;Learning Experience: This project has been a fantastic way for me to dive deeper into Rust-Python interoperability and explore the intricacies of building numerical libraries.&lt;/li&gt;
&lt;li&gt;Small Footprint: Because no external dependencies are used, the Python wheels are tiny (around 300 kB) compared to alternatives such as NumPy (&amp;gt;10 MB).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  When to Consider RustyNum:
&lt;/h2&gt;

&lt;p&gt;If you’re working on data analysis, scientific computing, or small-scale machine learning projects and find NumPy a bit heavy for your needs, RustyNum might be the perfect fit. It’s especially useful when you need optimized performance across various hardware without the complexity of integrating with C-based libraries. However, be aware that the library is still in its early days and, as of today, only covers a subset of NumPy’s basic operations.&lt;/p&gt;

&lt;p&gt;You can &lt;a href="https://github.com/IgorSusmelj/rustynum" rel="noopener noreferrer"&gt;check out RustyNum on GitHub&lt;/a&gt;. I’d love to hear your feedback, suggestions, or contributions!&lt;/p&gt;

&lt;p&gt;Update January 28th: &lt;a href="https://rustynum.com/" rel="noopener noreferrer"&gt;RustyNum also has its own website!&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Thanks for reading, and happy coding!&lt;/p&gt;

&lt;p&gt;Cheers,&lt;br&gt;
Igor&lt;/p&gt;

</description>
      <category>rust</category>
      <category>python</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Self-Supervised Models are More Robust and Fair</title>
      <dc:creator>IgorSusmelj</dc:creator>
      <pubDate>Thu, 07 Apr 2022 18:00:26 +0000</pubDate>
      <link>https://forem.com/igorsusmelj/self-supervised-models-are-more-robust-and-fair-3f4h</link>
      <guid>https://forem.com/igorsusmelj/self-supervised-models-are-more-robust-and-fair-3f4h</guid>
      <description>&lt;p&gt;&lt;strong&gt;‍A recent paper from Meta AI Research shows that their new 10 billion parameter model trained using self-supervised learning breaks new ground in robustness and fairness.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In spring 2021 Meta AI (former Facebook AI) published &lt;a href="https://arxiv.org/abs/2103.01988"&gt;SEER (Self-supervised Pretraining of Visual Features in the Wild)&lt;/a&gt;. SEER showed that training models using self-supervision works well on large-scale uncurated datasets and their model reached state-of-the-art when it was published.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--3yEnErKA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/pmm5alce7vcta91sq5or.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--3yEnErKA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/pmm5alce7vcta91sq5or.png" alt="Accuracy plot from SEER paper. Although the models are fine-tuned and evaluated on ImageNet they use different datasets for pre-training and different models. SEER for example uses a RegNet and has been pre-trained on a dataset of 1B images." width="880" height="810"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The model has been pre-trained on 1 billion random and uncurated images from Instagram. The accompanying blog post created some noise around the model as it set the ground for going further with larger models and larger datasets using self-supervised learning: &lt;a href="https://ai.facebook.com/blog/seer-the-start-of-a-more-powerful-flexible-and-accessible-era-for-computer-vision/"&gt;https://ai.facebook.com/blog/seer-the-start-of-a-more-powerful-flexible-and-accessible-era-for-computer-vision/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In contrast to supervised learning, self-supervised learning does not require vast amounts of labeled data, which significantly reduces costs.&lt;br&gt;
Basically, the two main ingredients, lots of data and lots of compute, are enough, as shown in this paper. Besides the independence from labeled data, there are further &lt;a href="https://www.lightly.ai/post/the-advantage-of-self-supervised-learning"&gt;advantages in using self-supervised learning&lt;/a&gt; we talked about in another post.&lt;/p&gt;

&lt;h2&gt;
  
  
  SEER is more robust and fair
&lt;/h2&gt;

&lt;p&gt;In this post, we’re more interested in the new follow-up paper to the initial SEER paper. The new paper has been published in late February 2022:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2202.08360"&gt;Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision, 2022&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The original SEER paper showed that larger datasets combined with self-supervised pre-training result in higher model accuracy on downstream tasks such as ImageNet. The new paper takes this one step further and investigates what happens if we train even larger models with respect to robustness and fairness.&lt;/p&gt;

&lt;p&gt;Model robustness and fairness have recently gained more attention in the ML community.&lt;/p&gt;

&lt;p&gt;With &lt;strong&gt;robustness&lt;/strong&gt;, we are interested in how reliable the model works when facing changes in its input data distribution. This is a common problem as models once deployed might face scenarios they have never seen during training.&lt;/p&gt;

&lt;p&gt;Model &lt;strong&gt;fairness&lt;/strong&gt; focuses on evaluating how a model performs across groups, for example with respect to gender, skin tone, age, and geography.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;But what exactly is fairness and how can we measure it?&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Previous work in Fairness Indicators
&lt;/h2&gt;

&lt;p&gt;Although fairness in ML is an older research area, the field has recently gained a lot of interest. For example, the paper &lt;a href="https://arxiv.org/abs/2202.07603"&gt;Fairness Indicators for Systematic Assessments of Visual Feature Extractors, 2022&lt;/a&gt; introduces three rather simple indicators one can use to evaluate model fairness.&lt;/p&gt;

&lt;p&gt;The approach is to fine-tune (think about transfer learning) a trained backbone to make predictions across three indicators:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;harmful mislabeling&lt;/strong&gt; of images of people by training a classifier
&lt;em&gt;Harmful mislabeling happens when a model associates attributes like “crime” or non-human attributes like “ape” or “puppet” with a human. Apart from being low overall, the mislabeling rate should be independent of gender or skin color. If however, people with a certain gender or skin color are mislabeled more often than others, the model is unfair.&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;geographical disparity&lt;/strong&gt; in object recognition by training a classifier
&lt;em&gt;How well do we recognize objects all around the world? Common objects like chairs, streets, and houses look different across the globe.&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;disparities in learned visual representations&lt;/strong&gt; of social memberships of people by using similarity search to retrieve similar examples
&lt;em&gt;If we do a similarity lookup for people of a certain skin tone, do we also get similar skin tones in the set of nearest neighbors?&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;
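&lt;p&gt;The third indicator boils down to a nearest-neighbor lookup on image embeddings. A toy, illustration-only sketch (the embeddings and group tags below are made up; real evaluations use learned representations):&lt;/p&gt;

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return dot / (nu * nv)

# Hypothetical embeddings with an attribute tag per sample.
gallery = [
    ([1.0, 0.1], "group_a"),
    ([0.9, 0.2], "group_a"),
    ([0.1, 1.0], "group_b"),
]
query = [1.0, 0.0]

# Retrieve neighbors sorted by similarity; a fair representation should not
# systematically push one group out of the top results.
ranked = sorted(gallery, key=lambda item: cosine(item[0], query), reverse=True)
print([tag for _, tag in ranked])  # ['group_a', 'group_a', 'group_b']
```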

&lt;p&gt;In their paper, three different training methodologies and datasets are used for the evaluation.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Supervised training on ImageNet&lt;/li&gt;
&lt;li&gt;Weakly-supervised training on filtered Instagram data&lt;/li&gt;
&lt;li&gt;Self-Supervised training on ImageNet or uncurated Instagram data (SEER model)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All three training paradigms use the same model architecture: a RegNetY-128 backbone with 700M parameters.&lt;/p&gt;

&lt;p&gt;The results of the evaluation show that training models with less supervision seems to improve the fairness of the trained models:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9_7AkNeO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/413ikvch81euxq8kzb7f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9_7AkNeO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/413ikvch81euxq8kzb7f.png" alt="Label association results. For harmful association, lower hit-rate is better. For non-harmful association, higher hit-rate is better. From Fairness Indicators paper, 2022." width="880" height="426"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--1UBY7kFf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/0cn7pqd8wk865omc3j78.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--1UBY7kFf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/0cn7pqd8wk865omc3j78.png" alt="Geodiversity, hit rates for Supervised, WSL and SSL. From Fairness Indicators paper, 2022." width="880" height="438"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Going Deeper into Fairness Evaluation
&lt;/h2&gt;

&lt;p&gt;Now, how does the new SEER paper build on top of the &lt;a href="https://arxiv.org/abs/2202.07603"&gt;Fairness Indicators paper&lt;/a&gt; we just discussed?&lt;/p&gt;

&lt;p&gt;There are two main additions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A larger model (a RegNetY-10B model with 10B parameters, over 14x more)&lt;/li&gt;
&lt;li&gt;Evaluation across 50+ benchmarks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In order to fit the model onto the GPUs, tricks like model sharding, Fully Sharded Data Parallel (FSDP), and activation checkpointing are used. Finally, the authors used a batch size of 7,936 across 496 NVIDIA A100 GPUs. Note that according to the paper, only one epoch is used for the self-supervised pretraining. On ImageNet, these models are often trained for 800 epochs or more. This means that even though the dataset is almost 1,000 times larger, the number of images seen during training is comparable.&lt;/p&gt;
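&lt;p&gt;This claim is easy to sanity-check with back-of-the-envelope numbers (dataset sizes below are the approximate published figures):&lt;/p&gt;

```python
# Back-of-the-envelope check: images seen during pre-training
# equals dataset size multiplied by the number of epochs.
seer_images_seen = 1_000_000_000 * 1    # ~1B uncurated images, 1 epoch
imagenet_images_seen = 1_281_167 * 800  # ImageNet train set, 800 epochs

print(seer_images_seen)      # 1000000000
print(imagenet_images_seen)  # 1024933600
# Both regimes see roughly the same number of images during training,
# even though the SEER dataset itself is about 780x larger than ImageNet.
```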

&lt;p&gt;Since the authors used a very large dataset (1 Billion images) they argue that they can also scale the model size: &lt;em&gt;We scale our model size to dense 10 billion parameters to avoid underfitting on a large data size.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;The paper is full of interesting plots and tables. We will just highlight a few of them in this post. For more information, please have a look at the paper.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fairness
&lt;/h3&gt;

&lt;p&gt;This benchmark is the one we looked at previously when talking about the Fairness Indicators paper. The table breaks results down by gender, skin tone, and age group. Interestingly, all SSL pre-trained models perform much better than their supervised counterpart.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--nBTp_OkL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/448v6f4z1hf19w5lmbjx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--nBTp_OkL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/448v6f4z1hf19w5lmbjx.png" alt="Image description" width="880" height="608"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;h3&gt;
  
  
  Out-of-distribution Performance
&lt;/h3&gt;

&lt;p&gt;Out-of-distribution performance, commonly called robustness to data distribution shifts, is a common problem in ML. A model trained on one set of images might encounter slightly different images once deployed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--sHAGCvVv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zbl6bc80nyui9k6igp8m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--sHAGCvVv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zbl6bc80nyui9k6igp8m.png" alt="Previous papers on SSL have already shown that SSL pre-training results in more robust models compared to supervised pre-training. The large 10B SEER model shows that even with larger models the performance still increases." width="880" height="333"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;While self-supervised learning (SSL) is still in its infancy, the research direction looks very promising. Not having to rely on large, fully labeled datasets allows training models that are more robust to data distribution shifts (data drift) and fairer than their supervised counterparts. The new paper offers very interesting insights.&lt;/p&gt;

&lt;p&gt;If you’re interested in self-supervised learning and want to try it out yourself you can check out our &lt;a href="https://github.com/lightly-ai/lightly"&gt;open-source repository for self-supervised learning&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We’ve been using self-supervised learning at Lightly from day one, as our initial experiments showed its benefits for large-scale data curation. We’re happy that new research papers support our approach, and we hope we can help curate datasets that lead to less biased data and fairer models.&lt;/p&gt;

&lt;p&gt;Igor, co-founder&lt;br&gt;
&lt;a href="https://lightly.ai/"&gt;Lightly.ai&lt;/a&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Data Annotation and What Data Annotation Companies do</title>
      <dc:creator>IgorSusmelj</dc:creator>
      <pubDate>Mon, 21 Feb 2022 22:07:02 +0000</pubDate>
      <link>https://forem.com/igorsusmelj/data-annotation-and-what-data-annotation-companies-do-5ej1</link>
      <guid>https://forem.com/igorsusmelj/data-annotation-and-what-data-annotation-companies-do-5ej1</guid>
      <description>&lt;p&gt;&lt;strong&gt;Data annotation is one of the core functions of machine learning. The more data an ML model is trained with, the more accurate it will become.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Just like humans learn through training and practice, machine learning models are also trained by feeding them with huge volumes of data.&lt;/p&gt;

&lt;p&gt;One of the reasons Google is still the best search engine is because it has a lot of data compared to its competitors, including Yahoo and Bing (Microsoft’s search engine). With this data, Google is able to give users the best search results that match their search queries. Several other web apps also rely on data annotation to improve their algorithms in order to enhance their users’ experience.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--2cE8R-Mk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/x8g25cg0tebrj1ojkk8t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--2cE8R-Mk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/x8g25cg0tebrj1ojkk8t.png" alt="An autonomous robot learns to navigate and understand its surrounding after learning from annotated data." width="880" height="660"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  So, what is data annotation?
&lt;/h2&gt;

&lt;p&gt;Data annotation refers to the process of categorizing and labeling information or data so that machine learning models can use it. The data used to train machine learning models has to be accurately labeled and categorized for specific use cases. For instance, the categorization and labeling of data to be used by a search engine ML model is different from a speech recognition ML model.&lt;/p&gt;

&lt;p&gt;Data annotation involves assessing four primary types of data: text, audio, video, and image. This article focuses mainly on image and text annotation, since they are the most popular types of data used to train machine learning models.&lt;/p&gt;

&lt;h2&gt;
  
  
  Text annotation
&lt;/h2&gt;

&lt;p&gt;A 2020 State of AI and Machine Learning report shows that over 70% of companies relied on text to train their AI and machine learning models. The common types of text annotation include sentiment, intent, and query annotation. Let’s discuss each of these in detail.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sentiment Annotation&lt;/strong&gt;&lt;br&gt;
Sentiment annotation involves labeling emotions, attitudes, and opinions, which makes having the proper training data crucial for machine learning models. Sentiment annotation is typically done by humans because it involves judging content and sentiment on platforms such as social media and eCommerce sites.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query annotation&lt;/strong&gt;&lt;br&gt;
This type of text annotation involves training search algorithms by tagging the various components within product titles and search queries to improve the relevance of search results. Algorithms that use query annotation are usually found in search engines for eCommerce platforms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Intent annotation&lt;/strong&gt;&lt;br&gt;
This type of text annotation involves training machine learning models to identify intention in a particular text. Intent annotations help ML models to differentiate various inputs into categories, including requests, commands, bookings, recommendations, and confirmations. This type of text annotation is mainly used to train search engine Machine Learning models.&lt;/p&gt;

&lt;h2&gt;
  
  
  Image annotation
&lt;/h2&gt;

&lt;p&gt;Image annotation involves training machine learning models with many labeled images to help them learn the features in those images. Some of the applications that use such algorithms include computer vision, robotic vision, and apps with facial recognition functionality.&lt;/p&gt;

&lt;p&gt;For effective training of ML models with image annotation, metadata has to be attached to all the images used. This metadata usually includes identifiers, captions, and keywords. Some of the popular use cases that take advantage of image annotation include health apps that auto-identify medical conditions, computer vision systems in self-driving cars, machines used for sorting goods, and many more.&lt;/p&gt;

&lt;p&gt;Image annotation is more intensive and requires more computation power than text annotation, simply because images carry far more data than text. Training ML models with images involves learning from all the pixels in the various images fed into the model.&lt;/p&gt;

&lt;p&gt;Image annotation has five main types:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bounding boxes annotation&lt;/strong&gt;&lt;br&gt;
With bounding boxes, human annotators are tasked with drawing boxes around specific subjects within the image. This type of annotation is mainly used to train autonomous vehicle algorithms to detect objects such as road signs, traffic, and potholes.&lt;/p&gt;
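&lt;p&gt;A bounding-box label is typically stored as pixel coordinates plus a class name and exchanged as JSON between annotation tools and training code. A minimal, COCO-style sketch (the exact schema depends on the tool; the file names and field names here are illustrative):&lt;/p&gt;

```python
import json

# One image with two box annotations, in a COCO-like [x, y, width, height] format.
annotation = {
    "image": "frame_0001.jpg",
    "boxes": [
        {"label": "traffic_sign", "bbox": [34, 120, 48, 48]},
        {"label": "pothole", "bbox": [210, 300, 90, 40]},
    ],
}

# Annotations are typically exchanged as JSON between tools and training code.
serialized = json.dumps(annotation)
print(json.loads(serialized)["boxes"][0]["label"])  # traffic_sign
```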

&lt;p&gt;&lt;strong&gt;3D cuboids annotation&lt;/strong&gt;&lt;br&gt;
This type of image annotation involves drawing 3D boxes around specific objects in an image. Unlike bounding boxes that only consider length and width, 3D cuboids include the height or depth of the object.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Polygons&lt;/strong&gt;&lt;br&gt;
At times some objects may not fit well in a bounding box or 3D cuboid because not all things are rectangular. Objects such as cars, humans, and buildings are usually not perfectly rectangular, so they can’t fit in a rectangle or cuboid. In this case, human annotators have to draw polygons around the non-rectangular objects before feeding this data to an ML model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lines and spines&lt;/strong&gt;&lt;br&gt;
These are used to train machine learning models to identify lanes and boundaries. Annotators draw the lane lines and boundaries that the ML model should learn.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Semantic segmentation&lt;/strong&gt;&lt;br&gt;
This is a much more precise and deeper type of annotation that involves associating every pixel in a given image with a tag. This annotation type is mainly used in machine learning models for autonomous vehicles and medical image diagnostics.&lt;/p&gt;
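&lt;p&gt;Per-pixel labels are usually stored as a mask with the same dimensions as the image, where each value is a class id. A toy sketch with made-up class ids:&lt;/p&gt;

```python
# A 4x4 "image" where every pixel holds a class id:
# 0 = background, 1 = road, 2 = vehicle.
mask = [
    [0, 0, 1, 1],
    [0, 2, 1, 1],
    [0, 2, 1, 1],
    [0, 0, 1, 1],
]

# Count how many pixels belong to each class.
counts = {}
for row in mask:
    for class_id in row:
        counts[class_id] = counts.get(class_id, 0) + 1

print(counts)  # {0: 6, 1: 8, 2: 2}
```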

&lt;h2&gt;
  
  
  What do data annotation companies do?
&lt;/h2&gt;

&lt;p&gt;One of the major challenges in training machine learning models is finding the right quality and quantity of data to feed them. Remember, the quality and amount of data you provide these models determine how well they will perform on the tasks they are finally deployed to do.&lt;/p&gt;

&lt;p&gt;To help fix these issues, data annotation companies provide the appropriate amount and quality of data needed to train various types of AI and ML models. These companies combine human annotators with machine-learning assistance to produce high-quality training data.&lt;/p&gt;

&lt;p&gt;Besides providing training data for AI and ML models, data annotation companies also offer deployment and maintenance services for AI and ML projects. These are follow-up services meant to ensure the provided data delivers the desired results wherever the ML algorithm trained on it is deployed.&lt;/p&gt;

&lt;p&gt;For instance, if it is a search algorithm deployed in an eCommerce site, the data annotation company has to ensure the algorithm provides the best search results for the various user queries.&lt;/p&gt;

&lt;p&gt;Check out our &lt;a href="https://data-annotation.com/list-of-data-annotation-companies/"&gt;list of data annotation companies&lt;/a&gt; to learn more!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This post was originally posted here: &lt;a href="https://data-annotation.com/data-annotation-and-what-data-annotation-companies-do/"&gt;https://data-annotation.com/data-annotation-and-what-data-annotation-companies-do/&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Train Test Split in Deep Learning</title>
      <dc:creator>IgorSusmelj</dc:creator>
      <pubDate>Sun, 20 Feb 2022 18:59:11 +0000</pubDate>
      <link>https://forem.com/igorsusmelj/train-test-split-in-deep-learning-4gbl</link>
      <guid>https://forem.com/igorsusmelj/train-test-split-in-deep-learning-4gbl</guid>
      <description>&lt;p&gt;&lt;strong&gt;One of the golden rules in machine learning is to split your dataset into train, validation, and test set. Learn how to bypass the most common caveats!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The reason we do this is simple: if we did not split the data into different sets, the model would be evaluated on the same data it saw during training. We could then run into problems such as overfitting without even knowing it.&lt;/p&gt;

&lt;p&gt;Back before deep learning, we typically used three different sets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;train set&lt;/strong&gt; is used for training the model&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;validation set&lt;/strong&gt; that is used to evaluate the model during the training process&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;test set&lt;/strong&gt; that is used to evaluate the final model accuracy before deployment&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How do we use the train, validation, and test set?
&lt;/h2&gt;

&lt;p&gt;Usually, we use the different sets as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;We split the dataset randomly into three subsets called the &lt;strong&gt;train&lt;/strong&gt;, &lt;strong&gt;validation&lt;/strong&gt;, and &lt;strong&gt;test set&lt;/strong&gt;. Splits could be 60/20/20 or 70/20/10 or any other ratio you desire.&lt;/li&gt;
&lt;li&gt;We train a model using the &lt;strong&gt;train set&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;During the training process, we evaluate the model on the &lt;strong&gt;validation set&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;If we are not happy with the results, we can change the hyperparameters or pick another model and &lt;em&gt;go back to step 2&lt;/em&gt;.
&lt;/li&gt;
&lt;li&gt;Finally, once we’re happy with the results on the &lt;strong&gt;validation set&lt;/strong&gt; we can evaluate our model on the &lt;strong&gt;test set&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;If we’re happy with the results, we can now train our model again on the &lt;strong&gt;train&lt;/strong&gt; and &lt;strong&gt;validation set&lt;/strong&gt; combined, using the final hyperparameters we derived.&lt;/li&gt;
&lt;li&gt;We can again evaluate the model accuracy on the &lt;strong&gt;test set&lt;/strong&gt; and if we’re happy deploy the model.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Most ML frameworks provide built-in methods for random train/test splits of a dataset. The most well-known example is the &lt;a href="https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html"&gt;train_test_split function of scikit-learn&lt;/a&gt;.&lt;/p&gt;
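&lt;p&gt;As a sketch of what such a split does under the hood, here is a minimal pure-Python version with a 70/20/10 ratio (illustrative only, not scikit-learn’s implementation):&lt;/p&gt;

```python
import random

def train_val_test_split(samples, ratios=(0.7, 0.2, 0.1), seed=42):
    """Shuffle, then cut the dataset into train/validation/test chunks."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps splits reproducible
    n = len(shuffled)
    n_train = int(ratios[0] * n)
    n_val = int(ratios[1] * n)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # everything left over goes to test
    return train, val, test

train, val, test = train_val_test_split(list(range(100)))
print(len(train), len(val), len(test))  # 70 20 10
```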

&lt;h2&gt;
  
  
  Are there any issues when using a very small dataset?
&lt;/h2&gt;

&lt;p&gt;Yes, this could be a problem. With very small datasets the test set will be tiny and therefore a single wrong prediction has a strong impact on the test accuracy. Fortunately, there is a way to work around this problem.&lt;/p&gt;

&lt;p&gt;The solution to this problem is called &lt;a href="https://en.wikipedia.org/wiki/Cross-validation_(statistics)"&gt;cross-validation&lt;/a&gt;. We essentially create partitions of our dataset as shown in the image below. We always hold out a set for testing and use all the other data for training. Finally, we gather and average all the results from the test sets. We essentially train k models and, with this trick, obtain evaluation statistics over the full dataset (as every sample has been part of one of the k test sets).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--uYY_cKoF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9agmdxw2iycf2igvld1m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--uYY_cKoF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9agmdxw2iycf2igvld1m.png" alt="Illustration from Wikipedia showing how k-fold cross-validation works. We iteratively shuffle the data that is used for training and testing and evaluate the overall statistics." width="880" height="434"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This approach is rarely used with recent deep learning methods, as it’s very expensive to train a model k times.&lt;/p&gt;

&lt;p&gt;With the rise of deep learning and the massive increase in dataset sizes, the need for techniques such as cross-validation or having a separate validation set has diminished. One reason for this is that experiments are very expensive and take a long time. Another one is that due to the large datasets and nature of most deep learning methods the models got less affected by overfitting.&lt;/p&gt;

&lt;p&gt;Overfitting is still a problem in deep learning. But overfitting to 50 samples with 10 features happens much faster than overfitting to 100k images with millions of pixels each.&lt;/p&gt;

&lt;p&gt;One could argue that researchers and practitioners got lazy or sloppy. It would be interesting to see a recent paper investigating such effects again. For example, it could be that researchers in the past years have heavily overfitted their models to the test set of ImageNet, as there has been an ongoing race to improve on it and claim state-of-the-art.&lt;/p&gt;

&lt;h2&gt;
  
  
  How should I pick my train, validation, and test set?
&lt;/h2&gt;

&lt;p&gt;Naively, one could just manually split the dataset into three chunks. The problem with this approach is that we humans are very biased and this bias would get introduced into the three sets.&lt;/p&gt;

&lt;p&gt;In academia, we learn that we should pick them randomly. A random split into the three sets guarantees that all three sets follow the same statistical distribution. And that’s what we want since ML is all about statistics.&lt;/p&gt;

&lt;p&gt;Deriving the three sets from completely different distributions would yield some unwanted results. There is not much value in training a model on pictures of cats if we want to use it to classify flowers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9ujq7jcB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/7z6e8urmeke9bc9j9jbt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9ujq7jcB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/7z6e8urmeke9bc9j9jbt.png" alt="How should I pick my train, validation, and test set?" width="880" height="586"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;However, the underlying assumption of a random split is that the initial dataset already matches the statistical distribution of the problem we want to solve. That would mean that for problems such as autonomous driving the assumption is that our dataset covers all sorts of cities, weather conditions, vehicles, seasons of the year, special situations, etc.&lt;/p&gt;

&lt;p&gt;As you might suspect, this assumption is actually not valid for most practical deep learning applications. Whenever we collect data using sensors in an uncontrolled environment, we might not get the desired data distribution.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;But that’s bad. What am I supposed to do if I’m not able to collect a representative dataset of the problem I try to solve?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;What you’re looking for is the research area around finding and dealing with &lt;strong&gt;domain gaps&lt;/strong&gt;, &lt;strong&gt;distributional shifts&lt;/strong&gt;, or &lt;strong&gt;data drift&lt;/strong&gt;. All these terms have their own specific definition. I’m listing them here so you can search for the relevant problems easily.&lt;/p&gt;

&lt;p&gt;With a &lt;em&gt;domain&lt;/em&gt;, we refer to the data domain: the source and type of the data we use. There are three ways to move forward:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Solve the data gap by collecting more representative data&lt;/li&gt;
&lt;li&gt;Use data curation methods to make the data already collected more representative&lt;/li&gt;
&lt;li&gt;Focus on building a robust enough model to handle such domain gaps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The latter approach is focusing on building models for out-of-distribution tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Picking a train test split for out-of-distribution tasks
&lt;/h2&gt;

&lt;p&gt;In machine learning, we refer to out-of-distribution whenever our model has to perform well in a situation where the new input data is from a different distribution than the training data. Going back to our autonomous driving example from before, we could say that for a model that has only been trained on sunny California weather, doing predictions in Europe is out of distribution.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Now, how should we do the split of the dataset for such a task?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Since we collected the data using different sensors we also might have additional information about the source for each of the samples (a sample could be an image, lidar frame, video, etc.).&lt;/p&gt;

&lt;p&gt;We can solve this problem by splitting the dataset in the following way:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;we train on a set of data from cities in list A&lt;/li&gt;
&lt;li&gt;and evaluate the model on a set of data from cities in list B&lt;/li&gt;
&lt;/ul&gt;
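&lt;p&gt;In code, such a source-aware split partitions on the metadata field rather than on individual samples. A minimal sketch (the city values and file names below are made up for illustration):&lt;/p&gt;

```python
# Each sample carries metadata about where it was recorded.
samples = [
    {"frame": "a.png", "city": "zurich"},
    {"frame": "b.png", "city": "zurich"},
    {"frame": "c.png", "city": "berlin"},
    {"frame": "d.png", "city": "paris"},
]

train_cities = {"zurich", "berlin"}  # list A
test_cities = {"paris"}              # list B

# Split on the grouping field, never on individual frames, so that no
# city appears in both the training and the evaluation set.
train = [s for s in samples if s["city"] in train_cities]
test = [s for s in samples if s["city"] in test_cities]

print(len(train), len(test))  # 3 1
```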

&lt;p&gt;There is a &lt;a href="https://medium.com/yandex-self-driving-car/yandex-publishes-industrys-largest-av-dataset-launches-prediction-challenge-at-neurips-28d6bdfde78d"&gt;great article from Yandex research about their new dataset to tackle distributional shifts in datasets&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Things that could go wrong
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The validation set and test set accuracy differ a lot&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You very likely overfitted your model to the validation set, or the validation and test sets are very different. But how?&lt;/p&gt;

&lt;p&gt;You likely did several iterations of tweaking the parameters to squeeze out the last bit of accuracy your model can yield on the validation set. The validation set is no longer fulfilling its purpose. At this point, you should relax some of your hyperparameters or introduce regularization methods.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After deriving my final hyperparameters I want to retrain my model on the full dataset (train + validation + test) before shipping&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;No, don’t do this. The hyperparameters have been tuned for the train (or maybe the train + validation) set and might yield a different result when used on the full dataset.&lt;br&gt;
Furthermore, you will no longer be able to answer the question of how well your model really performs, as the test set no longer exists.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I have a video dataset and want to split the frames randomly into train, validation, and test set&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Since video frames are very likely highly correlated (e.g. two neighboring frames look almost the same), this is a bad idea. It’s almost the same as evaluating the model on the training data. Instead, you should split the dataset across videos (e.g. videos 1, 3, and 5 are used for training and videos 2 and 4 for validation). You can again use a random train test split, but this time on the video level instead of the frame level.&lt;/p&gt;
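&lt;p&gt;A minimal sketch of such a video-level split (the video ids and frame counts are hypothetical):&lt;/p&gt;

```python
import random

# Hypothetical frame records: 5 videos with 100 frames each.
frames = [
    {"video_id": vid, "frame_idx": i}
    for vid in range(1, 6)
    for i in range(100)
]

# Split on the video level, not the frame level, so that highly
# correlated neighboring frames never end up on both sides.
video_ids = sorted({f["video_id"] for f in frames})
random.seed(42)
random.shuffle(video_ids)

train_videos = set(video_ids[:3])  # videos used for training
val_videos = set(video_ids[3:])    # remaining videos for validation

train_frames = [f for f in frames if f["video_id"] in train_videos]
val_frames = [f for f in frames if f["video_id"] in val_videos]
```

&lt;p&gt;The randomness happens over whole videos, so every frame of a given video lands entirely in either train or validation.&lt;/p&gt;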

&lt;p&gt;Igor, co-founder&lt;br&gt;
&lt;a href="https://www.lightly.ai/"&gt;Lightly.ai&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This blog post has originally been published here: &lt;a href="https://www.lightly.ai/post/train-test-split-in-deep-learning"&gt;https://www.lightly.ai/post/train-test-split-in-deep-learning&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;


</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>deeplearning</category>
    </item>
    <item>
      <title>Active Learning using Detectron2</title>
      <dc:creator>IgorSusmelj</dc:creator>
      <pubDate>Sun, 30 May 2021 09:12:24 +0000</pubDate>
      <link>https://forem.com/igorsusmelj/active-learning-using-detectron2-3i5g</link>
      <guid>https://forem.com/igorsusmelj/active-learning-using-detectron2-3i5g</guid>
      <description>&lt;p&gt;&lt;strong&gt;Tired of labeling all your data? Learn more about how model predictions and embeddings can help you select the right data.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Supervised machine learning requires labeled data. In computer vision applications such as autonomous driving, labeling a single frame can cost up to $10. The fast growth of new, connected devices and cheaper sensors leads to a continuous increase in new data. Labeling everything is simply not possible anymore. Many companies in fact only label between 0.1% and 1% of the data they collect. But finding the right 0.1% of data is like finding a needle in a haystack without knowing what the needle looks like. So, how can we do it efficiently?&lt;/p&gt;

&lt;p&gt;One approach to tackle the problem is active learning. When doing active learning, we use a pre-trained model and its predictions to select the next batch of data for labeling. Different algorithms exist that help you select the right data based on model predictions. For example, the well-known approach of uncertainty sampling selects new data based on low model confidence. Let's assume a scenario where we have two images with cats: for one, the model says it's 60% sure it's a cat; for the other, the model is 90% certain. We would now pick the image where the model has only 60% confidence. We essentially pick the "harder" example. &lt;br&gt;
With active learning, we iterate this prediction and selection process until we reach our target metrics.&lt;/p&gt;
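&lt;p&gt;The two-cat example boils down to sorting images by model confidence and picking from the bottom. A tiny sketch (filenames and confidence values are made up for illustration):&lt;/p&gt;

```python
# Least-confidence selection for the two-cat example.
predictions = {
    "cat_a.jpg": 0.90,  # model is 90% sure it's a cat
    "cat_b.jpg": 0.60,  # model is only 60% sure
}

def select_for_labeling(preds, n=1):
    """Return the n images with the lowest model confidence."""
    return sorted(preds, key=preds.get)[:n]

picked = select_for_labeling(predictions)
```

&lt;p&gt;With a larger pool of predictions, the same sorting step picks the n "hardest" examples for the next labeling batch.&lt;/p&gt;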

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--vsM--pIj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/71k4chnwcdleyhyxw3rj.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--vsM--pIj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/71k4chnwcdleyhyxw3rj.jpg" alt="Example image from Comma10k with model predictions of a Faster R-CNN model with a ResNet-50 backbone trained on MS COCO."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this post, we won't go into detail about how active learning works. There are many great resources about active learning. Instead, we will focus on how you can use active learning with just a few lines of code using the &lt;a href="https://docs.lightly.ai/getting_started/active_learning.html"&gt;Active Learning feature of Lightly&lt;/a&gt;. &lt;a href="https://lightly.ai/"&gt;Lightly&lt;/a&gt; is a data curation platform for computer vision. It leverages recent advances in self-supervised learning and active learning to help you work with unlabeled datasets.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Datasets: From MS COCO to Comma10k
&lt;/h2&gt;

&lt;p&gt;It is very common these days to use pre-trained models and fine-tune them on new tasks using transfer learning. Since we are interested in object detection here we use a pre-trained model from &lt;a href="https://cocodataset.org/#home"&gt;MS COCO&lt;/a&gt; (or COCO). Consisting of more than 100k labeled images, it is a very common dataset used for transfer learning for image segmentation, object detection, or keypoint/pose estimation.&lt;/p&gt;

&lt;p&gt;Our goal is to use active learning to use a COCO pre-trained model and fine-tune it on a dataset for autonomous driving. For this transfer task, we are using the &lt;a href="https://github.com/commaai/comma10k"&gt;Comma10k dataset&lt;/a&gt;. From the &lt;a href="https://github.com/commaai/comma10k"&gt;repository&lt;/a&gt;: "It's 10,000 PNGs of real driving captured from the comma fleet. It's MIT license, no academic-only restrictions or anything."&lt;/p&gt;

&lt;p&gt;As you might have noticed already, the Comma10k dataset has annotations for training "segnets" (semantic segmentation networks). However, there are no bounding box annotations, which we require for our transfer task. We therefore have to add the missing annotations. Instead of annotating all 10k images, we will use active learning to pick the 100 images where we expect the highest return in model improvement and annotate them first.&lt;/p&gt;

&lt;p&gt;Let's have a look at how active learning can help us select the first 100 images for annotation.&lt;/p&gt;
&lt;h2&gt;
  
  
  Let's get started
&lt;/h2&gt;

&lt;p&gt;This post is based on the &lt;a href="https://docs.lightly.ai/tutorials/platform/tutorial_active_learning_detectron2.html"&gt;Active Learning using Detectron2 on Comma10k tutorial&lt;/a&gt;. If you want to run the code yourself there is also a &lt;a href="https://colab.research.google.com/drive/1r0KDqIwr6PV3hFhREKgSjRaEbQa5N_5I?usp=sharing"&gt;ready-to-use Google Colab Notebook&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Getting active learning to work can be really hard. Many companies fail to implement it properly and get little to no value out of it. One of the main reasons is that they focus only on uncertainty sampling, which is just one of the two big categories of active learning algorithms. The following illustration shows the Knowledge Quadrant (&lt;a href="https://medium.com/pytorch/https-medium-com-robert-munro-active-learning-with-pytorch-2f3ee8ebec"&gt;see Active Learning with PyTorch&lt;/a&gt;); you find the two active learning approaches in the right column.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--pfrutGzX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://miro.medium.com/max/4800/0%2AnROAlmspxWcLEEpq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--pfrutGzX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://miro.medium.com/max/4800/0%2AnROAlmspxWcLEEpq.png" alt="Knowledge Quadrant - The right column is Active Learning. (see Active Learning with PyTorch)"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Uncertainty sampling is probably the most common approach. You pick new samples based on where model predictions have low confidence.&lt;/p&gt;

&lt;p&gt;The Lightly Platform supports both uncertainty sampling and diversity sampling algorithms.&lt;/p&gt;

&lt;p&gt;The second approach is diversity sampling. You can use it to diversify the dataset: we pick images that are visually/semantically distinct from each other.&lt;/p&gt;

&lt;p&gt;Uncertainty sampling can be used with a variety of scores (least confidence, margin, entropy…).&lt;br&gt;
For diversity sampling Lightly uses the coreset algorithm and &lt;a href="https://github.com/lightly-ai/lightly"&gt;embeddings obtained from its open-source self-supervised learning framework lightly&lt;/a&gt;.&lt;/p&gt;
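&lt;p&gt;To make the three uncertainty scores concrete, here is a minimal sketch over a per-image class-probability vector (this is not Lightly's implementation; higher score means "more worth labeling"):&lt;/p&gt;

```python
import math

def least_confidence(probs):
    # Higher score = the top class has low probability.
    return 1.0 - max(probs)

def margin(probs):
    # A small gap between the two most likely classes means high
    # uncertainty, so return 1 - gap to make higher mean more uncertain.
    top2 = sorted(probs, reverse=True)[:2]
    return 1.0 - (top2[0] - top2[1])

def entropy(probs):
    # Near-uniform distributions (maximal uncertainty) score highest.
    return -sum(p * math.log(p) for p in probs if p > 0)

confident = [0.90, 0.05, 0.05]
uncertain = [0.40, 0.35, 0.25]
```

&lt;p&gt;All three scores agree that the second distribution is the more valuable one to label; they differ in how much weight they give to the classes below the top two.&lt;/p&gt;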

&lt;p&gt;However, there is more.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Lightly has another algorithm for active learning called CORAL (&lt;strong&gt;COR&lt;/strong&gt;eset &lt;strong&gt;A&lt;/strong&gt;ctive &lt;strong&gt;L&lt;/strong&gt;earning) which uses a combination of diversity and uncertainty sampling.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The goal is to overcome the limitations of the individual methods by selecting images with low model confidence but at the same time making sure that they are visually distinct from each other.&lt;/p&gt;

&lt;p&gt;Let's see how we can make use of active learning and the Lightly Platform.&lt;/p&gt;
&lt;h2&gt;
  
  
  Embed and Upload your Dataset
&lt;/h2&gt;

&lt;p&gt;Let's start by creating embeddings and uploading the dataset to the Lightly Platform. We will use the embeddings later for the diversification part of the CORAL algorithm.&lt;/p&gt;

&lt;p&gt;You can easily train, embed, and upload a dataset using the &lt;a href="https://github.com/lightly-ai/lightly"&gt;lightly Python package&lt;/a&gt;. &lt;br&gt;
First, we need to install the package. We recommend using pip for this. Make sure you’re in a Python 3.6+ environment. If you’re on Windows, you should create a conda environment.&lt;/p&gt;

&lt;p&gt;Run the following command in your shell to install the latest version of lightly:&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
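&lt;p&gt;Installing the latest release from PyPI is a single command:&lt;/p&gt;

```shell
# Install the latest release of lightly from PyPI
# (assumes the Python 3.6+ environment mentioned above).
pip install lightly
```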


&lt;p&gt;Now that we have lightly installed, we can run the &lt;code&gt;lightly-magic&lt;/code&gt; CLI command to train, embed, and upload our dataset. You need to pass a token and a dataset_id argument to the command. You can find both in the Lightly Platform after creating a new dataset.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;Once you have run the &lt;code&gt;lightly-magic&lt;/code&gt; CLI command, you should see the uploaded dataset in the Lightly Platform. You can have a look at the 2D visualizations of your dataset. Do you spot the two clusters forming images of day and night?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--toA1AoUw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jdji8lzhx7zltg25jye2.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--toA1AoUw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jdji8lzhx7zltg25jye2.gif" alt="2D visualization of the Comma10k dataset on the Lightly Platform"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Active Learning Workflow
&lt;/h2&gt;

&lt;p&gt;Now that we have a dataset with embeddings uploaded to the Lightly Platform, we can start with the active learning workflow. We are interested in the part where you have a trained model and are ready to run predictions on unlabeled data. We start by creating an &lt;code&gt;ActiveLearningAgent&lt;/code&gt;. This agent helps us manage the unlabeled images and makes sure we interface with the platform properly.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;In our case, we don’t have a model in memory yet. Let’s load the pre-trained model from disk and get it ready to run predictions on unlabeled data.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;Finally, we can use our pre-trained model to run predictions on the unlabeled data. It’s important that we use the same order of the individual files as we have on the Lightly Platform. We can simply do this by iterating over &lt;code&gt;al_agent.query_set&lt;/code&gt;, which contains a list of filenames in the right order.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;In order to upload the predictions, we need to turn them into scores. Since we’re working on an object detection problem here, we use the &lt;code&gt;ScorerObjectDetection&lt;/code&gt;.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
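&lt;p&gt;To build intuition for what such a scorer computes, here is a self-contained stand-in (not Lightly's actual ScorerObjectDetection) that rates each image by its least confident box; the filenames and confidences are hypothetical:&lt;/p&gt;

```python
def detection_uncertainty(box_confidences):
    """Score an image by its least confident detection."""
    if not box_confidences:
        # No detections at all: treat the image as maximally uncertain.
        return 1.0
    return 1.0 - min(box_confidences)

# Hypothetical per-image box confidences from an object detector.
predictions = {
    "frame_001.png": [0.98, 0.95, 0.91],
    "frame_002.png": [0.55, 0.97],
    "frame_003.png": [],
}

scores = {name: detection_uncertainty(c) for name, c in predictions.items()}
```

&lt;p&gt;Images with a shaky detection (or none at all) end up with high scores and are prioritized for labeling.&lt;/p&gt;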


&lt;p&gt;We're finally ready to query the first batch of images.&lt;/p&gt;

&lt;h2&gt;
  
  
  Query the first 100 images for labeling
&lt;/h2&gt;

&lt;p&gt;To query data based on the model predictions and our embeddings on the Lightly Platform, we can use the &lt;code&gt;.query(...)&lt;/code&gt; method of the agent. We pass it a &lt;code&gt;SamplerConfig&lt;/code&gt; object describing the kind of sampling algorithm we want to run and its parameters.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
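&lt;p&gt;The sampling itself runs on the Lightly Platform, but the diversification idea can be sketched with a tiny k-center-greedy (coreset) routine on toy 2D embeddings (this is an illustration, not Lightly's production code):&lt;/p&gt;

```python
def k_center_greedy(embeddings, n_samples, first=0):
    """Greedily pick embeddings that are far from everything picked so far."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    selected = [first]
    for _ in range(n_samples - 1):
        # Among unselected points, take the one whose distance to its
        # nearest selected neighbor is largest (the most novel sample).
        remaining = [i for i in range(len(embeddings)) if i not in selected]
        best = max(
            remaining,
            key=lambda i: min(dist(embeddings[i], embeddings[j]) for j in selected),
        )
        selected.append(best)
    return selected

# Toy embeddings: two tight clusters far apart.
embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (10.0, 10.0), (10.1, 10.0)]
picked = k_center_greedy(embeddings, n_samples=2)
```

&lt;p&gt;Even with only two picks, the selection covers both clusters instead of drawing twice from the denser one; CORAL additionally weighs in the uncertainty scores.&lt;/p&gt;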


&lt;p&gt;After querying the new 100 images we can simply access their filenames using the &lt;code&gt;added_set&lt;/code&gt; of the active learning agent.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;Congratulations, you did your (first) active learning iteration! &lt;br&gt;
Now you can label the 100 images and train your model with them. &lt;br&gt;
Active learning is usually done in a continuous feedback loop. After training your model using the new data you would do another iteration and predict + select another batch of images for labeling.&lt;/p&gt;

&lt;p&gt;I hope you got a good idea of how you can use active learning for your next computer vision project. For more information check out the &lt;a href="https://docs.lightly.ai/tutorials/platform/tutorial_active_learning_detectron2.html"&gt;Active Learning using Detectron2 on Comma10k tutorial&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Igor Susmelj, co-founder &lt;br&gt;
&lt;a href="https://lightly.ai/"&gt;Lightly&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The original post has been published here: &lt;a href="https://www.lightly.ai/post/active-learning-using-detectron2"&gt;https://www.lightly.ai/post/active-learning-using-detectron2&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>deeplearning</category>
      <category>datascience</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>The Advantage of Self-Supervised Learning
</title>
      <dc:creator>IgorSusmelj</dc:creator>
      <pubDate>Sat, 06 Mar 2021 20:52:08 +0000</pubDate>
      <link>https://forem.com/igorsusmelj/the-advantage-of-self-supervised-learning-2g03</link>
      <guid>https://forem.com/igorsusmelj/the-advantage-of-self-supervised-learning-2g03</guid>
      <description>&lt;p&gt;‍&lt;strong&gt;A few personal thoughts on why self-supervised learning will have a strong impact on AI. From recent NLP to computer vision papers.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is not a prediction but rather a summary of personal findings and trends from research and industry.&lt;/p&gt;

&lt;p&gt;First, let’s discuss the difference between self-supervised learning and &lt;strong&gt;unsupervised learning&lt;/strong&gt;. Whether there actually is a difference between the two is still an open discussion.&lt;br&gt;
Unsupervised learning is the idea of models learning without any supervision. Clustering algorithms are a common example: there is no supervision or training involved in how the clusters are formed (at least not for simple methods such as k-means).&lt;br&gt;
In &lt;strong&gt;self-supervised learning&lt;/strong&gt;, we use the data itself as the label. We essentially turn unsupervised learning into supervised learning by leveraging a so-called proxy task. A proxy task differs from the downstream task in that we are not interested in the proxy task’s output itself.&lt;/p&gt;

&lt;p&gt;In NLP, popular methods such as &lt;a href="https://arxiv.org/pdf/1810.04805.pdf"&gt;Google’s BERT, 2019&lt;/a&gt; use a pre-training procedure where the model predicts missing words within a sentence, or the next sentence based on the current one. We can create a sentence with a missing word by simply removing a single word from it. The ground-truth information (our label) is then the missing word, and we can train the model in a self-supervised way.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--WECa-ytF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/00cju0hfh2ocm72comq6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--WECa-ytF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/00cju0hfh2ocm72comq6.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In &lt;strong&gt;computer vision&lt;/strong&gt;, we can apply the very same technique to train a model. We take an image and remove part of it (we essentially color it with a single color). The task of the model is to predict the missing pixels (we call this image inpainting). Since we have access to the original image and the missing pixels (ground truth) we can train the model in a supervised way. The paper &lt;a href="https://openaccess.thecvf.com/content_cvpr_2016/papers/Pathak_Context_Encoders_Feature_CVPR_2016_paper.pdf"&gt;Context Encoders: Feature Learning by Inpainting, CVPR, 2016&lt;/a&gt; is an example of such a self-supervised training procedure using inpainting. Unfortunately, this approach in computer vision doesn’t work that well.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--tfnhc0Ty--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3rmdjosj3s2y6l06wpil.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--tfnhc0Ty--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3rmdjosj3s2y6l06wpil.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Newer methods use image augmentations. A single image goes twice through an augmentation pipeline, so we end up with two new versions of the original image (we call them views). If we do the same for multiple images, we can train a model to find the pairs that belong to the same original image (before augmentation). We essentially teach the model to be invariant to whatever augmentations we choose.&lt;/p&gt;
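&lt;p&gt;A toy sketch of this pair-matching idea, with the augmentation pipeline simulated as small random noise on feature vectors (nothing like a full SimCLR setup):&lt;/p&gt;

```python
import random

random.seed(0)

def augment(x, strength=0.05):
    # Stand-in for an augmentation pipeline: a small random perturbation.
    return [v + random.uniform(-strength, strength) for v in x]

def similarity(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

# Three "images" (here: raw feature vectors for simplicity).
images = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]

# Each image passes through the augmentation pipeline twice: two views.
views_a = [augment(x) for x in images]
views_b = [augment(x) for x in images]

# For each view in A, the most similar view in B should be its own pair.
matches = [
    max(range(len(views_b)), key=lambda j: similarity(va, views_b[j]))
    for va in views_a
]
```

&lt;p&gt;A contrastive loss pushes the model toward exactly this behavior: views of the same image score high, views of different images score low.&lt;/p&gt;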

&lt;p&gt;Now, let’s have a look at the advantages self-supervised learning can bring to the world of AI.&lt;/p&gt;

&lt;h1&gt;
  
  
  Lifelong Learning
&lt;/h1&gt;

&lt;p&gt;When we talk about AI, we all think about some smart system learning over time and improving itself. Unfortunately, this is quite difficult. Supervised learning systems require new labels for new data to be trained on. Improving the system requires continuous re-labeling and re-training.&lt;/p&gt;

&lt;p&gt;However, using self-supervision we don’t require human labels anymore. There has been some great work into that direction from &lt;a href="http://people.eecs.berkeley.edu/~efros/"&gt;Alexey Efros Lab&lt;/a&gt; like the following paper using self-supervised learning for adapting to new environments in reinforcement learning: &lt;a href="https://arxiv.org/pdf/1705.05363.pdf"&gt;Curiosity-driven Exploration by Self-supervised Prediction, ICML, 2017&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Data Labeling
&lt;/h1&gt;

&lt;p&gt;Supervised learning requires ground-truth data. We call it labels or annotations, and in domains such as computer vision, they are mostly generated by humans. A single label can cost between a few cents and multiple dollars, depending on how much time the annotation task takes and how much expertise is required. Whereas lots of people can draw a bounding box around a car or a pedestrian, far fewer can do the same for medical images.&lt;/p&gt;

&lt;p&gt;Self-supervised learning can help reduce the required amount of labeling. On one hand, we can pre-train a model on unlabeled data and fine-tune it on a smaller labeled set. A popular example is &lt;a href="https://arxiv.org/pdf/2002.05709.pdf"&gt;A Simple Framework for Contrastive Learning of Visual Representations, ICML 2020&lt;/a&gt;. By the way, the last author of this paper is none other than Turing Award winner Geoffrey Hinton. On the other hand, we can use the features obtained from a self-supervised model to guide the selection of which data to label, for example by simply picking data samples that are diverse rather than similar. We do this at &lt;a href="https://lightly.ai/"&gt;Lightly&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I hope you got an idea of how self-supervised learning works and why there is a good reason to be excited about it. If you’re interested in self-supervised learning in computer vision don’t forget to check out our &lt;a href="https://github.com/lightly-ai/lightly"&gt;open-source Python framework for self-supervised learning on GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Igor, co-founder&lt;br&gt;
&lt;a href="https://lightly.ai/"&gt;Lightly.ai&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This post has originally been published here: &lt;a href="https://www.lightly.ai/post/the-advantage-of-self-supervised-learning"&gt;https://www.lightly.ai/post/the-advantage-of-self-supervised-learning&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>ai</category>
    </item>
    <item>
      <title>Embedded COVID mask detection on an Arm Cortex-M7 processor using PyTorch</title>
      <dc:creator>IgorSusmelj</dc:creator>
      <pubDate>Tue, 23 Feb 2021 19:26:00 +0000</pubDate>
      <link>https://forem.com/igorsusmelj/embedded-covid-mask-detection-on-an-arm-cortex-m7-processor-using-pytorch-b4d</link>
      <guid>https://forem.com/igorsusmelj/embedded-covid-mask-detection-on-an-arm-cortex-m7-processor-using-pytorch-b4d</guid>
      <description>&lt;p&gt;&lt;strong&gt;How we built a visual COVID-19 mask quality inspection prototype running on-device on an OpenMV-H7 board and the challenges on the way.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;TLDR; The source code to train and deploy your own image classifier can be found here: &lt;a href="https://github.com/ARM-software/EndpointAI/tree/master/ProofOfConcepts/Vision/OpenMvMaskDefaults"&gt;https://github.com/ARM-software/EndpointAI/tree/master/ProofOfConcepts/Vision/OpenMvMaskDefaults&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the summer of 2020, we worked with Arm to build an easy-to-use tutorial on how to train and deploy an image classifier on an Arm microcontroller. In this post, we show how we approached and solved the following challenges:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Convert a PyTorch ResNet to TensorFlow and quantize it to use 8-bit integer values&lt;/li&gt;
&lt;li&gt;Collect, select, and annotate data of faulty and non-faulty masks&lt;/li&gt;
&lt;li&gt;Use self-supervised pre-training to boost model performance when working with fewer images&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  The Results to Expect
&lt;/h1&gt;

&lt;p&gt;The goal of this project was to show an end-to-end workflow on how to train and deploy a convolutional neural network to an OpenMV-H7 board.&lt;/p&gt;

&lt;p&gt;The video below showcases how our classifier detects faulty masks in real-time.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/ba1c1JkBnNc"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h1&gt;
  
  
  The OpenMV-H7 Board
&lt;/h1&gt;

&lt;p&gt;The board consists of an &lt;a href="https://www.st.com/en/microcontrollers/stm32h743vi.html"&gt;STM32H743VI&lt;/a&gt; Arm Cortex-M7 processor running at 480MHz, multiple peripherals, and a camera module mounted on it.&lt;br&gt;
The camera module has an &lt;a href="http://www.ovt.com/products/sensor.php?id=80"&gt;OV7725&lt;/a&gt; sensor from OmniVision and can record in VGA resolution (640x480) at 75 FPS.&lt;/p&gt;

&lt;p&gt;Since the board has limited computing power and memory, we aimed for a very small deep learning model. We call our variant ResNet-9 since it’s essentially a ResNet-18 cut in half. Below you can find some numbers about the model configuration, runtime, and other metrics.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Input size:&lt;/strong&gt; 64x64x3&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CPU Freq.:&lt;/strong&gt; 480 MHz&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operations:&lt;/strong&gt; 33.4 MOp&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model size:&lt;/strong&gt; 90 kBytes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inference Time:&lt;/strong&gt; 150 ms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operations/s:&lt;/strong&gt; 249 MOp/s&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Detailed specs can be found on the official OpenMV website.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--E-kSGCiz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/548pfoiw8rahqtcszi8l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--E-kSGCiz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/548pfoiw8rahqtcszi8l.png" alt="image"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;A close-up picture of the OpenMV H7 Board we used.&lt;/p&gt;
&lt;h1&gt;
  
  
  Data Collection
&lt;/h1&gt;

&lt;p&gt;Neural networks are very data-hungry. In order to efficiently collect enough training data we did the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;We used the camera on the OpenMV-H7 board to record video sequences. With the USB interface and the OpenMV IDE, we were able to easily record the camera stream and save it as a video file.&lt;/li&gt;
&lt;li&gt;To simulate a real production line, we mounted the camera on cardboard to make sure it is stable. The optics point at the production line, which is a metal plate with tall borders. This setup ensures that the camera sees defect and non-defect masks within the same environment.&lt;/li&gt;
&lt;li&gt;Finally, we moved masks through our inspection line using a combination of push and pull.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--eoyLtJnz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/esb13mkd6v8karb5bde7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--eoyLtJnz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/esb13mkd6v8karb5bde7.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A picture of our data collection pipeline. We cut a small hole into the cardboard to clamp the USB cable holding the board into it.&lt;/p&gt;
&lt;h1&gt;
  
  
  Data Selection and Annotation
&lt;/h1&gt;

&lt;p&gt;At this stage, we have multiple video files, each a few minutes long. The next challenge is to extract the frames and annotate the data. We use FFmpeg for the frame extraction and Lightly to select a diverse set of frames. Note that we had more than 20k frames but no time to annotate all of them; using Lightly, we selected a few hundred frames covering all relevant scenarios.&lt;br&gt;
Lightly uses self-supervised learning to get good representations of the images. It then uses these representations to select the most interesting images for annotation. The benefit of this method is that we can access the pre-trained model and fine-tune it on only a handful of labeled images.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--tiK7a2OP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/brzh8kubztkf945jvrl4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--tiK7a2OP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/brzh8kubztkf945jvrl4.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Example images taken with the OpenMV H7 camera showing the three labels for the data. From left to right: good mask, defect mask, no mask.&lt;/p&gt;
&lt;h3&gt;
  
  
  Model Fine-Tuning
&lt;/h3&gt;

&lt;p&gt;To prevent the model from overfitting, we simply froze the pre-trained backbone and added a linear classification head to the model. We then trained the classifier for 100 epochs on a total of 500 annotated images.&lt;/p&gt;

&lt;h1&gt;
  
  
  From PyTorch to Keras to TensorFlow Lite
&lt;/h1&gt;

&lt;p&gt;Moving the pre-trained PyTorch model to TensorFlow Lite turned out to be the most difficult part of our endeavor.&lt;/p&gt;


&lt;p&gt;We tried out several tricks with ONNX to export our model. A simple library called pytorch2keras worked fine for a model only consisting of linear layers but not for our conv + linear model.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;The main problem we encountered was that PyTorch uses the CxHxW (channel, height, width) format for tensors, whereas TensorFlow uses HxWxC. This meant that, after transforming our model to TensorFlow Lite, the output of the layer just before the classifier was permuted, and hence the output of the classifier was incorrect. To address this problem, we considered manually permuting the weights of the linear classifier.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
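&lt;p&gt;The ordering problem can be illustrated with a tiny made-up tensor: flattening the same values in CHW and HWC order gives different vectors, and a fixed index permutation maps one onto the other, which is what permuting the classifier's weight columns would exploit:&lt;/p&gt;

```python
# Why the classifier breaks: flattening a CxHxW tensor and an HxWxC
# tensor of the same data yields differently ordered vectors.
C, H, W = 2, 2, 2

# chw[c][h][w] and hwc[h][w][c] hold the same values.
chw = [[[c * 100 + h * 10 + w for w in range(W)] for h in range(H)] for c in range(C)]
hwc = [[[chw[c][h][w] for c in range(C)] for w in range(W)] for h in range(H)]

flat_chw = [chw[c][h][w] for c in range(C) for h in range(H) for w in range(W)]
flat_hwc = [hwc[h][w][c] for h in range(H) for w in range(W) for c in range(C)]

# The permutation mapping HWC positions back to CHW-flattened indices;
# applying it to the linear layer's weight columns would fix the order.
perm = [c * H * W + h * W + w for h in range(H) for w in range(W) for c in range(C)]
reordered = [flat_chw[i] for i in perm]
```

&lt;p&gt;With global pooling down to Cx1x1, H and W collapse to 1 and both flattening orders coincide, which is why the simpler solution below works.&lt;/p&gt;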


&lt;p&gt;However, we decided to go for a simpler solution. We pooled the output of the last convolutional layer into a Cx1x1 shape. That way, changing the order of the channels does not affect the output of the neural network.&lt;/p&gt;

&lt;p&gt;The final step is to quantize and export the Keras model to TensorFlow Lite. In our case, quantization reduces the model size and speeds up inference at the cost of a few percent lower accuracy.&lt;/p&gt;
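&lt;p&gt;For intuition, here is a toy affine (scale and zero-point) 8-bit quantization of a few made-up weights, showing the roughly 4x storage saving over 32-bit floats and the small reconstruction error it introduces (TensorFlow Lite's actual scheme is more involved):&lt;/p&gt;

```python
# Toy affine int8-style quantization of float weights.
weights = [-1.2, -0.4, 0.0, 0.3, 0.9, 1.5]

lo, hi = min(weights), max(weights)
scale = (hi - lo) / 255.0
zero_point = round(-lo / scale)

# Map each float to an integer in [0, 255] and back again.
quantized = [round(w / scale) + zero_point for w in weights]
dequantized = [(q - zero_point) * scale for q in quantized]

# 8-bit storage instead of 32-bit floats, at the cost of a small
# per-weight reconstruction error bounded by half the scale.
max_error = max(abs(w - d) for w, d in zip(weights, dequantized))
```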


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;Special thanks to our collaborators at Arm and &lt;a href="https://medium.com/u/297b71d663f3"&gt;Philipp Wirth&lt;/a&gt; from Lightly for making this project possible. The &lt;a href="https://github.com/ARM-software/EndpointAI/tree/master/ProofOfConcepts/Vision/OpenMvMaskDefaults"&gt;full source code&lt;/a&gt; is available here. You can easily train your own classifier and run it on an embedded device. Feel free to reach out or leave a comment if you have any questions!&lt;/p&gt;

&lt;p&gt;Igor, co-founder&lt;br&gt;
&lt;a href="https://lightly.ai"&gt;Lightly.ai&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The original post was published here: &lt;a href="https://lightly.ai/post/embedded-covid-mask-detection-on-an-arm-m7-using-pytorch"&gt;https://lightly.ai/post/embedded-covid-mask-detection-on-an-arm-m7-using-pytorch&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;


</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>ai</category>
    </item>
    <item>
      <title>Few-Shot Learning with fast.ai</title>
      <dc:creator>IgorSusmelj</dc:creator>
      <pubDate>Fri, 28 Aug 2020 18:12:23 +0000</pubDate>
      <link>https://forem.com/igorsusmelj/few-shot-learning-with-fast-ai-1po9</link>
      <guid>https://forem.com/igorsusmelj/few-shot-learning-with-fast-ai-1po9</guid>
      <description>&lt;p&gt;In few-shot learning, we train a model using only a few labeled examples. Learn how to train your classifier using transfer learning and a novel framework for sample selection.&lt;/p&gt;

&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;Lately, posts and tutorials about new deep learning architectures and training strategies have dominated the community. However, one very interesting research area, namely few-shot learning, is not getting the attention it deserves. If we want widespread adoption of ML, we need ways to train models efficiently, with little data and little code. In this tutorial, we will go through a Google Colab notebook to train an image classification model using only 5 labeled samples per class. Using only 5 examples per class is also called 5-shot learning.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--S4wYhTkE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://miro.medium.com/max/1017/1%2AaG6UvkQordnAzMN0OCJbPw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--S4wYhTkE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://miro.medium.com/max/1017/1%2AaG6UvkQordnAzMN0OCJbPw.png" alt="Image showing top losses of our trained classifier"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Don’t forget to check out our Google Colab Notebook for the full code of this tutorial!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  Frameworks and libraries we use
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Jupyter Notebook (Google Colab)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://colab.research.google.com/drive/11j4aJv50UxEVZbuUluOGqCMKIZrx7pUM?usp=sharing"&gt;The full code of this tutorial will be provided as a notebook&lt;/a&gt;. Jupyter Notebooks are python programming environments accessible by web browsers and are very useful for fast prototyping and experiments. Colab is a service from Google where you get access to notebooks running on instances for free.&lt;/p&gt;

&lt;h1&gt;
  
  
  Fast.ai
&lt;/h1&gt;

&lt;p&gt;Training a deep learning model can be quite complicated and involve hundreds of lines of code. This is where &lt;a href="https://www.fast.ai/"&gt;fast.ai&lt;/a&gt; comes to the rescue: a library developed by former Kaggler Jeremy Howard, aimed specifically at making the training of deep learning models fast and simple. Using fast.ai, we can train and evaluate our classifier with just a few lines of code. Under the hood, fast.ai uses the PyTorch framework.&lt;/p&gt;

&lt;h1&gt;
  
  
  WhatToLabel and borisml
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://www.whattolabel.com/"&gt;WhatToLabel&lt;/a&gt; and it’s python package &lt;a href="https://pypi.org/project/borisml/"&gt;borisml&lt;/a&gt; aim to solve the question which samples you should work with. If you only label a few samples out of your dataset one of the key questions arising is how do you pick the samples? WhatToLabel aims at solving exactly this problem by providing you with different methods and metrics for selecting your samples&lt;/p&gt;

&lt;h1&gt;
  
  
  Setup your Notebook
&lt;/h1&gt;

&lt;p&gt;We start by installing the necessary dependencies and downloading the dataset. You can run any shell command in a notebook by starting the line with an &lt;strong&gt;“!”&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;E.g. to install our dependencies we can run the following code within a notebook cell:&lt;/p&gt;


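&lt;p&gt;For example, an install cell might look like this (package names as introduced above):&lt;/p&gt;

```shell
# In a notebook cell, prefix this with "!" to run it as a shell command:
pip install fastai borisml
```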


&lt;p&gt;In this tutorial, we work with a dataset consisting of cat and dog images. You can download it from &lt;a href="https://www.kaggle.com/"&gt;Kaggle&lt;/a&gt; using the fastai CLI (command-line interface). Note that you need to use the API token you get from Kaggle.&lt;/p&gt;




&lt;h1&gt;
  
  
  Select the samples for few-shot learning
&lt;/h1&gt;

&lt;p&gt;In order to get robust results with our few-shot learning algorithm, we want our training set to cover the full space of samples. That means we don’t want lots of similar examples, but rather a very diverse set of images. To achieve this, we can create an embedding of our dataset followed by a sampling method called coreset sampling[1]. Coreset sampling builds up the subset by always adding the sample that lies as far as possible from the already-selected set.&lt;/p&gt;
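&lt;p&gt;A minimal sketch of greedy coreset (k-center) selection on precomputed embeddings (an illustration of the idea, not borisml's implementation):&lt;/p&gt;

```python
import numpy as np

def coreset_greedy(embeddings, n_samples):
    """Greedy k-center selection: repeatedly pick the point farthest
    from the already-selected set (simplified coreset sampling [1])."""
    selected = [0]  # start from an arbitrary point
    # distance from every point to its nearest selected point
    dists = np.linalg.norm(embeddings - embeddings[0], axis=1)
    while len(selected) < n_samples:
        idx = int(np.argmax(dists))  # farthest point joins the set
        selected.append(idx)
        new_d = np.linalg.norm(embeddings - embeddings[idx], axis=1)
        dists = np.minimum(dists, new_d)
    return selected

# Two tight clusters: greedy selection covers both, instead of
# drawing all samples from one cluster as random sampling might.
rng = np.random.default_rng(0)
cluster_a = rng.normal(0.0, 0.1, size=(50, 2))
cluster_b = rng.normal(5.0, 0.1, size=(50, 2))
emb = np.concatenate([cluster_a, cluster_b])
picked = coreset_greedy(emb, 10)
assert any(i < 50 for i in picked) and any(i >= 50 for i in picked)
```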

&lt;p&gt;Now we will use &lt;a href="https://www.whattolabel.com/"&gt;WhatToLabel&lt;/a&gt; and its Python package &lt;a href="https://pypi.org/project/borisml/"&gt;borisml&lt;/a&gt; to select the 10 most diverse samples to work with. We first need to create an embedding. Borisml lets us do this without any labels by leveraging recent advances in self-supervised learning: a single CLI command trains a model for a few epochs and creates our embedding.&lt;/p&gt;




&lt;p&gt;Finally, we need to upload our dataset and embedding to the &lt;a href="https://app.whattolabel.com/"&gt;WhatToLabel app&lt;/a&gt; to run our selection algorithm. Since we don’t want to upload the images themselves, we can tell the CLI to consider only the metadata of the samples.&lt;/p&gt;




&lt;p&gt;Once the data and embedding are uploaded, we can go back to the web platform and run our sampling algorithm. This might take a minute to complete. If everything went smoothly, you should see a plot with a slider. Move the slider to the left to keep only 10 samples in the new subset. Hint: you can use the arrow keys to move the slider step by step. Once we have our 10 samples selected, we need to create a new tag (left menu). For this tutorial, we use “tiny” as the name and press the enter key to create it.&lt;/p&gt;

&lt;p&gt;You can then download the newly created subset using the CLI.&lt;/p&gt;




&lt;p&gt;You might notice that the dataset you downloaded is not perfectly balanced, e.g. 4 images of cats and 6 of dogs. This is due to the algorithm we chose for selecting the samples: our goal was to cover the whole embedding/feature space. It might very well be that the cat images in our dataset are more similar to each other than the dog images; since near-duplicates add little coverage, more images of dogs than cats will be selected.&lt;/p&gt;

&lt;h1&gt;
  
  
  Train our model using fast.ai
&lt;/h1&gt;

&lt;p&gt;If you have reached this point, you should have a dataset, obtained using WhatToLabel and coreset sampling, ready for training our classifier.&lt;/p&gt;

&lt;p&gt;Fast.ai requires only a few lines of code to train an image classifier. We first need to create a dataset and then a learner object. Finally, we train the model using the &lt;code&gt;.fit(...)&lt;/code&gt; method.&lt;/p&gt;


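&lt;p&gt;A sketch of what these few lines look like, assuming the fast.ai v1 API that was current when this post was written (paths and hyperparameters are illustrative, not the exact notebook code):&lt;/p&gt;

```python
from fastai.vision import ImageDataBunch, cnn_learner, models, accuracy

# Build a data bunch from an image folder, fine-tune a pretrained
# ResNet-34 (transfer learning), and train for a few epochs.
data = ImageDataBunch.from_folder('data/tiny', valid_pct=0.2, size=224)
learn = cnn_learner(data, models.resnet34, metrics=accuracy)
learn.fit(10)
```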


&lt;h1&gt;
  
  
  Interpreting the results
&lt;/h1&gt;

&lt;p&gt;To evaluate our model we use the test set of the cats and dogs dataset consisting of 2'000 images. Looking at the confusion matrix we see that our model mostly struggles with predicting dogs as being cats.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--y0QLu3qJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://miro.medium.com/max/417/1%2AkhfUjAkLFaJxeFqmt799Vg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--y0QLu3qJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://miro.medium.com/max/417/1%2AkhfUjAkLFaJxeFqmt799Vg.png" alt="Image showing confusion matrix of our trained model"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Fast.ai also helps here by producing interpretable performance plots of our model with just a few lines of code.&lt;/p&gt;




&lt;p&gt;The library also lets us look at the test-set images on which the trained model has the highest loss. You can see that the model struggles with smaller dogs that look more similar to cats. We could improve accuracy by selecting more samples for training. However, the goal of this tutorial was to show that by leveraging transfer learning and a smart data selection process, you can already get high accuracy (&amp;gt;80%) with just a handful of training data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--S4wYhTkE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://miro.medium.com/max/1017/1%2AaG6UvkQordnAzMN0OCJbPw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--S4wYhTkE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://miro.medium.com/max/1017/1%2AaG6UvkQordnAzMN0OCJbPw.png" alt="Image showing top losses of our trained classifier"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I hope you enjoyed this brief guide on how to use few-shot learning using fast.ai and WhatToLabel. Follow me for further tutorials on Medium!&lt;/p&gt;

&lt;p&gt;Igor, co-founder&lt;br&gt;
&lt;a href="https://www.whattolabel.com/"&gt;whattolabel.com&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[1] Sener O., Savarese S. (2017), Active Learning for Convolutional Neural Networks: A Core-Set Approach&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>python</category>
    </item>
    <item>
      <title>Rotoscoping: Hollywood's video data segmentation?</title>
      <dc:creator>IgorSusmelj</dc:creator>
      <pubDate>Fri, 15 May 2020 15:32:26 +0000</pubDate>
      <link>https://forem.com/igorsusmelj/rotoscoping-hollywood-s-video-data-segmentation-2ckn</link>
      <guid>https://forem.com/igorsusmelj/rotoscoping-hollywood-s-video-data-segmentation-2ckn</guid>
      <description>&lt;h4&gt;
  
  
  In Hollywood, video data segmentation has been done for decades. Simple tricks such as color keying with green screens can reduce work significantly.
&lt;/h4&gt;

&lt;p&gt;In late 2018 we worked on a video segmentation toolbox. One of the common problems in video editing is an oversaturated or too-bright sky when shooting a scene. Most skies in movies have been replaced by VFX specialists; the task is called “sky replacement”. We thought this was the perfect starting point for introducing automatic segmentation to mask the sky for replacement. Based on the experience we gathered, I will explain the similarities between VFX and data annotation.&lt;/p&gt;

&lt;p&gt;Below you find a comparison of the solution we built against Deeplab v3+, which was at the time considered the best image segmentation model. Our method (left) produced better detail around the buildings and significantly reduced the flickering between frames.&lt;/p&gt;

&lt;p&gt;Comparison of our sky segmentation model and Deeplab v3+&lt;br&gt;
&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/zi0qLwqqx28"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h3&gt;
  
  
  Video segmentation techniques of Hollywood
&lt;/h3&gt;

&lt;p&gt;In this section, we take a closer look at color keying (using, for example, green screens) and at rotoscoping.&lt;/p&gt;

&lt;h4&gt;
  
  
  What is color keying?
&lt;/h4&gt;

&lt;p&gt;I’m pretty sure you heard about color keying or green screens. Maybe you even used such tricks yourself when editing a video using a tool such as Adobe After Effects, Nuke, Final Cut, or any other software.&lt;/p&gt;

&lt;p&gt;I did a lot of video editing myself in my childhood: making videos for fun with friends and adding cool effects in tools such as After Effects, watching tutorials from &lt;a href="https://www.videocopilot.net/"&gt;videocopilot.net&lt;/a&gt; and &lt;a href="https://creativecow.com/"&gt;creativecow.com&lt;/a&gt; day and night. I remember playing with a friend and wooden sticks in the backyard of my family’s house just to replace them with lightsabers hours later.&lt;/p&gt;

&lt;p&gt;In case you don’t know how a green screen works, the video below gives a better explanation than I could with words.&lt;/p&gt;

&lt;p&gt;Video explaining how a green screen works&lt;br&gt;
&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/A0h_BVLRSeI"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Essentially, a green screen relies on color keying: the color green is masked out of the footage, and the resulting mask can be used to blend in another background. And the beauty is that we don’t need a fancy image segmentation model burning your GPU, but rather a simple algorithm looking for neighboring pixels of the desired color.&lt;/p&gt;
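&lt;p&gt;A toy version of such a color keying algorithm in numpy (real keyers work in other color spaces and produce soft edges, but the principle is the same):&lt;/p&gt;

```python
import numpy as np

def color_key_mask(image, key_color, tolerance=30):
    """Mask all pixels within `tolerance` of the key color (e.g. green).
    Simple per-pixel distance in RGB; real keyers also handle spill
    and transparency."""
    diff = np.linalg.norm(image.astype(np.int32) - np.asarray(key_color), axis=-1)
    return diff < tolerance  # True where the background should be removed

# 4x4 image: green background with a red 2x2 "subject" in the middle
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[...] = (0, 255, 0)          # green screen
img[1:3, 1:3] = (200, 30, 30)   # subject
mask = color_key_mask(img, (0, 255, 0))
assert mask.sum() == 12         # 16 pixels minus the 2x2 subject
```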

&lt;h4&gt;
  
  
  What is rotoscoping?
&lt;/h4&gt;

&lt;p&gt;As you can imagine, special effects in many Hollywood movies require scenes more complex than those where you can simply use a colored background to mask elements. Imagine a scene with animals that might be scared of a strongly colored screen, or a scene with lots of hair flowing in the wind. A simple color keying approach isn’t enough.&lt;/p&gt;

&lt;p&gt;But Hollywood found a technique for this problem many years ago, too: &lt;strong&gt;Rotoscoping&lt;/strong&gt;.&lt;br&gt;
To give you a better idea of what rotoscoping is, I embedded a video below: a tutorial on rotoscoping in After Effects. Using a special toolbox, you can draw splines and polygons around objects throughout a video, and the toolbox automatically interpolates between frames, saving you lots of time.&lt;/p&gt;

&lt;p&gt;After effects tutorial on rotoscoping&lt;br&gt;
&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/ZqAyS2AMvG4"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;This technology, introduced in After Effects in 2003, has been around for almost two decades and has been used by many VFX specialists and freelancers since.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Silhouette&lt;/strong&gt;, in contrast to After Effects, is a tool focusing solely on rotoscoping. You get an idea of their latest product updates in &lt;a href="https://www.youtube.com/watch?v=NwbbHFO8Rl0"&gt;this video&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I picked one example to show how detailed the result of rotoscoping can be. The three elements in the following video from MPC Academy that blow my mind are &lt;strong&gt;motion blur, fine-grained detail on hair, and frame-to-frame consistency&lt;/strong&gt;. When we worked on a product for VFX editors, we learned that the quality requirements in this industry go beyond what we have in image segmentation. There is simply no dataset or model in computer vision that fulfills the Hollywood standard.&lt;/p&gt;

&lt;p&gt;Rotoscoping demo reel from MPC Academy.&lt;br&gt;
Search for “roto showreel” on YouTube and you will find many more examples.&lt;br&gt;
&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/PQS9ov636ik"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h3&gt;
  
  
  How is VFX rotoscoping different from semantic segmentation?
&lt;/h3&gt;

&lt;p&gt;There are &lt;strong&gt;differences in both quality and how the quality assurance/ inspection works.&lt;/strong&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Tools and workflow comparison
&lt;/h4&gt;

&lt;p&gt;The tools and workflows in VFX and in data annotation are surprisingly similar, since both serve a similar goal. Rotoscoping tools, like professional annotation tools, support tracking objects and working with polygons and splines. Both allow changing brightness and contrast to help you find edges. One key difference is that in rotoscoping you work with transparency for motion blur or hair, whereas in segmentation we usually have a fixed number of classes and no interpolation between them.&lt;/p&gt;

&lt;h4&gt;
  
  
  Quality inspection comparison
&lt;/h4&gt;

&lt;p&gt;In data annotation, quality inspection is usually automated using a simple trick: we let multiple people annotate the same data and compare their results. If all annotators agree, the confidence is high and the annotation is considered good. If they only partially agree and the agreement is below a certain threshold, an additional round of annotation or manual inspection takes place.&lt;br&gt;
In VFX, however, annotation is usually done by a single person. That person has been trained on the task and has to deliver very high quality; the customer or supervisor makes the annotator redo the work if the quality is not good enough. There is no automatically obtained metric. All inspection is done manually by the trained eye of VFX experts. There is even a term, &lt;a href="https://www.urbandictionary.com/define.php?term=pixel-fucking"&gt;“pixel fucking”&lt;/a&gt;, illustrating the required pixel-level perfectionism.&lt;/p&gt;

&lt;h3&gt;
  
  
  How we trained our model for sky segmentation
&lt;/h3&gt;

&lt;p&gt;Let’s get back to our model. In the beginning, you saw a comparison between our result and &lt;a href="https://arxiv.org/abs/1802.02611"&gt;Deeplab v3+, 2018&lt;/a&gt;. You will notice that the quality of our video segmentation is higher and shows less flickering. For high-quality segmentation, we had to create our own dataset. We used Full HD cameras mounted on tripods to record footage of the sky; this way, a detailed segmentation around buildings and static objects can be reused throughout the whole shot. We used &lt;a href="https://www.foundry.com/products/nuke"&gt;Nuke&lt;/a&gt; to create the annotated data.&lt;/p&gt;

&lt;p&gt;Image showing the soft contours used for rotoscoping.&lt;br&gt;
We blurred the edges around the skyline.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--bAZrhmYO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://data-annotation.com/wp-content/uploads/2020/04/zoom.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--bAZrhmYO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://data-annotation.com/wp-content/uploads/2020/04/zoom.jpg" alt="video data segmentation"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Additionally, we used publicly available and license-free videos of trees, people, and other moving elements in front of simple backgrounds. To obtain ground-truth information, we simply used color keying. It worked like a charm, and we had pixel-accurate segmentation of 5-minute shots within a few hours. For additional diversity within the samples, we used our video editing tool to crop out parts of the videos while moving the camera around: a 4K original became a Full HD frame moving across it with smooth motion. For some shots, we even broke out of the typical binary masks and used smooth edges, interpolated between full black and full white. Segmentation masks are usually binary, black or white; we used the gray levels in between when the scene was blurry.&lt;/p&gt;
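&lt;p&gt;The idea of soft, non-binary masks is easiest to see as alpha blending (a minimal numpy sketch):&lt;/p&gt;

```python
import numpy as np

# Intermediate gray values in the mask act as an alpha channel,
# blending foreground and background at blurry or motion-blurred edges
# instead of cutting a hard line.
fg = np.full((2, 2, 3), 200.0)   # foreground color
bg = np.full((2, 2, 3), 20.0)    # replacement background
alpha = np.array([[1.0, 0.5],    # 0.5 = half-transparent edge pixel
                  [0.0, 1.0]])[..., None]
composite = alpha * fg + (1.0 - alpha) * bg
assert composite[0, 0, 0] == 200.0  # pure foreground
assert composite[0, 1, 0] == 110.0  # blended edge: 0.5*200 + 0.5*20
assert composite[1, 0, 0] == 20.0   # pure background
```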

&lt;p&gt;Color keying allowed us to get ground truth data for complicated scenes such as leaves or hair. The following picture of a palm tree has been masked/ labeled using simple color keying.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--1OOx3cL3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://data-annotation.com/wp-content/uploads/2020/04/city-colored.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--1OOx3cL3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://data-annotation.com/wp-content/uploads/2020/04/city-colored.gif" alt="Recording of city"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--eSu1gjBX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://data-annotation.com/wp-content/uploads/2020/04/city-masked.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--eSu1gjBX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://data-annotation.com/wp-content/uploads/2020/04/city-masked.gif" alt="Recording of city masked"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ajWWMPGk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://data-annotation.com/wp-content/uploads/2020/04/palm-tree.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ajWWMPGk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://data-annotation.com/wp-content/uploads/2020/04/palm-tree.jpg" alt="Palm tree segmentation using color keying"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For simple scenes, color keying was more than good enough to get detailed results. One could now also replace the background with a new one to augment the data.&lt;/p&gt;

&lt;p&gt;This worked for all kinds of trees, and it even helped us obtain good results for whole videos, since we could simply adapt the color keying parameters during the clip.&lt;/p&gt;

&lt;p&gt;Also, this frame has been masked using simple color keying methods&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jzWT2kqK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://data-annotation.com/wp-content/uploads/2020/04/tree.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jzWT2kqK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://data-annotation.com/wp-content/uploads/2020/04/tree.jpg" alt="Tree segmentation"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To give you an idea of the temporal quality of our color keying experiments, have a look at the gif below. Note that there is a little jitter; we added it on purpose to “simulate” recording with a handheld camera. The camera movement itself is a simple linear interpolation of the crop across the whole scene, so what you see below is just a crop of the full view.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ZllhSHu2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://data-annotation.com/wp-content/uploads/2020/04/trees-masked.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ZllhSHu2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://data-annotation.com/wp-content/uploads/2020/04/trees-masked.gif" alt="tree segmentation animation"&gt;&lt;/a&gt;&lt;br&gt;
This mask has been obtained using color keying for the first frame. The subsequent frames might only need a small modification of the color keying parameters. We did such adaptation every 30-50 frames and let the tool interpolate the parameters between them.&lt;/p&gt;

&lt;h4&gt;
  
  
  Training the model
&lt;/h4&gt;

&lt;p&gt;To train the model, we added an additional loss term on the pixels close to the mask borders. This helped a lot to improve the fine details. We played around with various parameters and architecture changes; the simple U-Net model worked well enough. We trained the model not on the full images but on crops of around 512×512 pixels. We also read up on Kaggle competitions such as the &lt;a href="https://www.kaggle.com/c/carvana-image-masking-challenge"&gt;Carvana image masking challenge from 2017&lt;/a&gt; for additional inspiration.&lt;/p&gt;
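&lt;p&gt;One way to build such a border-weighted loss is to up-weight pixels near the mask boundary. A toy numpy version of the weight map (our actual implementation differed; real code would use a distance transform, e.g. from scipy.ndimage):&lt;/p&gt;

```python
import numpy as np

def border_weight_map(mask, border=1, weight=5.0):
    """Give pixels near the mask border a higher loss weight.
    Shift-based dilation/erosion over the 4-neighborhood."""
    m = mask.astype(bool)
    grown = m.copy()
    shrunk = m.copy()
    for axis in (0, 1):
        for shift in (-border, border):
            rolled = np.roll(m, shift, axis=axis)
            grown |= rolled    # dilation
            shrunk &= rolled   # erosion
    border_px = grown & ~shrunk  # pixels whose neighborhood is mixed
    return np.where(border_px, weight, 1.0)

mask = np.zeros((6, 6), dtype=np.uint8)
mask[2:4, 2:4] = 1                 # small square object
w = border_weight_map(mask)
assert w[2, 2] == 5.0              # on the border: up-weighted
assert w[0, 0] == 1.0              # far from the border: normal weight
```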

&lt;h4&gt;
  
  
  Adversarial training for temporal consistency
&lt;/h4&gt;

&lt;p&gt;Now that we had our dataset we started training the segmentation model. For the model, we used a &lt;a href="https://arxiv.org/pdf/1505.04597.pdf"&gt;U-Net architecture&lt;/a&gt;, since the sky can span the whole image and we don’t need to consider various sizes as we would need to for objects.&lt;/p&gt;

&lt;p&gt;In order to improve the temporal consistency of the model (i.e. to remove the flickering), we co-trained a discriminator that always saw three sequential frames and had to distinguish whether they came from our model or from the dataset. The training procedure was otherwise quite simple; the model trained for only a day on an Nvidia GTX 1080Ti.&lt;/p&gt;

&lt;p&gt;So for your next video data segmentation project, you might want to have a look at whether you can use any of these tricks to collect data and save lots of time. In my other posts, you will find a list of &lt;a href="https://data-annotation.com/tools-and-frameworks/"&gt;data annotation tools&lt;/a&gt;. In case you don’t want to spend any time on manual annotation there is also a list of &lt;a href="https://data-annotation.com/list-of-data-annotation-companies/"&gt;data annotation companies&lt;/a&gt; available.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I’d like to thank Momo and Heiki who worked on the project with me. An additional thank goes to all the VFX artists and studios for their feedback and fruitful discussions.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Note: This post was originally published on &lt;a href="//data-annotation.com"&gt;data-annotation.com&lt;/a&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Curated List of Data Annotation Companies</title>
      <dc:creator>IgorSusmelj</dc:creator>
      <pubDate>Sat, 04 Apr 2020 15:32:07 +0000</pubDate>
      <link>https://forem.com/igorsusmelj/curated-list-of-data-annotation-companies-2gip</link>
      <guid>https://forem.com/igorsusmelj/curated-list-of-data-annotation-companies-2gip</guid>
      <description>&lt;p&gt;After sharing a list of tools and frameworks around data annotation I decided to also collect and maintain a &lt;a href="https://data-annotation.com/list-of-data-annotation-companies/"&gt;list of data annotation companies and service providers&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;Quite often we don't want to spend hours classifying images, drawing bounding boxes, or creating segmentation maps. Since there are plenty of companies that take on these cumbersome tasks, it often makes sense as an ML engineer to use their services and spend your own time on model training and optimization.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>dataannotation</category>
    </item>
    <item>
      <title>Data Annotation Tools and Frameworks</title>
      <dc:creator>IgorSusmelj</dc:creator>
      <pubDate>Mon, 02 Mar 2020 18:34:38 +0000</pubDate>
      <link>https://forem.com/igorsusmelj/data-annotation-tools-and-frameworks-g43</link>
      <guid>https://forem.com/igorsusmelj/data-annotation-tools-and-frameworks-g43</guid>
      <description>&lt;p&gt;I started creating my own &lt;a href="https://data-annotation.com/tools-and-frameworks/"&gt;list of data annotation tools&lt;/a&gt; and frameworks for my machine learning projects. I work a lot with computer vision data and used to build my own little annotation tools. However, there are plenty of open-source tools available!&lt;/p&gt;

&lt;p&gt;I thought it would be helpful for some of you!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--vLwQ3dmd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/8ks1l1ef5ux1qgj1y2z7.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--vLwQ3dmd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/8ks1l1ef5ux1qgj1y2z7.jpg" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I will start blogging more about my deep learning related projects here. So stay tuned for more interesting content!&lt;/p&gt;

</description>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
