<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Philipp Muens</title>
    <description>The latest articles on Forem by Philipp Muens (@pmuens).</description>
    <link>https://forem.com/pmuens</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F239299%2Fabf7f37a-9367-48c9-b40c-43a7e9118832.jpeg</url>
      <title>Forem: Philipp Muens</title>
      <link>https://forem.com/pmuens</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/pmuens"/>
    <language>en</language>
    <item>
      <title>Swift for modern Machine Learning</title>
      <dc:creator>Philipp Muens</dc:creator>
      <pubDate>Thu, 13 Feb 2020 13:31:03 +0000</pubDate>
      <link>https://forem.com/pmuens/swift-for-modern-machine-learning-1cg5</link>
      <guid>https://forem.com/pmuens/swift-for-modern-machine-learning-1cg5</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--7KASPcsA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://philippmuens.com/content/images/2020/02/amanda-sandlin-jIdKrtJF8Uk-unsplash-1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--7KASPcsA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://philippmuens.com/content/images/2020/02/amanda-sandlin-jIdKrtJF8Uk-unsplash-1.jpg" alt="Swift for modern Machine Learning"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;u&gt;&lt;strong&gt;Note&lt;/strong&gt;&lt;/u&gt;: In this post we'll compare and contrast different programming languages. Everything discussed should be taken with a grain of salt. There's no single programming language which solves all the problems in an elegant and performant way. Every language has its up- and downsides. Swift is no exception.&lt;/p&gt;

&lt;p&gt;Entering the Data Science and Machine Learning world there are various programming languages and tools to choose from. There's &lt;a href="https://www.mathworks.com/products/matlab.html"&gt;MATLAB&lt;/a&gt;, a commercial programming environment which is used across &lt;a href="https://mathworks.com/solutions.html?s_tid=gn_sol#industries"&gt;different industries&lt;/a&gt; and usually the tool of choice for practitioners with a heavy Math background. A free and Open Source alternative is the &lt;a href="https://www.r-project.org"&gt;R&lt;/a&gt; project, a programming language created in 1993 to simplify statistical data processing. People working with R usually report enjoyment as R is "hackable" and comes bundled with different math modules and plotting libraries.&lt;/p&gt;

&lt;p&gt;A more recent incarnation is the &lt;a href="https://julialang.org"&gt;Julia&lt;/a&gt; scientific programming language which was &lt;a href="http://news.mit.edu/2018/mit-developed-julia-programming-language-debuts-juliacon-0827"&gt;created at MIT&lt;/a&gt; to resolve the issues older tools such as MATLAB and R struggled with. Julia cleverly incorporates modern engineering efforts from the fields of &lt;a href="https://llvm.org"&gt;compiler construction&lt;/a&gt; and &lt;a href="https://docs.julialang.org/en/v1/manual/parallel-computing/"&gt;parallel computing&lt;/a&gt; and given its &lt;a href="https://github.com/JuliaLang/julia"&gt;Open Source&lt;/a&gt; nature it has gained a lot of industry-wide adoption since it reached &lt;a href="https://julialang.org/blog/2018/08/one-point-zero/"&gt;&lt;code&gt;v1&lt;/code&gt; maturity&lt;/a&gt; in 2018.&lt;/p&gt;

&lt;h2&gt;
  
  
  Python as the de-facto standard
&lt;/h2&gt;

&lt;p&gt;If you're doing some more research to find the most used programming language in Data Science and Machine Learning you might be surprised to see a language which wasn't built from the ground up for scientific computing: &lt;strong&gt;&lt;u&gt;Python&lt;/u&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://www.python.org"&gt;Python&lt;/a&gt; programming language was created by &lt;a href="https://en.wikipedia.org/wiki/Guido_van_Rossum"&gt;Guido van Rossum&lt;/a&gt; in 1989 to help &lt;a href="https://www.youtube.com/watch?v=J0Aq44Pze-w"&gt;bridge the gap between Bash scripting and C programming&lt;/a&gt;. Since then Python has taken the world by storm mainly due to its flat learning curve, its expressiveness and its powerful standard library which makes it possible to focus on the core problems rather than reinventing the wheel over and over again.&lt;/p&gt;

&lt;p&gt;&lt;u&gt;&lt;strong&gt;Funny tangent&lt;/strong&gt;&lt;/u&gt;: Open up a shell, run the Python interpreter via &lt;code&gt;python&lt;/code&gt; and enter &lt;code&gt;import this&lt;/code&gt; or &lt;code&gt;import antigravity&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Python is a general purpose programming language which was never designed to solve problems in a niche subject matter. Are you an Instagram user? &lt;a href="https://instagram-engineering.com/tagged/python"&gt;They're running Python&lt;/a&gt;. Do you curate content via Pinterest? &lt;a href="https://www.quora.com/What-is-the-technology-stack-behind-Pinterest-1"&gt;They're running Python&lt;/a&gt;. Do you store your data via Dropbox? They've &lt;a href="https://www.quora.com/What-technology-stack-does-Dropbox-use"&gt;developed their MVP in Python&lt;/a&gt; and &lt;a href="https://blogs.dropbox.com/tech/tag/python/"&gt;still use it today&lt;/a&gt;. Even &lt;a href="https://google.com"&gt;Google&lt;/a&gt; (then called BackRub) started out with &lt;a href="https://web.archive.org/web/19971210065425/http://backrub.stanford.edu/backrub.html"&gt;Python and Java&lt;/a&gt;. The list goes &lt;a href="https://en.wikipedia.org/wiki/List_of_Python_software"&gt;on and on&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Given such an industry-wide adoption it's easy to see why a lot of care and effort were put into the ecosystem of reusable packages as well as the language itself. No matter what use case you're working on, chances are that there are numerous &lt;a href="https://pypi.org"&gt;Python packages&lt;/a&gt; helping you solve your problems.&lt;/p&gt;

&lt;p&gt;While more famous Python projects include Web Frameworks such as &lt;a href="https://www.djangoproject.com"&gt;Django&lt;/a&gt; or &lt;a href="https://palletsprojects.com/p/flask/"&gt;Flask&lt;/a&gt; there are also a lot of mature &lt;a href="https://www.scipy.org"&gt;Scientific&lt;/a&gt; and &lt;a href="https://scikit-learn.org/stable/"&gt;Machine Learning&lt;/a&gt; implementations written in Python. Having access to such a robust foundation it only makes sense that modern Deep Learning frameworks such as &lt;a href="https://www.tensorflow.org"&gt;TensorFlow&lt;/a&gt; or &lt;a href="https://pytorch.org"&gt;PyTorch&lt;/a&gt; are also leveraging those libraries under the covers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hitting the limits
&lt;/h2&gt;

&lt;p&gt;All of the things discussed so far sound great. Python, a general purpose programming language which has quite a few years of existence under its belt, is used across industries in mission critical software systems. Over the course of 30 years a vibrant Open Source community emerged which develops and maintains powerful libraries used by millions of users on a daily basis.&lt;/p&gt;

&lt;p&gt;Why bother and replace Python? If it ain't broke, don't fix it!&lt;/p&gt;

&lt;p&gt;Technology is constantly improving. What was once &lt;a href="https://www.youtube.com/watch?v=l9RWTMNnvi4"&gt;unthinkable&lt;/a&gt; might all of a sudden be possible thanks to breakthroughs in &lt;a href="https://cloud.google.com/blog/products/ai-machine-learning/what-makes-tpus-fine-tuned-for-deep-learning"&gt;Hard&lt;/a&gt;- and &lt;a href="https://ieeexplore.ieee.org/document/1575717"&gt;Software&lt;/a&gt; development. Python was created in a different era with a different purpose. It was never engineered to directly interface with hardware or run complex computations across a fleet of distributed machines.&lt;/p&gt;

&lt;p&gt;A modern Deep Learning Framework such as &lt;a href="https://www.tensorflow.org"&gt;TensorFlow&lt;/a&gt; uses several programming languages behind the scenes. The core of such a library might be written in high-performance &lt;a href="https://github.com/tensorflow/tensorflow/search?l=c%2B%2B"&gt;C++&lt;/a&gt; which occasionally interfaces with different &lt;a href="https://en.wikipedia.org/wiki/C_(programming_language)"&gt;C&lt;/a&gt; libraries, &lt;a href="https://en.wikipedia.org/wiki/Fortran"&gt;Fortran&lt;/a&gt; programs or even parts of &lt;a href="https://en.wikipedia.org/wiki/Assembly_language"&gt;Assembly language&lt;/a&gt; to squeeze out every bit of performance possible. A Python interface is usually built on top of the C++ core to expose a simple public API that Data Scientists and Deep Learning enthusiasts use.&lt;/p&gt;

&lt;p&gt;Why isn't Python used throughout the whole stack?&lt;/p&gt;

&lt;p&gt;The answer to this question is rather involved but the gist of it is that Python's language design is more tailored towards high-level programming. Furthermore, it's just &lt;a href="https://benchmarksgame-team.pages.debian.net/benchmarksgame/which-programs-are-fastest.html"&gt;not fast enough&lt;/a&gt; to be used at the lower layers.&lt;/p&gt;

&lt;p&gt;The following is an incomplete list of Python's (subjective) shortcomings:&lt;/p&gt;

&lt;h3&gt;
  
  
  Speed
&lt;/h3&gt;

&lt;p&gt;Python code usually runs an order of magnitude slower than code written in compiled languages and often trails other interpreted languages as well. Tools such as &lt;a href="https://cython.org"&gt;Cython&lt;/a&gt;, which compiles Python code to C, try to mitigate this problem but they come with other issues (e.g. language inconsistencies, compatibility problems, ...).&lt;/p&gt;

&lt;h3&gt;
  
  
  Parallel Processing
&lt;/h3&gt;

&lt;p&gt;It's not that straightforward to write Python code which reliably performs parallel processing tasks on multiple cores or even multiple machines. Deep Neural Networks can be &lt;a href="https://www.tensorflow.org/tensorboard/graphs"&gt;expressed as graphs&lt;/a&gt; on which Tensor computations are carried out, making them a prime use case for parallel processing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hardware integration
&lt;/h3&gt;

&lt;p&gt;Python is a high-level language with lots of useful abstractions which unfortunately get in the way when trying to directly interface with the computer's underlying hardware. Because of that, heavy GPU computations are usually moved into lower-level code written in e.g. &lt;a href="https://en.wikipedia.org/wiki/C_(programming_language)"&gt;C&lt;/a&gt; or &lt;a href="https://en.wikipedia.org/wiki/CUDA"&gt;CUDA&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Interpreted rather than compiled
&lt;/h3&gt;

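&lt;p&gt;Since it's a scripting language at its core, Python comes with its own runtime that evaluates the script statement by statement as it runs it, a process called "interpretation". (CPython, the reference implementation, first translates the source into bytecode, which its interpreter loop then executes.)&lt;/p&gt;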
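&lt;p&gt;As a small illustration of what the runtime works with, the sketch below uses the standard library's &lt;code&gt;dis&lt;/code&gt; module to list the bytecode instructions CPython (the reference implementation) interprets for a function. The function &lt;code&gt;add&lt;/code&gt; is a made-up example, and the exact instruction names differ between Python versions.&lt;/p&gt;

```python
import dis


def add(a, b):
    # A deliberately tiny function so the instruction listing stays short.
    return a + b


# CPython compiles the function body to bytecode and then executes
# those instructions one by one in its interpreter loop.
instructions = [instr.opname for instr in dis.Bytecode(add)]

for name in instructions:
    print(name)
```

&lt;p&gt;None of these instructions are specialized for the types of &lt;code&gt;a&lt;/code&gt; and &lt;code&gt;b&lt;/code&gt; ahead of time, which is one reason an interpreter has less room for optimization than a compiler.&lt;/p&gt;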

&lt;p&gt;The other branch of programming languages are compiled languages. Compiling code means that the human-readable program code is translated into code a machine can read and understand. Compiled programs have the downside that there's a compilation step in between writing and running the program. The upside of such a step is that various checks and optimizations can be performed while translating the code, eventually emitting the most efficient machine code possible.&lt;/p&gt;

&lt;h3&gt;
  
  
  Dynamic typing
&lt;/h3&gt;

&lt;p&gt;Python is dynamically typed: types are attached to values rather than variables and are only checked at runtime. There's no problem in passing an integer into a function which expects a string. Python will run the program and raise an exception as soon as it evaluates the broken code.&lt;/p&gt;
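&lt;p&gt;A minimal sketch of that behaviour follows. The function &lt;code&gt;shout&lt;/code&gt; is hypothetical and only exists to show when the error appears.&lt;/p&gt;

```python
def shout(message):
    # Expects a string: str has .upper(), int does not.
    return message.upper() + "!"


print(shout("hello"))  # HELLO!

# Nothing rejects this call ahead of time. The mistake only surfaces
# once the interpreter actually executes the line inside shout().
try:
    shout(42)
except AttributeError as error:
    caught = str(error)
    print("Failed at runtime:", caught)
```

&lt;p&gt;If the faulty call sits in a rarely executed branch, the program can run for a long time before anyone notices.&lt;/p&gt;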

&lt;p&gt;Statically typed languages have the upside that mistakes like the one described above are caught at compile time, before the program ever runs. The expected types are either declared explicitly by the developer or inferred by the compiler.&lt;/p&gt;

&lt;p&gt;Since version 3.5 Python supports &lt;a href="https://www.python.org/dev/peps/pep-0484/"&gt;type hints&lt;/a&gt;. The interpreter doesn't enforce them at runtime, though, so unless an external type checker is run over the code they merely serve as another form of documentation and won't prevent type misuses in programs.&lt;/p&gt;
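&lt;p&gt;To make this concrete, here's a short sketch with a hypothetical, fully annotated function &lt;code&gt;double&lt;/code&gt;. The interpreter stores the hints but happily accepts an argument of the wrong type. A separate static checker such as mypy would be expected to flag the second call, but that's an extra tool and not part of running the program.&lt;/p&gt;

```python
def double(x: int) -> int:
    # The annotations say int, but CPython does not check them at runtime.
    return x * 2


print(double(21))    # 42
print(double("ab"))  # abab

# The hints are only stored as metadata on the function object.
print(double.__annotations__)
```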

&lt;h3&gt;
  
  
  Interoperability
&lt;/h3&gt;

&lt;p&gt;A lot of prominent packages such as &lt;a href="https://numpy.org"&gt;NumPy&lt;/a&gt; wrap other languages such as Fortran or C to offer reliable performance when working on computationally expensive data processing tasks.&lt;/p&gt;

&lt;p&gt;While it's certainly not impossible to introduce existing libraries written in other languages into Python, the process to do that is oftentimes rather involved.&lt;/p&gt;
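&lt;p&gt;To give a feeling for the ceremony involved, the following sketch calls the C standard library's &lt;code&gt;strlen&lt;/code&gt; through Python's built-in &lt;code&gt;ctypes&lt;/code&gt; module. It assumes a POSIX system (Linux or macOS), and &lt;code&gt;strlen&lt;/code&gt; was picked purely for illustration. Real libraries with structs, callbacks and manual memory management need considerably more glue code.&lt;/p&gt;

```python
import ctypes

# Assumption: a POSIX system (Linux/macOS). Passing None asks the dynamic
# loader for the symbols already loaded into the current process, which
# includes the C standard library.
libc = ctypes.CDLL(None)

# The C signature `size_t strlen(const char *s)` has to be restated by hand.
# If these declarations are wrong, the call can misbehave or crash.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"Swift for TensorFlow")
print(length)  # 20
```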

&lt;h2&gt;
  
  
  Entering Swift
&lt;/h2&gt;

&lt;p&gt;Without going into too much detail it makes sense to take a quick detour and study the origins of the Swift programming language in order to see why it has such a potential to replace Python as the go-to choice for Data Science and Machine Learning projects.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Chris_Lattner"&gt;Chris Lattner&lt;/a&gt;, the inventor of Swift, has a long history and established track record in modern compiler development. During college he worked on a project which eventually became LLVM ("Low Level Virtual Machine"), the famous compiler infrastructure toolchain. The revolutionary idea behind LLVM is the introduction of frontends and backends which can be mixed and matched. One frontend could be written for Swift which is then coupled with a backend implementation for the x86 architecture. Making it possible to compile to another architecture is as simple as using another backend such as the one for PowerPC. Back in the early compiler days one had to write the compiler end-to-end, tightly coupling the frontend and backend, making it a heroic effort to offer the compiler for different platforms.&lt;/p&gt;

&lt;p&gt;LLVM gained a lot of traction and Chris Lattner was eventually hired by Apple to work on its developer tooling which heavily relied on LLVM. &lt;a href="https://www.youtube.com/watch?v=yCd3CzGSte8&amp;amp;t=2250s"&gt;During that time&lt;/a&gt; he worked on a C++ compiler and thought about what a better, more modern programming language might look like. He figured that it should be compiled, easy to learn, flexible enough to feel like a scripting language and at the same time "hackable" at every layer. Those ideas translated into the Swift programming language which was officially &lt;a href="https://www.youtube.com/watch?v=MO7Ta0DvEWA"&gt;released at WWDC in 2014&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;But what exactly makes Swift such a natural fit as a Python replacement? Isn't Swift only used for iOS and macOS apps? The following section shows why Swift could be Python's successor.&lt;/p&gt;

&lt;h3&gt;
  
  
  It's compiled
&lt;/h3&gt;

&lt;p&gt;Swift is compiled via LLVM which means that its code is translated into optimized machine code directly running on the target platform. Improvements made to the LLVM compiler toolchain automatically benefit the Swift code generation.&lt;/p&gt;

&lt;p&gt;There's the saying that Swift is &lt;a href="https://github.com/tensorflow/swift/blob/master/docs/DesignOverview.md#swift"&gt;"syntactic sugar for LLVM"&lt;/a&gt; which rings true as one can see with the &lt;code&gt;Builtin&lt;/code&gt; usage for its &lt;a href="https://github.com/apple/swift/blob/tensorflow/stdlib/public/core/FloatingPointTypes.swift.gyb#L730-L753"&gt;core types&lt;/a&gt;. The linked code snippet shows that Swift's core types directly interface with their LLVM equivalents.&lt;/p&gt;

&lt;h3&gt;
  
  
  Python-like syntax
&lt;/h3&gt;

&lt;p&gt;Despite the compilation process Swift feels like a dynamic, Python-esque language. Swift was designed from the ground up for programs to &lt;a href="https://youtu.be/yCd3CzGSte8?t=2840"&gt;incrementally grow in complexity&lt;/a&gt; as necessary. The simplest of all Swift programs is just one line of code: &lt;code&gt;print("Hello World")&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;let greeting = "Hello World"
print(greeting)
// Hello World

let num1 = 1
let num2 = 2
print(num1 + num2)
// 3

let scores = [10, 35, 52, 92, 88]
for score in scores {
    print(score)
}
// 10
// 35
// 52
// 92
// 88

class Cat {
    var name: String
    var livesRemaining: Int = 9

    init(name: String) {
        self.name = name
    }

    func describe() -&amp;gt; String {
        return "👋 I'm \(self.name) and I have \(self.livesRemaining) lives 😸"
    }
}
let mitsy = Cat(name: "Mitsy")
print(mitsy.describe())
// 👋 I'm Mitsy and I have 9 lives 😸
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;h3&gt;
  
  
  Static typing
&lt;/h3&gt;

&lt;p&gt;Given that Swift is compiled via LLVM, it's statically type checked during the compilation process. There's no way you can pass an invalid type to a function and run into an error during runtime. If your code compiles you can be pretty sure that you're passing around the expected types.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func sum(xs: [Int]) -&amp;gt; Int {
    var result: Int = 0
    for x: Int in xs {
        result = result + x
    }
    return result
}

// Using correct types
let intNumbers: [Int] = [1, 2, 3, 4, 5]
let resultInt = sum(xs: intNumbers)
print(resultInt)
// 15

// Using incorrect types
let stringNumbers: [String] = ["one", "two", "three"]
let resultString = sum(xs: stringNumbers)
print(resultString)
// error: cannot convert value of type '[String]' to expected argument type '[Int]'
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;h3&gt;
  
  
  Hackable
&lt;/h3&gt;

&lt;p&gt;Swift's concepts of &lt;a href="https://docs.swift.org/swift-book/LanguageGuide/Protocols.html"&gt;protocols&lt;/a&gt; and &lt;a href="https://docs.swift.org/swift-book/LanguageGuide/Extensions.html"&gt;extensions&lt;/a&gt; make it dead simple to add new functionality to existing libraries or even types which ship with the language core itself. Want to add a new method to &lt;code&gt;Int&lt;/code&gt;? No problem!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// One needs to implement `help` when using the `Debugging` Protocol
protocol Debugging {
    func help() -&amp;gt; String
}

// Implementing `Debugging` for MatrixMultiply
class MatrixMultiply: Debugging {
    func help() -&amp;gt; String {
        return "Offers methods to aid with matrix-matrix multiplications."
    }

    func multiply() {
        // ...
    }
}
var matMult = MatrixMultiply()
print(matMult.help())
// Offers methods to aid with matrix-matrix multiplications.

// Implementing `Debugging` for VectorMultiply
class VectorMultiply: Debugging {
    func help() -&amp;gt; String {
        return "Offers methods to aid with matrix-vector multiplications."
    }
}
var vecMult = VectorMultiply()
print(vecMult.help())
// Offers methods to aid with matrix-vector multiplications.

// Makes it possible to emojify an existing type
protocol Emojifier {
    func emojify() -&amp;gt; String
}

// Here we're extending Swift's core `Int` type
extension Int: Emojifier {
    func emojify() -&amp;gt; String {
        if self == 8 {
            return "🎱"
        } else if self == 100 {
            return "💯"
        }
        return String(self)
    }
}

print(8.emojify())
// 🎱
print(100.emojify())
// 💯
print(42.emojify())
// 42
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;h3&gt;
  
  
  Value semantics
&lt;/h3&gt;

&lt;p&gt;I'm sure everyone has run into this problem before. An object is passed into a function and modified without bad intentions. Meanwhile the object is used in a different place and all of a sudden its internal state isn't what it's supposed to be. The culprit is the data mutation within the function.&lt;/p&gt;
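&lt;p&gt;In Python, where lists are passed by reference, the situation can be sketched like this. The function &lt;code&gt;normalize&lt;/code&gt; and the sample data are made up for illustration.&lt;/p&gt;

```python
def normalize(scores):
    # Looks harmless, but this mutates the caller's list in place.
    total = sum(scores)
    for i, score in enumerate(scores):
        scores[i] = score / total
    return scores


raw_scores = [1, 1, 2]
normalized = normalize(raw_scores)

print(normalized)                # [0.25, 0.25, 0.5]
print(raw_scores)                # [0.25, 0.25, 0.5]
print(normalized is raw_scores)  # True
```

&lt;p&gt;The caller's &lt;code&gt;raw_scores&lt;/code&gt; silently changed because both names point to the very same list object.&lt;/p&gt;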

&lt;p&gt;This problem can be mitigated easily via value semantics. When using value semantics a "copy" rather than an object reference is passed around.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// As seen on: https://marcosantadev.com/copy-write-swift-value-types/

import Foundation

// Prints the memory address of the given object
func address(of object: UnsafeRawPointer) -&amp;gt; String {
    let addr = Int(bitPattern: object)
    return String(format: "%p", addr)
}

var list1 = [1, 2, 3, 4, 5]
print(address(of: list1))
// 0x7f2021f845d8

var list2 = list1
print(address(of: list2))
// 0x7f2021f845d8 &amp;lt;-- Both lists share the same address

list2.append(6) // &amp;lt;-- Mutating `list2`

print(list1)
// [1, 2, 3, 4, 5]

print(list2)
// [1, 2, 3, 4, 5, 6]

print(address(of: list1))
// 0x7f2021f84a38
print(address(of: list2))
// 0x128fb50 &amp;lt;-- `list2` has a different address
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;h3&gt;
  
  
  First-class C interoperability
&lt;/h3&gt;

&lt;p&gt;Given that Swift compiles via LLVM it has access to existing LLVM-based implementations to interoperate with. One such project is &lt;a href="https://clang.llvm.org"&gt;Clang&lt;/a&gt;, a C language family frontend written for LLVM. Thanks to Clang it's dead simple to wrap existing C libraries and bring them into Swift projects.&lt;/p&gt;

&lt;p&gt;The following video demonstrates how easy it is:&lt;/p&gt;

&lt;h2&gt;
  
  
  Swift for TensorFlow (S4TF)
&lt;/h2&gt;

&lt;p&gt;Given all the upsides described above, the TensorFlow team decided to experiment with Swift as a Python replacement to interface with TensorFlow. Early prototypes were fruitful, encouraging the TensorFlow team to officially release &lt;a href="https://www.tensorflow.org/swift"&gt;Swift for TensorFlow (S4TF)&lt;/a&gt; in 2019.&lt;/p&gt;

&lt;p&gt;S4TF extends the Swift core language with various features especially useful for Machine Learning tasks. Such enhancements include first-class autodiff support to calculate derivatives for functions or Python interoperability which makes it possible to reuse existing Python packages such as &lt;a href="https://matplotlib.org"&gt;matplotlib&lt;/a&gt;, &lt;a href="https://scikit-learn.org/stable/"&gt;scikit-learn&lt;/a&gt; or &lt;a href="https://pandas.pydata.org"&gt;pandas&lt;/a&gt; via Swift.&lt;/p&gt;
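&lt;p&gt;For readers new to the term "autodiff", the following is a minimal sketch of one flavour of automatic differentiation (forward mode with dual numbers). It's written in plain Python purely to illustrate the idea and makes no claim about how S4TF implements it. The &lt;code&gt;Dual&lt;/code&gt; class and &lt;code&gt;derivative_at&lt;/code&gt; helper are hypothetical names.&lt;/p&gt;

```python
class Dual:
    """A number that carries its value together with its derivative."""

    def __init__(self, value, derivative=0.0):
        self.value = value
        self.derivative = derivative

    @staticmethod
    def lift(other):
        # Plain numbers are constants, so their derivative is 0.
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = Dual.lift(other)
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.value + other.value,
                    self.derivative + other.derivative)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.value * other.value,
                    self.derivative * other.value + self.value * other.derivative)

    __rmul__ = __mul__


def derivative_at(f, x):
    # Seed the input with derivative 1 (dx/dx = 1) and read off df/dx.
    return f(Dual(x, 1.0)).derivative


def f(x):
    return x * x + 3 * x  # f'(x) = 2x + 3


print(derivative_at(f, 2.0))  # 7.0
```

&lt;p&gt;The derivative falls out of running the function itself, without symbolic manipulation or numerical approximation. Having this kind of support built into the language means arbitrary functions can be differentiated without a separate wrapper type like the one above.&lt;/p&gt;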

&lt;p&gt;The following is a demonstration which shows how Swift for TensorFlow can be used to describe and train a deep neural network in TensorFlow:&lt;/p&gt;

&lt;p&gt;Do you want to play around with &lt;a href="https://www.tensorflow.org/swift"&gt;Swift for TensorFlow&lt;/a&gt; yourself? Just run the following code in a terminal to spin up a &lt;a href="https://jupyter.org"&gt;Jupyter&lt;/a&gt; Notebook server with Swift Kernel support in a &lt;a href="https://www.docker.com"&gt;Docker&lt;/a&gt; container:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker run -it -p 8888:8888 --rm --name jupyter-s4tf \
  -v "$PWD":/home/jovyan/work \
  --ipc=host \
  pmuens/jupyter-s4tf:latest jupyter notebook \
  --ip=0.0.0.0 \
  --no-browser \
  --allow-root \
  --NotebookApp.token='' \
  --notebook-dir=/home/jovyan/work
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The code for the repository can be found &lt;a href="https://github.com/pmuens/jupyter-s4tf"&gt;here&lt;/a&gt; and the Docker Hub entry is &lt;a href="https://hub.docker.com/r/pmuens/jupyter-s4tf"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Python, the de-facto standard programming language for Data Science and Machine Learning, has served the community very well in the past. Nevertheless, given the trajectory of technological advancements we're slowly but surely hitting the limits of the tooling we currently have.&lt;/p&gt;

&lt;p&gt;Performance critical code is already pushed down into lower-level implementations written in programming languages such as C or Fortran and wrapped via public Python APIs. Wouldn't it be nice to write expressive, yet performant code from the get go at every layer? And what about all the libraries out there? Wouldn't it be nice to wrap and reuse them with only a couple lines of code?&lt;/p&gt;

&lt;p&gt;The lack of static typing in Python makes it painful to work on larger, more complex projects. It's all too easy to define a model and train it on a huge dataset just to realize that a type error interrupts the training process halfway through. An error which could've been mitigated via thorough type checks.&lt;/p&gt;

&lt;p&gt;And what if we're hitting other roadblocks? Wouldn't it be nice to be able to peek under the covers and fix the issues ourselves in an "official" way without all the &lt;a href="https://en.wikipedia.org/wiki/Monkey_patch"&gt;monkey-patching&lt;/a&gt;?&lt;/p&gt;

&lt;p&gt;Most large-scale Machine Learning projects already faced some, if not all of the issues listed above. The TensorFlow team experienced them too and looked into ways to solve them once and for all. What they came up with is &lt;a href="https://www.tensorflow.org/swift"&gt;Swift for TensorFlow (S4TF)&lt;/a&gt;, a Swift language extension tailored towards modern Machine Learning projects. The Swift programming language comes with various properties which make it a perfect fit for a Python replacement: It shares a similar syntax, is compiled (and therefore runs fast), has a type system and seamlessly interoperates with existing C and Python libraries.&lt;/p&gt;

&lt;p&gt;What do you think? Is Swift for TensorFlow the future or do we stick with Python for now? Will a language such as Julia dominate the Data Science and Machine Learning world in the future?&lt;/p&gt;

&lt;h2&gt;
  
  
  Additional Resources
&lt;/h2&gt;

&lt;p&gt;The following is a list of resources I've used to compile this blog post. There are also a couple of other sources linked within the article itself.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.swift.org/swift-book/GuidedTour/GuidedTour.html"&gt;A Swift Tour&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.google.com/presentation/d/1dc6o2o-uYGnJeCeyvgsgyk05dBMneArxdICW5vF75oU/edit#slide=id.p"&gt;Fast.ai + Swift for TensorFlow Presentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=3TqN_M1L4ts"&gt;Fast.ai - Lesson 13 (2019) - Basics of Swift for Deep Learning&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=8wd8zFzTG38"&gt;Fast.ai - Lesson 14 (2019) - Swift: C interop; Protocols; Putting it all together&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.fast.ai/2019/01/10/swift-numerics/"&gt;Fast.ai - Swift Numerics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=yCd3CzGSte8"&gt;Chris Lattner: Compilers, LLVM, Swift, TPU, and ML Accelerators | Artificial Intelligence Podcast&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>code</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Generics</title>
      <dc:creator>Philipp Muens</dc:creator>
      <pubDate>Tue, 01 Oct 2019 10:44:50 +0000</pubDate>
      <link>https://forem.com/pmuens/generics-2f0h</link>
      <guid>https://forem.com/pmuens/generics-2f0h</guid>
      <description>&lt;p&gt;Generic programming makes it possible to describe an implementation in an abstract way with the intention to reuse it with different data types.&lt;/p&gt;

&lt;p&gt;While generic programming is a really powerful tool as it prevents the programmer from repeating herself it can be hard to grasp for newcomers. This is especially true if you're not too familiar with typed programming languages.&lt;/p&gt;

&lt;p&gt;This blog post aims to shed some light on the topic of generic programming. We'll discover why Generics are useful and which thought process can be applied to easily derive generic function signatures. At the end of this post you'll be able to author and understand functions like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nx"&gt;foo&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;A&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;B&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;xs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;A&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt; &lt;span class="nx"&gt;func&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;A&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;B&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;B&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="cm"&gt;/* ... */&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Throughout this post we'll use &lt;a href="http://www.typescriptlang.org"&gt;TypeScript&lt;/a&gt; as our language of choice. Feel free to &lt;a href="http://www.typescriptlang.org/play/"&gt;code along&lt;/a&gt; while reading through it.&lt;/p&gt;

&lt;p&gt;Of course you can "just use JavaScript" (or another dynamically typed language) to not deal with concepts such as typing or Generics. But that's not the point. The point of this post is to introduce the concepts of Generics in a playful way. TypeScript is just a replaceable tool to express our thoughts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Motivation
&lt;/h2&gt;

&lt;p&gt;Before we jump right into the application of generic programming it might be useful to understand what problem Generics are solving. We'll re-implement one of JavaScript's built-in Array methods called &lt;code&gt;filter&lt;/code&gt; to get first-hand experience as to why Generics were invented.&lt;/p&gt;

&lt;p&gt;Let's start with an example to understand what &lt;code&gt;filter&lt;/code&gt; actually does. The &lt;a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/filter"&gt;JavaScript documentation for &lt;code&gt;filter&lt;/code&gt;&lt;/a&gt; states that:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The filter() method creates a new array with all elements that pass the test implemented by the provided function.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let's take a look at a concrete example to see how we would use &lt;code&gt;filter&lt;/code&gt; in our programs. First off we have to define an array. Let's call our array &lt;code&gt;numbers&lt;/code&gt; as it contains some numbers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;numbers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Next up we need to come up with a function our &lt;code&gt;filter&lt;/code&gt; method applies to each element of that array. This function determines whether the element-under-test should be included in the resulting / filtered array. Based on the quote above and the description we just wrote down we can derive that our function which is used by the &lt;code&gt;filter&lt;/code&gt; method should return a boolean value. The function should return &lt;code&gt;true&lt;/code&gt; if the element passes the test and &lt;code&gt;false&lt;/code&gt; otherwise.&lt;/p&gt;

&lt;p&gt;To keep things simple we pretend that we want to filter our &lt;code&gt;numbers&lt;/code&gt; array such that only even numbers will be included in our resulting array. Here's the &lt;code&gt;isEven&lt;/code&gt; function which implements that logic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;isEven&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;num&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;num&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Our &lt;code&gt;isEven&lt;/code&gt; function takes in a &lt;code&gt;num&lt;/code&gt; argument of type &lt;code&gt;number&lt;/code&gt; and returns a &lt;code&gt;boolean&lt;/code&gt;. We use the &lt;a href="https://en.wikipedia.org/wiki/Modulo_operation"&gt;modulo operation&lt;/a&gt; to determine whether the number-under-test is even.&lt;/p&gt;
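
&lt;p&gt;As a quick illustration (the concrete values are just examples), the modulo operation yields the remainder of a division. An even number divided by &lt;code&gt;2&lt;/code&gt; leaves a remainder of &lt;code&gt;0&lt;/code&gt;, which is exactly what &lt;code&gt;isEven&lt;/code&gt; checks for:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight typescript"&gt;&lt;code&gt;const isEven = (num: number): boolean =&amp;gt; num % 2 === 0

// 4 divided by 2 leaves no remainder
console.log(4 % 2)
// --&amp;gt; 0

// 5 divided by 2 leaves a remainder of 1
console.log(5 % 2)
// --&amp;gt; 1

console.log(isEven(4), isEven(5))
// --&amp;gt; true false
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;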

&lt;p&gt;Next up we can use this function as an argument for the &lt;code&gt;filter&lt;/code&gt; method on our array to get a resulting array which only includes even numbers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;numbers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;isEven&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;// --&amp;gt; [2, 4, 6]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;As we've stated earlier our goal is to implement the &lt;code&gt;filter&lt;/code&gt; function on our own. Now that we've used &lt;code&gt;filter&lt;/code&gt; with an example we should be familiar with its API and usage.&lt;/p&gt;

&lt;p&gt;To keep things simple we won't implement &lt;code&gt;filter&lt;/code&gt; on arrays but rather define a standalone function which accepts an &lt;code&gt;array&lt;/code&gt; and a &lt;code&gt;function&lt;/code&gt; as its arguments.&lt;/p&gt;

&lt;p&gt;What we do know is that &lt;code&gt;filter&lt;/code&gt; loops through every element of the array and applies the custom function to it in order to see if it should be included in the resulting array. We can translate this into the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nx"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;xs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt; &lt;span class="nx"&gt;func&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
  &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;x&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;xs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Now there's definitely a lot happening here and it might look intimidating but bear with me. It's simpler than it might look.&lt;/p&gt;

&lt;p&gt;In the first line we define our function called &lt;code&gt;filter&lt;/code&gt; which takes an array called &lt;code&gt;xs&lt;/code&gt; (you can imagine pronouncing this "exes") and a function called &lt;code&gt;func&lt;/code&gt; as its arguments. The array &lt;code&gt;xs&lt;/code&gt; is of type &lt;code&gt;number[]&lt;/code&gt; as we're dealing with numbers and the function &lt;code&gt;func&lt;/code&gt; takes an &lt;code&gt;x&lt;/code&gt; of type &lt;code&gt;number&lt;/code&gt;, runs some code and returns a &lt;code&gt;boolean&lt;/code&gt;. Once done our &lt;code&gt;filter&lt;/code&gt; function returns an array of type &lt;code&gt;number[]&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The function body simply defines an intermediary array of type &lt;code&gt;number[]&lt;/code&gt; which is used to store the resulting numbers. Other than that we loop over every element of our array and apply the function &lt;code&gt;func&lt;/code&gt; to it. If the function returns &lt;code&gt;true&lt;/code&gt; we push the element into our &lt;code&gt;res&lt;/code&gt; array. Once done looping over all elements we return the &lt;code&gt;res&lt;/code&gt; array which includes all the numbers for which our &lt;code&gt;func&lt;/code&gt; function returned the value &lt;code&gt;true&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Alright. Let's see if our homebrew &lt;code&gt;filter&lt;/code&gt; function works the same way the built-in JavaScript &lt;code&gt;filter&lt;/code&gt; function does:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;numbers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;isEven&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;// --&amp;gt; [2, 4, 6]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Great! Looks like it's working!&lt;/p&gt;

&lt;p&gt;If we think about filtering in the abstract we can imagine that there's more than just the filtering of numbers.&lt;/p&gt;

&lt;p&gt;Let's imagine we're building a Rolodex-like application. Here's an array with some names from our Rolodex:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;names&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Alice&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Bob&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;John&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Alex&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Pete&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Anthony&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Now one of our application requirements is to only display names that start with a certain letter.&lt;/p&gt;

&lt;p&gt;That sounds like a perfect fit for our &lt;code&gt;filter&lt;/code&gt; function as we basically filter all the names based on their first character!&lt;/p&gt;

&lt;p&gt;Let's start by writing our custom function we'll use to filter out names that start with an &lt;code&gt;a&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;startsWithA&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;toLowerCase&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nx"&gt;charAt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;a&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;As we can see our function takes one argument called &lt;code&gt;name&lt;/code&gt; of type &lt;code&gt;string&lt;/code&gt; and it returns a &lt;code&gt;boolean&lt;/code&gt; which our function computes by checking if the first character of the name is an &lt;code&gt;a&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Now let's use our &lt;code&gt;filter&lt;/code&gt; function to filter the names:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;names&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;startsWithA&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;// --&amp;gt; Type Error&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Hmm. Something seems to be off here.&lt;/p&gt;

&lt;p&gt;Let's revisit the signature of our &lt;code&gt;filter&lt;/code&gt; function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nx"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;xs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt; &lt;span class="nx"&gt;func&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="cm"&gt;/* ... */&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Here we can see that the &lt;code&gt;xs&lt;/code&gt; parameter is an array of type &lt;code&gt;number&lt;/code&gt;. Furthermore the &lt;code&gt;func&lt;/code&gt; parameter takes an &lt;code&gt;x&lt;/code&gt; of type &lt;code&gt;number&lt;/code&gt; and returns a &lt;code&gt;boolean&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;However in our new Rolodex application we're dealing with names which are &lt;code&gt;strings&lt;/code&gt; and the &lt;code&gt;startsWithA&lt;/code&gt; function we've defined takes a &lt;code&gt;string&lt;/code&gt; as an argument, not a &lt;code&gt;number&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;One way to fix this problem would be to create a copy of &lt;code&gt;filter&lt;/code&gt; called e.g. &lt;code&gt;filter2&lt;/code&gt; whose arguments can handle &lt;code&gt;strings&lt;/code&gt; rather than &lt;code&gt;numbers&lt;/code&gt;. But we programmers know that we &lt;a href="https://en.wikipedia.org/wiki/Don%27t_repeat_yourself"&gt;shouldn't repeat ourselves&lt;/a&gt; to keep things maintainable. In addition to that we're lazy, so using one function to deal with different data types would be ideal.&lt;/p&gt;
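
&lt;p&gt;To make the duplication concrete, here's a minimal sketch of what such a copy could look like. The name &lt;code&gt;filter2&lt;/code&gt; and the sample names are purely illustrative. Note that the function body is identical to the &lt;code&gt;number&lt;/code&gt; version and only the type annotations differ:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight typescript"&gt;&lt;code&gt;// hypothetical copy of `filter` which only handles strings
function filter2(xs: string[], func: (x: string) =&amp;gt; boolean): string[] {
  const res: string[] = []
  for (const x of xs) {
    if (func(x)) {
      res.push(x)
    }
  }
  return res
}

const startsWithA = (name: string): boolean =&amp;gt;
  name.toLowerCase().charAt(0) === 'a'

console.log(filter2(['Alice', 'Bob', 'Alex'], startsWithA))
// --&amp;gt; ['Alice', 'Alex']
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Every additional data type would require yet another copy of the same loop.&lt;/p&gt;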

&lt;h2&gt;
  
  
  Entering Generics
&lt;/h2&gt;

&lt;p&gt;And that's exactly the problem Generics tackle. As the introduction of this blog post stated, Generics can be used to describe an implementation in an abstract way in order to reuse it with different data types.&lt;/p&gt;

&lt;p&gt;Let's use Generics to solve our problem and write a function that can deal with any data type, not just &lt;code&gt;numbers&lt;/code&gt; or &lt;code&gt;strings&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Before we jump into the implementation we should articulate what we're about to implement. Talking in the abstract we're basically attempting to filter an array of type &lt;code&gt;T&lt;/code&gt; (&lt;code&gt;T&lt;/code&gt; is our "placeholder" for some valid type here) with the help of our custom function. Given that our array has elements of type &lt;code&gt;T&lt;/code&gt; our function should take each element of such type and produce a &lt;code&gt;boolean&lt;/code&gt; as a result (like we did before).&lt;/p&gt;

&lt;p&gt;Alright. Let's translate that into code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nx"&gt;filter&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;xs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt; &lt;span class="nx"&gt;func&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
  &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;x&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;xs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;At first glance this might look confusing since we've sprinkled in our &lt;code&gt;T&lt;/code&gt; type here and there. However overall it should look quite familiar. Let's take a closer look into how this implementation works.&lt;/p&gt;

&lt;p&gt;In the first line we define our &lt;code&gt;filter&lt;/code&gt; function as a function which takes an array named &lt;code&gt;xs&lt;/code&gt; of type &lt;code&gt;T&lt;/code&gt; and a function called &lt;code&gt;func&lt;/code&gt; which takes a parameter &lt;code&gt;x&lt;/code&gt; of type &lt;code&gt;T&lt;/code&gt; and returns a &lt;code&gt;boolean&lt;/code&gt;. Our function &lt;code&gt;filter&lt;/code&gt; then returns a resulting array which is also of type &lt;code&gt;T&lt;/code&gt;, since it's basically a subset of elements of our original array &lt;code&gt;xs&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The code inside the function body is pretty much the same as before with the exception that our intermediary &lt;code&gt;res&lt;/code&gt; array also needs to be of type &lt;code&gt;T&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;There's one little detail we haven't talked about yet. There's this &lt;code&gt;&amp;lt;T&amp;gt;&lt;/code&gt; at the beginning of the function. What does that actually do?&lt;/p&gt;

&lt;p&gt;Well our compiler doesn't really know what the type &lt;code&gt;T&lt;/code&gt; might be at the end of the day. And it doesn't really care that much whether it's a &lt;code&gt;string&lt;/code&gt;, a &lt;code&gt;number&lt;/code&gt; or an &lt;code&gt;object&lt;/code&gt;. It only needs to know that it's "some placeholder" type. We programmers have to tell the compiler that we're abstracting the type away via Generics here. So in TypeScript for example we use the syntax &lt;code&gt;&amp;lt;TheTypePlaceHolder&amp;gt;&lt;/code&gt; right after the function name to signal to the compiler that we want our function to be able to deal with lots of different types (to be generic). Using &lt;code&gt;T&lt;/code&gt; is just a convention. You could use any name you want as your "placeholder type". If your function deals with more than one generic type you'd just list them comma-separated inside the &lt;code&gt;&amp;lt;&amp;gt;&lt;/code&gt; like this: &lt;code&gt;&amp;lt;A, B&amp;gt;&lt;/code&gt;.&lt;/p&gt;
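
&lt;p&gt;To illustrate the comma-separated form, here's a small sketch of a function with two placeholder types. The &lt;code&gt;pair&lt;/code&gt; function is a made-up example and not part of any library:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight typescript"&gt;&lt;code&gt;// hypothetical helper with two "placeholder types" `A` and `B`
function pair&amp;lt;A, B&amp;gt;(a: A, b: B): [A, B] {
  return [a, b]
}

// the compiler infers `A` and `B` from the arguments
const inferred = pair('Alice', 42)

// we can also spell out the types explicitly
const explicit = pair&amp;lt;string, number&amp;gt;('Bob', 7)

console.log(inferred, explicit)
// --&amp;gt; ['Alice', 42] ['Bob', 7]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;In most cases we can rely on the compiler to infer the placeholder types, so the explicit form is rarely needed.&lt;/p&gt;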

&lt;p&gt;That's pretty much all we have to do to turn our limited, &lt;code&gt;number&lt;/code&gt;-focused &lt;code&gt;filter&lt;/code&gt; function into a generic function which can deal with all kinds of types. Let's see if it works with our &lt;code&gt;numbers&lt;/code&gt; and &lt;code&gt;names&lt;/code&gt; arrays:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;

&lt;span class="c1"&gt;// using `filter` with numbers and our `isEven` function&lt;/span&gt;
&lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;numbers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;isEven&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;// --&amp;gt; [2, 4, 6]&lt;/span&gt;

&lt;span class="c1"&gt;// using `filter` with strings and our `startsWithA` function&lt;/span&gt;
&lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;names&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;startsWithA&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;// --&amp;gt; ['Alice', 'Alex', 'Anthony']&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Awesome! It works!&lt;/p&gt;
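
&lt;p&gt;Since &lt;code&gt;T&lt;/code&gt; can stand for any type, the same function should also work with objects. Here's a minimal sketch, assuming a hypothetical &lt;code&gt;Contact&lt;/code&gt; type and &lt;code&gt;isAdult&lt;/code&gt; function for our Rolodex (the generic &lt;code&gt;filter&lt;/code&gt; function is repeated so that the snippet runs on its own):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight typescript"&gt;&lt;code&gt;function filter&amp;lt;T&amp;gt;(xs: T[], func: (x: T) =&amp;gt; boolean): T[] {
  const res: T[] = []
  for (const x of xs) {
    if (func(x)) {
      res.push(x)
    }
  }
  return res
}

// hypothetical Rolodex entry
type Contact = { name: string; age: number }

const contacts: Contact[] = [
  { name: 'Alice', age: 31 },
  { name: 'Bob', age: 17 },
  { name: 'Alex', age: 45 }
]

const isAdult = (contact: Contact): boolean =&amp;gt; contact.age &amp;gt;= 18

// here `T` is inferred to be `Contact`
const adults = filter(contacts, isAdult)

console.log(adults.map((contact) =&amp;gt; contact.name))
// --&amp;gt; ['Alice', 'Alex']
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;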

&lt;h2&gt;
  
  
  Function signatures as documentation
&lt;/h2&gt;

&lt;p&gt;One of the many benefits of using a type system is that you can get a good sense of what the function will be doing based solely on its signature.&lt;/p&gt;

&lt;p&gt;Let's take the function signature from the beginning of the post and see if we can figure out what it'll be doing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nx"&gt;foo&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;A&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;B&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;xs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;A&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt; &lt;span class="nx"&gt;func&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;A&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;B&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;B&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="cm"&gt;/* ... */&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The first thing we notice is that it's a generic function as we're dealing with 2 "type placeholders" &lt;code&gt;A&lt;/code&gt; and &lt;code&gt;B&lt;/code&gt; here. Next up we can see that this function takes in an array called &lt;code&gt;xs&lt;/code&gt; of type &lt;code&gt;A&lt;/code&gt; and a function &lt;code&gt;func&lt;/code&gt; which takes an &lt;code&gt;A&lt;/code&gt; and turns it into a &lt;code&gt;B&lt;/code&gt;. At the end the &lt;code&gt;foo&lt;/code&gt; function returns an array of type &lt;code&gt;B&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Take a couple of minutes to parse the function signature in order to understand what it's doing.&lt;/p&gt;

&lt;p&gt;Do you know what this function is called? Here's a tip: It's also one of those functions from the realm of functional programming used on e.g. arrays.&lt;/p&gt;

&lt;p&gt;Here's the solution: The function we called &lt;code&gt;foo&lt;/code&gt; here is usually called &lt;code&gt;map&lt;/code&gt; as it iterates over the elements of the array and uses the provided function to map every element from one type to the other (note that it can also map to the same type, i.e. from type &lt;code&gt;A&lt;/code&gt; to type &lt;code&gt;A&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;I have to admit that this was a rather challenging question. Here's how &lt;code&gt;map&lt;/code&gt; is used in the wild:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;numbers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;numToString&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;num&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;num&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;numbers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;numToString&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;// --&amp;gt; ['1', '2', '3', '4', '5', '6']&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
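
&lt;p&gt;The snippet above assumes that a standalone &lt;code&gt;map&lt;/code&gt; function exists. A minimal sketch of such a function, modeled after our generic &lt;code&gt;filter&lt;/code&gt; implementation, could look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight typescript"&gt;&lt;code&gt;function map&amp;lt;A, B&amp;gt;(xs: A[], func: (x: A) =&amp;gt; B): B[] {
  // the resulting array holds elements of type `B`
  const res: B[] = []
  for (const x of xs) {
    // every element is transformed, none is dropped
    res.push(func(x))
  }
  return res
}

const numbers = [1, 2, 3]
const numToString = (num: number): string =&amp;gt; num.toString()

console.log(map(numbers, numToString))
// --&amp;gt; ['1', '2', '3']
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;In contrast to &lt;code&gt;filter&lt;/code&gt; the resulting array always has the same length as the input array.&lt;/p&gt;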



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this blog post we've looked into Generics as a way to write code in an abstract and reusable way.&lt;/p&gt;

&lt;p&gt;We've implemented our own &lt;code&gt;filter&lt;/code&gt; function to understand why generic programming is useful and how it helps us to allow the filtering of lists of &lt;code&gt;numbers&lt;/code&gt;, &lt;code&gt;strings&lt;/code&gt; or more broadly speaking &lt;code&gt;T&lt;/code&gt;s.&lt;/p&gt;

&lt;p&gt;Once we understood how to read and write generic functions we discovered how typing and Generics can help us to get a sense of what a function might be doing just by looking at its signature.&lt;/p&gt;

&lt;p&gt;I hope that you've enjoyed this journey and feel equipped to read and write highly generic code.&lt;/p&gt;

&lt;p&gt;Do you have any questions, comments, feedback? Feel free to send me an E-Mail or reach out to me via Twitter.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>typescript</category>
      <category>functional</category>
      <category>codenewbie</category>
    </item>
    <item>
      <title>The intuition behind Word2Vec</title>
      <dc:creator>Philipp Muens</dc:creator>
      <pubDate>Tue, 04 Jun 2019 18:22:00 +0000</pubDate>
      <link>https://forem.com/pmuens/the-intuition-behind-word2vec-1b72</link>
      <guid>https://forem.com/pmuens/the-intuition-behind-word2vec-1b72</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--JaLGDF3h--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://philippmuens.com/content/images/2020/01/antoine-dautry-_zsL306fDck-unsplash.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--JaLGDF3h--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://philippmuens.com/content/images/2020/01/antoine-dautry-_zsL306fDck-unsplash.jpg" alt="The intuition behind Word2Vec"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Have you ever wondered how YouTube knows which videos to recommend, how Google Translate is able to translate whole texts into a decent version of the target language or how your Smartphone keyboard knows which words and text snippets to suggest while you type your texts?&lt;/p&gt;

&lt;p&gt;There’s a very high likelihood that so-called &lt;a href="https://en.wikipedia.org/wiki/Word_embedding"&gt;Embeddings&lt;/a&gt; were used behind the scenes. Embeddings are one of the central ideas behind modern Natural Language Processing models.&lt;/p&gt;

&lt;p&gt;In the following writeup we’ll discover the main building blocks and basic intuition behind Embeddings. We’ll learn how and why they work and how &lt;a href="https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf"&gt;Word2Vec&lt;/a&gt;, a method to turn words into vectors, can be used to show that:&lt;/p&gt;

&lt;p&gt;[king - man + woman = queen ]&lt;/p&gt;

&lt;p&gt;All the code we’ll write here can be found in my &lt;a href="https://github.com/pmuens/lab"&gt;“Lab”&lt;/a&gt; repository on GitHub. Feel free to code along while reading through this tutorial.&lt;/p&gt;

&lt;h2&gt;
  
  
  Basic Setup
&lt;/h2&gt;

&lt;p&gt;Before jumping right into the code we need to make sure that all Python packages we’ll be using are installed on our machine.&lt;/p&gt;

&lt;p&gt;We install &lt;a href="https://seaborn.pydata.org/"&gt;Seaborn&lt;/a&gt;, a visualization tool which helps us to plot nice-looking charts and diagrams. We don’t really work with Seaborn directly but rather use its styles in conjunction with &lt;a href="https://matplotlib.org/"&gt;Matplotlib&lt;/a&gt; to make our plots look a little bit more “modern”.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;!pip install seaborn
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Next up we need to import the modules we’ll use throughout this tutorial (the last few lines configure Matplotlib to use Seaborn styles).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json
from pathlib import Path

import pandas as pd
import seaborn as sns
import numpy as np
from IPython.display import HTML, display

# prettier Matplotlib plots
import matplotlib.pyplot as plt
import matplotlib.style as style
style.use('seaborn')
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Since we’re dealing with different datasets we should create a separate directory to store them in.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;!mkdir -p data
data_dir = Path('data')
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;h2&gt;
  
  
  Comparing Countries
&lt;/h2&gt;

&lt;p&gt;Let’s start with our first data analysis task. Our goal is to compare and contrast different countries based on their surface area and population. The main idea being that we want to analyze which countries are quite similar and which are rather different based on those two metrics.&lt;/p&gt;

&lt;p&gt;The dataset we’ll use is part of the &lt;a href="https://github.com/samayo/country-json"&gt;&lt;code&gt;country-json&lt;/code&gt; project&lt;/a&gt; by &lt;a href="https://github.com/samayo"&gt;@samayo&lt;/a&gt;. Make sure to take some time to browse through the &lt;a href="https://github.com/samayo/country-json/tree/master/src"&gt;different JSON files&lt;/a&gt; to get an idea about the structure of the data.&lt;/p&gt;

&lt;p&gt;In our example we’re only interested in the &lt;a href="https://github.com/samayo/country-json/blob/master/src/country-by-surface-area.json"&gt;&lt;code&gt;country-by-surface-area.json&lt;/code&gt;&lt;/a&gt; and &lt;a href="https://github.com/samayo/country-json/blob/master/src/country-by-population.json"&gt;&lt;code&gt;country-by-population.json&lt;/code&gt;&lt;/a&gt; files. Let’s go ahead and download the files to our &lt;code&gt;data&lt;/code&gt; directory.&lt;/p&gt;

&lt;p&gt;After that we can define 2 variables which will point to the files on our file system.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SURFACE_AREA_FILE_NAME = 'country-by-surface-area.json'
POPULATION_FILE_NAME = 'country-by-population.json'

!wget -nc https://raw.githubusercontent.com/samayo/country-json/master/src/country-by-surface-area.json -O data/country-by-surface-area.json
!wget -nc https://raw.githubusercontent.com/samayo/country-json/master/src/country-by-population.json -O data/country-by-population.json

surface_area_file_path = str(data_dir / SURFACE_AREA_FILE_NAME)
population_file_path = str(data_dir / POPULATION_FILE_NAME)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;During our data analysis we’ll utilize &lt;a href="https://pandas.pydata.org/"&gt;Pandas&lt;/a&gt;, a great Python library which makes it dead simple to inspect and manipulate data.&lt;/p&gt;

&lt;p&gt;Since our data is in JSON format we can use Pandas &lt;a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_json.html"&gt;&lt;code&gt;read_json&lt;/code&gt;&lt;/a&gt; function to load the data into a so-called &lt;a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html"&gt;DataFrame&lt;/a&gt; (think of it as an Excel spreadsheet on steroids).&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html"&gt;&lt;code&gt;dropna&lt;/code&gt;&lt;/a&gt; function makes sure that we remove all entries which are undefined and therefore useless for further inspection.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df_surface_area = pd.read_json(surface_area_file_path)
df_population = pd.read_json(population_file_path)

df_population.dropna(inplace=True)
df_surface_area.dropna(inplace=True)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
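
&lt;p&gt;To make the effect of &lt;code&gt;dropna&lt;/code&gt; a bit more tangible, here’s a minimal sketch on a hand-made DataFrame. The rows and the name &lt;code&gt;df_sample&lt;/code&gt; are invented for illustration and are not taken from the &lt;code&gt;country-json&lt;/code&gt; files.&lt;/p&gt;

```python
import pandas as pd

# made-up sample rows (NOT from the real dataset); None marks a missing value
df_sample = pd.DataFrame({
    'country': ['A', 'B', 'C'],
    'population': [1000.0, None, 3000.0],
})

# dropna removes every row which contains at least one missing value
df_clean = df_sample.dropna()

print(len(df_sample))  # 3
print(len(df_clean))   # 2
```

&lt;p&gt;Here we use the variant which returns a new DataFrame. Passing &lt;code&gt;inplace=True&lt;/code&gt; (as done above) modifies the existing one instead.&lt;/p&gt;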



&lt;p&gt;You might’ve noticed that dealing with 2 separate files will get quite hairy if we want to compare countries based on their 2 metrics.&lt;/p&gt;

&lt;p&gt;Since both files contain the same countries with the same names and only differ in terms of their &lt;code&gt;area&lt;/code&gt; and &lt;code&gt;population&lt;/code&gt; data we can use &lt;a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html"&gt;&lt;code&gt;merge&lt;/code&gt;&lt;/a&gt; to create a new DataFrame containing all countries with their respective &lt;code&gt;area&lt;/code&gt; and &lt;code&gt;population&lt;/code&gt; numbers.&lt;/p&gt;

&lt;p&gt;Another tweak we perform here is to set the &lt;code&gt;index&lt;/code&gt; to the country name. This way we can easily query for country data based on the country names rather than having to deal with non-expressive integer values.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df = pd.merge(df_surface_area, df_population, on='country')
df.set_index('country', inplace=True)
df.head()
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--zoa3pW9r--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://philippmuens.com/content/images/2020/01/Bildschirmfoto-2020-01-13-um-19.28.09.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zoa3pW9r--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://philippmuens.com/content/images/2020/01/Bildschirmfoto-2020-01-13-um-19.28.09.png" alt="The intuition behind Word2Vec"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;
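
&lt;p&gt;If you’re wondering what &lt;code&gt;merge&lt;/code&gt; does with countries which only show up in one of the two files, the following sketch might help. It uses two tiny, made-up DataFrames (the names &lt;code&gt;df_area&lt;/code&gt;, &lt;code&gt;df_pop&lt;/code&gt; and their rows are assumptions for this example only).&lt;/p&gt;

```python
import pandas as pd

# made-up sample rows (NOT the real data)
df_area = pd.DataFrame({'country': ['A', 'B'], 'area': [10, 20]})
df_pop = pd.DataFrame({'country': ['B', 'A', 'C'], 'population': [200, 100, 300]})

# by default merge performs an inner join: only countries which
# appear in both DataFrames end up in the result
df_merged = pd.merge(df_area, df_pop, on='country').set_index('country')

print(df_merged.loc['A', 'population'])  # 100
print(len(df_merged))                    # 2
```

&lt;p&gt;The made-up country &lt;code&gt;C&lt;/code&gt; only exists in one DataFrame and is therefore missing from the result. The row order of the inputs doesn’t matter since the rows are matched by their &lt;code&gt;country&lt;/code&gt; value.&lt;/p&gt;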

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;len(df)

227
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;As you can see we have a total of 227 countries in our DataFrame. That’s way too many for our needs, especially since we’re about to plot the data in the next step.&lt;/p&gt;

&lt;p&gt;Let’s reduce our result set by performing some range-queries with the &lt;code&gt;area&lt;/code&gt; and &lt;code&gt;population&lt;/code&gt; data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df = df[
    (df['area'] &amp;gt; 100000) &amp;amp; (df['area'] &amp;lt; 600000) &amp;amp;
    (df['population'] &amp;gt; 35000000) &amp;amp; (df['population'] &amp;lt; 100000000)
]
len(df)

12
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Great! 12 countries are way easier to analyze once plotted.&lt;/p&gt;

&lt;p&gt;Speaking of which, let’s do a 2D &lt;a href="https://en.wikipedia.org/wiki/Scatter_plot"&gt;scatterplot&lt;/a&gt; of our 12 countries. We decide to plot the &lt;code&gt;area&lt;/code&gt; on the X axis and the &lt;code&gt;population&lt;/code&gt; on the Y axis.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fig, ax = plt.subplots()
df.plot(x='area', y='population', figsize=(10, 10), kind='scatter', ax=ax)

for k, v in df.iterrows():
    ax.annotate(k, v)

fig.canvas.draw()
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--UJwEkx7C--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://philippmuens.com/content/images/2020/01/countries-scatterplot.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--UJwEkx7C--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://philippmuens.com/content/images/2020/01/countries-scatterplot.png" alt="The intuition behind Word2Vec"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Looking at the plotted data we can immediately see some relationships. It appears that Vietnam has a high population compared to its area. Kenya on the other hand has a large surface area but a smaller population compared to its size.&lt;/p&gt;

&lt;p&gt;Plotting the data like this helps us to reason about it in a visual way. In addition to that we can also easily validate the integrity of our data.&lt;/p&gt;

&lt;p&gt;While we as humans can immediately tell the relationships in our country data just by looking at our plot it’s necessary to translate our visual reasoning into raw numbers so our computer can understand them too.&lt;/p&gt;

&lt;p&gt;Looking at the plot again it seems like the distance between the data points of the countries is a good measure to determine how “similar” or “different” the countries are.&lt;/p&gt;

&lt;p&gt;There are several ways to calculate the distance between two points. The &lt;a href="https://en.wikipedia.org/wiki/Euclidean_distance"&gt;Euclidean distance&lt;/a&gt; is a very common formula to do just that. Here’s the Math notation:&lt;/p&gt;

&lt;p&gt;\[d(x, y) = d(y, x) = \sqrt{\sum_{i=1}^{N} (x_i - y_i)^2}\]&lt;/p&gt;

&lt;p&gt;While the formula might look intimidating at first it’s rather simple to turn it into code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def euclidean_distance(x, y):
    # x holds the values of both data points on the first axis,
    # y holds the values of both data points on the second axis
    x1, x2 = x
    y1, y2 = y
    result = np.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)
    # we'll cast the result into an int which makes it easier to compare
    return int(round(result, 0))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
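
&lt;p&gt;The formula isn’t limited to 2 dimensions. As a rough sketch, here’s how a version for points with any number of coordinates could look. The name &lt;code&gt;euclidean_distance_nd&lt;/code&gt; is made up for this example, and unlike the function above it takes two complete points rather than the values grouped per axis.&lt;/p&gt;

```python
import numpy as np

def euclidean_distance_nd(p, q):
    # hypothetical helper: p and q are two points with N coordinates each
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    # square the per-coordinate differences, sum them up, take the root
    return np.sqrt(np.sum((p - q) ** 2))

print(euclidean_distance_nd((0, 0), (3, 4)))        # 5.0
print(euclidean_distance_nd((1, 2, 3), (1, 2, 3)))  # 0.0
```

&lt;p&gt;The first call is the classic 3-4-5 triangle, the second one shows that a point has a distance of 0 to itself.&lt;/p&gt;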



&lt;p&gt;According to our plot it seems like Thailand and Uganda are 2 countries which are very different. Computing the Euclidean distance between both validates our hunch.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Uganda &amp;lt;--&amp;gt; Thailand
uganda = df.loc['Uganda']
thailand = df.loc['Thailand']

x = (uganda['area'], thailand['area'])
y = (uganda['population'], thailand['population'])

euclidean_distance(x, y)

26175969
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;If we compare this result to the Euclidean distance between Iraq and Morocco we can see that those countries seem to be more “similar”.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Iraq &amp;lt;--&amp;gt; Morocco
iraq = df.loc['Iraq']
morocco = df.loc['Morocco']

x = (iraq['area'], morocco['area'])
y = (iraq['population'], morocco['population'])

euclidean_distance(x, y)

2535051
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;While this exercise was quite simple and intuitive if one is fluent in geography it also introduced us to the basic concepts of Embeddings. With Embeddings we map data (e.g. words or raw numbers) into multi-dimensional spaces and use Math to manipulate and calculate relationships between that data.&lt;/p&gt;

&lt;p&gt;This might sound rather abstract and I agree that the relationship between our Country data analysis and Embeddings is still a little bit fuzzy.&lt;/p&gt;

&lt;p&gt;Trust me, the upcoming example will definitely result in an “Aha Moment” and suddenly what we’ve learned so far will click!&lt;/p&gt;

&lt;h2&gt;
  
  
  Color Math
&lt;/h2&gt;

&lt;p&gt;Now that we’ve seen some of the underlying principles of Embeddings let’s take another look at a slightly more complicated example. This time we’ll work with different colors and their representation as a combination of Red, Green and Blue values (also known as RGB).&lt;/p&gt;

&lt;p&gt;Before we jump right into our analysis we’ll define a helper function which lets us render the color according to its RGB representation.&lt;/p&gt;

&lt;p&gt;The following code defines a function which takes the integer values of Red, Green and Blue (values in the range of 0 to 255) and renders an HTML element with the given color as its background.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def render_color(r, g, b):
    display(HTML('''
      &amp;lt;div style="background-color: rgba(%d, %d, %d, 1); height: 100px;"&amp;gt;&amp;lt;/div&amp;gt;
    ''' % (r, g, b)),
    metadata=dict(isolated=True))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The color black is represented as 0 Red, 0 Green and 0 Blue. Let’s validate that our &lt;code&gt;render_color&lt;/code&gt; function works as expected.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;render_color(0, 0, 0)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--8RgHHN5c--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://philippmuens.com/content/images/2020/01/Bildschirmfoto-2020-01-13-um-19.31.47.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--8RgHHN5c--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://philippmuens.com/content/images/2020/01/Bildschirmfoto-2020-01-13-um-19.31.47.png" alt="The intuition behind Word2Vec"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Great. It works!&lt;/p&gt;

&lt;p&gt;Next up it’s time to download the dataset we’ll be using for our color analysis. We’ve decided to use the &lt;a href="https://jonasjacek.github.io/colors/"&gt;256 Colors&lt;/a&gt; dataset by &lt;a href="https://github.com/jonasjacek"&gt;@jonasjacek&lt;/a&gt;. It lists the 256 colors used by &lt;a href="https://en.wikipedia.org/wiki/Xterm"&gt;xterm&lt;/a&gt;, a widely used terminal emulator. Make sure to take a couple of minutes to familiarize yourself with the data and its structure.&lt;/p&gt;

&lt;p&gt;Downloading the dataset follows the same steps we used at the beginning of this tutorial where we downloaded the Country data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;COLORS_256_FILE_NAME = 'colors-256.json'

!wget -nc https://jonasjacek.github.io/colors/data.json -O data/colors-256.json

colors_256_file_path = str(data_dir / COLORS_256_FILE_NAME)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Now that we have access to the data in our programming environment it’s time to inspect the structure and think about ways to further process it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;with open(colors_256_file_path, 'r') as f:
    color_data = json.load(f)
color_data[:5]

[{'colorId': 0,
  'hexString': '#000000',
  'rgb': {'r': 0, 'g': 0, 'b': 0},
  'hsl': {'h': 0, 's': 0, 'l': 0},
  'name': 'Black'},
 {'colorId': 1,
  'hexString': '#800000',
  'rgb': {'r': 128, 'g': 0, 'b': 0},
  'hsl': {'h': 0, 's': 100, 'l': 25},
  'name': 'Maroon'},
 {'colorId': 2,
  'hexString': '#008000',
  'rgb': {'r': 0, 'g': 128, 'b': 0},
  'hsl': {'h': 120, 's': 100, 'l': 25},
  'name': 'Green'},
 {'colorId': 3,
  'hexString': '#808000',
  'rgb': {'r': 128, 'g': 128, 'b': 0},
  'hsl': {'h': 60, 's': 100, 'l': 25},
  'name': 'Olive'},
 {'colorId': 4,
  'hexString': '#000080',
  'rgb': {'r': 0, 'g': 0, 'b': 128},
  'hsl': {'h': 240, 's': 100, 'l': 25},
  'name': 'Navy'}]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;As we can see there are 3 different color representations available in this dataset. There’s a Hexadecimal, an HSL (Hue, Saturation, Lightness) and an RGB (Red, Green, Blue) representation. Furthermore we have access to the name of the color via the &lt;code&gt;name&lt;/code&gt; attribute.&lt;/p&gt;
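
&lt;p&gt;The 3 representations describe the very same color, which means that one can be derived from another. Here’s a small sketch of how that could be done with Python’s built-in &lt;code&gt;colorsys&lt;/code&gt; module. The helper names &lt;code&gt;rgb_to_hex&lt;/code&gt; and &lt;code&gt;rgb_to_hsl&lt;/code&gt; are made up for this example, and rounding to whole numbers is an assumption based on the sample output above.&lt;/p&gt;

```python
import colorsys

def rgb_to_hex(r, g, b):
    # hypothetical helper: two lowercase hex digits per color channel
    return '#%02x%02x%02x' % (r, g, b)

def rgb_to_hsl(r, g, b):
    # hypothetical helper: colorsys works with values between 0 and 1
    # and returns the result in the order Hue, Lightness, Saturation
    h, l, s = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)
    return (round(h * 360), round(s * 100), round(l * 100))

# Maroon and Navy from the sample output above
print(rgb_to_hex(128, 0, 0))  # #800000
print(rgb_to_hsl(128, 0, 0))  # (0, 100, 25)
print(rgb_to_hsl(0, 0, 128))  # (240, 100, 25)
```

&lt;p&gt;The results line up with the &lt;code&gt;hexString&lt;/code&gt; and &lt;code&gt;hsl&lt;/code&gt; entries of Maroon and Navy shown above.&lt;/p&gt;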

&lt;p&gt;In our analysis we’re only interested in the name and the RGB value of every color. Given that we can create a simple dict whose keys are the lowercased color names and whose values are tuples containing the Red, Green and Blue values respectively.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;colors = dict()

for color in color_data:
    name = color['name'].lower()
    r = color['rgb']['r']
    g = color['rgb']['g']
    b = color['rgb']['b']
    rgb = (r, g, b)
    colors[name] = rgb
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;To validate that our data structure works the way we described above we can print out some sample colors with their RGB values.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;print('Black: %s' % (colors['black'],))
print('White: %s' % (colors['white'],))

print()

print('Red: %s' % (colors['red'],))
print('Lime: %s' % (colors['lime'],))
print('Blue: %s' % (colors['blue'],))

Black: (0, 0, 0)
White: (255, 255, 255)

Red: (255, 0, 0)
Lime: (0, 255, 0)
Blue: (0, 0, 255)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;While our dict is a good starting point it’s often easier and sometimes faster to do computations on the data if it’s stored in a Pandas &lt;a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html"&gt;DataFrame&lt;/a&gt;. The &lt;a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.from_dict.html"&gt;&lt;code&gt;from_dict&lt;/code&gt;&lt;/a&gt; function helps us to turn a simple Python dictionary into a DataFrame.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df = pd.DataFrame.from_dict(colors, orient='index', columns=['r', 'g', 'b'])
df.head()
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9p5qAgkI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://philippmuens.com/content/images/2020/01/Bildschirmfoto-2020-01-13-um-19.33.34.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9p5qAgkI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://philippmuens.com/content/images/2020/01/Bildschirmfoto-2020-01-13-um-19.33.34.png" alt="The intuition behind Word2Vec"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Seeing the data formatted in this way we can think of its representation as a mapping of the Red, Green and Blue values into a 3-Dimensional space where for example Red is the X axis, Green is the Y axis and Blue is the Z axis.&lt;/p&gt;

&lt;p&gt;You might recall that we’ve used &lt;a href="https://en.wikipedia.org/wiki/Euclidean_distance"&gt;Euclidean distance&lt;/a&gt; in our Country example above to determine how “similar” countries are. The main idea was that similar countries have less distance between their data points compared to dissimilar countries whose data points are farther apart.&lt;/p&gt;

&lt;p&gt;Another very useful formula to calculate the similarity of data points is the so-called &lt;a href="https://en.wikipedia.org/wiki/Cosine_similarity"&gt;Cosine similarity&lt;/a&gt;. The Cosine similarity is the cosine of the angle between two vectors in a multi-dimensional space. The smaller the angle, the closer the value is to 1 and the more similar the underlying data.&lt;/p&gt;

&lt;p&gt;Translating this to our color example we can think of every color being represented as a vector with 3 values (Red, Green and Blue) which (as stated above) can be mapped to the X, Y and Z axes in a 3D coordinate system. Using the Cosine similarity we can take one such vector and calculate its similarity to the rest of the vectors to determine how similar or dissimilar they are. And that’s exactly what we’ll be doing here.&lt;/p&gt;

&lt;p&gt;The Math notation for the Cosine similarity looks like this:&lt;/p&gt;

&lt;p&gt;\[\text{similarity} = \cos(\Theta) = \frac{A \cdot B}{\left\lVert A\right\rVert \left\lVert B\right\rVert}\]&lt;/p&gt;

&lt;p&gt;We take the &lt;a href="https://en.wikipedia.org/wiki/Dot_product"&gt;dot-product&lt;/a&gt; of the two vectors A and B and divide it by the product of their &lt;a href="https://en.wikipedia.org/wiki/Magnitude_(mathematics)#Euclidean_vector_space"&gt;magnitudes&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The following code-snippet implements this formula. Again, it might look intimidating and rather complicated but if you take some time to read through it you’ll see that it’s not that hard to understand.&lt;/p&gt;

&lt;p&gt;In fact our implementation here does more than just calculate the Cosine similarity. In addition to that we copy our DataFrame containing the colors and add another column called &lt;code&gt;distance&lt;/code&gt; to it. Despite its name this column holds the Cosine similarity, which is a value between 0 and 1 for plain RGB vectors where 1 means “most similar”. Once done we sort our copied DataFrame by that column in descending order so that the most similar colors come first. We do this to see the computed values when querying for similar colors later on.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def similar(df, coord, n=10):
    # turning our RGB values (3D coordinates) into a numpy array
    v1 = np.array(coord, dtype=np.float64)

    df_copy = df.copy()

    # looping through our DataFrame to calculate the distance for every color
    for i in df_copy.index:
        item = df_copy.loc[i]
        v2 = np.array([item.r, item.g, item.b], dtype=np.float64)
        # cosine similarity calculation starts here
        theta_sum = np.dot(v1, v2)
        theta_den = np.linalg.norm(v1) * np.linalg.norm(v2)
        # check if we're trying to divide by 0
        if theta_den == 0:
            theta = np.nan
        else:
            theta = theta_sum / theta_den
        # adding the `distance` column with the result of our computation
        df_copy.at[i, 'distance'] = theta
    # sorting the resulting DataFrame by distance
    df_copy.sort_values(by='distance', axis=0, ascending=False, inplace=True)
    return df_copy.head(n)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
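
&lt;p&gt;Looping through a DataFrame row by row is easy to follow but can get slow for larger datasets. As a sketch of an alternative, the same computation could be expressed with NumPy operations on all rows at once. The name &lt;code&gt;similar_vectorized&lt;/code&gt; and the tiny &lt;code&gt;df_demo&lt;/code&gt; DataFrame are assumptions made for this example; the column names &lt;code&gt;r&lt;/code&gt;, &lt;code&gt;g&lt;/code&gt; and &lt;code&gt;b&lt;/code&gt; match the ones used above.&lt;/p&gt;

```python
import numpy as np
import pandas as pd

def similar_vectorized(df, coord, n=10):
    # hypothetical variant of `similar` which avoids the Python loop
    v1 = np.asarray(coord, dtype=np.float64)
    vectors = df[['r', 'g', 'b']].to_numpy(dtype=np.float64)

    # one dot-product and one product of magnitudes per row
    dots = vectors @ v1
    norms = np.linalg.norm(vectors, axis=1) * np.linalg.norm(v1)

    # rows with a zero magnitude (e.g. black) get NaN instead of a division by 0
    similarity = np.full(len(df), np.nan)
    np.divide(dots, norms, out=similarity, where=norms != 0)

    df_copy = df.copy()
    df_copy['distance'] = similarity
    return df_copy.sort_values(by='distance', ascending=False).head(n)

# tiny hand-made DataFrame to try it out
df_demo = pd.DataFrame(
    {'r': [0, 255, 128, 0], 'g': [0, 0, 0, 0], 'b': [0, 0, 0, 255]},
    index=['black', 'red', 'maroon', 'blue'],
)
result = similar_vectorized(df_demo, (255, 0, 0), n=2)
print(sorted(result.index))  # ['maroon', 'red']
```

&lt;p&gt;Both &lt;code&gt;red&lt;/code&gt; and &lt;code&gt;maroon&lt;/code&gt; point into exactly the same direction as our query, so they share the top spot. The order among such ties isn’t guaranteed, which is why we sort the names before printing them.&lt;/p&gt;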



&lt;p&gt;To validate that our &lt;code&gt;similar&lt;/code&gt; function works we can use it to find similar colors to &lt;code&gt;red&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;similar(df, colors['red'])
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--R54x8KdB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://philippmuens.com/content/images/2020/01/Bildschirmfoto-2020-01-13-um-19.34.51.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--R54x8KdB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://philippmuens.com/content/images/2020/01/Bildschirmfoto-2020-01-13-um-19.34.51.png" alt="The intuition behind Word2Vec"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can also pass in colors as a list of RGB values.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;similar(df, [100, 20, 120])
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--HQ10hzLr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://philippmuens.com/content/images/2020/01/Bildschirmfoto-2020-01-13-um-19.35.27.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--HQ10hzLr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://philippmuens.com/content/images/2020/01/Bildschirmfoto-2020-01-13-um-19.35.27.png" alt="The intuition behind Word2Vec"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since it’s hard to imagine what color &lt;code&gt;100&lt;/code&gt;, &lt;code&gt;20&lt;/code&gt; and &lt;code&gt;120&lt;/code&gt; represent it’s worthwhile to use our &lt;code&gt;render_color&lt;/code&gt; function to see it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;render_color(100, 20, 120)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jgovqecY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://philippmuens.com/content/images/2020/01/Bildschirmfoto-2020-01-13-um-19.36.05.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jgovqecY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://philippmuens.com/content/images/2020/01/Bildschirmfoto-2020-01-13-um-19.36.05.png" alt="The intuition behind Word2Vec"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Looking at the list of most similar colors from above it appears that &lt;code&gt;darkviolet&lt;/code&gt; is quite similar to &lt;code&gt;100&lt;/code&gt;, &lt;code&gt;20&lt;/code&gt;, &lt;code&gt;120&lt;/code&gt;. Let’s see what this color looks like.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;darkviolet = df.loc['darkviolet']
render_color(darkviolet.r, darkviolet.g, darkviolet.b)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ZH-7oTlf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://philippmuens.com/content/images/2020/01/Bildschirmfoto-2020-01-13-um-19.36.44.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ZH-7oTlf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://philippmuens.com/content/images/2020/01/Bildschirmfoto-2020-01-13-um-19.36.44.png" alt="The intuition behind Word2Vec"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And we can validate that &lt;code&gt;darkviolet&lt;/code&gt; in fact looks quite similar to &lt;code&gt;100&lt;/code&gt;, &lt;code&gt;20&lt;/code&gt;, &lt;code&gt;120&lt;/code&gt;!&lt;/p&gt;

&lt;p&gt;But it doesn’t end here. Our 3 color values are numbers in the range of 0 - 255. Given that, it should be possible to do some basic Math computations such as addition or subtraction on them.&lt;/p&gt;

&lt;p&gt;Since we only have access to 256 different colors it’s highly unlikely that our resulting color values for Red, Green and Blue will exactly match one of our 256 colors. That’s where our &lt;code&gt;similar&lt;/code&gt; function comes in handy! The &lt;code&gt;similar&lt;/code&gt; function should make it possible to calculate a new color and find its most similar representation in our 256 color dataset.&lt;/p&gt;

&lt;p&gt;We can look at a &lt;a href="https://www.sessions.edu/color-calculator/"&gt;Color Wheel&lt;/a&gt; to see that subtracting a &lt;code&gt;red&lt;/code&gt; color from a &lt;code&gt;purple&lt;/code&gt; one should result in a Blueish color. Let’s do the Math and check whether that’s true.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;blueish = df.loc['purple'] - df.loc['red']

similar(df, blueish)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--meokZ1nr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://philippmuens.com/content/images/2020/01/Bildschirmfoto-2020-01-13-um-19.37.20.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--meokZ1nr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://philippmuens.com/content/images/2020/01/Bildschirmfoto-2020-01-13-um-19.37.20.png" alt="The intuition behind Word2Vec"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And sure enough the most similar colors in our dataset are Blueish ones. We can validate that by rendering &lt;code&gt;darkblue&lt;/code&gt;, one of the best matches.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;darkblue = df.loc['darkblue']
render_color(darkblue.r, darkblue.g, darkblue.b)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Z2nxdI1R--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://philippmuens.com/content/images/2020/01/Bildschirmfoto-2020-01-13-um-19.38.45.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Z2nxdI1R--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://philippmuens.com/content/images/2020/01/Bildschirmfoto-2020-01-13-um-19.38.45.png" alt="The intuition behind Word2Vec"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here’s a simple one. If we have Black and add some White to the mix we should get something Greyish, correct?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;greyish = df.loc['black'] + df.loc['white']

similar(df, greyish)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--4RrETAGb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://philippmuens.com/content/images/2020/01/Bildschirmfoto-2020-01-13-um-19.39.21.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4RrETAGb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://philippmuens.com/content/images/2020/01/Bildschirmfoto-2020-01-13-um-19.39.21.png" alt="The intuition behind Word2Vec"&gt;&lt;/a&gt;&lt;/p&gt;

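&lt;p&gt;And sure enough we do. Rendering &lt;code&gt;grey93&lt;/code&gt; shows a light grey color. Keep in mind that Black plus White is exactly (255, 255, 255), which is White itself. Since the Cosine similarity only compares directions and ignores magnitudes, every shade of grey points into the same direction as White and therefore shows up as a perfect match.&lt;br&gt;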
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;grey93 = df.loc['grey93']
render_color(grey93.r, grey93.g, grey93.b)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--yjXpFw0r--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://philippmuens.com/content/images/2020/01/Bildschirmfoto-2020-01-13-um-19.40.07.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--yjXpFw0r--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://philippmuens.com/content/images/2020/01/Bildschirmfoto-2020-01-13-um-19.40.07.png" alt="The intuition behind Word2Vec"&gt;&lt;/a&gt;&lt;/p&gt;
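
&lt;p&gt;Why do we end up with grey colors here? A quick experiment suggests that the Cosine similarity only cares about the direction of a vector and not about its length. The helper &lt;code&gt;cosine_similarity&lt;/code&gt; below is a stripped-down sketch written for this example (it’s not the &lt;code&gt;similar&lt;/code&gt; function from above), and the two grey values are picked by hand.&lt;/p&gt;

```python
import numpy as np

def cosine_similarity(a, b):
    # hypothetical helper: dot-product divided by the product of the magnitudes
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

black = np.array([0, 0, 0])
white = np.array([255, 255, 255])
mixed = black + white  # identical to white

# a light grey, picked by hand
light_grey = cosine_similarity(mixed, (238, 238, 238))
# a much darker grey points into the very same direction
dark_grey = cosine_similarity(mixed, (128, 128, 128))
# red points into a different direction
red = cosine_similarity(mixed, (255, 0, 0))

print(round(light_grey, 3), round(dark_grey, 3), round(red, 3))
```

&lt;p&gt;Both greys score (up to floating point rounding) a perfect 1 while Red ends up at roughly 0.577. If the brightness of a color matters for your use case, the Euclidean distance from the Country example might be the better fit.&lt;/p&gt;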

&lt;p&gt;Let’s end our color exploration with a more complex formula. So far we’ve only done some very simple Math like subtracting and adding colors. But there’s more we can do. We can also express our search for a color as a “solve for x” problem.&lt;/p&gt;

&lt;p&gt;Mixing Yellow and Red will result in Orange. We can translate this behavior to other colors as well. Here we ask “Yellow is to Red as X is to Blue” and express it in Math notation to get the result for X.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# yellow is to red as X is to blue
yellow_to_red = df.loc['yellow'] - df.loc['red']
X = yellow_to_red + df.loc['blue']

similar(df, X)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jyXcwlyY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://philippmuens.com/content/images/2020/01/Bildschirmfoto-2020-01-13-um-19.40.43.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jyXcwlyY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://philippmuens.com/content/images/2020/01/Bildschirmfoto-2020-01-13-um-19.40.43.png" alt="The intuition behind Word2Vec"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Our calculation shows us that &lt;code&gt;lightseagreen&lt;/code&gt; is to Blue as Yellow is to Red. Intuitively that makes sense if you think about it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;lightseagreen = df.loc['lightseagreen']
render_color(lightseagreen.r, lightseagreen.g, lightseagreen.b)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--okJJvppX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://philippmuens.com/content/images/2020/01/Bildschirmfoto-2020-01-13-um-19.41.25.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--okJJvppX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://philippmuens.com/content/images/2020/01/Bildschirmfoto-2020-01-13-um-19.41.25.png" alt="The intuition behind Word2Vec"&gt;&lt;/a&gt;&lt;/p&gt;
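
&lt;p&gt;It can help to do the arithmetic behind this “solve for X” problem by hand. The following sketch uses plain NumPy arrays instead of our DataFrame. The values for Red and Blue are the ones we printed earlier; the value for Yellow, (255, 255, 0), is an assumption based on the usual RGB definition.&lt;/p&gt;

```python
import numpy as np

yellow = np.array([255, 255, 0])  # assumed value, see above
red = np.array([255, 0, 0])
blue = np.array([0, 0, 255])

# "yellow is to red as X is to blue" means X - blue = yellow - red
X = yellow - red + blue
print(X.tolist())  # [0, 255, 255]
```

&lt;p&gt;The result has no Red part and equal amounts of Green and Blue. Since the Cosine similarity ignores the length of a vector, every color of that shape (no Red, Green equal to Blue) counts as a perfect match, regardless of how bright it is.&lt;/p&gt;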

&lt;h2&gt;
  
  
  Word2Vec
&lt;/h2&gt;

&lt;p&gt;At the beginning of this tutorial I promised that once done we should understand the intuition behind &lt;a href="https://en.wikipedia.org/wiki/Word2vec"&gt;Word2Vec&lt;/a&gt;, a key component of modern Natural Language Processing models.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;Word2Vec&lt;/code&gt; model does to words what we did with our colors represented as RGB values. It maps words into a multi-dimensional space (our colors were mapped into a 3D space). Once such words are mapped into that space we can perform Math calculations on their vectors, just like we calculated the similarity between our color vectors.&lt;/p&gt;

&lt;p&gt;Having a mapping of words into such a vector space makes it possible to do calculations resulting in:&lt;/p&gt;

&lt;p&gt;\[\text{king} - \text{man} + \text{woman} = \text{queen}\]&lt;/p&gt;
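
&lt;p&gt;To get a feeling for how such a calculation plays out, here’s a toy sketch. The 2D vectors below are invented by hand purely for illustration (one axis loosely stands for “royalty”, the other for “gender”). Real Word2Vec vectors are learned from large amounts of text, have many more dimensions and usually only approximate such relationships.&lt;/p&gt;

```python
import numpy as np

# toy 2D "word vectors" invented for this illustration (NOT learned by Word2Vec)
# axis 0 loosely stands for "royalty", axis 1 for "gender"
words = {
    'king': np.array([1.0, 1.0]),
    'queen': np.array([1.0, -1.0]),
    'man': np.array([0.0, 1.0]),
    'woman': np.array([0.0, -1.0]),
}

def cosine_similarity(a, b):
    # dot-product divided by the product of the magnitudes
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

target = words['king'] - words['man'] + words['woman']

# pick the word whose vector points into the most similar direction
best = max(words, key=lambda w: cosine_similarity(target, words[w]))
print(best)  # queen
```

&lt;p&gt;This is the same recipe we’ve used for our colors: do some vector arithmetic first and then look up the closest known vector via the Cosine similarity.&lt;/p&gt;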

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this tutorial we took a deep dive into the main building blocks and intuitions behind Embeddings, a powerful concept which is heavily utilized in modern Natural Language Processing models.&lt;/p&gt;

&lt;p&gt;The main idea is to map data into a multi-dimensional space so that Math calculations from the realm of &lt;a href="https://en.wikipedia.org/wiki/Linear_algebra"&gt;Linear Algebra&lt;/a&gt; can be performed on it.&lt;/p&gt;

&lt;p&gt;We started our journey with a simple example in which we mapped the surface area and population of different countries into a 2D vector space. We then used the &lt;a href="https://en.wikipedia.org/wiki/Euclidean_distance"&gt;Euclidean distance&lt;/a&gt; to verify that certain countries are similar while others are dissimilar based on their metrics.&lt;/p&gt;

&lt;p&gt;Another, more advanced example mapped colors and their RGB representation into a 3D vector space. We then used &lt;a href="https://en.wikipedia.org/wiki/Cosine_similarity"&gt;Cosine similarity&lt;/a&gt; and some basic Math to add and subtract colors.&lt;/p&gt;

&lt;p&gt;With this knowledge we’re now able to understand how more advanced models such as &lt;a href="https://en.wikipedia.org/wiki/Word2vec"&gt;Word2Vec&lt;/a&gt; or &lt;a href="https://cs.stanford.edu/~quocle/paragraph_vector.pdf"&gt;Doc2Vec&lt;/a&gt; make it possible to do calculations on words and texts.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Lab
&lt;/h2&gt;

&lt;p&gt;You can find more code examples, experiments and tutorials in my GitHub &lt;a href="https://github.com/pmuens/lab"&gt;Lab&lt;/a&gt; repository.&lt;/p&gt;

&lt;h2&gt;
  
  
  Additional Resources
&lt;/h2&gt;

&lt;p&gt;Eager to learn more? Here’s a list with all the resources I’ve used to write this post.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://gist.github.com/aparrish/2f562e3737544cf29aaf1af30362f469"&gt;Allison Parrish - Understanding word vectors&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://jalammar.github.io/illustrated-word2vec/"&gt;Jay Alammar - The Illustrated Word2Vec&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://towardsdatascience.com/an-implementation-guide-to-word2vec-using-numpy-and-google-sheets-13445eebd281"&gt;Derek Chia - A line-by-line implemenation of Word2Vec&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://skymind.ai/wiki/word2vec"&gt;Skymind.ai - A Beginners Guide to Word2Vec&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf"&gt;Word2Vec Paper&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cs.stanford.edu/~quocle/paragraph_vector.pdf"&gt;Doc2Vec Paper&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>machinelearning</category>
      <category>code</category>
    </item>
    <item>
      <title>Minimax and Monte Carlo Tree Search</title>
      <dc:creator>Philipp Muens</dc:creator>
      <pubDate>Tue, 02 Apr 2019 16:21:00 +0000</pubDate>
      <link>https://forem.com/pmuens/minimax-and-monte-carlo-tree-search-27dn</link>
      <guid>https://forem.com/pmuens/minimax-and-monte-carlo-tree-search-27dn</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fphilippmuens.com%2Fcontent%2Fimages%2F2020%2F01%2Fveeterzy-sMQiL_2v4vs-unsplash.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fphilippmuens.com%2Fcontent%2Fimages%2F2020%2F01%2Fveeterzy-sMQiL_2v4vs-unsplash.jpg" alt="Minimax and Monte Carlo Tree Search"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Do you remember your childhood days when you discovered the infamous game &lt;a href="https://en.wikipedia.org/wiki/Tic-tac-toe" rel="noopener noreferrer"&gt;Tic-Tac-Toe&lt;/a&gt; and played it with your friends over and over again?&lt;/p&gt;

&lt;p&gt;You might’ve wondered if there’s a certain strategy you can exploit that lets you win all the time (or at least force a draw). Is there such an algorithm that will show you how you can defeat your opponent at any given time?&lt;/p&gt;

&lt;p&gt;It turns out there is. To be precise there are a couple of algorithms which can be utilized to predict the best possible moves in games such as &lt;a href="https://en.wikipedia.org/wiki/Tic-tac-toe" rel="noopener noreferrer"&gt;Tic-Tac-Toe&lt;/a&gt;, &lt;a href="https://en.wikipedia.org/wiki/Connect_Four" rel="noopener noreferrer"&gt;Connect Four&lt;/a&gt;, &lt;a href="https://en.wikipedia.org/wiki/Chess" rel="noopener noreferrer"&gt;Chess&lt;/a&gt; and &lt;a href="https://en.wikipedia.org/wiki/Go_(game)" rel="noopener noreferrer"&gt;Go&lt;/a&gt; among others. One such family of algorithms leverages tree search and operates on game state trees.&lt;/p&gt;

&lt;p&gt;In this blog post we’ll discuss 2 famous tree search algorithms called &lt;a href="https://en.wikipedia.org/wiki/Minimax" rel="noopener noreferrer"&gt;Minimax&lt;/a&gt; and &lt;a href="https://en.wikipedia.org/wiki/Monte_Carlo_tree_search" rel="noopener noreferrer"&gt;Monte Carlo Tree Search&lt;/a&gt; (abbreviated to MCTS). We’ll start our journey into tree search algorithms by discovering the intuition behind their inner workings. After that we’ll see how Minimax and MCTS can be used in modern game implementations to build sophisticated Game AIs. We’ll also shed some light on the computational challenges we’ll face and how to handle them via performance optimization techniques.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Intuition behind tree search
&lt;/h2&gt;

&lt;p&gt;Let’s imagine that you’re playing some games of Tic-Tac-Toe with your friends. While playing you’re wondering what the optimal strategy might be. What’s the best move you should pick in any given situation?&lt;/p&gt;

&lt;p&gt;Generally speaking there are 2 modes you can operate in when determining the next move you want to play:&lt;/p&gt;

&lt;p&gt;Aggressive:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Play a move which will cause an immediate win (if possible)&lt;/li&gt;
&lt;li&gt;Play a move which sets up a future winning situation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Defensive:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Play a move which prevents your opponent from winning in the next round (if possible)&lt;/li&gt;
&lt;li&gt;Play a move which prevents your opponent from setting up a future winning situation in the next round&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These modes and their respective actions are basically the only strategies you need to follow to win the game of Tic-Tac-Toe.&lt;/p&gt;

&lt;p&gt;The “only” thing you need to do is to look at the current game state you’re in and play simulations through all the potential next moves which could be played. You do this by pretending that you’ve played a given move and then continue playing the game until the end, alternating between the &lt;strong&gt;X&lt;/strong&gt; and &lt;strong&gt;O&lt;/strong&gt; player. While doing that you’re building up a game tree of all the possible moves you and your opponent would play.&lt;/p&gt;

&lt;p&gt;The following illustration shows a simplified version of such a game tree:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fphilippmuens.com%2Fcontent%2Fimages%2F2020%2F01%2Ftic-tac-toe-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fphilippmuens.com%2Fcontent%2Fimages%2F2020%2F01%2Ftic-tac-toe-1.png" alt="Minimax and Monte Carlo Tree Search"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note that for the rest of this post we’ll only use simplified game tree examples to save screen space.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Of course, the set of strategic rules we’ve discussed at the top is specifically tailored to the game of Tic-Tac-Toe. However we can generalize this approach to make it work with other board games such as Chess or Go. Let’s take a look at &lt;a href="https://en.wikipedia.org/wiki/Minimax" rel="noopener noreferrer"&gt;Minimax&lt;/a&gt;, a tree search algorithm which abstracts our Tic-Tac-Toe strategy so that we can apply it to various other 2 player board games.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Minimax Algorithm
&lt;/h2&gt;

&lt;p&gt;Given that we’ve built up an intuition for tree search algorithms let’s switch our focus from simple games such as Tic-Tac-Toe to more complex games such as Chess.&lt;/p&gt;

&lt;p&gt;Before we dive in let’s briefly recap the properties of a Chess game. Chess is a 2 player deterministic game of perfect information. Sound confusing? Let’s unpack it:&lt;/p&gt;

&lt;p&gt;In Chess, 2 players (Black and White) play against each other. Every move which is performed is ensured to be “fulfilled” with no randomness involved (the game doesn’t use any random elements such as a die). During gameplay every player can observe the whole game state. There’s no hidden information, hence everyone has perfect information about the whole game at any given time.&lt;/p&gt;

&lt;p&gt;Thanks to those properties we can always compute which player is currently ahead and which one is behind. There are several different ways to do this for the game of Chess. One approach to evaluate the current game state is to add up all the remaining white pieces on the board and subtract all the remaining black ones. Doing this will produce a single value where a large value favors white and a small value favors black. This type of function is called an &lt;strong&gt;evaluation function&lt;/strong&gt;.&lt;/p&gt;
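&lt;p&gt;As a rough sketch, such a piece-counting evaluation function could look like the following. The board encoding is an assumption made for this example only: a string of 64 characters where uppercase letters are white pieces, lowercase letters are black pieces and &lt;code&gt;.&lt;/code&gt; marks an empty square:&lt;/p&gt;

```python
def evaluation_function(board):
    """Count the white pieces and subtract the black ones."""
    white = sum(1 for square in board if square.isupper())
    black = sum(1 for square in board if square.islower())
    return white - black


# Hypothetical position: white has 5 pieces left, black has 3
board = "KQRPP" + "krp" + "." * 56
print(evaluation_function(board))  # prints 2
```

&lt;p&gt;A positive result favors white, a negative one favors black and 0 means that both sides have the same number of pieces.&lt;/p&gt;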

&lt;p&gt;Based on this evaluation function we can now define the overall goal during the game for each player individually. White tries to maximize this objective while black tries to minimize it.&lt;/p&gt;

&lt;p&gt;Let’s pretend that we’re deep in an ongoing Chess game. We’re player white and have already played a couple of clever moves, resulting in a large number computed by our evaluation function. It’s our turn right now but we’re stuck. Which of the possible moves is the best one we can play?&lt;/p&gt;

&lt;p&gt;We’ll solve this problem with the same approach we already encountered in our Tic-Tac-Toe gameplay example. We build up a tree of potential moves which could be performed based on the game state we’re in. To keep things simple we pretend that there are only 2 possible moves we can play (in Chess there are on average ~30 different options for every given game state). We start with a (white) root node which represents the current state. Starting from there we’re branching out 2 (black) child nodes which represent the game state we’re in after taking one of the 2 possible moves. From these 2 child nodes we’re again branching out 2 separate (white) child nodes. Each one of those represents the game state we’re in after taking one of the 2 possible moves we could play from the black node. This branching out of nodes goes on and on until we’ve reached the end of the game or hit a predefined maximum tree depth.&lt;/p&gt;

&lt;p&gt;The resulting tree looks something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fphilippmuens.com%2Fcontent%2Fimages%2F2020%2F01%2Fchess-minimax-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fphilippmuens.com%2Fcontent%2Fimages%2F2020%2F01%2Fchess-minimax-1.png" alt="Minimax and Monte Carlo Tree Search"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Given that we’re at the end of the tree we can now compute the game outcome for each end state with our evaluation function:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fphilippmuens.com%2Fcontent%2Fimages%2F2020%2F01%2Fchess-minimax-2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fphilippmuens.com%2Fcontent%2Fimages%2F2020%2F01%2Fchess-minimax-2.png" alt="Minimax and Monte Carlo Tree Search"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With this information we now know the game outcome we can expect when we take all the outlined moves starting from the root node and ending at the last node where we calculated the game evaluation. Since we’re player white it seems like the best move to pick is the one which will set us up to eventually end in the game state with the highest outcome our evaluation function calculated.&lt;/p&gt;

&lt;p&gt;While this is true there’s one problem. There’s still the black player involved and we cannot directly manipulate what move she’ll pick. If we cannot manipulate this why don’t we estimate what the black player will likely do based on our evaluation function? As a white player we always try to maximize our outcome. The black player always tries to minimize the outcome. With this knowledge we can now traverse back through our game tree and compute the values for all our individual tree nodes step by step.&lt;/p&gt;

&lt;p&gt;White tries to maximize the outcome:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fphilippmuens.com%2Fcontent%2Fimages%2F2020%2F01%2Fchess-minimax-3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fphilippmuens.com%2Fcontent%2Fimages%2F2020%2F01%2Fchess-minimax-3.png" alt="Minimax and Monte Carlo Tree Search"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;While black wants to minimize it:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fphilippmuens.com%2Fcontent%2Fimages%2F2020%2F01%2Fchess-minimax-4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fphilippmuens.com%2Fcontent%2Fimages%2F2020%2F01%2Fchess-minimax-4.png" alt="Minimax and Monte Carlo Tree Search"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once done we can now pick the next move based on the evaluation values we’ve just computed. In our case we pick the next possible move which maximizes our outcome:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fphilippmuens.com%2Fcontent%2Fimages%2F2020%2F01%2Fchess-minimax-5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fphilippmuens.com%2Fcontent%2Fimages%2F2020%2F01%2Fchess-minimax-5.png" alt="Minimax and Monte Carlo Tree Search"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What we’ve just learned is the general procedure of the so-called &lt;a href="https://en.wikipedia.org/wiki/Minimax" rel="noopener noreferrer"&gt;Minimax algorithm&lt;/a&gt;. The Minimax algorithm got its name from the fact that one player wants to &lt;strong&gt;Mini&lt;/strong&gt;mize the outcome while the other tries to &lt;strong&gt;Max&lt;/strong&gt;imize it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Code
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def minimax(state, max_depth, is_player_minimizer):
  if max_depth == 0 or state.is_end_state():
    # We're at the end. Time to evaluate the state we're in
    return evaluation_function(state)

  # Is the current player the minimizer?
  if is_player_minimizer:
    value = -math.inf
    for move in state.possible_moves():
      evaluation = minimax(move, max_depth - 1, False)
      min = min(value, evaluation)
    return value

  # Or the maximizer?
  value = math.inf
  for move in state.possible_moves():
    evaluation = minimax(move, max_depth - 1, True)
    max = max(value, evaluation)
  return value
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
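&lt;p&gt;To see the algorithm at work, here's a self-contained sketch which runs a compact variant of Minimax on a tiny hand-made tree. The &lt;code&gt;TreeState&lt;/code&gt; class and the leaf values are hypothetical and only exist for this example. A real game would generate its states from the rules of the game:&lt;/p&gt;

```python
class TreeState:
    """Hypothetical game state backed by a nested list; numbers are leaf evaluations."""

    def __init__(self, node):
        self.node = node

    def is_end_state(self):
        return not isinstance(self.node, list)

    def possible_moves(self):
        return [TreeState(child) for child in self.node]


def evaluation_function(state):
    # In this toy tree the leaves already store their evaluation
    return state.node


def minimax(state, max_depth, is_player_minimizer):
    if max_depth == 0 or state.is_end_state():
        return evaluation_function(state)
    values = [
        minimax(move, max_depth - 1, not is_player_minimizer)
        for move in state.possible_moves()
    ]
    return min(values) if is_player_minimizer else max(values)


# White (the maximizer) is to move at the root, black (the minimizer) answers
tree = TreeState([[3, 5], [2, 9]])
print(minimax(tree, 2, False))  # prints 3
```

&lt;p&gt;Black would answer the first move with the state valued 3 and the second move with the state valued 2, so white picks the first move.&lt;/p&gt;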



&lt;h2&gt;
  
  
  Search space reduction with pruning
&lt;/h2&gt;

&lt;p&gt;Minimax is a simple and elegant tree search algorithm. Given enough compute resources it will always find the optimal next move to play.&lt;/p&gt;

&lt;p&gt;But there’s a problem. While this algorithm works flawlessly with simplistic games such as Tic-Tac-Toe, it’s computationally infeasible to implement it for strategically more involved games such as Chess. The reason for this is the so-called tree branching factor. We’ve already briefly touched on that concept before but let’s take a second look at it.&lt;/p&gt;

&lt;p&gt;In our example above we've artificially restricted the potential moves one can play to 2 to keep the tree representation simple and easy to reason about. However the reality is that there are usually more than 2 possible next moves. On average there are ~30 moves a Chess player can play in any given game state. This means that every single node in the tree will have approximately 30 different children. This is called the width of the tree. We denote the tree's width as \(w\).&lt;/p&gt;

&lt;p&gt;But there's more. It takes roughly 85 consecutive turns to finish a game of Chess. Translating this to our tree means that it will have an average depth of 85. We denote the tree's depth as \(d\).&lt;/p&gt;

&lt;p&gt;Given \(w\) and \(d\) we can define the formula \(w^d\) which will show us how many different positions we have to evaluate on average.&lt;/p&gt;

&lt;p&gt;Plugging in the numbers for Chess we get \(30^{85}\). Taking the Go board game as an example which has a width \(w\) of ~250 and an average depth \(d\) of ~150 we get \(250^{150}\). I encourage you to type those numbers into your calculator and hit enter. Needless to say, current generation computers and even large scale distributed systems will take "forever" to crunch through all those computations.&lt;/p&gt;
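&lt;p&gt;If you don't have a calculator at hand, Python's arbitrary-precision integers can do the job. This small sketch simply counts the digits of both numbers (the variable names are just illustrative):&lt;/p&gt;

```python
# Width to the power of depth for Chess and Go
chess_positions = 30 ** 85
go_positions = 250 ** 150

print(len(str(chess_positions)))  # prints 126
print(len(str(go_positions)))     # prints 360
```

&lt;p&gt;That's a number with 126 digits for Chess and one with 360 digits for Go.&lt;/p&gt;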

&lt;p&gt;Does this mean that Minimax can only be used for games such as Tic-Tac-Toe? Absolutely not. We can apply some clever tricks to optimize the structure of our search tree.&lt;/p&gt;

&lt;p&gt;Generally speaking we can reduce the search tree's width and depth by pruning individual nodes and branches from it. Let's see how this works in practice.&lt;/p&gt;

&lt;h3&gt;
  
  
  Alpha-Beta Pruning
&lt;/h3&gt;

&lt;p&gt;Recall that Minimax is built around the premise that one player tries to maximize the outcome of the game based on the evaluation function while the other one tries to minimize it.&lt;/p&gt;

&lt;p&gt;This gameplay behavior is directly translated into our search tree. During traversal from the bottom to the root node we always picked the respective “best” move for any given player. In our case the white player always picked the maximum value while the black player picked the minimum value:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fphilippmuens.com%2Fcontent%2Fimages%2F2020%2F01%2Fchess-minimax-6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fphilippmuens.com%2Fcontent%2Fimages%2F2020%2F01%2Fchess-minimax-6.png" alt="Minimax and Monte Carlo Tree Search"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Looking at our tree above we can exploit this behavior to optimize it. Here’s how:&lt;/p&gt;

&lt;p&gt;While walking through the potential moves we can play given the current game state we’re in we should build our tree in a depth-first fashion. This means that we should start at one node and expand it by playing the game all the way to the end before we back up and pick the next node we want to explore:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fphilippmuens.com%2Fcontent%2Fimages%2F2020%2F01%2Fchess-minimax-7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fphilippmuens.com%2Fcontent%2Fimages%2F2020%2F01%2Fchess-minimax-7.png" alt="Minimax and Monte Carlo Tree Search"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Following this procedure allows us to identify moves which will never be played early on. After all, one player maximizes the outcome while the other minimizes it. The part of the search tree where a player would end up in a worse situation based on the evaluation function can be entirely removed from the list of nodes we want to expand and explore. We prune those nodes from our search tree and therefore reduce its width.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fphilippmuens.com%2Fcontent%2Fimages%2F2020%2F01%2Fchess-minimax-8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fphilippmuens.com%2Fcontent%2Fimages%2F2020%2F01%2Fchess-minimax-8.png" alt="Minimax and Monte Carlo Tree Search"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The larger the branching factor of the tree, the higher the amount of computations we can potentially save!&lt;/p&gt;

&lt;p&gt;Assuming we can reduce the width by an average of 10 we would end up with \(w^d = (30 - 10)^{85} = 20^{85}\) computations we have to perform. That's already a huge win.&lt;/p&gt;

&lt;p&gt;This technique of pruning parts of the search tree which will never be considered during gameplay is called &lt;a href="https://en.wikipedia.org/wiki/Alpha%E2%80%93beta_pruning" rel="noopener noreferrer"&gt;Alpha-Beta pruning&lt;/a&gt;. Alpha-Beta pruning got its name from the parameters \(\alpha\) and \(\beta\) which are used to keep track of the best score either player can achieve while walking the tree.&lt;/p&gt;

&lt;h3&gt;
  
  
  Code
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def minimax(state, max_depth, is_player_minimizer, alpha, beta):
  if max_depth == 0 or state.is_end_state():
    return evaluation_function(state)

  if is_player_minimizer:
    value = -math.inf
    for move in state.possible_moves():
      evaluation = minimax(move, max_depth - 1, False, alpha , beta)
      min = min(value, evaluation)
      # Keeping track of our current best score
      beta = min(beta, evaluation)
      if beta &amp;lt;= alpha:
        break
    return value

  value = math.inf
  for move in state.possible_moves():
    evaluation = minimax(move, max_depth - 1, True, alpha, beta)
    max = max(value, evaluation)
    # Keeping track of our current best score
    alpha = max(alpha, evaluation)
    if beta &amp;lt;= alpha:
      break
  return value
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
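&lt;p&gt;To get a feeling for the savings, the following self-contained sketch runs the same search with and without pruning and records which leaves get evaluated. The nested-list tree, its leaf values and the &lt;code&gt;search&lt;/code&gt; helper are hypothetical stand-ins for a real game:&lt;/p&gt;

```python
import math


def search(node, is_minimizer, alpha, beta, prune, visited):
    # Leaves are plain numbers, inner nodes are lists of child nodes
    if not isinstance(node, list):
        visited.append(node)
        return node

    value = math.inf if is_minimizer else -math.inf
    for child in node:
        evaluation = search(child, not is_minimizer, alpha, beta, prune, visited)
        if is_minimizer:
            value = min(value, evaluation)
            beta = min(beta, value)
        else:
            value = max(value, evaluation)
            alpha = max(alpha, value)
        # Stop as soon as the opponent already has a better alternative
        if prune and alpha >= beta:
            break
    return value


tree = [[3, 5], [2, 9]]
full, pruned = [], []
best_full = search(tree, False, -math.inf, math.inf, False, full)
best_pruned = search(tree, False, -math.inf, math.inf, True, pruned)

print(best_full, len(full))      # prints 3 4
print(best_pruned, len(pruned))  # prints 3 3
```

&lt;p&gt;Both runs agree on the best value, but the pruned run never looks at the leaf valued 9. Once black can force a 2 in the second branch, white won't pick that branch because the first branch already guarantees a 3.&lt;/p&gt;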



&lt;p&gt;Using Alpha-Beta pruning to reduce the tree's width helps us utilize the Minimax algorithm in games with large branching factors which were previously considered computationally too expensive.&lt;/p&gt;

&lt;p&gt;In fact &lt;a href="https://en.wikipedia.org/wiki/Deep_Blue_(chess_computer)" rel="noopener noreferrer"&gt;Deep Blue&lt;/a&gt;, the Chess computer developed by &lt;a href="https://www.ibm.com/" rel="noopener noreferrer"&gt;IBM&lt;/a&gt; which &lt;a href="https://en.wikipedia.org/wiki/Deep_Blue_versus_Garry_Kasparov" rel="noopener noreferrer"&gt;defeated&lt;/a&gt; the Chess world champion &lt;a href="https://en.wikipedia.org/wiki/Garry_Kasparov" rel="noopener noreferrer"&gt;Garry Kasparov&lt;/a&gt; in 1997 heavily utilized parallelized Alpha-Beta based search algorithms.&lt;/p&gt;

&lt;h2&gt;
  
  
  Monte Carlo Tree Search
&lt;/h2&gt;

&lt;p&gt;It seems like Minimax combined with Alpha-Beta pruning is enough to build sophisticated game AIs. But there’s one major problem which can render such techniques useless. It’s the problem of defining a robust and reasonable evaluation function. Recall that in Chess our evaluation function added up all the white pieces on the board and subtracted all the black ones. This resulted in high values when white had an edge and in low values when the situation was favorable for black. While this function is a good baseline and is definitely worthwhile to experiment with there are usually more complexities and subtleties one needs to incorporate to come up with a sound evaluation function.&lt;/p&gt;

&lt;p&gt;Simple evaluation metrics are easy to fool and exploit once the underlying internals are surfaced. This is especially true for more complex games such as Go. Engineering an evaluation function which is complex enough to capture the majority of the necessary game information requires a lot of thought and interdisciplinary domain expertise in Software Engineering, Math, Psychology and the game at hand.&lt;/p&gt;

&lt;p&gt;Isn’t there a universally applicable evaluation function we could leverage for all games, no matter how simple or complex they are?&lt;/p&gt;

&lt;p&gt;Yes, there is! And it’s called randomness. With randomness we let chance be our guide to figure out which next move might be the best one to pick.&lt;/p&gt;

&lt;p&gt;In the following we’ll explore the so-called &lt;a href="https://en.wikipedia.org/wiki/Monte_Carlo_tree_search" rel="noopener noreferrer"&gt;Monte Carlo Tree Search (MCTS)&lt;/a&gt; algorithm which heavily relies on randomness (the name “Monte Carlo” stems from &lt;a href="https://en.wikipedia.org/wiki/Monte_Carlo" rel="noopener noreferrer"&gt;Monte Carlo&lt;/a&gt;, the district of Monaco known for its casino) as a core component for value approximations.&lt;/p&gt;

&lt;p&gt;As the name implies, MCTS also builds up a game tree and does computations on it to find the path of the highest potential outcome. But there’s a slight difference in how this tree is constructed.&lt;/p&gt;

&lt;p&gt;Let’s once again pretend that we’re playing Chess as player white. We’ve already played for a couple of rounds and it’s on us again to pick the next move we’d like to play. Additionally let’s pretend that we’re not aware of any evaluation function we could leverage to compute the value of each possible move. Is there any way we could still figure out which move might put us into a position where we could win at the end?&lt;/p&gt;

&lt;p&gt;As it turns out there’s a really simple approach we can take to figure this out. Why don’t we let both players play dozens of random games starting from the state we’re currently in? While this might sound counterintuitive it makes sense if you think about it. If both players start in the given game state, play thousands of random games and player white wins 80% of the time, then there must be something about the state which gives white an advantage. What we’re doing here is basically exploiting the &lt;a href="https://en.wikipedia.org/wiki/Law_of_large_numbers" rel="noopener noreferrer"&gt;Law of large numbers (LLN)&lt;/a&gt; to find the “true” game outcome for every potential move we can play.&lt;/p&gt;
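&lt;p&gt;Here's a minimal sketch of this idea. It uses Tic-Tac-Toe instead of Chess to keep the code short, and all function names as well as the two example positions are assumptions made for this illustration:&lt;/p&gt;

```python
import random

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board):
    # Return "X" or "O" if one of them completed a line, otherwise None
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def random_playout(board, player, rng):
    """Play random moves until the game ends; return the winner or None for a draw."""
    board = list(board)
    while winner(board) is None and " " in board:
        move = rng.choice([i for i, cell in enumerate(board) if cell == " "])
        board[move] = player
        player = "O" if player == "X" else "X"
    return winner(board)


def estimate_win_rate(board, player_to_move, games, rng):
    """Share of random playouts which are won by X."""
    wins = sum(random_playout(board, player_to_move, rng) == "X" for _ in range(games))
    return wins / games


rng = random.Random(42)

# Candidate 1: X has just taken the center, O is to move
center_opening = [" "] * 9
center_opening[4] = "X"

# Candidate 2: X has just taken an edge, O is to move
edge_opening = [" "] * 9
edge_opening[1] = "X"

print(estimate_win_rate(center_opening, "O", 5000, rng))
print(estimate_win_rate(edge_opening, "O", 5000, rng))
```

&lt;p&gt;The more random games we play, the closer both estimates get to the true win rates under random play. Comparing them tells us which of the two moves looks more promising for X.&lt;/p&gt;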

&lt;p&gt;The following description will outline how the MCTS algorithm works in detail. For the sake of simplicity we again focus solely on 2 playable moves in any given state (as we’ve already discovered there are on average ~30 different moves we can play in Chess).&lt;/p&gt;

&lt;p&gt;Before we move on we need to get some minor definitions out of the way. In MCTS we keep track of 2 different parameters for every single node in our tree. We call those parameters \(t\) and \(n\). \(t\) stands for "total" and represents the total value of that node. \(n\) is the "number of visits" which reflects the number of times we've visited this node while walking through the tree. When creating a new node we always initialize both parameters with the value 0.&lt;/p&gt;

&lt;p&gt;In addition to the 2 new parameters we store for each node, there's the so-called "Upper Confidence Bound applied to Trees" (UCT) formula, which is based on the "Upper Confidence Bound 1" (UCB1) formula and looks like this:&lt;/p&gt;

&lt;p&gt;\[x_i + C\sqrt{\frac{\ln(N)}{n_i}}\]&lt;/p&gt;

&lt;p&gt;This formula basically helps us in deciding which upcoming node and therefore potential game move we should pick to start our random game series (called "rollout") from. In the formula \(x_i\) represents the average value of the game state we're working with, \(C\) is a constant called "temperature" (also known as the exploration constant) which we need to define manually (we just set it to 1.5 in our example here; more on that later), \(N\) represents the parent node's visits and \(n_i\) represents the current node's visits. When using this formula on candidate nodes to decide which one to explore further, we're always interested in the largest result.&lt;/p&gt;

&lt;p&gt;Don't be intimidated by the Math and just note that this formula exists and will be useful for us while working with our tree. We'll get into more details about the usage of it while walking through our tree.&lt;/p&gt;

&lt;p&gt;With this out of the way it's time to apply MCTS to find the best move we can play.&lt;/p&gt;
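&lt;p&gt;If you prefer code over formulas, the score can be sketched as a small function. The function name and its parameters are assumptions for this example. The average value is computed as the node's total divided by its visits, and unvisited nodes are given a score of infinity so that they're explored first:&lt;/p&gt;

```python
import math


def uct_score(total, visits, parent_visits, c=1.5):
    # A node which was never visited should always be explored first
    if visits == 0:
        return math.inf
    # Exploitation term: average value of the node
    average = total / visits
    # Exploration term: grows for nodes which are rarely visited
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return average + exploration


# Hypothetical nodes which were visited once each, below a parent with 2 visits
print(round(uct_score(30, 1, 2), 2))  # prints 31.25
print(round(uct_score(20, 1, 2), 2))  # prints 21.25
print(uct_score(0, 0, 2))             # prints inf
```

&lt;p&gt;A larger constant rewards rarely visited nodes more strongly, a smaller one favors nodes which already have a high average value.&lt;/p&gt;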

&lt;p&gt;We start with the same root node of the tree we're already familiar with. This root node is our start point and reflects the current game state. Based on this node we branch off our 2 child nodes:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fphilippmuens.com%2Fcontent%2Fimages%2F2020%2F01%2Fchess-mcts-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fphilippmuens.com%2Fcontent%2Fimages%2F2020%2F01%2Fchess-mcts-1.png" alt="Minimax and Monte Carlo Tree Search"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The first thing we need to do is to use the UCT formula from above and compute the results for both child nodes. As it turns out we need to plug in 0 for almost every single variable in our UCT formula since we haven't done anything with our tree and its nodes yet. This will result in \(\infty\) for both calculations.&lt;/p&gt;

&lt;p&gt;\[S_1 = 0 + 1.5\sqrt{\frac{\ln(0)}{0.0001}} = \infty\]&lt;/p&gt;

&lt;p&gt;\[S_2 = 0 + 1.5\sqrt{\frac{\ln(0)}{0.0001}} = \infty\]&lt;/p&gt;

&lt;p&gt;&lt;em&gt;We've replaced the 0 in the denominator with a very small number because division by zero is not defined. Strictly speaking \(\ln(0)\) isn't defined either, so in practice a node which hasn't been visited yet is simply assigned a score of \(\infty\) to make sure it gets explored first.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Given this we're free to choose which node we want to explore further. We go ahead with the leftmost node and perform our rollout phase which means that we play dozens of random games starting with this game state.&lt;/p&gt;

&lt;p&gt;Once done we get a result for this specific rollout (in our case the percentage of wins for player white). The next thing we need to do is to propagate this result up the tree until we reach the root node. While doing this we update both (t) and (n) with the respective values for every node we encounter. Once done our tree looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fphilippmuens.com%2Fcontent%2Fimages%2F2020%2F01%2Fchess-mcts-2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fphilippmuens.com%2Fcontent%2Fimages%2F2020%2F01%2Fchess-mcts-2.png" alt="Minimax and Monte Carlo Tree Search"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next up we start at our root node again. Once again we use the UCT formula, plug in our numbers and compute its score for both nodes:&lt;/p&gt;

&lt;p&gt;\[S_1 = 30 + 1.5\sqrt{\frac{\ln(1)}{1}} = 30\]&lt;/p&gt;

&lt;p&gt;\[S_2 = 0 + 1.5\sqrt{\frac{\ln(0)}{0.0001}} = \infty\]&lt;/p&gt;

&lt;p&gt;Given that we always pick the node with the highest value we'll now explore the rightmost one. Once again we perform our rollout based on the move this node proposes and collect the end result after we've finished all our random games.&lt;/p&gt;

&lt;p&gt;The last thing we need to do is to propagate this result up until we reach the root of the tree. While doing this we update the parameters of every node we encounter.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fphilippmuens.com%2Fcontent%2Fimages%2F2020%2F01%2Fchess-mcts-3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fphilippmuens.com%2Fcontent%2Fimages%2F2020%2F01%2Fchess-mcts-3.png" alt="Minimax and Monte Carlo Tree Search"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We've now successfully explored 2 child nodes in our tree. You might've guessed it already: we'll start again at our root node and calculate every child node's UCT score to determine the node we should explore further. In doing this we get the following values:&lt;/p&gt;

&lt;p&gt;[S_1 = 30 + 1.5\sqrt{\frac{\ln(2)}{1}} \approx 31.25 ]&lt;/p&gt;

&lt;p&gt;[S_2 = 20 + 1.5\sqrt{\frac{\ln(2)}{1}} \approx 21.25 ]&lt;/p&gt;

&lt;p&gt;The largest value is the one we've computed for the leftmost node so we decide to explore that node further.&lt;/p&gt;
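
&lt;p&gt;The two scores above can be reproduced with a few lines of Python. This is only a sketch: the function name uct_score and its parameter names are made up for illustration, the value term is passed in directly, and treating unvisited nodes as infinite is the convention assumed from the walkthrough.&lt;/p&gt;

```python
import math


def uct_score(value, child_visits, parent_visits, c=1.5):
    """Value term plus an exploration bonus weighted by c.

    Assumption: a child without any visits is scored as infinity so that
    it is always explored before already visited siblings.
    """
    if child_visits == 0:
        return math.inf
    return value + c * math.sqrt(math.log(parent_visits) / child_visits)


# Numbers taken from the walkthrough: both children were visited once,
# the root node twice.
s1 = uct_score(value=30, child_visits=1, parent_visits=2)
s2 = uct_score(value=20, child_visits=1, parent_visits=2)
print(round(s1, 2), round(s2, 2))  # 31.25 21.25
```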

&lt;p&gt;Given that this node has no child nodes we add two new nodes which represent the potential moves we can play to the tree. We initialize both of their parameters ((t) and (n)) with 0.&lt;/p&gt;

&lt;p&gt;Now we need to decide which one of those two nodes we should explore further. And you're right. We use the UCT formula to calculate their values. Given that both have (t) and (n) values of zero they're both (\infty) so we decide to pick the leftmost node. Once again we do a rollout, retrieve the value of those games and propagate this value up the tree until we reach the tree's root node, updating all the node parameters along the way.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fphilippmuens.com%2Fcontent%2Fimages%2F2020%2F01%2Fchess-mcts-4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fphilippmuens.com%2Fcontent%2Fimages%2F2020%2F01%2Fchess-mcts-4.png" alt="Minimax and Monte Carlo Tree Search"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The next iteration will once again start at the root node where we use the UCT formula to decide which child node we want to explore further. Since we can see a pattern here and I don't want to bore you I'm not going to describe the upcoming steps in great detail. What we'll be doing is following the exact same procedure we've used above which can be summarized as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Start at the root node and use the UCT formula to calculate the score for every child node&lt;/li&gt;
&lt;li&gt;Pick the child node for which you've computed the highest UCT score&lt;/li&gt;
&lt;li&gt;Check if the child has already been visited
&lt;ul&gt;
&lt;li&gt;If not, do a rollout&lt;/li&gt;
&lt;li&gt;If yes, determine the potential next states from there, use the UCT formula to decide which child node to pick and do a rollout&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Propagate the result back through the tree until you reach the root node&lt;/li&gt;
&lt;/ol&gt;
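
&lt;p&gt;The procedure summarized above can be sketched in Python. This is a minimal sketch under assumptions that aren't part of the walkthrough: instead of Chess we use a tiny Nim variant (players alternately take one or two stones, whoever takes the last stone wins), the value term of the UCT formula is taken as the average t / n, every rollout is a single random game, and each node counts the wins of the player who made the move leading to it. All names (Node, uct, expand, rollout, mcts) are made up for illustration.&lt;/p&gt;

```python
import math
import random


class Node:
    def __init__(self, stones, mover, parent=None):
        self.stones = stones   # stones left after the move leading here
        self.mover = mover     # player (0 or 1) who made that move
        self.parent = parent
        self.children = []
        self.t = 0.0           # accumulated rollout wins for `mover`
        self.n = 0             # number of visits


def uct(node, c=1.5):
    if node.n == 0:
        return math.inf        # unvisited nodes are explored first
    return node.t / node.n + c * math.sqrt(math.log(node.parent.n) / node.n)


def expand(node):
    """Add one child per legal move (take one or two stones)."""
    for take in range(1, min(2, node.stones) + 1):
        node.children.append(Node(node.stones - take, 1 - node.mover, node))


def rollout(stones, last_mover):
    """Play random moves until no stones are left and return the winner."""
    while stones != 0:
        last_mover = 1 - last_mover
        stones -= random.randint(1, min(2, stones))
    return last_mover          # whoever took the last stone


def mcts(stones, player, iterations=2000):
    root = Node(stones, mover=1 - player)
    for _ in range(iterations):
        node = root
        # Selection: walk down the tree, always following the best UCT score.
        while node.children:
            node = max(node.children, key=uct)
        # An already visited, non-terminal leaf is expanded before the rollout.
        if node.n != 0 and node.stones != 0:
            expand(node)
            node = max(node.children, key=uct)
        winner = rollout(node.stones, node.mover)
        # Backpropagation: update t and n on the way back to the root.
        while node is not None:
            node.n += 1
            node.t += 1.0 if winner == node.mover else 0.0
            node = node.parent
    # Pick the child with the highest accumulated value t.
    best = max(root.children, key=lambda child: child.t)
    return root.stones - best.stones  # number of stones to take


print(mcts(stones=4, player=0))
```

&lt;p&gt;In this toy game leaving the opponent a multiple of three stones is the winning strategy, so with four stones the search should settle on taking a single stone.&lt;/p&gt;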

&lt;p&gt;We iterate over this algorithm until we run out of time or reach a predefined threshold value of visits, depth or iterations. Once this happens we evaluate the current state of our tree and pick the child node(s) which maximize the value (t). Thanks to the many games we've played and the law of large numbers we can be reasonably confident that this move is among the best ones we can play.&lt;/p&gt;

&lt;p&gt;That's all there is. We've just learned, applied and understood Monte Carlo Tree Search!&lt;/p&gt;

&lt;p&gt;You might agree that it seems like MCTS is very compute intensive since you have to run through thousands of random games. This is definitely true and we need to be very clever as to where we should invest our resources to find the most promising path in our tree. We can control this behavior with the aforementioned "temperature" parameter (C) in our UCT formula. With this parameter we balance the trade-off between "exploration vs. exploitation".&lt;/p&gt;

&lt;p&gt;A large (C) value puts us into "exploration" mode. We'll spend more time visiting the least-explored nodes. A small value for (C) puts us into "exploitation" mode where we'll revisit already explored nodes to gather more information about them.&lt;/p&gt;
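
&lt;p&gt;To get a feeling for this trade-off we can score two children under different (C) values. The statistics below are hypothetical and only serve as an illustration, as do the names uct, children and pick. The value term is assumed to be an average between 0 and 1.&lt;/p&gt;

```python
import math


def uct(avg_value, visits, parent_visits, c):
    return avg_value + c * math.sqrt(math.log(parent_visits) / visits)


# Hypothetical statistics per child: (average value, number of visits).
children = {
    "well_explored": (0.6, 90),
    "rarely_visited": (0.4, 10),
}


def pick(c, parent_visits=100):
    """Return the name of the child with the highest UCT score."""
    def score(name):
        avg_value, visits = children[name]
        return uct(avg_value, visits, parent_visits, c)
    return max(children, key=score)


print(pick(c=0.1))  # small C: exploit the child that looks best so far
print(pick(c=1.5))  # large C: explore the child we know less about
```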

&lt;p&gt;Given the simplicity and applicability due to the exploitation of randomness, MCTS is a widely used game tree search algorithm. &lt;a href="https://deepmind.com/" rel="noopener noreferrer"&gt;DeepMind&lt;/a&gt; extended MCTS with &lt;a href="https://en.wikipedia.org/wiki/Deep_learning#Deep_neural_networks" rel="noopener noreferrer"&gt;Deep Neural Networks&lt;/a&gt; to optimize its performance in finding the best Go moves to play. The resulting Game AI was so strong that it reached superhuman level performance and &lt;a href="https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol" rel="noopener noreferrer"&gt;defeated&lt;/a&gt; the Go World Champion Lee Sedol 4-1.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this blog post we’ve looked into 2 different tree search algorithms which can be used to build sophisticated Game AIs.&lt;/p&gt;

&lt;p&gt;While &lt;a href="https://en.wikipedia.org/wiki/Minimax" rel="noopener noreferrer"&gt;Minimax&lt;/a&gt; combined with &lt;a href="https://en.wikipedia.org/wiki/Alpha%E2%80%93beta_pruning" rel="noopener noreferrer"&gt;Alpha-Beta pruning&lt;/a&gt; is a solid solution to approach games where an evaluation function to estimate the game outcome can easily be defined, &lt;a href="https://en.wikipedia.org/wiki/Monte_Carlo_tree_search" rel="noopener noreferrer"&gt;Monte Carlo Tree Search (MCTS)&lt;/a&gt; is a universally applicable solution given that no evaluation function is necessary due to its reliance on randomness.&lt;/p&gt;

&lt;p&gt;Raw Minimax and MCTS are only the start and can easily be extended and modified to work in more complex environments. &lt;a href="https://deepmind.com/" rel="noopener noreferrer"&gt;DeepMind&lt;/a&gt; cleverly &lt;a href="https://storage.googleapis.com/deepmind-media/alphago/AlphaGoNaturePaper.pdf" rel="noopener noreferrer"&gt;combined MCTS with Deep Neural Networks&lt;/a&gt; to predict Go game moves whereas &lt;a href="https://ibm.com/" rel="noopener noreferrer"&gt;IBM&lt;/a&gt; extended Alpha-Beta tree search to &lt;a href="https://en.wikipedia.org/wiki/Deep_Blue_(chess_computer)" rel="noopener noreferrer"&gt;compute the best possible Chess moves&lt;/a&gt; to play.&lt;/p&gt;

&lt;p&gt;I hope that this introduction to Game AI algorithms sparked your interest in Artificial Intelligence and helps you understand the underlying mechanics you’ll encounter the next time you pick up a board game on your computer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Additional Resources
&lt;/h2&gt;

&lt;p&gt;Do you want to learn more about Minimax and Monte Carlo Tree Search? The following list is a compilation of resources I found useful while studying such concepts.&lt;/p&gt;

&lt;p&gt;If you’re really into modern Game AIs I highly recommend the book &lt;a href="https://www.manning.com/books/deep-learning-and-the-game-of-go" rel="noopener noreferrer"&gt;“Deep Learning and the Game of Go”&lt;/a&gt; by &lt;a href="https://maxpumperla.com/" rel="noopener noreferrer"&gt;Max Pumperla&lt;/a&gt; and &lt;a href="https://twitter.com/macfergus" rel="noopener noreferrer"&gt;Kevin Ferguson&lt;/a&gt;. In this book you’ll implement a Go game engine and refine it step-by-step until at the end you implement the concepts &lt;a href="https://deepmind.com/" rel="noopener noreferrer"&gt;DeepMind&lt;/a&gt; used to build &lt;a href="https://storage.googleapis.com/deepmind-media/alphago/AlphaGoNaturePaper.pdf" rel="noopener noreferrer"&gt;AlphaGo&lt;/a&gt; and &lt;a href="https://deepmind.com/research/publications/mastering-game-go-without-human-knowledge/" rel="noopener noreferrer"&gt;AlphaGo Zero&lt;/a&gt; based on their published research papers.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://en.wikipedia.org/wiki/Minimax" rel="noopener noreferrer"&gt;Wikipedia - Minimax&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://en.wikipedia.org/wiki/Alpha%E2%80%93beta_pruning" rel="noopener noreferrer"&gt;Wikipedia - Alpha-Beta Pruning&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://en.wikipedia.org/wiki/Expectiminimax" rel="noopener noreferrer"&gt;Wikipedia - Expectiminimax&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=STjW3eH0Cik" rel="noopener noreferrer"&gt;MIT OpenCourseWare - Games, Minimax, and Alpha-Beta&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=l-hh51ncgDI" rel="noopener noreferrer"&gt;Sebastian Lague - Minimax and alpha-beta pruning&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://en.wikipedia.org/wiki/Monte_Carlo_tree_search" rel="noopener noreferrer"&gt;Wikipedia - Monte Carlo Tree Search&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=UXW2yZndl7U" rel="noopener noreferrer"&gt;John Levine - Monte Carlo Tree Search&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>machinelearning</category>
      <category>code</category>
    </item>
    <item>
      <title>Learning Deep Learning</title>
      <dc:creator>Philipp Muens</dc:creator>
      <pubDate>Tue, 05 Mar 2019 14:42:00 +0000</pubDate>
      <link>https://forem.com/pmuens/learning-deep-learning-3e7h</link>
      <guid>https://forem.com/pmuens/learning-deep-learning-3e7h</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--C_cKQzrj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://philippmuens.com/content/images/2020/01/gery-wibowo-Eti6ph51H4A-unsplash.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--C_cKQzrj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://philippmuens.com/content/images/2020/01/gery-wibowo-Eti6ph51H4A-unsplash.jpg" alt="Learning Deep Learning"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Deep_learning"&gt;Deep Learning&lt;/a&gt;, a branch of &lt;a href="https://en.wikipedia.org/wiki/Machine_learning"&gt;Machine Learning&lt;/a&gt;, has gained a lot of traction and press coverage over the last couple of years. Thanks to significant scientific breakthroughs we’re now able to solve a variety of hard problems with the help of machine intelligence.&lt;/p&gt;

&lt;p&gt;Computer systems were taught to &lt;a href="https://cs.stanford.edu/people/esteva/nature/"&gt;identify skin cancer&lt;/a&gt; with an accuracy on par with that of board-certified dermatologists. Neural Networks can generate &lt;a href="https://arxiv.org/pdf/1511.06434.pdf"&gt;photorealistic&lt;/a&gt; images of &lt;a href="https://thispersondoesnotexist.com/"&gt;fake people&lt;/a&gt; and &lt;a href="https://research.nvidia.com/sites/default/files/pubs/2017-10_Progressive-Growing-of/karras2018iclr-paper.pdf"&gt;fake celebrities&lt;/a&gt;. It’s even possible for an algorithm to teach itself &lt;a href="https://deepmind.com/blog/alphastar-mastering-real-time-strategy-game-starcraft-ii/"&gt;entire games&lt;/a&gt; from first principles, surpassing human-level mastery after only a couple of hours of training.&lt;/p&gt;

&lt;p&gt;In summary Deep Learning is amazing, mystical and sometimes even scary and intimidating.&lt;/p&gt;

&lt;p&gt;In order to demystify and understand this “Black Box” end-to-end I decided to take a deep dive into Deep Learning, looking at it through the practical as well as the theoretical lens.&lt;/p&gt;

&lt;p&gt;With this post I’d like to share the Curriculum I came up with after spending months following the space, reading books and research papers, doing lectures, classes and courses to find some of the best educational resources out there.&lt;/p&gt;

&lt;p&gt;Before we take a closer look I’d like to point out that the Curriculum as a whole is still a &lt;strong&gt;work in progress and might change over time&lt;/strong&gt; since new material covering state-of-the-art Deep Learning techniques is released on an ongoing basis. Feel free to bookmark this page and revisit it from time to time to stay up-to-date with the most recent changes.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Approach
&lt;/h2&gt;

&lt;p&gt;During the research phase which resulted in the following Curriculum I triaged dozens of classes, lectures, tutorials, talks, MOOCs, papers and books. While the topics covered were usually the same the required levels of expertise in advanced Mathematics and computer programming were not.&lt;/p&gt;

&lt;p&gt;Generally speaking one can divide most educational Deep Learning resources into two categories: “Shallow” and “Deep”. Authors of “Shallow” resources tend to heavily utilize high-level Frameworks and abstractions without taking enough time to talk about the underlying theoretical pieces. “Deep” resources on the other hand usually take the bottom-up approach, starting with a lot of Mathematical fundamentals until eventually some code is written to translate the theory into practice.&lt;/p&gt;

&lt;p&gt;I personally believe that both are important: understanding how the technology works under the covers while using Frameworks to put this knowledge into practice. The proposed Curriculum is structured to achieve exactly that: learning and understanding Deep Learning from a theoretical as well as a practical point of view.&lt;/p&gt;

&lt;p&gt;In our case we’ll approach our Deep Learning journey with a slight twist. We won’t follow a strict bottom-up or top-down approach but will blend both learning techniques together.&lt;/p&gt;

&lt;p&gt;Our first touchpoint with Deep Learning will be in a practical way. We’ll use high-level abstractions to build and train Deep Neural Networks which will categorize images, predict and generate text and recommend movies based on historical user data. This first encounter is 100% practice-oriented. We won’t take too much time to learn about the Mathematical portions just yet.&lt;/p&gt;

&lt;p&gt;Excited about the first successes we had we’ll brush up our Mathematical understanding and take a deep dive into Deep Learning, this time following a bottom-up approach. Our prior, practical exposure will greatly benefit us here since we already know what outcomes certain methodologies produce and therefore have specific questions about how things might work under the hood.&lt;/p&gt;

&lt;p&gt;In the last part of this Curriculum we’ll learn about Deep Reinforcement Learning which is the intersection of &lt;a href="https://en.wikipedia.org/wiki/Reinforcement_learning"&gt;Reinforcement Learning&lt;/a&gt; and Deep Learning. A thorough analysis of &lt;a href="https://deepmind.com/blog/alphago-zero-learning-scratch/"&gt;AlphaGo Zero&lt;/a&gt;, the famous agent that learned the &lt;a href="https://en.wikipedia.org/wiki/Go_(game)"&gt;Go board game&lt;/a&gt; from scratch by playing against itself to become basically unbeatable by humans, will help us understand and appreciate the capabilities this approach has to offer.&lt;/p&gt;

&lt;p&gt;During our journey we’ll work on two distinct Capstone projects (“Capstone I” and “Capstone II”) to put our knowledge into practice. While working on these we’ll solve real problems with Deep Neural Networks and build up a professional portfolio we can share online.&lt;/p&gt;

&lt;p&gt;Once done we’ll be in a good position to continue our Deep Learning journey reading through the most recent academic research papers, implementing new algorithms and coming up with our own ideas to contribute to the Deep Learning community.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Curriculum
&lt;/h2&gt;

&lt;p&gt;As already discussed above, Deep Learning is… Deep. Given the traction and momentum, Universities, Companies and individuals have published a near endless stream of resources including academic research papers, Open Source tools, reference implementations as well as educational material. During the last couple of months I spent my time triaging those to find the highest quality, yet up-to-date learning resources.&lt;/p&gt;

&lt;p&gt;I then took a step back and structured the materials in a way which makes it possible to learn Deep Learning from scratch up to a point where enough knowledge is gained to solve complex problems, stay on top of the current research and participate in it.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. A Practical Encounter
&lt;/h3&gt;

&lt;p&gt;We begin our journey in the land of Deep Learning with a top-down approach by introducing the subject “Deep Learning” in a practical and playful way. We won’t start with advanced college Math, theoretical explanations and abstract AI topics. Rather we’ll dive right into the application of tools and techniques to solve well-known problems.&lt;/p&gt;

&lt;p&gt;The main reason for doing this is that it keeps us motivated since we’ll solve those problems with state-of-the-art implementations which will help us see and understand the bigger picture. It’s a whole lot easier to take a look under the covers of the abstractions we’ll use once we know what can be achieved with them. We’ll automatically come up with questions about certain results and behaviors and develop our own intuition and excitement to understand how the results came to be.&lt;/p&gt;

&lt;p&gt;In doing this we’ll take the great &lt;a href="https://course.fast.ai/"&gt;“Practical Deep Learning for Coders”&lt;/a&gt; course by the &lt;a href="https://fast.ai/"&gt;Fast.ai&lt;/a&gt; team which will walk us through many real-world examples of Deep Neural Network usage. Theoretical concepts aren’t completely left out but will be discussed “just-in-time”.&lt;/p&gt;

&lt;p&gt;It’s important to emphasize that it’s totally fine (and expected) that we won’t understand everything which is taught during this course the first time we hear about it. Most of the topics will be covered multiple times throughout this Curriculum so we’ll definitely get the hang of it later on. If you’re having problems with one topic or the other, feel free to rewatch the respective part in the video or do some research on your own. Keep in mind though that you shouldn’t get too deep into the weeds since our main focus is still on the practical portions.&lt;/p&gt;

&lt;p&gt;You should definitely recreate each and every single &lt;a href="https://jupyter.org/"&gt;Jupyter Notebook&lt;/a&gt; which was used in the &lt;a href="https://fast.ai/"&gt;Fast.ai&lt;/a&gt; course from scratch. This helps you to get a better understanding of the workflow and lets you play around with the parameters to see the effects they have on the data.&lt;/p&gt;

&lt;p&gt;When done it’s a good idea to watch the following &lt;a href="https://www.youtube.com/watch?v=vq2nnJ4g6N0"&gt;great talk by Google&lt;/a&gt; and this &lt;a href="https://www.youtube.com/playlist?list=PLWKotBjTDoLj3rXBL-nEIPRN9V3a9Cx07"&gt;mini-course by Leo Isikdogan&lt;/a&gt; to solidify the knowledge we’ve just acquired.&lt;/p&gt;

&lt;h4&gt;
  
  
  Resources
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://course.fast.ai/"&gt;Fast.ai - Practical Deep Learning for Coders&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/blog/products/gcp/learn-tensorflow-and-deep-learning-without-a-phd"&gt;Learn TensorFlow and Deep Learning without a Ph.D.&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/playlist?list=PLWKotBjTDoLj3rXBL-nEIPRN9V3a9Cx07"&gt;Leo Isikdogan - Deep Learning Crash Course&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Mathematical Foundations
&lt;/h3&gt;

&lt;p&gt;Once we have a good understanding of what Deep Learning is, how it’s used in practice and how it roughly works under the hood it’s time to take a step back and refresh our Math knowledge. Deep Neural Networks heavily utilize Matrix multiplications, non-linearities and optimization algorithms such as &lt;a href="https://en.wikipedia.org/wiki/Gradient_descent"&gt;Gradient Descent&lt;/a&gt;. We therefore need to familiarize ourselves with &lt;a href="https://en.wikipedia.org/wiki/Linear_algebra"&gt;Linear Algebra&lt;/a&gt;, &lt;a href="https://en.wikipedia.org/wiki/Calculus"&gt;Calculus&lt;/a&gt; and some basic &lt;a href="https://en.wikipedia.org/wiki/Probability"&gt;Probability Theory&lt;/a&gt; which build the Mathematical foundations of Deep Learning.&lt;/p&gt;

&lt;p&gt;While this is certainly advanced Mathematics it’s important to highlight that High School level Math knowledge is usually enough to get by in the beginning. For the most part we should just refresh our knowledge a little bit. It’s definitely not advisable to spend weeks or even months studying every aspect of Linear Algebra, Calculus or Probability Theory (if that’s even possible) to consider this part “done”. Basic fluency in the aforementioned topics is enough. There’s always enough time to learn the more advanced topics as soon as we come across them.&lt;/p&gt;

&lt;p&gt;Having a good Mathematical understanding will pay dividends later on as we progress with more advanced Deep Learning topics. Don’t be intimidated by this part of the Curriculum. Mathematics can and should be fun!&lt;/p&gt;

&lt;p&gt;Stanford has some great refreshers on &lt;a href="http://cs229.stanford.edu/section/cs229-linalg.pdf"&gt;Linear Algebra&lt;/a&gt; and &lt;a href="http://cs229.stanford.edu/section/cs229-prob.pdf"&gt;Probability Theory&lt;/a&gt;. If that’s too shallow and you need a little bit more to get up to speed you might find Part 1 of the &lt;a href="https://www.deeplearningbook.org/"&gt;Deep Learning Book&lt;/a&gt; helpful.&lt;/p&gt;

&lt;p&gt;Once you’ve brushed up on the basics it’s worthwhile to take a couple of days and thoroughly study &lt;a href="https://arxiv.org/pdf/1802.01528.pdf"&gt;“The Matrix Calculus You Need For Deep Learning”&lt;/a&gt; by Terence Parr and Jeremy Howard (one of the founders of &lt;a href="https://fast.ai/"&gt;Fast.ai&lt;/a&gt;) and the &lt;a href="https://github.com/fastai/numerical-linear-algebra"&gt;“Computational Linear Algebra”&lt;/a&gt; course by Rachel Thomas (also a co-founder of &lt;a href="https://fast.ai/"&gt;Fast.ai&lt;/a&gt;). Both resources are heavily tailored to teach the Math behind Deep Learning.&lt;/p&gt;

&lt;h4&gt;
  
  
  Resources
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="http://cs229.stanford.edu/section/cs229-linalg.pdf"&gt;Stanford CS229 - Linear Algebra Refresher&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://cs229.stanford.edu/section/cs229-prob.pdf"&gt;Stanford CS229 - Probability Theory Refresher&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.deeplearningbook.org/"&gt;Deep Learning Book - Part I&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/pdf/1802.01528.pdf"&gt;Terence Parr, Jeremy Howard - The Matrix Calculus You Need For Deep Learning&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/fastai/numerical-linear-algebra"&gt;Rachel Thomas - Computational Linear Algebra&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Deep Dive
&lt;/h3&gt;

&lt;p&gt;Now we’re armed with a good understanding of the capabilities and the underlying Math of Deep Learning.&lt;/p&gt;

&lt;p&gt;Given this it’s time to take a deep dive to broaden our knowledge of Deep Learning. The main goal of this part is to take the practical experience and blend it with our Mathematical exposure to fully understand the theoretical building blocks of Deep Neural Networks. A thorough understanding of this will be key later on once we learn more about topics such as Deep Reinforcement Learning.&lt;/p&gt;

&lt;p&gt;The following describes 3 different ways to take the deep dive. The approaches are certainly not mutually exclusive but could (and should) be used in conjunction to complement each other.&lt;/p&gt;

&lt;p&gt;The path you might want to take will depend on your prior exposure to Deep Learning and your favorite learning style.&lt;/p&gt;

&lt;p&gt;If you’re a person who appreciates classical MOOCs in the form of high-quality, pre-recorded videos with quizzes and exercises you’ll definitely enjoy &lt;a href="https://www.andrewng.org/"&gt;Andrew Ng’s&lt;/a&gt; &lt;a href="https://www.deeplearning.ai/deep-learning-specialization/"&gt;Deeplearning.ai “Specialization for Deep Learning”&lt;/a&gt;. This course is basically split up into 5 different sub-courses which will take you from the basics of Neural Networks to advanced topics such as &lt;a href="https://en.wikipedia.org/wiki/Recurrent_neural_network"&gt;Recurrent Neural Networks&lt;/a&gt;. While learning about all of this you’ll also pick up a lot of valuable nuggets Andrew shares as he talks about his prior experience as a Deep Learning practitioner.&lt;/p&gt;

&lt;p&gt;You can certainly get around the tuition fee for the &lt;a href="https://www.deeplearning.ai/"&gt;Deeplearning.ai&lt;/a&gt; specialization, but it’s important to emphasize that it’s definitely worth every penny! You’ll have access to high quality course content, can request help when you’re stuck and get project reviews by classmates and experts.&lt;/p&gt;

&lt;p&gt;Readers who enjoy books should definitely look into the &lt;a href="http://d2l.ai/"&gt;“Dive into Deep Learning” book&lt;/a&gt;. This book was created to be a companion guide for the &lt;a href="https://courses.d2l.ai/"&gt;STAT 157 course&lt;/a&gt; at &lt;a href="https://www.berkeley.edu/"&gt;UC Berkeley&lt;/a&gt; but turned into more than that. The main focus of this book is to be at the intersection of Mathematical formulations, real world applications and the intuition behind Deep Learning complemented by &lt;a href="https://github.com/d2l-ai/notebooks"&gt;interactive Jupyter Notebooks&lt;/a&gt; to play around with. “Dive into Deep Learning” covers all of the important concepts of a modern Deep Learning class. It requires no prior knowledge and starts with the basics of Neural Networks while moving onwards to cover advanced topics such as &lt;a href="https://en.wikipedia.org/wiki/Convolutional_neural_network"&gt;Convolutional Neural Networks&lt;/a&gt;, ending in discussions about state-of-the-art &lt;a href="https://en.wikipedia.org/wiki/Natural_language_processing"&gt;NLP&lt;/a&gt; implementations.&lt;/p&gt;

&lt;p&gt;Another method to study Deep Learning in great detail is with the help of recorded university class videos. &lt;a href="http://www.mit.edu/"&gt;MIT&lt;/a&gt; released the terrific &lt;a href="http://introtodeeplearning.com/"&gt;“Introduction to Deep Learning” course&lt;/a&gt; which is basically a recording of their 6.S191 class accessible for everyone to watch! This option is definitely one of the more advanced ways to learn the subject as some prior university-level Math and Computer Science knowledge is necessary to grok it. The huge benefit of this format is that it touches on a lot of different topics other courses simply don’t cover due to missing prerequisites. If you’ve already been exposed to university-level Computer Science and Mathematics and like to learn with a focus on more rigorous theory, then this course is definitely for you.&lt;/p&gt;

&lt;p&gt;Whatever route you take, it’s really important that you take your time to revisit concepts and recreate their implementations from scratch. It’s totally fine if you’re struggling at first. It’s this wandering through the dark alleys where you’ll actually learn the most! Don’t waste your time passively consuming content. Go out and reproduce what you’ve just learned!&lt;/p&gt;

&lt;p&gt;At the end of the day it doesn’t really matter what format you choose. All courses will prepare you equally well for the next step in your journey to Deep Learning mastery which is your first Capstone project!&lt;/p&gt;

&lt;h4&gt;
  
  
  Resources
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.deeplearning.ai/"&gt;Deeplearning.ai - Deep Learning Specialization&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://d2l.ai/"&gt;UC Berkeley - Dive into Deep Learning&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://introtodeeplearning.com/"&gt;MIT - Introduction to Deep Learning&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Capstone Project I
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Focus:&lt;/strong&gt; Supervised Deep Learning&lt;/p&gt;

&lt;p&gt;Enough theory (for now). It’s time to put our hard-earned knowledge into practice.&lt;/p&gt;

&lt;p&gt;In our first Capstone project we’ll demonstrate that we fully understand the basic building blocks of modern Deep Learning. We’ll pick a problem of interest and solve it with the help of a Deep Neural Network. Since we’ve mostly dealt with &lt;a href="https://en.wikipedia.org/wiki/Supervised_learning"&gt;Supervised Learning&lt;/a&gt; so far it’s worth mentioning that our solution will be based on such an implementation.&lt;/p&gt;

&lt;p&gt;Our programmatic environment will be a separate &lt;a href="https://jupyter.org/"&gt;Jupyter Notebook&lt;/a&gt; where we code and describe every step together with a brief justification of its necessity in great detail. Taking the time to think through the steps necessary to solve our problem helps us check ourselves as we have to think through our architecture as well as the underlying processes that take place when our code is executed.&lt;/p&gt;

&lt;p&gt;To further deepen our knowledge and help us get out of the comfort zone we’ll restrict our implementation to the usage of low-level Frameworks, meaning that we’re only allowed to use Frameworks such as &lt;a href="https://pytorch.org/"&gt;PyTorch&lt;/a&gt;, &lt;a href="https://www.tensorflow.org/"&gt;TensorFlow&lt;/a&gt; or &lt;a href="https://mxnet.apache.org/"&gt;MXNet&lt;/a&gt;. Any usage of high-level abstraction libraries such as &lt;a href="https://docs.fast.ai/"&gt;Fastai&lt;/a&gt; or &lt;a href="https://keras.io/"&gt;Keras&lt;/a&gt; is &lt;strong&gt;strictly forbidden&lt;/strong&gt;. Those libraries, while being great for the experienced practitioner, abstract too much away, preventing us from working through the tough decisions and tradeoffs we have to make when working on our problem.&lt;/p&gt;

&lt;p&gt;Remember that this is the part where we’ll learn the most as we’re really getting into the weeds here. Don’t give up as enlightenment will find you once you’ve made it. It’s also more than ok to go back and reread / rewatch the course material if you’re having problems and need some help.&lt;/p&gt;

&lt;p&gt;While working on this project always keep in mind that it’s one of your personal portfolio projects you should definitely share online. It’s those projects where you can demonstrate that you’re capable of solving complex problems with Deep Learning technologies. Make sure that you really spend a good portion of your time on it and “make it pretty”.&lt;/p&gt;

&lt;p&gt;Are you struggling to find a good project to work on? Here are some project ideas which will help you get started:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="http://yann.lecun.com/exdb/mnist/"&gt;Handwritten digit recognition&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://mi.eng.cam.ac.uk/research/projects/VideoRec/CamVid/"&gt;Semantic Segmentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://ai.stanford.edu/~amaas/data/sentiment/"&gt;Sentiment Analysis&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://u.cs.biu.ac.il/~koppel/BlogCorpus.htm"&gt;Natural Language Processing&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  5. Deep Reinforcement Learning
&lt;/h3&gt;

&lt;p&gt;Deep Reinforcement Learning is the last major topic we’ll cover in this Curriculum.&lt;/p&gt;

&lt;p&gt;One might ask the question as to what the difference between the Deep Learning we’re studying and Deep Reinforcement Learning is. All the techniques we’ve learned and used so far were built around the concept of &lt;a href="https://en.wikipedia.org/wiki/Supervised_learning"&gt;Supervised Learning&lt;/a&gt;. The gist of Supervised Learning is that we utilize large datasets to train our model by showing it data, letting it make predictions about what it thinks the data represents and then using the labeled solution to compute the difference between the prediction and the actual solution. We then use algorithms such as &lt;a href="https://en.wikipedia.org/wiki/Gradient_descent"&gt;Gradient Descent&lt;/a&gt; and &lt;a href="https://en.wikipedia.org/wiki/Backpropagation"&gt;Backpropagation&lt;/a&gt; to subsequently readjust our model until the predictions it makes meet our expectations.&lt;/p&gt;
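
&lt;p&gt;The training loop described above can be boiled down to a toy example. This is only a sketch with made-up data and names (xs, ys, w, learning_rate): the “model” is a single weight, the labeled data follows y = 2 * x and the gradient of the mean squared error is written out by hand. With just one parameter there’s nothing to backpropagate through, which is the part a Deep Learning Framework would normally take care of.&lt;/p&gt;

```python
# Toy labeled dataset following y = 2 * x (the ys are the labels).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = 0.0               # the single parameter of our model: prediction = w * x
learning_rate = 0.01  # step size of every Gradient Descent update

for _ in range(500):
    # Gradient of the mean squared error between predictions and labels
    # with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    # Move w a small step against the gradient to reduce the error.
    w -= learning_rate * grad

print(round(w, 6))
```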

&lt;p&gt;You might’ve already noticed that Supervised Learning heavily relies on huge datasets to train and test our models via examples.&lt;/p&gt;

&lt;p&gt;What if there’s a way that our AI can teach itself what it should do based on self-exploration and guidelines we define? That’s where &lt;a href="https://en.wikipedia.org/wiki/Reinforcement_learning"&gt;Reinforcement Learning&lt;/a&gt; comes into play. With Reinforcement Learning we’re able to let our model learn from first principles by exploring the environment. The researchers at &lt;a href="https://deepmind.com/"&gt;DeepMind&lt;/a&gt; were among the first to successfully &lt;a href="https://arxiv.org/pdf/1312.5602v1.pdf"&gt;blend Deep Learning and Reinforcement Learning&lt;/a&gt; to let an AI teach itself to &lt;a href="https://www.youtube.com/watch?v=eG1Ed8PTJ18"&gt;play Atari games&lt;/a&gt;. The only inputs the AI agent got were the raw input pixels and the score.&lt;/p&gt;
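
&lt;p&gt;The idea of learning from exploration and rewards can be sketched without any Neural Network at all. The following example uses tabular Q-learning, a classic Reinforcement Learning algorithm, on a made-up corridor environment. It is not the deep variant DeepMind used for Atari. The environment, the reward of 1 for reaching the goal and all names such as &lt;code&gt;step&lt;/code&gt; and &lt;code&gt;train&lt;/code&gt; are assumptions for illustration only.&lt;/p&gt;

```python
import random

# Tabular Q-learning on a hypothetical corridor: states 0..4, reward 1 for reaching state 4.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)  # index 0 moves left, index 1 moves right

def step(state, action_index):
    # Move along the corridor; walls keep the agent inside 0..GOAL.
    next_state = min(max(state + ACTIONS[action_index], 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # one value per (state, action)
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            action = rng.randrange(2)  # explore uniformly at random
            next_state, reward = step(state, action)
            # Q-learning update: move the estimate toward reward + discounted best next value.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q = train()
# Greedy policy: the best-valued action index for every non-goal state.
policy = [max(range(2), key=lambda a: q[s][a]) for s in range(GOAL)]
print(policy)
```

&lt;p&gt;Nobody tells the agent that moving right is correct. It only sees states and rewards, yet the greedy policy read from the learned table should pick action index 1 (right) in every state.&lt;/p&gt;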

&lt;p&gt;In this part of our Curriculum we’ll learn what Reinforcement Learning is and how we can combine Deep Learning and Reinforcement Learning to build machine intelligence which learns to master tasks in an autodidactic way.&lt;/p&gt;

&lt;p&gt;As per usual there are different ways to learn Deep Reinforcement Learning.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.simoninithomas.com/"&gt;Thomas Simonini&lt;/a&gt; has a great &lt;a href="https://simoninithomas.github.io/Deep_reinforcement_learning_Course/"&gt;“Deep Reinforcement Learning Course”&lt;/a&gt; which focuses on the practical pieces of Deep Reinforcement Learning as you’ll implement real world applications throughout his class.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://spinningup.openai.com/en/latest/"&gt;OpenAI’s “SpinningUp AI”&lt;/a&gt; course is another great resource which strikes a really good balance between practical examples and theoretical foundations.&lt;/p&gt;

&lt;p&gt;If you’re looking for a University-level class which heavily focuses on the theoretical underpinnings I’d highly recommend the &lt;a href="https://github.com/enggen/DeepMind-Advanced-Deep-Learning-and-Reinforcement-Learning"&gt;“Advanced Deep Learning and Reinforcement Learning Class”&lt;/a&gt; which was taught by &lt;a href="https://www.ucl.ac.uk/"&gt;UCL&lt;/a&gt; and &lt;a href="https://deepmind.com/"&gt;DeepMind&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Every resource listed here will help you understand and apply Deep Reinforcement Learning techniques. While some are more focused on the practical portions others go really deep into the trenches of theoretical rigor. It’s definitely worthwhile to look into all of them to get the all-around view and best mixture between theory and practice.&lt;/p&gt;

&lt;p&gt;Once you’ve successfully made your way through one of the Deep Reinforcement Learning courses it’s a good idea to revisit the key ideas by reading the excellent blog posts &lt;a href="http://karpathy.github.io/2016/05/31/rl/"&gt;“Deep Reinforcement Learning: Pong from Pixels”&lt;/a&gt; by &lt;a href="https://cs.stanford.edu/people/karpathy/"&gt;Andrej Karpathy&lt;/a&gt; and &lt;a href="https://lilianweng.github.io/lil-log/2018/02/19/a-long-peek-into-reinforcement-learning.html"&gt;“A (Long) Peek into Reinforcement Learning”&lt;/a&gt; by &lt;a href="https://twitter.com/lilianweng/"&gt;Lilian Weng&lt;/a&gt; as they give a nice, broader overview of the different topics which were covered during class.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Aside:&lt;/strong&gt; If you’re fascinated by the possibilities of Reinforcement Learning I’d highly recommend the book &lt;a href="http://www.incompleteideas.net/book/the-book-2nd.html"&gt;“Reinforcement Learning: An Introduction”&lt;/a&gt; by Richard Sutton and Andrew Barto. The recently updated 2nd edition includes chapters about Neuroscience, Deep Neural Networks and more. While it’s possible and desirable to buy the book at your local bookstore you can also access the book as a freely available PDF online.&lt;/p&gt;

&lt;h4&gt;
  
  
  Resources
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://simoninithomas.github.io/Deep_reinforcement_learning_Course/"&gt;Thomas Simonini - Deep Reinforcement Learning Course&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://spinningup.openai.com/en/latest/"&gt;OpenAI - SpinningUp AI Course&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=fdY7dt3ijgY"&gt;OpenAI - SpinningUp AI Talk&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/enggen/DeepMind-Advanced-Deep-Learning-and-Reinforcement-Learning"&gt;Advanced Deep Learning and Reinforcement Learning 2018 Course (UCL &amp;amp; DeepMind)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/lebourbon/ADL_RL"&gt;Advanced Deep Learning and Reinforcement Learning 2018 Assignments&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.incompleteideas.net/book/the-book-2nd.html"&gt;Richard Sutton, Andrew Barto - Reinforcement Learning: An Introduction&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  6. Capstone Project II
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Focus:&lt;/strong&gt; Deep Reinforcement Learning&lt;/p&gt;

&lt;p&gt;It’s time for our second and last Capstone Project where we’ll use Deep Reinforcement Learning to let our AI teach itself to solve difficult real-world problems.&lt;/p&gt;

&lt;p&gt;The same restrictions from our first Capstone project also apply here. We’ll implement the solution in a dedicated &lt;a href="https://jupyter.org/"&gt;Jupyter Notebook&lt;/a&gt; where we write our code and the prose to describe what we’re doing and why we’re doing it. This helps us test our knowledge since we have to take the time to think through our current implementation and its implications to the system as a whole.&lt;/p&gt;

&lt;p&gt;As with the Capstone I project &lt;strong&gt;it’s forbidden&lt;/strong&gt; to use higher level abstraction libraries such as &lt;a href="https://docs.fast.ai/"&gt;Fastai&lt;/a&gt; or &lt;a href="https://keras.io/"&gt;Keras&lt;/a&gt;. Our implementation here should only use APIs provided by lower-level Frameworks such as &lt;a href="https://pytorch.org/"&gt;PyTorch&lt;/a&gt;, &lt;a href="https://www.tensorflow.org/"&gt;TensorFlow&lt;/a&gt; or &lt;a href="https://mxnet.apache.org/"&gt;MXNet&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Keep in mind that it’s totally fine to feel stuck at some point. Don’t be discouraged! Take your time to revisit the material and ensure that you fill your knowledge gaps before moving on. It’s those moments of struggle where you grow the most. Once you’ve made it, you’ll feel excited and empowered.&lt;/p&gt;

&lt;p&gt;The result of this Capstone project is another crucial piece of your personal Deep Learning portfolio. Make sure to set aside enough time to be able to put in the effort so that you can showcase your implementation online.&lt;/p&gt;

&lt;p&gt;Do you need some inspiration for projects you might want to work on? Here’s a list with some ideas:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://gym.openai.com/"&gt;OpenAI Gym Exercises&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://en.wikipedia.org/wiki/Connect_Four"&gt;Connect Four Game Agent&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://en.wikipedia.org/wiki/Frogger"&gt;Frogger Game Agent&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Deep Learning has gained a lot of traction in the last couple of years as major scientific breakthroughs finally made it possible to train and utilize Deep Neural Networks to perform tasks at human expert level, ranging from cancer detection to mastery in games such as Go or Space Invaders.&lt;/p&gt;

&lt;p&gt;In this blog post I shared the Curriculum I follow to learn Deep Learning from scratch. Right at the beginning of the journey one learns how Deep Learning techniques are used in practice to solve real-world problems. Once a baseline understanding is established it’s time to take a deep dive into the Mathematical and theoretical pieces to demystify the Deep Learning “Black Box”. A final exploration of the intersection of Deep Learning and Reinforcement Learning puts the reader in a great position to understand state-of-the-art Deep Learning solutions. Throughout the whole Curriculum we’ll practice our skills and showcase our fluency in them while working on dedicated Capstone projects.&lt;/p&gt;

&lt;p&gt;While putting this together I had the feeling that this Curriculum can look quite intimidating at first glance since lots of topics are covered and it’ll definitely take some time to get through it. While I’d advise the avid reader to follow every single step in the outlined order it’s totally possible to adapt and skip some topics given that everyone has different experiences, goals and interests. Learning Deep Learning should be fun and exciting. If you ever feel exhausted or struggle to get through a certain topic you should take a step back and revisit it later on. Oftentimes complicated facts and figures turn into no-brainers if we give ourselves the permission and time to do something else for the moment.&lt;/p&gt;

&lt;p&gt;I personally believe that it’s important to follow a goal while learning a new topic or skill. Make sure that you know &lt;strong&gt;why&lt;/strong&gt; you want to learn Deep Learning. Do you want to solve a problem at your company? Are you planning to switch careers? Is a high level overview enough for you since you just want to be educated about AI and its social impacts? Whatever it is, keep this goal in mind as it’ll make everything reasonable and easier during the hard times when the motivation might be lacking and everything just feels too hard to pick up.&lt;/p&gt;

</description>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Learning Advanced Mathematics</title>
      <dc:creator>Philipp Muens</dc:creator>
      <pubDate>Tue, 19 Feb 2019 09:57:00 +0000</pubDate>
      <link>https://forem.com/pmuens/learning-advanced-mathematics-5e3o</link>
      <guid>https://forem.com/pmuens/learning-advanced-mathematics-5e3o</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ho550WVN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://philippmuens.com/content/images/2020/01/photo-1453733190371-0a9bedd82893.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ho550WVN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://philippmuens.com/content/images/2020/01/photo-1453733190371-0a9bedd82893.jpeg" alt="Learning Advanced Mathematics"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For me personally Math was one of those mysterious subjects I had to go through in school but never really understood, let alone appreciated. It was too abstract and involved lengthy computations and rote formula memorization with little to no explanation as to why it’s useful and how it’s applied in the real world. Frankly put Math was one of my weakest spots. My parents were surprised and shocked when I told them that I planned to study Computer Science, which is a branch of applied Mathematics. Throughout my life I had a love-hate relationship with Math. I still remember that feeling of relief when I passed my last Math exam in college.&lt;/p&gt;

&lt;p&gt;During my career as a Software Engineer I mostly stayed away from Math. From time to time I consulted old Computer Science books to do some research on algorithms I then implemented. However those were usually the only touchpoints I had with Math.&lt;/p&gt;

&lt;p&gt;Something changed over the last couple of years. While looking for the next personal challenges and goals to grow I figured that most of the really exciting achievements heavily utilize Math as a fundamental building block. That’s actually true for a lot of scientific fields including Econometrics, Data Science and Artificial Intelligence. It’s easy to follow the news and roughly understand how things might work but once you try to dig deeper and look under the hood it gets pretty hairy.&lt;/p&gt;

&lt;p&gt;I found myself regularly lost somewhere in the dark alleys of Linear Algebra, Calculus and Statistics. Last year I finally reached a fork in the road. I wanted to fundamentally change my understanding and decided to re-learn Math from scratch. After countless late nights, early mornings and weekends doing classes, exercises and proofs I’m finally at a pretty decent level of understanding advanced Mathematics. Right now I’m building upon this foundation to learn even more.&lt;/p&gt;

&lt;p&gt;During this process I learned one important thing: &lt;strong&gt;Math is really amazing!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Math is the language of nature. Understanding it helps you understand how our world works!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;With this blog post I’d like to share how I went from “What is a root of a polynomial again?” to “Generalized Autoregressive Conditional Heteroskedasticity” (at least to some extent). I’ll share the Curriculum I created and followed, the mistakes I made (spoiler: I made a lot) and the most useful resources I used throughout this journey.&lt;/p&gt;

&lt;p&gt;Before we start I want to be honest with you: &lt;strong&gt;Math is a really involved discipline. There’s a lot out there…&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;And it can certainly be overwhelming. However if you’re really dedicated and want to put in those hours you’ll make it! If I can do it so can you!&lt;/p&gt;

&lt;p&gt;Please keep in mind that this is the path which worked for me. This doesn’t necessarily mean that it will be as efficient for you. In my case I need to study, self-explain and practice, practice, practice to really understand a topic at hand. I know of people who can just sit in class, listen and ultimately get it. That’s definitely not how I operate.&lt;/p&gt;

&lt;p&gt;Alright. Let’s get started!&lt;/p&gt;

&lt;h2&gt;
  
  
  The Curriculum
&lt;/h2&gt;

&lt;p&gt;Math is one of those subjects where you’ll find a nearly endless stream of resources. Looking closer they all vary widely in terms of quality, density and understandability.&lt;/p&gt;

&lt;p&gt;My first approach to ramp up my Math skills was to skim through an interesting research paper, write down all the Math I didn’t understand and look those terms up to study them in greater detail. This was fundamentally wrong on many levels. After some trial and error I took a step back and did a lot of research to figure out which topics I should study to support my goal and how those topics are related to one another.&lt;/p&gt;

&lt;p&gt;The Curriculum I finally put together is a good foundation if you want to jump into other &lt;a href="https://www.thoughtco.com/hard-vs-soft-science-3975989"&gt;“Hard Sciences”&lt;/a&gt;. My personal goal was to learn the knowledge I need to take a really deep dive into Artificial Intelligence. To be more specific I’m really excited about Deep Learning and the next steps in the direction of Machine intelligence.&lt;/p&gt;

&lt;p&gt;Every topic which is covered in this Curriculum uses 3 basic pillars to build a solid Mathematical foundation:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Intuition&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Videos, interactive visualizations and other helpful resources which outline how the Math came to be and how it works on an intuitive level.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deep Dive&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A good enough “deep dive” to get familiar with the foundational concepts while avoiding confusion due to overuse of theorems, proofs, lemmas, etc.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Practicality&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Practice, practice, practice. Resources such as books with lots of exercises to solidify the knowledge.&lt;/p&gt;

&lt;h3&gt;
  
  
  Algebra
&lt;/h3&gt;

&lt;p&gt;Algebra is the first topic which should be studied &lt;strong&gt;extensively&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Having a really good understanding of Algebra makes everything a whole lot easier! Calculus comes down to 90% Algebra most of the time. If you know how to solve Algebra problems you won’t have a hard time in Calculus either.&lt;/p&gt;

&lt;p&gt;Most of you might remember a phrase similar to&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Solve this equation for x”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s what Algebra is about. In an Algebra class you’ll learn about the following topics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Solving equations&lt;/li&gt;
&lt;li&gt;Solving inequalities&lt;/li&gt;
&lt;li&gt;Polynomials&lt;/li&gt;
&lt;li&gt;Factoring&lt;/li&gt;
&lt;li&gt;Functions&lt;/li&gt;
&lt;li&gt;Graphing&lt;/li&gt;
&lt;li&gt;Symmetry&lt;/li&gt;
&lt;li&gt;Fractions&lt;/li&gt;
&lt;li&gt;Radicals&lt;/li&gt;
&lt;li&gt;Exponents&lt;/li&gt;
&lt;li&gt;Logarithms&lt;/li&gt;
&lt;li&gt;Linear systems of equations&lt;/li&gt;
&lt;li&gt;Nonlinear systems of equations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As stated above it’s of uber importance that you really hone your Algebra skills. I’m repeating myself but Algebra is one of the main building blocks for advanced Mathematics.&lt;/p&gt;

&lt;h4&gt;
  
  
  Resources
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="http://tutorial.math.lamar.edu/Classes/Alg/Alg.aspx"&gt;Paul’s Online Notes - Algebra&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.goodreads.com/book/show/41838051-no-bullshit-guide-to-mathematics"&gt;No Bullshit Guide to Mathematics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.mhprofessional.com/9781260120769-usa-schaums-outline-of-college-algebra-fifth-edition"&gt;Schaum’s Outlines - College Algebra&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Trigonometry
&lt;/h3&gt;

&lt;p&gt;In Trigonometry you’ll study the relationship of lengths and angles of triangles.&lt;/p&gt;

&lt;p&gt;You’ll learn about the unit circle and its relation to sin and cos, cones and their relation to circles, ellipses, parabolas and hyperbolas, Pythagoras’ Theorem and more. Trigonometry is interesting in itself since it can be immediately applied to real life problems.&lt;/p&gt;

&lt;p&gt;Here’s a list of topics you’ll usually learn in a Trigonometry class:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pythagoras’ Theorem&lt;/li&gt;
&lt;li&gt;Sin and cos&lt;/li&gt;
&lt;li&gt;The unit circle&lt;/li&gt;
&lt;li&gt;Trigonometric identities&lt;/li&gt;
&lt;li&gt;Radians vs. Degrees&lt;/li&gt;
&lt;/ul&gt;
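
&lt;p&gt;Several of these topics fit into a few lines of code. The following sketch uses Python’s standard &lt;code&gt;math&lt;/code&gt; module and an arbitrarily chosen angle of 60 degrees to show how degrees convert to radians, how sin and cos describe a point on the unit circle and how Pythagoras’ Theorem leads to the best-known trigonometric identity.&lt;/p&gt;

```python
import math

# The angle of 60 degrees is an arbitrary choice for illustration.
angle_degrees = 60.0
theta = math.radians(angle_degrees)  # degrees to radians: theta = degrees * pi / 180

# A point on the unit circle at angle theta has coordinates (cos(theta), sin(theta)).
x, y = math.cos(theta), math.sin(theta)

# Pythagoras' Theorem on the right triangle with legs x and y and hypotenuse 1
# gives the identity sin^2(theta) + cos^2(theta) = 1.
print(math.isclose(x * x + y * y, 1.0))
print(math.isclose(theta, math.pi / 3))
```

&lt;p&gt;Both checks use &lt;code&gt;math.isclose&lt;/code&gt; instead of an exact comparison because floating-point arithmetic introduces tiny rounding errors.&lt;/p&gt;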

&lt;p&gt;Generally speaking this course is rather short. Nevertheless it’s a good preparation class for Calculus.&lt;/p&gt;

&lt;h4&gt;
  
  
  Resources
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="http://tutorial.math.lamar.edu/Extras/AlgebraTrigReview/AlgebraTrigIntro.aspx"&gt;Paul’s Online Notes - Algebra Trig Review&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.goodreads.com/book/show/41838051-no-bullshit-guide-to-mathematics"&gt;No Bullshit Guide to Mathematics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.mhprofessional.com/9781260011487-usa-schaums-outline-of-trigonometry-sixth-edition-group"&gt;Schaum’s Outlines - Trigonometry&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Calculus
&lt;/h3&gt;

&lt;p&gt;The study of continuous change is one of the main focus areas in Calculus.&lt;/p&gt;

&lt;p&gt;This might sound rather abstract and the intuition behind it is really paradoxical if you think about it (see &lt;em&gt;“Essence of Calculus”&lt;/em&gt; below). However you might remember that you dealt with Derivatives, Limits and area calculations for functions.&lt;/p&gt;

&lt;p&gt;There are usually 3 different Calculus classes (namely Calculus I, II and III) one can take. Those 3 classes range from easy topics such as “Derivatives” and “Limits” to advanced topics such as “Triple Integrals in Spherical Coordinates”. I’d suggest definitely taking the first class (Calculus I) and continuing with the second one (Calculus II) if time permits. If you’re in a hurry taking Calculus I is usually sufficient.&lt;/p&gt;

&lt;p&gt;In Calculus I you’ll learn about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Limits&lt;/li&gt;
&lt;li&gt;Continuity&lt;/li&gt;
&lt;li&gt;L’Hospital’s Rule&lt;/li&gt;
&lt;li&gt;Derivatives&lt;/li&gt;
&lt;li&gt;Power, Product, Quotient, Chain rule&lt;/li&gt;
&lt;li&gt;Higher Order Derivatives&lt;/li&gt;
&lt;li&gt;Min / Max Values&lt;/li&gt;
&lt;li&gt;Concavity&lt;/li&gt;
&lt;li&gt;Integrals&lt;/li&gt;
&lt;li&gt;Substitution Rule&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Calculus is an important topic since it’s heavily used in optimization problems to find local minima. The &lt;a href="https://en.wikipedia.org/wiki/Gradient_descent"&gt;“Gradient Descent”&lt;/a&gt; algorithm uses techniques from Calculus such as Derivatives and is leveraged in modern (Deep) Neural Networks to adjust the weights of Neurons during &lt;a href="https://en.wikipedia.org/wiki/Backpropagation"&gt;Backpropagation&lt;/a&gt;.&lt;/p&gt;
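
&lt;p&gt;Here’s a small sketch of that idea, assuming a hand-picked function f(x) = (x - 3)^2 whose minimum sits at x = 3. The helper &lt;code&gt;derivative&lt;/code&gt; is a hypothetical name and approximates the Derivative numerically, while the loop repeatedly steps against the slope just like Gradient Descent does. The step size of 0.1 and the 100 iterations are arbitrary choices.&lt;/p&gt;

```python
# f(x) = (x - 3)^2 has its minimum at x = 3; the Power and Chain rule give f'(x) = 2 * (x - 3).
def f(x):
    return (x - 3) ** 2

def derivative(func, x, h=1e-6):
    # Central difference: a numerical stand-in for the limit definition of the Derivative.
    return (func(x + h) - func(x - h)) / (2 * h)

x = 0.0  # arbitrary starting point
for _ in range(100):
    x -= 0.1 * derivative(f, x)  # step downhill, against the slope

print(round(x, 4))
```

&lt;p&gt;Each step shrinks the distance to the minimum by a constant factor, so after 100 steps the value of &lt;code&gt;x&lt;/code&gt; should be very close to 3.&lt;/p&gt;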

&lt;h4&gt;
  
  
  Resources
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/playlist?list=PLZHQObOWTQDMsr9K-rj53DwVRMYO3t5Yr"&gt;3Blue1Brown - Essence of Calculus&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://the-learning-machine.com/article/machine-learning/calculus"&gt;The Learning Machine - Calculus&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://tutorial.math.lamar.edu/Classes/CalcI/CalcI.aspx"&gt;Paul’s Online Notes - Calculus I&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.goodreads.com/book/show/22876442-no-bullshit-guide-to-math-and-physics"&gt;No Bullshit Guide to Math and Physics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.mhprofessional.com/9780071795531-usa-schaums-outline-of-calculus-6th-edition-group"&gt;Schaum’s Outlines - Calculus&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Linear Algebra
&lt;/h3&gt;

&lt;p&gt;Linear Algebra is one of the most, if not &lt;strong&gt;the most&lt;/strong&gt; important topic when learning Math for Data Science, Artificial Intelligence and Deep Learning.&lt;/p&gt;

&lt;p&gt;Linear Algebra is pretty much omnipresent in modern computing since it lets you efficiently do calculations on multi-dimensional data. During childhood you probably spent quite some time in front of your computer screen while wading through virtual worlds. Photorealistic 3D renderings are possible thanks to Math and more specifically Linear Algebra.&lt;/p&gt;

&lt;p&gt;Linear Algebra courses usually cover:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Systems of Equations&lt;/li&gt;
&lt;li&gt;Vectors&lt;/li&gt;
&lt;li&gt;Matrices&lt;/li&gt;
&lt;li&gt;Inverse Matrices&lt;/li&gt;
&lt;li&gt;Identity Matrix&lt;/li&gt;
&lt;li&gt;Matrix Arithmetic&lt;/li&gt;
&lt;li&gt;Determinants&lt;/li&gt;
&lt;li&gt;Dot &amp;amp; Cross Product&lt;/li&gt;
&lt;li&gt;Vector Spaces&lt;/li&gt;
&lt;li&gt;Basis and Dimension&lt;/li&gt;
&lt;li&gt;Linear Transformation&lt;/li&gt;
&lt;li&gt;Eigenvectors &amp;amp; Eigenvalues&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As already stated above, Linear Algebra is one of the most important topics in modern computing. Lots of problems such as image recognition can be broken down into calculations on multi-dimensional data.&lt;/p&gt;
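
&lt;p&gt;As a tiny illustration in plain Python, rotating a 2D point is a Linear Transformation and can therefore be written as a Matrix-Vector product. The helpers &lt;code&gt;mat_vec&lt;/code&gt; and &lt;code&gt;dot&lt;/code&gt; are hypothetical names written for this sketch. In practice you’d reach for a numerical library instead.&lt;/p&gt;

```python
def mat_vec(matrix, vector):
    # Each output entry is the Dot Product of one Matrix row with the Vector.
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Matrix that rotates a 2D point by 90 degrees counter-clockwise.
rotate_90 = [[0, -1],
             [1, 0]]
point = [3, 4]

rotated = mat_vec(rotate_90, point)
print(rotated)              # [-4, 3]
print(dot(point, rotated))  # 0, so the two Vectors are perpendicular
```

&lt;p&gt;Computer graphics apply the same kind of product to every point of a 3D model, which is exactly the sort of workload that benefits from hardware built for Linear Algebra.&lt;/p&gt;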

&lt;p&gt;You might have heard about the Machine Learning framework &lt;a href="https://www.tensorflow.org/"&gt;TensorFlow&lt;/a&gt; which was developed and made publicly available by &lt;a href="https://google.com/"&gt;Google&lt;/a&gt;. Well, a &lt;a href="https://en.wikipedia.org/wiki/Tensor"&gt;Tensor&lt;/a&gt; is just a fancy word for a higher-dimensional way to organize information. Hence a Scalar is a Tensor of rank 0, a Vector is a Tensor of rank 1, an N x M Matrix is a Tensor of rank 2, etc.&lt;/p&gt;
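
&lt;p&gt;One way to build an intuition for rank is to count how deeply the numbers are nested. The sketch below does that for plain Python lists. The function &lt;code&gt;rank&lt;/code&gt; is a made-up helper which assumes regular, non-empty nesting, and it isn’t how TensorFlow represents Tensors internally.&lt;/p&gt;

```python
def rank(tensor):
    # Count the levels of nesting; assumes every level is a non-empty list.
    depth = 0
    while isinstance(tensor, list):
        depth += 1
        tensor = tensor[0]
    return depth

scalar = 7.0
vector = [1.0, 2.0, 3.0]
matrix = [[1.0, 2.0], [3.0, 4.0]]
image_batch = [[[0.0, 0.5], [1.0, 0.5]]]  # e.g. batch x height x width

print(rank(scalar), rank(vector), rank(matrix), rank(image_batch))  # 0 1 2 3
```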

&lt;p&gt;Another interesting fact is that Deep Neural Networks are usually trained on &lt;a href="https://en.wikipedia.org/wiki/Graphics_processing_unit"&gt;GPUs&lt;/a&gt; (Graphics Processing Units) or TPUs (Tensor Processing Units). The simple reason is that GPUs and TPUs are way better at processing Linear Algebra computations than CPUs, since GPUs at least were invented as dedicated hardware units to do exactly that when rendering computer graphics.&lt;/p&gt;

&lt;p&gt;Aside: &lt;a href="http://robotics.stanford.edu/~ang/papers/icml09-LargeScaleUnsupervisedDeepLearningGPU.pdf"&gt;Here’s&lt;/a&gt; the original paper by Andrew Ng et al. where GPUs were first explored to carry out Deep Learning computations.&lt;/p&gt;

&lt;h4&gt;
  
  
  Resources
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab"&gt;3Blue1Brown - Essence of Linear Algebra&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://the-learning-machine.com/article/machine-learning/linear-algebra"&gt;The Learning Machine - Linear Algebra&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.cs.cornell.edu/courses/cs485/2006sp/LinAlg_Complete.pdf"&gt;Paul’s Online Notes - Linear Algebra&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.goodreads.com/book/show/34760208-no-bullshit-guide-to-linear-algebra"&gt;No Bullshit Guide to Linear Algebra&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.mhprofessional.com/9781260011456-usa-schaums-outline-of-linear-algebra-sixth-edition"&gt;Schaum’s Outlines - Linear Algebra&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Statistics &amp;amp; Probabilities
&lt;/h3&gt;

&lt;p&gt;The last topic which should be covered in this Curriculum is Statistics &amp;amp; Probabilities.&lt;/p&gt;

&lt;p&gt;While both topics are sometimes taught separately it makes sense to learn them in conjunction since statistics and probabilities share a deep underlying relationship.&lt;/p&gt;

&lt;p&gt;A typical Statistics &amp;amp; Probabilities class covers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Charting and plotting&lt;/li&gt;
&lt;li&gt;Probability&lt;/li&gt;
&lt;li&gt;Conditional Probability&lt;/li&gt;
&lt;li&gt;Bayes Rule&lt;/li&gt;
&lt;li&gt;Probability Distributions&lt;/li&gt;
&lt;li&gt;Average&lt;/li&gt;
&lt;li&gt;Variance&lt;/li&gt;
&lt;li&gt;Binomial Distribution&lt;/li&gt;
&lt;li&gt;Central Limit Theorem&lt;/li&gt;
&lt;li&gt;Normal Distribution&lt;/li&gt;
&lt;li&gt;Confidence Intervals&lt;/li&gt;
&lt;li&gt;Hypothesis Test&lt;/li&gt;
&lt;li&gt;Regression&lt;/li&gt;
&lt;li&gt;Correlation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In Data Science one usually has to deal with statistical analysis to see if the computations actually made sense. Furthermore it’s helpful to compute and visualize correlations between data and certain events. Bayes Rule is another important tool which helps us update our belief about our view of “the world” when more evidence is available. The realms of Machine Learning and Deep Learning usually deal with lots of uncertainty. Having a good toolbox to deal with this makes our life a whole lot easier.&lt;/p&gt;
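
&lt;p&gt;To see how such a belief update works, here’s a short sketch of Bayes Rule applied to a spam filter. All numbers are made up for illustration, the function &lt;code&gt;posterior&lt;/code&gt; is a hypothetical helper, and the second update assumes that the two signals are independent of each other.&lt;/p&gt;

```python
def posterior(prior, likelihood, false_positive_rate):
    # Bayes' Rule: P(H | E) = P(E | H) * P(H) / P(E)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Made-up numbers: 1% of emails are spam, the filter flags 90% of spam
# and wrongly flags 5% of legitimate emails.
belief = posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05)
print(round(belief, 3))   # 0.154

# More evidence arrives: a second, independent filter with the same rates also flags the email.
updated = posterior(prior=belief, likelihood=0.9, false_positive_rate=0.05)
print(round(updated, 3))  # 0.766
```

&lt;p&gt;A single flag only raises the probability of spam to roughly 15% because spam is rare to begin with. Feeding the first result back in as the new prior is what “updating our belief” means in practice.&lt;/p&gt;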

&lt;p&gt;A pretty popular example of applied statistics is the &lt;a href="https://en.wikipedia.org/wiki/Monte_Carlo_tree_search"&gt;Monte Carlo Tree Search&lt;/a&gt; algorithm. This heuristic algorithm was used in &lt;a href="https://deepmind.com/"&gt;DeepMind’s&lt;/a&gt; AI breakthrough “AlphaGo” to determine which moves it should consider while playing the &lt;a href="https://en.wikipedia.org/wiki/Go_(game)"&gt;Go board game&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Feel free to read through the &lt;a href="https://storage.googleapis.com/deepmind-media/alphago/AlphaGoNaturePaper.pdf"&gt;official paper&lt;/a&gt; for more information about the underlying technologies. Trust me, it’s amazing to read and understand how Math played such a huge role to build such a powerful contestant.&lt;/p&gt;

&lt;h4&gt;
  
  
  Resources
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="http://www.math.louisville.edu/~pksaho01/teaching/Math662TB-09S.pdf"&gt;University of Louisville - Probability and Mathematical Statistics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.mv.helsinki.fi/home/jmisotal/BoS.pdf"&gt;Jarkko Isotalo - Basics of Statistics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.math.arizona.edu/~jwatkins/statbook.pdf"&gt;Joseph Watkins - An Introduction to the Science of Statistics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/playlist?list=PLvxOuBpazmsOGOursPoofaHyz_1NpxbhA"&gt;JBStatistics - Basics of Probability&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://seeing-theory.brown.edu/"&gt;Brown University - Seeing Theory&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.mhprofessional.com/9780071795579-usa-schaums-outline-of-probability-and-statistics-4th-edition-group"&gt;Schaum’s Outlines - Probability and Statistics&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Mistakes
&lt;/h2&gt;

&lt;p&gt;As I already stated above it’s been quite a journey and I made lots of mistakes along the way.&lt;/p&gt;

&lt;p&gt;In this section I’d like to share some of those mistakes so that you don’t have to go through this yourself.&lt;/p&gt;

&lt;p&gt;The first mistake I made was jumping straight into Math without having a clear plan / Curriculum and more importantly goal. I dived right into certain topics I picked up while reading research papers and quickly figured that some of them were too advanced since I understood only little (if anything at all). My approach was to back off and start somewhere else. “Trial and error” so to say. This was obviously very costly in terms of time and resources.&lt;/p&gt;

&lt;p&gt;The solution here was to actually have a clear goal (learning Math to understand the underlying principles of Artificial Intelligence) and take the time to research a lot to come up with a sound Curriculum and start from there. Having that sorted out I only had to follow the path and knew that I was good.&lt;/p&gt;

&lt;p&gt;During this aforementioned trial and error phase I made the mistake of taking way too many &lt;a href="https://en.wikipedia.org/wiki/Massive_open_online_course"&gt;MOOCs&lt;/a&gt;. Don’t get me wrong, MOOCs are great! It has never been possible before to take an MIT course from your couch. In my case exactly that was the problem. Most of the time I was passively watching the course content nodding along. After a couple of “completed courses” and the feeling of knowing the ins and outs I jumped into more sophisticated problems only to find that I had developed a pretty shallow knowledge.&lt;/p&gt;

&lt;p&gt;Doing a retrospective on the “completed courses” I saw that my learning style isn’t really tailored to MOOCs. I decided to switch my focus to the good old physical textbooks. I especially focused on textbooks with good didactics, lots of examples and exercises with solutions (the &lt;a href="https://www.mhprofessional.com/schaum-s"&gt;Schaum’s Outlines&lt;/a&gt; series is golden here). Switching from passively consuming to actively participating in the form of working through numerous exercises was really the breakthrough for me. It ensured that I left my comfort zone, went deep into the trenches and really battle-tested my knowledge about the topic at hand.&lt;/p&gt;

&lt;p&gt;The other upside of using textbooks is that it made it possible to learn in a distraction free environment. No computer, no notifications, no distractions. Just me, a black coffee and my Math textbook!&lt;/p&gt;

&lt;p&gt;Another, final tip I’d like to share is that you should really keep track of your feelings and engagement while studying. Do you feel fired up? Are you excited? Or are you just consuming and your thoughts are constantly wandering off because you don’t really care that much? If that’s the case then it’s usually time to move on. Don’t try to push through. There’s nothing worse than completing a course just for the sake of completing it. If it doesn’t feel right or isn’t working for you it’s important to let go and move on. There’s enough material and maybe the next one suits your needs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this blog post I’ve shared my journey from being someone who acknowledged Math as something one should’ve heard about to someone who learned to love Math and its applications to solve complex problems. In order to really understand and learn more about Artificial Intelligence and Deep Learning I created a Curriculum which does not only cover the underlying Math concepts of such disciplines but will serve the student well when learning more about other “Hard Sciences” such as Computer Science in general, Physics, Meteorology, Biology, etc.&lt;/p&gt;

&lt;p&gt;I’m still early in my Math journey and there’s an infinite amount of exciting stuff to learn. With the given Curriculum I feel confident that I’ve gained a solid foundation to pick up more complex topics I’ll encounter while learning about Artificial Intelligence, Deep Learning and Advanced Mathematics in general.&lt;/p&gt;

</description>
      <category>math</category>
    </item>
  </channel>
</rss>
