<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Timothy Renner</title>
    <description>The latest articles on Forem by Timothy Renner (@timothyrenner).</description>
    <link>https://forem.com/timothyrenner</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F338540%2Faeb60119-a6a6-4614-9395-b187d2ee3dc0.jpeg</url>
      <title>Forem: Timothy Renner</title>
      <link>https://forem.com/timothyrenner</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/timothyrenner"/>
    <language>en</language>
    <item>
      <title>Functional Python: Fabulous Filter</title>
      <dc:creator>Timothy Renner</dc:creator>
      <pubDate>Sun, 10 May 2020 19:46:30 +0000</pubDate>
      <link>https://forem.com/timothyrenner/functional-python-fabulous-filter-44h3</link>
      <guid>https://forem.com/timothyrenner/functional-python-fabulous-filter-44h3</guid>
      <description>&lt;p&gt;This post is the second in a series on the functional side of Python. I've been told in code reviews that my Python looks like Clojure, which I naturally took as a complement even if it wasn't. So I decided to write a series of posts here detailing how I write functional Python (where appropriate), bit by bit.&lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/timothyrenner/functional-python-the-mighty-map-4mma"&gt;last post&lt;/a&gt; I wrote on this topic, I discussed &lt;code&gt;map&lt;/code&gt;. &lt;code&gt;map&lt;/code&gt; is one of two built-in higher order functions in Python. There used to be a third, &lt;code&gt;reduce&lt;/code&gt;, but it has since been moved into the standard library's &lt;code&gt;functools&lt;/code&gt; module, and I think that's super weird. I'll explain why when I do a post on &lt;code&gt;reduce&lt;/code&gt;. For today, I'll focus on &lt;code&gt;filter&lt;/code&gt;. This'll be short, because &lt;code&gt;filter&lt;/code&gt;'s pretty simple.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Basics
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;filter&lt;/code&gt; is a higher order function that takes two arguments: a function that returns a boolean and a sequence to apply it to. It produces a lazy iterator that keeps only the elements for which the function returns a truthy value, dropping those for which it returns &lt;code&gt;False&lt;/code&gt; or something false-ish (like &lt;code&gt;[]&lt;/code&gt; and &lt;code&gt;None&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;The classic example is the even/odd filter.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;even&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;even&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Pretty simple.&lt;/p&gt;

&lt;h2&gt;
  
  
  When &lt;code&gt;filter&lt;/code&gt; is not the Right Choice
&lt;/h2&gt;

&lt;p&gt;Similar to &lt;code&gt;map&lt;/code&gt;, we can mimic &lt;code&gt;filter&lt;/code&gt;'s functionality with a comprehension.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;z&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;even&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;

&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;In general, the same rules that applied to &lt;code&gt;map&lt;/code&gt; apply here too. If you're operating on a finite, already-in-memory sequence then a comprehension is more readable. If you've got an infinite sequence and need a generator, &lt;code&gt;filter&lt;/code&gt;'s a good choice. Although you &lt;em&gt;can&lt;/em&gt; use a comprehension to create a generator for infinite sequences, it's not as common. If you need to compose the filter with others, the &lt;code&gt;filter&lt;/code&gt; function is definitely the way to go. I'll cover composition in &lt;em&gt;great&lt;/em&gt; detail in another post.&lt;/p&gt;
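&lt;p&gt;To make the infinite-sequence case concrete, here's a minimal sketch using &lt;code&gt;itertools.count&lt;/code&gt; as the infinite sequence (the &lt;code&gt;even&lt;/code&gt; predicate is the one from above):&lt;/p&gt;

```python
from itertools import count, islice

def even(n):
    return n % 2 == 0

# count() is infinite - a list comprehension over it would never
# terminate, but filter stays lazy and does no work up front.
evens = filter(even, count())

# islice pulls just the first five values off the lazy iterator.
print(list(islice(evens, 5)))  # [0, 2, 4, 6, 8]
```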

&lt;h2&gt;
  
  
  Stream Processing
&lt;/h2&gt;

&lt;p&gt;One of the goals of me writing these posts is to show examples of these patterns with real-world projects. This example is adapted from a &lt;a href="https://github.com/timothyrenner/profanity-power-index/blob/master/profanity_power_index/collect_tweets.py"&gt;script&lt;/a&gt; in my Profanity Power Index &lt;a href="https://github.com/timothyrenner/profanity-power-index"&gt;project&lt;/a&gt;, which streams data from Twitter's Streaming API for tweets containing profanity associated with some number of targets. It sends the filtered tweets to Elasticsearch for storage and visualization.&lt;/p&gt;

&lt;p&gt;In my last post I showed how we used &lt;code&gt;map&lt;/code&gt; to convert tweets into documents that Elasticsearch can load. Now I'll show how I used &lt;code&gt;filter&lt;/code&gt; to remove the tweets that only contained clean language.&lt;/p&gt;

&lt;p&gt;This is the function we're going to filter.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;contains_profanity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# _extract_text is just a helper function that pulls the text
&lt;/span&gt;    &lt;span class="c1"&gt;# out of the tweet, including any quoted retweets.
&lt;/span&gt;    &lt;span class="n"&gt;tweet_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_extract_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# PROFANITY is a list of profane words.
&lt;/span&gt;    &lt;span class="c1"&gt;# It was nice to put the swear words in the code itself and not
&lt;/span&gt;    &lt;span class="c1"&gt;# just the commit messages.
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;profanity&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;PROFANITY&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;profanity&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tweet_text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

    &lt;span class="c1"&gt;# If we made it this far, it's a clean tweet and we don't want
&lt;/span&gt;    &lt;span class="c1"&gt;# those.
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
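&lt;p&gt;As an aside, that loop can be collapsed into a single &lt;code&gt;any&lt;/code&gt; expression, which short-circuits on the first match just like the early &lt;code&gt;return True&lt;/code&gt; does. Here's a sketch with stand-ins for &lt;code&gt;PROFANITY&lt;/code&gt; and &lt;code&gt;_extract_text&lt;/code&gt;, which live in the real script:&lt;/p&gt;

```python
# Stand-ins for the real PROFANITY list and _extract_text helper.
PROFANITY = ["darn", "heck"]

def _extract_text(tweet):
    return tweet["text"]

def contains_profanity(tweet):
    tweet_text = _extract_text(tweet).lower()
    # any() short-circuits on the first match, just like the loop.
    return any(word in tweet_text for word in PROFANITY)

print(contains_profanity({"text": "Well, heck."}))  # True
print(contains_profanity({"text": "How nice."}))    # False
```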



&lt;p&gt;The script (abbreviated) looks something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# track is the list of targets.
# api is an authenticated Twitter API client.
&lt;/span&gt;&lt;span class="n"&gt;tweet_stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetStreamFilter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;track&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;track&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Filter to the tweets we want.
&lt;/span&gt;&lt;span class="n"&gt;profane_tweet_stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;contains_profanity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tweet_stream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Apply the map function to create Elasticsearch documents.
&lt;/span&gt;&lt;span class="n"&gt;bulk_action_stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tweet_to_bulk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tweet_stream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Load Elasticsearch.
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;streaming_bulk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bulk_action_stream&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;See how we were able to daisy-chain &lt;code&gt;filter&lt;/code&gt; and &lt;code&gt;map&lt;/code&gt; together without materializing more than one record into memory at a time? We can build incredibly robust, memory-efficient pipelines by chaining generators together.&lt;/p&gt;
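&lt;p&gt;A tiny self-contained sketch makes the one-record-at-a-time behavior visible: a logging generator stands in for the tweet stream, and each record flows through the whole &lt;code&gt;filter&lt;/code&gt;/&lt;code&gt;map&lt;/code&gt; chain before the next one is read.&lt;/p&gt;

```python
def stream():
    # Stand-in for the tweet stream: logs each read.
    for n in [1, 2, 3, 4]:
        print(f"read {n}")
        yield n

# Nothing runs yet - both stages are lazy.
evens = filter(lambda n: n % 2 == 0, stream())
scaled = map(lambda n: n * 10, evens)

for record in scaled:
    print(f"processed {record}")
# read 1 / read 2 / processed 20 / read 3 / read 4 / processed 40
```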

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;I showed that &lt;code&gt;filter&lt;/code&gt; is a lot like &lt;code&gt;map&lt;/code&gt; - it can be replicated with comprehensions or even with loops. But the more complex the pipeline gets, the more complicated that loop gets:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tweet&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;contains_profanity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;tweet_doc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tweet_to_bulk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# ... send it to Elasticsearch.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;If I wanted to add to this pipeline using the loop, I'd have to choose between adding another level of indentation or using &lt;code&gt;if&lt;/code&gt;/&lt;code&gt;continue&lt;/code&gt; to short-circuit the processing. Using &lt;code&gt;map&lt;/code&gt; and &lt;code&gt;filter&lt;/code&gt;, I just add another expression.&lt;/p&gt;
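&lt;p&gt;For instance, adding another stage to the generator pipeline - say, dropping retweets - is just one more expression, with no extra nesting. The helper names here are illustrative, not from the original script:&lt;/p&gt;

```python
def is_retweet(tweet):
    # Hypothetical helper: real retweet detection is more involved.
    return tweet.get("text", "").startswith("RT ")

def to_doc(tweet):
    # Hypothetical stand-in for tweet_to_bulk.
    return {"text": tweet["text"].upper()}

tweets = iter([{"text": "RT bigfoot"}, {"text": "bigfoot is real"}])

# Each new stage is one more lazy expression wrapping the last.
original_tweets = filter(lambda t: not is_retweet(t), tweets)
docs = map(to_doc, original_tweets)

print([d["text"] for d in docs])  # ['BIGFOOT IS REAL']
```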

</description>
      <category>python</category>
      <category>functional</category>
    </item>
    <item>
      <title>Functional Python: The Mighty Map</title>
      <dc:creator>Timothy Renner</dc:creator>
      <pubDate>Sun, 03 May 2020 21:10:20 +0000</pubDate>
      <link>https://forem.com/timothyrenner/functional-python-the-mighty-map-4mma</link>
      <guid>https://forem.com/timothyrenner/functional-python-the-mighty-map-4mma</guid>
      <description>&lt;p&gt;Python's an incredibly versatile language. In this post (probably the first of many, we'll see) I'll walk through one of the major workhorses of functional programming in Python: &lt;code&gt;map&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Basics
&lt;/h2&gt;

&lt;p&gt;Feel free to skip this if you already know what map does and just want to get to the part where I describe common usage patterns.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;map&lt;/code&gt; is one of a couple of built-in higher-order functions, meaning it takes a function as one of its arguments. The second argument &lt;code&gt;map&lt;/code&gt; takes is a sequence. All it does is apply the function to each element of the sequence.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add_one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;add_one&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="c1"&gt;# &amp;gt;&amp;gt;&amp;gt; [2, 3, 4]
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Hopefully what &lt;code&gt;map&lt;/code&gt;'s doing is pretty obvious. What may not be obvious is that &lt;code&gt;map&lt;/code&gt; doesn't return a list; it returns a lazy iterator (a generator, loosely speaking). I'm converting it to a list manually to print it.&lt;/p&gt;
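&lt;p&gt;You can see the laziness directly by pulling values off the &lt;code&gt;map&lt;/code&gt; object one at a time with &lt;code&gt;next&lt;/code&gt;:&lt;/p&gt;

```python
def add_one(n):
    print(f"add_one called with {n}")
    return n + 1

y = map(add_one, [1, 2, 3])
# Nothing printed yet - add_one hasn't been called at all.

print(next(y))  # add_one called with 1, then prints 2
print(next(y))  # add_one called with 2, then prints 3
```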

&lt;h2&gt;
  
  
  When &lt;code&gt;map&lt;/code&gt; is not the Right Choice
&lt;/h2&gt;

&lt;p&gt;It's actually a lot simpler to write the above as a list comprehension. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;z&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# &amp;gt;&amp;gt;&amp;gt; [2, 3, 4]
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;So ... what's the point of &lt;code&gt;map&lt;/code&gt; if comprehensions are simpler &lt;em&gt;and&lt;/em&gt; take less code? The fact that &lt;code&gt;map&lt;/code&gt; returns a generator is a clue. Generators don't materialize the sequences into memory, meaning &lt;code&gt;y&lt;/code&gt; is basically "free" in terms of memory. For the above example &lt;code&gt;map&lt;/code&gt; is a bad choice because it's operating on a list. But what if the sequence is itself a generator?&lt;/p&gt;
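&lt;p&gt;The "free" claim is easy to check with &lt;code&gt;sys.getsizeof&lt;/code&gt;: the &lt;code&gt;map&lt;/code&gt; object stays tiny no matter how long the underlying sequence is, while the materialized list grows with it.&lt;/p&gt;

```python
import sys

x = range(1_000_000)

materialized = [n + 1 for n in x]  # a million ints held in memory
lazy = map(lambda n: n + 1, x)     # just a tiny iterator object

print(sys.getsizeof(materialized))  # megabytes (exact size varies)
print(sys.getsizeof(lazy))          # a few dozen bytes
```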

&lt;h2&gt;
  
  
  File Processing
&lt;/h2&gt;

&lt;p&gt;Here's an example where &lt;code&gt;map&lt;/code&gt; is a good choice. It's from a &lt;a href="https://github.com/timothyrenner/bfro_sightings_data/blob/master/scripts/load_elasticsearch.py"&gt;script&lt;/a&gt; in &lt;a href="https://github.com/timothyrenner/bfro_sightings_data"&gt;this repository&lt;/a&gt; that scrapes and processes Bigfoot Sightings from the &lt;a href="http://bfro.net/"&gt;BFRO&lt;/a&gt; sighting database. What the script does is take a CSV file with the processed Bigfoot sightings and load it into Elasticsearch. I like to use Elasticsearch and Kibana for checking data quality and light exploratory analysis.&lt;/p&gt;

&lt;p&gt;Elasticsearch takes JSON, and requires a pretty specific schema to load it (at least the streaming bulk helper I'm using does). I'll need a function that takes a dictionary (representing the CSV row) and embeds it within a dictionary that Elasticsearch's bulk loading mechanism can understand. Here's that function; it's not too fancy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;bfro_bulk_action&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="s"&gt;"_op_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"index"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"_index"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;bfro_index_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;bfro_report_type_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"number"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="s"&gt;"_source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="s"&gt;"location"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="s"&gt;"lat"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"latitude"&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
                &lt;span class="s"&gt;"lon"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"longitude"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"latitude"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"longitude"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;  &lt;span class="c1"&gt;# This is the rest of the doc
&lt;/span&gt;        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Hopefully you can see where &lt;code&gt;map&lt;/code&gt; can be useful here:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;reports&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DictReader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;report_file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Create the report documents.
&lt;/span&gt;&lt;span class="n"&gt;report_actions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bfro_bulk_action&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reports&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Note there has been zero processing thus far, and no data is in memory.
&lt;/span&gt;
&lt;span class="c1"&gt;# client here is the Elasticsearch client.
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;streaming_bulk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;report_actions&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# If there's a failure print what happened.
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;streaming_bulk&lt;/code&gt; function takes a client (for the HTTP connection to the Elasticsearch instance) and an iterable, which could be a list or a generator or an infinite stream (more on that in a minute). In our case, it's the generator returned by &lt;code&gt;map&lt;/code&gt;, which is itself operating on the generator created by the &lt;code&gt;DictReader&lt;/code&gt; from the Python standard library's &lt;code&gt;csv&lt;/code&gt; module.&lt;/p&gt;

&lt;p&gt;The most important thing to note here is that only one record's being held in memory at a time. That wouldn't be true if we'd used pandas &lt;code&gt;read_csv&lt;/code&gt;, or if we'd loaded the file into a list. In those cases we'd be constrained to operate only on files small enough to be held in main memory. In this implementation, the only significant resource constraint we have is our patience. The &lt;code&gt;map&lt;/code&gt; + &lt;code&gt;DictReader&lt;/code&gt; combo only ever loads one record into memory at a time. This enables &lt;code&gt;map&lt;/code&gt; to be very effective at operating on &lt;em&gt;infinite&lt;/em&gt; sequences, more commonly known as streams.&lt;/p&gt;
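&lt;p&gt;Here's a self-contained sketch of that same pattern, with &lt;code&gt;io.StringIO&lt;/code&gt; standing in for the report file and a simplified bulk action (the real index and type names live in the script):&lt;/p&gt;

```python
import csv
import io

# Stand-in for the real CSV of sightings.
report_file = io.StringIO("number,latitude,longitude\n1,30.1,-97.7\n2,,\n")

def bfro_bulk_action(doc):
    # Simplified version of the real function.
    return {
        "_id": doc["number"],
        "_source": {
            "location": {"lat": float(doc["latitude"]), "lon": float(doc["longitude"])}
            if doc["latitude"] and doc["longitude"]
            else None,
            **doc,
        },
    }

reports = csv.DictReader(report_file)
report_actions = map(bfro_bulk_action, reports)

# Nothing has been read yet; rows materialize one at a time here.
for action in report_actions:
    print(action["_id"], action["_source"]["location"])
# 1 {'lat': 30.1, 'lon': -97.7}
# 2 None
```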

&lt;h2&gt;
  
  
  Stream Processing
&lt;/h2&gt;

&lt;p&gt;The final example I'll walk through in this post is inspired by &lt;a href="https://github.com/timothyrenner/profanity-power-index/blob/master/profanity_power_index/collect_tweets.py"&gt;this script&lt;/a&gt;, which is part of a &lt;a href="https://github.com/timothyrenner/profanity-power-index"&gt;project&lt;/a&gt; I wrote to collect profane tweets about people on Twitter. More info &lt;a href="https://timothyrenner.github.io/projects/profanitypowerindex/"&gt;here&lt;/a&gt;, though consider yourself warned: obviously the language is strong. Kinda the point.&lt;/p&gt;

&lt;p&gt;What the script I've linked above does is subscribe to the Twitter Streaming API with a list of tracking targets, keep only the tweets containing profanity, then load them into Elasticsearch (can you tell I'm a fan?). Here's how that works. Let's assume for simplicity that the stream already has the profanity filtered - I'll write another post giving more detail about how I did that later. This leaves one thing to do: wrap the tweet (a dictionary) in a larger dictionary to use with the &lt;code&gt;streaming_bulk&lt;/code&gt; function. I've omitted a few things for simplicity, but you can see the whole script in the link I provided above.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_tweet_to_bulk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="s"&gt;"_index"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"profanity-power-index"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"tweet"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"id_str"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="s"&gt;"_source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="s"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"id_str"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="c1"&gt;# _extract_text is just a small helper function so we get
&lt;/span&gt;            &lt;span class="c1"&gt;# the retweeted statuses too.
&lt;/span&gt;            &lt;span class="s"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;_extract_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="s"&gt;"created_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"created_at"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Care to guess what the script looks like?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# track is the list of targets.
# api is an authenticated Twitter API client.
&lt;/span&gt;&lt;span class="n"&gt;tweet_stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetStreamFilter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;track&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;track&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# From the perspective of our code, tweet_stream is an infinite
# sequence. It doesn't matter to us how it gets its contents.
&lt;/span&gt;&lt;span class="n"&gt;bulk_action_stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tweet_to_bulk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tweet_stream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# We are still not processing anything at this point.
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;streaming_bulk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bulk_action_stream&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;And that will continue running until you hit Ctrl-C. At no point does it accumulate memory (at least not because of the streams). Twitter stuff aside, it's almost exactly the same as the file example, and that's the point. An iterable is just something you can loop over; it doesn't matter how long it is or where the data comes from. If we were trying to collect the tweets into a list, we'd need to add a way to deal with the memory. But because we're using &lt;code&gt;map&lt;/code&gt; and generators, it doesn't matter.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;Operating on infinite streams of data doesn't necessarily require &lt;code&gt;map&lt;/code&gt;. In fact we could have implemented both of the above examples with regular for loops:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tweet&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;tweet_doc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_tweet_to_bulk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# ... other stuff here ...
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;There's even a way to write a generator as a comprehension:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Note the parens rather than brackets.
&lt;/span&gt;&lt;span class="n"&gt;tweet_stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_tweet_to_bulk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tweet&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tweet_stream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Personally, when it comes to streams I tend to prefer &lt;code&gt;map&lt;/code&gt; over comprehensions, even generator comprehensions. Not only is it more succinct, but &lt;code&gt;map&lt;/code&gt; has one very significant advantage over loop constructs and comprehensions: it's a function, and that means it can be composed with other functions. I'll cover that in another post later. For now, hopefully the distinction between &lt;code&gt;map&lt;/code&gt; and comprehensions, including when to use one or the other, is a little clearer.&lt;/p&gt;
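&lt;p&gt;To make the comparison concrete, here's a minimal standard-library sketch of my own (not the Twitter example above) showing both forms side by side:&lt;/p&gt;

```python
# Both pipelines are lazy: neither touches any data until something
# consumes the iterator.
def to_upper(s):
    return s.upper()

words = ["map", "filter", "reduce"]

# map returns a lazy iterator. Because map itself is a function, it can be
# handed to other functions and composed with them.
mapped = map(to_upper, words)

# The generator comprehension is equally lazy, but it's syntax rather than
# a callable you can pass to another higher-order function.
comprehended = (to_upper(w) for w in words)

result_map = list(mapped)
result_comp = list(comprehended)
print(result_map, result_comp)
```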

</description>
      <category>python</category>
    </item>
    <item>
      <title>Command Line Machine Learning</title>
      <dc:creator>Timothy Renner</dc:creator>
      <pubDate>Thu, 02 Apr 2020 13:45:59 +0000</pubDate>
      <link>https://forem.com/timothyrenner/command-line-machine-learning-569n</link>
      <guid>https://forem.com/timothyrenner/command-line-machine-learning-569n</guid>
      <description>&lt;p&gt;No, this isn't an awesome sed hack that trains logistic regression models with regexes, it's how to build machine learning models with scripts rather than notebooks.&lt;br&gt;
Well actually, &lt;em&gt;how&lt;/em&gt; to do that is pretty straightforward. How to do it &lt;em&gt;effectively&lt;/em&gt; may not be. I'm going to walk through my process and reasoning in this post.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why scripts
&lt;/h2&gt;

&lt;p&gt;Notebooks are nice! What's wrong with training in those? Someday I could (and probably will) write a huge post about why notebooks are bad for writing software. For now I'm going to try writing something that won't get me flamed on Twitter, so here are two (not orthogonal) reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Ever try to reproduce a model from someone else's notebook? Unless they've written it well, it's pretty hard.&lt;/li&gt;
&lt;li&gt;Ever try to do a code review on a notebook? It sucks.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Writing your model training as a script enables you to train your model in &lt;em&gt;one contained process&lt;/em&gt;. If set up correctly, another team member can easily train your model without having to ask you fifty questions about it, something you'll appreciate when that model needs to be trained while you're on vacation. Moreover, code reviews on scripts are far simpler than on notebooks, and scripts can be unit tested and run in CI/CD pipelines for production-grade ML.&lt;/p&gt;
&lt;h2&gt;
  
  
  My Pattern for Training Scripts
&lt;/h2&gt;

&lt;p&gt;The main idea is this: pass everything the model needs as command line arguments, use command line options for hyperparameters, and at the end save both the prediction results and the serialized model to files. It's actually pretty simple, and once you get used to iterating at the command line you'll begin to appreciate having everything in a self-contained script.&lt;/p&gt;

&lt;p&gt;You'll need only two special ingredients: a main function and a library to parse the command line arguments. I typically use &lt;a href="https://click.palletsprojects.com/en/7.x/"&gt;Click&lt;/a&gt; for managing my command line arguments as it's pretty straightforward to work with. Python's standard library also ships a module, argparse, that can do the same job, but I personally find it a little less intuitive.&lt;/p&gt;
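&lt;p&gt;For contrast, here's a rough argparse sketch of the same idea. The names and defaults here are placeholders of mine, not from the actual training script:&lt;/p&gt;

```python
# A hedged argparse sketch for comparison with the Click version.
# Argument and option names are illustrative placeholders.
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="Train a model.")
    # Positional argument: something the model needs.
    parser.add_argument("training_data", type=str)
    # Options serve as hyperparameters with sensible defaults.
    parser.add_argument("--model-file", type=str, default="model.pkl")
    parser.add_argument("--n-estimators", type=int, default=500)
    return parser

# Parse a sample invocation instead of sys.argv so this runs standalone.
args = build_parser().parse_args(["train.csv", "--n-estimators", "100"])
print(args.training_data, args.n_estimators, args.model_file)
```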
&lt;h3&gt;
  
  
  The Skeleton
&lt;/h3&gt;

&lt;p&gt;So here's the skeleton:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;click&lt;/span&gt;

&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;click&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;pass&lt;/span&gt;  &lt;span class="c1"&gt;# TODO: Implement.
&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"__main__"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Now obviously there's nothing in there so it won't do anything, but let me explain what's going on. Basically &lt;code&gt;@click.command()&lt;/code&gt; transforms your main function into a Click command. That lets Click set your function up with things like a help page for you. The key here is you have to decorate a &lt;em&gt;function&lt;/em&gt;. It can't just be a pile of code hanging around; it has to be a pile of code wrapped in &lt;code&gt;def main()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If you don't write a lot of scripts the last part might be unfamiliar. &lt;code&gt;if __name__ == "__main__": ...&lt;/code&gt; effectively says "if this script is invoked as the Python main process, run the main function; otherwise treat it as a library." So if I do &lt;code&gt;from model import main&lt;/code&gt; inside another script or the interpreter it won't run, but if I hit &lt;code&gt;python model.py&lt;/code&gt; or &lt;code&gt;python -m model&lt;/code&gt; at the command line it will. Without that guard, those last two commands won't do anything. Not that I know this personally from forgetting the &lt;code&gt;if __name__ == "__main__"&lt;/code&gt; line a lot or anything.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Rest
&lt;/h3&gt;

&lt;p&gt;Alright so now we're ready for some code that actually does stuff.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;click&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;xgboost&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;XGBClassifier&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.external&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;joblib&lt;/span&gt;

&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;click&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;click&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"training_data"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;click&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;option&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"--model-file"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"model.pkl"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;click&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;option&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"--prediction-file"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"predictions.csv"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;click&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;option&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"--n-estimators"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;click&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;option&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"--max-depth"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;click&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;option&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"--learning-rate"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.15&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;training_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;prediction_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;n_estimators&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_depth&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;learning_rate&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;training_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;training_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;training_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"target"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;training_df&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="s"&gt;"target"&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;XGBClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;max_depth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;max_depth&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;n_estimators&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;n_estimators&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;learning_rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;learning_rate&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;training_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loc&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="s"&gt;"predictions"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;predictions&lt;/span&gt;
    &lt;span class="n"&gt;training_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prediction_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;joblib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"__main__"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
   &lt;span class="c1"&gt;# A little disconcerting, but click injects the arguments for you.
&lt;/span&gt;    &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Obviously there'd be a lot more in there than just train and dump. Personally I put &lt;a href="https://mlflow.org/"&gt;mlflow&lt;/a&gt; tracking in there and lots of logging. I also save plots to a directory for review when it's done (mlflow lets you log those too, which is pretty neat).&lt;/p&gt;

&lt;p&gt;The point is now you can run the whole pipeline with just this at the terminal.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python train_model.py training_data.csv --n-estimators 100

# or ...
python train_model.py training_data.csv --max-depth 10 --learning-rate 0.2

# or ...
python train_model.py --help  # Look, documentation! ish.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;You have full control over how the model is built right from the terminal and it's just one button. There's very little setup for other people to pick it up and run, and if you've added help text to your options the script will literally tell people how to run it, all without them having to even open the code itself.&lt;/p&gt;
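&lt;p&gt;Here's a hedged sketch of that (mine, not the actual training script; the option name is illustrative): add a &lt;code&gt;help&lt;/code&gt; string to an option and Click folds it into the generated &lt;code&gt;--help&lt;/code&gt; page, checked here in-process:&lt;/p&gt;

```python
# Assumes click is installed; the option and help text are placeholders.
import click
from click.testing import CliRunner

@click.command()
@click.option(
    "--n-estimators",
    type=int,
    default=500,
    help="Number of boosting rounds for the classifier.",
)
def main(n_estimators):
    click.echo(f"training with {n_estimators} estimators")

# Invoke --help in-process; Click generates the documentation page.
help_text = CliRunner().invoke(main, ["--help"]).output
print(help_text)
```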

&lt;p&gt;The best part is that adjusting your parameters requires zero code change, which isn't possible in a notebook. In production every code change is a risk, and that risk is mitigated by abstracting your parameters into what's effectively configuration, which is what they are. Moreover, with just one button you can now run this command easily as part of a larger pipeline (for continuous integration, inside Docker, as a background process, etc.). That's very challenging with notebooks.&lt;/p&gt;

&lt;h2&gt;
  
  
  tl;dr
&lt;/h2&gt;

&lt;p&gt;It takes some adjustment, but setting up your ML model training as a script rather than a notebook keeps almost all the flexibility you have with notebooks while enabling one-button runs, the ability to run as a headless process, straightforward code reviews, and simple version control diffs.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
    </item>
    <item>
      <title>Python isn't going Anywhere</title>
      <dc:creator>Timothy Renner</dc:creator>
      <pubDate>Thu, 20 Feb 2020 00:29:36 +0000</pubDate>
      <link>https://forem.com/timothyrenner/python-isn-t-going-anywhere-2ada</link>
      <guid>https://forem.com/timothyrenner/python-isn-t-going-anywhere-2ada</guid>
      <description>&lt;p&gt;I've seen a few articles recently predicting the demise of Python for machine learning and data science in favor of the faster, the simpler, the better-for-all-things-machine-learning Julia language. I've heard it mentioned in meetings at work and at a recent conference I attended. &lt;a href="https://towardsdatascience.com/5-ways-julia-is-better-than-python-334cc66d64ae"&gt;This article&lt;/a&gt; is one example. Every time I hear it or see it I have a pretty visceral reaction.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/NITFX5emjpMQ0/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/NITFX5emjpMQ0/giphy.gif" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I don't buy it.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;At the time, in the moment, I didn't have anything like logic backing me up on that, it was just a feeling. I've spent a couple of days thinking it through and I'm convinced that my skepticism of the impending demise of Python is warranted. I really don't buy it. Here's why.&lt;/p&gt;

&lt;p&gt;Okay real quick obviously this is an opinion piece and I'm just one person. But I've been doing ML - and specifically ML in production - for a while now, so naturally I've got some thoughts and arguments behind my feelings. I'm not trying to start a flame war. As you'll see shortly, &lt;em&gt;I don't care&lt;/em&gt; about any programming language at all. I care about deployed models and shipped products. That's why I'm skeptical.&lt;/p&gt;

&lt;h2&gt;
  
  
  Julia is Better (for Models)
&lt;/h2&gt;

&lt;p&gt;I'll start by saying this: &lt;strong&gt;Julia is a better language for data science and machine learning&lt;/strong&gt;. It's really, really &lt;a href="https://julialang.org/benchmarks/"&gt;fast&lt;/a&gt;. It's very expressive, combining the simplicity of Python with the metaprogramming capabilities of R and LISP-y languages. It's really pleasant to work with. At the end of the day, though, it's a technical language, closer to Matlab / R than to Python. That's what makes it more effective than Python for building high-powered machine learning algorithms. That's also why it won't unseat Python.&lt;/p&gt;

&lt;p&gt;Technical languages are specialized. That's kind of the point - it's faster and easier to build your model / algorithm in a language designed for models and algorithms. However, models don't make money. &lt;strong&gt;&lt;em&gt;Deployed&lt;/em&gt;&lt;/strong&gt; &lt;em&gt;models make money.&lt;/em&gt; And that's where the technical languages turn up short.&lt;/p&gt;

&lt;h2&gt;
  
  
  Python makes money tho
&lt;/h2&gt;

&lt;p&gt;Deploying a model is an immense amount of work, and a very significant and very challenging side of that work doesn't involve the model at all. You need a web server, containers, database connections, monitoring, CI/CD, package and version management ... you get the idea. That's all the &lt;em&gt;stuff&lt;/em&gt; that the software engineers (or if you're lucky, machine learning engineers) &lt;del&gt;have&lt;/del&gt; get to deal with and solve. Your company is probably not paying data scientists to do that work. For one thing, a data scientist who can do that work is really hard to find. The practical aspects of using software to make money aren't part of the standard data science curriculum. Also, most data scientists just don't want to do it. That's fair.&lt;/p&gt;

&lt;p&gt;But is your software engineering team going to learn your language that's optimized for machine learning, taking on the very significant risk of deploying something that they're not only unfamiliar with, but that also lacks the tooling for all that &lt;em&gt;stuff&lt;/em&gt; I mentioned above?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/13Zh9drdSWAeSQ/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/13Zh9drdSWAeSQ/giphy.gif" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Python has all that &lt;em&gt;stuff&lt;/em&gt;. And it's been there for &lt;em&gt;years&lt;/em&gt;. The reason is simple: Python isn't a technical language. Immediately that means there are more web services and products you use on a daily basis running on Python than Matlab, R and Julia combined, multiplied by at least 100, probably a whole lot more. There are significantly more Python developers out there than data scientists. And most of them probably know more about shipping software - meaning making money with software - than the average data scientist.&lt;/p&gt;

&lt;p&gt;So which is more economical: developing machine learning libraries in Python so your models can plug right in to all that &lt;em&gt;stuff&lt;/em&gt; without rewriting it, or implementing web servers, security / authentication, CI/CD and testing, deployment, monitoring and alerting, etc. in the Best Technical Language Evar?&lt;/p&gt;

&lt;h2&gt;
  
  
  What would it take?
&lt;/h2&gt;

&lt;p&gt;So what would it take for a Julia or whatever technical language of the future to dethrone Python? I can think of three things, only one of which seems remotely possible.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Julia is &lt;em&gt;so much better&lt;/em&gt; than Python that Python isn't worth learning. No data scientists learn Python, so companies that want Data Science Money have to adopt Julia. Julia wins.&lt;/li&gt;
&lt;li&gt;Some new machine learning hotness comes along that is implemented in Julia first. Because Julia is so much better for this sort of thing, companies eat the cost of adopting and deploying it to use the Hot New Thing in machine learning. It takes too long for Python to get it, and Python for DS and ML gets dusted.&lt;/li&gt;
&lt;li&gt;Software gets released that makes Julia easy and fast to interoperate with Python. Models get developed in Julia and deployed with Python (or whatever ... doesn't matter), and nobody knows the difference or cares. All internet language flame wars cease. Pandas are no longer endangered, but the pandas library is.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Julia is &lt;em&gt;way&lt;/em&gt; better
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Point numero uno&lt;/strong&gt;: Julia exists &lt;em&gt;right now&lt;/em&gt; and is competing with Python &lt;em&gt;right now&lt;/em&gt;. Is it really that much easier? Yes it's easier and yes it's simpler. I can import Python packages directly into Julia and can get basically the best of both worlds. But is it &lt;em&gt;so much better&lt;/em&gt; that companies are willing to shell out money?&lt;/p&gt;

&lt;p&gt;For other general purpose languages like Java or C, that answer is yes. It's hard to write prototype software in those languages. Machine learning needs fast iteration cycles to work, and Java / C don't cut it. Development is too slow. Not for Python though. Python meets the basic requirement of being fast enough (mostly because machine learning libraries are actually written in C with Python bindings) to make the work happen and flexible enough for prototyping. It's also got all the production bells and whistles needed to get the software out into the world and making money. Because of that, it's not hard for industry as a whole to tell data scientists to suck it up and index from zero.&lt;/p&gt;

&lt;h3&gt;
  
  
  New hotness, just for Julia
&lt;/h3&gt;

&lt;p&gt;This actually happened, just not with Julia. When deep learning became the sweet hotness that all companies needed, there wasn't much software that could do this stuff efficiently. Early implementer advantage went to this thing called &lt;a href="https://github.com/torch/torch7"&gt;Torch&lt;/a&gt;. When industry started exploring and deploying deep learning, Torch was there. Torch is written in &lt;a href="https://www.lua.org/"&gt;Lua&lt;/a&gt;: a fast, simple but fairly specialized and not widely adopted language. Did the world pivot to Lua so we could get deep learning?&lt;/p&gt;

&lt;p&gt;No. Python ate deep learning. Facebook literally rewrote Torch in Python and made &lt;a href="https://github.com/pytorch/pytorch"&gt;PyTorch&lt;/a&gt;. The reason Python ate deep learning (and will probably eat the Next Hot Thing in ML too) is simple. Shipped software is the dog. ML is the tail. The tail does not wag the dog. No matter how popular data science gets, there will always be more developers than data scientists, because it's the software developers who get the software out there making money. A one-time investment in porting a library or model to Python (which again is hugely flexible because it can bind to superfast C libraries) is much cheaper than building a dev team and all the associated tooling to deploy in a specialized language.&lt;/p&gt;

&lt;h3&gt;
  
  
  Everyone plays nice
&lt;/h3&gt;

&lt;p&gt;The final path is plausible. If we can guarantee straightforward and efficient interoperation between Julia and Python (or really whatever runtime we want to deploy in), then presumably it won't matter which language the model is built in. This is kind of starting to happen already. In the data engineering world, &lt;a href="https://spark.apache.org/"&gt;Apache Spark&lt;/a&gt; is king. Its core is written in Scala, which means it runs on the JVM. It has Python bindings, including user defined functions. For a long time Python UDFs were the slowest thing on the block in the Spark world, because executing arbitrary Python code from a Java runtime meant copying and transferring data via (essentially) shell pipes. Then along came a little feature called the &lt;a href="https://databricks.com/blog/2017/10/30/introducing-vectorized-udfs-for-pyspark.html"&gt;Pandas UDF&lt;/a&gt;, which allows Python to execute in Spark without copying memory across runtimes. How? A piece of magic called &lt;a href="https://arrow.apache.org/"&gt;Apache Arrow&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Apache Arrow is an in-memory representation for columnar data that is &lt;em&gt;standard across runtimes&lt;/em&gt;. That means that I can use Java bindings to read Arrow data frames generated by Python, or vice versa. Or I can use Julia to generate a data frame and share it efficiently with the Python runtime that's doing the web service thing. I actually think &lt;strong&gt;Arrow is the most important open source project in the data science and machine learning space&lt;/strong&gt; precisely because it will remove critical efficiency issues between the tooling ecosystems for data engineering, data analysis, model development, and model deployment. &lt;em&gt;If&lt;/em&gt; Julia's going to be supreme monarch of data science and machine learning, this is probably how it would happen. That said, right now Python has the most libraries. Am I really going to use Julia to import Python packages only to export the results back to Python?&lt;/p&gt;

&lt;h2&gt;
  
  
  In Conclusion
&lt;/h2&gt;

&lt;p&gt;Unseating Python is hard because it has one key advantage over technical languages like Julia: it isn't one. Most software isn't deployed with technical languages, but with general ones. And deployed software makes the money. That means it's more economical to move machine learning and data science to Python than to move everything else to Julia (or Matlab, or R). Until some tool like Arrow comes along that enables these runtimes to work together so that nobody has to know or care what made the model, I don't think Python is going anywhere.&lt;/p&gt;

</description>
      <category>python</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
