<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Booklet.ai</title>
    <description>The latest articles on Forem by Booklet.ai (@bookletai).</description>
    <link>https://forem.com/bookletai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F2100%2F19136ce5-4f0d-4bae-b9d1-3e326d1d3ea0.jpg</url>
      <title>Forem: Booklet.ai</title>
      <link>https://forem.com/bookletai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/bookletai"/>
    <language>en</language>
    <item>
      <title>You may not need Airflow…. yet</title>
      <dc:creator>AdamBarnhard</dc:creator>
      <pubDate>Tue, 31 Mar 2020 17:09:25 +0000</pubDate>
      <link>https://forem.com/bookletai/you-may-not-need-airflow-yet-1gem</link>
      <guid>https://forem.com/bookletai/you-may-not-need-airflow-yet-1gem</guid>
      <description>&lt;p&gt;TL;DR: Airflow is robust and flexible, but complicated. If you are just starting to schedule data tasks, you may want to try more tailored solutions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Moving data into a warehouse: &lt;a href="https://www.stitchdata.com/"&gt;Stitch&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Transforming data within a warehouse: &lt;a href="https://www.getdbt.com/"&gt;DBT&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Scheduling Python scripts: &lt;a href="https://databricks.com/"&gt;Databricks&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Batch scoring ML models: &lt;a href="https://booklet.ai/"&gt;Booklet.ai&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;How could using 4 different services be easier than using just one?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://airflow.apache.org/"&gt;Apache Airflow&lt;/a&gt; is one of the most popular workflow management tools for data teams. It is used by hundreds of companies around the world to schedule jobs of any kind. It is a completely free, open source project, and offers amazing flexibility with its python-built infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Y_HlvN5a--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2AFJsMPN5kPMI7JuqhsaP7rA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Y_HlvN5a--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2AFJsMPN5kPMI7JuqhsaP7rA.png" alt=""&gt;&lt;/a&gt;&lt;a href="https://airflow.apache.org/"&gt;Apache Airflow&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I’ve used (and sometimes set up) Airflow instances of all sizes: from Uber’s custom-built &lt;a href="https://towardsdatascience.com/managing-data-science-workflows-the-uber-way-3d265b4c1264"&gt;Airflow-based Piper&lt;/a&gt; to &lt;a href="https://dev.to/adambarnhard/secure-your-data-tool-with-a-password-4hff-temp-slug-9857818"&gt;small instances for side projects&lt;/a&gt;, and there is one theme in common: projects get complicated, fast! Airflow needs to be deployed in a stable and production-ready way, all tasks are &lt;a href="https://airflow.apache.org/docs/stable/concepts.html"&gt;custom-defined&lt;/a&gt; in Python, and there are many &lt;a href="https://medium.com/datareply/airflow-lesser-known-tips-tricks-and-best-practises-cf4d4a90f8f"&gt;watch-outs&lt;/a&gt; to keep in mind as you build tasks. For a less technical user, Airflow can be an overwhelming system just for scheduling some simple tasks.&lt;/p&gt;

&lt;p&gt;Although it may be tempting to use one tool for all of your scheduling needs, that isn’t always the best choice. You’ll end up building custom solutions every time a new use case comes up. Instead, use the best tool for each job you are trying to accomplish. The time saved on setup and maintenance for each use case is well worth adding a few more tools to your data stack.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--uIiEu5Nj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/651/0%2Abu_wJ-XsdUt6xgnl.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--uIiEu5Nj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/651/0%2Abu_wJ-XsdUt6xgnl.jpg" alt=""&gt;&lt;/a&gt;&lt;a href="https://imgflip.com/i/3upiyq"&gt;Imgflip&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this post, I’ll outline a few of the use cases for Airflow and alternatives for each.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Disclaimer: I am one of the founders of&lt;/em&gt; &lt;a href="https://booklet.ai"&gt;&lt;em&gt;Booklet.ai&lt;/em&gt;&lt;/a&gt;&lt;em&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Extracting Raw Data from a Source to Data Warehouse
&lt;/h3&gt;

&lt;p&gt;Data teams need one thing to do their jobs: data! That data is often scattered across many disparate internal and external systems. To analyze it all in one place, the team needs to extract the data from each of these sources and load it into a single location. This is usually a data warehouse of some kind.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--DaUYzZpU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/342/1%2AqEo7Y4kOnlT43TL5vTHEmg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--DaUYzZpU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/342/1%2AqEo7Y4kOnlT43TL5vTHEmg.png" alt=""&gt;&lt;/a&gt;&lt;a href="https://www.stitchdata.com/"&gt;Stitch&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stitch&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For this step, there are many reliable tools in use around the globe. They extract data from a given set of systems on a regular cadence, send the results directly to a data warehouse, and take care of most error handling along the way. Maintaining multiple complicated integrations yourself can quickly become a nightmare, so these tools save a lot of time. Luckily, there is a nightmare-saving option:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.stitchdata.com/"&gt;Stitch&lt;/a&gt; coins itself as “a cloud-first, open source platform for rapidly moving data.” You can quickly connect to databases and third party tools and send that data to multiple different data warehouses. The best part: the first 5 million rows are free! Stitch can also be extended with a few &lt;a href="https://www.stitchdata.com/platform/extensibility/"&gt;open source frameworks&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Transforming Data within a Data Warehouse
&lt;/h3&gt;

&lt;p&gt;Once that data is loaded into a data warehouse, it’s usually a mess! Every source has a different structure, and each dataset is probably indexed with its own set of identifiers. To make sense of this chaos, the team needs to transform and join all of this data into a clean form that is easier to use. Most of this logic runs directly within the data warehouse.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--mzeW3l00--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/686/1%2ABzuPc_5WqQMDRX0NvvdIkQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--mzeW3l00--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/686/1%2ABzuPc_5WqQMDRX0NvvdIkQ.png" alt=""&gt;&lt;/a&gt;&lt;a href="https://www.getdbt.com/"&gt;DBT&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DBT&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The process of combining all of these datasets into a form that a business can actually use can be tedious. The work has become complex enough that the dedicated role of &lt;a href="https://blog.getdbt.com/what-is-an-analytics-engineer/"&gt;Analytics Engineer&lt;/a&gt; has emerged around it. These problems are common across the industry, and a tool has emerged to solve them specifically:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.getdbt.com/"&gt;DBT&lt;/a&gt; considers itself “your entire analytics engineering workflow” and I agree. With only knowing SQL, you can quickly build multiple, complex layers of data transformation jobs that will be fully managed. Version control, testing, documentation and so much else is all managed for you! The &lt;a href="https://www.getdbt.com/pricing/"&gt;cloud-hosted version&lt;/a&gt; is free to use.&lt;/p&gt;

&lt;h3&gt;
  
  
  Transforming Data outside of the Data Warehouse
&lt;/h3&gt;

&lt;p&gt;Sometimes the team will also need to transform data outside of the data warehouse. What if the transformation logic can’t be expressed entirely in SQL? What if the team needs to train a machine learning model? These tasks might pull data from the data warehouse directly, but the actual work needs to happen in a different system, such as Python.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--lt2LqSIE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2AlaYMsvzf7eBg_LDVMLw1sw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--lt2LqSIE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2AlaYMsvzf7eBg_LDVMLw1sw.png" alt=""&gt;&lt;/a&gt;&lt;a href="https://databricks.com/"&gt;Databricks&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Databricks&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most custom python-based scripts usually start as a Jupyter Notebook. You import a few packages, import or extract data, run some functions, and finally push that data somewhere else. Sometimes more complicated, production-scale processes are needed, but that’s rare. If you just need a simple way to run and schedule a python notebook there’s a great option:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://databricks.com/"&gt;Databricks&lt;/a&gt; was created by the original founders of Spark. Its main claim-to-fame is spinning up spark clusters super easily, but it also has great notebook functionality. They offer easy to use Python notebooks, where you can collaborate within the notebook just like Google docs. Once you develop the script that works for you, you can schedule that notebook to run completely within the platform. It’s a great way to not worry about where the code is running and have an easy way to schedule those tasks. They have a &lt;a href="https://databricks.com/product/faq/community-edition"&gt;free community edition&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Batch Scoring Machine Learning Models
&lt;/h3&gt;

&lt;p&gt;If the team has built a machine learning model, the results of that model should be sent to a place where they can actually help the business. These tasks usually involve connecting to an existing machine learning model and then sending its results to another tool, such as a sales or marketing tool. Getting a system up and running that pushes the correct model results at the right time can be a ridiculously tedious task.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--YvOzep3N--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2Aksz3KtLCS3frDpT1y7L2JQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--YvOzep3N--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2Aksz3KtLCS3frDpT1y7L2JQ.png" alt=""&gt;&lt;/a&gt;&lt;a href="https://booklet.ai/"&gt;Booklet.ai&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://booklet.ai"&gt;&lt;strong&gt;Booklet.ai&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Building a machine learning model is hard enough; it shouldn’t take another 2 months of custom coding to connect that model to a place where the business can find value in it. This work usually requires painful integrations with third-party systems, not to mention the production-level infrastructure work that is required! Thankfully, there is a solution that handles some of these tasks for you:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://booklet.ai/"&gt;Booklet.ai&lt;/a&gt; connects to existing ML Models and then allows you to quickly set up a few things: A demo to share the model with non technical users, an easy API endpoint connection, and a set of integrations to connect inputs and outputs. You can easily set up an input query from a data warehouse to score the model, and then send those results to a variety of tools that your business counterparts may use. &lt;strong&gt;You can check out a demo for a&lt;/strong&gt; &lt;a href="https://app.booklet.ai/model/lead-scoring"&gt;&lt;strong&gt;lead-scoring model that sends results to intercom&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;.&lt;/strong&gt; &lt;a href="https://booklet.ai"&gt;&lt;strong&gt;You can request access to the Booklet.ai beta&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;, where your first model will be free.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--UV7dQQKc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/0%2Ars6sCg3ZO3cNk7Io" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--UV7dQQKc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/0%2Ars6sCg3ZO3cNk7Io" alt=""&gt;&lt;/a&gt;Photo by &lt;a href="https://unsplash.com/@mbenna?utm_source=medium&amp;amp;utm_medium=referral"&gt;Mike Benna&lt;/a&gt; on &lt;a href="https://unsplash.com?utm_source=medium&amp;amp;utm_medium=referral"&gt;Unsplash&lt;/a&gt;&lt;/p&gt;




</description>
      <category>datascience</category>
      <category>python</category>
      <category>airflow</category>
      <category>dataengineering</category>
    </item>
    <item>
      <title>A True End-to-End ML Example: Lead Scoring</title>
      <dc:creator>AdamBarnhard</dc:creator>
      <pubDate>Mon, 09 Mar 2020 05:17:28 +0000</pubDate>
      <link>https://forem.com/bookletai/a-true-end-to-end-ml-example-lead-scoring-3162</link>
      <guid>https://forem.com/bookletai/a-true-end-to-end-ml-example-lead-scoring-3162</guid>
      <description>&lt;h4&gt;
  
  
  From machine learning idea to implemented solution with MLflow, AWS Sagemaker, and &lt;a href="https://booklet.ai/"&gt;Booklet.ai&lt;/a&gt;
&lt;/h4&gt;

&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;Selling something can be hard work. A business might have many potential leads, but most of them won’t turn into actual, paying customers in the end. A sales team has to sort through a long list of potential customers and figure out how to spend their time. That’s where lead scoring comes in: a system that analyzes attributes of each new lead in relation to the chances of that lead actually becoming a customer, and uses that analysis to score and rank all of the potential customers. With that ranking, the sales team can prioritize their time and only spend it on the leads that are most likely to become paying customers.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Cool, that sounds great! How do I do it?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Well, I’m glad you asked! In this post, we will walk through the full end-to-end implementation of a custom built lead-scoring model. This includes pulling the data, building the model, deploying that model, and finally pushing those results directly to where they matter most — the tools that a sales team uses.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--fj003Edk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2A2QzvSSSHA6Ir_wydOBs1kg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--fj003Edk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2A2QzvSSSHA6Ir_wydOBs1kg.png" alt=""&gt;&lt;/a&gt;&lt;a href="https://app.booklet.ai/model/lead-scoring"&gt;Testing the model in Booklet.ai&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you want to test out this model without going through the full process, we have a &lt;a href="https://app.booklet.ai/model/lead-scoring"&gt;fully-functioning lead scoring model&lt;/a&gt; on &lt;a href="https://booklet.ai/"&gt;Booklet.ai&lt;/a&gt;.&lt;/strong&gt; We’ve posted all of the code as a &lt;a href="https://github.com/BookletAI/lead-scoring-demo"&gt;Jupyter Notebook on GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;p&gt;This will be a technical tutorial that requires a bit of coding and data science understanding to get through. To get the most out of this, you should have at least a bit of exposure to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python (we will stay within Jupyter notebooks the whole time)&lt;/li&gt;
&lt;li&gt;Machine Learning (we will use a Random Forest model)&lt;/li&gt;
&lt;li&gt;The command line (yes, it can be scary, but we just use a few simple commands)&lt;/li&gt;
&lt;li&gt;AWS (we can hold your hand through this one!)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Also, you should have a few things installed to make sure you can move quickly through the tutorial:&lt;/p&gt;

&lt;p&gt;An AWS username with access through the &lt;code&gt;awscli&lt;/code&gt; (we will cover this below!)&lt;/p&gt;

&lt;p&gt;Python 3 of some kind with a few packages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pandas &lt;code&gt;pip install pandas&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;MLflow &lt;code&gt;pip install mlflow&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;SKlearn &lt;code&gt;pip install scikit-learn&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Docker (pretty quick and easy to install &lt;a href="https://hub.docker.com/editions/community/docker-ce-desktop-mac/"&gt;here&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Before we get started…
&lt;/h3&gt;

&lt;p&gt;We’re going to touch on a lot of tools and ideas in a short amount of time. Before we dive right in, it’s important to take a step back to understand what’s happening here. There are a few tools that we will be using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://jupyter.org/"&gt;Jupyter Notebook&lt;/a&gt;: A go-to for data scientists. Allows you to run python scripts in the form of a notebook and get results in-line.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://mlflow.org/"&gt;MLflow&lt;/a&gt;: An open source model management system.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://aws.amazon.com/sagemaker/"&gt;Sagemaker&lt;/a&gt;: A full-stack machine learning platform from AWS.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://booklet.ai/"&gt;Booklet.ai&lt;/a&gt;: A model testing and integration system.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.intercom.com/"&gt;Intercom&lt;/a&gt;: A customer messaging platform that is commonly used by customer service and sales teams to manage customer relationships.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here is a diagram that outlines how these different tools are used:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--NL7Fk9Bz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2AjTEHKooNKLlx5BRv5INiqQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--NL7Fk9Bz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2AjTEHKooNKLlx5BRv5INiqQ.png" alt=""&gt;&lt;/a&gt;Process Overview, author’s work. Utilizing images from &lt;a href="https://thenounproject.com/"&gt;Noun Project&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At the highest level, we will use a Jupyter notebook to pull leads data and train a model. Next, we will send that model to MLflow to keep track of the model version. Then, we will send both a docker container and the model into AWS Sagemaker to deploy the model. Finally, we will use Booklet to put that model to use and start piping lead scores into Intercom.&lt;/p&gt;

&lt;p&gt;Now that we got that out of the way, let’s get started!&lt;/p&gt;

&lt;h3&gt;
  
  
  Training the Model
&lt;/h3&gt;

&lt;h3&gt;
  
  
  About the Data
&lt;/h3&gt;

&lt;p&gt;First, we need to access data about our leads. This data should have two types of information:&lt;/p&gt;

&lt;p&gt;(A) The response variable: Whether or not the lead converted into a paying customer&lt;/p&gt;

&lt;p&gt;(B) The features: Details about each lead that will help us predict the response variable&lt;/p&gt;

&lt;p&gt;For this exercise, we are going to use an example &lt;a href="https://www.kaggle.com/ashydv/lead-scoring-logistic-regression/output"&gt;leads dataset&lt;/a&gt; from Kaggle. This dataset provides a large list of simulated leads for a company called X Education, which sells online courses. We have a variety of features for each lead as well as whether or not that lead converted into a paying customer. Thanks &lt;a href="https://www.kaggle.com/ashydv"&gt;Ashish&lt;/a&gt; for providing this dataset and for the awesome analysis on Kaggle.&lt;/p&gt;

&lt;h3&gt;
  
  
  Importing and Cleaning the Data
&lt;/h3&gt;

&lt;p&gt;To import this data, simply read the &lt;code&gt;leads_cleaned&lt;/code&gt; dataset into Pandas. If you are reading this data from a database instead, replace this with &lt;code&gt;pd.read_sql_query&lt;/code&gt;.&lt;/p&gt;


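&lt;p&gt;The embedded code gist doesn’t come through in this feed, so here is a minimal sketch of the import step. The file name and column names are illustrative, and a tiny synthetic frame stands in for the real Kaggle dataset so the snippet runs on its own:&lt;/p&gt;

```python
import pandas as pd

# In the real notebook this would be something like:
#   leads_df = pd.read_csv("leads_cleaned.csv")
# or, if the data lives in a database instead:
#   leads_df = pd.read_sql_query("SELECT * FROM leads_cleaned", conn)

# Tiny synthetic stand-in so the snippet is self-contained.
leads_df = pd.DataFrame({
    "total_visits": [5, 0, 12, 3],
    "total_time_spent_on_website": [674, 0, 1532, 305],
    "lead_source": ["Google", "Direct Traffic", "Google", "Organic Search"],
    "last_activity": ["Email Opened", "Page Visited", "Email Opened", "Unsubscribed"],
    "converted": [1, 0, 1, 0],
})
print(leads_df.shape)
```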


&lt;p&gt;Next, we want to pick out the columns that matter to us. To do that, we will create lists of columns that fall into different categories: numeric, categorical, and the response variable. This will make the processing and cleaning steps easier.&lt;/p&gt;


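&lt;p&gt;The gist itself isn’t embedded here; the column grouping might look like the following sketch (names loosely follow the Kaggle dataset and are illustrative):&lt;/p&gt;

```python
# Columns the model will use, grouped by how they get processed.
numeric_cols = ["total_visits", "total_time_spent_on_website"]
categorical_cols = ["lead_source", "last_activity"]

# The response variable: did the lead convert into a paying customer?
response_col = "converted"
```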


&lt;p&gt;From here, we can create our train/test datasets that will be used for training:&lt;/p&gt;


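&lt;p&gt;Since the original gist isn’t shown in this feed, a sketch of the split (with a small synthetic stand-in for the leads data so it runs on its own):&lt;/p&gt;

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the leads DataFrame (columns illustrative).
leads_df = pd.DataFrame({
    "total_visits": [5, 0, 12, 3, 7, 1, 9, 2],
    "total_time_spent_on_website": [674, 0, 1532, 305, 890, 12, 1200, 45],
    "converted": [1, 0, 1, 0, 1, 0, 1, 0],
})

# Hold out 25% of the leads for testing; stratify on the response so
# the conversion rate is similar in both splits.
train_df, test_df = train_test_split(
    leads_df, test_size=0.25, random_state=42,
    stratify=leads_df["converted"],
)
print(len(train_df), len(test_df))
```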


&lt;p&gt;Now that we have train and test datasets, let’s go ahead and create a scaler for our numeric variables. It is important to fit this on the training dataset only, so that we don’t “leak” any information about the test set.&lt;/p&gt;


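&lt;p&gt;A sketch of the scaler step, again with a synthetic stand-in for the training split since the gist isn’t embedded here:&lt;/p&gt;

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

numeric_cols = ["total_visits", "total_time_spent_on_website"]

# Stand-in for the training split created above.
train_df = pd.DataFrame({
    "total_visits": [5, 0, 12, 3],
    "total_time_spent_on_website": [674, 0, 1532, 305],
})

# Fit on the training data only, so nothing about the test set leaks
# into the centering and scaling parameters.
scaler = StandardScaler()
scaler.fit(train_df[numeric_cols])
print(scaler.mean_)
```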


&lt;p&gt;Now, we need to make some adjustments to the data to prepare for modeling. We’ve created a function to perform a few things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select the columns that we’ve defined as important&lt;/li&gt;
&lt;li&gt;Use the fitted scaler to center and scale the numeric columns&lt;/li&gt;
&lt;li&gt;Turn categorical variables into one-hot encoded variables&lt;/li&gt;
&lt;li&gt;Ensure that all columns from the training dataset are also in the outputted, processed dataset (This is important so that all levels of dummy variables are created, even if the dataset we import doesn’t have each individual level.)&lt;/li&gt;
&lt;/ul&gt;


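&lt;p&gt;The original gist isn’t available in this feed; a minimal version of such a preprocessing function (function and argument names are illustrative, not the original notebook’s) could look like this:&lt;/p&gt;

```python
import pandas as pd

def preprocess(df, numeric_cols, categorical_cols, scaler,
               train_cols=None, response_col=None):
    """Select columns, scale numerics with the fitted scaler, one-hot
    encode categoricals, and align to the training columns so every
    dummy level exists even if this batch is missing some."""
    cols = list(numeric_cols) + list(categorical_cols)
    if response_col:
        cols = cols + [response_col]
    out = df[cols].copy()
    out[numeric_cols] = scaler.transform(out[numeric_cols])
    out = pd.get_dummies(out, columns=list(categorical_cols))
    if train_cols is not None:
        # Missing one-hot levels get filled with 0, and the column
        # order is forced to match training exactly.
        out = out.reindex(columns=train_cols, fill_value=0)
    return out
```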


&lt;p&gt;Here’s how it looks when we put it all together and run both the training and test dataset through our preprocessing function:&lt;/p&gt;




&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--c1W9Gvjp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/878/1%2AZy1FYy8VR4N0pSzYzr5hTw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--c1W9Gvjp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/878/1%2AZy1FYy8VR4N0pSzYzr5hTw.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Training the model
&lt;/h3&gt;

&lt;p&gt;This brings us to the exciting part! Let’s use our newly cleaned and split datasets to train a random forest model that predicts the chances of someone converting into a paying customer of X Education. First, let’s define a few standard hyperparameters and initialize the SKLearn model:&lt;/p&gt;


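&lt;p&gt;Since the gist isn’t embedded here, a sketch of the training step; the hyperparameter values and stand-in training frame are illustrative, not necessarily the original notebook’s:&lt;/p&gt;

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Illustrative starting hyperparameters; worth tuning later.
n_estimators = 100
max_depth = 10

rf = RandomForestClassifier(
    n_estimators=n_estimators,
    max_depth=max_depth,
    random_state=42,
)

# Stand-in for the processed training data from the previous step.
X_train = pd.DataFrame({"f1": [0.1, 0.9, 0.2, 0.8], "f2": [1, 0, 1, 0]})
y_train = [0, 1, 0, 1]
rf.fit(X_train, y_train)
```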


&lt;p&gt;From here, we can quickly calculate a few accuracy metrics in our test set to see how the model did.&lt;/p&gt;


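&lt;p&gt;A sketch of the metric calculation, with stand-in labels and predictions since the gist isn’t shown here:&lt;/p&gt;

```python
from sklearn.metrics import accuracy_score, roc_auc_score

# Stand-ins for the held-out labels and the model's outputs.
y_test = [1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 1, 1]            # hard class predictions
y_prob = [0.9, 0.2, 0.8, 0.6, 0.7]  # predicted P(converted)

accuracy = accuracy_score(y_test, y_pred)
auc = roc_auc_score(y_test, y_prob)
print(f"accuracy={accuracy:.2f}  auc={auc:.2f}")
```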


&lt;p&gt;We have an accuracy of 82% and an AUC score of 80% in our held-out test set! Not too shabby. There is definitely room to improve, but for the sake of this tutorial, let’s move forward with this model.&lt;/p&gt;

&lt;h3&gt;
  
  
  MLflow: Managing the Model
&lt;/h3&gt;

&lt;h4&gt;
  
  
  What is MLflow?
&lt;/h4&gt;

&lt;p&gt;Before we go setting this up, let’s have a quick chat about MLflow. Officially, MLflow is “An open source platform for the machine learning lifecycle.” Databricks developed this open source project to help machine learning builders more easily manage and deploy machine learning models. Let’s break that down:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Managing models:&lt;/strong&gt; While building an ML model, you will likely go through multiple iterations and test a variety of model types. It’s important to keep track of metadata about those tests as well as the model objects themselves. What if you discover an awesome model on your 2nd of 100 tries and want to go back to use that? MLflow has you covered!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deploying models:&lt;/strong&gt; In order to make a model accessible, you need to deploy the model. This means hosting your model as an API endpoint, so that it is easy to reference and score against your model in a standard way. There is a super long list of tools that deploy models for you. MLflow isn’t actually one of those tools. Instead, MLflow allows easy deployment of your managed model to a variety of different tools. It could be on your local machine, Microsoft Azure, or AWS Sagemaker. We will use Sagemaker in this tutorial.&lt;/p&gt;

&lt;h4&gt;
  
  
  Setting up MLflow
&lt;/h4&gt;

&lt;p&gt;The MLflow tracking server is a nice UI and API that wraps around the important features. We will need to set this up before we can use MLflow to start managing and deploying models.&lt;/p&gt;

&lt;p&gt;Make sure you have the MLflow package installed (check out the Pre-reqs if not!). From there, run the following command in your terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mlflow ui
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;After this, you should see the shiny new UI running at &lt;a href="http://localhost:5000/"&gt;http://localhost:5000/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you run into issues getting this setup, check out the MLflow tracking server docs &lt;a href="https://www.mlflow.org/docs/latest/tracking.html#mlflow-tracking-servers"&gt;here&lt;/a&gt;. Also, if you’d prefer not to setup the tracking server on your own machine, Databricks offers a &lt;a href="https://www.mlflow.org/docs/latest/quickstart.html#logging-to-a-remote-tracking-server"&gt;free hosted version&lt;/a&gt; as well.&lt;/p&gt;

&lt;p&gt;Once you have the tracking server running, let’s point Python to our tracking server and set up an experiment. An experiment is a collection of models inside the MLflow tracking server.&lt;/p&gt;





&lt;h4&gt;
  
  
  Packaging our model with processing
&lt;/h4&gt;

&lt;p&gt;If you are working with a model that has no preprocessing associated with your data, logging the model is fairly simple. In our case, we actually need to set up this preprocessing logic alongside the model itself. This will allow leads to be sent to our model as-is, and the model will handle the data prep. A quick visual to show this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---gHuxbqf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1001/0%2AI37XhBI3C9eKk2cx" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---gHuxbqf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1001/0%2AI37XhBI3C9eKk2cx" alt=""&gt;&lt;/a&gt;Data Processing Flow, author’s work. Utilizing images from &lt;a href="https://thenounproject.com/"&gt;Noun Project&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To do this, we will utilize MLflow’s pyfunc model-type (more info &lt;a href="https://www.mlflow.org/docs/latest/models.html#python-function-python-function"&gt;here&lt;/a&gt;) which allows us to wrap up both a model and the preprocessing logic into one nice Python class. We will need to send two different inputs to this class: objects (i.e. list of columns that are numeric or the random forest model itself) and logic (i.e. preprocessing function that we created). Both of these entities will be used inside the class.&lt;/p&gt;

&lt;p&gt;Now, let’s setup the class. First, check out the code and then we will talk through the different pieces:&lt;/p&gt;




&lt;p&gt;The &lt;code&gt;leadsModel&lt;/code&gt; class is based on MLflow’s &lt;code&gt;pyfunc&lt;/code&gt; class. This will allow us to push this model into MLflow and eventually Sagemaker.&lt;/p&gt;

&lt;p&gt;Next, we set up all of the objects that we need within &lt;code&gt;__init__&lt;/code&gt;. This includes both the objects and the logic function.&lt;/p&gt;

&lt;p&gt;Finally, we set up the &lt;code&gt;predict&lt;/code&gt; function:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First, we take in &lt;code&gt;model_input&lt;/code&gt; (the dataframe sent to the model after deployment) and ensure that all of the column names are lowercase.&lt;/li&gt;
&lt;li&gt;Next, we send this dataframe into the preprocessing function that we had created and used earlier for model training. This time, we keep the response columns blank since we won’t need them for deployment!&lt;/li&gt;
&lt;li&gt;Then, we reference the original training dataset’s column names and fill in any missing columns with 0’s. This is important since there will be levels of one-hot-encoded variables that aren’t generated when new datasets are sent to the deployed model.&lt;/li&gt;
&lt;li&gt;Finally, we send this nice, clean dataset to our Random Forest model for prediction.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now that we have all of our logic and objects ready to go within one class, we can log this model into MLflow!&lt;/p&gt;

&lt;h4&gt;
  
  
  Logging the model to MLflow
&lt;/h4&gt;

&lt;p&gt;Before we package everything up and log the model, we need to setup the Anaconda environment that will be used when the model runs on Sagemaker. For more information about Anaconda, &lt;a href="https://towardsdatascience.com/get-your-computer-ready-for-machine-learning-how-what-and-why-you-should-use-anaconda-miniconda-d213444f36d6"&gt;here’s a detailed overview&lt;/a&gt;.&lt;/p&gt;


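&lt;p&gt;Such an environment definition might look like the following sketch (the environment name and package pins are illustrative; match them to the versions you trained with):&lt;/p&gt;

```python
# Conda environment that Sagemaker will recreate when serving the
# model. Versions here are illustrative; pin whatever you trained with.
conda_env = {
    "name": "lead_scoring_env",
    "channels": ["defaults"],
    "dependencies": [
        "python=3.7",
        "scikit-learn",
        "pandas",
        {"pip": ["mlflow"]},
    ],
}
```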


&lt;p&gt;Now, we start a run within MLflow. Within that run, we log our hyperparameters, accuracy metrics, and finally the model itself!&lt;/p&gt;




&lt;p&gt;If you head over to the MLflow UI that we checked out earlier, you’ll see the newly created model along with all of the metrics and parameters that we just defined. Woot woot!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--oQ83mchH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/0%2AMeKhmLRDWXaW_fXL" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--oQ83mchH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/0%2AMeKhmLRDWXaW_fXL" alt=""&gt;&lt;/a&gt;Logged model in MLflow&lt;/p&gt;

&lt;h3&gt;
  
  
  Sagemaker: Deploying the Model
&lt;/h3&gt;

&lt;h4&gt;
  
  
  What is Sagemaker?
&lt;/h4&gt;

&lt;p&gt;Sagemaker is a suite of tools that Amazon Web Services (AWS) created to support Machine Learning development and deployment. There are a ton of tools available within Sagemaker (too many to list here), and we will be using their model deployment tool specifically. There are some great Sagemaker examples in their GitHub repo &lt;a href="https://github.com/awslabs/amazon-sagemaker-examples"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Setting up Sagemaker
&lt;/h4&gt;

&lt;p&gt;First things first, you need to get permissions worked out. AWS permissions are never simple, but we will try to keep this easy! You’ll need to set up two things: a user account for yourself and a role for Sagemaker.&lt;/p&gt;

&lt;p&gt;The first is a user account so that you can access AWS as you send the model to Sagemaker. To do this, head over to the Identity and Access Management (IAM) console and set up a user account with Administrator permissions. If your security team pushes back, “Sagemaker Full Access” should work too! At the end of the setup flow, you’ll be given an AWS Access Key ID and an AWS Secret Access Key. Make sure to save those! They are not accessible after that first time. Now, head to your terminal and type &lt;code&gt;aws configure&lt;/code&gt;. This will prompt you for the AWS keys you just collected. Once that’s done, you’ll have AWS access from both the terminal and from Python! &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/gs-account-user.html"&gt;Here&lt;/a&gt; are more details from AWS.&lt;/p&gt;
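&lt;p&gt;For reference, &lt;code&gt;aws configure&lt;/code&gt; stores the keys in a credentials file that looks roughly like this (the values below are placeholders, not real keys):&lt;/p&gt;

```
# ~/.aws/credentials
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
```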

&lt;p&gt;The second is a role (essentially a user account for services within AWS) for Sagemaker. To set this up, head to the roles section of IAM. You’ll want to assign this role to Sagemaker and then attach the managed policy called “AmazonSageMakerFullAccess.” At the end of this process, you’ll get an ARN for this role. We’ll need it for deployment, so keep it handy. More details from AWS &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/access-policy-aws-managed-policies.html"&gt;here&lt;/a&gt;.&lt;/p&gt;
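&lt;p&gt;Under the hood, what makes the role assumable by Sagemaker is a trust policy along these lines:&lt;/p&gt;

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "sagemaker.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```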

&lt;p&gt;Finally, we need to push an MLflow docker container into AWS. Assuming you have the permissions set up correctly above and docker installed (see the prerequisites section for docker setup), run the following command in your terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mlflow sagemaker build-and-push-container
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;This will push a docker container into AWS, which will be used during deployment.&lt;/p&gt;
&lt;h4&gt;
  
  
  Deploying to Sagemaker
&lt;/h4&gt;

&lt;p&gt;Now that we have everything setup, it’s time to push our model to Sagemaker!&lt;/p&gt;
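&lt;p&gt;The deployment boils down to something like the sketch below, using MLflow 1.x’s &lt;code&gt;mlflow.sagemaker.deploy&lt;/code&gt;. The app name and region here are assumptions, and actually running it requires the AWS credentials, role ARN, and container image URL from the setup steps above:&lt;/p&gt;

```python
def deploy_lead_scoring(run_id, role_arn, image_url):
    """Push the logged MLflow model to a SageMaker endpoint.

    Requires AWS credentials to be configured; run_id comes from the
    MLflow run that logged the model.
    """
    import mlflow.sagemaker  # MLflow 1.x deployment API

    mlflow.sagemaker.deploy(
        app_name="lead-scoring",                   # assumed endpoint name
        model_uri="runs:/{}/model".format(run_id),
        execution_role_arn=role_arn,
        image_url=image_url,
        region_name="us-east-1",                   # assumed region
        mode="create",
    )
```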


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;



&lt;p&gt;The deploy function usually takes 5 to 10 minutes to complete, and its status is checked periodically until completion. Once the deployment is complete, you’ll be able to find a running model in the Sagemaker UI!&lt;/p&gt;
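&lt;p&gt;The waiting behavior can be sketched generically. Here &lt;code&gt;get_status&lt;/code&gt; stands in for whatever status check you use (for example, boto3’s &lt;code&gt;describe_endpoint&lt;/code&gt;), and the function names are assumptions:&lt;/p&gt;

```python
import time

def wait_for_endpoint(get_status, poll_seconds=30, max_polls=40):
    """Poll until the endpoint leaves the 'Creating' state or we give up."""
    for _ in range(max_polls):
        status = get_status()
        if status != "Creating":
            return status          # e.g. "InService" or "Failed"
        time.sleep(poll_seconds)
    return "TimedOut"
```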

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--FTnW1JD1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2A_hJmrgrMJBgOUNIBYOk3RA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--FTnW1JD1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2A_hJmrgrMJBgOUNIBYOk3RA.png" alt=""&gt;&lt;/a&gt;Deployed model in Sagemaker&lt;/p&gt;

&lt;h3&gt;
  
  
  Booklet: Integrating the Model
&lt;/h3&gt;

&lt;p&gt;Congrats, your model is now deployed! Our next goal is to make this model helpful to the sales team. To do that, we’ll use the deployed model to create lead scores for new sales leads and send those results to the tools the sales team uses. We need a system that regularly pulls in new sales leads, sends each lead’s info to our deployed model, and then sends the model’s results to Intercom, the sales team’s tool.&lt;/p&gt;

&lt;p&gt;There are a few custom-built ways to set this up:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We can set up a custom Python script that regularly collects new Intercom user data in our data warehouse, sends that data to our deployed endpoint using the &lt;a href="https://sagemaker.readthedocs.io/"&gt;Sagemaker Python SDK&lt;/a&gt;, and then sends the results back to Intercom with their &lt;a href="https://developers.intercom.com/intercom-api-reference/reference"&gt;API&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;We can utilize Sagemaker’s Batch Transform functionality (great example &lt;a href="https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker_batch_transform/introduction_to_batch_transform/batch_transform_pca_dbscan_movie_clusters.ipynb"&gt;here&lt;/a&gt;) to score batches of Intercom users. All data starts and ends in S3 for Batch Transform, so we’ll need to pull data into S3 for scoring and then push the results from S3 to Intercom to serve them up to the sales team.&lt;/li&gt;
&lt;/ul&gt;
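&lt;p&gt;For the first option, the per-batch scoring call might look roughly like the sketch below. This is an assumption-heavy sketch: it presumes boto3 and AWS credentials are available, and that the MLflow scoring container accepts the pandas-split JSON format:&lt;/p&gt;

```python
import json

def score_leads(endpoint_name, leads_df):
    """Send a dataframe of leads to the SageMaker endpoint and return scores."""
    import boto3  # requires AWS credentials at call time

    client = boto3.client("sagemaker-runtime")
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        Body=leads_df.to_json(orient="split"),
        ContentType="application/json; format=pandas-split",
    )
    return json.loads(response["Body"].read())
```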

&lt;p&gt;&lt;strong&gt;We knew there had to be a more efficient way to push the model results into the tools where they are most useful, so we built&lt;/strong&gt; &lt;a href="https://booklet.ai/"&gt;&lt;strong&gt;Booklet.ai&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;to make these steps easier.&lt;/strong&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  What is Booklet?
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://booklet.ai/"&gt;Booklet&lt;/a&gt; adds a web testing interface and data integrations to each of your Machine Learning endpoints, without requiring code changes. With Booklet, you can quickly try out model test-cases to ensure results are performing as expected, as well as send these results to the tools that matter most. For a lead scoring model, we can send results back to our data warehouse (Redshift in this case) or the sale’s team’s tool (Intercom).&lt;/p&gt;

&lt;h4&gt;
  
  
  Testing the model
&lt;/h4&gt;

&lt;p&gt;Using Booklet, we quickly set up a &lt;a href="https://app.booklet.ai/model/lead-scoring"&gt;demo to test the lead scoring model&lt;/a&gt;. This is connected to the endpoint that we created in this tutorial so far. You can try out different inputs and see how the model classifies each theoretical lead. &lt;a href="https://booklet.ai/blog/web-app-for-ml-model/"&gt;Learn more about how to turn your ML model into a web app here.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--7U7V8lKp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2ALNB3dpLOMR13kcl_fZ57kA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--7U7V8lKp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2ALNB3dpLOMR13kcl_fZ57kA.png" alt=""&gt;&lt;/a&gt;&lt;a href="https://app.booklet.ai/model/lead-scoring"&gt;Testing the model in Booklet.ai&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Connecting the model
&lt;/h4&gt;

&lt;p&gt;Once you feel comfortable with the output of the model from testing, you can start sending those results to systems where that output is most useful. We’ve already set up our &lt;a href="https://app.booklet.ai/source/redshifts/2/edit"&gt;source&lt;/a&gt; in Redshift, which pulls data to feed into the model. We’ve also set up both a Redshift &lt;a href="https://app.booklet.ai/destinations/2/edit"&gt;destination&lt;/a&gt; and an Intercom &lt;a href="https://app.booklet.ai/destinations/3/edit"&gt;destination&lt;/a&gt;, where the results will be sent. To kick off an example dataflow, which pulls data from the source, scores that data with the model, and sends results to both destinations, you can try out a dataflow &lt;a href="https://app.booklet.ai/dataflow/configs/2/edit"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--t7LFg5CU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2AvU3OYn_1nF7N91fTFWTq-A.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--t7LFg5CU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2AvU3OYn_1nF7N91fTFWTq-A.png" alt=""&gt;&lt;/a&gt;&lt;a href="https://app.booklet.ai/model/lead-scoring"&gt;Running a dataflow in Booklet.ai&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Making your model impactful
&lt;/h4&gt;

&lt;p&gt;Tada! We’ve now made our lead scoring model impactful by sending results directly into Intercom. To get a sense of how this might show up for a sales team member, here you can see each example lead now has a custom attribute listing whether or not they are likely to convert:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--BdPzRfFB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2ANowQTfCyEK6hNf-mEz5u_w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--BdPzRfFB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2ANowQTfCyEK6hNf-mEz5u_w.png" alt=""&gt;&lt;/a&gt;Example of lead score within &lt;a href="https://www.intercom.com/"&gt;Intercom&lt;/a&gt; Platform&lt;/p&gt;

&lt;p&gt;With these labels readily available for each potential lead, a sales team member can prioritize their time and pick who to reach out to first. This should lead to better efficiency, and more sales for your business! There are many ways to measure the success of these outcomes, but we’ll visit that another time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Closing Thoughts
&lt;/h3&gt;

&lt;p&gt;If you’ve made it this far, thank you! You’ve successfully navigated an entire end-to-end machine learning project, from idea inception to business impact and all of the steps in between. If you have any thoughts, questions, or run into issues as you follow along, please drop a comment below.&lt;/p&gt;

&lt;p&gt;A big thank you to &lt;a href="https://www.kaggle.com/ashydv"&gt;Ashish&lt;/a&gt; for the dataset, &lt;a href="https://www.linkedin.com/in/liangbing/"&gt;Bing&lt;/a&gt; for a helpful review, and &lt;a href="https://towardsdatascience.com/@kylegallatin"&gt;Kyle&lt;/a&gt; for an &lt;a href="https://towardsdatascience.com/deploying-models-to-production-with-mlflow-and-amazon-sagemaker-d21f67909198"&gt;awesome blog&lt;/a&gt; to reference on MLflow and Sagemaker.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--qNmb1dei--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/0%2AxVIGAovyJ23z5uaC" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--qNmb1dei--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/0%2AxVIGAovyJ23z5uaC" alt=""&gt;&lt;/a&gt;Photo by &lt;a href="https://unsplash.com/@tsunamiholmes?utm_source=medium&amp;amp;utm_medium=referral"&gt;Chloe Leis&lt;/a&gt; on &lt;a href="https://unsplash.com?utm_source=medium&amp;amp;utm_medium=referral"&gt;Unsplash&lt;/a&gt;&lt;/p&gt;




</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>datascience</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
