<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Mendy Kevin</title>
    <description>The latest articles on Forem by Mendy Kevin (@mendy_kevin_94ec1db73e1df).</description>
    <link>https://forem.com/mendy_kevin_94ec1db73e1df</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3073727%2F2e4c3b27-bcfc-46f3-a3c3-d536c5d174e8.png</url>
      <title>Forem: Mendy Kevin</title>
      <link>https://forem.com/mendy_kevin_94ec1db73e1df</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/mendy_kevin_94ec1db73e1df"/>
    <language>en</language>
    <item>
      <title>Introduction to Server-less and AWS Lambda</title>
      <dc:creator>Mendy Kevin</dc:creator>
      <pubDate>Fri, 12 Dec 2025 08:23:42 +0000</pubDate>
      <link>https://forem.com/mendy_kevin_94ec1db73e1df/introduction-to-server-less-and-aws-lambda-2ho1</link>
      <guid>https://forem.com/mendy_kevin_94ec1db73e1df/introduction-to-server-less-and-aws-lambda-2ho1</guid>
      <description>&lt;p&gt;AWS Lambda enables the deployment of various applications, including machine learning models. &lt;/p&gt;

&lt;p&gt;In our case, we send the URL of a picture of pants to the deployed model, and the service responds with several classes and their scores. We will use TensorFlow Lite instead of the standard TensorFlow.&lt;/p&gt;

&lt;p&gt;To open AWS Lambda, type “lambda” in the AWS console search bar. The result reads “Lambda: Run Code without Thinking about Servers”, which captures what to expect from the service: you write functions without worrying about EC2 instances or similar infrastructure, and AWS Lambda manages the rest.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benefits of using AWS Lambda
&lt;/h2&gt;

&lt;p&gt;There are three primary advantages of utilizing Lambda functions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Infrastructure Abstraction:&lt;/strong&gt; With Lambda functions, you are relieved from the burden of managing infrastructure for serving models. This eliminates the need to consider EC2 instances.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cost-Efficiency:&lt;/strong&gt; AWS Lambda operates on a pay-per-request model, meaning you only incur charges when the Lambda function is actively performing tasks. This pay-as-you-go approach can lead to cost savings compared to maintaining constantly running infrastructure.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Free Tier Usage:&lt;/strong&gt; AWS Lambda provides a free tier of 1 million requests per month for each account. This can be advantageous for small-scale or experimental projects.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After testing, if you no longer require the Lambda function, you can delete it by navigating to the “Actions” dropdown and selecting “Delete function.”&lt;/p&gt;

&lt;h2&gt;
  
  
  Create a sample lambda function
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frt4djqf5icqg5vvaj47j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frt4djqf5icqg5vvaj47j.png" alt=" " width="800" height="675"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Creating a simple Lambda function is straightforward; just provide some basic parameters. Choose “Author from scratch,” assign a function name, and specify Python 3.x as the runtime. This is sufficient to create the function. The console generates a Python file (named &lt;code&gt;lambda_function.py&lt;/code&gt; by default) that contains a &lt;code&gt;lambda_handler&lt;/code&gt; function.&lt;/p&gt;

&lt;p&gt;Understanding Function Parameters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;event&lt;/code&gt;: Contains the input data passed to the function (e.g., a JSON payload).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;context&lt;/code&gt;: Provides details about the invocation, configuration, and execution environment.&lt;/li&gt;
&lt;/ul&gt;
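
&lt;p&gt;As a rough illustration of the two parameters, the sketch below reads one field from &lt;code&gt;event&lt;/code&gt; and a few attributes from &lt;code&gt;context&lt;/code&gt;. The &lt;code&gt;FakeContext&lt;/code&gt; class, its values, and the &lt;code&gt;name&lt;/code&gt; field are stand-ins invented for local testing; on AWS, Lambda supplies the real context object.&lt;/p&gt;

```python
class FakeContext:
    """Hypothetical stand-in for the context object Lambda passes in."""
    function_name = "demo-function"   # made-up value for local runs
    aws_request_id = "local-request"  # made-up value for local runs

    def get_remaining_time_in_millis(self):
        return 3000  # pretend three seconds remain before the timeout


def lambda_handler(event, context):
    # event carries the caller's input, here a plain dict parsed from JSON
    name = event.get("name", "world")
    # context describes this invocation and its execution environment
    return {
        "message": "hello " + name,
        "function": context.function_name,
        "request_id": context.aws_request_id,
        "ms_left": context.get_remaining_time_in_millis(),
    }


if __name__ == "__main__":
    print(lambda_handler({"name": "lambda"}, FakeContext()))
```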

&lt;p&gt;To keep things simple, change the code to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# event is whatever we'll pass to the lambda function
import json
def lambda_handler(event, context):
    print("parameters: ", event)
    return "PONG"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7kubmyx94mh0fdu5kmqp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7kubmyx94mh0fdu5kmqp.png" alt=" " width="800" height="675"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's adjust our code to read the URL from the test event:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Test event: paste this JSON into the test event editor
{
    "url": "some-url-of-pants"
}

# Lambda function
import json
def lambda_handler(event, context):
    print("parameters: ", event)
    url = event['url']      # read the image URL from the test event
    results = predict(url)  # predict() is the model inference function; it is not defined in this post
    return results
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
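
&lt;p&gt;Because &lt;code&gt;predict&lt;/code&gt; is not defined in this post, the handler above cannot run as is. One way to try the flow locally is the sketch below, which swaps in a placeholder &lt;code&gt;predict&lt;/code&gt; with made-up classes and scores. The check for a missing &lt;code&gt;url&lt;/code&gt; field is also an addition for illustration, not part of the original handler.&lt;/p&gt;

```python
def predict(url):
    """Placeholder for the real model call; the classes and scores are made up."""
    return {"url": url, "scores": {"pants": 0.9, "shorts": 0.1}}


def lambda_handler(event, context):
    # Return a clear message if the test event has no "url" field
    if "url" not in event:
        return {"error": "missing 'url' in event"}
    return predict(event["url"])


if __name__ == "__main__":
    # Same shape as the test event above; context is unused, so None is enough here
    print(lambda_handler({"url": "some-url-of-pants"}, None))
```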



&lt;p&gt;Check the next post, where we discuss TensorFlow vs. TensorFlow Lite.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>aws</category>
      <category>lambda</category>
    </item>
    <item>
      <title>Complete Guide to Deploying Machine Learning Models with Flask and Docker(NO fluff configure and run like a pro)</title>
      <dc:creator>Mendy Kevin</dc:creator>
      <pubDate>Sun, 02 Nov 2025 13:15:23 +0000</pubDate>
      <link>https://forem.com/mendy_kevin_94ec1db73e1df/complete-guide-to-deploying-machine-learning-models-with-flask-and-dockerno-fluff-configure-and-3i91</link>
      <guid>https://forem.com/mendy_kevin_94ec1db73e1df/complete-guide-to-deploying-machine-learning-models-with-flask-and-dockerno-fluff-configure-and-3i91</guid>
<description>&lt;p&gt;Hello all! Welcome. This article covers the technical side of deploying machine learning models, using Logistic Regression (a linear model that makes predictions from the data it was trained on) as the example. By the end you should be able to configure and serve a model like a pro, so stick around.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You'll Learn
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Packaging models with Pickle&lt;/li&gt;
&lt;li&gt;Serving ML models with Flask&lt;/li&gt;
&lt;li&gt;Containerizing apps with Docker&lt;/li&gt;
&lt;li&gt;Exposing inference endpoints in Docker&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Big Picture: Understanding ML Model Deployment
&lt;/h2&gt;

&lt;p&gt;Let's understand the overall workflow of deploying a machine learning model:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Save the Model&lt;/strong&gt;: Take the trained model out of your Jupyter notebook by saving it to a file with a &lt;code&gt;.bin&lt;/code&gt; extension.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Load as a Web Service&lt;/strong&gt;: Load this model from a different process (a Python script) that runs as a web service, for example a "churn service" that predicts how likely a customer is to churn. We'll use &lt;strong&gt;Flask&lt;/strong&gt; to turn the model into a web service.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Isolate Python Dependencies&lt;/strong&gt;: Use &lt;strong&gt;pipenv&lt;/strong&gt; (similar to conda or pip) to isolate the dependencies for this service and prevent interference with other services on your machine.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Isolate System Dependencies&lt;/strong&gt;: Add another layer using &lt;strong&gt;Docker&lt;/strong&gt; to isolate system dependencies.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Deploy to the Cloud&lt;/strong&gt;: Once the local setup is complete, deploy the service to the cloud. You can use any cloud platform, but we'll use &lt;strong&gt;AWS Elastic Beanstalk (EB)&lt;/strong&gt; for this tutorial.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
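
&lt;p&gt;The first two steps (save the model, then load it from another process) can be sketched with &lt;code&gt;pickle&lt;/code&gt; as follows. This is a minimal sketch: the helper names &lt;code&gt;save_artifacts&lt;/code&gt; and &lt;code&gt;load_artifacts&lt;/code&gt;, the file name, and the plain dictionaries standing in for the fitted DictVectorizer and model are all assumptions made for illustration.&lt;/p&gt;

```python
import os
import pickle
import tempfile


def save_artifacts(path, dv, model):
    # Store both objects in one file so they are always loaded as a matching pair
    with open(path, "wb") as f_out:
        pickle.dump((dv, model), f_out)


def load_artifacts(path):
    # Only unpickle files you created yourself; pickle can execute code on load
    with open(path, "rb") as f_in:
        return pickle.load(f_in)


if __name__ == "__main__":
    # Plain Python objects stand in for the fitted DictVectorizer and model
    dv_stub = {"features": ["contract=monthly", "tenure"]}
    model_stub = {"weights": [0.8, -0.05], "bias": -1.2}

    path = os.path.join(tempfile.mkdtemp(), "model.bin")  # hypothetical file name
    save_artifacts(path, dv_stub, model_stub)
    dv, model = load_artifacts(path)
    print(dv == dv_stub and model == model_stub)
```

&lt;p&gt;Keeping the vectorizer and the model in the same file avoids loading a model together with a vectorizer it was not trained with.&lt;/p&gt;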




&lt;h2&gt;
  
  
  Setting Up the Environment
&lt;/h2&gt;

&lt;p&gt;Before training our model, we need to ensure our development environment has the right dependencies without interfering with other projects (which might require different versions of scikit-learn, pandas, etc.). We'll use &lt;strong&gt;pipenv&lt;/strong&gt; for this.&lt;/p&gt;

&lt;h3&gt;
  
  
  Installing Pipenv
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; If you have Anaconda installed and added to your system variables, it will automatically activate the &lt;code&gt;(base)&lt;/code&gt; conda environment when you open a new shell/terminal. We don't want to install pipenv inside conda. Instead, we'll install it &lt;strong&gt;globally&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Deactivate conda&lt;/span&gt;
conda deactivate

&lt;span class="c"&gt;# Install uv (a faster Python package manager)&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;uv

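&lt;span class="c"&gt;# Install pipenv globally (--system targets the system Python instead of a virtual environment)&lt;/span&gt;
uv pip &lt;span class="nb"&gt;install &lt;/span&gt;&lt;span class="nt"&gt;--system&lt;/span&gt; pipenv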
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Creating a Pipenv Virtual Environment
&lt;/h3&gt;

&lt;p&gt;Once you're in your project directory, manage all Python libraries and dependencies via pipenv:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create a pipenv virtual environment&lt;/span&gt;
&lt;span class="c"&gt;# This creates the virtual environment and a Pipfile (Pipfile.lock is written on the first install)&lt;/span&gt;
pipenv &lt;span class="nt"&gt;--python&lt;/span&gt; 3.12

&lt;span class="c"&gt;# Activate the virtual environment&lt;/span&gt;
pipenv shell

&lt;span class="c"&gt;# Install all requirements for your project&lt;/span&gt;
&lt;span class="c"&gt;# (Using scikit-learn==1.5.1 because this project requires this specific version)&lt;/span&gt;
pipenv &lt;span class="nb"&gt;install &lt;/span&gt;flask scikit-learn&lt;span class="o"&gt;==&lt;/span&gt;1.5.1 numpy pandas requests

&lt;span class="c"&gt;# Note: pickle is built into Python, no need to install it separately&lt;/span&gt;

&lt;span class="c"&gt;# Check dependencies&lt;/span&gt;
pipenv graph

&lt;span class="c"&gt;# Update Pipfile.lock according to your current Pipfile&lt;/span&gt;
pipenv lock
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxm5fzmyu5yvhhm6bq4ok.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxm5fzmyu5yvhhm6bq4ok.png" alt="Pipenv Installation" width="736" height="432"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Launching Jupyter Notebook
&lt;/h3&gt;

&lt;p&gt;After installation, launch Jupyter notebook inside your virtual environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# If you prefer VS Code&lt;/span&gt;
code &lt;span class="nb"&gt;.&lt;/span&gt;

&lt;span class="c"&gt;# Or if you have Anaconda installed&lt;/span&gt;
jupyter lab
&lt;span class="c"&gt;# or&lt;/span&gt;
jupyter notebook
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If done correctly, you'll see your virtual environment in the Jupyter launcher.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchrrtm86x6jl9no48fft.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchrrtm86x6jl9no48fft.png" alt="Jupyter Environment" width="800" height="609"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Training the Model
&lt;/h2&gt;

&lt;p&gt;Let's look at how we trained our model. (This isn't the primary focus, so I'll keep it brief.)&lt;/p&gt;

&lt;h3&gt;
  
  
  Making Necessary Imports
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;KFold&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.feature_extraction&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DictVectorizer&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.linear_model&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LogisticRegression&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;roc_auc_score&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Data Preparation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Read and prepare data
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data-week-3.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Make column names homogeneous
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;_&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Handle categorical columns
&lt;/span&gt;&lt;span class="n"&gt;categorical_columns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dtypes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dtypes&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;categorical_columns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;_&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Handle numerical data
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;totalcharges&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_numeric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;totalcharges&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;coerce&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;totalcharges&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;totalcharges&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fillna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Convert target variable to binary
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;churn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;churn&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;yes&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Define feature types
&lt;/span&gt;&lt;span class="n"&gt;numerical&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tenure&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;monthlycharges&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;totalcharges&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;categorical&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;gender&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;seniorcitizen&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;partner&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dependents&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;phoneservice&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;multiplelines&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;internetservice&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;onlinesecurity&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;onlinebackup&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;deviceprotection&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;techsupport&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;streamingtv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;streamingmovies&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;contract&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;paperlessbilling&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;paymentmethod&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
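
&lt;p&gt;To make the cleaning steps above concrete, here is a plain-Python sketch of what they do to a single row. It is only an illustration: the helper names and the sample record are made up, and the real pipeline applies the same ideas column by column with pandas.&lt;/p&gt;

```python
def normalize(text):
    # Same idea as .str.lower().str.replace(' ', '_') applied to one string
    return text.lower().replace(" ", "_")


def to_number(value, default=0.0):
    # Mirrors errors='coerce' followed by fillna(0): unparseable text becomes the default
    try:
        return float(value)
    except (TypeError, ValueError):
        return default


def clean_record(record):
    cleaned = {}
    for key, value in record.items():
        # Normalize the column name, and the value too when it is text
        cleaned[normalize(key)] = normalize(value) if isinstance(value, str) else value
    cleaned["totalcharges"] = to_number(cleaned.get("totalcharges"))
    # The target becomes 1 for "yes" and 0 for anything else
    cleaned["churn"] = int(cleaned.get("churn") == "yes")
    return cleaned


if __name__ == "__main__":
    # Hypothetical raw row; a blank TotalCharges cannot be parsed as a number
    raw = {"Contract": "Month-to-month", "TotalCharges": " ", "Churn": "Yes"}
    print(clean_record(raw))
```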



&lt;h3&gt;
  
  
  Data Splitting
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Split into training and test sets
&lt;/span&gt;&lt;span class="n"&gt;df_full_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;df_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Define Training Function
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt; To apply the model later, the function must return both the DictVectorizer and the model. Without an explicit &lt;code&gt;return&lt;/code&gt; statement, a Python function returns &lt;code&gt;None&lt;/code&gt; and the fitted objects are lost.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;train&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;dicts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df_train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;categorical&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;numerical&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;to_dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;orient&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;records&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;dv&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DictVectorizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sparse&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;X_train&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dicts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LogisticRegression&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_iter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;dv&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
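
&lt;p&gt;The training function turns each row into a dictionary before vectorizing it. The sketch below shows the idea behind that encoding in plain Python: string values become &lt;code&gt;key=value&lt;/code&gt; indicator columns and numbers pass through unchanged. It is a simplified stand-in written for illustration, not scikit-learn's implementation, and the function names and sample records are made up.&lt;/p&gt;

```python
def fit_feature_names(records):
    # String values become "key=value" indicator columns; numbers keep their key
    names = set()
    for record in records:
        for key, value in record.items():
            names.add(key + "=" + value if isinstance(value, str) else key)
    return sorted(names)


def transform(records, feature_names):
    index = {name: i for i, name in enumerate(feature_names)}
    rows = []
    for record in records:
        row = [0.0] * len(feature_names)
        for key, value in record.items():
            if isinstance(value, str):
                name, cell = key + "=" + value, 1.0
            else:
                name, cell = key, float(value)
            # Features that were not seen while fitting are skipped
            if name in index:
                row[index[name]] = cell
        rows.append(row)
    return rows


if __name__ == "__main__":
    train_records = [
        {"contract": "monthly", "tenure": 1},
        {"contract": "yearly", "tenure": 24},
    ]
    names = fit_feature_names(train_records)
    print(names)
    print(transform(train_records, names))
```

&lt;p&gt;This is why the fitted vectorizer has to travel with the model: the column order learned during training must be reused when new records are transformed.&lt;/p&gt;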



&lt;h3&gt;
  
  
  Define Prediction Function
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dv&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;dicts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;categorical&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;numerical&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;to_dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;orient&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;records&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dicts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;y_pred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict_proba&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;)[:,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
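
&lt;p&gt;For a binary logistic regression, the second column of &lt;code&gt;predict_proba&lt;/code&gt; is the sigmoid of a linear score over the features. The sketch below spells out that calculation with the standard library only. The weights, intercept, and feature values are hypothetical numbers chosen for illustration, not coefficients from the trained model.&lt;/p&gt;

```python
import math


def sigmoid(score):
    # Squash any real number into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-score))


def churn_probability(features, weights, bias):
    # Linear score: bias plus the weighted sum of the feature values
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)


if __name__ == "__main__":
    weights = [0.5, -0.1]  # hypothetical coefficients
    bias = 0.0             # hypothetical intercept
    print(churn_probability([2.0, 10.0], weights, bias))
```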



&lt;h3&gt;
  
  
  K-Fold Cross Validation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;C&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;
&lt;span class="n"&gt;n_splits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
&lt;span class="n"&gt;kfold&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;KFold&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_splits&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;n_splits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shuffle&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;train_idx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;val_idx&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;kfold&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df_full_train&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;df_train&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df_full_train&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;train_idx&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;df_val&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df_full_train&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;val_idx&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="n"&gt;y_train&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df_train&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;churn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;
    &lt;span class="n"&gt;y_val&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df_val&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;churn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;

    &lt;span class="n"&gt;dv&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;train&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;y_pred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df_val&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dv&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;auc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;roc_auc_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_val&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;auc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;C=%s %.3f +- %.3f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;std&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Saving the Model with Pickle
&lt;/h2&gt;

&lt;p&gt;We open the file in &lt;code&gt;'wb'&lt;/code&gt; (write binary) mode to save the model. Save the DictVectorizer in the same file as the model, so that when the churn service loads it, the service can convert customer data from a dictionary into the feature matrix the model needs for predictions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pickle&lt;/span&gt;

&lt;span class="n"&gt;output_file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;model_C=1.0.bin&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;wb&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f_out&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;pickle&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dump&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;dv&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;f_out&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
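
&lt;p&gt;As a quick sanity check, the file can be read back right after saving. The sketch below uses only the standard library and two stand-in objects (a plain dict and a list, both hypothetical placeholders for the real &lt;code&gt;dv&lt;/code&gt; and &lt;code&gt;model&lt;/code&gt;) to show that a tuple written with &lt;code&gt;pickle.dump&lt;/code&gt; comes back in the same order from &lt;code&gt;pickle.load&lt;/code&gt;. The helper names are made up for this example.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os
import pickle
import tempfile

# Stand-ins for the fitted DictVectorizer and model (assumptions for illustration only)
dv_stub = {'feature_names': ['tenure', 'monthlycharges']}
model_stub = [0.1, -0.2]


def save_artifacts(path, dv, model):
    """Write (dv, model) as a single tuple in binary mode."""
    with open(path, 'wb') as f_out:
        pickle.dump((dv, model), f_out)


def load_artifacts(path):
    """Read the tuple back; the order matches the order used when saving."""
    with open(path, 'rb') as f_in:
        dv, model = pickle.load(f_in)
    return dv, model


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'model_stub.bin')
    save_artifacts(path, dv_stub, model_stub)
    dv_loaded, model_loaded = load_artifacts(path)

# Both comparisons should hold if the round trip preserved the objects
print(dv_loaded == dv_stub, model_loaded == model_stub)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Only load pickle files you created yourself or otherwise trust, since unpickling can execute arbitrary code.&lt;/p&gt;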






&lt;h2&gt;
  
  
  Creating the Churn Service
&lt;/h2&gt;

&lt;p&gt;Put the service in a Python script. The script loads the saved model once at startup and exposes a &lt;code&gt;/predict&lt;/code&gt; endpoint that accepts POST requests, since the client has to send customer data in the request body.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;predict.py&lt;/code&gt;:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pickle&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;flask&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Flask&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;jsonify&lt;/span&gt;

&lt;span class="n"&gt;model_file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;model_C=1.0.bin&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="c1"&gt;# Load the model
&lt;/span&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f_in&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;dv&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pickle&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f_in&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Flask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;churn&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@app.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/predict&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;methods&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="c1"&gt;# Get customer data from JSON request
&lt;/span&gt;    &lt;span class="n"&gt;customer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Transform and predict
&lt;/span&gt;    &lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;y_pred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict_proba&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;churn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;

    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;churn_probability&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;# Convert to native Python type
&lt;/span&gt;        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;churn&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;churn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Convert to native Python type
&lt;/span&gt;    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;jsonify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;debug&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;0.0.0.0&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;9696&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Running the Service
&lt;/h3&gt;

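&lt;p&gt;Launch the app on your local machine. The &lt;code&gt;--host=0.0.0.0&lt;/code&gt; flag binds the server to all network interfaces, so other machines on your network can reach it. The &lt;code&gt;--port=9696&lt;/code&gt; flag matches the port the client code uses; without it, &lt;code&gt;flask run&lt;/code&gt; listens on port 5000 by default. The &lt;code&gt;--debug&lt;/code&gt; flag auto-reloads the app when you save changes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;flask &lt;span class="nt"&gt;--app&lt;/span&gt; predict.py run &lt;span class="nt"&gt;--debug&lt;/span&gt; &lt;span class="nt"&gt;--host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0.0.0.0 &lt;span class="nt"&gt;--port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;9696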
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Querying the Service
&lt;/h3&gt;

&lt;p&gt;Now we'll send a POST request to our server with customer details and receive a churn prediction. Use a Jupyter notebook or separate Python script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;http://localhost:9696/predict&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="c1"&gt;# Sample test data from the test dataset
&lt;/span&gt;&lt;span class="n"&gt;customer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;customerid&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;4183-myfrb&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;gender&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;female&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;seniorcitizen&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;partner&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;no&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dependents&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;no&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tenure&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;21&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;phoneservice&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;yes&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;multiplelines&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;no&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;internetservice&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;fiber_optic&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;onlinesecurity&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;no&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;onlinebackup&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;yes&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;deviceprotection&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;yes&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;techsupport&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;no&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;streamingtv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;no&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;streamingmovies&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;yes&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;contract&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;month-to-month&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;paperlessbilling&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;yes&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;paymentmethod&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;electronic_check&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;monthlycharges&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;90.05&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;totalcharges&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1862.9&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;churn&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Sending promo email to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;customerid&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Not sending promo email to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;customerid&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The server answers the request with 200 OK:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4oz0t02xfbjh2jjkzflx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4oz0t02xfbjh2jjkzflx.png" alt="Server Response" width="800" height="241"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The query was successful:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi33tct7q21ukwz7pqq7a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi33tct7q21ukwz7pqq7a.png" alt="Query Result" width="800" height="318"&gt;&lt;/a&gt;&lt;/p&gt;
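
&lt;p&gt;If the &lt;code&gt;requests&lt;/code&gt; package is not installed, roughly the same POST can be sketched with the standard library. The helper names &lt;code&gt;build_predict_request&lt;/code&gt; and &lt;code&gt;send&lt;/code&gt; are made up for this example, and the customer record is shortened for brevity; a real call would send the full dictionary. Building the request is separated from sending it, so the first part can be inspected without a running server.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import urllib.request


def build_predict_request(url, customer):
    """Build a JSON POST request, similar in spirit to requests.post(url, json=customer)."""
    body = json.dumps(customer).encode('utf-8')
    return urllib.request.Request(
        url,
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


def send(request, timeout=5):
    """Send the request and decode the JSON reply (needs the service to be running)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode('utf-8'))


# Hypothetical, shortened customer record used only for this sketch
sample_customer = {'customerid': '4183-myfrb', 'tenure': 21, 'monthlycharges': 90.05}
req = build_predict_request('http://localhost:9696/predict', sample_customer)

# Inspect the request without contacting the server
print(req.get_method(), req.full_url)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With the service running, &lt;code&gt;send(req)&lt;/code&gt; should return the same kind of dictionary that &lt;code&gt;requests.post(...).json()&lt;/code&gt; gives.&lt;/p&gt;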




&lt;h2&gt;
  
  
  Docker Time!
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Understanding Docker Components
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;DOCKERFILE:&lt;/strong&gt; A text file (usually named &lt;code&gt;Dockerfile&lt;/code&gt;) containing a series of instructions for building Docker images. Each instruction adds a step to the build, and the resulting layers stack to form the image. Layers are cached: when you build the same image twice, the second build reuses the cache. When you change a line, Docker rebuilds that instruction and every instruction after it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;IMAGE:&lt;/strong&gt; The output of building a Dockerfile. Think of it as an executable—just like clicking an icon launches an application, you start an image to launch a container. The image encapsulates your application code and all dependencies, ensuring consistency across environments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CONTAINER:&lt;/strong&gt; A dynamic, running instance of a Docker image. One image can spawn many containers. On Linux, containers run as processes on the host machine. On Windows and macOS, Docker runs Linux containers inside a lightweight VM. Containers share the host kernel but have isolated file systems, so they look like VMs but are much lighter.&lt;/p&gt;
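
&lt;p&gt;The layer caching described for the Dockerfile can be pictured with a small Python model. This is a simplified sketch, not how Docker is implemented: it compares only the instruction text, while real Docker also invalidates a &lt;code&gt;COPY&lt;/code&gt; layer when the copied files change. The function name and the sample instructions are assumptions for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def layers_to_rebuild(previous, current):
    """Return the instructions that would be re-run under a simplified cache model.

    Layers are reused up to the first instruction that differs; that instruction
    and everything after it are rebuilt.
    """
    first_change = len(current)
    for index, instruction in enumerate(current):
        if index == len(previous) or previous[index] != instruction:
            first_change = index
            break
    return current[first_change:]


dockerfile_v1 = [
    'FROM python:3.12-slim',
    'RUN pip install pipenv',
    'COPY ["Pipfile", "Pipfile.lock", "./"]',
    'RUN pipenv install --system --deploy',
    'COPY ["app.py", "./"]',
]

# Changing only the last instruction leaves the dependency layers cached
dockerfile_v2 = dockerfile_v1[:4] + ['COPY ["app.py", "settings.py", "./"]']
print(layers_to_rebuild(dockerfile_v1, dockerfile_v2))

# Changing the base image invalidates every layer after it as well
dockerfile_v3 = ['FROM python:3.13-slim'] + dockerfile_v1[1:]
print(len(layers_to_rebuild(dockerfile_v1, dockerfile_v3)))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This is why dependency files are usually copied and installed before the application code: editing the code then reuses the slow dependency layers.&lt;/p&gt;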

&lt;h3&gt;
  
  
  Creating a Dockerfile
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create a new Dockerfile&lt;/span&gt;
&lt;span class="nb"&gt;touch &lt;/span&gt;Dockerfile

&lt;span class="c"&gt;# Open in your editor&lt;/span&gt;
code Dockerfile
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Important Notes:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Make sure your Python script name matches the name in the ENTRYPOINT layer&lt;/li&gt;
&lt;li&gt;Gunicorn is only for Unix-based systems; use Waitress for Windows&lt;/li&gt;
&lt;li&gt;Leave a space after every line of code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;Dockerfile&lt;/code&gt;:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; python:3.12-slim&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;pipenv

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="c"&gt;# Copy dependency files&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; ["Pipfile", "Pipfile.lock", "./"]&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;pipenv &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--system&lt;/span&gt; &lt;span class="nt"&gt;--deploy&lt;/span&gt;

&lt;span class="c"&gt;# Copy application files&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; ["predict.py", "model_C=1.0.bin", "./"]&lt;/span&gt;

&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; 9696&lt;/span&gt;

&lt;span class="c"&gt;# Start the application with Gunicorn&lt;/span&gt;
&lt;span class="k"&gt;ENTRYPOINT&lt;/span&gt;&lt;span class="s"&gt; ["gunicorn", "--bind=0.0.0.0:9696", "predict:app"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  CMD vs ENTRYPOINT
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;ENTRYPOINT&lt;/strong&gt; defines the main command that must always run—it's like the container's executable.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;ENTRYPOINT&lt;/span&gt;&lt;span class="s"&gt; ["python", "app.py"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running &lt;code&gt;docker run myapp&lt;/code&gt; executes &lt;code&gt;python app.py&lt;/code&gt;. You can pass parameters: &lt;code&gt;docker run myapp --debug&lt;/code&gt; becomes &lt;code&gt;python app.py --debug&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CMD&lt;/strong&gt; defines default arguments or a fallback command that can be completely overridden.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["python", "app.py"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running &lt;code&gt;docker run myapp bash&lt;/code&gt; overrides CMD and runs &lt;code&gt;bash&lt;/code&gt; instead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Combining Both:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;ENTRYPOINT&lt;/span&gt;&lt;span class="s"&gt; ["python", "app.py"]&lt;/span&gt;
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["--port=9696"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;docker run myapp&lt;/code&gt; → &lt;code&gt;python app.py --port=9696&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;docker run myapp --debug&lt;/code&gt; → &lt;code&gt;python app.py --debug&lt;/code&gt; (overrides CMD only)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;ENTRYPOINT can only be overridden with the &lt;code&gt;--entrypoint&lt;/code&gt; flag.&lt;/p&gt;
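
&lt;p&gt;The combination rules above can be summarized in a few lines of Python. This is only a model of the exec-form behavior described in this section; it ignores the shell form and the &lt;code&gt;--entrypoint&lt;/code&gt; flag, and the function name is made up for the example.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def effective_command(entrypoint, cmd, run_args):
    """Compose the command a container starts with (exec form, simplified).

    Arguments given after the image name in `docker run` replace CMD;
    ENTRYPOINT is kept and always comes first.
    """
    arguments = run_args if run_args else cmd
    return list(entrypoint) + list(arguments)


# ENTRYPOINT ["python", "app.py"] combined with CMD ["--port=9696"]
default_start = effective_command(['python', 'app.py'], ['--port=9696'], [])
debug_start = effective_command(['python', 'app.py'], ['--port=9696'], ['--debug'])

# CMD ["python", "app.py"] on its own: `docker run myapp bash` replaces it entirely
cmd_only_override = effective_command([], ['python', 'app.py'], ['bash'])

print(' '.join(default_start))
print(' '.join(debug_start))
print(' '.join(cmd_only_override))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;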

&lt;h3&gt;
  
  
  Multistage Dockerfiles
&lt;/h3&gt;

&lt;p&gt;For Machine Learning applications, you often need a large environment to build/train your model but only a small runtime environment to serve predictions. Multistage Dockerfiles help create:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Smaller images&lt;/strong&gt; – no unused dependencies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;More secure&lt;/strong&gt; – fewer libraries = less attack surface&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Faster deployment&lt;/strong&gt; – smaller images push/pull faster&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Better maintainability&lt;/strong&gt; – clean separation of concerns&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Building and Running the Container
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Build the Docker image:&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;churn-prediction&lt;/code&gt; is just a name for the Docker image; you can call it anything, as long as you pass the same name to the &lt;code&gt;docker run&lt;/code&gt; command.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker build &lt;span class="nt"&gt;-t&lt;/span&gt; churn-prediction &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzxegmwhptmwn7tzvjk6f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzxegmwhptmwn7tzvjk6f.png" alt="Docker Build" width="800" height="318"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Run the container:&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;PS&lt;/strong&gt;: The errors you see in the terminal of the running container appear because I installed scikit-learn==1.6.1 instead of 1.5.1; otherwise the model works just fine.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ayc83bpj326a95mcklz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ayc83bpj326a95mcklz.png" alt=" " width="800" height="269"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-it&lt;/span&gt; &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; 9696:9696 churn-prediction
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Docker Commands Reference
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Managing Images
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# List all local images&lt;/span&gt;
docker image &lt;span class="nb"&gt;ls&lt;/span&gt;

&lt;span class="c"&gt;# Run an image&lt;/span&gt;
docker run churn-prediction
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Docker first looks for the image among those stored locally. If it is not found there, Docker tries to pull it from Docker Hub. You can also use custom registries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Run an image from a custom registry (the image reference takes no URL scheme)&lt;/span&gt;
docker run registrydomain.com/repository-server:0.1.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
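
&lt;p&gt;To see how an image name maps to a registry, here is a rough Python sketch of how a reference splits into registry, repository, and tag. It assumes the usual convention that the first path component counts as a registry only if it contains a dot or a colon or is &lt;code&gt;localhost&lt;/code&gt;; it does not handle digests and does not add the &lt;code&gt;library/&lt;/code&gt; prefix that Docker Hub uses for official images. The function name is hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def split_image_reference(reference):
    """Split an image reference into (registry, repository, tag), simplified."""
    first, slash, rest = reference.partition('/')
    if slash and ('.' in first or ':' in first or first == 'localhost'):
        # The first component looks like a host name, so treat it as the registry
        registry, remainder = first, rest
    else:
        # No registry given: Docker falls back to Docker Hub
        registry, remainder = 'docker.io', reference
    repository, colon, tag = remainder.partition(':')
    if not colon:
        tag = 'latest'  # default tag when none is given
    return registry, repository, tag


print(split_image_reference('churn-prediction'))
print(split_image_reference('python:3.12-slim'))
print(split_image_reference('registrydomain.com/repository-server:0.1.0'))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;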



&lt;h3&gt;
  
  
  Managing Containers
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;List containers:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# List running containers&lt;/span&gt;
docker container &lt;span class="nb"&gt;ls&lt;/span&gt;
&lt;span class="c"&gt;# or&lt;/span&gt;
docker ps

&lt;span class="c"&gt;# List all containers (including stopped)&lt;/span&gt;
docker container &lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;--all&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Start, stop, and remove containers:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Stop a container&lt;/span&gt;
docker container stop &amp;lt;container-id&amp;gt;

&lt;span class="c"&gt;# Restart a container&lt;/span&gt;
docker container restart &amp;lt;container-id&amp;gt;

&lt;span class="c"&gt;# Remove a stopped container&lt;/span&gt;
docker container &lt;span class="nb"&gt;rm&lt;/span&gt; &amp;lt;container-id&amp;gt;

&lt;span class="c"&gt;# Kill a running container&lt;/span&gt;
docker &lt;span class="nb"&gt;kill&lt;/span&gt; &amp;lt;container-id&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Cleanup Commands
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Remove stopped containers, unused networks, dangling images, and build cache&lt;/span&gt;
docker system prune

&lt;span class="c"&gt;# Also remove every image not used by a container&lt;/span&gt;
docker system prune &lt;span class="nt"&gt;-a&lt;/span&gt;

&lt;span class="c"&gt;# Also remove volumes (deletes data!)&lt;/span&gt;
docker system prune &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nt"&gt;--volumes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Accessing Containers
&lt;/h3&gt;

&lt;p&gt;For debugging purposes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Access a container's shell&lt;/span&gt;
docker &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; &amp;lt;container-id&amp;gt; bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;









&lt;h2&gt;
  
  
  Deploying to AWS Elastic Beanstalk
&lt;/h2&gt;

&lt;p&gt;Now that we have our containerized application, let's deploy it to AWS Elastic Beanstalk using the EB CLI.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;p&gt;Before deploying, ensure you have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An AWS account&lt;/li&gt;
&lt;li&gt;AWS credentials configured &lt;/li&gt;
&lt;li&gt;A credit card to verify and activate your account&lt;/li&gt;
&lt;li&gt;Your Docker image working locally&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Other alternatives (some with free tiers) include Render, Railway, Fly.io, and Heroku. The deployment process follows much the same logic on each; you only need to check their documentation for the specific commands. For now, let's use AWS Elastic Beanstalk.&lt;/p&gt;

&lt;h3&gt;
  
  
  Installing the EB CLI
&lt;/h3&gt;

&lt;p&gt;First, install the Elastic Beanstalk CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install EB CLI using pipenv (recommended for this project)&lt;/span&gt;
pipenv &lt;span class="nb"&gt;install &lt;/span&gt;awsebcli &lt;span class="nt"&gt;--dev&lt;/span&gt;

&lt;span class="c"&gt;# Or install globally using pip&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;awsebcli
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify the installation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;eb &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Configuring AWS Credentials
&lt;/h3&gt;

&lt;p&gt;If you haven't configured your AWS credentials yet:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Configure AWS CLI&lt;/span&gt;
aws configure
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll be prompted to enter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS Access Key ID&lt;/li&gt;
&lt;li&gt;AWS Secret Access Key&lt;/li&gt;
&lt;li&gt;Default region (e.g., &lt;code&gt;us-east-1&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Default output format (e.g., &lt;code&gt;json&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Initializing Elastic Beanstalk
&lt;/h3&gt;

&lt;p&gt;Navigate to your project directory and initialize EB:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Initialize Elastic Beanstalk&lt;/span&gt;
eb init &lt;span class="nt"&gt;-p&lt;/span&gt; docker &lt;span class="nt"&gt;-r&lt;/span&gt; us-east-1 churn-prediction
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Flags explained:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;-p docker&lt;/code&gt;: Specifies the platform (Docker)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;-r us-east-1&lt;/code&gt;: AWS region (choose your preferred region)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;churn-prediction&lt;/code&gt;: Your application name&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Depending on which flags you pass, you may be prompted to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Select your region&lt;/li&gt;
&lt;li&gt;Enter application name (or accept default)&lt;/li&gt;
&lt;li&gt;Choose whether to use CodeCommit (typically select "no")&lt;/li&gt;
&lt;li&gt;Set up SSH for your instances (recommended for debugging)&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Creating an Environment
&lt;/h3&gt;

&lt;p&gt;Create an environment to run your application:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create an environment named 'churn-prediction-env'&lt;/span&gt;
eb create churn-prediction-env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This process takes several minutes as AWS:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Creates an EC2 instance&lt;/li&gt;
&lt;li&gt;Sets up load balancers&lt;/li&gt;
&lt;li&gt;Configures security groups&lt;/li&gt;
&lt;li&gt;Deploys your Docker container&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You'll see real-time logs of the deployment process.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi15t3vepksign5i2a6yj.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi15t3vepksign5i2a6yj.jpeg" alt=" " width="280" height="180"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Monitoring Deployment Status
&lt;/h3&gt;

&lt;p&gt;Check the status of your environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check environment status&lt;/span&gt;
eb status

&lt;span class="c"&gt;# View recent events&lt;/span&gt;
eb events

&lt;span class="c"&gt;# Follow logs in real-time&lt;/span&gt;
eb logs &lt;span class="nt"&gt;--stream&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Testing Your Deployed Application
&lt;/h3&gt;

&lt;p&gt;Once deployment is complete, get your application's URL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Open your application in a browser&lt;/span&gt;
eb open
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or manually test the endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="c1"&gt;# Replace with your actual EB URL
&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;http://churn-prediction-env.us-east-1.elasticbeanstalk.com/predict&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="n"&gt;customer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;customerid&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;4183-myfrb&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;gender&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;female&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;seniorcitizen&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;partner&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;no&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dependents&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;no&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tenure&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;21&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;phoneservice&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;yes&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;multiplelines&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;no&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;internetservice&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;fiber_optic&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;onlinesecurity&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;no&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;onlinebackup&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;yes&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;deviceprotection&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;yes&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;techsupport&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;no&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;streamingtv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;no&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;streamingmovies&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;yes&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;contract&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;month-to-month&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;paperlessbilling&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;yes&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;paymentmethod&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;electronic_check&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;monthlycharges&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;90.05&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;totalcharges&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1862.9&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
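
&lt;p&gt;The script above sends the request without a timeout, so it can hang while the environment is still starting. As a minimal sketch, assuming you prefer to depend only on the Python standard library, the same call can be written with an explicit timeout. The helper name &lt;code&gt;post_json&lt;/code&gt; and the 10-second default are illustrative assumptions, not part of the original service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import urllib.request


def post_json(url, payload, timeout=10.0):
    """POST payload as JSON and return the decoded JSON response.

    Hypothetical helper: the name and the default timeout are assumptions.
    urlopen raises urllib.error.HTTPError for 4xx/5xx responses and
    urllib.error.URLError if the host cannot be reached.
    """
    body = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # The timeout (in seconds) stops the call from blocking indefinitely.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;You would call it as &lt;code&gt;post_json(url, customer)&lt;/code&gt; with the &lt;code&gt;url&lt;/code&gt; and &lt;code&gt;customer&lt;/code&gt; values defined earlier.&lt;/p&gt;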



&lt;h3&gt;
  
  
  Updating Your Application
&lt;/h3&gt;

&lt;p&gt;When you make changes to your code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Deploy updates&lt;/span&gt;
eb deploy

&lt;span class="c"&gt;# Monitor deployment&lt;/span&gt;
eb status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Environment Configuration
&lt;/h3&gt;

&lt;p&gt;You can modify environment variables and settings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Set environment variables&lt;/span&gt;
eb setenv &lt;span class="nv"&gt;FLASK_ENV&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;production &lt;span class="nv"&gt;MODEL_VERSION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1.0

&lt;span class="c"&gt;# View current configuration&lt;/span&gt;
eb config
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
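
&lt;p&gt;Variables set with &lt;code&gt;eb setenv&lt;/code&gt; should be visible to the application as ordinary environment variables. A minimal sketch of reading them inside the service, assuming the two variable names from the command above; the function name &lt;code&gt;load_settings&lt;/code&gt; and the fallback values are hypothetical:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os


def load_settings(environ=None):
    """Read service settings from environment variables.

    Hypothetical helper: the keys match the eb setenv example,
    while the fallback values are assumptions for local runs.
    """
    env = os.environ if environ is None else environ
    return {
        "flask_env": env.get("FLASK_ENV", "development"),
        "model_version": env.get("MODEL_VERSION", "unknown"),
    }


if __name__ == "__main__":
    # Prints the fallback values unless the variables are set.
    print(load_settings())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Passing a plain dictionary instead of relying on &lt;code&gt;os.environ&lt;/code&gt; keeps the function easy to test locally.&lt;/p&gt;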



&lt;h3&gt;
  
  
  Scaling Your Application
&lt;/h3&gt;

&lt;p&gt;Scale your application based on traffic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Enable auto-scaling (via AWS Console or CLI)&lt;/span&gt;
&lt;span class="c"&gt;# Minimum 1 instance, maximum 4 instances&lt;/span&gt;
eb scale 2  &lt;span class="c"&gt;# Set to 2 instances&lt;/span&gt;

&lt;span class="c"&gt;# Or configure auto-scaling through the console&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Monitoring and Logs
&lt;/h3&gt;

&lt;p&gt;Access logs and monitoring:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Download logs&lt;/span&gt;
eb logs

&lt;span class="c"&gt;# Stream logs in real-time&lt;/span&gt;
eb logs &lt;span class="nt"&gt;--stream&lt;/span&gt;

&lt;span class="c"&gt;# View health status&lt;/span&gt;
eb health
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Troubleshooting Common Issues
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Issue: Deployment fails&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check detailed logs&lt;/span&gt;
eb logs

&lt;span class="c"&gt;# SSH into the instance for debugging&lt;/span&gt;
eb ssh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Issue: Health status is degraded&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Check if the application is responding on the correct port (9696)&lt;/li&gt;
&lt;li&gt;Verify the Docker container is running&lt;/li&gt;
&lt;li&gt;Check environment variables&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Issue: Connection timeout&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ensure security groups allow inbound traffic on port 80/443&lt;/li&gt;
&lt;li&gt;Verify the load balancer health check settings&lt;/li&gt;
&lt;/ul&gt;
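
&lt;p&gt;Both checklists above come down to whether something is accepting connections on the expected port. A small sketch for checking that from Python against whichever host and port you want to verify. The function name &lt;code&gt;is_port_open&lt;/code&gt;, the 3-second timeout, and the &lt;code&gt;localhost&lt;/code&gt; example are assumptions for illustration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: it only proves that something is listening,
    not that the application behind the port is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


if __name__ == "__main__":
    # 9696 is the port mentioned above; localhost assumes a container
    # published on your own machine with that port mapped.
    print(is_port_open("localhost", 9696))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;To check the deployed environment instead, pass its hostname and port 80.&lt;/p&gt;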

&lt;h3&gt;
  
  
  Cost Management
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt; Elastic Beanstalk environments incur costs. To avoid unnecessary charges:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Terminate the environment when not in use&lt;/span&gt;
eb terminate churn-prediction-env

&lt;span class="c"&gt;# Or just stop it temporarily (still incurs some costs)&lt;/span&gt;
eb stop
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Configuration Files
&lt;/h3&gt;

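&lt;p&gt;For more control, create an &lt;code&gt;.ebextensions&lt;/code&gt; directory with configuration files. Note that the &lt;code&gt;aws:elasticbeanstalk:container:python&lt;/code&gt; namespace in the example is specific to Elastic Beanstalk's Python platform; with the Docker platform used in this guide, the start command comes from your Dockerfile.&lt;/p&gt;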

&lt;p&gt;&lt;strong&gt;&lt;code&gt;.ebextensions/01_flask.config&lt;/code&gt;:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;option_settings&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;aws:elasticbeanstalk:application:environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;PYTHONPATH&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/var/app/current:$PYTHONPATH"&lt;/span&gt;
  &lt;span class="na"&gt;aws:elasticbeanstalk:container:python&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;WSGIPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;predict:app&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Best Practices
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Use environment variables&lt;/strong&gt; for sensitive data (API keys, database credentials)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set up health checks&lt;/strong&gt; to ensure your application is responding correctly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enable logging&lt;/strong&gt; to CloudWatch for better monitoring&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use HTTPS&lt;/strong&gt; by configuring SSL certificates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implement auto-scaling&lt;/strong&gt; based on your traffic patterns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tag your resources&lt;/strong&gt; for better cost tracking&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Useful EB CLI Commands Summary
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Initialize EB in your project&lt;/span&gt;
eb init

&lt;span class="c"&gt;# Create a new environment&lt;/span&gt;
eb create &amp;lt;environment-name&amp;gt;

&lt;span class="c"&gt;# Deploy application updates&lt;/span&gt;
eb deploy

&lt;span class="c"&gt;# Open application in browser&lt;/span&gt;
eb open

&lt;span class="c"&gt;# Check environment status&lt;/span&gt;
eb status

&lt;span class="c"&gt;# View logs&lt;/span&gt;
eb logs

&lt;span class="c"&gt;# SSH into instance&lt;/span&gt;
eb ssh

&lt;span class="c"&gt;# Terminate environment&lt;/span&gt;
eb terminate

&lt;span class="c"&gt;# List all environments&lt;/span&gt;
eb list

&lt;span class="c"&gt;# Set environment variables&lt;/span&gt;
eb setenv &lt;span class="nv"&gt;KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;VALUE

&lt;span class="c"&gt;# Scale application&lt;/span&gt;
eb scale &amp;lt;number-of-instances&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;You've successfully:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Trained a machine learning model&lt;/li&gt;
&lt;li&gt;Packaged it with Pickle&lt;/li&gt;
&lt;li&gt;Created a Flask web service&lt;/li&gt;
&lt;li&gt;Containerized the application with Docker&lt;/li&gt;
&lt;li&gt;Deployed it to AWS Elastic Beanstalk&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The containerized approach ensures consistency across environments, and AWS Elastic Beanstalk handles scaling, monitoring, and infrastructure management automatically. Congratulations on learning how to deploy your models. See you in the next one.&lt;/p&gt;




&lt;h2&gt;
  
  
  Additional Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/elasticbeanstalk/" rel="noopener noreferrer"&gt;AWS Elastic Beanstalk Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/eb-cli3.html" rel="noopener noreferrer"&gt;EB CLI Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.docker.com/" rel="noopener noreferrer"&gt;Docker Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://flask.palletsprojects.com/" rel="noopener noreferrer"&gt;Flask Documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>machinelearning</category>
      <category>docker</category>
      <category>containers</category>
      <category>pipenv</category>
    </item>
    <item>
      <title>Microsoft Just Laid Off 6,000 Employees (But Why?): Is AI to Blame?</title>
      <dc:creator>Mendy Kevin</dc:creator>
      <pubDate>Fri, 23 May 2025 06:36:21 +0000</pubDate>
      <link>https://forem.com/mendy_kevin_94ec1db73e1df/microsoft-just-laid-off-6000-employeesis-ai-to-blame-54jk</link>
      <guid>https://forem.com/mendy_kevin_94ec1db73e1df/microsoft-just-laid-off-6000-employeesis-ai-to-blame-54jk</guid>
      <description>&lt;p&gt;Microsoft just laid off 6,000 employees, and the internet has, as always, freaked out. You have probably seen the headlines: "AI is replacing all the developers!" or "Tech is dead!" But before we spiral into yet another panic cycle, it’s worth taking a step back to assess what’s actually happening.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tech Layoffs Aren’t New
&lt;/h2&gt;

&lt;p&gt;Let’s set the record straight. Layoffs in tech aren’t some AI-induced anomaly. They’ve occurred consistently over the years, long before artificial intelligence became a mainstream buzzword.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;2014: Microsoft laid off 18,000 after acquiring Nokia.&lt;/li&gt;
&lt;li&gt;2015: HP slashed 30,000 jobs during a major restructure.&lt;/li&gt;
&lt;li&gt;2016: Intel let go of 12,000, shifting away from legacy PC products.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;IBM, Yahoo, and plenty of other giants have done the same. Companies change direction. They restructure. They cut costs. Layoffs like these are about strategy, not science fiction.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Did These Layoffs Mean?
&lt;/h2&gt;

&lt;p&gt;These layoffs were strategic, part of larger business pivots. It’s called restructuring, not extinction. When tech companies lay people off, it’s often a sign not of decline, but of redirection. Intel’s layoffs, for example, weren’t just about trimming the workforce. They were about shifting focus to emerging sectors like cloud computing and smart devices. Over time, that strategic move created new opportunities and thousands of jobs in those high-growth areas. This is what a corporate pivot looks like: &lt;strong&gt;reducing investment in legacy operations to reinvest in the future.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Social Clickbait Circus
&lt;/h2&gt;

&lt;p&gt;But of course, social media clowns turn it into something bigger than it really is. They don't waste a second before yelling, “AI took their jobs!” Something goes wrong in tech? "Blame AI." Plenty of news outlets do the same: they &lt;em&gt;spin fear into clicks&lt;/em&gt; because fear gets attention. They're not trying to explain what's actually happening. They're just &lt;em&gt;chasing views&lt;/em&gt;. It’s no different from the vaccine panic during the pandemic, when nearly every misfortune was blamed on vaccines. Heart attacks, car crashes, even bad weather. Someone stubbed their toe? Must be the vaccine. Today, AI has become the new scapegoat. If something goes wrong in tech, AI is immediately to blame.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tech Isn’t Dying — It’s Evolving
&lt;/h2&gt;

&lt;p&gt;The truth, however, is far more grounded. Companies still need engineers, not just people who know how to write code. I'm talking about real engineers: people who can think clearly, write clean, maintainable code, understand how systems behave, and solve real-world problems. With AI in the picture, the tech career is entering a new chapter. We'll be building tools and platforms we can't even imagine today. These tools will create new challenges. And those challenges will need people like you (thoughtful, capable engineers, solutions architects, you name it) to solve them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Vibe Coders Won’t Survive
&lt;/h2&gt;

&lt;p&gt;Unfortunately, vibe coding has become a trend lately. People paste prompts into an LLM and call themselves engineers, yet they don't understand data structures, design patterns, system design, security, or even how to debug. Relying solely on AI-generated output without the ability to evaluate, improve, or even understand it leaves you dangerously unprepared. It's like flying a plane by watching YouTube tutorials midair. That is reckless, and it is unsustainable. The reality is, if your skills begin and end with copying code without comprehension, you're not just replaceable by AI. You’re irrelevant.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Industry Needs Now
&lt;/h2&gt;

&lt;p&gt;If you want to keep growing in the industry and stay relevant, you need real skills. Solid fundamentals. Deep understanding of how things work under the hood. Companies are going to pay you for what you actually know, not for the vibes. For example, did you know the demands on engineers are going up, not down? Back in the '80s or '90s, most tech job listings focused on knowing one language, a bit of databases, maybe some basic algorithms. But today, you have to know multiple languages, frameworks, CI/CD, cloud platforms, architecture, APIs, security, automated testing; the list goes on. And tomorrow it'll be even more. AI will help us automate boring, repetitive tasks and let us focus on bigger, more complex problems. So to some extent, AI will make it possible for everyone to produce code, just as anyone can use a calculator but not everyone can do math.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;If all the noise lately has left you feeling anxious, take heart. Don't let the social media clowns get to you. Don't let fear-based headlines distract you. Stay curious, keep learning, keep growing. The future doesn't belong to people yelling on the internet. It belongs to the ones who show up, level up, and keep building.&lt;/p&gt;

&lt;p&gt;That’s all for now. Let’s keep pushing boundaries with AI. I'll see you in the next one.&lt;/p&gt;

</description>
      <category>microsoft</category>
      <category>layoffs</category>
      <category>ai</category>
      <category>career</category>
    </item>
    <item>
      <title>Ultimate Guide to Creating a Pipeline (Apache Airflow)</title>
      <dc:creator>Mendy Kevin</dc:creator>
      <pubDate>Thu, 22 May 2025 03:28:08 +0000</pubDate>
      <link>https://forem.com/mendy_kevin_94ec1db73e1df/ultimate-guide-to-creating-a-pipelineapache-airflow-165b</link>
      <guid>https://forem.com/mendy_kevin_94ec1db73e1df/ultimate-guide-to-creating-a-pipelineapache-airflow-165b</guid>
      <description>&lt;p&gt;Hello there, data enthusiasts. Today's guide walks you through building a complete data pipeline using Apache Airflow. Apache Airflow is an open-source platform designed to programmatically author, schedule, and monitor workflows. A workflow (such as an ETL process, machine learning pipeline, or reporting task) is a directed sequence of dependent tasks that transforms raw data into valuable output. In this article, we’ll cover setting up WSL (for Windows users), installing PostgreSQL via the terminal, setting up and configuring Apache Airflow, and creating your first DAG (a Python-defined graph representing the workflow). We’ll conclude by executing the DAG and observing it in action. So buckle up and let's dive in.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installing WSL
&lt;/h2&gt;

&lt;p&gt;We'll begin by installing the Windows Subsystem for Linux (WSL), which allows Windows users to run a native Linux environment directly on Windows—ideal for working with tools like Apache Airflow.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Open PowerShell as Administrator
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;wsl &lt;span class="nt"&gt;--install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command installs WSL 2 as the default with the latest Ubuntu distribution and required kernel updates. Restart your machine if prompted.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8qlpsxdwptzq7a0o820x.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8qlpsxdwptzq7a0o820x.jpeg" alt="Image description" width="800" height="266"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Verify Installation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;wsl &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fny9i53isvdqxbkfsoddh.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fny9i53isvdqxbkfsoddh.jpeg" alt="Image description" width="765" height="430"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Launch Ubuntu
&lt;/h3&gt;

&lt;p&gt;Search for Ubuntu in the Start menu, launch it, and set up your UNIX username and password.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftsdlsxou6tknqmjpb6br.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftsdlsxou6tknqmjpb6br.jpeg" alt="Image description" width="800" height="444"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From here on, all commands in this guide are run in the Ubuntu terminal. Linux users already have this natively; we have just set it up for Windows users.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting up PostgreSQL
&lt;/h2&gt;

&lt;p&gt;With your Ubuntu account set up, let's proceed to installing PostgreSQL, which will serve as the metadata database for Apache Airflow, storing DAG states, task history, logs, and configurations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Update package lists
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt update
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Install PostgreSQL
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install &lt;/span&gt;postgresql postgresql-contrib
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Start the PostgreSQL Service
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;service postgresql start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check whether postgresql is running&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;service postgresql status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4: Access the PostgreSQL Shell
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; postgres
psql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see the &lt;code&gt;postgres=#&lt;/code&gt; prompt.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5: Create Airflow Database and User
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;DATABASE&lt;/span&gt; &lt;span class="n"&gt;airflow&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;USER&lt;/span&gt; &lt;span class="n"&gt;airflow&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;PASSWORD&lt;/span&gt; &lt;span class="s1"&gt;'airflowpass'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt; &lt;span class="k"&gt;PRIVILEGES&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;DATABASE&lt;/span&gt; &lt;span class="n"&gt;airflow&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;airflow&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Setting up Apache Airflow
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Install Python in your environment
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install &lt;/span&gt;python3 python3-venv python3-pip
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 1: Create and Activate a Virtual Environment
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python3.10 &lt;span class="nt"&gt;-m&lt;/span&gt; venv airflow_env &lt;span class="c"&gt;# Airflow 2.8.1 supports Python 3.8 through 3.11&lt;/span&gt;
&lt;span class="nb"&gt;source &lt;/span&gt;airflow_env/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;N/B:&lt;/strong&gt; Always activate your virtual environment before starting Apache Airflow.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Set Environment Variables for Airflow
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;AIRFLOW_HOME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;~/airflow
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;AIRFLOW__DATABASE__SQL_ALCHEMY_CONN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'postgresql+psycopg2://airflow:airflowpass@localhost:5432/airflow'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace &lt;code&gt;'airflowpass'&lt;/code&gt; with your actual password if different.&lt;/p&gt;
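
&lt;p&gt;If your password contains characters such as &lt;code&gt;@&lt;/code&gt;, &lt;code&gt;:&lt;/code&gt; or &lt;code&gt;/&lt;/code&gt;, they need to be percent-encoded before they go into the connection string. The sketch below is one way to build the string safely; the helper name &lt;code&gt;build_conn_url&lt;/code&gt; and the sample password are assumptions made for illustration, not part of Airflow.&lt;/p&gt;

```python
from urllib.parse import quote


def build_conn_url(user, password, host="localhost", port=5432, database="airflow"):
    """Return a SQLAlchemy-style PostgreSQL URL with the credentials percent-encoded."""
    # safe="" makes quote() escape '/', '@' and ':' too, so they are not
    # mistaken for URL delimiters.
    return (
        f"postgresql+psycopg2://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


if __name__ == "__main__":
    # Hypothetical password containing reserved characters.
    print(build_conn_url("airflow", "p@ss:word/1"))
```

&lt;p&gt;Paste the printed value into the &lt;code&gt;export&lt;/code&gt; line above in place of the plain connection string.&lt;/p&gt;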

&lt;h3&gt;
  
  
  Step 3: Install Apache Airflow with PostgreSQL Support
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;AIRFLOW_VERSION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;2.8.1
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PYTHON_VERSION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;python3 &lt;span class="nt"&gt;--version&lt;/span&gt; | &lt;span class="nb"&gt;cut&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s2"&gt;" "&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; 2 | &lt;span class="nb"&gt;cut&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s2"&gt;"."&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; 1,2&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;CONSTRAINT_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"https://raw.githubusercontent.com/apache/airflow/constraints-&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;AIRFLOW_VERSION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/constraints-&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PYTHON_VERSION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.txt"&lt;/span&gt;

pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s2"&gt;"apache-airflow[postgres]==&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;AIRFLOW_VERSION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--constraint&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CONSTRAINT_URL&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
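
&lt;p&gt;The constraint URL pins every dependency to versions tested with a given Airflow release and Python minor version. As a rough illustration of how the shell variables above combine, here is the same logic in Python; the function name &lt;code&gt;constraint_url&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
import sys


def constraint_url(airflow_version, python_version=None):
    """Build the constraints-file URL that the pip command above points at."""
    if python_version is None:
        # Major.minor of the running interpreter, e.g. "3.10".
        python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )


if __name__ == "__main__":
    print(constraint_url("2.8.1"))
```

&lt;p&gt;Running it inside the activated virtual environment should print the same URL that &lt;code&gt;echo $CONSTRAINT_URL&lt;/code&gt; shows in the shell.&lt;/p&gt;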



&lt;h3&gt;
  
  
  Step 4: Initialize the Airflow Database
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;airflow db init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



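&lt;p&gt;&lt;strong&gt;N/B:&lt;/strong&gt; Run this command only after a fresh install of Apache Airflow or when you create a new environment for it. From Airflow 2.7 onward, &lt;code&gt;airflow db init&lt;/code&gt; is deprecated in favor of &lt;code&gt;airflow db migrate&lt;/code&gt;, although it still works in 2.8.1.&lt;/p&gt;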

&lt;h3&gt;
  
  
  Step 5: Create Admin User
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;airflow &lt;span class="nb"&gt;users &lt;/span&gt;create &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--username&lt;/span&gt; admin &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--firstname&lt;/span&gt; Admin &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--lastname&lt;/span&gt; User &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--role&lt;/span&gt; Admin &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--email&lt;/span&gt; admin@example.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 6: Start Webserver and Scheduler
&lt;/h3&gt;

&lt;p&gt;Open another Ubuntu terminal instance, activate the virtual environment, and run each of the following commands in its own terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;airflow webserver &lt;span class="nt"&gt;--port&lt;/span&gt; 8080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flwdnxvvibqzggyr8ccw3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flwdnxvvibqzggyr8ccw3.png" alt="Image description" width="800" height="856"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;airflow scheduler
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe7autpevl20peno10sej.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe7autpevl20peno10sej.png" alt="Image description" width="800" height="856"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;N/B:&lt;/strong&gt; Restart the scheduler after rebooting your device or closing its terminal instance. If changes to your DAG &lt;code&gt;.py&lt;/code&gt; files do not show up in the UI, restarting the scheduler also forces them to be re-read.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 7: Access the Airflow UI
&lt;/h3&gt;

&lt;p&gt;Open your browser and go to: &lt;a href="http://localhost:8080" rel="noopener noreferrer"&gt;http://localhost:8080&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F365u07lxkpyiew8yx7qg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F365u07lxkpyiew8yx7qg.png" alt="Image description" width="800" height="269"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Proper Configuration of Airflow
&lt;/h2&gt;

&lt;p&gt;Before we create our first DAG, let's tune Apache Airflow by checking a few of its configuration settings.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Editing the &lt;code&gt;airflow.cfg&lt;/code&gt; file
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;source &lt;/span&gt;airflow_env/bin/activate
&lt;span class="nb"&gt;cd &lt;/span&gt;airflow
&lt;span class="nb"&gt;ls &lt;/span&gt;airflow.cfg
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjw8dr3fmtb3g4vy4bco8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjw8dr3fmtb3g4vy4bco8.png" alt="Image description" width="800" height="196"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now edit the file using nano:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nano airflow.cfg
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;default_timezone&lt;/strong&gt;
&lt;code&gt;default_timezone = Africa/Nairobi&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;executor&lt;/strong&gt;
&lt;code&gt;executor = LocalExecutor&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;load_examples&lt;/strong&gt;
&lt;code&gt;load_examples = False&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;sql_alchemy_conn&lt;/strong&gt;
&lt;code&gt;sql_alchemy_conn = postgresql+psycopg2://airflow:&amp;lt;password&amp;gt;@localhost:5432/airflow&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
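
&lt;p&gt;After editing, it can help to confirm that the file really holds the values you intended. The following is a minimal sketch that uses Python's standard &lt;code&gt;configparser&lt;/code&gt; to compare a config text against the settings listed above. The names &lt;code&gt;EXPECTED&lt;/code&gt;, &lt;code&gt;find_mismatches&lt;/code&gt; and &lt;code&gt;SAMPLE&lt;/code&gt; are made up for this example, and &lt;code&gt;sql_alchemy_conn&lt;/code&gt; is left out because it contains a password.&lt;/p&gt;

```python
from configparser import ConfigParser

# Settings this guide asks you to confirm in airflow.cfg.
EXPECTED = {
    "default_timezone": "Africa/Nairobi",
    "executor": "LocalExecutor",
    "load_examples": "False",
}


def find_mismatches(cfg_text, expected=EXPECTED):
    """Return {option: found_value} for each expected option that differs or is missing."""
    # interpolation=None because airflow.cfg may contain literal '%' characters.
    parser = ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    found = {}
    for section in parser.sections():
        for option, value in parser.items(section):
            found.setdefault(option, value)
    return {key: found.get(key) for key, want in expected.items() if found.get(key) != want}


# Hypothetical excerpt standing in for a freshly generated airflow.cfg.
SAMPLE = """
[core]
default_timezone = utc
executor = LocalExecutor
load_examples = True
"""

if __name__ == "__main__":
    print(find_mismatches(SAMPLE))
```

&lt;p&gt;To check your real file, pass in its contents instead of &lt;code&gt;SAMPLE&lt;/code&gt;, for example by reading &lt;code&gt;~/airflow/airflow.cfg&lt;/code&gt; with &lt;code&gt;pathlib.Path&lt;/code&gt;. An empty dictionary means every listed setting matches.&lt;/p&gt;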

&lt;p&gt;To save: &lt;code&gt;ctrl + x&lt;/code&gt;, press &lt;code&gt;y&lt;/code&gt;, press &lt;code&gt;enter&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Creating a dags folder
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;dags
&lt;span class="nb"&gt;cd &lt;/span&gt;dags
&lt;span class="nb"&gt;touch &lt;/span&gt;DAG.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open VS Code and paste the following into &lt;code&gt;DAG.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;airflow&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DAG&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;airflow.operators.python&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PythonOperator&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;

&lt;span class="n"&gt;DB_CONFIG&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dbname&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgres&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgres&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;password&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_password&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# replace with your own password&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;host&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;localhost&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;port&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5432&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.coingecko.com/api/v3/simple/price?ids=bitcoin,ethereum&amp;amp;vs_currencies=usd&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
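    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# fail the task on an HTTP error instead of loading bad data&lt;/span&gt;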
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_data&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;coin&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;usd&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;coin&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;raw_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;DB_CONFIG&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;cur&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        CREATE SCHEMA IF NOT EXISTS crypto;
        CREATE TABLE IF NOT EXISTS crypto.crypto_prices (
            coin TEXT,
            usd_price NUMERIC,
            timestamp TIMESTAMP
        );
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;executemany&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INSERT INTO crypto.crypto_prices (coin, usd_price, timestamp) VALUES (%s, %s, %s);&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;etl&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extract&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;clean&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clean&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;DAG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;dag_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;simple_coingecko_etl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;start_date&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2023&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;schedule_interval&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;@hourly&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;catchup&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;dag&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

    &lt;span class="n"&gt;run_etl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PythonOperator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;run_etl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;python_callable&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;etl&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;run_etl&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;N/B:&lt;/strong&gt; Make sure &lt;code&gt;DB_CONFIG&lt;/code&gt; holds your own connection details:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;DB_CONFIG&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dbname&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgres&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgres&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;password&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_password&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;host&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;localhost&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;port&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5432&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
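
&lt;p&gt;To see what the transform step produces without calling the API or touching the database, you can run it on a hand-written payload. The sketch below is a slight variant of &lt;code&gt;transform&lt;/code&gt; from the DAG: it accepts an optional &lt;code&gt;fetched_at&lt;/code&gt; argument (an addition for this example) so every row shares one timestamp. The prices in &lt;code&gt;SAMPLE_RESPONSE&lt;/code&gt; are invented and only mirror the shape of the CoinGecko response.&lt;/p&gt;

```python
from datetime import datetime

# Hypothetical payload in the shape returned by the simple/price endpoint.
SAMPLE_RESPONSE = {"bitcoin": {"usd": 43000.5}, "ethereum": {"usd": 2300.25}}


def transform(raw_data, fetched_at=None):
    """Flatten {coin: {"usd": price}} into (coin, price, timestamp) rows."""
    # Use one timestamp for the whole batch so rows from the same run line up.
    fetched_at = fetched_at or datetime.now()
    return [(coin, price["usd"], fetched_at) for coin, price in raw_data.items()]


if __name__ == "__main__":
    for row in transform(SAMPLE_RESPONSE):
        print(row)
```

&lt;p&gt;Each tuple matches the three placeholders in the &lt;code&gt;INSERT&lt;/code&gt; statement used by &lt;code&gt;load&lt;/code&gt;, which is why &lt;code&gt;executemany&lt;/code&gt; can take the list directly.&lt;/p&gt;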



&lt;h3&gt;
  
  
  Step 3: Open the file &lt;code&gt;DAG.py&lt;/code&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nano DAG.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Paste the contents from VS Code.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd8g0hm8uewh0va6xdx3y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd8g0hm8uewh0va6xdx3y.png" alt="Image description" width="800" height="856"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;N/B:&lt;/strong&gt; After the configuration, make sure you restart the scheduler.&lt;/p&gt;

&lt;h2&gt;
  
  
  Airflow UI configuration
&lt;/h2&gt;

&lt;p&gt;Refresh the tab hosted locally at: &lt;a href="http://localhost:8080" rel="noopener noreferrer"&gt;http://localhost:8080&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Log in using the credentials created earlier.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo28hpqm0uejjd8jhudrl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo28hpqm0uejjd8jhudrl.png" alt="Image description" width="800" height="299"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If your DAG is working correctly, you should see:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm92usc23sol27i5cn3z4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm92usc23sol27i5cn3z4.png" alt="Image description" width="800" height="429"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Confirm on DBeaver that the database is created and tables are loading
&lt;/h2&gt;

&lt;p&gt;Before checking on DBeaver, edit the &lt;code&gt;postgresql.conf&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;nano /etc/postgresql/&amp;lt;your_postgresversion&amp;gt;/main/postgresql.conf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



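&lt;p&gt;Set &lt;code&gt;listen_addresses&lt;/code&gt; as shown so PostgreSQL accepts connections on all interfaces, then restart the service with &lt;code&gt;sudo service postgresql restart&lt;/code&gt; for the change to take effect:&lt;br&gt;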
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="n"&gt;listen_addresses&lt;/span&gt; = &lt;span class="s1"&gt;'*'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fof4doy3094imxc7aa1pi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fof4doy3094imxc7aa1pi.png" alt="Image description" width="646" height="207"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Edit the connection in DBeaver:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffq6zeyriykkwypvj5ltr.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffq6zeyriykkwypvj5ltr.jpeg" alt="Image description" width="642" height="638"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Enter your database info and test the connection.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa1qomzmbzzhk1odd9r0k.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa1qomzmbzzhk1odd9r0k.jpeg" alt="Image description" width="800" height="516"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To find the IP address to enter as the host:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;hostname&lt;/span&gt; &lt;span class="nt"&gt;-I&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwjxeh516oxdqidunkljv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwjxeh516oxdqidunkljv.png" alt="Image description" width="800" height="143"&gt;&lt;/a&gt;&lt;/p&gt;
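
&lt;p&gt;&lt;code&gt;hostname -I&lt;/code&gt; can print several addresses separated by spaces, and DBeaver needs only one of them. As a small sketch, the helper below (the name &lt;code&gt;first_ipv4&lt;/code&gt; is made up for this example) picks the first IPv4 address out of that output. Whether the first address is the right one for your setup is an assumption you should verify.&lt;/p&gt;

```python
import ipaddress


def first_ipv4(hostname_output):
    """Return the first IPv4 address found in the output of hostname -I, or None."""
    for token in hostname_output.split():
        try:
            address = ipaddress.ip_address(token)
        except ValueError:
            # Skip anything that is not a valid IP address.
            continue
        if address.version == 4:
            return str(address)
    return None


if __name__ == "__main__":
    # Hypothetical output copied from the terminal.
    print(first_ipv4("172.24.112.5 fd00::1"))
```

&lt;p&gt;Use the returned address in the Host field of the DBeaver connection dialog.&lt;/p&gt;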

&lt;p&gt;It's been a long and bumpy article, but I hope it was helpful. Feel free to leave a comment with insights to help me improve it. &lt;br&gt;
That's all for now and let's keep it data. Bye for now!&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>python</category>
      <category>airflow</category>
      <category>dag</category>
    </item>
    <item>
      <title>Mistakes Beginners Make When Visualizing (Power BI)</title>
      <dc:creator>Mendy Kevin</dc:creator>
      <pubDate>Tue, 20 May 2025 04:57:28 +0000</pubDate>
      <link>https://forem.com/mendy_kevin_94ec1db73e1df/mistakes-beginners-make-when-visualizingpower-bi-41pg</link>
      <guid>https://forem.com/mendy_kevin_94ec1db73e1df/mistakes-beginners-make-when-visualizingpower-bi-41pg</guid>
      <description>&lt;p&gt;Hello there, data enthusiast. Today I'm back with the mistakes beginners make when creating dashboards in Power BI. I hope you stick around till the end, because I won’t waste your time. With that said, let's get started, shall we?&lt;/p&gt;

&lt;p&gt;When creating dashboards, it is very important not to get carried away with color and lose sight of the main goal of BI tools, which is to show trends, patterns, and insights in a way that is easy to understand. Today I want you to learn from the mistakes I made when creating a real estate dashboard so that you won’t have to repeat them. So pause for a minute, look at the following visual, and try to spot its mistakes before you dive in.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj55god2nroq43pfi0ejm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj55god2nroq43pfi0ejm.png" alt="Image description" width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h1&gt;
  
  
  Mistakes Beginners Make When Creating Dashboards Using Power BI
&lt;/h1&gt;

&lt;h2&gt;
  
  
  1: Overuse of Colors Without Hierarchy
&lt;/h2&gt;

&lt;p&gt;Power BI makes it easy to go wild with colors. But in the dashboard shown above, we see &lt;strong&gt;bold red&lt;/strong&gt;, &lt;strong&gt;multiple greens&lt;/strong&gt;, and &lt;strong&gt;highlighted blocks&lt;/strong&gt; all competing for attention. Color should guide the eye, not confuse it. Use color sparingly—assign it to draw attention to &lt;strong&gt;critical insights only&lt;/strong&gt;, not every card. Remember the goal is to turn raw data into meaningful charts and graphs that give actionable insights and not to showcase your graphic design abilities.&lt;/p&gt;




&lt;h2&gt;
  
  
  2: Overcrowding and Visual Clutter
&lt;/h2&gt;

&lt;p&gt;The dashboard tries to cram a lot of information into a single screen. There are many visuals, and some of them feel squeezed, which makes it hard for the user to focus on the most important insights. By trying to show everything, you end up with a cluttered, overwhelming dashboard. A key principle of dashboard design is that less is more. For example, the tile labeled “Locations” (intended to be a slicer) displays six regions in huge blocks but adds &lt;strong&gt;no visual value&lt;/strong&gt;. It can be replaced with a &lt;strong&gt;proper slicer or drop-down filter&lt;/strong&gt; to allow interaction without crowding the canvas. The number of cards, along with the two bar charts and the scatter plot, leaves the audience confused: it's not immediately clear what the most important information is, because the dashboard lacks a strong visual hierarchy to guide the user's eye.&lt;/p&gt;




&lt;h2&gt;
  
  
  3: Inconsistent Number Formatting
&lt;/h2&gt;

&lt;p&gt;Check the value cards:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Max Price: $350M” vs. “Min Price: $6.5M” — USD.&lt;/li&gt;
&lt;li&gt;“Total Revenue: Ksh9.3bn” and “Average Revenue: Ksh65.30M” — KES.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Mixing currencies without proper labels or context confuses your audience. Imagine presenting such an inconsistent dashboard report to your stakeholder (let's not even go there). Always maintain a &lt;strong&gt;consistent currency&lt;/strong&gt; or provide &lt;strong&gt;clear legends and conversion rates&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  4: Inconsistent and Inappropriate Visual Choices
&lt;/h2&gt;

&lt;p&gt;Some of the visual choices don't seem optimal. Don't pick visuals because they look fancy; pick them because they communicate the data effectively. For example, a line chart is not the best visual for showing price by number of bedrooms, because line charts are meant to show trends over time. In this case, the best fit would have been a bar chart. Also, the bar charts, while not terrible, might not be the most effective way to compare price by location. Location would work better as a slicer.&lt;br&gt;&lt;br&gt;
Again, the “Average Price” visual uses a &lt;strong&gt;gauge chart&lt;/strong&gt;, which is commonly discouraged unless you're showing progress toward a goal. In this context, the average value is better suited to a &lt;strong&gt;card or bar comparison&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  5: Overloading Summary Cards and Scatter Plots
&lt;/h2&gt;

&lt;p&gt;Six KPI cards are presented upfront—Max Price, Min Price, Total Houses, Revenue, Average Revenue, and Missing Values. It’s a cognitive overload. Group related metrics together and &lt;strong&gt;only highlight the 3–4 most actionable ones.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Again, the “Price Bedroom Scatter” chart lacks axis titles and has dense plotting. While scatter plots can reveal patterns, &lt;strong&gt;tooltips and filters&lt;/strong&gt; are essential for large datasets. Consider applying &lt;strong&gt;zoom or grouping logic&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  How it should be
&lt;/h2&gt;

&lt;p&gt;Thank you for sticking around to the end. Now that we have highlighted the mistakes, let's have a look at the corrected version of the dashboard report.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fug62yhinuhnq0rqmfxjk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fug62yhinuhnq0rqmfxjk.png" alt="Image description" width="800" height="553"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Wrap-up
&lt;/h2&gt;

&lt;p&gt;Dashboards aren’t just about showing data—they’re about &lt;strong&gt;communicating insight&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
As you build your BI skills, remember: clarity, consistency, and usability &amp;gt; decoration.&lt;/p&gt;

&lt;p&gt;What did I miss? Feel free to drop a comment (criticism is highly appreciated, by the way).&lt;/p&gt;

&lt;p&gt;Until next time, stay data-driven and &lt;strong&gt;don’t get fraudy!&lt;/strong&gt; (just kidding)&lt;/p&gt;

</description>
      <category>powerfuldevs</category>
      <category>beginners</category>
      <category>visualize</category>
      <category>data</category>
    </item>
    <item>
      <title>Machine Learning for beginners</title>
      <dc:creator>Mendy Kevin</dc:creator>
      <pubDate>Sun, 18 May 2025 07:39:43 +0000</pubDate>
      <link>https://forem.com/mendy_kevin_94ec1db73e1df/machine-learning-for-beginners-3dk2</link>
      <guid>https://forem.com/mendy_kevin_94ec1db73e1df/machine-learning-for-beginners-3dk2</guid>
      <description>&lt;p&gt;Hello there, data enthusiast! Glad to have you back for another deep dive. So buckle up. Today, I will walk you through one of the major field in data that is the talk of on everyone's lips. &lt;strong&gt;Machine learning&lt;/strong&gt;. Today, we will have a look at what machine learning is, what problems require machine learning domain, how to know when to use machine learning and then we will wrap it by creating a simple regression model.(do not worry if you do not get what a model is or what regression  involves I'll walk you through all the basics you need to  understand machine learning more intuitively.) Lets get fraudy, shall we:&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Machine Learning
&lt;/h2&gt;

&lt;p&gt;That's a good place to start. So, what is machine learning? &lt;strong&gt;Machine learning&lt;/strong&gt; is the &lt;strong&gt;art&lt;/strong&gt; of programming computers so that they can learn from data. The part of a machine learning system that learns and makes predictions is called a &lt;strong&gt;model&lt;/strong&gt;. That's right, it's an &lt;strong&gt;art&lt;/strong&gt;. And if you are paying close attention, you should realize that &lt;strong&gt;machine learning&lt;/strong&gt; is the &lt;strong&gt;art&lt;/strong&gt; of developing &lt;strong&gt;models&lt;/strong&gt;. If you are passionate about art, you will definitely enjoy machine learning. (pun intended)&lt;/p&gt;

&lt;h2&gt;
  
  
  What does Machine Learning involve
&lt;/h2&gt;

&lt;p&gt;Now that we know how to define machine learning in one long sentence, let's go a little deeper and look at the kinds of problems or projects that are best solved with machine learning:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Problems that require a lot of fine tuning or long lists of rules&lt;/strong&gt;. Traditional rule-based systems require explicitly coded instructions for every condition. When the number of rules becomes massive or unmanageable (e.g., spam detection based on keywords or transaction fraud) machine learning becomes practical. A spam filter would learn which words and phrases are good predictors of spam by detecting unusually frequent patterns of words in the spam folder. That's a perfect example of how machine learning is helpful because that simple problem would require thousands of &lt;code&gt;else if&lt;/code&gt; statements just to detect spam emails.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Complex problems where traditional hard coding yields little to no results&lt;/strong&gt;. No matter your coding skills, there are some real-world problems you cannot hard code, because there is no simple algorithm for detecting the pattern, let alone solving the problem. For example, image recognition, natural language understanding, and speech-to-text conversion are governed by high-dimensional, non-linear patterns that can’t feasibly be encoded by hand. ML models like Convolutional Neural Networks (CNNs) or Transformers automatically extract abstract features and model these relationships through training. (Don't get too scared by big words like Convolutional Neural Networks yet; there are simply no easier terms for this tough nut to crack. But I hope you get the point.)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Highly fluctuating environments, where a model can be retrained on new data&lt;/strong&gt;. Think about a highly volatile system such as stock market price prediction. This kind of system requires the model to be updated constantly with newly available information so that its performance remains optimal. This adaptability makes ML preferable to static, rule-based systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Getting insights from large amounts of data.&lt;/strong&gt; Digging into large amounts of data to discover patterns is called data mining, and machine learning excels at it.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Simple Machine Learning Project
&lt;/h2&gt;

&lt;p&gt;Machine learning models can be categorized in various ways, such as by the type of supervision (supervised, unsupervised, or reinforcement learning), their ability to learn incrementally (online vs. batch learning), and whether they rely on instance-based learning or model-based learning. In this context, we will build a simple model that falls under supervised learning, where the model is trained on labeled data to make predictions. It's just a simple model so just follow along.&lt;/p&gt;

&lt;h2&gt;
  
  
  Supervised Learning.
&lt;/h2&gt;

&lt;p&gt;Under supervised learning, there are a few kinds of tasks and algorithms you will meet early on: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Classification&lt;/strong&gt;. Training a model to predict which class an item belongs to, e.g. ham or spam emails.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Regression&lt;/strong&gt;. Predicting a target numeric value from a set of given features.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Logistic regression&lt;/strong&gt;. Despite its name, this regression algorithm is used for classification: it outputs the probability of belonging to a class, e.g. a 20% chance of being spam.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
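&lt;p&gt;To make the third item a little more concrete, here is a minimal sketch of how a logistic model turns a numeric score into a probability. The single feature (a count of suspicious words), the weight and the bias are made-up values for illustration, not parameters learned from real data.&lt;/p&gt;

```python
import math


def sigmoid(z: float) -> float:
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def spam_probability(suspicious_words: int, weight: float = 0.8, bias: float = -3.0) -> float:
    """Hypothetical one-feature logistic model: score = weight * x + bias."""
    return sigmoid(weight * suspicious_words + bias)


def classify(probability: float, threshold: float = 0.5) -> str:
    """Turn the probability into a class label using a cut-off."""
    return "spam" if probability >= threshold else "ham"


for count in (0, 2, 6):
    p = spam_probability(count)
    print(f"{count} suspicious words: {p:.0%} chance of spam, labelled {classify(p)}")
```

&lt;p&gt;The model itself only produces a probability; the class label comes from comparing that probability with a threshold you choose.&lt;/p&gt;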

&lt;p&gt;The model we are going to create below is a very simple regression model that will give you the big picture of what creating machine learning models involves. The Python code loads the data, separates the inputs X from the labels y, creates a scatter plot for visualization, and then trains a linear model that makes predictions.&lt;/p&gt;

&lt;h2&gt;
  
  
  A regression model
&lt;/h2&gt;

&lt;p&gt;First, make sure you have Python set up on your machine and a code editor of your choice fired up and ready. For this demo, I am going to use VS Code. Press &lt;code&gt;ctrl+shift+p&lt;/code&gt; to open the Command Palette, run the "Create: New Jupyter Notebook" command (this needs the Jupyter extension), and save the notebook with a &lt;code&gt;.ipynb&lt;/code&gt; extension. In our case, we are going to load the dataset from an online GitHub repo. For future models where your dataset is stored locally, make sure it is in the same directory as the &lt;code&gt;.ipynb&lt;/code&gt; notebook file.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Installing the Libraries.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pip install matplotlib numpy pandas scikit-learn&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Importing the Libraries&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import matplotlib.pyplot as plt
import numpy as np
import pandas as pd 
from sklearn.linear_model import LinearRegression
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;pyplot&lt;/code&gt;&lt;/strong&gt; is the plotting module of the matplotlib library. It provides functions to create static, animated, and interactive visualizations (e.g., scatter plots, line charts).&lt;br&gt;
&lt;strong&gt;&lt;code&gt;numpy&lt;/code&gt;&lt;/strong&gt; is a fundamental package for numerical computing in Python, used for handling arrays, mathematical operations, and linear algebra. (Today we are just going to use it for mathematical operations and array handling. No linear algebra for today.)&lt;br&gt;
&lt;strong&gt;&lt;code&gt;pandas&lt;/code&gt;&lt;/strong&gt; is a powerful data analysis library used for reading, writing, and manipulating structured data through DataFrames.&lt;br&gt;
&lt;strong&gt;&lt;code&gt;scikit-learn&lt;/code&gt;&lt;/strong&gt; is a popular and versatile Python library for machine learning. It provides a wide range of tools and algorithms for tasks like classification, regression, clustering, and data preprocessing. Its &lt;code&gt;LinearRegression&lt;/code&gt; class fits linear models to data for regression tasks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Obtaining, extracting and storing the datasets&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
data_root = "https://github.com/ageron/data/raw/main/"
lifesat=pd.read_csv(data_root + "lifesat/lifesat.csv")
X = lifesat[["GDP per capita (USD)"]].values
y = lifesat[["Life satisfaction"]].values
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is what the above code does:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;data_root = "https://github.com/ageron/data/raw/main/"&lt;/code&gt;&lt;br&gt;
Defines the base URL where the dataset is hosted; it's used to construct the full path to the CSV file.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;lifesat = pd.read_csv(data_root + "lifesat/lifesat.csv")&lt;/code&gt;&lt;br&gt;
Downloads and loads the CSV file into a pandas DataFrame named &lt;code&gt;lifesat&lt;/code&gt;. A DataFrame is a data structure, found in pandas and in languages like R, that organizes tabular data (such as the contents of a CSV file) into rows and columns.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;X = lifesat[["GDP per capita (USD)"]].values&lt;/code&gt;&lt;br&gt;
Extracts the GDP per capita column as a NumPy array X, formatted as a 2D array with shape (n_samples, 1).&lt;/p&gt;

&lt;p&gt;&lt;code&gt;y = lifesat[["Life satisfaction"]].values&lt;/code&gt;&lt;br&gt;
Extracts the life satisfaction scores as a NumPy array y, also as a 2D array for model training compatibility.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Visualizing the dataset&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why visualize during training?&lt;/strong&gt;: Visualizations (e.g., scatter plots, histograms) help reveal patterns, trends, outliers, and potential correlations, guiding feature selection and preprocessing.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;lifesat.plot(kind='scatter',  grid=True, x="GDP per capita (USD)", y="Life satisfaction")
plt.axis([23500, 62500, 4, 9])
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;lifesat.plot(...)&lt;/code&gt;&lt;br&gt;
Creates a scatter plot from the lifesat DataFrame, plotting GDP per capita on the x-axis and Life satisfaction on the y-axis, with grid lines enabled.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;plt.axis([23500, 62500, 4, 9])&lt;/code&gt;&lt;br&gt;
Sets the range of the x-axis from 23,500 to 62,500 and the y-axis from 4 to 9 to focus on a specific region of the data.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;plt.show()&lt;/code&gt;&lt;br&gt;
Displays the plotted figure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model selection&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;model = LinearRegression()&lt;/code&gt;&lt;br&gt;
Creates an instance of scikit-learn's built-in &lt;code&gt;LinearRegression&lt;/code&gt; class and stores it in the variable &lt;code&gt;model&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model Training.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The line &lt;code&gt;model.fit(X, y)&lt;/code&gt; trains the machine learning model (here, &lt;code&gt;LinearRegression&lt;/code&gt;) by finding the parameters (the slope and the intercept) that minimize the prediction error between the input features X and the target values y. If I threw you off the bus a little, try to imagine your model is a student learning to draw a straight line through dots on a paper.&lt;/p&gt;

&lt;p&gt;When you say &lt;code&gt;model.fit(X, y)&lt;/code&gt;, you're telling the student: "Look at these dots (X and y), and draw the best straight line that goes as close as possible to all of them." The model learns the underlying relationship in the data so that it can make predictions on unseen input.&lt;br&gt;
In other words, fitting does not pick features for you; it finds the line parameters that give predictions with the smallest overall error.&lt;/p&gt;
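&lt;p&gt;For the curious, the "best straight line" for a single feature can be worked out by hand. The sketch below uses the ordinary least-squares formulas on a few made-up points (not the lifesat data). It illustrates the idea behind fitting and is not a copy of how scikit-learn implements it internally; the helper names &lt;code&gt;fit_line&lt;/code&gt; and &lt;code&gt;predict&lt;/code&gt; are my own.&lt;/p&gt;

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope = covariance(x, y) / variance(x); the intercept makes the line
    # pass through the point of means (x_bar, y_bar).
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(x, slope, intercept):
    """Read a prediction off the fitted line."""
    return slope * x + intercept


# Made-up toy points that lie exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)
print(predict(5.0, slope, intercept))
```

&lt;p&gt;Because the toy points sit exactly on a line, the fitted slope and intercept recover that line; with real, noisy data the line only gets as close as it can to all the dots.&lt;/p&gt;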

&lt;p&gt;&lt;strong&gt;Can Our Model Make A Prediction?&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;X_new = [[37_655.2]] #cyprus gdp per capita in 20202
print(model.predict(X_new)) #output: [[6.30165767]]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This code tells the trained model: "If a country has a GDP per capita of 37,655.20, what life satisfaction should we expect?" The model answers: "Based on what I learned, about 6.30." — it's using the line it learned to make a guess.&lt;/p&gt;

&lt;p&gt;And there you go, your first regression model. Based on what we did, was that difficult? Then what's stopping you from creating more models?&lt;/p&gt;

&lt;p&gt;Leave a comment with criticism or anything else you have in mind. That's all for now and see you in the next one.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>algorithms</category>
      <category>data</category>
    </item>
    <item>
      <title>The Five Levels Of SQL</title>
      <dc:creator>Mendy Kevin</dc:creator>
      <pubDate>Fri, 02 May 2025 21:02:28 +0000</pubDate>
      <link>https://forem.com/mendy_kevin_94ec1db73e1df/the-five-levels-of-sql-5hdd</link>
      <guid>https://forem.com/mendy_kevin_94ec1db73e1df/the-five-levels-of-sql-5hdd</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;Introduction.&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Hello there, data enthusiasts, and welcome to yet another post on &lt;strong&gt;SQL&lt;/strong&gt;. Today we are gonna dive into the five levels of &lt;strong&gt;SQL&lt;/strong&gt;, and I promise you are gonna love it and learn something. So let's do some nerdy stuff. Shall we?&lt;/p&gt;

&lt;h2&gt;
  
  
  1. What is &lt;strong&gt;SQL&lt;/strong&gt;:
&lt;/h2&gt;

&lt;p&gt;Before we get too far ahead of ourselves, let's develop a deeper, more intuitive understanding of &lt;strong&gt;SQL.&lt;/strong&gt; As you might have already heard, &lt;strong&gt;SQL (Structured Query Language)&lt;/strong&gt; is the language used to interact with relational database engines, e.g. PostgreSQL or MySQL. Interacting with a database using &lt;strong&gt;SQL&lt;/strong&gt; may involve one or all of the following activities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Retrieving records from the database -&amp;gt; SELECT&lt;/li&gt;
&lt;li&gt;Adding new records to the database -&amp;gt; INSERT&lt;/li&gt;
&lt;li&gt;Modifying existing records -&amp;gt; UPDATE&lt;/li&gt;
&lt;li&gt;Removing existing records -&amp;gt; DELETE&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In my view, PostgreSQL is the best database for learning to write SQL queries: it is an object-relational database, it is modern, and most importantly it is open source (no license fee required). Some other databases, such as Oracle Database, are commercial products that generally require a paid license, while MySQL comes in both a free open-source Community Edition and paid commercial editions. It is best to learn SQL from the terminal (using the psql shell), because when you SSH into a server you will usually have no GUI, and you will still be able to handle the database effectively. Now let's dive into the five levels of SQL query writing.&lt;/p&gt;
&lt;h2&gt;
  
  
  2. The five levels of SQL
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;LEVEL ZERO&lt;/strong&gt;&lt;br&gt;
This involves knowing how to use &lt;code&gt;SELECT * FROM table_name;&lt;/code&gt;. It is the equivalent of opening an Excel sheet or a Word document to get your hands on the data. If you can play around with this query intuitively, then you are on the right track to mastering SQL.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LEVEL ONE&lt;/strong&gt;&lt;br&gt;
Do you know the rest of the keywords, like &lt;code&gt;SELECT&lt;/code&gt;, &lt;code&gt;FROM&lt;/code&gt;, &lt;code&gt;WHERE&lt;/code&gt;, &lt;code&gt;GROUP BY&lt;/code&gt;, &lt;code&gt;HAVING&lt;/code&gt;, &lt;code&gt;LIMIT&lt;/code&gt;, &lt;code&gt;ORDER BY&lt;/code&gt;, etc.? Do you also understand the logical order in which these keywords are evaluated when you combine them in one query: &lt;code&gt;FROM&lt;/code&gt;, &lt;code&gt;WHERE&lt;/code&gt;, &lt;code&gt;GROUP BY&lt;/code&gt;, &lt;code&gt;HAVING&lt;/code&gt;, &lt;code&gt;SELECT&lt;/code&gt;, &lt;code&gt;ORDER BY&lt;/code&gt;, &lt;code&gt;LIMIT&lt;/code&gt;? See if you understand what is going on in the following sample code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT id,, first_name
FROM person
WHERE country_of_birth = 'Kenya'
GROUP BY gender
HAVING COUNT(*) &amp;gt; 5
ORDER BY id
LIMIT 10; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
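&lt;p&gt;If you want to watch the order of evaluation at work without installing a database server, here is a small sketch using Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module instead of PostgreSQL. The &lt;code&gt;person&lt;/code&gt; table and its rows are made up for the demo, and the &lt;code&gt;HAVING&lt;/code&gt; threshold is lowered to fit the tiny dataset.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person ("
    "id INTEGER PRIMARY KEY, first_name TEXT, gender TEXT, country_of_birth TEXT)"
)
# Hypothetical sample rows, only for illustration.
people = [
    ("Amina", "Female", "Kenya"),
    ("Wanjiru", "Female", "Kenya"),
    ("Achieng", "Female", "Kenya"),
    ("Otieno", "Male", "Kenya"),
    ("Kamau", "Male", "Kenya"),
    ("Sam", "Non-binary", "Kenya"),
    ("Lena", "Female", "Germany"),
    ("Raj", "Male", "India"),
]
conn.executemany(
    "INSERT INTO person (first_name, gender, country_of_birth) VALUES (?, ?, ?)",
    people,
)

query = """
SELECT gender, COUNT(*) AS people      -- 5. pick the output columns
FROM person                            -- 1. start from the table
WHERE country_of_birth = 'Kenya'       -- 2. filter individual rows
GROUP BY gender                        -- 3. form the groups
HAVING COUNT(*) >= 2                   -- 4. filter whole groups
ORDER BY people DESC                   -- 6. sort the result
LIMIT 10;                              -- 7. cap the number of rows
"""
result = conn.execute(query).fetchall()
print(result)
```

&lt;p&gt;Notice that the rows from Germany and India are gone before any grouping happens, and that the single-person group is dropped by &lt;code&gt;HAVING&lt;/code&gt;, not by &lt;code&gt;WHERE&lt;/code&gt;.&lt;/p&gt;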



&lt;p&gt;&lt;strong&gt;LEVEL TWO&lt;/strong&gt;&lt;br&gt;
Have you mastered joins? The ones you will use most often are &lt;code&gt;inner join&lt;/code&gt; and &lt;code&gt;left join&lt;/code&gt;; &lt;code&gt;full outer join&lt;/code&gt; is less common, and you will rarely need &lt;code&gt;right join&lt;/code&gt; or &lt;code&gt;cross join&lt;/code&gt;. Here is a sample query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT p.first_name, s.salary
FROM person p
INNER JOIN salary s ON p.id = s.person_id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
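&lt;p&gt;To see how the two most common joins differ, here is a small sketch using Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module (not PostgreSQL). The tables and rows are made up: one person deliberately has no salary record.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE salary (person_id INTEGER, salary INTEGER);
INSERT INTO person (id, first_name) VALUES (1, 'Amina'), (2, 'Otieno'), (3, 'Wanjiru');
INSERT INTO salary (person_id, salary) VALUES (1, 5000), (2, 4200);
""")

# INNER JOIN keeps only people that have a matching salary row.
inner_rows = conn.execute("""
    SELECT p.first_name, s.salary
    FROM person p
    INNER JOIN salary s ON p.id = s.person_id
    ORDER BY p.id
""").fetchall()

# LEFT JOIN keeps every person; a missing salary comes back as NULL (None in Python).
left_rows = conn.execute("""
    SELECT p.first_name, s.salary
    FROM person p
    LEFT JOIN salary s ON p.id = s.person_id
    ORDER BY p.id
""").fetchall()

print(inner_rows)
print(left_rows)
```

&lt;p&gt;The only difference between the two results is the person without a salary, which is exactly the case you should think about when choosing between the two joins.&lt;/p&gt;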



&lt;p&gt;Also make sure you have a good understanding of common table expressions (CTEs), which are written with the &lt;code&gt;WITH&lt;/code&gt; keyword, and prefer them to subqueries where they make a query easier to read. Here is a sample. Can you explain to someone what is going on?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WITH recent_births AS (
  SELECT * FROM person WHERE date_of_birth &amp;gt; '2020-01-01'
)
SELECT * FROM recent_births WHERE gender = 'Female';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
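&lt;p&gt;A CTE is essentially a named subquery. The sketch below, which uses Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module and a made-up &lt;code&gt;person&lt;/code&gt; table, runs the same logic once with &lt;code&gt;WITH&lt;/code&gt; and once as an inline subquery so you can compare the two styles.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    id INTEGER PRIMARY KEY, first_name TEXT, gender TEXT, date_of_birth TEXT
);
INSERT INTO person (first_name, gender, date_of_birth) VALUES
    ('Amina', 'Female', '2021-03-14'),
    ('Otieno', 'Male', '2022-07-01'),
    ('Wanjiru', 'Female', '2018-11-30'),
    ('Achieng', 'Female', '2020-06-09');
""")

# Dates are stored as ISO text (YYYY-MM-DD), so string comparison sorts them correctly.
cte_query = """
WITH recent_births AS (
    SELECT id, first_name, gender FROM person WHERE date_of_birth > '2020-01-01'
)
SELECT first_name FROM recent_births WHERE gender = 'Female' ORDER BY id;
"""

subquery_version = """
SELECT first_name
FROM (
    SELECT id, first_name, gender FROM person WHERE date_of_birth > '2020-01-01'
) AS recent_births
WHERE gender = 'Female'
ORDER BY id;
"""

cte_rows = conn.execute(cte_query).fetchall()
subquery_rows = conn.execute(subquery_version).fetchall()
print(cte_rows)
print(subquery_rows)
```

&lt;p&gt;Both queries return the same rows. The CTE version reads top to bottom, which tends to matter more as queries grow.&lt;/p&gt;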



&lt;p&gt;&lt;strong&gt;LEVEL THREE&lt;/strong&gt;&lt;br&gt;
You know window functions. Here you have a function, e.g. &lt;code&gt;SUM()&lt;/code&gt;, &lt;code&gt;RANK()&lt;/code&gt; or &lt;code&gt;AVG()&lt;/code&gt;, followed by the &lt;code&gt;OVER&lt;/code&gt; clause, which defines the window. A window is defined by &lt;code&gt;PARTITION BY&lt;/code&gt;, &lt;code&gt;ORDER BY&lt;/code&gt;, and an optional frame of rows. At this level you should also be able to tell the difference between &lt;code&gt;RANK&lt;/code&gt; (skips ranks after ties), &lt;code&gt;DENSE_RANK&lt;/code&gt; (no rank gaps after ties) and &lt;code&gt;ROW_NUMBER&lt;/code&gt; (a unique sequence). For example, can you follow the sample code below?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT id, first_name,
       RANK() OVER (PARTITION BY country_of_birth ORDER BY date_of_birth) AS birth_rank
FROM person;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
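&lt;p&gt;The difference between the three ranking functions is easiest to see with a tie in the data. Here is a sketch using Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module, assuming your Python ships SQLite 3.25 or newer (the first version with window functions). The &lt;code&gt;score&lt;/code&gt; table is made up for the demo.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE score (first_name TEXT, points INTEGER);
INSERT INTO score (first_name, points) VALUES
    ('Amina', 90), ('Otieno', 90), ('Wanjiru', 80), ('Kamau', 70);
""")

rows = conn.execute("""
    SELECT first_name,
           points,
           RANK()       OVER (ORDER BY points DESC)             AS rnk,
           DENSE_RANK() OVER (ORDER BY points DESC)             AS dense_rnk,
           ROW_NUMBER() OVER (ORDER BY points DESC, first_name) AS row_num
    FROM score
    ORDER BY points DESC, first_name
""").fetchall()

for row in rows:
    print(row)
```

&lt;p&gt;Two people are tied on 90 points: compare what each of the three columns does for the tied rows and for the row that comes right after them.&lt;/p&gt;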



&lt;p&gt;&lt;strong&gt;LEVEL FOUR&lt;/strong&gt;&lt;br&gt;
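You understand how the database executes a query, so that you can pick the right optimization technique: recognizing full table scans, adding indexes, partitioning large tables, etc. For example, in PostgreSQL you can use &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt;, which runs the query and reports the actual plan and timings:&lt;br&gt;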
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;EXPLAIN ANALYZE
SELECT * FROM person WHERE country_of_birth = 'Kenya';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
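&lt;p&gt;&lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; is a PostgreSQL command. To try the idea without a server, the sketch below swaps in SQLite's rough counterpart, &lt;code&gt;EXPLAIN QUERY PLAN&lt;/code&gt;, through Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module. Unlike &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt;, it only describes the plan and does not report timings. The table, the rows, the index name and the &lt;code&gt;plan&lt;/code&gt; helper are all made up for the demo.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, first_name TEXT, country_of_birth TEXT)"
)
conn.executemany(
    "INSERT INTO person (first_name, country_of_birth) VALUES (?, ?)",
    [("Amina", "Kenya"), ("Lena", "Germany"), ("Raj", "India")],
)


def plan(connection, sql):
    """Return the human-readable plan steps SQLite reports for a query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description


query = "SELECT * FROM person WHERE country_of_birth = 'Kenya'"

plan_before = plan(conn, query)  # no index yet: expect a full table scan
conn.execute("CREATE INDEX idx_person_country ON person (country_of_birth)")
plan_after = plan(conn, query)   # with the index: expect an index search

print(plan_before)
print(plan_after)
```

&lt;p&gt;The exact wording of the plan differs between SQLite versions, but the first plan should mention a scan of the table and the second should mention the index. Reading plans like this before and after a change is the habit that Level Four is about.&lt;/p&gt;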



&lt;p&gt;&lt;strong&gt;LEVEL FIVE&lt;/strong&gt;&lt;br&gt;
If you have made it this far, you won't be surprised to learn that at level five you should be able to write these queries effectively and debug them without the help of an LLM. Only use LLMs to speed up your query writing or to automate repetitive tasks.&lt;/p&gt;

&lt;p&gt;So, what level of SQL mastery are you at? I'd love to know. Leave a comment below. Hope you enjoyed the read and see you in the next one.&lt;/p&gt;

</description>
      <category>sql</category>
      <category>data</category>
      <category>database</category>
      <category>postgres</category>
    </item>
    <item>
      <title>Mastering Microsoft Excel: From Basics to Advanced Analysis</title>
      <dc:creator>Mendy Kevin</dc:creator>
      <pubDate>Sun, 27 Apr 2025 12:12:54 +0000</pubDate>
      <link>https://forem.com/mendy_kevin_94ec1db73e1df/mastering-microsoft-excel-from-basics-to-advanced-analysis-1cck</link>
      <guid>https://forem.com/mendy_kevin_94ec1db73e1df/mastering-microsoft-excel-from-basics-to-advanced-analysis-1cck</guid>
      <description>&lt;p&gt;Whether you are a data analyst or a regular MS Excel user, you will definitely find this post useful because I will take you through a complete mastery of MS Excel. I will touch on how to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Organize and manipulate data&lt;/li&gt;
&lt;li&gt;Perform calculations using formulas and functions&lt;/li&gt;
&lt;li&gt;Visualize data through charts and graphs&lt;/li&gt;
&lt;li&gt;Create interactive dashboards and reports&lt;/li&gt;
&lt;li&gt;Perform statistical, financial, and logical analysis using MS Excel&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So buckle up and let's dive right in:&lt;/p&gt;

&lt;h2&gt;
  
  
  1. What is Excel?
&lt;/h2&gt;

&lt;p&gt;You may be asking yourself, "What is Excel?" And you're right to. MS Excel is a powerful spreadsheet and data analysis tool developed by Microsoft. It is used by business analysts, data analysts, project and operations managers, teachers and lecturers, researchers, sales and marketing teams, etc., to organize data, perform calculations, visualize information, create interactive dashboards, and conduct statistical analysis. Let's dive into an overview of Microsoft Excel:&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;2. Launching MS Excel&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This article uses MS Excel on Windows. To start MS Excel, click the Windows button, type "excel," and press Enter to launch it. You will see many templates, but for now, we will create a new blank &lt;strong&gt;workbook&lt;/strong&gt;. When you create a new &lt;strong&gt;workbook&lt;/strong&gt;, a new window opens with an empty &lt;strong&gt;worksheet&lt;/strong&gt;. However, if for some reason your computer doesn't have MS Excel, you can use the online version of Excel in your favorite browser, which requires a Microsoft account.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;3. Key Concepts in Excel&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Workbook&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A workbook is an Excel file that contains one or more worksheets. By default, workbooks are saved with the &lt;code&gt;.xlsx&lt;/code&gt; extension; older versions of Excel (2003 and earlier) used &lt;code&gt;.xls&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Worksheet&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A worksheet is a single spreadsheet consisting of grids of rows and columns within a workbook. Each worksheet has a name (e.g., Sheet1) and can be renamed to your liking. You can create multiple worksheets by pressing the little '+' button at the bottom of the Excel workbook.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqzts9qssxq88f95sw04z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqzts9qssxq88f95sw04z.png" alt="Image description" width="415" height="148"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cell&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's right, a cell. A cell is the basic unit of a worksheet where you enter data. It is identified by a cell reference, such as A1 (Column A, Row 1). The reference of the currently active cell is shown in the Name Box next to the formula bar.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Types&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Cells can be formatted as General, Number, Currency, Date, Time, Scientific, Fraction, etc. You can change the format of one or more cells by selecting them and using the format dropdown on the Home tab, as shown below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl5lmo1thm8ty287k8zyy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl5lmo1thm8ty287k8zyy.png" alt="Image description" width="231" height="617"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Range&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A range is a group of two or more cells. It is used to define the group of data you would like to work on. A range can be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Vertical:&lt;/strong&gt; &lt;code&gt;A1:A5&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Horizontal:&lt;/strong&gt; &lt;code&gt;A1:E1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rectangular block:&lt;/strong&gt; &lt;code&gt;A1:C5&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Formula&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Formulas in Excel are user-defined expressions that perform calculations on cell values. Every formula starts with an '=' sign, e.g., &lt;code&gt;=A1+B1&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Functions&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Functions are also mathematical tools; however, instead of being user-defined, they are built-in Excel functions. &lt;br&gt;
&lt;strong&gt;NB&lt;/strong&gt;: They also start with an '=' sign, e.g., &lt;code&gt;=SUM(A2:A6)&lt;/code&gt; where &lt;code&gt;A2:A6&lt;/code&gt; is the range, and the built-in function &lt;strong&gt;SUM&lt;/strong&gt; returns the sum of all the values in the given range. Other important functions in Excel include:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AVERAGE&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;=AVERAGE(C2:C6)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;COUNT&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;=COUNT(B2:B6)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;MAX AND MIN&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;=MAX(B2:B6) or =MIN(B2:B6)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
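&lt;p&gt;For readers who like to cross-check results outside Excel, here is a minimal Python sketch of what these basic functions compute. The list &lt;code&gt;cells&lt;/code&gt; is a made-up stand-in for a range such as B2:B6, and the sketch assumes, as Excel does for ranges, that text and blank cells are skipped.&lt;/p&gt;

```python
# Hypothetical stand-in for cells B2:B6; one cell holds text, one is blank.
cells = [12, 7.5, "n/a", None, 30]

# Keep only numeric cells, as SUM, AVERAGE, COUNT, MAX and MIN do for a range.
numbers = [c for c in cells if isinstance(c, (int, float)) and not isinstance(c, bool)]

total = sum(numbers)             # like =SUM(B2:B6)
average = total / len(numbers)   # like =AVERAGE(B2:B6)
count = len(numbers)             # like =COUNT(B2:B6)
largest = max(numbers)           # like =MAX(B2:B6)
smallest = min(numbers)          # like =MIN(B2:B6)

print(total, average, count, largest, smallest)
```

&lt;p&gt;Note that the count is 3, not 5: only the numeric cells are counted.&lt;/p&gt;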



&lt;p&gt;More advanced functions include:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CONCATENATE&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Combines text from multiple cells into one, e.g.,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;=CONCATENATE(A2, " ", B2)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;p&gt;&lt;strong&gt;UPPER, LOWER, and PROPER&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;UPPER()&lt;/code&gt;: makes all text uppercase.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;LOWER()&lt;/code&gt;: makes all text lowercase.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;PROPER()&lt;/code&gt;: capitalizes the first letter of each word (e.g., “John Smith”).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;LEFT&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Extracts a specific number of characters from the beginning of a string. E.g.,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;=LEFT(A2, 3)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;returns the first 3 characters of cell A2.&lt;/p&gt;
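&lt;p&gt;The text functions above have close counterparts in most programming languages. As a rough illustration only, this Python sketch mirrors them; the values of &lt;code&gt;a2&lt;/code&gt; and &lt;code&gt;b2&lt;/code&gt; are invented stand-ins for cells A2 and B2.&lt;/p&gt;

```python
# Hypothetical cell contents standing in for A2 and B2.
a2 = "john"
b2 = "SMITH"

full_name = a2 + " " + b2      # like =CONCATENATE(A2, " ", B2)
upper = full_name.upper()      # like UPPER()
lower = full_name.lower()      # like LOWER()
proper = full_name.title()     # close to PROPER()
first_three = full_name[:3]    # like =LEFT(A2, 3), applied to the joined text

print(full_name, upper, lower, proper, first_three, sep=" | ")
```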

&lt;p&gt;&lt;strong&gt;IF, IFS, and SWITCH&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;IF:&lt;/strong&gt; Returns one value if a condition is true and another if it is false. Example:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;=IF(C2&amp;gt;=50, "Pass", "Fail")
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;IFS:&lt;/strong&gt; Tests several conditions in order and returns the value paired with the first one that is true. Example:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;=IFS(B2&amp;gt;=90, "A", B2&amp;gt;=80, "B", B2&amp;gt;=70, "C", B2&amp;gt;=60, "D", TRUE, "F")
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;SWITCH:&lt;/strong&gt; Compares one value or expression against a list of values and returns the result paired with the first match; the optional last argument is the default when nothing matches, e.g.,&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;=SWITCH(A2, "HR", "Human Resources", "IT", "Information Technology", "FIN", "Finance", "Unknown")
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;Returns "Information Technology" if A2 is "IT".&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
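&lt;p&gt;If the order of evaluation in IFS or the default in SWITCH feels unclear, it may help to see the same three examples as plain code. This is only a sketch in Python; the function names are hypothetical and not part of Excel.&lt;/p&gt;

```python
def pass_or_fail(score):
    # Like =IF(C2>=50, "Pass", "Fail")
    return "Pass" if score >= 50 else "Fail"


def letter_grade(score):
    # Like the IFS example: conditions are checked in order, first match wins.
    for cutoff, grade in [(90, "A"), (80, "B"), (70, "C"), (60, "D")]:
        if score >= cutoff:
            return grade
    return "F"  # plays the role of the final TRUE, "F" pair


def department_name(code):
    # Like the SWITCH example, with "Unknown" as the default result.
    names = {"HR": "Human Resources", "IT": "Information Technology", "FIN": "Finance"}
    return names.get(code, "Unknown")


print(pass_or_fail(50), letter_grade(85), department_name("IT"))
```

&lt;p&gt;Because the checks run from the highest cutoff down, a score of 85 stops at the second condition and never reaches the later ones.&lt;/p&gt;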

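&lt;p&gt;&lt;strong&gt;XLOOKUP&lt;/strong&gt;: searches &lt;code&gt;lookup_array&lt;/code&gt; for &lt;code&gt;lookup_value&lt;/code&gt; and returns the item at the same position in &lt;code&gt;return_array&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;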

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;=XLOOKUP(lookup_value, lookup_array, return_array)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;E.g.:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;=XLOOKUP("Pen", A2:A6, B2:B6)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
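&lt;p&gt;To make the lookup logic concrete, here is a small Python sketch of an exact-match lookup. The item and price lists are invented sample data for A2:A6 and B2:B6, and the helper &lt;code&gt;xlookup&lt;/code&gt; is a simplified imitation, not Excel's implementation (for instance, this version is case-sensitive, while Excel's text matching is not).&lt;/p&gt;

```python
# Hypothetical data standing in for A2:A6 (items) and B2:B6 (prices).
items = ["Book", "Pen", "Ruler", "Eraser", "Bag"]
prices = [120, 15, 30, 10, 900]


def xlookup(lookup_value, lookup_array, return_array, if_not_found="#N/A"):
    # Return the entry of return_array at the position of the first exact match.
    for position, candidate in enumerate(lookup_array):
        if candidate == lookup_value:
            return return_array[position]
    return if_not_found


print(xlookup("Pen", items, prices))
print(xlookup("Chair", items, prices))
```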



&lt;p&gt;With those key concepts, let's now look at the key data operations in Excel, like &lt;strong&gt;Data filtering&lt;/strong&gt;, &lt;strong&gt;sorting&lt;/strong&gt;, &lt;strong&gt;validation&lt;/strong&gt;, &lt;strong&gt;cleaning&lt;/strong&gt;, &lt;strong&gt;visualization&lt;/strong&gt;, and &lt;strong&gt;formatting&lt;/strong&gt;. We will also touch on &lt;strong&gt;Pivot Tables&lt;/strong&gt; and &lt;strong&gt;Pivot Charts&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;4. DATA OPERATIONS IN EXCEL&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;DATA FILTERING and SORTING&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Filtering allows you to display only rows that meet specific criteria. Sorting helps arrange your data in a meaningful order, e.g., in ascending or descending order.&lt;/p&gt;

&lt;p&gt;There are two ways of doing this: with an Excel table, or directly on a plain range:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TABLE WAY:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This brings us to the concept of a table in Excel. Creating one is easier than you might think. First, select all the data you want to put in the table. There are two ways of selecting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click and drag your mouse, which is the easiest for small datasets.&lt;/li&gt;
&lt;li&gt;Select the first row using your mouse, then extend the selection with Shift + Down Arrow, or use Ctrl + Click to pick specific cells one at a time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once your data is selected, you can right-click, then click on "Table," then "Create Table."&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcneccaxshga8hztqjcij.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcneccaxshga8hztqjcij.png" alt="Image description" width="490" height="412"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Or, you could go to the &lt;code&gt;Insert&lt;/code&gt; tab, select "Table," and voila!&lt;/p&gt;

&lt;p&gt;You now have a table whose header dropdown arrows let you filter and sort your data to your liking. If the arrows are missing, turn on &lt;strong&gt;Filter&lt;/strong&gt; under "Sort &amp;amp; Filter" on the Home tab.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RANGE WAY:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Select the column data you want to sort or filter as explained earlier. Then, on the Home tab, you can find "Sort &amp;amp; Filter" buttons where you can sort and filter your data according to your preferences.&lt;/p&gt;
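&lt;p&gt;Conceptually, filtering keeps a subset of rows and sorting reorders them, while the underlying data stays the same. A minimal Python sketch of both ideas, using made-up rows of product names and units sold:&lt;/p&gt;

```python
# Hypothetical rows: (product, units sold).
rows = [("Pen", 40), ("Book", 12), ("Ruler", 25), ("Bag", 5)]

# Filtering: keep only the rows that meet a criterion (20 or more units).
filtered = [row for row in rows if row[1] >= 20]

# Sorting: order rows by units sold, largest first.
by_units_desc = sorted(rows, key=lambda row: row[1], reverse=True)

# Sorting: order rows alphabetically by product name.
by_name_asc = sorted(rows, key=lambda row: row[0])

print(filtered)
print(by_units_desc)
print(by_name_asc)
```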

&lt;p&gt;&lt;strong&gt;DATA VALIDATION&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Data validation restricts what users can type into a cell. As an example, let's allow only "Yes," "No," or "Maybe" in selected cells.&lt;/p&gt;

&lt;p&gt;Steps in Excel:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Select the cells where the user should input a response (e.g., C2:C20).&lt;/li&gt;
&lt;li&gt; Go to the &lt;strong&gt;Data&lt;/strong&gt; tab.&lt;/li&gt;
&lt;li&gt; Click on &lt;strong&gt;Data Validation&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; In the dialog box, under the &lt;strong&gt;Settings&lt;/strong&gt; tab:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Allow:&lt;/strong&gt; List&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Source:&lt;/strong&gt; Yes,No,Maybe
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuxxougaggd81o0sph0dc.png" alt="Image description" width="509" height="387"&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt; Under the &lt;strong&gt;Input Message&lt;/strong&gt; tab:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Title:&lt;/strong&gt; Response Required&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Input message:&lt;/strong&gt; Please select Yes, No, or Maybe from the list.
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbwb6dgc789j48o9ahz79.png" alt="Image description" width="509" height="387"&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt; Under the &lt;strong&gt;Error Alert&lt;/strong&gt; tab:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Style:&lt;/strong&gt; Stop&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Title:&lt;/strong&gt; Invalid Entry&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error message:&lt;/strong&gt; Only Yes, No, or Maybe are allowed.
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpxjmqo555t5gpiejvez8.png" alt="Image description" width="509" height="387"&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt; Click &lt;strong&gt;OK&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Users will now see a dropdown list in each selected cell and can only choose "Yes," "No," or "Maybe," preventing invalid data entries.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyxlblevvqo06m7w42vly.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyxlblevvqo06m7w42vly.png" alt="Image description" width="168" height="135"&gt;&lt;/a&gt;&lt;/p&gt;
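&lt;p&gt;The rule configured above boils down to a membership check against a fixed list. As a rough sketch in Python (the function name &lt;code&gt;validate_entry&lt;/code&gt; is hypothetical), the Stop-style alert corresponds to rejecting the value outright:&lt;/p&gt;

```python
ALLOWED = ("Yes", "No", "Maybe")


def validate_entry(value):
    # Accept the value only if it is one of the list items; otherwise raise,
    # much like a Stop-style error alert rejects the entry.
    if value not in ALLOWED:
        raise ValueError("Invalid Entry: Only Yes, No, or Maybe are allowed.")
    return value


print(validate_entry("Maybe"))
```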

&lt;p&gt;&lt;strong&gt;DATA CLEANING&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NB&lt;/strong&gt;: Before cleaning your data, make sure you study it well enough so that you do not remove &lt;strong&gt;nulls&lt;/strong&gt; or &lt;strong&gt;duplicates&lt;/strong&gt; of crucial data that will be needed for analysis. Use your wisdom and knowledge as a data analyst to clean your data effectively.&lt;/p&gt;

&lt;p&gt;a. &lt;strong&gt;DUPLICATES&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here's how you remove duplicates in Excel:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Select your data (click and drag to highlight the range, or click on any cell in the range if it's a full table).&lt;/li&gt;
&lt;li&gt; Go to the &lt;strong&gt;Data&lt;/strong&gt; tab on the ribbon.&lt;/li&gt;
&lt;li&gt; Click &lt;strong&gt;Remove Duplicates&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; In the dialog that appears, tick the columns that should be compared when looking for duplicates.&lt;/li&gt;
&lt;li&gt; Click &lt;strong&gt;OK&lt;/strong&gt;.
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkuq9lic5xzmskt7awx21.png" alt="Image description" width="800" height="457"&gt;
&lt;/li&gt;
&lt;/ol&gt;
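&lt;p&gt;The choice of columns in step 4 matters more than it may seem. The following Python sketch, with invented sample rows and a hypothetical helper &lt;code&gt;remove_duplicates&lt;/code&gt;, shows how the result changes depending on which columns are compared. Like Excel, it keeps the first occurrence of each duplicate.&lt;/p&gt;

```python
# Hypothetical rows: (name, city).
rows = [("Ann", "Nairobi"), ("Ben", "Mombasa"), ("Ann", "Nairobi"), ("Ann", "Kisumu")]


def remove_duplicates(rows, columns):
    # Keep the first row for each combination of values in the chosen columns.
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[i] for i in columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


print(remove_duplicates(rows, columns=[0, 1]))  # compare both columns
print(remove_duplicates(rows, columns=[0]))     # compare the name column only
```

&lt;p&gt;Comparing only the name column also drops the Kisumu row, which may be data you wanted to keep.&lt;/p&gt;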

&lt;p&gt;b. &lt;strong&gt;NULLS&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let's say you want to replace nulls (blank cells) with the mode, i.e., the most frequent value in the column. One way to find the mode is the Analysis ToolPak:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Go to &lt;strong&gt;File&lt;/strong&gt; &amp;gt; &lt;strong&gt;Options&lt;/strong&gt; &amp;gt; &lt;strong&gt;Add-ins&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; At the bottom, choose &lt;strong&gt;Excel Add-ins&lt;/strong&gt;, click &lt;strong&gt;Go&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; Check &lt;strong&gt;Analysis ToolPak&lt;/strong&gt;, then click &lt;strong&gt;OK&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; Go to the &lt;strong&gt;Data&lt;/strong&gt; tab and click &lt;strong&gt;Data Analysis&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; Choose &lt;strong&gt;Descriptive Statistics&lt;/strong&gt;, then click &lt;strong&gt;OK&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; Select your data range, choose where to output the results, and check &lt;strong&gt;Summary statistics&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; Click &lt;strong&gt;OK&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You could also use the formula &lt;code&gt;=MODE.SNGL(A2:A100)&lt;/code&gt;, which returns the most frequent number in the range (text and blank cells are ignored), so make sure you select the correct range.&lt;/p&gt;

&lt;p&gt;After finding the mode, we need to replace the nulls with the mode value:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Click on the header of the column where the blanks are.&lt;/li&gt;
&lt;li&gt; Open &lt;strong&gt;Find and Replace&lt;/strong&gt; (Ctrl + H or Home &amp;gt; Find &amp;amp; Select &amp;gt; Replace).&lt;/li&gt;
&lt;li&gt; In &lt;strong&gt;Find what:&lt;/strong&gt; leave it completely blank (do NOT type a space).&lt;/li&gt;
&lt;li&gt; In &lt;strong&gt;Replace with:&lt;/strong&gt; type the mode value.&lt;/li&gt;
&lt;li&gt; Click &lt;strong&gt;Replace All&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;
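&lt;p&gt;The two stages above (find the mode, then fill the blanks with it) can be summarized in a few lines. This is a sketch in Python using the standard &lt;code&gt;statistics&lt;/code&gt; module; the column values are invented and blanks are represented as &lt;code&gt;None&lt;/code&gt;.&lt;/p&gt;

```python
from statistics import mode

# Hypothetical column with blanks represented as None.
column = [3, None, 5, 3, None, 7, 3]

# Find the most frequent non-blank value, like =MODE.SNGL over the column.
most_common = mode([v for v in column if v is not None])

# Replace every blank with that value, like Replace All with an empty "Find what".
filled = [most_common if v is None else v for v in column]

print(most_common, filled)
```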

&lt;p&gt;c. &lt;strong&gt;DATA TYPES&lt;/strong&gt;&lt;/p&gt;

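&lt;p&gt;Make sure each column uses the correct data type, for example Number, Date, or Text, which you can set from the Number Format dropdown on the Home tab. Fixing mismatched types is part of data cleaning.&lt;/p&gt;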

&lt;p&gt;d. &lt;strong&gt;Conditional Formatting&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Highlight anomalies or errors visually: &lt;br&gt;
Go to &lt;strong&gt;Home&lt;/strong&gt; &amp;gt; &lt;strong&gt;Conditional Formatting&lt;/strong&gt;. Example: Highlight cells with text in a numbers column, or highlight duplicates.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkjxj4y80adxp8kk84uw5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkjxj4y80adxp8kk84uw5.png" alt="Image description" width="510" height="347"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DATA VISUALIZATION&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We do data visualization using charts and graphs. The general way to insert a chart for your data is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Select Your Data (include headers, as they act as labels for charts).&lt;/li&gt;
&lt;li&gt; Go to the &lt;strong&gt;Insert&lt;/strong&gt; Tab.&lt;/li&gt;
&lt;li&gt; Choose a Chart Type you need.&lt;/li&gt;
&lt;li&gt; Customize the Chart (change colors or chart styles, add titles, data labels, or legends, switch rows and columns if the chart isn't displayed as expected, move and resize the chart).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;There are various types of visualization templates, but the most common include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Line charts:&lt;/strong&gt; Great for trends over time.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg6w0o1j84x1p3fl4ni0y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg6w0o1j84x1p3fl4ni0y.png" alt="Image description" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Column charts and bar charts:&lt;/strong&gt; Good for comparing values.&lt;br&gt;
&lt;strong&gt;Column chart:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fygfqf034rae609be4hpw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fygfqf034rae609be4hpw.png" alt="Image description" width="800" height="428"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Bar chart:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjsdekbuzpbtvhucmtqhk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjsdekbuzpbtvhucmtqhk.png" alt="Image description" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pie chart:&lt;/strong&gt; Shows parts of a whole.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb2157r7h2rg6qsu5g3m6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb2157r7h2rg6qsu5g3m6.png" alt="Image description" width="800" height="428"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scatter Plot:&lt;/strong&gt; For relationships between two variables, e.g., weight and height.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6iwumyt0jcyhuxdmc3p9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6iwumyt0jcyhuxdmc3p9.png" alt="Image description" width="538" height="376"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Area chart:&lt;/strong&gt; Perfect for showing how values change over time.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwc7f56nlerlnhgwwzgd1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwc7f56nlerlnhgwwzgd1.png" alt="Image description" width="800" height="428"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;PIVOT TABLES&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Pivot tables are a powerful tool in Excel for summarizing and analyzing large datasets. They allow you to extract meaningful insights by grouping and aggregating data based on different fields.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0cx2m1lheub4j5h5b9k9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0cx2m1lheub4j5h5b9k9.png" alt="Image description" width="757" height="103"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Creating a Pivot Table:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Select the data range&lt;/strong&gt; you want to analyze. Ensure your data has clear headers.&lt;/li&gt;
&lt;li&gt; Go to the &lt;strong&gt;Insert&lt;/strong&gt; tab and click &lt;strong&gt;PivotTable&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; In the "Create PivotTable" dialog box, confirm the data range and choose where you want to place the pivot table (a new worksheet is usually recommended). Click &lt;strong&gt;OK&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fejhag31xyxbvfaaef7n8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fejhag31xyxbvfaaef7n8.png" alt="Image description" width="489" height="641"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="4"&gt;
&lt;li&gt; The "PivotTable Fields" pane will appear on the right. Drag and drop fields into the four areas:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffoi0npzbow0x0drplczk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffoi0npzbow0x0drplczk.png" alt="Image description" width="771" height="704"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Rows:&lt;/strong&gt; Fields placed here will appear as row labels in the pivot table.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Columns:&lt;/strong&gt; Fields placed here will appear as column labels.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Values:&lt;/strong&gt; Fields placed here will be aggregated (e.g., summed, averaged, counted). Excel will often automatically suggest an aggregation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filters:&lt;/strong&gt; Fields placed here can be used to filter the entire pivot table.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Benefits of Pivot Tables:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Easy Summarization:&lt;/strong&gt; Quickly summarize large amounts of data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flexibility:&lt;/strong&gt; Easily rearrange fields to view data from different perspectives.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Calculations:&lt;/strong&gt; Perform calculations like sums, averages, counts, percentages, etc., without writing formulas.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grouping:&lt;/strong&gt; Group data by date, time, or other categories.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filtering:&lt;/strong&gt; Filter data to focus on specific subsets.&lt;/li&gt;
&lt;/ul&gt;
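&lt;p&gt;Under the hood, a pivot table is a group-and-aggregate operation. As an illustration only, this Python sketch builds a tiny pivot by hand from invented sales rows, with region playing the Rows area, product the Columns area, and the sum of sales the Values area.&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical source rows: (region, product, sales).
rows = [
    ("East", "Pen", 100),
    ("East", "Book", 250),
    ("West", "Pen", 80),
    ("East", "Pen", 40),
    ("West", "Book", 300),
]

# Rows area = region, Columns area = product, Values area = sum of sales.
pivot = defaultdict(lambda: defaultdict(int))
for region, product, sales in rows:
    pivot[region][product] += sales

for region in sorted(pivot):
    print(region, dict(pivot[region]))
```

&lt;p&gt;Swapping the roles of region and product in the loop gives the same totals viewed from the other perspective, which is what dragging fields between Rows and Columns does in Excel.&lt;/p&gt;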

&lt;p&gt;&lt;strong&gt;PIVOT CHARTS&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A pivot chart is a dynamic chart that is directly connected to a pivot table. It provides a visual representation of the summarized data in the pivot table, making it easier to understand trends and patterns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Creating a Pivot Chart:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Create a pivot table&lt;/strong&gt; first (as described above).&lt;/li&gt;
&lt;li&gt; Select any cell within your created pivot table.&lt;/li&gt;
&lt;li&gt; Go to the &lt;strong&gt;PivotTable Analyze&lt;/strong&gt; (or &lt;strong&gt;Options&lt;/strong&gt; in older versions) tab.&lt;/li&gt;
&lt;li&gt; In the &lt;strong&gt;Tools&lt;/strong&gt; group, click &lt;strong&gt;PivotChart&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; Choose the desired chart type (e.g., column, bar, line).&lt;/li&gt;
&lt;li&gt; Click &lt;strong&gt;OK&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The pivot chart will automatically display the data from your pivot table. As you manipulate the fields in the pivot table, the pivot chart will update dynamically to reflect those changes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmzlowh49bug1yo8adrsv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmzlowh49bug1yo8adrsv.png" alt="Image description" width="800" height="363"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You could also add a slicer, as shown on the left of the pivot chart, to filter what the chart displays with a single click.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Benefits of Pivot Charts:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Visual Summarization:&lt;/strong&gt; Provides a visual representation of pivot table data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interactivity:&lt;/strong&gt; Changes in the pivot table are immediately reflected in the chart.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enhanced Understanding:&lt;/strong&gt; Makes it easier to identify trends and patterns in summarized data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic Reporting:&lt;/strong&gt; Creates dynamic reports where the visuals change based on data manipulation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By leveraging Pivot Tables and Pivot Charts, you can significantly enhance your data analysis capabilities in Excel, allowing for more insightful and dynamic reporting.&lt;br&gt;
&lt;/p&gt;




&lt;p&gt;With that, you are good to go in Excel. I hope you enjoyed the post. Leave a comment with any criticism or anything else you might have in mind. Thank you.&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>analytics</category>
      <category>data</category>
    </item>
  </channel>
</rss>
