<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Paul Karikari</title>
    <description>The latest articles on Forem by Paul Karikari (@paulkarikari).</description>
    <link>https://forem.com/paulkarikari</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F329686%2F5828c24f-8fdd-42fc-a4f3-81dbf9adf741.jpeg</url>
      <title>Forem: Paul Karikari</title>
      <link>https://forem.com/paulkarikari</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/paulkarikari"/>
    <language>en</language>
    <item>
      <title>When Software Is Not Soft Anymore: The Nature of Software Complexity.</title>
      <dc:creator>Paul Karikari</dc:creator>
      <pubDate>Thu, 23 Apr 2020 15:43:12 +0000</pubDate>
      <link>https://forem.com/paulkarikari/when-software-is-not-soft-anymore-the-nature-of-software-complexity-5fc4</link>
      <guid>https://forem.com/paulkarikari/when-software-is-not-soft-anymore-the-nature-of-software-complexity-5fc4</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;One morning you show up to work and your boss walks over and asks you to implement a feature that clients have been demanding in an application you and your team developed some time ago. &lt;br&gt;
As usual, your boss asks for an estimated delivery time and, with much enthusiasm, you promise the feature in a couple of weeks (say, 3 weeks).&lt;br&gt;
A week goes by and you haven’t been able to make any substantial contribution to the project; you spend more time than usual staring at your screen and alternating between multiple open tabs.&lt;br&gt;
Managers keep asking for updates on your progress, but you can barely give meaningful feedback. All you can say is “I’m working on it.”&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AkG-89irfL6FwQN9MlUzRlw.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AkG-89irfL6FwQN9MlUzRlw.gif"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You realize it’s now more difficult to add new features or make changes than it was when the project was freshly developed. But wait! Isn’t software supposed to be soft?&lt;br&gt;
 As in, shouldn’t software be easy to reshape into any form by making changes or adding features?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F0%2AhkhljBUO2G7ziBI-" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F0%2AhkhljBUO2G7ziBI-"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since the early days, software has been hailed for how easily it can be changed, in contrast to hardware, which cannot be altered once manufactured and has to be replaced when changes are required. The ability to change software as requirements demand has always been its selling point, yet there comes a time when changing a piece of software seems close to impossible, and sometimes the project has to be rewritten from scratch, much like hardware has to be replaced.&lt;br&gt;
This resistance to adding features or making changes usually comes at an unexpected cost, given that software is supposed to accommodate change by design.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F0%2AWkZJk4p-yR3ZJk0t" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F0%2AWkZJk4p-yR3ZJk0t" alt="[https://www.computerhistory.org/revolution/birth-of-the-computer/4/78](https://www.computerhistory.org/revolution/birth-of-the-computer/4/78)"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What makes adding features or changes to software more difficult?
&lt;/h2&gt;

&lt;p&gt;Now, what causes this hindrance? What makes adding features more difficult?&lt;br&gt;
The &lt;strong&gt;complexity&lt;/strong&gt; of software is what makes it difficult to change.&lt;br&gt;
Complexity comes in many forms, and almost every developer experiences it at some point in their career.&lt;br&gt;
The ability to recognize and reduce &lt;strong&gt;complexity&lt;/strong&gt; in software is a very important skill, and it is what distinguishes a great developer from the rest.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding software complexity.
&lt;/h2&gt;

&lt;p&gt;Complexity in software can be defined as &lt;strong&gt;anything related to the structure of the software system that makes it difficult to understand and modify.&lt;/strong&gt;&lt;br&gt;
The various forms of software complexity are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Difficulty in understanding what a piece of code does.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It takes too much effort to make a small improvement, or it’s unclear which part of the system to modify in order to make it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When fixing one bug introduces or creates another bug.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When software is difficult to understand and modify, it is considered complicated or &lt;strong&gt;complex&lt;/strong&gt;; when it’s easy to understand and modify, it’s &lt;strong&gt;simple&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Size Doesn’t Matter.
&lt;/h3&gt;

&lt;p&gt;The word &lt;strong&gt;complex&lt;/strong&gt; is often used to describe large software systems with very sophisticated features, but for the purpose of this article, a large system that is easy to understand and modify is not complex.&lt;br&gt;
Conversely, a small software system that is difficult to understand and takes too much effort to modify is considered complex.&lt;br&gt;
Complexity is what you face as a developer at a particular moment when you are trying to achieve a goal. It doesn’t relate to the overall size or functionality of the system.&lt;/p&gt;

&lt;h3&gt;
  
  
  You Read More Code Than You Write.
&lt;/h3&gt;

&lt;p&gt;If you’ve been in software development for a while, I bet you’ve already come to the realization that developers read far more code than they write.&lt;br&gt;
The complexity of a piece of code is most obvious to its readers. If your own code seems simple to you but others find it difficult to understand, then it’s complex. Your job as a developer is not just to write code that works, but to write code that others can understand and work with easily.&lt;/p&gt;

&lt;h2&gt;
  
  
  Attributes Of Software Complexity
&lt;/h2&gt;

&lt;p&gt;Generally, complexity manifests in three ways.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Change amplification&lt;/strong&gt;: This happens when multiple parts of a software system have to be modified to satisfy one simple requirement.&lt;br&gt;
e.g. Consider a web application with multiple frontend templates where colors are defined as inline CSS on each page. When the theme or color palette of the application changes, every page has to be updated.&lt;/p&gt;
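As a rough sketch in Python (the function and constant names here are hypothetical), hard-coding the same value in several places forces every copy to change together, while centralizing it removes the amplification:

```python
# Change amplification: the same brand color is hard-coded in several
# renderers, so a theme change must touch every one of these functions.
def render_header():
    return '<header style="color:#ff0000">Home</header>'

def render_footer():
    return '<footer style="color:#ff0000">Contact</footer>'

# Centralizing the value means one change restyles the whole application.
THEME_COLOR = "#ff0000"

def render_header_simple():
    return f'<header style="color:{THEME_COLOR}">Home</header>'

def render_footer_simple():
    return f'<footer style="color:{THEME_COLOR}">Contact</footer>'
```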

&lt;p&gt;&lt;strong&gt;Cognitive Load&lt;/strong&gt;: This refers to the amount of information a developer has to know about the system in order to modify it. A system that requires a developer to spend a long time learning a lot of information before accomplishing a task is said to have a high cognitive load. This can lead to unwanted bugs when a developer misses some vital information about the system.&lt;br&gt;
e.g. Using a resource-intensive class that has no built-in mechanism to free the acquired resources, but expects the developer to know when to free them.&lt;/p&gt;
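A minimal Python sketch (the `Connection` class is hypothetical): a class that supports the with-statement frees its own resources, so callers no longer have to carry that knowledge in their heads:

```python
# High cognitive load: callers must remember to call close() themselves;
# forgetting it leaks the resource.
class Connection:
    def __init__(self):
        self.open = True

    def close(self):
        self.open = False

    # Supporting the with-statement moves that knowledge into the class.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()

with Connection() as conn:
    assert conn.open
assert not conn.open  # freed automatically; nothing for the caller to remember
```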

&lt;p&gt;&lt;strong&gt;Unknown unknowns&lt;/strong&gt;: This is when a developer doesn’t know which part of the software system to modify, or doesn’t know what information is needed to accomplish a task.&lt;br&gt;
e.g. When a developer is tasked with making changes to a system that uses a poorly documented library, it is very difficult to even know where to start.&lt;/p&gt;

&lt;h2&gt;
  
  
  Causes of Complexity
&lt;/h2&gt;

&lt;p&gt;Complexity is mostly caused by &lt;strong&gt;dependencies&lt;/strong&gt; and &lt;strong&gt;obscurity.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dependencies&lt;/strong&gt;: &lt;br&gt;
A dependency exists when a piece of code can’t work in isolation but depends on other pieces of code or other parts of the software system to function properly.&lt;br&gt;
Technically, anything that a piece of software requires to do what it is intended to do can be classified as a dependency. &lt;br&gt;
Dependencies are inevitable and exist in almost every software system.&lt;br&gt;
Whenever you call a function in your code, you create a dependency between your code and the implementation of that function. When a new parameter is added to the function or its implementation changes, your code is affected directly or indirectly (remember &lt;em&gt;change amplification&lt;/em&gt;?).&lt;/p&gt;
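A tiny Python illustration (the names are hypothetical): each call site is a dependency on the function’s signature, so changing that signature would ripple out to all of them:

```python
# Every caller of apply_discount depends on its signature.
def apply_discount(price, rate):
    return price * (1 - rate)

# Two call sites -> two dependencies. If a required `currency` parameter
# were added to apply_discount, both lines below would have to change too.
checkout_total = apply_discount(100.0, 0.1)
invoice_total = apply_discount(250.0, 0.2)
```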

&lt;p&gt;&lt;strong&gt;Obscurity&lt;/strong&gt;:&lt;br&gt;
This happens when an important piece of information is not obvious. e.g. Not using meaningful variable or function names, or documentation that is so sparse it does not state the information needed to use a function properly.&lt;/p&gt;
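For example, in Python (a contrived sketch), the same logic can be obscure or obvious depending purely on naming:

```python
# Obscure: the vague names hide what this function computes.
def f(a, b):
    return a + a * b

# The identical logic with meaningful names makes the intent obvious.
def price_with_tax(price, tax_rate):
    return price + price * tax_rate

# Both compute price + price * tax_rate; only the second says so.
assert f(100, 0.05) == price_with_tax(100, 0.05)
```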

&lt;h3&gt;
  
  
  Complexity is Incremental
&lt;/h3&gt;

&lt;p&gt;Complexity doesn’t just happen. It accumulates over time.&lt;br&gt;
It builds up because many small dependencies and obscurities pile on top of each other. Eventually, this makes the system difficult to understand and modify, and tasks that are supposed to take little time end up taking far too long to complete.&lt;/p&gt;

&lt;h2&gt;
  
  
  It’s All About Complexity
&lt;/h2&gt;

&lt;p&gt;There has been a wave of experienced engineers and industry experts giving talks, authoring books, and producing other resources with one common goal: to help other developers build maintainable software systems.&lt;br&gt;
Most of the ideas they share are techniques or methods for reducing complexity in software systems.&lt;br&gt;
Ideas like &lt;a href="https://en.wikipedia.org/wiki/Don%27t_repeat_yourself" rel="noopener noreferrer"&gt;DRY&lt;/a&gt;, &lt;a href="https://martinfowler.com/bliki/Yagni.html" rel="noopener noreferrer"&gt;YAGNI&lt;/a&gt;, and &lt;a href="https://en.wikipedia.org/wiki/SOLID" rel="noopener noreferrer"&gt;SOLID&lt;/a&gt;, and books like &lt;em&gt;Clean Code&lt;/em&gt; and &lt;em&gt;Clean Architecture&lt;/em&gt; by Robert C. Martin and &lt;em&gt;Refactoring&lt;/em&gt; by Martin Fowler, all have one thing in common: building software that is easy to understand and modify.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;A combination of dependencies and obscurity leads to complexity in software systems, which manifests as change amplification, cognitive load, and unknown unknowns.&lt;br&gt;
Building software systems with the intention of making them easy to understand and modify yields long-term benefits that might not be obvious at the beginning of a project, but it’s worth putting in the extra effort to keep things as simple as possible.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;A Philosophy of Software Design by John Ousterhout&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Clean Code by Robert C. Martin&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Clean Architecture: A Craftsman’s Guide to Software Structure and Design by Robert C. Martin&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Refactoring by Martin Fowler&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Code Simplicity: The Fundamentals of Software by Max Kanat-Alexander&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>codenewbie</category>
      <category>codequality</category>
      <category>leadership</category>
      <category>career</category>
    </item>
    <item>
      <title>Build, Train and Deploy Tensorflow Deep Learning Models on Amazon SageMaker: A Complete Workflow Guide.</title>
      <dc:creator>Paul Karikari</dc:creator>
      <pubDate>Wed, 15 Apr 2020 21:58:54 +0000</pubDate>
      <link>https://forem.com/paulkarikari/build-train-and-deploy-tensorflow-deep-learning-models-on-amazon-sagemaker-a-complete-workflow-guide-495i</link>
      <guid>https://forem.com/paulkarikari/build-train-and-deploy-tensorflow-deep-learning-models-on-amazon-sagemaker-a-complete-workflow-guide-495i</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Machine learning (ML) projects mostly follow a workflow that involves generating example data, training a model, and deploying the model.&lt;br&gt;
These steps have subtasks and are iterative.&lt;br&gt;
More often than not, ML engineers and data scientists need an environment where they can experiment and prototype ideas quickly.&lt;br&gt;
After prototyping, deploying and scaling machine learning models remains a mystery known to few.&lt;/p&gt;

&lt;p&gt;It would be ideal and convenient if, without any tiresome setup, ML engineers and data scientists could easily go from experimentation or prototyping to deploying production-ready, scalable ML models. This is where &lt;strong&gt;Amazon SageMaker&lt;/strong&gt; comes in.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F0%2AeH3exQLyFHE8Swdp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F0%2AeH3exQLyFHE8Swdp.png" alt="machine-learning workflow"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  What is Amazon SageMaker:
&lt;/h2&gt;

&lt;p&gt;SageMaker was built to provide a platform that supports the development and deployment of machine learning models.&lt;br&gt;
Quoting the official website:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning (ML) models quickly. SageMaker removes the heavy lifting from each step of the machine learning process to make it easier to develop high quality models.&lt;br&gt;
 Traditional ML development is a complex, expensive, iterative process made even harder because there are no integrated tools for the entire machine learning workflow. You need to stitch together tools and workflows, which is time-consuming and error-prone. SageMaker solves this challenge by providing all of the components used for machine learning in a single toolset so models get to production faster with much less effort and at lower cost.&lt;br&gt;
&lt;em&gt;source: &lt;a href="https://aws.amazon.com/sagemaker/" rel="noopener noreferrer"&gt;https://aws.amazon.com/sagemaker/&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  &lt;strong&gt;Features of Amazon SageMaker:&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;SageMaker provides customizable Amazon ML instances with a developer-friendly notebook environment preloaded with ML frameworks and libraries.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Seamless integration with AWS storage services (such as S3, RDS, DynamoDB, Redshift, etc.) for analysis.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SageMaker provides 15+ most commonly used &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/algos.html" rel="noopener noreferrer"&gt;ML algorithms&lt;/a&gt; and also supports building custom algorithms.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AJgyk2EiCrwtl2FSzCPNs6w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AJgyk2EiCrwtl2FSzCPNs6w.png" alt="SageMaker features"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  How SageMaker Works
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F0%2AgXfeKtFap01dpxVk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F0%2AgXfeKtFap01dpxVk.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To train models on SageMaker, you create a training job by specifying the path to your training data on S3, the training script or built-in algorithm, and the EC2 container to train in.&lt;/p&gt;

&lt;p&gt;After training, the model artifacts are uploaded to S3. From these artifacts, a model can be created and deployed on EC2 containers with an endpoint configuration for prediction or inference.&lt;/p&gt;
&lt;h2&gt;
  
  
  What we will build
&lt;/h2&gt;

&lt;p&gt;In this tutorial, we will build an ML model to predict the sentiment of a text.&lt;br&gt;
The details of processing the data and building the model are well explained in my previous &lt;a href="https://medium.com/datadriveninvestor/deep-learning-lstm-for-sentiment-analysis-in-tensorflow-with-keras-api-92e62cde7626" rel="noopener noreferrer"&gt;tutorial&lt;/a&gt;; here we will focus on training and deploying the model on Amazon SageMaker. &lt;br&gt;
Optionally, you can upload the &lt;a href="https://github.com/paulkarikari/sentiment_analysis_deployment/blob/master/sentiment_analysis_tensorflow_keras_LSTM.ipynb" rel="noopener noreferrer"&gt;complete notebook&lt;/a&gt; that accompanies this tutorial to your SageMaker notebook instance and run it alongside.&lt;/p&gt;

&lt;p&gt;We are building a custom model, so it’s much more convenient to use the &lt;a href="https://sagemaker.readthedocs.io/en/stable/index.html" rel="noopener noreferrer"&gt;SageMaker Python SDK&lt;/a&gt; for training and deployment.&lt;br&gt;
The same tasks can be accomplished through the SageMaker web UI, mostly when using built-in algorithms.&lt;/p&gt;
&lt;h3&gt;
  
  
  Steps:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Step 1: Create an Amazon S3 Bucket&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Step 2: Create an Amazon SageMaker Notebook Instance&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Step 3: Create a Jupyter Notebook&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Step 4: Download, Explore, and Transform the Training Data (refer to the previous &lt;a href="https://medium.com/datadriveninvestor/deep-learning-lstm-for-sentiment-analysis-in-tensorflow-with-keras-api-92e62cde7626" rel="noopener noreferrer"&gt;tutorial&lt;/a&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Step 5: Train a Model&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Step 6: Deploy the Model to Amazon SageMaker&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Step 7: Validate the Model&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Step 8: Integrating Amazon SageMaker Endpoints into Internet-facing Applications&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Step 9: Clean Up&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Create an Amazon S3 Bucket:
&lt;/h3&gt;

&lt;p&gt;First, we create an S3 bucket. This is where we will store the training data and where the model artifacts will be saved later.&lt;br&gt;
Create a bucket called tensorflow-sentiment-analysis (S3 bucket names may only contain lowercase letters, numbers, hyphens, and dots, so underscores are not allowed).&lt;/p&gt;
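A minimal sketch of creating the bucket programmatically with boto3 (assuming configured AWS credentials; the `is_valid_bucket_name` helper is just for illustration). Since S3 bucket names may not contain underscores, a hyphenated name is used:

```python
import re

def is_valid_bucket_name(name: str) -> bool:
    # S3 bucket names: 3-63 characters; lowercase letters, digits, hyphens, dots
    return re.fullmatch(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]", name) is not None

def create_training_bucket(name: str):
    # Requires boto3 and configured AWS credentials to actually run.
    import boto3
    if not is_valid_bucket_name(name):
        raise ValueError(f"invalid S3 bucket name: {name}")
    boto3.client("s3").create_bucket(Bucket=name)

# Underscores are not allowed in bucket names, so hyphens are used instead.
assert is_valid_bucket_name("tensorflow-sentiment-analysis")
assert not is_valid_bucket_name("tensorflow_sentiment_analysis")
```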
&lt;h3&gt;
  
  
  Create an Amazon SageMaker Notebook Instance:
&lt;/h3&gt;

&lt;p&gt;Go to SageMaker in the AWS console; on the left panel, click on &lt;em&gt;Notebook instances&lt;/em&gt; (1) and then click on &lt;em&gt;Create notebook instance&lt;/em&gt; (2).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F3472%2F1%2AcG8NIywcetTbjk0l9HCMbQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F3472%2F1%2AcG8NIywcetTbjk0l9HCMbQ.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On the next page, enter a name for the notebook; any name of your choice will work. You can leave the rest at their defaults for the purpose of this tutorial. After that, click on &lt;em&gt;Create notebook instance&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AEqTCgsP8fbg_gg07m4vgZw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AEqTCgsP8fbg_gg07m4vgZw.png" alt="Creating a notebook instance"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The notebook will be created; its status will be Pending for a short period of time and then switch to InService. At this stage, you can click on either Open Jupyter or Open JupyterLab; the two differ only in their UI.&lt;br&gt;
I prefer JupyterLab because it has a file explorer, supports multiple tabs for open files, and feels more like an IDE.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F3280%2F1%2AU5sP3Hq5HHFNkMRFUeYCDA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F3280%2F1%2AU5sP3Hq5HHFNkMRFUeYCDA.png" alt="Pending status"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2768%2F1%2AluUaJwx-zMUONyL1_8DXbA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2768%2F1%2AluUaJwx-zMUONyL1_8DXbA.png" alt="InService status"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Download, Explore, and Transform the Training Data
&lt;/h3&gt;

&lt;p&gt;Download the &lt;a href="https://www.kaggle.com/crowdflower/twitter-airline-sentiment" rel="noopener noreferrer"&gt;dataset&lt;/a&gt; and upload it to your notebook instance. Refer to this &lt;a href="https://medium.com/datadriveninvestor/deep-learning-lstm-for-sentiment-analysis-in-tensorflow-with-keras-api-92e62cde7626" rel="noopener noreferrer"&gt;tutorial&lt;/a&gt; for the explanation of the exploration and transformation of data.&lt;/p&gt;

&lt;p&gt;The data is transformed and saved to S3.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2A-iMWq64MDPbe3w28dGoCxg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2A-iMWq64MDPbe3w28dGoCxg.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Before you can use the SageMaker SDK API, you have to create a session.&lt;br&gt;
You then call upload_data with the name of the data file and a key prefix, which is the path inside the S3 bucket.&lt;br&gt;
This returns the complete S3 path of the data file, which you can query to verify, as shown above.&lt;/p&gt;
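A sketch of what that looks like with the SageMaker Python SDK (the bucket name, key prefix, and file name are illustrative; `s3_uri` is a hypothetical helper showing the shape of the returned path):

```python
def s3_uri(bucket: str, key_prefix: str, filename: str) -> str:
    # The shape of the URI that upload_data returns for a given file
    return f"s3://{bucket}/{key_prefix}/{filename}"

def upload_training_data(local_path: str, key_prefix: str) -> str:
    # Runs inside a SageMaker notebook instance, where the SDK is preinstalled.
    import sagemaker
    session = sagemaker.Session()
    return session.upload_data(path=local_path, key_prefix=key_prefix)

assert s3_uri("tensorflow-sentiment-analysis", "data", "train.csv") == \
    "s3://tensorflow-sentiment-analysis/data/train.csv"
```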
&lt;h3&gt;
  
  
  Training the Model
&lt;/h3&gt;

&lt;p&gt;To train a TensorFlow model, you use the TensorFlow estimator from the SageMaker SDK.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AFnbw0g2aBwHda-E7HcWuMg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AFnbw0g2aBwHda-E7HcWuMg.png" alt="TensorFlow estimator"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;entry_point&lt;/em&gt;:&lt;/strong&gt; This is the script that defines and trains your model. This script will be run in a container (more on this later).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;role:&lt;/em&gt;&lt;/strong&gt; The role assigned to the running notebook. You get it by running the code &lt;code&gt;role = sagemaker.get_execution_role()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;train_instance_count&lt;/em&gt;&lt;/strong&gt;: The number of container instances to spin up for training the model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;train_instance_type:&lt;/em&gt;&lt;/strong&gt; The &lt;a href="https://aws.amazon.com/sagemaker/pricing/instance-types/" rel="noopener noreferrer"&gt;instance type&lt;/a&gt; of container to be used for training the model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;framework_version:&lt;/em&gt;&lt;/strong&gt; The TensorFlow version used in the training script. You get it by running &lt;code&gt;tf_version = tf.__version__&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;py_version&lt;/em&gt;:&lt;/strong&gt; Python version used.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;script_mode&lt;/em&gt;:&lt;/strong&gt; If set to True, the estimator will use the Script Mode containers (default: False). This will be ignored if py_version is set to ‘py3’.&lt;br&gt;
Script mode allows running arbitrary script code in a container.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;hyperparameters&lt;/em&gt;&lt;/strong&gt;: These are the parameters passed to the training script.&lt;/p&gt;
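Putting the parameters together, constructing the estimator looks roughly like this (a sketch using the SDK v1 parameter names described above; the instance type, version strings, and hyperparameter values are illustrative assumptions):

```python
# Illustrative hyperparameters matching the argparse flags in the training script
hyperparameters = {"epochs": 10, "batch-size": 100, "learning-rate": 0.01}

def build_estimator(role: str):
    # Runs inside a SageMaker notebook instance, where the SDK is preinstalled.
    from sagemaker.tensorflow import TensorFlow
    return TensorFlow(
        entry_point="train.py",        # the training script shown below
        role=role,                     # role = sagemaker.get_execution_role()
        train_instance_count=1,
        train_instance_type="ml.m5.large",  # assumed instance type
        framework_version="1.15",           # assumed TF version
        py_version="py3",
        script_mode=True,
        hyperparameters=hyperparameters,
    )
```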

&lt;p&gt;Now that you know what each parameter means, let’s understand the content of the training script.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;  &lt;span class="o"&gt;%%&lt;/span&gt;&lt;span class="n"&gt;writefile&lt;/span&gt; &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;py&lt;/span&gt;

  &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;argparse&lt;/span&gt;
  &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
  &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tensorflow&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;
  &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tensorflow.keras.preprocessing.text&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Tokenizer&lt;/span&gt;
  &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tensorflow.keras.preprocessing.sequence&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pad_sequences&lt;/span&gt;
  &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tensorflow.keras.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Sequential&lt;/span&gt;
  &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tensorflow.keras.layers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LSTM&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;Dense&lt;/span&gt;
  &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tensorflow.keras.layers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Dropout&lt;/span&gt;
  &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

    &lt;span class="n"&gt;parser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;argparse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ArgumentParser&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# hyperparameters sent by the client are passed as command-line    arguments to the script.
&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;‘&lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;pochs&lt;/span&gt;&lt;span class="err"&gt;’&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;‘&lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="err"&gt;’&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;’&lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;learning&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;rate&lt;/span&gt;&lt;span class="err"&gt;’&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;‘&lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;gpu&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="err"&gt;’&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                                      &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;SM_NUM_GPUS&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="c1"&gt;# input data and model directories
&lt;/span&gt;     &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'--model-dir'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                                     &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;SM_MODEL_DIR&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

     &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;‘&lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="err"&gt;’&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                                &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;SM_CHANNEL_TRAIN&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

     &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse_known_args&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

     &lt;span class="n"&gt;epochs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;epochs&lt;/span&gt;
     &lt;span class="n"&gt;lr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;learning_rate&lt;/span&gt;
     &lt;span class="n"&gt;batch_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;batch_size&lt;/span&gt;
     &lt;span class="n"&gt;gpu_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gpu_count&lt;/span&gt;
     &lt;span class="n"&gt;model_dir&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_dir&lt;/span&gt;
     &lt;span class="n"&gt;training_dir&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;train&lt;/span&gt;

     &lt;span class="n"&gt;training_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;training_dir&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="err"&gt;’&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;csv&lt;/span&gt;&lt;span class="err"&gt;’&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;sep&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="err"&gt;’&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="err"&gt;’&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
     &lt;span class="n"&gt;tweet&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;training_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;
     &lt;span class="n"&gt;labels&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;training_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;airline_sentiment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;

     &lt;span class="n"&gt;num_of_words&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5000&lt;/span&gt;
     &lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Tokenizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_words&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;num_of_words&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
     &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_on_texts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

     &lt;span class="n"&gt;vocab_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;word_index&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="c1"&gt;# 1 is added due to 0 index
&lt;/span&gt;
     &lt;span class="n"&gt;tweet_sequence&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;texts_to_sequences&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

     &lt;span class="n"&gt;max_len&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;
     &lt;span class="n"&gt;padded_tweet_sequence&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;pad_sequences&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tweet_sequence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                                                     &lt;span class="n"&gt;maxlen&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;max_len&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

     &lt;span class="c1"&gt;# Build the model
&lt;/span&gt;     &lt;span class="n"&gt;embedding_vector_length&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt;
     &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Sequential&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; 
     &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vocab_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embedding_vector_length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   
                                            &lt;span class="n"&gt;input_length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;max_len&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
     &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Dropout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
     &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;LSTM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; 
     &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Dropout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
     &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;activation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="err"&gt;’&lt;/span&gt;&lt;span class="n"&gt;sigmoid&lt;/span&gt;&lt;span class="err"&gt;’&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; 
     &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="err"&gt;’&lt;/span&gt;&lt;span class="n"&gt;binary_crossentropy&lt;/span&gt;&lt;span class="err"&gt;’&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="err"&gt;’&lt;/span&gt;&lt;span class="n"&gt;adam&lt;/span&gt;&lt;span class="err"&gt;’&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                               &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;‘&lt;/span&gt;&lt;span class="n"&gt;accuracy&lt;/span&gt;&lt;span class="err"&gt;’&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; 

     &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;padded_tweet_sequence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;validation_split&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    
                        &lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

     &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;saved_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;simple_save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
     &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;backend&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_session&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
     &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="err"&gt;‘&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="err"&gt;’&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
     &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;‘&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="err"&gt;’&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
     &lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first line is a command that writes the contents of the cell to a file named train.py.&lt;/p&gt;

&lt;p&gt;Because SageMaker imports your training script, you should put your training code in a main guard (&lt;code&gt;if __name__ == '__main__':&lt;/code&gt;) so that SageMaker does not inadvertently run your training code at the wrong point in execution.&lt;/p&gt;

&lt;p&gt;All hyperparameters are passed to the script as command-line arguments.&lt;br&gt;
The training script also has access to environment variables set in the training container instance, such as the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;SM_MODEL_DIR: A string that represents the path where the training job writes the model artifacts to. After training, artifacts in this directory are uploaded to S3 for model hosting.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SM_NUM_GPUS: An integer representing the number of GPUs available to the host.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SM_CHANNEL_XXXX: A string that represents the path to the directory that contains the input data for the specified channel. For example, if you specify two input channels in the Tensorflow estimator’s fit call, named ‘train’ and ‘test’, the environment variables SM_CHANNEL_TRAIN and SM_CHANNEL_TEST are set.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
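&lt;p&gt;Outside of a SageMaker training container these SM_* variables are not set, so when testing the script locally it helps to fall back to defaults. A minimal sketch (the SM_* names are the real SageMaker variables; the fallback paths are placeholders of ours, not SageMaker conventions):&lt;/p&gt;

```python
import os

# SM_* are the variable names SageMaker sets inside the container; the
# fallback values are local placeholders for running the script elsewhere.
model_dir = os.environ.get("SM_MODEL_DIR", "/tmp/model")
training_dir = os.environ.get("SM_CHANNEL_TRAIN", "./data/train")
gpu_count = int(os.environ.get("SM_NUM_GPUS", "0"))
```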

&lt;p&gt;The midsection of the script is the usual model definition and training.&lt;br&gt;
The last part of the script saves the model artifacts to the S3 path provided. Take note of how the path is created by appending a numeric version to it.&lt;/p&gt;
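&lt;p&gt;The numeric suffix matters because the model is exported in SavedModel format and the SageMaker TensorFlow inference container uses TensorFlow Serving, which loads models from numbered version subdirectories. A small sketch of the path construction (the model_dir value below is illustrative):&lt;/p&gt;

```python
import os

model_dir = "/opt/ml/model"  # illustrative value of SM_MODEL_DIR
export_path = os.path.join(model_dir, "1")  # "1" is treated as the model version
print(export_path)  # /opt/ml/model/1
```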

&lt;p&gt;To start training, call the fit method and pass it the training data path. This creates a training job on SageMaker; you can check the training jobs section of the console to see the job created.&lt;/p&gt;
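&lt;p&gt;For reference, this is roughly what the estimator setup and fit call look like with the SageMaker Python SDK of that era. This is a hedged sketch, not the notebook's exact code: the role, bucket path, and instance type are placeholders, and the constructor and fit calls are commented out because they need an AWS session.&lt;/p&gt;

```python
# Hyperparameters arrive in train.py as the command-line arguments parsed above.
hyperparameters = {"epochs": 10, "batch-size": 100, "learning-rate": 0.01}

# from sagemaker.tensorflow import TensorFlow
# estimator = TensorFlow(entry_point="train.py",
#                        role=role,                     # IAM role with SageMaker permissions
#                        train_instance_count=1,
#                        train_instance_type="ml.p2.xlarge",
#                        framework_version="1.15.0",
#                        py_version="py3",
#                        hyperparameters=hyperparameters)
# estimator.fit({"train": "s3://your-bucket/sentiment/train"})  # creates the training job
```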

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2Amgs7iiVBzGxe6XAhkzcIyA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2Amgs7iiVBzGxe6XAhkzcIyA.png" alt="Start Training"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F3838%2F1%2ABNhyQdBA9eFBvGY9ETx_fA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F3838%2F1%2ABNhyQdBA9eFBvGY9ETx_fA.png" alt="Training Job in progress"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If everything goes well, you should see the output below in the last section of the output logs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2494%2F1%2AEDdEkvMRDAJJNCi0gmzqAQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2494%2F1%2AEDdEkvMRDAJJNCi0gmzqAQ.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Deploy the Model to Amazon SageMaker
&lt;/h3&gt;

&lt;p&gt;To deploy, we call the deploy method on the estimator, passing it the following parameters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;initial_instance_count:&lt;/em&gt;&lt;/strong&gt; The initial number of inference instances to launch.&lt;br&gt;
This can be scaled up if the request load increases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;instance_type:&lt;/em&gt;&lt;/strong&gt; The instance type for the inference container.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;endpoint_name:&lt;/em&gt;&lt;/strong&gt; A unique name for the model endpoint.&lt;/p&gt;
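&lt;p&gt;Put together, the deploy call might look like the sketch below; the instance type and endpoint name are placeholder values, and the call itself is commented out because it needs the trained estimator and an AWS session.&lt;/p&gt;

```python
deploy_kwargs = {
    "initial_instance_count": 1,                  # start with one inference instance
    "instance_type": "ml.m5.large",               # placeholder instance type
    "endpoint_name": "tweet-sentiment-endpoint",  # placeholder unique name
}
# predictor = estimator.deploy(**deploy_kwargs)
```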

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AIp00HazJKdG8V5_LeT-OHA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AIp00HazJKdG8V5_LeT-OHA.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Validating the Model
&lt;/h3&gt;

&lt;p&gt;Calling the deploy method returns the endpoint for the model, which can be used to validate the model with test data, as shown below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2106%2F1%2AkdSGr6FcXXnf512Yr1FSMQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2106%2F1%2AkdSGr6FcXXnf512Yr1FSMQ.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2380%2F1%2A1IpefzCURpeOPdKJtDbeWA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2380%2F1%2A1IpefzCURpeOPdKJtDbeWA.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Integrating Amazon SageMaker Endpoints into Internet-facing Applications
&lt;/h3&gt;

&lt;p&gt;The end use of an ML model is for applications to send it requests for inference/prediction. This can be accomplished using API Gateway and a Lambda function.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2A4tal3jzl3C459XnAb37q3Q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2A4tal3jzl3C459XnAb37q3Q.png" alt="architecture for application integration"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Applications make requests to the API endpoint, which triggers a Lambda function. The Lambda function preprocesses the data into the input the model expects, i.e. it converts the text input to a numeric representation, and then sends this to the model for prediction. &lt;br&gt;
The prediction result is received by the Lambda function, which returns it to API Gateway to be sent back to the user.&lt;/p&gt;
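&lt;p&gt;The preprocessing inside the Lambda function has to mirror what was done at training time: map each word to the tokenizer's index and left-pad to the same length. A minimal sketch, with the endpoint name hypothetical and the boto3 call commented out because it needs AWS credentials (in practice the fitted tokenizer's word index would be loaded from S3 rather than passed in the event):&lt;/p&gt;

```python
import json

MAX_LEN = 200  # must match the max_len used at training time

def preprocess(text, word_index, max_len=MAX_LEN):
    # Mirror Tokenizer.texts_to_sequences + pad_sequences: words -> indices,
    # unknown words dropped, sequence left-padded with zeros.
    seq = [word_index[w] for w in text.lower().split() if w in word_index]
    seq = seq[-max_len:]
    return [0] * (max_len - len(seq)) + seq

def handler(event, context=None):
    word_index = event["word_index"]  # hypothetical: load the fitted tokenizer from S3 instead
    padded = preprocess(event["text"], word_index)
    # import boto3
    # runtime = boto3.client("sagemaker-runtime")
    # response = runtime.invoke_endpoint(
    #     EndpointName="tweet-sentiment-endpoint",  # hypothetical endpoint name
    #     ContentType="application/json",
    #     Body=json.dumps({"instances": [padded]}))
    # score = json.loads(response["Body"].read())["predictions"][0][0]
    return {"statusCode": 200, "body": json.dumps({"sequence_length": len(padded)})}
```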

&lt;h3&gt;
  
  
  Clean Up
&lt;/h3&gt;

&lt;p&gt;Make sure to call end_point.delete_endpoint() to delete the model endpoint.&lt;br&gt;
Afterwards, go ahead and delete any files uploaded by SageMaker from your S3 bucket.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this tutorial, you learned how to train and deploy deep learning models on Amazon SageMaker.&lt;br&gt;
Here is a &lt;a href="https://github.com/paulkarikari/sentiment_analysis_deployment/blob/master/sentiment_analysis_tensorflow_keras_LSTM.ipynb" rel="noopener noreferrer"&gt;link&lt;/a&gt; to the complete Notebook.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.tensorflow.org/tfx/serving/serving_basic" rel="noopener noreferrer"&gt;https://www.tensorflow.org/tfx/serving/serving_basic&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://sagemaker.readthedocs.io/en/stable/index.html#" rel="noopener noreferrer"&gt;https://sagemaker.readthedocs.io/en/stable/index.html#&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.kaggle.com/crowdflower/twitter-airline-sentiment" rel="noopener noreferrer"&gt;https://medium.com/r/?url=https%3A%2F%2Fwww.kaggle.com%2Fcrowdflower%2Ftwitter-airline-sentiment&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://medium.com/datadriveninvestor/deep-learning-lstm-for-sentiment-analysis-in-tensorflow-with-keras-api-92e62cde7626" rel="noopener noreferrer"&gt;https://medium.com/datadriveninvestor/deep-learning-lstm-for-sentiment-analysis-in-tensorflow-with-keras-api-92e62cde7626&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>python</category>
      <category>aws</category>
    </item>
    <item>
      <title>Deep Learning LSTM for Sentiment Analysis in Tensorflow with Keras API</title>
      <dc:creator>Paul Karikari</dc:creator>
      <pubDate>Thu, 13 Feb 2020 14:16:38 +0000</pubDate>
      <link>https://forem.com/paulkarikari/deep-learning-lstm-for-sentiment-analysis-in-tensorflow-with-keras-api-b7</link>
      <guid>https://forem.com/paulkarikari/deep-learning-lstm-for-sentiment-analysis-in-tensorflow-with-keras-api-b7</guid>
      <description>&lt;h3&gt;
  
  
  &lt;strong&gt;Introduction&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Sentiment analysis is the process of determining whether language reflects a positive, negative, or neutral sentiment.&lt;br&gt;
Analyzing the sentiment of customers has many benefits for businesses. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;A company can filter customer feedback based on sentiments to identify things they have to improve about their services.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A company can manage its online reputation easily by monitoring the sentiment of comments customers write about its products.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this tutorial, we will build a &lt;a href="https://en.wikipedia.org/wiki/Deep_learning"&gt;Deep learning&lt;/a&gt; model to classify text as either negative or positive.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Requirements&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data&lt;/strong&gt;: The data used is a collection of tweets about major U.S. airlines, available on &lt;a href="https://www.kaggle.com/crowdflower/twitter-airline-sentiment"&gt;Kaggle&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.tensorflow.org"&gt;Tensorflow&lt;/a&gt; version 1.15.0 or higher with &lt;a href="https://www.tensorflow.org/api_docs/python/tf/keras"&gt;Keras&lt;/a&gt; API&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://pandas.pydata.org/"&gt;Pandas&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://numpy.org/"&gt;Numpy&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Data Preparation
&lt;/h3&gt;

&lt;p&gt;Let’s see what the data looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'Tweets.csv'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sep&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;','&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--rVfthQah--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/3680/1%2Af-66q6ix-gByre1QZ2U-fg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--rVfthQah--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/3680/1%2Af-66q6ix-gByre1QZ2U-fg.png" alt="Data preview"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Steps to prepare the data:&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select relevant columns:
The data columns needed for this project are the &lt;strong&gt;&lt;em&gt;airline_sentiment&lt;/em&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;em&gt;text&lt;/em&gt;&lt;/strong&gt; columns. We are solving a classification problem, so &lt;strong&gt;text&lt;/strong&gt; will be our features and &lt;strong&gt;&lt;em&gt;airline_sentiment&lt;/em&gt;&lt;/strong&gt; will be the labels.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Machine learning models work best when inputs are numerical, so we will convert all the chosen columns to the numerical formats they need.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Transform the &lt;strong&gt;&lt;em&gt;airline_sentiment&lt;/em&gt;&lt;/strong&gt; column to a numerical category.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Transform the &lt;strong&gt;&lt;em&gt;text&lt;/em&gt;&lt;/strong&gt; column to a vector of numbers. (&lt;em&gt;more on this later&lt;/em&gt;)&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="c1"&gt;#select relavant columns
&lt;/span&gt;    &lt;span class="n"&gt;tweet_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="s"&gt;'text'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s"&gt;'airline_sentiment'&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--uu2VqZth--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2A9Z0sgkVK_OoZI1Waqxb7lA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--uu2VqZth--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2A9Z0sgkVK_OoZI1Waqxb7lA.png" alt="Selected relevant columns"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We need to classify tweets as either negative or positive, so we will filter out rows with neutral sentiment.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="n"&gt;tweet_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tweet_df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tweet_df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'airline_sentiment'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s"&gt;'neutral'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--H3Vdqm-H--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2Ax2GyENOsZ6UrE2ENSt5mHA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--H3Vdqm-H--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2Ax2GyENOsZ6UrE2ENSt5mHA.png" alt="Data without neutral sentiment"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="c1"&gt;# convert airline_seentiment to numeric
&lt;/span&gt;    &lt;span class="n"&gt;sentiment_label&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tweet_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;airline_sentiment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;factorize&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--SIFZ0-FR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2Ar7MDtV8mxqDAaAGlClT0JQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SIFZ0-FR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2Ar7MDtV8mxqDAaAGlClT0JQ.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Calling the &lt;strong&gt;factorize&lt;/strong&gt; method returns an array of numeric codes and an index of the categories. In this case, code 0 corresponds to positive sentiment and code 1 to negative sentiment.&lt;/p&gt;
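&lt;p&gt;On a small example, factorize assigns codes in order of first appearance:&lt;/p&gt;

```python
import pandas as pd

s = pd.Series(["positive", "negative", "negative", "positive"])
codes, index = s.factorize()
print(list(codes))  # [0, 1, 1, 0]
print(list(index))  # ['positive', 'negative']
```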

&lt;h3&gt;
  
  
  Preparing Text for NLP
&lt;/h3&gt;

&lt;p&gt;As mentioned earlier, inputs to machine learning models need to be in a numeric format.&lt;br&gt;
This can be achieved in the following ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Assign a number to each word in the sentences and replace each word with its assigned number.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use word &lt;a href="https://towardsdatascience.com/introduction-to-word-embedding-and-word2vec-652d0c2060fa"&gt;embeddings&lt;/a&gt;, which are capable of capturing the context of a word in a sentence or document.&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;tensorflow.keras.preprocessing.text&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Tokenizer&lt;/span&gt;
    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;tensorflow.keras.preprocessing.sequence&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pad_sequences&lt;/span&gt;

    &lt;span class="n"&gt;tweet&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tweet_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;
    &lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Tokenizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_words&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit_on_texts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;vocab_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;word_index&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

    &lt;span class="n"&gt;encoded_docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;texts_to_sequences&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;padded_sequence&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pad_sequences&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;encoded_docs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;maxlen&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;From the above code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;We get the actual tweet texts from the data frame.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We initialize the tokenizer with a 5000-word limit. This is the number of most frequent words we would like to encode.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We call &lt;strong&gt;fit_on_texts&lt;/strong&gt; to create associations between words and numbers, as shown in the image below.&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;word_index&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--p9BOV4pt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2484/1%2AAFVz2GXT-HNF7KUzCDjUlw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--p9BOV4pt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2484/1%2AAFVz2GXT-HNF7KUzCDjUlw.png" alt="word index"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Calling &lt;strong&gt;texts_to_sequences&lt;/strong&gt; replaces the words in each sentence with their associated numbers, transforming each sentence into a sequence of numbers.
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;encoded_docs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--N-bn8Pjv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2ARo12i0MMmfv7iEr6zGiFTQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--N-bn8Pjv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2ARo12i0MMmfv7iEr6zGiFTQ.png" alt="A tweet and it’s encoded version"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From the above result, you can see that the tweet is encoded as a sequence of numbers; e.g. &lt;strong&gt;&lt;em&gt;to&lt;/em&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;em&gt;the&lt;/em&gt;&lt;/strong&gt; are converted to &lt;strong&gt;1&lt;/strong&gt; and &lt;strong&gt;2&lt;/strong&gt; respectively.&lt;br&gt;
Check the word index above to verify.&lt;/p&gt;

&lt;p&gt;The sentences or tweets contain different numbers of words, so the resulting sequences of numbers have different lengths.&lt;br&gt;
Our model requires inputs of equal length, so we pad each sequence to the chosen length by calling the &lt;strong&gt;pad_sequences&lt;/strong&gt; method with a maximum length of 200.&lt;br&gt;
All input sequences will then have a length of 200.&lt;br&gt;
&lt;/p&gt;
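&lt;p&gt;With its default settings, &lt;strong&gt;pad_sequences&lt;/strong&gt; left-pads shorter sequences with zeros and keeps only the last 200 numbers of longer ones. A minimal pure-Python sketch of that behavior (the helper name &lt;strong&gt;pad_left&lt;/strong&gt; is ours, for illustration only):&lt;/p&gt;

```python
def pad_left(sequence, maxlen, value=0):
    # Mimics Keras pad_sequences with padding='pre', truncating='pre':
    # keep the last `maxlen` items, or left-pad with `value`.
    if len(sequence) >= maxlen:
        return sequence[-maxlen:]
    return [value] * (maxlen - len(sequence)) + sequence

print(pad_left([5, 9, 3], 6))     # [0, 0, 0, 5, 9, 3]
print(pad_left([1, 2, 3, 4], 2))  # [3, 4]
```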

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;padded_sequence&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--GRudbBtr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2AZW2I34dsu5CT-bABfElEWQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--GRudbBtr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2AZW2I34dsu5CT-bABfElEWQ.png" alt="Padded Sequence."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Build Model
&lt;/h2&gt;

&lt;p&gt;Now that the inputs are processed, it's time to build the model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="c1"&gt;# Build the model
&lt;/span&gt;
    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;tensorflow.keras.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Sequential&lt;/span&gt;
    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;tensorflow.keras.layers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LSTM&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Dropout&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;tensorflow.keras.layers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SpatialDropout1D&lt;/span&gt;
    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;tensorflow.keras.layers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Embedding&lt;/span&gt;

    &lt;span class="n"&gt;embedding_vector_length&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt;

    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Sequential&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vocab_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embedding_vector_length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;     
                                         &lt;span class="n"&gt;input_length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SpatialDropout1D&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.25&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LSTM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dropout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;recurrent_dropout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Dropout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;activation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'sigmoid'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'binary_crossentropy'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'adam'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                               &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'accuracy'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--SDkrQJFL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2Am4owvpCTfcXdqUF8NoM8qg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SDkrQJFL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2Am4owvpCTfcXdqUF8NoM8qg.png" alt="Model Summary"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ojGf3cx5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2AcWfWK1xZQB9EtDvukRGHRA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ojGf3cx5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2AcWfWK1xZQB9EtDvukRGHRA.png" alt="Model Structure"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is where we get to use the &lt;strong&gt;LSTM&lt;/strong&gt; layer. The model consists of an embedding layer, an &lt;strong&gt;LSTM&lt;/strong&gt; layer, and a Dense layer, which is a fully connected layer with sigmoid as the &lt;a href="https://medium.com/datadriveninvestor/a-gentle-introduction-to-activation-functions-in-deep-learning-5d5402fcb033"&gt;activation function&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Dropouts are added between the layers and also on the &lt;strong&gt;LSTM&lt;/strong&gt; layer itself to reduce overfitting.&lt;/p&gt;

&lt;h2&gt;
  
  
  LSTM
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--7zbWBI3w--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/3960/0%2ACXFcIW6nFgIlP2j4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--7zbWBI3w--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/3960/0%2ACXFcIW6nFgIlP2j4.png" alt="source: [http://colah.github.io/posts/2015-08-Understanding-LSTMs/](http://colah.github.io/posts/2015-08-Understanding-LSTMs/)"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Long Short Term Memory networks — usually just called “LSTMs” — are a special kind of RNN, capable of learning long-term dependencies. They were introduced by &lt;a href="http://www.bioinf.jku.at/publications/older/2604.pdf"&gt;Hochreiter &amp;amp; Schmidhuber (1997)&lt;/a&gt;, and were refined and popularized by many people in following work.&lt;a href="http://colah.github.io/posts/2015-08-Understanding-LSTMs/#fn1"&gt;1&lt;/a&gt; They work tremendously well on a large variety of problems, and are now widely used.&lt;br&gt;
 LSTMs are explicitly designed to avoid the long-term dependency problem. Remembering information for long periods of time is practically their default behavior, not something they struggle to learn!&lt;br&gt;
 &lt;strong&gt;source&lt;/strong&gt;: &lt;a href="http://colah.github.io/posts/2015-08-Understanding-LSTMs/"&gt;http://colah.github.io/posts/2015-08-Understanding-LSTMs/&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Train Model
&lt;/h2&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;padded_sequence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;sentiment_label&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                      &lt;span class="n"&gt;validation_split&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--anBrikkH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2192/1%2A2JQYyPccGG25fahc_daffA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--anBrikkH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2192/1%2A2JQYyPccGG25fahc_daffA.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The model is trained for 5 epochs and attains a validation accuracy of ~92%.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; &lt;em&gt;Your results may vary slightly due to the stochastic nature of training; run it a few times and you should get roughly the same validation accuracy.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing Model
&lt;/h2&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="n"&gt;test_word&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"This is soo sad"&lt;/span&gt;

    &lt;span class="n"&gt;tw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;texts_to_sequences&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;test_word&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;tw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pad_sequences&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tw&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;maxlen&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;prediction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tw&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nb"&gt;round&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

    &lt;span class="n"&gt;sentiment_label&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;prediction&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9KerZuXX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2AVb7lXWn4-C5fak4okU7KYg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9KerZuXX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2AVb7lXWn4-C5fak4okU7KYg.png" alt="Prediction result"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The model is tested with a sample text, and we can see that it predicts the correct sentiment for the sentence.&lt;/p&gt;
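&lt;p&gt;To make the last step explicit: &lt;strong&gt;model.predict&lt;/strong&gt; returns a sigmoid probability, which is rounded to a class index and looked up in the factorized categories. A minimal NumPy sketch (the probability here is made up, standing in for the model output):&lt;/p&gt;

```python
import numpy as np

# Categories in the order factorize() discovered them
# (index 0 = positive, index 1 = negative for this dataset).
categories = np.array(["positive", "negative"])

# Made-up sigmoid output standing in for model.predict.
probability = 0.93

prediction = int(round(probability))  # round to the nearest class index
print(categories[prediction])         # negative
```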

&lt;p&gt;You can run the entire notebook on Google Colab &lt;a href="https://colab.research.google.com/github/paulkarikari/LSTM-sentiment-analysis-with-tensorflow-keras-api/blob/master/Tutorial_sentiment_analysis.ipynb"&gt;here&lt;/a&gt; or view it on &lt;a href="https://github.com/paulkarikari/LSTM-sentiment-analysis-with-tensorflow-keras-api/blob/master/Tutorial_sentiment_analysis.ipynb"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Resources&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://colah.github.io/posts/2015-08-Understanding-LSTMs/"&gt;http://colah.github.io/posts/2015-08-Understanding-LSTMs/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://keras.io/examples/imdb_lstm/"&gt;https://keras.io/examples/imdb_lstm/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://keras.io/layers/recurrent/"&gt;https://keras.io/layers/recurrent/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this tutorial, you learned how to use a deep learning LSTM model for sentiment analysis in TensorFlow with the Keras API.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>computerscience</category>
      <category>beginners</category>
    </item>
    <item>
      <title>What is Data: A beginner's guide to understanding what Data means.</title>
      <dc:creator>Paul Karikari</dc:creator>
      <pubDate>Thu, 13 Feb 2020 09:44:43 +0000</pubDate>
      <link>https://forem.com/paulkarikari/what-is-data-a-beginner-s-guide-to-understanding-what-data-means-49ha</link>
      <guid>https://forem.com/paulkarikari/what-is-data-a-beginner-s-guide-to-understanding-what-data-means-49ha</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--WXQSo1nL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/9620/0%2AUnQtEWgSIakQbxHK" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--WXQSo1nL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/9620/0%2AUnQtEWgSIakQbxHK" alt="Photo by [Luke Chesser](https://unsplash.com/@lukechesser?utm_source=medium&amp;amp;utm_medium=referral) on [Unsplash](https://unsplash.com?utm_source=medium&amp;amp;utm_medium=referral)"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;You’ve probably heard the word &lt;strong&gt;“data”&lt;/strong&gt; many times: in school, from the news, in your daily work or profession, or while browsing the Internet. And if you are a data scientist, your entire profession depends on it.&lt;/p&gt;

&lt;p&gt;Data is limitless and present everywhere in the universe, yet the term can be confusing because nearly everyone has their own idea of what it means.&lt;br&gt;
[My data is not your data 😃]&lt;/p&gt;

&lt;h2&gt;
  
  
  Definition
&lt;/h2&gt;

&lt;p&gt;In computing, data may take the form of text, documents, images, audio, and video. At its most rudimentary level, data is a bunch of ones and zeros.&lt;/p&gt;

&lt;p&gt;In statistics, data is defined as facts or figures from which conclusions can be drawn.&lt;/p&gt;

&lt;p&gt;IT professionals describe data in terms of entities and attributes.&lt;/p&gt;

&lt;p&gt;In layman’s terms, data describes a person, place, object, event, or concept in the user’s context or environment, with its meaning dependent on how it is organized.&lt;br&gt;
For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;In computing, different arrangements of &lt;strong&gt;1’s&lt;/strong&gt; and &lt;strong&gt;0’s&lt;/strong&gt; mean different things,&lt;br&gt;
&lt;strong&gt;[0001 = 1 and 0010 = 2]&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In biology, different sequences of genome bases &lt;strong&gt;(A, C, G, and T)&lt;/strong&gt; result in different genetic codes, which represent different individuals or species.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A customer’s purchase history, linked to their identity, represents that individual’s purchasing habits.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Your tweets could be a random arrangement of any of the 26 English letters and spaces, yet you choose to arrange them in a way that conveys meaning.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If data is not put into context, it is of no value to humans or computers. Context is key.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;In the context of computing, &lt;strong&gt;0001&lt;/strong&gt; is the binary representation of &lt;strong&gt;1&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In the context of Italian, your tweet in English means nothing, even though both languages draw on the same set of characters.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
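&lt;p&gt;The computing example can be checked directly; a tiny Python sketch interpreting the same characters in the binary-number context:&lt;/p&gt;

```python
# Read in the context of binary numbers, these strings are the
# integers 1 and 2.
print(int("0001", 2))  # 1
print(int("0010", 2))  # 2

# Outside that context, "0001" is just a four-character string.
print(len("0001"))     # 4
```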

&lt;p&gt;Some say that “facts” are things that can be shown to be true, to exist, or to have happened.&lt;/p&gt;

&lt;blockquote&gt;
&lt;h1&gt;
  
  
  Ideally, data can be defined as the factual representation of the attributes of anything.
&lt;/h1&gt;
&lt;/blockquote&gt;

&lt;p&gt;Well, I say “Ideally” because data is not always factual. Simply put, data can be wrong. Part or all of a dataset can sometimes represent something entirely different from what you expect or intend to measure, e.g. &lt;a href="http://www.bbc.com/news/av/science-environment-39355424/nasa-error-schoolboy-finds-data-flaw"&gt;Schoolboy finds a flaw in Nasa Data&lt;/a&gt; and &lt;a href="http://www.washingtonpost.com/wp-dyn/content/article/2009/01/11/AR2009011102287.html"&gt;Math Error To Cost Maryland $31 Million&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Data that is factual, true, or serves the needs of the problem domain is sometimes referred to as good data, or signal.&lt;br&gt;
Data that is false, invalid, or does not serve the needs of the problem domain is sometimes called bad data, or noise.&lt;/p&gt;

&lt;p&gt;Data that describes other data is called metadata, and a set of data is often referred to as a dataset.&lt;/p&gt;

&lt;h2&gt;
  
  
  Anatomy of data
&lt;/h2&gt;

&lt;p&gt;Let’s consider a scenario (a particular circumstance or experiment) in which you want to learn about the kinds of passengers who board the same bus or train as you at your local station. So you gather some information about each individual, and that becomes your dataset. [stalker 😏]&lt;/p&gt;

&lt;p&gt;Datasets are typically displayed in tables, as shown below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--GUpuh2Uh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/0%2AA6LkgRT-h_uuKsqI.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--GUpuh2Uh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/0%2AA6LkgRT-h_uuKsqI.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A dataset is a set of data identified with a particular experiment, scenario, subject, or circumstance.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In the table, rows represent individuals and columns represent variables.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--y9i1lnnH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/0%2Ayf9QDi1o450E-1fB.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--y9i1lnnH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/0%2Ayf9QDi1o450E-1fB.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From the above, we can say that:&lt;br&gt;
&lt;strong&gt;Data&lt;/strong&gt; are pieces of information about &lt;strong&gt;individuals&lt;/strong&gt;, organized into &lt;strong&gt;variables&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;By an &lt;strong&gt;individual&lt;/strong&gt;, we mean a particular person or object.&lt;br&gt;
In our scenario, the passengers are the individuals.&lt;br&gt;
&lt;strong&gt;Individuals&lt;/strong&gt; are sometimes called observations, cases, vectors, or feature vectors.&lt;/p&gt;

&lt;p&gt;By a &lt;strong&gt;variable&lt;/strong&gt;, we mean a particular characteristic of the individual. In our scenario, the variables are Age, Height, Seat Number, Gender, and Class.&lt;br&gt;
&lt;strong&gt;Variables&lt;/strong&gt; are sometimes called attributes, dimensions, or features.&lt;/p&gt;

&lt;p&gt;Each row gives us all of the information about a particular individual (in this case a passenger), and each column gives us information about a particular characteristic of all of the passengers.&lt;/p&gt;
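&lt;p&gt;The passenger table can be sketched with pandas (the values below are made up purely for illustration):&lt;/p&gt;

```python
import pandas as pd

# A made-up passenger dataset: rows are individuals, columns are variables.
passengers = pd.DataFrame({
    "Age":         [34, 22, 41],
    "Height":      [170, 165, 180],  # in cm
    "Seat Number": ["12A", "7C", "3B"],
    "Gender":      ["F", "M", "F"],
    "Class":       ["Economy", "First", "Economy"],
})

print(passengers.shape)   # (3, 5): 3 individuals, 5 variables
print(passengers.loc[0])  # all variables for one individual (a row)
print(passengers["Age"])  # one variable for all individuals (a column)
```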

&lt;h2&gt;
  
  
  Types of data
&lt;/h2&gt;

&lt;p&gt;Data can be classified in many ways and from different perspectives, a topic that deserves its own post. In short, data can be classified as &lt;strong&gt;raw&lt;/strong&gt; or &lt;strong&gt;processed&lt;/strong&gt;, &lt;strong&gt;structured&lt;/strong&gt; or &lt;strong&gt;unstructured&lt;/strong&gt;, and as &lt;strong&gt;qualitative&lt;/strong&gt; or &lt;strong&gt;quantitative&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Names, Names, and More Names
&lt;/h2&gt;

&lt;p&gt;If you have been following carefully, you will have noticed that there are often several names for the same thing, stemming from the field of study, personal preference, or mere convention. This can be overwhelming for a beginner or someone new to a particular field, but don’t be discouraged: you might already know what a term means under another name. It’s all a matter of familiarity. Don’t be afraid to ask, or to search the Internet.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>computerscience</category>
    </item>
    <item>
      <title>A Gentle Introduction To Activation Functions in Deep Learning.</title>
      <dc:creator>Paul Karikari</dc:creator>
      <pubDate>Mon, 03 Feb 2020 20:34:18 +0000</pubDate>
      <link>https://forem.com/paulkarikari/a-gentle-introduction-to-activation-functions-in-deep-learning-3ja6</link>
      <guid>https://forem.com/paulkarikari/a-gentle-introduction-to-activation-functions-in-deep-learning-3ja6</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--uMl2o8M1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2A3tLHUJWOjUrL5aZlWo56yQ.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--uMl2o8M1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2A3tLHUJWOjUrL5aZlWo56yQ.jpeg" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Introduction&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;When you get started with deep learning, you will definitely come across the term &lt;strong&gt;&lt;em&gt;activation functions&lt;/em&gt;&lt;/strong&gt;, also known as neural transfer functions.&lt;br&gt;
In this blog, I will explain what activation functions are and why they are used in deep learning models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NOTE:&lt;/strong&gt; &lt;em&gt;I assume you have a basic understanding of neural networks.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The goal of machine learning and deep learning algorithms is to recognize patterns in data. From a mathematical point of view, such a pattern can be considered a function. The ability of machine learning algorithms to approximate the underlying function in given data is what makes them so powerful.&lt;br&gt;
Recognizing this function or pattern makes it possible for the model to predict the output for new data.&lt;/p&gt;

&lt;p&gt;The pattern or function underlying data can be simple, such as a linear relation, or complex, such as a non-linear relation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--a9EZXLOB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2AGeyQBtPVgcurFjY1LQdlxw.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--a9EZXLOB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2AGeyQBtPVgcurFjY1LQdlxw.jpeg" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;A Simple Artificial Neuron&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Deep learning models usually consist of many neurons stacked in layers.&lt;br&gt;
Let’s consider a single neuron for simplicity.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--vrY7HOVi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2AJsm0NBsPuVUKQvEjdpUOpg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--vrY7HOVi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2AJsm0NBsPuVUKQvEjdpUOpg.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The operations performed by a neuron are basically multiplications and a summation, which are linear and produce an intermediate result.&lt;br&gt;
An activation function is then applied to this intermediate result to produce the final output of the neuron.&lt;/p&gt;
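&lt;p&gt;In code, a single neuron's forward pass might look like this minimal sketch (weights, bias, and inputs are made-up values; sigmoid is used as the activation):&lt;/p&gt;

```python
import math

def neuron(inputs, weights, bias):
    # Linear part: weighted sum of the inputs plus a bias.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Non-linear part: apply an activation function (sigmoid here).
    return 1 / (1 + math.exp(-z))

# z = 0.5*1.0 + (-0.25)*2.0 + 0.1 = 0.1, then sigmoid(0.1)
out = neuron([1.0, 2.0], [0.5, -0.25], 0.1)
print(round(out, 4))  # 0.525
```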

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--YW_U4w49--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/0%2AZ5mWm24o4cDVZ5UI.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--YW_U4w49--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/0%2AZ5mWm24o4cDVZ5UI.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Without the activation function, the neuron is just a linear function mapping inputs to outputs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--kNpmlXdY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/0%2AY3b42mQ4XMOglknn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--kNpmlXdY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/0%2AY3b42mQ4XMOglknn.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Such a neuron can only approximate linear functions, so the model cannot recognize complex patterns in data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why are activation functions needed?
&lt;/h2&gt;

&lt;p&gt;In order for neural networks to approximate non-linear or complex functions, there has to be a way to introduce non-linearity into the computation.&lt;br&gt;
Activation functions serve exactly this purpose: they introduce non-linearity into the model, which makes it possible for deep learning models to find complex patterns in data.&lt;/p&gt;
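&lt;p&gt;A quick way to see why this matters: stacking linear layers without activations collapses into a single linear layer. The weights below are arbitrary illustrative values:&lt;/p&gt;

```python
# Two stacked linear "layers" without activations collapse into one linear
# map: w2*(w1*x + b1) + b2 == (w2*w1)*x + (w2*b1 + b2).
w1, b1 = 3.0, 1.0
w2, b2 = -2.0, 0.5

def two_linear_layers(x):
    return w2 * (w1 * x + b1) + b2

def single_linear_layer(x):
    return (w2 * w1) * x + (w2 * b1 + b2)

for x in [-1.0, 0.0, 2.5]:
    # Identical outputs: depth added no expressive power.
    assert two_linear_layers(x) == single_linear_layer(x)
print("no activation means still linear")
```

No matter how many such layers you stack, the network can still only represent a linear function.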

&lt;h3&gt;
  
  
  Can any non-linear function be used as an activation function?
&lt;/h3&gt;

&lt;p&gt;No. Before a function can be considered a good candidate for deep learning models, it should have the following properties:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Non-linear&lt;/strong&gt;&lt;br&gt;
This is required to introduce non-linearity in the model.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Monotonic&lt;/strong&gt;&lt;br&gt;
A function that is either entirely non-increasing or non-decreasing.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Differentiable&lt;/strong&gt;&lt;br&gt;
Deep learning algorithms update their weights via an algorithm called &lt;a href="https://en.wikipedia.org/wiki/Backpropagation"&gt;backpropagation&lt;/a&gt;. This algorithm only works when the activation function is differentiable, i.e., its derivatives can be calculated.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
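&lt;p&gt;Differentiability is what backpropagation relies on. As a sketch, sigmoid has a convenient closed-form derivative, which we can sanity-check against a numerical finite-difference estimate (the test point 0.7 is arbitrary):&lt;/p&gt;

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# Closed-form derivative used during backpropagation:
# sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x))
def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1 - s)

# Numerical finite-difference estimate of the same derivative at x = 0.7.
x, h = 0.7, 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
print(round(sigmoid_grad(x), 6), round(numeric, 6))  # both ~0.221713
```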

&lt;h2&gt;
  
  
  Types of Activation Functions.
&lt;/h2&gt;

&lt;p&gt;Most useful activation functions are non-linear. The following are commonly used activation functions.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. &lt;strong&gt;Tanh or hyperbolic tangent function&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--EEvw1s7I--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/0%2A2Ltoo51YGOjC4Dlo" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--EEvw1s7I--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/0%2A2Ltoo51YGOjC4Dlo" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This function has an upper bound of 1 and a lower bound of -1, so it produces outputs in the range -1 to 1.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. &lt;strong&gt;Sigmoid or logistic function&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Cw69Sbt5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/0%2A8qWWTH6z50LP28jd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Cw69Sbt5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/0%2A8qWWTH6z50LP28jd.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This function outputs values in the range (0, 1), producing 0.5 at an input of 0. Unlike tanh, it is not zero-centered.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. &lt;strong&gt;Relu (Rectified Linear Unit)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--EbtrZ-QO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/0%2AFVQSRNxLvditSAqD.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--EbtrZ-QO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/0%2AFVQSRNxLvditSAqD.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This function produces values in the range 0 to infinity: negative inputs are mapped to 0, and positive inputs pass through unchanged.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. &lt;strong&gt;Leaky Relu&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--FUk2TyTi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/0%2AjdLsPwZWmhZNPtd1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--FUk2TyTi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/0%2AjdLsPwZWmhZNPtd1.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is a variation of &lt;strong&gt;&lt;em&gt;ReLU&lt;/em&gt;&lt;/strong&gt;. Unlike ReLU, &lt;strong&gt;&lt;em&gt;Leaky ReLU&lt;/em&gt;&lt;/strong&gt; does not zero out negative inputs; it multiplies them by a small slope (commonly 0.01) instead.&lt;br&gt;
As a result, its outputs range over all real numbers.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. &lt;strong&gt;Softmax&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--oZsLcBw5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/0%2Ai20dHyZAjUFzsL_s" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--oZsLcBw5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/0%2Ai20dHyZAjUFzsL_s" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This function is mostly used for multiclass classification problems: it outputs a probability for each class given an input, and the probabilities sum to 1.&lt;/p&gt;
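&lt;p&gt;The five functions above can be sketched in a few lines of pure Python (the 0.01 slope for Leaky ReLU is a common default, not a fixed constant):&lt;/p&gt;

```python
import math

def tanh(x):
    return math.tanh(x)               # range (-1, 1)

def sigmoid(x):
    return 1 / (1 + math.exp(-x))     # range (0, 1)

def relu(x):
    return max(0.0, x)                # range [0, infinity)

def leaky_relu(x, slope=0.01):
    # max(x, slope*x) picks x for positive inputs and slope*x for negative ones.
    return max(x, slope * x)          # range (-infinity, infinity)

def softmax(zs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]  # probabilities summing to 1

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
```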

&lt;h2&gt;
  
  
  &lt;strong&gt;Which Activation Function Should I use?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Activation functions have strengths and weaknesses based on how well they allow the model to learn features that generalize.&lt;/p&gt;

&lt;p&gt;The choice of activation function also depends on the problem you're trying to solve.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;ReLU&lt;/strong&gt; is commonly used for hidden layers, while &lt;strong&gt;sigmoid / softmax&lt;/strong&gt; are commonly used for the output layer:&lt;br&gt;
&lt;strong&gt;&lt;em&gt;sigmoid&lt;/em&gt;&lt;/strong&gt; for binary classification problems and &lt;strong&gt;&lt;em&gt;softmax&lt;/em&gt;&lt;/strong&gt; for multiclass classification problems.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Sigmoid&lt;/strong&gt; and &lt;strong&gt;tanh&lt;/strong&gt; are often avoided in deep networks because of the &lt;strong&gt;vanishing gradient&lt;/strong&gt; problem.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If the network suffers from &lt;strong&gt;dead neurons&lt;/strong&gt; (a known weakness of ReLU), &lt;strong&gt;leaky ReLU&lt;/strong&gt; is a good choice.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
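&lt;p&gt;Putting these choices together, here is a minimal sketch of a hypothetical two-layer binary classifier: ReLU in the hidden layer and sigmoid on the output. All weights and inputs are made-up values:&lt;/p&gt;

```python
import math

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    # Hidden layer: one ReLU unit per (weight, bias) pair.
    hidden = [relu(w * x + b) for w, b in zip(w_hidden, b_hidden)]
    # Output layer: weighted sum of hidden activations, then sigmoid
    # to squash the result into a probability.
    z = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return sigmoid(z)

p = forward(1.5, [0.4, -0.6], [0.0, 0.2], [1.0, 1.0], -0.3)
print(round(p, 3))  # a probability for the positive class
```

For a multiclass problem, the single sigmoid output would be replaced by one unit per class followed by softmax.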

&lt;p&gt;&lt;strong&gt;Resources&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.analyticsvidhya.com/blog/2020/01/fundamentals-deep-learning-activation-functions-when-to-use-them/"&gt;https://www.analyticsvidhya.com/blog/2020/01/fundamentals-deep-learning-activation-functions-when-to-use-them/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://www.datastuff.tech/machine-learning/why-do-neural-networks-need-an-activation-function/"&gt;http://www.datastuff.tech/machine-learning/why-do-neural-networks-need-an-activation-function/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Activation_function"&gt;https://en.wikipedia.org/wiki/Activation_function&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://medium.com/@abhigoku10/activation-functions-and-its-types-in-artifical-neural-network-14511f3080a8"&gt;https://medium.com/@abhigoku10/activation-functions-and-its-types-in-artifical-neural-network-14511f3080a8&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this article, you learned what activation functions are, why they are needed in deep learning models, and which activation functions are commonly used.&lt;br&gt;
I hope it served as a useful introduction to activation functions.&lt;/p&gt;

</description>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
