<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Njeri Gitome</title>
    <description>The latest articles on Forem by Njeri Gitome (@njerigitome).</description>
    <link>https://forem.com/njerigitome</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1025831%2F4879998e-f5d8-416b-8390-684323a8a886.jpg</url>
      <title>Forem: Njeri Gitome</title>
      <link>https://forem.com/njerigitome</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/njerigitome"/>
    <language>en</language>
    <item>
      <title>Getting started with Sentiment Analysis and Implementation</title>
      <dc:creator>Njeri Gitome</dc:creator>
      <pubDate>Sat, 25 Mar 2023 03:27:59 +0000</pubDate>
      <link>https://forem.com/njerigitome/getting-started-with-sentiment-analysis-and-implementation-2g4d</link>
      <guid>https://forem.com/njerigitome/getting-started-with-sentiment-analysis-and-implementation-2g4d</guid>
      <description>&lt;p&gt;The advent of the internet has revolutionized how people communicate their thoughts and opinions. Today, millions of people share their daily lives and express their emotions on social media platforms like Facebook, Twitter, Google, and others.&lt;/p&gt;

&lt;p&gt;A significant amount of sentiment-rich data is being generated via social media in the form of tweets, status updates, blog posts, comments, reviews, etc. This essential data drives analysts to discover insights and patterns through sentiment analysis.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Sentiment Analysis&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Sentiment analysis is the process of extracting the emotions from the user's written text by processing unstructured information and preparing a model to extract the knowledge from it. It’s often used by businesses to detect sentiment in social data, gauge brand reputation, and understand customers.&lt;/p&gt;

&lt;p&gt;It involves the use of data mining, machine learning (ML), artificial intelligence and computational linguistics to mine text for sentiment and subjective information such as whether it is expressing positive, negative or neutral feelings.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Different approaches for sentiment analysis&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;There are various approaches for sentiment analysis on linguistic data, and which approach to use depends on the nature of the data and the platform one is working on.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--woYeRzdn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/kfgbkfy4sa66n1ah275i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--woYeRzdn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/kfgbkfy4sa66n1ah275i.png" alt="sentimental analysis approaches" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
Most research carried out in the field of sentiment analysis employs lexicon-based analysis or machine learning techniques.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lexicon-Based approach
&lt;/h3&gt;

&lt;p&gt;Also known as the dictionary-based approach, it classifies sentiment in linguistic data using lexical databases such as SentiWordNet and WordNet.&lt;/p&gt;

&lt;p&gt;It obtains a score for each word in the sentence or document and annotates it using features from the lexical database. Text polarity is derived from a set of words, each annotated with a weight, and the information each word contributes is combined to conclude the overall sentiment of the text.&lt;/p&gt;
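As a minimal sketch of the idea, here is a toy lexicon-based scorer. The hand-made dictionary below is purely illustrative; a real system would draw its weights from a lexical database such as SentiWordNet.

```python
# Toy lexicon: each word carries a hand-assigned polarity weight.
# A real implementation would load these weights from SentiWordNet.
LEXICON = {"good": 1.0, "great": 2.0, "love": 2.0, "bad": -1.0, "terrible": -2.0}

def lexicon_score(text):
    # Sum the annotated weight of every known word; unknown words contribute 0.
    return sum(LEXICON.get(word, 0.0) for word in text.lower().split())

def classify(text):
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score == 0:
        return "neutral"
    return "negative"

print(classify("I love this great phone"))  # positive
print(classify("terrible battery bad screen"))  # negative
```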

&lt;h3&gt;
  
  
  &lt;strong&gt;Machine Learning approach&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;In this approach, the words in the sentence are represented as vectors and analyzed using machine learning algorithms such as Naïve Bayes, Support Vector Machines (SVM) and Maximum Entropy.&lt;/p&gt;

&lt;p&gt;In this article, we will use the Sentiment140 dataset, which is available at:&lt;br&gt;
&lt;a href="https://www.kaggle.com/datasets/kazanova/sentiment140"&gt;https://www.kaggle.com/datasets/kazanova/sentiment140&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The data consists of sentiments expressed by users in various tweets. Each tweet is a record, classified as either positive or negative.&lt;/p&gt;

&lt;p&gt;The data is filtered and analyzed using Natural Language Processing techniques, and sentiment polarity is calculated based on the emotion words detected in the user tweets. This approach is implemented using the Python programming language and the Natural Language Toolkit (NLTK).&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Text-Preprocessing&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Natural Language Processing (NLP) is a branch of data science that deals with text data. Text data is unstructured and therefore needs extensive preprocessing.&lt;/p&gt;

&lt;p&gt;Some steps of the preprocessing are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Lower casing&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Removing Hyperlinks&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Removing punctuations&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Removing Stop words&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tokenization&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Stemming&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Lemmatization&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's start by loading the data!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xjRWR17x--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/q8p4h0dk9r92g41z67ai.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xjRWR17x--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/q8p4h0dk9r92g41z67ai.png" alt="load data" width="800" height="310"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Our columns of interest are those of unstructured textual tweets and sentiment. Therefore, the rest of the columns are dropped.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--JNgrLUsU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/npquj1eym8p4j7sa6ftk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--JNgrLUsU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/npquj1eym8p4j7sa6ftk.png" alt="drop columns" width="800" height="246"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Lowercase all the tweets&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The first step is transforming the tweets to lowercase to maintain consistency during the NLP tasks and text mining.&lt;br&gt;
For example, 'Nation' and 'nation' would be treated as two different words in any sentence; hence, we make all the words in the tweets lowercase to avoid duplication.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--uG4ZGAUP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/5yk4nyz6giuqp6iekthu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--uG4ZGAUP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/5yk4nyz6giuqp6iekthu.png" alt="lowercase" width="800" height="197"&gt;&lt;/a&gt;&lt;/p&gt;
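The lowercasing step can be sketched in plain Python. The article applies it to a pandas DataFrame column; a plain list is used here only to keep the sketch self-contained.

```python
tweets = ["Nation first!", "Our NATION is great"]

# Lowercase every tweet so 'Nation' and 'nation' map to the same token.
tweets = [tweet.lower() for tweet in tweets]

print(tweets)  # ['nation first!', 'our nation is great']
```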

&lt;h3&gt;
  
  
  &lt;strong&gt;Remove Hyperlinks&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Hyperlinks are very common in tweets and add no information relevant to our problem of sentiment analysis.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--IUCtjVTC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/12dubq329obxiqkn03ii.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--IUCtjVTC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/12dubq329obxiqkn03ii.png" alt="hyperlinks" width="800" height="202"&gt;&lt;/a&gt;&lt;/p&gt;
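A regex-based sketch of the hyperlink-removal step, assuming links appear either as http/https URLs or bare www links:

```python
import re

def remove_hyperlinks(text):
    # Drop http/https URLs and bare www links; they carry no sentiment.
    return re.sub(r"https?://\S+|www\.\S+", "", text)

print(remove_hyperlinks("check this out https://example.com now"))
```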

&lt;h3&gt;
  
  
  &lt;strong&gt;Remove Punctuations&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;For most NLP problems, punctuation does not provide additional language information and is generally removed.&lt;br&gt;
Similarly, punctuation symbols are not crucial for sentiment analysis; they are redundant, and removing punctuation before text modelling is highly recommended.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Uq5Whmox--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4etow9yfb5h85j125voe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Uq5Whmox--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4etow9yfb5h85j125voe.png" alt="punctuation" width="800" height="206"&gt;&lt;/a&gt;&lt;/p&gt;
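One common way to implement this step is a deletion table over Python's built-in punctuation set:

```python
import string

def remove_punctuation(text):
    # str.translate with a deletion table strips every ASCII punctuation mark.
    return text.translate(str.maketrans("", "", string.punctuation))

print(remove_punctuation("wow!!! so, good..."))  # wow so good
```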

&lt;h3&gt;
  
  
  &lt;strong&gt;Remove Stop words&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Stop words are English words that do not add much meaning to a sentence. They are removed as they do not add value to the analysis.&lt;/p&gt;

&lt;p&gt;The NLTK library includes a list of words that are considered stop words for the English language. Some of them are: [i, me, my, myself, we, our, ours]&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--PQhvc8Ys--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/mfr7idyit1ddd71bwcpn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--PQhvc8Ys--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/mfr7idyit1ddd71bwcpn.png" alt="stopwords" width="800" height="304"&gt;&lt;/a&gt;&lt;/p&gt;
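A sketch of the stop word filter. Only a small hand-picked subset of the stop word list is hardcoded here; the full list comes from `nltk.corpus.stopwords.words('english')` after running `nltk.download('stopwords')`.

```python
# Small subset of NLTK's English stop word list, hardcoded for illustration.
STOP_WORDS = {"i", "me", "my", "myself", "we", "our", "ours", "is", "a", "the"}

def remove_stopwords(text):
    # Keep only the words that are not stop words.
    return " ".join(word for word in text.split() if word not in STOP_WORDS)

print(remove_stopwords("i think the movie is a masterpiece"))  # think movie masterpiece
```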

&lt;h3&gt;
  
  
  &lt;strong&gt;Tokenization&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This refers to splitting a larger body of text into smaller units such as sentences and words. These pieces are called tokens (either word tokens or sentence tokens). They help in understanding the context and in building a vocabulary.&lt;/p&gt;

&lt;p&gt;Below is an example of a string of data:&lt;br&gt;
&lt;em&gt;"What is your favourite food joint?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In order for this sentence to be understood by a machine, tokenization is performed on the string to break it into individual parts. &lt;/p&gt;

&lt;p&gt;Tokens:&lt;br&gt;
"What" "is" "your" "favourite" "food" "joint" "?"&lt;/p&gt;

&lt;p&gt;Code sample:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5F8ozAzf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1ng2vivvuupgru4ll2c6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5F8ozAzf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1ng2vivvuupgru4ll2c6.png" alt="tokenize" width="800" height="208"&gt;&lt;/a&gt;&lt;/p&gt;
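The example sentence can be tokenized with a small regex. For simple sentences like this one, NLTK's `word_tokenize` produces the same tokens; the regex version is shown only to keep the sketch dependency-free.

```python
import re

def tokenize(text):
    # Split into word tokens, keeping punctuation marks as separate tokens.
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("What is your favourite food joint?"))
# ['What', 'is', 'your', 'favourite', 'food', 'joint', '?']
```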

&lt;p&gt;Tokenization of the tweets:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xMgt5QUW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/gvso8nsrmuw2e6vj83er.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xMgt5QUW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/gvso8nsrmuw2e6vj83er.png" alt="tweet tokenization" width="800" height="294"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Stemming and Lemmatization&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;These are text normalization techniques for Natural Language Processing. Both processes aim to reduce a word to a common base or root form.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--iJNsggKd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6u8nxw7yvxboo5ilr2am.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--iJNsggKd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6u8nxw7yvxboo5ilr2am.png" alt="stem vs lem" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Stemming&lt;/strong&gt;&lt;br&gt;
This is the process of reducing a word to its stem by stripping common suffixes such as (-ing, -ed, -es).&lt;br&gt;
&lt;strong&gt;Pros&lt;/strong&gt;: Faster to execute on large datasets.&lt;br&gt;
&lt;strong&gt;Cons&lt;/strong&gt;: May result in meaningless words.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Lemmatization&lt;/strong&gt;&lt;br&gt;
The process of reducing a word to its lemma (dictionary form) through linguistic analysis of the word.&lt;br&gt;
&lt;strong&gt;Pros&lt;/strong&gt;: Preserves the meaning of the extracted root word.&lt;br&gt;
&lt;strong&gt;Cons&lt;/strong&gt;: Computationally expensive.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Lemmatization is almost always preferred over stemming unless there is a need for super-fast execution on a massive corpus of text data.&lt;/p&gt;
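The contrast can be sketched with a deliberately naive stemmer and a tiny lemma dictionary. These stand in for NLTK's `PorterStemmer` and `WordNetLemmatizer`, which implement the real algorithms.

```python
def naive_stem(word):
    # Crude suffix stripping in the spirit of a Porter-style stemmer.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word

# Tiny illustrative lemma dictionary; WordNetLemmatizer consults WordNet instead.
LEMMAS = {"studies": "study", "better": "good", "ran": "run"}

def naive_lemmatize(word):
    return LEMMAS.get(word, word)

print(naive_stem("studies"))       # 'studi' - stemming can yield a non-word
print(naive_lemmatize("studies"))  # 'study' - lemmatization keeps a real word
```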

&lt;p&gt;Applying lemmatization to the tweets:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--_lscBVZI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/a9383tc927mp4fi47pim.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--_lscBVZI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/a9383tc927mp4fi47pim.png" alt="lemmatization" width="800" height="337"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Text Exploratory Analysis&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;First, we analyze the text length for the different sentiments.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Word Cloud&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;A word cloud is a graphical representation of word frequency. The larger a word appears in the visualization, the more frequently it occurred in the document(s).&lt;/p&gt;
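Under the hood, a word cloud is driven by word frequencies; the `wordcloud` library then scales each word's font size by its count. The frequencies themselves can be computed with a `Counter` (the three tweets below are made-up examples):

```python
from collections import Counter

tweets = ["love this day", "love the weekend", "happy day today"]

# Count how often each word occurs across all tweets.
counts = Counter(" ".join(tweets).split())

print(counts.most_common(2))  # the two most frequent words
```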

&lt;p&gt;Word cloud for positive tweets in our dataset:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--mTrKUngH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/cy0i36onrlj44kluy5qy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--mTrKUngH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/cy0i36onrlj44kluy5qy.png" alt="code for wordcloud" width="800" height="220"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ada0rbks--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zyw94r66mf0mhska6hou.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ada0rbks--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zyw94r66mf0mhska6hou.png" alt="positive word cloud" width="800" height="539"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Word cloud for negative tweets in our dataset:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9jpnRw86--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/hn4ay1b8flf6b1d9tjkt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9jpnRw86--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/hn4ay1b8flf6b1d9tjkt.png" alt="Image description" width="800" height="200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--rctYc2Aj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/kk8gic0pqqshjuuafxm9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--rctYc2Aj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/kk8gic0pqqshjuuafxm9.png" alt="negative wordcloud" width="800" height="540"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>analysis</category>
    </item>
    <item>
      <title>Essential SQL Commands for Data Science</title>
      <dc:creator>Njeri Gitome</dc:creator>
      <pubDate>Sat, 11 Mar 2023 05:11:39 +0000</pubDate>
      <link>https://forem.com/njerigitome/essential-sql-commands-for-data-analysis-3c3d</link>
      <guid>https://forem.com/njerigitome/essential-sql-commands-for-data-analysis-3c3d</guid>
      <description>&lt;p&gt;Structured Query Language (SQL) is important for a data scientist because it is a powerful way to access, process, clean and analyze data stored in relational databases.&lt;/p&gt;

&lt;p&gt;An understanding of essential SQL commands is therefore crucial to allow the data scientist to perform efficiently in their role.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Categories of SQL commands&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Data Definition Language (DDL)
&lt;/h3&gt;

&lt;p&gt;These are the SQL commands used to define the database schema.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;List of DDL commands:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;CREATE&lt;/strong&gt; - create the database or its objects (table, index, function, views, store procedure and triggers)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;DROP&lt;/strong&gt; - delete objects from the database.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;ALTER&lt;/strong&gt; - make changes to a table, view or entire database.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;TRUNCATE&lt;/strong&gt; - remove all records from a table without deleting the table structure.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
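The DDL commands above can be exercised with Python's built-in sqlite3 module. SQLite is used here only because it ships with Python; note that SQLite has no TRUNCATE statement, so an unfiltered DELETE plays that role.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # throwaway in-memory database

con.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, text TEXT)")
con.execute("ALTER TABLE tweets ADD COLUMN sentiment TEXT")

# Inspect the schema after the ALTER.
cols = [row[1] for row in con.execute("PRAGMA table_info(tweets)")]
print(cols)  # ['id', 'text', 'sentiment']

con.execute("DELETE FROM tweets")  # SQLite's stand-in for TRUNCATE
con.execute("DROP TABLE tweets")
```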

&lt;h3&gt;
  
  
  2. Data Manipulation Language (DML)
&lt;/h3&gt;

&lt;p&gt;A subset of SQL commands used to manipulate the data that is present in the database.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;List of DML commands:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;INSERT&lt;/strong&gt; - insert data into a table.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;UPDATE&lt;/strong&gt; - modify or change existing records in a table.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;DELETE&lt;/strong&gt; - delete records from a table.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;SELECT&lt;/strong&gt; - query data from one or more tables.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
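A short sketch of the four DML commands in sequence, again using SQLite via Python's sqlite3 module with a made-up `scores` table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE scores (name TEXT, score INTEGER)")

con.execute("INSERT INTO scores VALUES ('ann', 70), ('ben', 85)")
con.execute("UPDATE scores SET score = 90 WHERE name = 'ben'")
con.execute("DELETE FROM scores WHERE score = 70")

rows = con.execute("SELECT name, score FROM scores").fetchall()
print(rows)  # [('ben', 90)]
```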

&lt;h2&gt;
  
  
  SQL for Data Scientists
&lt;/h2&gt;

&lt;p&gt;One is able to perform the following using SQL:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Basics&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Joins&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Aggregations&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Subqueries &amp;amp; Temporary Tables&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Data Cleaning&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Window Functions&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's delve deeper into these processes in SQL, as well as the essential SQL commands that come in handy.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Basic SQL Commands&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This comprises the CRUD (Create, Read, Update, Delete) operations, made possible by the DDL and DML commands.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;SQL Joins&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;JOIN statements are used to combine two or more tables, based on a related column between them.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;(INNER) JOIN&lt;/strong&gt; - returns records that have matching values in both tables.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;LEFT JOIN&lt;/strong&gt; - returns all records from the left table, and the matched records from the right table.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;RIGHT JOIN&lt;/strong&gt; - returns all records from the right table, and the matched records from the left table.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;FULL JOIN&lt;/strong&gt; - returns all records when there is a match in either the left or the right table.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
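A sketch of the first two join types on two made-up tables, using SQLite through Python's sqlite3 module. SQLite only gained RIGHT and FULL JOIN in version 3.39, so the sketch sticks to INNER and LEFT JOIN.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users (id INTEGER, name TEXT);
    CREATE TABLE orders (user_id INTEGER, item TEXT);
    INSERT INTO users VALUES (1, 'ann'), (2, 'ben');
    INSERT INTO orders VALUES (1, 'book');
""")

# INNER JOIN keeps only users with a matching order.
inner = con.execute(
    "SELECT u.name, o.item FROM users u JOIN orders o ON u.id = o.user_id"
).fetchall()

# LEFT JOIN keeps every user; 'ben' gets NULL for the missing order.
left = con.execute(
    "SELECT u.name, o.item FROM users u LEFT JOIN orders o ON u.id = o.user_id"
).fetchall()

print(inner)  # [('ann', 'book')]
print(left)   # [('ann', 'book'), ('ben', None)]
```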

&lt;h2&gt;
  
  
  &lt;strong&gt;SQL Aggregations&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;These are commands that are used to perform calculations on a set of rows in a table and return a single value as the result.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Here are the most common SQL aggregation functions:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;SUM()&lt;/strong&gt; - calculates the sum of values in a column.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AVG()&lt;/strong&gt; -calculates the average value of a column.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;MAX()&lt;/strong&gt; - finds the maximum value in a column.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;MIN()&lt;/strong&gt; - finds the minimum value in a column.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;COUNT()&lt;/strong&gt; - counts the number of rows in a table or the number of rows that meet a certain condition.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aggregation functions are very useful for data scientists as they enable them to perform calculations on large datasets quickly and efficiently. They can be used to generate summaries of data, identify trends, and make informed decisions based on the results.&lt;/p&gt;
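All five aggregation functions can be seen in one query over a made-up `sales` table (SQLite via Python's sqlite3 module):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (amount INTEGER)")
con.execute("INSERT INTO sales VALUES (10), (20), (30)")

# Each aggregate collapses the whole table into a single value.
row = con.execute(
    "SELECT SUM(amount), AVG(amount), MAX(amount), MIN(amount), COUNT(*) "
    "FROM sales"
).fetchone()

print(row)  # (60, 20.0, 30, 10, 3)
```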

&lt;h2&gt;
  
  
  &lt;strong&gt;SQL Subqueries &amp;amp; Temporary Tables&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A subquery is a query nested inside another query, typically in the SELECT, FROM or WHERE clause. A temporary table stores the intermediate result of a query so that it can be reused in later statements; it is dropped automatically at the end of the session.&lt;/p&gt;
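As a sketch, the following uses a subquery to find rows above the table's own average, then materialises an intermediate result in a temporary table (SQLite via Python's sqlite3 module, with a made-up `scores` table):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
con.execute("INSERT INTO scores VALUES ('ann', 70), ('ben', 90), ('cat', 80)")

# Subquery in the WHERE clause: rows scoring above the table's own average.
above_avg = con.execute(
    "SELECT name FROM scores WHERE score > (SELECT AVG(score) FROM scores)"
).fetchall()

# Temporary table: store an intermediate result for reuse.
con.execute("CREATE TEMP TABLE top AS SELECT * FROM scores WHERE score >= 80")
top_count = con.execute("SELECT COUNT(*) FROM top").fetchone()[0]

print(above_avg)  # [('ben',)]
print(top_count)  # 2
```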

&lt;h2&gt;
  
  
  &lt;strong&gt;Data Cleaning&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;SQL string functions are used to clean data.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;LEFT()&lt;/strong&gt; - extracts  characters from a string starting from the left.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;RIGHT()&lt;/strong&gt; - extracts  characters from a string starting from the right.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;SUBSTR()&lt;/strong&gt; - extracts a substring from a string starting at any position.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;CONCAT()&lt;/strong&gt; - adds two or more strings together.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;CAST()&lt;/strong&gt; - converts a value of any type into a specific, different data type.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;POSITION()&lt;/strong&gt; - used to find the position of a substring within a string, starting from a specified position.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;STRPOS()&lt;/strong&gt; - finds the position of the first occurrence of a string inside another string.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;COALESCE()&lt;/strong&gt; - returns the first non-null value in a list.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
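String function names vary by SQL dialect. The sketch below runs in SQLite (via Python's sqlite3 module), which has no LEFT(), RIGHT(), POSITION() or STRPOS(); SUBSTR and INSTR are its equivalents, and || is the standard string concatenation operator:

```python
import sqlite3

con = sqlite3.connect(":memory:")

row = con.execute(
    "SELECT SUBSTR('data science', 1, 4), "     # first 4 chars, like LEFT()
    "       'data' || ' ' || 'science', "       # concatenation, like CONCAT()
    "       CAST('42' AS INTEGER), "            # type conversion
    "       COALESCE(NULL, NULL, 'fallback'), " # first non-null value
    "       INSTR('data science', 'science')"   # 1-based position, like STRPOS()
).fetchone()

print(row)  # ('data', 'data science', 42, 'fallback', 6)
```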

&lt;h2&gt;
  
  
  &lt;strong&gt;Window Functions&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A window function allows one to compare one row to another without doing any joins. Window functions are effective for measuring trends over time or ranking on a specific column, and they retain the total number of records without collapsing or condensing the original dataset.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Aggregate Window Functions&lt;/strong&gt;&lt;br&gt;
Aggregate functions such as &lt;em&gt;SUM()&lt;/em&gt;, &lt;em&gt;COUNT()&lt;/em&gt;, &lt;em&gt;AVG()&lt;/em&gt;, &lt;em&gt;MAX()&lt;/em&gt; and &lt;em&gt;MIN()&lt;/em&gt; applied over a particular window (set of rows) are called aggregate window functions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ranking Window Functions&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;RANK()&lt;/strong&gt; - assigns a rank to every row within each partition. Rank 1 is given to the first row, and rows with the same value receive the same rank. After a tie, the following rank values are skipped, one for each tied row. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;DENSE_RANK()&lt;/strong&gt; - assigns a rank to each row within a partition. &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The difference between &lt;em&gt;RANK()&lt;/em&gt; and &lt;em&gt;DENSE_RANK()&lt;/em&gt; is that in &lt;em&gt;DENSE_RANK()&lt;/em&gt;, the rank after a tie is the next consecutive integer; no rank is skipped. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ROW_NUMBER()&lt;/strong&gt; - assigns consecutive integers to all the rows within a partition. Within a partition, no two rows can have the same row number. &lt;/li&gt;
&lt;/ul&gt;
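The three ranking functions can be compared side by side on a made-up table with a tie (SQLite via Python's sqlite3 module; window functions need SQLite 3.25 or newer, which ships with all recent Python builds):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
con.execute("INSERT INTO scores VALUES ('ann', 90), ('ben', 90), ('cat', 80)")

rows = con.execute("""
    SELECT name,
           RANK()       OVER (ORDER BY score DESC),
           DENSE_RANK() OVER (ORDER BY score DESC),
           ROW_NUMBER() OVER (ORDER BY score DESC)
    FROM scores ORDER BY name
""").fetchall()

for row in rows:
    print(row)
# ann and ben tie at 90: both get RANK 1 and DENSE_RANK 1. cat gets RANK 3
# (rank 2 is skipped) but DENSE_RANK 2. ROW_NUMBER is unique for every row,
# with the order among the two tied rows left unspecified.
```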

</description>
      <category>database</category>
      <category>datascience</category>
      <category>sql</category>
    </item>
    <item>
      <title>Exploratory Data Analysis Ultimate Guide</title>
      <dc:creator>Njeri Gitome</dc:creator>
      <pubDate>Thu, 02 Mar 2023 10:57:24 +0000</pubDate>
      <link>https://forem.com/njerigitome/exploratory-data-analysis-ultimate-guide-2h9g</link>
      <guid>https://forem.com/njerigitome/exploratory-data-analysis-ultimate-guide-2h9g</guid>
      <description>&lt;p&gt;The phrase "Data is the new gold", emphasizes the increasing value of data in today's world. When properly analyzed, data can uncover valuable insights that inform critical decisions and shape the future. In order to extract insights from data, one must first understand it. This is where Exploratory Data Analysis (EDA) comes in. &lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;1. What is Exploratory Data Analysis?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Exploratory data analysis is one of the first steps in the data analytics process. It entails applying various techniques to the dataset in order to understand the data.&lt;/p&gt;

&lt;p&gt;Understanding the dataset simply means getting to know the data and its characteristics, which helps in identifying potential issues, patterns and relationships within the data.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;2. What is the objective of Exploratory Data Analysis?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;There are two main objectives of EDA:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;EDA assists in identifying faulty points in the data. Once the faulty points have been identified, they can be easily removed, resulting in data cleaning.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It also helps in understanding the relationship between the variables. This gives a wider perspective on the data which helps  in building models by utilizing the relationship between various features(variables).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;3. Types of Exploratory Data Analysis&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;There are two main types of exploratory data analysis which are Univariate EDA and Multivariate EDA.&lt;/p&gt;

&lt;h3&gt;
  
  
  Univariate EDA
&lt;/h3&gt;

&lt;p&gt;Uni means one and variate means variable, so in univariate analysis there is only one variable under study. The goal of univariate analysis is simply to describe the data and find patterns within it. Univariate EDA techniques include:&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;em&gt;Univariate non-graphical EDA techniques:&lt;/em&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Central Tendency (mean, mode and median)&lt;/li&gt;
&lt;li&gt;Dispersion (range, variance)&lt;/li&gt;
&lt;li&gt;Quartiles (interquartile range) &lt;/li&gt;
&lt;li&gt;Standard deviation.&lt;/li&gt;
&lt;/ul&gt;
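All of these non-graphical summaries are available in Python's standard `statistics` module; a sketch over a small made-up sample:

```python
import statistics

values = [1, 2, 2, 3, 4, 6]

print(statistics.mean(values))      # 3   - central tendency
print(statistics.median(values))    # 2.5
print(statistics.mode(values))      # 2
print(max(values) - min(values))    # 5   - the range, a measure of dispersion
print(statistics.variance(values))  # sample variance
print(statistics.stdev(values))     # standard deviation

q1, q2, q3 = statistics.quantiles(values, n=4)
print(q3 - q1)  # interquartile range
```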

&lt;h4&gt;
  
  
  &lt;em&gt;Univariate graphical EDA techniques:&lt;/em&gt;
&lt;/h4&gt;

&lt;p&gt;These are graphical methods that provide a visualization of the data. Common types of univariate graphics include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Histograms&lt;/strong&gt;, a graph plot in which each bar represents the frequency distribution of numerical data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Box plots&lt;/strong&gt;, which graphically depict the five-number summary of minimum, first quartile, median, third quartile and maximum.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Multivariate EDA
&lt;/h3&gt;

&lt;p&gt;This is a method of analyzing data involving more than one variable. The goal is to understand patterns, correlations and interactions between variables. Multivariate techniques include:&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;em&gt;Multivariate non-graphical EDA techniques:&lt;/em&gt;
&lt;/h4&gt;

&lt;p&gt;These techniques generally show the relationship between two or more variables of the data through cross-tabulation or statistics.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;em&gt;Multivariate graphical EDA techniques:&lt;/em&gt;
&lt;/h4&gt;

&lt;p&gt;These are graphical methods that display the relationships between two or more sets of data. Common types of multivariate graphics include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scatter plot&lt;/strong&gt;, used to plot two quantitative variables on the horizontal (x) and vertical (y) axes to display the relationship between the continuous variables.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Multivariate chart&lt;/strong&gt;, is a graphical representation of the relationships between factors and a response.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Run chart&lt;/strong&gt;, a line graph drawn over time. It visually illustrates the data values in a time sequence.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Bubble chart&lt;/strong&gt;, scatter plots that display multiple circles (bubbles) in a two-dimensional plot.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Heatmap&lt;/strong&gt;, a graphical representation of data in the form of a map or diagram in which data values are represented as colors.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;4. Exploratory Data Analysis Tools&lt;/strong&gt;
&lt;/h2&gt;

&lt;h4&gt;
  
  
  Python
&lt;/h4&gt;

&lt;p&gt;Python is used for many EDA tasks, such as finding missing values in the collected data, describing the data, handling outliers and obtaining insights through charts.&lt;/p&gt;

&lt;h4&gt;
  
  
  R
&lt;/h4&gt;

&lt;p&gt;The R programming language is a regular choice of data scientists and statisticians for making statistical observations and analyzing data, i.e., performing detailed EDA.&lt;/p&gt;

&lt;h4&gt;
  
  
  MATLAB
&lt;/h4&gt;

&lt;p&gt;It is common among engineers and domain experts due to its strong mathematical calculation ability.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;5. Steps involved in Exploratory Data Analysis&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;There are three main steps involved in exploratory data analysis: data collection, data cleaning, and analysis of the relationships between the variables.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Data Collection
&lt;/h3&gt;

&lt;p&gt;Data collection is the first step in EDA; it involves gathering relevant data from various sources. Some reliable sources are Kaggle, GitHub, the UCI Machine Learning Repository, etc.&lt;/p&gt;

&lt;p&gt;The data used in the example is the 120 Years of Olympic History dataset, which is available on Kaggle.&lt;/p&gt;

&lt;p&gt;In the IDE of your choice, start by importing the necessary libraries.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiy1lfu03u29bvva4jvfs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiy1lfu03u29bvva4jvfs.png" alt="Importing necessary Libraries" width="800" height="200"&gt;&lt;/a&gt;&lt;br&gt;
Then, load the dataset into DataFrames:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq1itvrihmrfcu1bmhtvx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq1itvrihmrfcu1bmhtvx.png" alt="Loading dataset" width="800" height="74"&gt;&lt;/a&gt;&lt;br&gt;
Display the content of the datasets:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0zyqs4j6lqhmq5j7d0s2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0zyqs4j6lqhmq5j7d0s2.png" alt="Main df" width="800" height="222"&gt;&lt;/a&gt;&lt;br&gt;
Regions dataset:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbj7h0vy2mfv3himh8s2f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbj7h0vy2mfv3himh8s2f.png" alt="Regions" width="800" height="178"&gt;&lt;/a&gt;&lt;br&gt;
Check the shape of the DataFrames:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa69vdo4gm35bm3tcc88n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa69vdo4gm35bm3tcc88n.png" alt="df.shape" width="800" height="89"&gt;&lt;/a&gt;&lt;br&gt;
This DataFrame shape is (271116, 15), which means that it has 271116 observations (rows) and 15 features (columns).&lt;br&gt;
Checking the region's DataFrame shape:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv7vb4xcda08r7hlck3ha.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv7vb4xcda08r7hlck3ha.png" alt="region" width="800" height="99"&gt;&lt;/a&gt;&lt;br&gt;
The DataFrame shape is (230, 3), which implies that it has 230 rows and 3 columns.&lt;br&gt;
Next, merge the two DataFrames:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy1vjkqodaczc27lcbt23.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy1vjkqodaczc27lcbt23.png" alt="Olympics df" width="800" height="74"&gt;&lt;/a&gt;&lt;br&gt;
Check the shape of the Olympics DataFrame:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fey40nuv23vgkmr6z531z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fey40nuv23vgkmr6z531z.png" alt="Olympics shape" width="800" height="82"&gt;&lt;/a&gt;&lt;br&gt;
Display the content of the Olympics DataFrame:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqm8lrdlhwjdwemaasy82.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqm8lrdlhwjdwemaasy82.png" alt="olympics.head()" width="800" height="260"&gt;&lt;/a&gt;&lt;br&gt;
Check the concise summary of the DataFrame using the info() function.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0g0736ctpphtap1m1gfm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0g0736ctpphtap1m1gfm.png" alt="information" width="800" height="376"&gt;&lt;/a&gt;&lt;br&gt;
Check the descriptive statistics of the DataFrame using the describe() function. It provides summary statistics such as count, mean, and quartiles for the numeric columns.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7oyq8fqjs8iy0znxx7bo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7oyq8fqjs8iy0znxx7bo.png" alt="descriptive analysis" width="800" height="190"&gt;&lt;/a&gt;&lt;/p&gt;
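&lt;p&gt;The loading and merging steps above can be condensed into a short pandas sketch. Small stand-in DataFrames are built inline so the snippet is self-contained; with the real Kaggle files you would call pd.read_csv on the two CSVs instead:&lt;/p&gt;

```python
import pandas as pd

# Stand-ins for the two Kaggle files; in practice they are loaded with
# pd.read_csv, e.g. pd.read_csv("athlete_events.csv")
athletes = pd.DataFrame({
    "Name": ["A Dijiang", "Edgar Aabye", "Christine Aaftink"],
    "NOC": ["CHN", "DEN", "NED"],
    "Year": [1992, 1900, 1988],
})
regions = pd.DataFrame({
    "NOC": ["CHN", "DEN", "NED"],
    "region": ["China", "Denmark", "Netherlands"],
})

print(athletes.shape, regions.shape)  # (rows, columns) of each frame

# Merge the two frames on the shared NOC column, keeping every athlete row
olympics = athletes.merge(regions, on="NOC", how="left")

olympics.info()             # concise summary: columns, dtypes, non-null counts
print(olympics.describe())  # descriptive statistics for the numeric columns
```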

&lt;h3&gt;
  
  
  2. Data Cleaning
&lt;/h3&gt;

&lt;p&gt;This is a critical step in EDA that involves identifying and correcting errors and inconsistencies in the data to ensure its accuracy and integrity.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;1. &lt;em&gt;Handling the missing values.&lt;/em&gt;&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;This is a crucial step in data analysis. Missing values can be handled in various ways:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;Removing missing values&lt;/em&gt; - this is simply removing any rows or columns that contain missing values. This is only appropriate if the amount of missing data is small relative to the size of the dataset and removing the missing data does not significantly affect the analysis.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;Imputing missing values&lt;/em&gt; - this is imputing the missing value with an estimated value. The simplest approach is to impute the missing values with the mean, median, or mode of the non-missing values. More advanced imputation techniques involve using machine learning algorithms to predict the missing values based on other features in the dataset.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;Ignoring missing values&lt;/em&gt; - in some cases, it may be appropriate to ignore missing values if they do not significantly affect the analysis.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
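&lt;p&gt;The three strategies above can be sketched with pandas on a tiny hypothetical frame:&lt;/p&gt;

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a gap in each strategy's target column
df = pd.DataFrame({
    "Age":    [24.0, np.nan, 31.0, 28.0],
    "Height": [175.0, 182.0, np.nan, 168.0],
    "Notes":  [np.nan, np.nan, np.nan, "DNF"],
})

# 1. Removing: drop every row that contains a missing value
dropped = df.dropna()

# 2. Imputing: replace missing heights with the column mean
df["Height"] = df["Height"].fillna(df["Height"].mean())

# 3. Ignoring: leave Age as-is when the gap is too small to matter
print(dropped.shape, df["Height"].tolist())
```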

&lt;p&gt;&lt;strong&gt;Handling missing values in the Olympics dataset:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;First check for missing values:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmzbaoxnf6hd3p0iksnv0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmzbaoxnf6hd3p0iksnv0.png" alt="missing values" width="800" height="159"&gt;&lt;/a&gt;&lt;br&gt;
Then, the percentage of missing values:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9rwg3txou6n5ylh9sg9j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9rwg3txou6n5ylh9sg9j.png" alt="Percentage of missing values" width="800" height="180"&gt;&lt;/a&gt;&lt;br&gt;
The results above provide insights on how to handle the missing values in the Olympics dataset.&lt;/p&gt;

&lt;p&gt;The Notes column has 98% of its data missing and can therefore be dropped.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd0mef1bs0cqcp1shr940.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd0mef1bs0cqcp1shr940.png" alt="drop column" width="800" height="79"&gt;&lt;/a&gt;&lt;br&gt;
The Height and Weight missing values can be imputed with the mean.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgpzaepj84tqv297g77fk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgpzaepj84tqv297g77fk.png" alt="impute missing" width="800" height="95"&gt;&lt;/a&gt;&lt;br&gt;
The Age column has 3% of its data missing, while Region has 0.3%; these amounts are relatively small, so the columns can be left unmodified. The missing values in the Medal column are ignored since NaN indicates that no medal was won.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;2. &lt;em&gt;Handling duplicate values.&lt;/em&gt;&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;This involves identifying and removing or modifying duplicates. Here are some common approaches to handling duplicate values:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;Identifying and removing exact duplicates&lt;/em&gt; - exact duplicates are rows that have identical values in all columns.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;Identifying and removing partial duplicates&lt;/em&gt; - partial duplicates are rows that have the same values in some columns but differ in others.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
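&lt;p&gt;Both kinds of duplicates can be handled with pandas; the rows below are hypothetical:&lt;/p&gt;

```python
import pandas as pd

# Hypothetical rows: the first two are exact duplicates,
# the third matches them only on Name
df = pd.DataFrame({
    "Name": ["Usain Bolt", "Usain Bolt", "Usain Bolt"],
    "Year": [2008, 2008, 2012],
})

# Exact duplicates: identical values in every column
exact_mask = df.duplicated()       # flags the second 2008 row
deduped = df.drop_duplicates()

# Partial duplicates: same values in a subset of columns only
partial = df.drop_duplicates(subset=["Name"])

print(len(deduped), len(partial))
```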

&lt;p&gt;Here's a code example of how to handle the duplicates mentioned above:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx4on58hjwrvhwypia7gw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx4on58hjwrvhwypia7gw.png" alt="duplicates" width="800" height="176"&gt;&lt;/a&gt;&lt;br&gt;
The Olympics dataset does not require this check because duplicates are inevitable given the nature of the data.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Analyzing the Relationship Between the Variables
&lt;/h3&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;&lt;em&gt;Univariate non-graphical EDA&lt;/em&gt;&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Top 10 participating countries&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fksbtm1k84aoxxk9mba9a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fksbtm1k84aoxxk9mba9a.png" alt="Top 10 participating countries" width="800" height="220"&gt;&lt;/a&gt;&lt;/p&gt;
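&lt;p&gt;A frequency table like the one above is typically produced with value_counts(); here a tiny stand-in region column replaces the merged Olympics frame:&lt;/p&gt;

```python
import pandas as pd

# Tiny stand-in for the merged Olympics frame's region column
olympics = pd.DataFrame({
    "region": ["USA", "USA", "USA", "UK", "UK", "France"],
})

# Frequency table of participating countries; head(10)
# keeps the top ten in the full dataset
top = olympics["region"].value_counts().head(10)
print(top)
```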

&lt;h4&gt;
  
  
  &lt;em&gt;&lt;strong&gt;Univariate graphical EDA&lt;/strong&gt;&lt;/em&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Bar plot for Top 10 participating countries&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F704dxys7c9eror8rqez0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F704dxys7c9eror8rqez0.png" alt="barplot" width="800" height="412"&gt;&lt;/a&gt;&lt;br&gt;
code:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2enz7r5brqfmvkkp1913.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2enz7r5brqfmvkkp1913.png" alt="code" width="800" height="135"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Age distribution of the athletes&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frryidjxo3xwwk71njlvy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frryidjxo3xwwk71njlvy.png" alt="Age distribution of the athletes" width="800" height="428"&gt;&lt;/a&gt;&lt;br&gt;
code:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F25988djsjv2zmy043492.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F25988djsjv2zmy043492.png" alt="code" width="800" height="115"&gt;&lt;/a&gt;&lt;br&gt;
Interpretation: Most participants were aged between 23 and 26 years.&lt;/p&gt;
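&lt;p&gt;A histogram like the one above can be reproduced with matplotlib; the ages below are randomly generated stand-ins for the Age column:&lt;/p&gt;

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

# Randomly generated stand-in for the Age column
rng = np.random.default_rng(1)
ages = rng.normal(25, 4, 500)

# Histogram: the classic univariate graphical EDA tool
fig, ax = plt.subplots()
ax.hist(ages, bins=20, edgecolor="black")
ax.set_xlabel("Age")
ax.set_ylabel("Number of athletes")
ax.set_title("Age distribution of the athletes")
fig.savefig("age_hist.png")
```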

&lt;p&gt;&lt;strong&gt;Height distribution of the athletes&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwf16tfsmh675yp9b38pw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwf16tfsmh675yp9b38pw.png" alt="height" width="800" height="428"&gt;&lt;/a&gt;&lt;br&gt;
Code:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk8lvelq3rkpfyaicnzaq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk8lvelq3rkpfyaicnzaq.png" alt="code" width="800" height="109"&gt;&lt;/a&gt;&lt;br&gt;
Interpretation: The heights of the athletes range from 150 cm to 178 cm. Most of the participants had a height of 175 cm.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;&lt;em&gt;Multivariate non-graphical EDA&lt;/em&gt;&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Number of athletes by gender&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv4n7eczvbqn9jd1jc7nb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv4n7eczvbqn9jd1jc7nb.png" alt="Gender" width="800" height="124"&gt;&lt;/a&gt;&lt;/p&gt;
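&lt;p&gt;A per-gender count like the one above can be sketched with a pandas groupby on a hypothetical roster:&lt;/p&gt;

```python
import pandas as pd

# Hypothetical athlete roster
olympics = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol", "Dan", "Eve"],
    "Sex":  ["F", "M", "F", "M", "F"],
})

# Count distinct athletes per gender
counts = olympics.groupby("Sex")["Name"].nunique()
print(counts)
```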

&lt;p&gt;&lt;strong&gt;Top 15 Countries and number of Gold Medals Won in the 2016 Olympics&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmyp27m47mwvhffuy56zn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmyp27m47mwvhffuy56zn.png" alt="top 15" width="800" height="359"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;&lt;em&gt;Multivariate graphical EDA&lt;/em&gt;&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Pie plot for male and female distribution of athletes&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr3k5px0sbyzvbvllatf9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr3k5px0sbyzvbvllatf9.png" alt="Pie Plot" width="482" height="502"&gt;&lt;/a&gt;&lt;br&gt;
Code:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7hh9rrgn7781amxggyis.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7hh9rrgn7781amxggyis.png" alt="code" width="800" height="113"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Line Plot of Female Athletes over time&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzzlajws6rvalt9gg0ems.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzzlajws6rvalt9gg0ems.png" alt=" " width="800" height="424"&gt;&lt;/a&gt;&lt;br&gt;
Code:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiykpynizsknutthl0a0r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiykpynizsknutthl0a0r.png" alt="code" width="800" height="111"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Bar-plot for Top 15 Countries and number of Gold Medals Won in the 2016 Olympics&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvf8rust93gt9lwc0hztt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvf8rust93gt9lwc0hztt.png" alt="top 15" width="620" height="437"&gt;&lt;/a&gt;&lt;br&gt;
Code:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7e4yggzaytrcy7kz8ufo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7e4yggzaytrcy7kz8ufo.png" alt=" " width="800" height="88"&gt;&lt;/a&gt;&lt;/p&gt;
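&lt;p&gt;A line plot over time like the one above can be sketched with pandas and matplotlib; the records below are hypothetical stand-ins for the Olympics frame:&lt;/p&gt;

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical athlete records across three Games
olympics = pd.DataFrame({
    "Year": [1900, 1900, 1950, 1950, 1950, 2000],
    "Sex":  ["F", "M", "F", "F", "M", "F"],
})

# Number of female athletes per Games
female = olympics[olympics["Sex"] == "F"].groupby("Year").size()

fig, ax = plt.subplots()
ax.plot(female.index, female.values, marker="o")
ax.set_xlabel("Year")
ax.set_ylabel("Female athletes")
ax.set_title("Female athletes over time")
fig.savefig("female_over_time.png")
```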

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;It is crucial to keep in mind that EDA is an iterative process and that the steps used can change based on the dataset and the objectives of the analysis. In addition, domain knowledge and context are important factors in understanding and drawing meaningful insights from the data.&lt;/p&gt;

</description>
      <category>software</category>
      <category>showcase</category>
    </item>
    <item>
      <title>SQL 101:Introduction to SQL for Data Analysis</title>
      <dc:creator>Njeri Gitome</dc:creator>
      <pubDate>Fri, 17 Feb 2023 09:41:07 +0000</pubDate>
      <link>https://forem.com/njerigitome/introduction-to-sql-for-data-analysis-4di0</link>
      <guid>https://forem.com/njerigitome/introduction-to-sql-for-data-analysis-4di0</guid>
      <description>&lt;p&gt;SQL is like a paintbrush for data analysis. It is an essential tool that every data analysts analyst should have in their arsenal.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What is SQL?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;SQL (Structured Query Language) is a programming language used in relational databases to manage and manipulate data.&lt;/p&gt;

&lt;p&gt;It allows the user to interact with databases through a set of commands, or statements. It is used to create tables; insert, update, and delete records; and query the database to extract information.&lt;/p&gt;

&lt;p&gt;SQL has also become a widely used language in data analysis and data science, because it provides a powerful set of tools for manipulating and analyzing large datasets.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;How is SQL used for Data Analysis?&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Data Extraction
&lt;/h3&gt;

&lt;p&gt;One of the primary responsibilities of a Data Analyst is to extract and analyze data from various sources. SQL comes in handy to achieve this.&lt;/p&gt;

&lt;p&gt;Using SQL, data can be retrieved from one or more databases using SQL queries. This process is simply known as data extraction.&lt;/p&gt;

&lt;p&gt;Here are steps for data extraction in SQL:&lt;/p&gt;

&lt;p&gt;Log in to the MySQL server using a user account:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frgxmize2jj1uw8mod09l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frgxmize2jj1uw8mod09l.png" alt="Image description" width="800" height="58"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Start by creating the database:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4wmgtryrrup3s1j5sul2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4wmgtryrrup3s1j5sul2.png" alt="Image description" width="800" height="72"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After that, select the newly created database in order to use it:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq6q16mzfzg15irfd5uo5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq6q16mzfzg15irfd5uo5.png" alt="Image description" width="800" height="73"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then, &lt;strong&gt;create a table&lt;/strong&gt; within the &lt;strong&gt;Bookstore&lt;/strong&gt; database:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F62eqzg3uwfdczluaqqi6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F62eqzg3uwfdczluaqqi6.png" alt="Image description" width="800" height="161"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, &lt;strong&gt;input data&lt;/strong&gt; into the table:&lt;/p&gt;

&lt;p&gt;Example 1: Populate one record:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffb2s69gh85up95dwzwd9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffb2s69gh85up95dwzwd9.png" alt="Image description" width="800" height="124"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Example 2: Populate multiple records:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyfaml8smjr8x1tr4wddw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyfaml8smjr8x1tr4wddw.png" alt="Image description" width="800" height="266"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To &lt;strong&gt;extract data&lt;/strong&gt;, the &lt;strong&gt;SELECT&lt;/strong&gt; statement is used:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frt7oxaycqscax5y8ebhs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frt7oxaycqscax5y8ebhs.png" alt="Image description" width="800" height="71"&gt;&lt;/a&gt;&lt;br&gt;
Output:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjoe1e5ct5oyetub1rz7w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjoe1e5ct5oyetub1rz7w.png" alt="Image description" width="793" height="364"&gt;&lt;/a&gt;&lt;/p&gt;
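&lt;p&gt;The extraction steps above can be sketched end to end with Python's built-in sqlite3 module. The schema below is an illustrative simplification of the Bookstore example, and MySQL syntax differs slightly (e.g. its AUTO_INCREMENT keyword):&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# Create a table in the Bookstore-style schema
cur.execute("""
    CREATE TABLE Books (
        book_id INTEGER PRIMARY KEY,
        title   TEXT NOT NULL,
        price   REAL
    )
""")

# Populate one record, then several at once
cur.execute("INSERT INTO Books (title, price) VALUES (?, ?)",
            ("SQL Basics", 15.99))
cur.executemany("INSERT INTO Books (title, price) VALUES (?, ?)",
                [("Data Analysis 101", 22.50), ("Python for Data", 30.00)])

# Extract data with a SELECT statement
rows = cur.execute("SELECT title, price FROM Books ORDER BY price").fetchall()
print(rows)
```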

&lt;p&gt;&lt;strong&gt;In addition&lt;/strong&gt;&lt;br&gt;
In this case, our scenario is a Bookstore database. More tables, namely Books, Stock, and Categories, were created.&lt;/p&gt;

&lt;p&gt;Below is an example of creating a table that has a relationship with another.&lt;/p&gt;

&lt;p&gt;Books Table:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fto3vbuews0ykxnuwlqpf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fto3vbuews0ykxnuwlqpf.png" alt="Image description" width="800" height="155"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Stock Table:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu2odoc874p07hhgkecyd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu2odoc874p07hhgkecyd.png" alt="Image description" width="800" height="161"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is a database entity relationship diagram that will be useful while analyzing the data:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsbtfrux6t7ineah1h8gp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsbtfrux6t7ineah1h8gp.png" alt="Image description" width="800" height="442"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Joins
&lt;/h3&gt;

&lt;p&gt;SQL joins are a powerful tool for data analysts because they allow them to combine data from multiple tables into a single result set. This is crucial because data is often spread across multiple tables, and combining this data is necessary to answer complex business questions.&lt;/p&gt;

&lt;p&gt;There are several joins that can be used in SQL:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;INNER JOIN&lt;/li&gt;
&lt;li&gt;LEFT JOIN&lt;/li&gt;
&lt;li&gt;RIGHT JOIN &lt;/li&gt;
&lt;li&gt;FULL OUTER JOIN&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Below are Venn diagrams that provide a visual illustration of SQL joins.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8731rv6k9n6ykt5kmli1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8731rv6k9n6ykt5kmli1.png" alt="Image description" width="631" height="359"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As data analysts, we will answer a business question; depending on which SQL join we use, we will obtain different results.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Business Question:&lt;br&gt;
Which book categories are available for purchase in our database?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Inner Join&lt;/strong&gt;&lt;br&gt;
It can also be written simply as &lt;strong&gt;JOIN&lt;/strong&gt;. It returns only the rows that have matching values in both tables being joined.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fec357qavdon1g33g52z2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fec357qavdon1g33g52z2.png" alt="Image description" width="800" height="133"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Results:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1q9cy69vgvuc6uyynn7o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1q9cy69vgvuc6uyynn7o.png" alt="Image description" width="731" height="319"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Left Join&lt;/strong&gt;&lt;br&gt;
It returns all rows from the left table (table1) as well as the matched rows from the right table (table2). If no match is found, NULL values are returned.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0cnipk4vcbfu45aa1tb5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0cnipk4vcbfu45aa1tb5.png" alt="Image description" width="800" height="159"&gt;&lt;/a&gt;&lt;br&gt;
Results:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn90d2oyk8wjwcruag9pb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn90d2oyk8wjwcruag9pb.png" alt="Image description" width="731" height="469"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Right Join&lt;/strong&gt;&lt;br&gt;
Returns all rows from the right table (table2), and the matched rows from the left table (table1). If there is no match, NULL values are returned.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjoewarhzdacfs1qa7fzj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjoewarhzdacfs1qa7fzj.png" alt="Image description" width="800" height="132"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Results:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffrvfuaye2datl0g23ig0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffrvfuaye2datl0g23ig0.png" alt="Image description" width="656" height="327"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Full Outer Join&lt;/strong&gt;&lt;br&gt;
Returns all rows from both tables, with NULL values for any unmatched rows.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzxn6yv9qtih3fk33m3b2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzxn6yv9qtih3fk33m3b2.png" alt="Image description" width="800" height="168"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Results:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flmj9yhdjtbnkqcbqh9l8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flmj9yhdjtbnkqcbqh9l8.png" alt="Image description" width="800" height="440"&gt;&lt;/a&gt;&lt;/p&gt;
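&lt;p&gt;Since the queries above appear only as screenshots, here is a minimal runnable sketch of the join types using Python's built-in sqlite3 module. The table and column names are illustrative assumptions, not the article's actual schema. Note that SQLite gained native RIGHT and FULL OUTER JOIN only in version 3.39, so the sketch emulates RIGHT JOIN with a swapped LEFT JOIN for portability.&lt;/p&gt;

```python
import sqlite3

# Hypothetical schema, loosely modeled on the article's book/category
# tables (names here are illustrative, not the real schema).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (book_id INTEGER PRIMARY KEY, title TEXT, category_id INTEGER);
    INSERT INTO category VALUES (1, 'Romance'), (2, 'Thriller'), (3, 'History');
    INSERT INTO book VALUES
        (1, 'The Notebook', 1),
        (2, 'Gone Girl', 2),
        (3, 'Uncatalogued Title', NULL);  -- no matching category
""")

# INNER JOIN: only rows with matching values in both tables.
inner = cur.execute("""
    SELECT b.title, c.name
    FROM book b
    JOIN category c ON b.category_id = c.category_id
""").fetchall()

# LEFT JOIN: every book, with NULL where no category matches.
left = cur.execute("""
    SELECT b.title, c.name
    FROM book b
    LEFT JOIN category c ON b.category_id = c.category_id
""").fetchall()

# RIGHT JOIN emulated by swapping the tables in a LEFT JOIN; a FULL
# OUTER JOIN can likewise be emulated as a UNION of the two LEFT JOINs.
right = cur.execute("""
    SELECT b.title, c.name
    FROM category c
    LEFT JOIN book b ON b.category_id = c.category_id
""").fetchall()

print(inner)  # 2 matched rows
print(left)   # 3 rows: the uncatalogued book carries NULL
print(right)  # 3 rows: the unmatched 'History' category carries NULL
```

&lt;p&gt;The same SQL runs unchanged on PostgreSQL; MySQL also lacks FULL OUTER JOIN and needs the UNION emulation.&lt;/p&gt;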

&lt;h3&gt;
  
  
  3. Data Filtering
&lt;/h3&gt;

&lt;p&gt;This is an important technique for data analysis that selects a subset of the data based on certain criteria. SQL uses the WHERE clause to filter data.&lt;/p&gt;

&lt;p&gt;Case 1: Filtering by a single condition.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Business question:&lt;br&gt;
Which Nicholas Sparks novels are available for purchase?&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl309tnh1o1b1mt0ladye.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl309tnh1o1b1mt0ladye.png" alt="Image description" width="800" height="98"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Results:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fswf3wz160j2632ewt9hd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fswf3wz160j2632ewt9hd.png" alt="Image description" width="626" height="193"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Business question:&lt;br&gt;
Which bookshops are located in Kisumu county?&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhqr7f1azi5sjudj6lneu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhqr7f1azi5sjudj6lneu.png" alt="Image description" width="794" height="72"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fckqu5khzalxmzrdamk2u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fckqu5khzalxmzrdamk2u.png" alt="Image description" width="798" height="118"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Case 2: Filtering by multiple conditions.&lt;br&gt;
&lt;strong&gt;Business question:&lt;br&gt;
Is there a bookstore called Bookworms Haven in Kisumu County?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp2y1wszxvmrhbmkehcsc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp2y1wszxvmrhbmkehcsc.png" alt="Image description" width="800" height="104"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Result:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyeyi61l5ri95a6gwx023.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyeyi61l5ri95a6gwx023.png" alt="Image description" width="800" height="96"&gt;&lt;/a&gt;&lt;/p&gt;
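&lt;p&gt;Both filtering cases can be sketched as follows, again with a hypothetical bookstore table (the store names and schema are illustrative):&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE bookstore (store_id INTEGER PRIMARY KEY, name TEXT, county TEXT);
    INSERT INTO bookstore VALUES
        (1, 'Bookworms Haven', 'Kisumu'),
        (2, 'Page Turners',    'Nairobi'),
        (3, 'Lakeside Books',  'Kisumu');
""")

# Case 1: a single condition in the WHERE clause.
kisumu = cur.execute("""
    SELECT name FROM bookstore
    WHERE county = 'Kisumu'
    ORDER BY store_id
""").fetchall()

# Case 2: multiple conditions combined with AND.
haven = cur.execute("""
    SELECT name FROM bookstore
    WHERE county = 'Kisumu' AND name = 'Bookworms Haven'
""").fetchall()

print(kisumu)  # [('Bookworms Haven',), ('Lakeside Books',)]
print(haven)   # [('Bookworms Haven',)]
```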

&lt;h3&gt;
  
  
  4. Data Aggregation
&lt;/h3&gt;

&lt;p&gt;This is the process of summarizing and grouping data to obtain useful insights and metrics. It is a common task in data analysis and is often used to generate reports, perform statistical analysis, and identify trends.&lt;/p&gt;

&lt;p&gt;In SQL, this is achieved via aggregate functions. Here are some common functions used for aggregation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SUM() Function&lt;/li&gt;
&lt;li&gt;AVG() Function&lt;/li&gt;
&lt;li&gt;MIN() Function&lt;/li&gt;
&lt;li&gt;MAX() Function&lt;/li&gt;
&lt;li&gt;COUNT() Function&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  SUM() Function
&lt;/h4&gt;

&lt;p&gt;It returns the total sum of a numeric column.&lt;br&gt;
&lt;strong&gt;Business question:&lt;br&gt;
 How many book copies are in stock?&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu5lb2u91dqba2rhufunk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu5lb2u91dqba2rhufunk.png" alt="Image description" width="701" height="109"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Results:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8sxrezbj7xa2mqkihfbu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8sxrezbj7xa2mqkihfbu.png" alt="Image description" width="590" height="125"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  AVG() Function
&lt;/h4&gt;

&lt;p&gt;It calculates the average value of a numeric column.&lt;br&gt;
&lt;strong&gt;Business question:&lt;br&gt;
What is the average number of books in stock?&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbyyn5gltoojw0adiezmm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbyyn5gltoojw0adiezmm.png" alt="Image description" width="800" height="101"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Results:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4l315kj57xh7dk0ns52u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4l315kj57xh7dk0ns52u.png" alt="Image description" width="750" height="119"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  MIN() Function
&lt;/h4&gt;

&lt;p&gt;It returns the lowest (smallest) value of a numeric column.&lt;br&gt;
&lt;strong&gt;Business question:&lt;br&gt;
What is the minimum number of book copies available in a bookstore?&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpiv54ya323w39q94wjq4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpiv54ya323w39q94wjq4.png" alt="Image description" width="795" height="102"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Results:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwrxmuha6nv9msi0fqz8t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwrxmuha6nv9msi0fqz8t.png" alt="Image description" width="782" height="101"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  MAX() Function
&lt;/h4&gt;

&lt;p&gt;It returns the largest value of a numerical column.&lt;br&gt;
&lt;strong&gt;Business question:&lt;br&gt;
What is the maximum number of book copies available in a bookstore?&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwz30xp98pigriwn8nww5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwz30xp98pigriwn8nww5.png" alt="Image description" width="758" height="88"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Results:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdggq65yipu60ciwhtvul.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdggq65yipu60ciwhtvul.png" alt="Image description" width="705" height="100"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  COUNT() Function
&lt;/h4&gt;

&lt;p&gt;It counts the number of rows in a table or the number of non-null values in a column.&lt;br&gt;
&lt;strong&gt;Business question:&lt;br&gt;
  What is the count of the stock entries?&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8asp048exzsbm9rizo4v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8asp048exzsbm9rizo4v.png" alt="Image description" width="800" height="84"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Results:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffqnusomvcqfoa90gdby6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffqnusomvcqfoa90gdby6.png" alt="Image description" width="689" height="97"&gt;&lt;/a&gt;&lt;/p&gt;
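&lt;p&gt;All five aggregate functions above can be demonstrated in a single query against a hypothetical stock table (the numbers and column names are illustrative):&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, book_id INTEGER, copies INTEGER);
    INSERT INTO stock VALUES (1, 1, 10), (2, 2, 4), (3, 3, 16);
""")

row = cur.execute("""
    SELECT SUM(copies),    -- total copies in stock
           AVG(copies),    -- average copies per entry
           MIN(copies),    -- smallest stock level
           MAX(copies),    -- largest stock level
           COUNT(*)        -- number of stock entries
    FROM stock
""").fetchone()

total, average, lowest, highest, entries = row
print(row)  # (30, 10.0, 4, 16, 3)
```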

&lt;h3&gt;
  
  
  5. Data Transformation
&lt;/h3&gt;

&lt;p&gt;This is the process of converting data from one format or structure to another to make it more suitable for analysis. In SQL, data transformation can be achieved using various techniques such as filtering, aggregating, joining, and grouping.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Business question:&lt;br&gt;
How many different book titles are available in each book category?&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzrrygy7gnb8rt6317s2t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzrrygy7gnb8rt6317s2t.png" alt="Image description" width="800" height="204"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Result:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffseojyek2su04cjh5cl6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffseojyek2su04cjh5cl6.png" alt="Image description" width="751" height="183"&gt;&lt;/a&gt;&lt;/p&gt;
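&lt;p&gt;A query like the one above can be sketched by combining GROUP BY with COUNT(DISTINCT ...) on a hypothetical book table (titles and categories are illustrative):&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE book (book_id INTEGER PRIMARY KEY, title TEXT, category TEXT);
    INSERT INTO book VALUES
        (1, 'The Notebook', 'Romance'),
        (2, 'A Walk to Remember', 'Romance'),
        (3, 'Gone Girl', 'Thriller');
""")

# GROUP BY collapses the rows into one group per category;
# COUNT(DISTINCT title) counts the different titles within each group.
per_category = cur.execute("""
    SELECT category, COUNT(DISTINCT title) AS titles
    FROM book
    GROUP BY category
    ORDER BY category
""").fetchall()

print(per_category)  # [('Romance', 2), ('Thriller', 1)]
```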

&lt;h3&gt;
  
  
  6. Data Cleaning
&lt;/h3&gt;

&lt;p&gt;Data cleaning is an important step in preparing data for analysis in SQL. Here are some common techniques used for data cleaning in SQL:&lt;br&gt;
&lt;strong&gt;1. Removing duplicates&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmbks4257kfdqdoujl1la.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmbks4257kfdqdoujl1la.png" alt="Image description" width="785" height="89"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This query will remove any duplicate rows from the table, and return only unique rows based on the columns specified in the SELECT statement.&lt;br&gt;
Result:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fusr5wu1ite6ngpohndn4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fusr5wu1ite6ngpohndn4.png" alt="Image description" width="783" height="444"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Handling missing values:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdw11h6a3fp5dd71gya5t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdw11h6a3fp5dd71gya5t.png" alt="Image description" width="736" height="106"&gt;&lt;/a&gt;&lt;br&gt;
This query will return only the rows that do not contain null values in the specified columns. Alternatively, one can replace null values with a default value or with values from other rows or columns.&lt;br&gt;
Result:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4lc3l877ogmbc3ntsac9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4lc3l877ogmbc3ntsac9.png" alt="Image description" width="728" height="347"&gt;&lt;/a&gt;&lt;/p&gt;
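&lt;p&gt;Both cleaning techniques can be sketched on a small hypothetical table; COALESCE is shown as one way to replace NULLs with a default value (all names and values here are illustrative):&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE raw_books (title TEXT, author TEXT);
    INSERT INTO raw_books VALUES
        ('The Notebook', 'Nicholas Sparks'),
        ('The Notebook', 'Nicholas Sparks'),  -- duplicate row
        ('Gone Girl', NULL);                  -- missing author
""")

# 1. Removing duplicates with SELECT DISTINCT.
unique_rows = cur.execute(
    "SELECT DISTINCT title, author FROM raw_books"
).fetchall()

# 2. Handling missing values: keep only complete rows...
complete = cur.execute(
    "SELECT title, author FROM raw_books WHERE author IS NOT NULL"
).fetchall()

# ...or replace NULLs with a default value using COALESCE.
filled = cur.execute(
    "SELECT title, COALESCE(author, 'Unknown') FROM raw_books"
).fetchall()

print(unique_rows)  # 2 distinct rows remain
print(filled)       # the NULL author becomes 'Unknown'
```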

&lt;p&gt;In conclusion, SQL is a valuable tool for retrieving, transforming, and analyzing large datasets stored in relational databases, making it a must-have skill for any data analyst.&lt;/p&gt;

</description>
      <category>gratitude</category>
    </item>
  </channel>
</rss>
