<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Kelvin</title>
    <description>The latest articles on Forem by Kelvin (@murimikelvin).</description>
    <link>https://forem.com/murimikelvin</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3708665%2F61dd7f26-b8b7-4ac9-8608-089f7d6a64b2.jpg</url>
      <title>Forem: Kelvin</title>
      <link>https://forem.com/murimikelvin</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/murimikelvin"/>
    <language>en</language>
    <item>
      <title>Machine learning Ensemble models.</title>
      <dc:creator>Kelvin</dc:creator>
      <pubDate>Fri, 01 May 2026 12:42:43 +0000</pubDate>
      <link>https://forem.com/murimikelvin/ensemble-models-69p</link>
      <guid>https://forem.com/murimikelvin/ensemble-models-69p</guid>
      <description>&lt;p&gt;Ensemble Learning in machine learning integrates multiple models called weak learners to create a single effective model for prediction. This technique is used to enhance accuracy, minimizing variance and removing overfitting. Here we will learn different ensemble techniques and their algorithms.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvhphb0ll79f79hi5zzy5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvhphb0ll79f79hi5zzy5.png" alt=" " width="800" height="396"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Main types of ensemble models
&lt;/h1&gt;

&lt;h2&gt;
  
  
  1. Bagging - Bootstrap Aggregating.
&lt;/h2&gt;

&lt;p&gt;Bagging is a technique that involves creating multiple versions of a model and combining their outputs to improve overall performance.&lt;br&gt;
In bagging several base models are trained on different subsets of the training data, then aggregate their predictions to make the final decision. The subsets of the data are created using bootstrapping, a statistical technique where samples are drawn with replacement, meaning some data points can appear more than once in a subset.&lt;br&gt;
The final prediction from the ensemble is typically made by either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Averaging the predictions (for regression problems), or&lt;/li&gt;
&lt;li&gt;Majority voting (for classification problems).&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Common Bagging Algorithms.
&lt;/h2&gt;
&lt;h3&gt;
  
  
  1. Random Forest.
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Random forest is an ensemble method based on decision trees. Multiple decision trees are trained using different bootstrapped samples of the data.&lt;/li&gt;
&lt;li&gt;In addition to bagging, Random Forest also introduces randomness by selecting a random subset of features at each node, further reducing variance and overfitting.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Loading data from sklearn datasets.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#loading data
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.datasets&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_iris&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_iris&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;splitting data.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#split data
&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;

&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Training the model on random forest.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#Training the random forest
&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.ensemble&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RandomForestClassifier&lt;/span&gt;
&lt;span class="n"&gt;rf_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RandomForestClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;n_estimators&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;#number of trees
&lt;/span&gt;    &lt;span class="n"&gt;max_depth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;rf_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Predict.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# predict
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;accuracy_score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;classification_report&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Train accuracy: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;accuracy_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rf_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Test accuracy: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;accuracy_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rf_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Decision Trees.
&lt;/h3&gt;

&lt;p&gt;In Bagged Decision Trees, multiple decision trees are trained using bootstrapped samples of the data.&lt;br&gt;
Each tree is trained independently and the final prediction is made by averaging the predictions of all the trees in the ensemble.&lt;br&gt;
&lt;strong&gt;Splitting data.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;

&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Training model.&lt;/strong&gt;&lt;br&gt;
unconstrained tree (will overfit on the data)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.tree&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DecisionTreeClassifier&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;plot_tree&lt;/span&gt;

&lt;span class="n"&gt;tree_big&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DecisionTreeClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random_state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;tree_big&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;constrained tree (better generalization)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;tree_small&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DecisionTreeClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_depth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;tree_small&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Predictions&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;print &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Constrained tree results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Train accuracy: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;accuracy_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tree_small&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Test accuracy: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;accuracy_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tree_small&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Visualization&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Vizualization
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;figure &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="nf"&gt;plot_tree&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tree_small&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;feature_names&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;feature_names&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;class_names&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;target_names&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filled&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rounded&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fontsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;decision tree(depth = 3)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1fep7xuin30kp83k57y2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1fep7xuin30kp83k57y2.png" alt=" " width="800" height="345"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Boosting.
&lt;/h2&gt;

&lt;p&gt;Boosting is an ensemble technique where multiple models are trained sequentially, with each new model attempting to correct the errors made by the previous ones.&lt;br&gt;
Boosting focuses on adjusting the weights of incorrectly classified data points, so the next model pays more attention to those difficult cases. By combining the outputs of these models, boosting typically improves the accuracy of the final prediction.&lt;/p&gt;
&lt;h2&gt;
  
  
  Common Boosting Algorithms.
&lt;/h2&gt;
&lt;h2&gt;
  
  
  1. AdaBoost - Adaptive Boosting.
&lt;/h2&gt;

&lt;p&gt;AdaBoost works by adjusting the weights of misclassified instances and combining the predictions of weak learners (usually decision trees). Each subsequent model is trained to correct the mistakes of the previous model.&lt;br&gt;
AdaBoost can significantly improve the performance of weak models, especially when used for classification problems.&lt;/p&gt;
&lt;h2&gt;
  
  
  2. Gradient Boosting.
&lt;/h2&gt;

&lt;p&gt;Gradient Boosting is a more general approach to boosting that builds models sequentially, with each new model fitting the residual errors of the previous model.&lt;br&gt;
The models are trained to minimize a loss function, which can be customized based on the specific task.&lt;br&gt;
We can perform regression and classification tasks using Gradient Boosting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;loading data.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#loading data
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.datasets&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_iris&lt;/span&gt;
&lt;span class="n"&gt;iris&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_iris&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;splitting data.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#split data
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;

&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;train_test_split &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;iris&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;iris&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Gradient boosting training.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#Gradient boosting training
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.ensemble&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;GradientBoostingClassifier&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;accuracy_score&lt;/span&gt;

&lt;span class="n"&gt;gb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;GradientBoostingClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;n_estimators&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;learning_rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# how much each tree contributes
&lt;/span&gt;    &lt;span class="n"&gt;max_depth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# shallow trees works best for boosting 
&lt;/span&gt;    &lt;span class="n"&gt;random_state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;gb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Gradient boosting accuracy score:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;accuracy_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  3. XGBoost - Extreme Gradient Boosting.
&lt;/h2&gt;

&lt;p&gt;XGBoost is an optimized version of gradient boosting. It includes regularization to prevent overfitting and supports parallelization to speed up training.&lt;br&gt;
XGBoost has become a popular choice in machine learning competitions due to its high performance.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#create model XGBClassifier. eXtreme Gradient Boosting Classifier.
&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;xgboost&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;XGBClassifier&lt;/span&gt;
&lt;span class="n"&gt;xgb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;XGBClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;n_estimators&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;learning_rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_depth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;eval_metric&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;mlogloss&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;random_state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;verbosity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;#model training
&lt;/span&gt;&lt;span class="n"&gt;xgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;XGBoost accuracy score: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;accuracy_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;xgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  3. Stacking.
&lt;/h2&gt;

&lt;p&gt;Stacking combines multiple models of different types, where each model makes independent predictions and a meta-model is trained to combine these predictions. Instead of simply averaging or voting, as in bagging and boosting, stacking trains a higher-level model (meta-model) to learn how to best combine the predictions of the base models.&lt;br&gt;
In stacking, the base models are trained on the original data and their predictions are then used as features for the meta-model, which learns how to combine them effectively. The final prediction is made by the meta-model based on the combined outputs of all the base models.&lt;br&gt;
&lt;strong&gt;Meta-model is a model that learns how to combine the predictions of the base models to generate the final output.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Common Stacking Algorithms.
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Generalized Stacking.
&lt;/h3&gt;

&lt;p&gt;In Generalized Stacking, multiple different models (e.g., decision trees, logistic regression, neural networks) are trained on the same dataset.&lt;br&gt;
And a meta-model (such as a logistic regression or another decision tree) is trained on the predictions made by th andese base models.&lt;br&gt;
The meta-model learns how to combine the predictions of the base models to make the final prediction.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Stacking with Cross-Validation.
&lt;/h2&gt;

&lt;p&gt;In stacking with cross-validation, the base models are trained using cross-validation and their predictions on the validation set are used to train the meta-model.&lt;br&gt;
This prevents overfitting and ensures that the meta-model is trained on unbiased data.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Multi-Layer Stacking
&lt;/h2&gt;

&lt;p&gt;Multi-layer stacking involves multiple levels of base models, where the outputs of the first level of models are fed into a second level of base models and so on, before reaching the meta-model.&lt;br&gt;
This approach creates a more complex ensemble that can capture a wider variety of patterns in the data.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>algorithms</category>
      <category>datascience</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Dimensionality Reduction in Machine Learning: PCA and t-SNE.</title>
      <dc:creator>Kelvin</dc:creator>
      <pubDate>Fri, 01 May 2026 10:34:42 +0000</pubDate>
      <link>https://forem.com/murimikelvin/dimensionality-reduction-in-machine-learning-pca-and-t-sne-1n40</link>
      <guid>https://forem.com/murimikelvin/dimensionality-reduction-in-machine-learning-pca-and-t-sne-1n40</guid>
      <description>&lt;p&gt;Dimensionality reduction is a fundamental concept in machine learning used to reduce the number of input features (dimensions) in a dataset while preserving as much important information as possible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Principal Component Analysis (PCA)&lt;/strong&gt;&lt;br&gt;
Principal Component Analysis (PCA) is a linear dimensionality reduction technique that transforms data into a new coordinate system.&lt;/p&gt;

&lt;p&gt;Instead of using the original features, PCA creates new variables called principal components, which are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Linear combinations of the original features&lt;/li&gt;
&lt;li&gt;Ordered by importance (variance explained)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;PCA works by identifying directions (called principal axes) where the data varies the most.&lt;/p&gt;

&lt;p&gt;The first principal component captures the maximum variance while the second principal component captures the next highest variance.&lt;/p&gt;

&lt;p&gt;This allows us to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keep only the most informative components&lt;/li&gt;
&lt;li&gt;Discard less important ones&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;How PCA Works&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Standardize the data - Features must be scaled (very important for PCA)&lt;/li&gt;
&lt;li&gt;Compute the covariance matrix - Shows relationships between features.&lt;/li&gt;
&lt;li&gt;Compute eigenvalues and eigenvectors&lt;/li&gt;
&lt;li&gt;Eigenvectors - directions (principal components)&lt;/li&gt;
&lt;li&gt;Eigenvalues - importance (variance explained)&lt;/li&gt;
&lt;li&gt;Sort components by eigenvalues - Highest variance first.&lt;/li&gt;
&lt;li&gt;Select top K components - Reduce dimensions.&lt;/li&gt;
&lt;li&gt;Transform the data - Project data onto new axes.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Workflow&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Splitting data.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#splitting data
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;

&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;scaler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StandardScaler&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;X_train&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;scaler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;X_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;scaler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Scaling data.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#scaling
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.preprocessing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StandardScaler&lt;/span&gt;
&lt;span class="n"&gt;X_scaled&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StandardScaler&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Training&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#PCA Principal Components
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.decomposition&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PCA&lt;/span&gt;

&lt;span class="n"&gt;pca_full&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PCA&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;pca_full&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_scaled&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This line of code is about understanding how much information the PCA model is capturing as you add more components.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="n"&gt;cumvar&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cumsum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pca_full&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;explained_variance_ratio_&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Plotting a cumvar on 95% threshold.&lt;/strong&gt;&lt;br&gt;
You look for the point where the curve starts flattening (diminishing returns).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;figure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cumvar&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linewidth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;axhline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.95&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;red&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linestyle&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;--&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;95% threshold&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;axhline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.99&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;orange&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linestyle&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;--&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;99% threshold&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;number of components&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cumulative explained variance&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;how many components to explain 95% of the variance&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;grid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;legend&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F41egur1w564lybmf6yet.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F41egur1w564lybmf6yet.png" alt=" " width="800" height="402"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Training&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# fit pca on training only
&lt;/span&gt;&lt;span class="n"&gt;pca_train&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PCA&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_components&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.95&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;X_train_r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pca_train&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;X_test_r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pca_train&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Visualization&lt;/strong&gt;&lt;br&gt;
using the trained PCA model dimensions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;figure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;scatter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;scatter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_2d&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;X_2d&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cmap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tab10&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;colorbar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scatter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;digit class&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;64-dimensional digit data projected to 2d via PCA&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;PC1(highest variance direction)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;pc2(second highest variance direction)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fznauscj81z5hut5jxehf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fznauscj81z5hut5jxehf.png" alt=" " width="800" height="476"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Distributed Stochastic Neighbor Embedding (t-SNE)
&lt;/h1&gt;

&lt;p&gt;t-SNE is a non-linear dimensionality reduction technique specifically designed for visualizing high-dimensional data in 2D or 3D spaces. It works by modelling focuses pairwise similarities between data points in the high-dimensional space and optimizing their representation in a lower-dimensional space to preserve these similarities.&lt;/p&gt;

&lt;p&gt;Unlike PCA t-SNE focuses on maintaining local relationships by minimizing the Kullback–Leibler divergence (KL divergence) between the high-dimensional and low-dimensional distributions of data points making it highly effective to find clusters and patterns in complex datasets.&lt;br&gt;
However it takes a lot of time to run the results and it doesn't work well with very large datasets.&lt;br&gt;
t-SNE is primarily used for exploratory data analysis and visualization rather than feature reduction or pre processing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Importing load digits from sklearn datasets.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.datasets&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_digits&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_digits&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Scaling data&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# scaling data
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.preprocessing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StandardScaler&lt;/span&gt;
&lt;span class="n"&gt;X_scaled&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StandardScaler&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Generating 1000 figures randomly for easy training from the scaled data.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="n"&gt;ids&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_scaled&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;replace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;X_sub&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_sub&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;X_scaled&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ids&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ids&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Running t-sne.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.manifold&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TSNE&lt;/span&gt;
&lt;span class="n"&gt;tsne&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TSNE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_components&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="n"&gt;perplexity&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="n"&gt;max_iter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;X_tsne&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tsne&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_sub&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Visualization.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#plotting
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;figure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;scatter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;scatter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_tsne&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="n"&gt;X_tsne&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y_sub&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;cmap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tab10&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;cbar&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;colorbar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scatter&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;cbar&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_ticks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;cbar&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_ticklabels&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;
&lt;span class="n"&gt;cbar&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_label&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Digit class&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb2t2rb8s5keg4y1gwozo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb2t2rb8s5keg4y1gwozo.png" alt=" " width="800" height="599"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>algorithms</category>
      <category>datascience</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Unsupervised Machine Learning. K-Means &amp; Hierarchical Clustering</title>
      <dc:creator>Kelvin</dc:creator>
      <pubDate>Thu, 30 Apr 2026 11:40:27 +0000</pubDate>
      <link>https://forem.com/murimikelvin/unsupervised-machine-learning-k-means-hierarchical-clustering-3eg6</link>
      <guid>https://forem.com/murimikelvin/unsupervised-machine-learning-k-means-hierarchical-clustering-3eg6</guid>
      <description>&lt;p&gt;Unsupervised machine learning is a branch of machine learning where models are trained on data without labelled outcomes. Unlike supervised learning, where the goal is to predict a known target, unsupervised learning focuses on discovering hidden patterns, structures, or relationships within the data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Common tasks in unsupervised learning include&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clustering (grouping similar data points)&lt;/li&gt;
&lt;li&gt;Dimensionality reduction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Clustering&lt;/strong&gt; is the process of grouping data points such that points within the same cluster are similar and points in different clusters are dissimilar.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Similarity is usually measured using distance metrics like&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Euclidean distance (most common)&lt;/li&gt;
&lt;li&gt;Manhattan distance&lt;/li&gt;
&lt;li&gt;Cosine similarity&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  K-Means Clustering.
&lt;/h1&gt;

&lt;p&gt;K-Means is a partition-based clustering algorithm that divides data into K distinct clusters, where K is predefined. The goal is to minimize the within-cluster variance, also called inertia.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How K-Means Works&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Choose K (number of clusters) - Example: K = 3&lt;/li&gt;
&lt;li&gt;Initialize centroids randomly - These are K points representing cluster centers.&lt;/li&gt;
&lt;li&gt;Assign data points to nearest centroid - Each point is assigned to the cluster with the closest centroid (using distance, usually Euclidean).&lt;/li&gt;
&lt;li&gt;Update centroids - Compute the new centroid as the mean of all points in that cluster.
iterate steps 3 and 4 until Centroids stop changing, or maximum iterations is reached.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  K-Means Workflow.
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Scaling data&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#scaling data
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.preprocessing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StandardScaler&lt;/span&gt;
&lt;span class="n"&gt;scaler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StandardScaler&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;X_scaled&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;scaler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Finding the best K&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# elbow method - finding the best number of clusters (K)
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.cluster&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;KMeans&lt;/span&gt;
&lt;span class="n"&gt;inertias&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;km&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;KMeans &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_clusters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_init&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;km&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_scaled&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;inertias&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;km&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inertia_&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Plotting the elbow curve.&lt;/strong&gt; This helps identify the best K - where the curve starts to plateau.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#Plotting the elbow curve
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;figure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;inertias&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;o--&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linewidth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;markersize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Number of clusters K&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Inertia (Within-Cluster Sum of Squares)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Elbow method - Finding the optimal K&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;xticks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;grid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Elbow curve.&lt;/strong&gt; (4 is our best k)&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fplruo7uqibrgm5auzftu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fplruo7uqibrgm5auzftu.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Training the model&lt;/strong&gt; &lt;br&gt;
Fits our model on 4 clusters then creates a new column named 'cluster'.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;km_final&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;KMeans&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_clusters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_init&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cluster&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;km_final&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_scaled&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;customers&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdo42t0crb6ii6qrsmjsw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdo42t0crb6ii6qrsmjsw.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Profiling clusters.&lt;/strong&gt;&lt;br&gt;
This code is all about understanding what each cluster actually represents after you’ve created them with K-Means.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#profile for each cluster
&lt;/span&gt;&lt;span class="n"&gt;profile&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cluster&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;agg&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;annual_spend&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;mean&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;visit_frequency&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;mean&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;avg_basket_size&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;mean&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cluster&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;rename&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cluster&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; Cluster profiles&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fambspsce6iounl99k2i0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fambspsce6iounl99k2i0.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Visualization of the clusters and the centroids.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;colours&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;#3498db&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;#2ecc71&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;#e74c3c&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;#12f3a4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;labels&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Mid-value&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;VIP customers&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Low spenders&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Occasional&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;figure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;mask&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cluster&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;
    &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;scatter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mask&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;annual_spend&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mask&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;visit_frequency&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;colours&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;centroids_orig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;scaler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;inverse_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;km_final&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cluster_centers_&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;scatter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;centroids_orig&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; 
    &lt;span class="n"&gt;centroids_orig&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;marker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;X&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;black&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;zorder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;label&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Centroids&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Annual Spend (KES)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Visit Frequency (per month)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Customer Segments - K-Means (K=4)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;legend&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;grid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alpha&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9w65h08v9uedye5ypwnx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9w65h08v9uedye5ypwnx.png" alt=" "&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Advantages of K-Means&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Simple and fast&lt;/li&gt;
&lt;li&gt;Works well on large datasets&lt;/li&gt;
&lt;li&gt;Easy to interpret&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Limitations of K-Means&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Must specify K in advance&lt;/li&gt;
&lt;li&gt;Sensitive to - Initial centroid placement &amp;amp; Outliers&lt;/li&gt;
&lt;li&gt;Assumes clusters are spherical and equally sized&lt;/li&gt;
&lt;/ol&gt;
&lt;h1&gt;
  
  
  &lt;strong&gt;Hierarchical Clustering&lt;/strong&gt;.
&lt;/h1&gt;

&lt;p&gt;Hierarchical clustering builds a tree-like structure of clusters, called a dendrogram. Unlike K-Means, it does not require specifying the number of clusters upfront.&lt;/p&gt;

&lt;p&gt;There are two types:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agglomerative (bottom-up) – most common&lt;/li&gt;
&lt;li&gt;Divisive (top-down)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Agglomerative clustering&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;steps&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Start with all points separate:
Treat each data point as its own cluster like A, B, C, ...
Initially, you have n clusters for n data points.&lt;/li&gt;
&lt;li&gt;Compute pairwise distances:
Calculate the distance between every pair of clusters.
Common choices include Euclidean, Manhattan or Cosine distance.
Store these values in a distance matrix.
To know more about them refer to: Measures of Distance&lt;/li&gt;
&lt;li&gt;Merge the nearest clusters:
Identify the two clusters that are closest based on the chosen linkage method such as single, complete, average or Ward linkage.
Combine them into a single new cluster.&lt;/li&gt;
&lt;li&gt;Update distances:
Recalculate the distances between the newly formed cluster and all remaining clusters.
Use the same linkage rule to ensure consistency.&lt;/li&gt;
&lt;li&gt;Repeat the process:
Continue merging clusters and updating distances iteratively.
Stop when you reach a predefined number of clusters (k) or a distance threshold.&lt;/li&gt;
&lt;li&gt;Visualize the results:
Create a dendrogram to visualize how clusters merged at each step.
Choose a suitable cut on the dendrogram to obtain the final cluster groups.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Linkage methods&lt;/strong&gt; &lt;br&gt;
How we measure the distance between clusters.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Single Linkage: Minimum distance between points&lt;/li&gt;
&lt;li&gt;Complete Linkage: Maximum distance&lt;/li&gt;
&lt;li&gt;Average Linkage: Average distance&lt;/li&gt;
&lt;li&gt;Ward’s Method: Minimizes variance (most common)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Dendrogram&lt;/strong&gt;&lt;br&gt;
A dendrogram is a tree diagram that shows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How clusters are merged&lt;/li&gt;
&lt;li&gt;At what distance they are merged
You can “cut” the dendrogram at a certain height to decide the number of clusters.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Hierarchical model workflow.&lt;/strong&gt;&lt;br&gt;
Picking a dataset.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Picking a subset of 60 customers for readability
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="n"&gt;subset_ids&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_scaled&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;replace&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;X_sub&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;X_scaled&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;subset_ids&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Linkage&lt;/strong&gt; (Ward) to minimize variance.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Linkage method
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;scipy.cluster.hierarchy&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;linkage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dendrogram&lt;/span&gt;

&lt;span class="n"&gt;Z&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;linkage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_sub&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;method&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ward&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Plotting the dendrogram&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# plot dendogram
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;figure &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="nf"&gt;dendrogram &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Z&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;truncate_mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;level&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;axhline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;red&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linestyle&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;--&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linewidth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cut here for 3 clusters&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;legend&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl8uzs715x37we6iyqz2p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl8uzs715x37we6iyqz2p.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fitting the model.&lt;/strong&gt;&lt;br&gt;
Fits our model on 4 clusters then creates a new column named 'hc-cluster'.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.cluster&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AgglomerativeClustering&lt;/span&gt;
&lt;span class="n"&gt;hc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgglomerativeClustering&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_clusters&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linkage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ward&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;hc_cluster&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_scaled&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;customers&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzatc5ht6h416gtecj7m0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzatc5ht6h416gtecj7m0.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Profiling.&lt;/strong&gt;&lt;br&gt;
This step helps understand what each cluster represents.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Hierachical cluster profiles&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;hc_cluster&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;annual_spend&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;visit_frequency&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]].&lt;/span&gt;&lt;span class="nf"&gt;mean &lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fglefw9nie32xgd0ylv29.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fglefw9nie32xgd0ylv29.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Visualization.&lt;/strong&gt;&lt;br&gt;
A comparison between the two models. K-Means &amp;amp; Hierarchical clustering.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subplots&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;axes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cluster&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;hc_cluster&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;K-means cluster&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Hierarchical clusters&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;mask&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;
        &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;scatter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mask&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;annual_spend&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mask&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;visit_frequency&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cluster &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tight_layout&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F60cs40oyhhawgkp3pknn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F60cs40oyhhawgkp3pknn.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Advantages of Hierarchical Clustering&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No need to predefine number of clusters&lt;/li&gt;
&lt;li&gt;Produces interpretable tree structure&lt;/li&gt;
&lt;li&gt;Works well for small datasets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Limitations of Hierarchical Clustering&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Computationally expensive (slow for large datasets)&lt;/li&gt;
&lt;li&gt;Once clusters are merged, they cannot be undone&lt;/li&gt;
&lt;li&gt;Sensitive to noise and outliers&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>algorithms</category>
      <category>beginners</category>
      <category>datascience</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Machine learning - Supervised learning: Regression &amp; Classification.</title>
      <dc:creator>Kelvin</dc:creator>
      <pubDate>Mon, 20 Apr 2026 13:33:30 +0000</pubDate>
      <link>https://forem.com/murimikelvin/machine-learning-jh9</link>
      <guid>https://forem.com/murimikelvin/machine-learning-jh9</guid>
      <description>&lt;p&gt;Machine learning is a branch of Artificial Intelligence that focuses on building systems that can learn from data and improve their performance over time without being explicitly programmed.&lt;/p&gt;

&lt;h1&gt;
  
  
  TYPES OF MACHINE LEARNING.
&lt;/h1&gt;

&lt;ol&gt;
&lt;li&gt;Supervised learning.&lt;/li&gt;
&lt;li&gt;Unsupervised learning.&lt;/li&gt;
&lt;li&gt;Reinforcement learning.&lt;/li&gt;
&lt;/ol&gt;

&lt;h1&gt;
  
  
  Supervised learning.
&lt;/h1&gt;

&lt;p&gt;Supervised learning is a type of machine learning where a model learns from labelled data.&lt;/p&gt;

&lt;h1&gt;
  
  
  Unsupervised learning.
&lt;/h1&gt;

&lt;p&gt;Unsupervised learning is a type of machine learning where the model works with unlabelled data.&lt;br&gt;
Instead of predicting a known output, the model tries to discover hidden patterns, structures, or relationships in the data on its own.&lt;/p&gt;
&lt;h1&gt;
  
  
  Reinforcement learning.
&lt;/h1&gt;

&lt;p&gt;Reinforcement learning is a type of machine learning where a model learns by interacting with an environment and receiving rewards or penalties based on its actions.&lt;br&gt;
Instead of learning from labelled data or finding patterns, it learns through trial and error.&lt;/p&gt;
&lt;h2&gt;
  
  
  Supervised learning.
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Regression - is a type of supervised learning used to predict continuous numerical values.&lt;/li&gt;
&lt;li&gt;Classification - Classification is a type of supervised learning where the goal is to predict categories (classes) instead of numbers.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  Regression.
&lt;/h2&gt;
&lt;h3&gt;
  
  
  How models learn in regression.
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;The model starts with random values for y = mx + c
### Example: Predicting house prices. 
&lt;code&gt;y = mx + c&lt;/code&gt;
&lt;code&gt;price(y) = 100(m) * size(x) + 5000(c)&lt;/code&gt;
5000 is the base price if x = 0&lt;/li&gt;
&lt;li&gt;The model compares predictions with actual prices using loss function.
&lt;code&gt;error = (prediction price - actual price)&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;The model adjusts the formula to reduce the error using gradient descent.
&lt;code&gt;price = 150 * size + 30000&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;The model repeats until error is minimized.
&lt;code&gt;price = 200 * size + 10000&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  Supervised learning workflow.
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Collect data.&lt;/li&gt;
&lt;li&gt;Prepare data - Cleaning, pre-processing, visualization.&lt;/li&gt;
&lt;li&gt;Split data into train and test sets.&lt;/li&gt;
&lt;li&gt;Choose the algorithm to use.&lt;/li&gt;
&lt;li&gt;Training the model - learning patterns from training data.&lt;/li&gt;
&lt;li&gt;Make predictions on the test data.&lt;/li&gt;
&lt;li&gt;Evaluate the model's performance.&lt;/li&gt;
&lt;li&gt;Tune and improve the model.&lt;/li&gt;
&lt;li&gt;Deploy - use on new data.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  Linear regression workflow.
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Collect data&lt;/strong&gt; &lt;br&gt;
Collecting data involves gathering relevant CSV's, databases, Excel, APIs etc&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Kenya_Crops_Dataset.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Prepare data&lt;/strong&gt;&lt;br&gt;
Data pre-processing involves cleaning, handling missing values, removing duplicates etc&lt;/p&gt;
&lt;h3&gt;
  
  
  examples.
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;df.info()&lt;/code&gt; - gets a summary of the dataset structure in pandas.&lt;br&gt;
&lt;code&gt;df.drop(columns = ['Cabin'], inplace = True)&lt;/code&gt; - dropping columns with lots of missing values.&lt;br&gt;
&lt;code&gt;df['Age'] = df['Age'].fillna(df['Age'].median())&lt;br&gt;
df.info()&lt;/code&gt; - filling missing values with median. Used median in case of outliers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Visualization&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# visualization
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;seaborn&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;figure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;heatmap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;corr&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;&lt;span class="n"&gt;annot&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cmap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;coolwarm&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;center&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Feature correlation&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  example heatmap.
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs9haoc41ok4cddx2hfq7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs9haoc41ok4cddx2hfq7.png" alt=" " width="800" height="595"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Split data into train and test sets&lt;/strong&gt;.&lt;br&gt;
Splitting data helps avoid overfitting of the models i.e. models performing very well on training data but poorly on new data. It also ensures the model doesn't memorize data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#separate features(X) from label(y)
&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;size&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;bedrooms&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;bathrooms&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;age&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="c1"&gt;#split data
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;
&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;train_test_split &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Training the model&lt;/strong&gt;&lt;br&gt;
 Training involves adjusting model parameters to minimize error using training data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#training model
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.linear_model&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LinearRegression&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LinearRegression&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;price: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;intercept_&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Make predictions on the test data&lt;/strong&gt;.&lt;br&gt;
After training, the model uses learned patterns to predict outcomes on new data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# evaluation of the model
&lt;/span&gt;&lt;span class="n"&gt;y_pred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Evaluate the model's performance&lt;/strong&gt;.&lt;br&gt;
Model evaluation involves measuring how well predictions match real values using test data&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;mean_squared_error&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r2_score&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;root_mean_squared_error&lt;/span&gt;
&lt;span class="n"&gt;mse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;mean_squared_error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;rmse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;root_mean_squared_error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;r2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;r2_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;the rmse is &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;rmse&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;the r2 is &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r2&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Regularization.
&lt;/h2&gt;

&lt;p&gt;Regularization is a technique used to reduce overfitting by preventing the model from becoming too complex.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ridge (l2) and Lasso (L1)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Ridge
&lt;/h3&gt;

&lt;p&gt;Adds squared penalty on coefficients. Shrinks coefficients but doesn’t make them zero. Keeps all features but reduces magnitude of weights&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#Ridge l2 regularization
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.linear_model&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Ridge&lt;/span&gt;
&lt;span class="c1"&gt;# create a rigde model
&lt;/span&gt;&lt;span class="n"&gt;ridge&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Ridge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# train linear model with reguralization
&lt;/span&gt;&lt;span class="n"&gt;ridge&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;#make prediction
&lt;/span&gt;&lt;span class="n"&gt;y_pred_ridge&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ridge&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;#Evaluation
&lt;/span&gt;&lt;span class="n"&gt;r2_ridge&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;r2_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred_ridge&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;r2_ridge= &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r2_ridge&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;R2 linear = &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r2&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Difference : &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r2_ridge&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;r2&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Lasso
&lt;/h3&gt;

&lt;p&gt;Adds absolute value penalty. Can reduce some coefficients to zero. it removes useless features.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# create a lasso model
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.linear_model&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Lasso&lt;/span&gt;
&lt;span class="n"&gt;lasso&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Lasso&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# train linear model with reguralization
&lt;/span&gt;&lt;span class="n"&gt;lasso&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;#make predictions
&lt;/span&gt;&lt;span class="n"&gt;y_pred_lasso&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lasso&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;#Evaluation
&lt;/span&gt;&lt;span class="n"&gt;r2_lasso&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;r2_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred_ridge&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r2_lasso&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Difference : &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r2_lasso&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;r2&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Classification.
&lt;/h1&gt;

&lt;p&gt;Classification is a supervised machine learning technique used to predict labels or categories from input data. It assigns each data point to a predefined class based on learned patterns.&lt;/p&gt;

&lt;p&gt;Predict categories: Determines the class of new data points.&lt;br&gt;
Uses labeled data: Trained on datasets where the correct class is known.&lt;br&gt;
Common examples: Spam vs non spam emails, diseased vs. healthy patients.&lt;/p&gt;
&lt;h2&gt;
  
  
  Types of Classification.
&lt;/h2&gt;
&lt;h3&gt;
  
  
  1. Binary Classification.
&lt;/h3&gt;

&lt;p&gt;Binary classification is the simplest type of classification where data is divided into two possible categories. The model analyzes input features and decides which of the two classes the data belongs to.&lt;br&gt;
&lt;strong&gt;Some important aspects of binary classification are:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Two classes only: Each data point is assigned to one of two categories.&lt;/li&gt;
&lt;li&gt;Common examples: Spam vs not spam emails, diseased vs healthy patients.&lt;/li&gt;
&lt;li&gt;Decision based on features: The model uses input features to determine the correct class.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  2. Multiclass Classification.
&lt;/h3&gt;

&lt;p&gt;Multiclass classification is used when data needs to be divided into more than two categories. The model analyzes the input features and selects the class that best matches the data. &lt;br&gt;
&lt;strong&gt;Some important aspects of multiclass classification are:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multiple classes: Each data point is assigned to one of several possible categories.&lt;/li&gt;
&lt;li&gt;Single final prediction: The model selects only one class for each input.&lt;/li&gt;
&lt;li&gt;Common examples: Image classification such as identifying animals like cat, dog or bird.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  3. Multi Label Classification.
&lt;/h3&gt;

&lt;p&gt;Multi label classification allows a single piece of data to belong to multiple categories at the same time. Unlike multiclass classification, where each data point is assigned only one class, this approach allows the model to assign multiple labels to the same input. &lt;br&gt;
&lt;strong&gt;Key aspects include:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multiple labels per data point: One input can belong to more than one category.&lt;/li&gt;
&lt;li&gt;Labels can overlap: Classes are not mutually exclusive.&lt;/li&gt;
&lt;li&gt;Common example: A movie recommendation system may tag a movie as both action and comedy based on features like plot, actors or genre tags.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Logistic regression workflow.
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Hands on - Breast cancer diagnosis
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.datasets&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_breast_cancer&lt;/span&gt;

&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_breast_cancer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt; 
&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;split data&lt;/strong&gt;&lt;br&gt;
Splitting data helps avoid overfitting of the models i.e. models performing very well on training data but poorly on new data. It also ensures the model doesn't memorize data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# split data into train and test datasets.
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;

&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;scaling data&lt;/strong&gt;&lt;br&gt;
Feature scaling is the process of putting all numerical features on a similar scale so that no feature dominates others just because of its size.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Scale features.
# Logistic regression is sensitive to feature scale
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.preprocessing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StandardScaler&lt;/span&gt;

&lt;span class="n"&gt;scale&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StandardScaler&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;X_train&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;scale&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;#Learn scale from training data only.
&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;scale&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Training the model.&lt;/strong&gt;&lt;br&gt;
 Training involves adjusting model parameters to minimize error using training data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Train model.
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.linear_model&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LogisticRegression&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LogisticRegression&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_iter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;making&lt;/span&gt; &lt;span class="n"&gt;predictions&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Predict.&lt;/strong&gt;&lt;br&gt;
After training, the model uses learned patterns to predict outcomes on new data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# predict
&lt;/span&gt;&lt;span class="n"&gt;y_pred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;y_prob&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict_proba&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)[:,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Sample predictions: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Sample probabilities: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;y_prob&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Evaluating the model&lt;/strong&gt;&lt;br&gt;
Model evaluation involves measuring how well predictions match real values using test data&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Evaluate
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;accuracy_score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;classification_report&lt;/span&gt;

&lt;span class="n"&gt;accuracy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;accuracy_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;classificationR&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;classification_report&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target_names&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;target_names&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Accuracy: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;accuracy&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;classificationR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fabxo1g31e98ti41gzvgb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fabxo1g31e98ti41gzvgb.png" alt=" " width="800" height="245"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Display the confusion matrix&lt;/strong&gt;&lt;br&gt;
The confusion matrix is used to evaluate classification models by showing correct and incorrect predictions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ConfusionMatrixDisplay&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;

&lt;span class="n"&gt;ConfusionMatrixDisplay&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_estimator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;display_labels&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;target_names&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cmap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Blues&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; 
&lt;span class="c1"&gt;# TP FN
# FP TN
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Example of confusion matrix.
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbeigyaqyyv5hhec8mjwb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbeigyaqyyv5hhec8mjwb.png" alt=" " width="800" height="629"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>beginners</category>
      <category>computerscience</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Advanced SQL for Data Analytics:</title>
      <dc:creator>Kelvin</dc:creator>
      <pubDate>Tue, 17 Mar 2026 15:20:06 +0000</pubDate>
      <link>https://forem.com/murimikelvin/advanced-sql-for-data-analytics-advanced-techniques-every-data-analyst-should-know-1leb</link>
      <guid>https://forem.com/murimikelvin/advanced-sql-for-data-analytics-advanced-techniques-every-data-analyst-should-know-1leb</guid>
      <description>&lt;h2&gt;
  
  
  WINDOW FUNCTIONS IN SQL.
&lt;/h2&gt;

&lt;p&gt;Window functions perform calculations across a set of rows related to the current row without grouping the rows into a single output. They preserve the individual row details while providing additional contextual insights.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key components.&lt;/strong&gt;&lt;br&gt;
Window functions are defined using the mandatory OVER() clause which specifies how the rows are partitioned and ordered for the calculation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Common types of window functions.&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;1: RANKING&lt;/strong&gt; - Row_number, rank, dense_rank - It assigns numbers or ranks to rows based on order.&lt;br&gt;
&lt;strong&gt;Rank&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe2d9ulhdjtdkjeq7u2hh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe2d9ulhdjtdkjeq7u2hh.png" alt=" " width="800" height="179"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Row number&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhtosm2mn3bchyvbergpe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhtosm2mn3bchyvbergpe.png" alt=" " width="800" height="185"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;2: AGGREGATE&lt;/strong&gt; - sum, avg, min, max, count - calculates sum, averages or extremes across the window.&lt;br&gt;
&lt;strong&gt;Total&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F778973hfmmhr2xxny677.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F778973hfmmhr2xxny677.png" alt=" " width="800" height="157"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Average&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faaa1bisp59tzf4kvv3sz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faaa1bisp59tzf4kvv3sz.png" alt=" " width="800" height="140"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Count&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faibxzt9w0iz7v9nus8vl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faibxzt9w0iz7v9nus8vl.png" alt=" " width="800" height="152"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Max&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl30z85z4b781x5j3gvq8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl30z85z4b781x5j3gvq8.png" alt=" " width="800" height="143"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3: VALUE/OFFSET&lt;/strong&gt; - lag, lead - Accesses data from rows before, after or at specific points in the window.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;lag&lt;/strong&gt;&lt;br&gt;
Used to compare rows i.e. comparing current value to previous value&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5f3b4k0nfzmwkesqxsiu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5f3b4k0nfzmwkesqxsiu.png" alt=" " width="800" height="169"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;lead&lt;/strong&gt;&lt;br&gt;
looks forward to the next row or a specified number of rows ahead.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9twpy9wpx3ye5fpgpat2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9twpy9wpx3ye5fpgpat2.png" alt=" " width="800" height="163"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Subqueries
&lt;/h2&gt;

&lt;p&gt;A subquery (or nested query) is a SQL query written inside another query. It allows you to &lt;br&gt;
perform an operation that depends on the result of another query. &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The SELECT clause - Subqueries in the SELECT clause add an extra computed column to the result set. Each subquery 
executes once for every row of the outer query.
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;  
&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;projects&lt;/span&gt;  
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;projects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;employee_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;employee_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_projects&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ol&gt;
&lt;li&gt;The FROM clause - A subquery in the FROM clause creates a temporary or derived table. The outer query can then 
select or filter data from it.
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt; 
&lt;span class="k"&gt;SELECT&lt;/span&gt;  
&lt;span class="n"&gt;department_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  
&lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_employees&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt; 
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;department_id&lt;/span&gt; 
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;dept_summary&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ol&gt;
&lt;li&gt;The WHERE or HAVING clause - Subqueries in the WHERE clause allow the outer query to filter results based on another query’s output.
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;  
&lt;span class="n"&gt;department_name&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;departments&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;department_id&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt; 
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;department_id&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt; 
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;department_id&lt;/span&gt; 
&lt;span class="k"&gt;HAVING&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; 
&lt;span class="p"&gt;);&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Correlated vs Non-Correlated Subqueries
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Non-correlated subquery - A non-correlated subquery does not depend on the outer query. It can be executed on its own.&lt;/li&gt;
&lt;li&gt;Correlated subquery - A correlated subquery depends on the outer query. It is evaluated once per row of the outer query.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  Common Table Expressions (CTEs)
&lt;/h2&gt;

&lt;p&gt;A CTE (Common Table Expression) is like creating a temporary result set (or virtual table) &lt;br&gt;
that exists only during the execution of a single SQL query.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;high_salary&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt; 
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;45000&lt;/span&gt; 
&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;high_salary&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  CTE with Joins
&lt;/h2&gt;

&lt;p&gt;You can use CTEs to simplify joins&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;emp_dept&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt; 
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;department_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;salary&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; 
&lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;departments&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;department_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;department_id&lt;/span&gt; 
&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;emp_dept&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Stored Procedures
&lt;/h2&gt;

&lt;p&gt;A stored procedure is a named block of SQL logic stored inside the database that you can &lt;br&gt;
execute using: &lt;code&gt;CALL procedure_name(parameters);&lt;/code&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Procedures are for actions, such as:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Insert data (add customer, add book, add order) &lt;/li&gt;
&lt;li&gt;Update data (update contact, update email) &lt;/li&gt;
&lt;li&gt;Delete data (delete customer safely) &lt;/li&gt;
&lt;li&gt;Validation and enforcement of business rules &lt;/li&gt;
&lt;li&gt;Multi-step operations (create order + check customer + log)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Procedure Syntax
&lt;/h3&gt;

&lt;p&gt;CREATE OR REPLACE PROCEDURE procedure_name(param_name param_type, ...) &lt;br&gt;
LANGUAGE plpgsql &lt;br&gt;
AS $$ &lt;br&gt;
BEGIN -- SQL statements &lt;br&gt;
END; &lt;br&gt;
$$; &lt;/p&gt;
&lt;h3&gt;
  
  
  Explanation of each keyword
&lt;/h3&gt;

&lt;p&gt;CREATE - creates the procedure &lt;br&gt;
OR REPLACE - updates the procedure if it already exists (no need to drop first) &lt;br&gt;
PROCEDURE procedure_name(...) - name + inputs (parameters) &lt;br&gt;
LANGUAGE plpgsql - enables procedural features (IF, LOOP, variables) &lt;br&gt;
AS $$ ... $$ - defines the body of the procedure &lt;br&gt;
BEGIN ... END; - start/end of procedure logic &lt;br&gt;
Call with: CALL procedure_name(...)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plsql"&gt;&lt;code&gt;&lt;span class="kr"&gt;CREATE&lt;/span&gt; &lt;span class="kr"&gt;OR&lt;/span&gt; &lt;span class="kr"&gt;REPLACE&lt;/span&gt; &lt;span class="kr"&gt;PROCEDURE&lt;/span&gt; &lt;span class="n"&gt;add_customer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; 
&lt;span class="n"&gt;p_first_name&lt;/span&gt; &lt;span class="kt"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
&lt;span class="n"&gt;p_last_name&lt;/span&gt;  &lt;span class="kt"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
&lt;span class="n"&gt;p_email&lt;/span&gt;      &lt;span class="kt"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
&lt;span class="n"&gt;p_contact&lt;/span&gt;    &lt;span class="kt"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
&lt;span class="n"&gt;p_city&lt;/span&gt;       &lt;span class="kt"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;LANGUAGE&lt;/span&gt; &lt;span class="n"&gt;plpgsql&lt;/span&gt; 
&lt;span class="kr"&gt;AS&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt; 
&lt;span class="k"&gt;BEGIN&lt;/span&gt; 
&lt;span class="kr"&gt;INSERT&lt;/span&gt; &lt;span class="kr"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;last_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;contact&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="kr"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p_first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p_last_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p_email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p_contact&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p_city&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; 
&lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;span class="err"&gt;$$&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Explanation
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;CREATE OR REPLACE PROCEDURE add_customer(...) - Creates a reusable procedure named add_customer. &lt;/li&gt;
&lt;li&gt;Parameters like p_first_name VARCHAR - Inputs we pass when calling the procedure. &lt;/li&gt;
&lt;li&gt;LANGUAGE plpgsql - Enables procedural execution. &lt;/li&gt;
&lt;li&gt;BEGIN - Starts execution block. &lt;/li&gt;
&lt;li&gt;INSERT INTO customers(...) -  Specifies the table and columns being inserted. &lt;/li&gt;
&lt;li&gt;VALUES (...) - Inserts the parameter values. &lt;/li&gt;
&lt;li&gt;END;- Ends procedure. &lt;/li&gt;
&lt;li&gt;CALL add_customer(...) - Runs the procedure by inserting values into our table. &lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Delete Customer Safely
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plsql"&gt;&lt;code&gt;&lt;span class="kr"&gt;CREATE&lt;/span&gt; &lt;span class="kr"&gt;OR&lt;/span&gt; &lt;span class="kr"&gt;REPLACE&lt;/span&gt; &lt;span class="kr"&gt;PROCEDURE&lt;/span&gt; &lt;span class="n"&gt;delete_customer_safely&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; 
&lt;span class="n"&gt;p_customer_id&lt;/span&gt; &lt;span class="kt"&gt;INT&lt;/span&gt; 
&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;LANGUAGE&lt;/span&gt; &lt;span class="n"&gt;plpgsql&lt;/span&gt; 
&lt;span class="kr"&gt;AS&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt; 
&lt;span class="k"&gt;BEGIN&lt;/span&gt; 
&lt;span class="kr"&gt;DELETE&lt;/span&gt; &lt;span class="kr"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; 
&lt;span class="kr"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p_customer_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;span class="kr"&gt;DELETE&lt;/span&gt; &lt;span class="kr"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt; 
&lt;span class="kr"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p_customer_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;span class="err"&gt;$$&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
    </item>
    <item>
      <title>Introduction to Python</title>
      <dc:creator>Kelvin</dc:creator>
      <pubDate>Tue, 17 Mar 2026 14:59:28 +0000</pubDate>
      <link>https://forem.com/murimikelvin/introduction-to-python-3ci7</link>
      <guid>https://forem.com/murimikelvin/introduction-to-python-3ci7</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjaglv6u2ia8033gxa2m5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjaglv6u2ia8033gxa2m5.png" alt=" " width="300" height="168"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Python is  a powerful and flexible programming language created by Guido van Rossum and first released in 1991.Over the years, Python has become one of the most loved and used languages in the world.&lt;/p&gt;

&lt;h1&gt;
  
  
  Python installation.
&lt;/h1&gt;

&lt;p&gt;python installation from the web.&lt;br&gt;
&lt;a href="https://www.python.org/downloads/" rel="noopener noreferrer"&gt;python&lt;/a&gt;.&lt;/p&gt;
&lt;h1&gt;
  
  
  checking the version of python installed in your pc
&lt;/h1&gt;

&lt;p&gt;Open your command line and type python --version&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frltrkgcu6y1skj4h1u22.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frltrkgcu6y1skj4h1u22.png" alt=" " width="800" height="183"&gt;&lt;/a&gt;&lt;br&gt;
This confirms the version of python installed in your computer.&lt;/p&gt;

&lt;p&gt;example of a python code.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F85a2d7pu1b6f0h1l966y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F85a2d7pu1b6f0h1l966y.png" alt=" " width="800" height="181"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Can You Use Python For&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Web Development, and software development. &lt;/li&gt;
&lt;li&gt;Data analysis, machine learning, data engineering, AI, and math. &lt;/li&gt;
&lt;li&gt;System scripting.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why Use Python?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Platform-independent - Python works everywhere: Windows, Mac, Linux, Raspberry Pi, or any other platform.&lt;/li&gt;
&lt;li&gt;Simple &amp;amp; Readable Syntax - python is simple to read and understand.&lt;/li&gt;
&lt;li&gt;Fewer Lines of Code - Python lets you do more with less typing.&lt;/li&gt;
&lt;li&gt;Fast Prototyping - Python runs your code line by line, immediately showing results or errors.&lt;/li&gt;
&lt;li&gt;Multiple Programming Styles - Python lets you write code in different ways i.e. procedural, object oriented and functional (using functions like math operations)&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;
  
  
  Python Syntax.
&lt;/h1&gt;

&lt;p&gt;Python has rules about how code should be written so it can be understood and executed properly. It relies on indentation and clean line-by-line execution and throws errors if the rules are not followed. Python emphasizes readability and simplicity, making it easier for beginners to learn and professionals to maintain. &lt;/p&gt;
&lt;h1&gt;
  
  
  Key Syntax Rules in Python.
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Python uses indentation  to define blocks of code.&lt;/strong&gt;&lt;br&gt;
Indentation means adding spaces at the beginning of a line to show that this line belongs to a group or block of code.&lt;br&gt;
You can technically use any number of spaces - at least one. But the convention is to use four spaces for each indentation level.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; 
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Five is greater than two!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Mixing indentation levels is not allowed Python needs consistency. Once you choose a number of spaces, stick with it throughout the block. &lt;/p&gt;

&lt;h1&gt;
  
  
  Python Variables.
&lt;/h1&gt;

&lt;p&gt;A variable is a named memory location.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lee&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;here name is a variable for storing the string "lee"&lt;/p&gt;

&lt;p&gt;In Python, you don’t have to declare the data type of a variable like you do in many other languages. Python understands it based on the value you assign.&lt;/p&gt;

&lt;h2&gt;
  
  
  Rules of naming variables.
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;A variable name must start with a letter (a–z, A–Z) or an underscore _.&lt;/li&gt;
&lt;li&gt;It can only contain alphanumeric characters and underscores (A–Z, a–z, 0–9, and _)&lt;/li&gt;
&lt;li&gt;Variable names are case-sensitive.&lt;/li&gt;
&lt;li&gt;It cannot start with a number.&lt;/li&gt;
&lt;li&gt;You cannot use reserved words i.e. (int, True ETC) &lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Best Practices.
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Use descriptive names &lt;/li&gt;
&lt;li&gt;Use snake_case for variables&lt;/li&gt;
&lt;li&gt;Avoid short, vague names like x, y, unless in quick examples or math &lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Assigning Multiple Variables
&lt;/h1&gt;

&lt;p&gt;Python lets you assign values to multiple variables at once: a, b, c = 1, 2, 3 &lt;br&gt;
All at once! Or assign the same value to multiple variables: x = y = z = "Python" &lt;/p&gt;

&lt;p&gt;Checking the Type of a Variable - You can use the type() function to find out what kind of value a variable holds.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Alice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; 
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;the output will be str (string)&lt;/p&gt;

&lt;h1&gt;
  
  
  Python Constants.
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Values that don’t change during normal program execution&lt;/li&gt;
&lt;li&gt;In real life, think of your birthdate. You store it once and it never changes - that's a constant.&lt;/li&gt;
&lt;li&gt;In Python, we use uppercase letters to indicate that a variable is meant to be a constant e.g.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;PI&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;3.14159&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Python does not have built-in constants, but we use uppercase variable names to show they are constants.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Python Data Types.
&lt;/h2&gt;

&lt;p&gt;A data type is the classification of data that tells the computer what kind of data you are working with and what operations can be performed on it.&lt;/p&gt;

&lt;h1&gt;
  
  
  Python’s Built-in Data Types.
&lt;/h1&gt;

&lt;h2&gt;
  
  
  1. Numeric Types
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;int – Integer (whole numbers)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;age&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;25&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;float – Floating point (decimal numbers)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;height&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;5.9&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;complex – Complex numbers (used in scientific computing)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;complex_num&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;2j&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  2. Text Type
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;str – String (a sequence of characters)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Python is amazing!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  3. Boolean Type
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;bool - Can only be True or False.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;is_logged_in&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Arithmetic Operators in Python
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Addition (+) -- Adds two numbers&lt;/li&gt;
&lt;li&gt;Subtraction (-)  -- Subtract right from left &lt;/li&gt;
&lt;li&gt;Multiplication (*) -- Multiply numbers &lt;/li&gt;
&lt;li&gt;Division (/) -- Divide and get float &lt;/li&gt;
&lt;li&gt;Modulus (%) -- Get the remainder &lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  comparison operators
&lt;/h2&gt;

&lt;p&gt;Greater than (&amp;gt;)&lt;br&gt;
Less than (&amp;lt;)&lt;br&gt;
Equal to (==) &lt;em&gt;note that (=) is an assignment operator&lt;/em&gt;&lt;br&gt;
Not equal to (=!)&lt;br&gt;
Greater than or equal to (&amp;gt;=)&lt;br&gt;
Less than or equal to (&amp;gt;=)&lt;/p&gt;
&lt;h2&gt;
  
  
  Logical operators (and,or,not)
&lt;/h2&gt;

&lt;p&gt;AND - Returns true only if both conditions are true&lt;br&gt;
OR - Returns true if at least one condition is true&lt;br&gt;
NOT - Reverses the value (true becomes false, false becomes true)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data types and data structures&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Datatypes&lt;/strong&gt;&lt;br&gt;
A datatype define the type of a value a variable can store&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Boolean - returns true or false when a certain condition is met.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;String - Sequence of characters or text enclosed in quotation marks.&lt;br&gt;
on command line python dir(str)&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;f-string - An f-string (formatted string literal) is a way to insert variables or expressions directly inside a string.&lt;br&gt;
user = 'admin'&lt;br&gt;
password = 12345&lt;br&gt;
database = 'postgres'&lt;/p&gt;

&lt;p&gt;database_url = f"PostgreSQL:// {user:}{password:}&lt;a class="mentioned-user" href="https://dev.to/localhost"&gt;@localhost&lt;/a&gt;:5432{testdb}"&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Int -  whole numbers without decimals&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;floats - represents decimals&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Data structures.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Is a way of organizing and storing data so that it can be modified and accessed efficiently.&lt;/p&gt;

&lt;p&gt;on command line python dir(list)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;list&lt;/strong&gt; [] - is a collection of items in a specified order&lt;br&gt;
students =["Sam", "Gloria", "Stacy", "Kevin", "Ornella"]&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;characteristics&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;(i) They are ordered&lt;br&gt;
(ii) They are mutable - can be changed&lt;br&gt;
(iii) Indexed&lt;br&gt;
(iv) Allows duplicates&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;common list methods&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;.append() - students.append("kevin") - adds an item at the end of a list&lt;br&gt;
.remove() - students.remove("kevin") - removes a specific item&lt;br&gt;
.pop() - students.pop(1) - removes an item based on index and last one if no index is provided.&lt;br&gt;
.sort() - students.sort() - sorts items in the list based on a particular order&lt;br&gt;
.reverse() - students.reverse() - reverses the order of items in a list.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tuple&lt;/strong&gt; () - is a collection of items that is ordered but cannot be changed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;characteristics&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;(i) They are immutable - cannot be changed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;common tuple methods&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;.count() - my_tuple.count(35)&lt;br&gt;
.index() - my_tuple.index(35)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dictionary&lt;/strong&gt; {}- it is a collection of key value pairs &amp;gt;&amp;gt; each key maps to a value&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;characteristics&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;(i)Key value pairs&lt;br&gt;
(ii)They are mutable&lt;br&gt;
(iii)The keys must be unique&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;common Dictionary methods&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;.items() - student.items() Returns key-value pairs&lt;br&gt;
.keys() - student.keys() Returns all keys&lt;br&gt;
.values() - student.values() Returns all values&lt;br&gt;
.get() - student.get()Safely gets a value&lt;br&gt;
.update() - student.update() Updates/adds items&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sets&lt;/strong&gt; {}- is a collection of Unique items stored in an  unordered structure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Characteristics&lt;/strong&gt;&lt;br&gt;
(i) They are unordered - no indexing&lt;br&gt;
(ii) No duplicates&lt;br&gt;
(iii) They are mutable - can add or remove elements&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;common set methods&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;.add() fruits.add("mango") - adds mango to the set.&lt;br&gt;
.pop() fruits.pop() removes items&lt;br&gt;
.union() print(fruits.union(veggies)) - joins fruits and veggies&lt;br&gt;
Control Flow, Loops and Functions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Control Flow&lt;/strong&gt;&lt;br&gt;
-if else and elif&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Operators and Expressions&lt;/strong&gt;&lt;br&gt;
Arithmetic operators (+,-,*,/,^,%)&lt;br&gt;
comparison operators (&amp;lt;,&amp;gt;,=,=!...)&lt;br&gt;
Logical operators (and,or,not)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conditional statements.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;enter score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Invalid score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;70&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;B&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;C&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;35&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;D&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;E&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Repeat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;loops&lt;/strong&gt;&lt;br&gt;
 for loop -- used to iterate over a sequence (list, tuple or a string)&lt;/p&gt;

&lt;p&gt;while loop -- iterates a block of code until a certain condition is met.&lt;/p&gt;

&lt;p&gt;Break and continue&lt;/p&gt;

&lt;p&gt;break stops/ends the loop immediately&lt;br&gt;
continue - skips the current iteration -- continue skips to next.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Functions&lt;/strong&gt;&lt;br&gt;
A function is a block of named reusable block of code that perform a specific task when called.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;List comprehension&lt;/strong&gt;&lt;br&gt;
list comprehension provides a concise and efficient way to create a new list by applying an expression to each item in an existing iterable, optionally filtering the items&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Object Oriented Programming (OOP)&lt;/strong&gt;&lt;br&gt;
is a way of programming that organizes code into classes and objects instead of just functions.&lt;/p&gt;

&lt;p&gt;classes - blueprint of an object&lt;br&gt;
object - instance of a class&lt;/p&gt;

&lt;p&gt;It provides a clear structure to write programs&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;It allows you to build reusable applications with less code&lt;/li&gt;
&lt;li&gt;Dry principle - Do Not Repeat Yourself - it keeps your code dry&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;CORE CONCEPTS:&lt;/strong&gt;&lt;br&gt;
1 class&lt;br&gt;
2 Objects&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;pillars of oop&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Inheritance -- this allows you to reuse code from another class&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Encapsulation-- this means bundling data and methods together inside a class and controlling access to them&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;polymorphism - this means  the same method name behaves differently&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Abstraction -- means hiding unnecessary implementation details and showing only the essential features of an object.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>beginners</category>
      <category>programming</category>
      <category>python</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>sql and databases</title>
      <dc:creator>Kelvin</dc:creator>
      <pubDate>Fri, 13 Mar 2026 13:04:02 +0000</pubDate>
      <link>https://forem.com/murimikelvin/sql-and-databases-368a</link>
      <guid>https://forem.com/murimikelvin/sql-and-databases-368a</guid>
      <description>&lt;h2&gt;
  
  
  What is a database.
&lt;/h2&gt;

&lt;p&gt;A database is an organized collection of data stored electronically. &lt;/p&gt;

&lt;h2&gt;
  
  
  What is SQL
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;SQL&lt;/strong&gt; stands for structured query language. It is used for storing data, retrieving data, updating data and deleting data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Schema in SQL
&lt;/h2&gt;

&lt;p&gt;A schema is a logical container or namespace inside a database that groups related objects like tables, views, indexes, stored procedures, functions and sequences.&lt;br&gt;
You can think of a schema as a folder inside a database helping you organize and manage objects.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why use schemas.
&lt;/h2&gt;

&lt;p&gt;Schemas help organize large datasets into sections, control access to different parts of the database and avoiding name conflicts,&lt;/p&gt;
&lt;h3&gt;
  
  
  Creating a schema.
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="n"&gt;sales_data&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;this creates a schema named sales_data.&lt;/p&gt;
&lt;h2&gt;
  
  
  Setting search path.
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;set&lt;/span&gt; &lt;span class="n"&gt;search_path&lt;/span&gt; &lt;span class="k"&gt;to&lt;/span&gt; &lt;span class="n"&gt;sales_data&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;this sets the search path to the schema. To check what schema you are using &lt;code&gt;show search_path;&lt;/code&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Creating a table in that schema.
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="nb"&gt;SERIAL&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;this creates a table named customers.&lt;/p&gt;
&lt;h2&gt;
  
  
  Types of Database management systems(dbms).
&lt;/h2&gt;

&lt;p&gt;DBMS means Database management system. It is a software that allows us to create, manage and interact with databases.&lt;/p&gt;
&lt;h2&gt;
  
  
  Types of databases.
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Relational database - uses tables with rows and columns. examples: PostgreSQL, MySQL, Oracle.&lt;/li&gt;
&lt;li&gt;NoSQL DBMS - non tabular (key-value, Json, Graph) examples: MongoDB, Cassandra.&lt;/li&gt;
&lt;li&gt;In-Memory DBMS - stores data in RAM. example: Redis&lt;/li&gt;
&lt;li&gt;Cloud DBMS - cloud-hosted &amp;amp; managed by providers. examples: Amazon RDS, Azure PostreSQL&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  SQL datatypes
&lt;/h2&gt;

&lt;p&gt;Data types define the type of data that can be stored in a column. They enforce data integrity and data accuracy.&lt;/p&gt;
&lt;h2&gt;
  
  
  Main data types categories.
&lt;/h2&gt;
&lt;h2&gt;
  
  
  1. Numeric Types
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Integer/int - whole numbers e.g. 10,50,-4&lt;/li&gt;
&lt;li&gt;Serial - Auto-incrementing ID e.g. 1,2,3&lt;/li&gt;
&lt;li&gt;Numeric - precision numbers (money) e.g. 12345.76&lt;/li&gt;
&lt;li&gt;Decimal - same as numeric - 146505.95&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  2. Character Types
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;CHAR - fixed length string e.g. 'Kenya'&lt;/li&gt;
&lt;li&gt;VARCHAR - variable-length string e.g. 'John Kamau'&lt;/li&gt;
&lt;li&gt;TEXT - unlimited string length e.g. long comments, emails&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  3. Date &amp;amp; Time Types
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;DATE - stored date only  e.g. '2020-01-01'&lt;/li&gt;
&lt;li&gt;TIMESTAMP - stores date and time e.g.'2024-10-10 12:20:00'&lt;/li&gt;
&lt;li&gt;TIME - time of the day e.g. '13:30:00'&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  4. Boolean
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Boolean - True or False e.g. True or False&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Creating tables in SQL
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Basic syntax
&lt;/h3&gt;

&lt;p&gt;A table consists of rows and columns&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;span class="n"&gt;Customer_id&lt;/span&gt; &lt;span class="nb"&gt;serial&lt;/span&gt; &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="n"&gt;First_name&lt;/span&gt; &lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="n"&gt;Last_name&lt;/span&gt; &lt;span class="nb"&gt;varchar&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="n"&gt;Email&lt;/span&gt; &lt;span class="nb"&gt;varchar&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;unique&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="n"&gt;Phone&lt;/span&gt; &lt;span class="nb"&gt;char&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Constraints
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;PRIMARY KEY - uniquely identifies each record&lt;/li&gt;
&lt;li&gt;NOT NULL - cannot be empty&lt;/li&gt;
&lt;li&gt;UNIQUE - all values must be different&lt;/li&gt;
&lt;li&gt;DEFAULT - provides a default value&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;FOREIGN KEY - links to another table's primary key&lt;/p&gt;
&lt;h2&gt;
  
  
  Best practices
&lt;/h2&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use singular table names e.g. employees&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Define primary keys&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use meaningful column names&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Prefer VARCHAR over CHAR unless fixed &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Use foreign keys for relational integrity&lt;/p&gt;
&lt;h2&gt;
  
  
  Inserting Data into a database.
&lt;/h2&gt;


&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;First_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Last_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Phone&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;VALUES&lt;/span&gt;   
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'John'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Doe'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'john.doe@example.com'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'+254712345678'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; 
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Jane'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Smith'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'jane.smith@example.com'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'+254798765432'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; 
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Paul'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Otieno'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'paul.otieno@example.com'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'+254701234567'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; 
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Mary'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Okello'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'mary.okello@example.com'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'+254711223344'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Viewing and querying Tables.
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Viewing databases
&lt;/h2&gt;

&lt;p&gt;-- from postgreSQL&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;datname&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_database&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;datistemplate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;-- from MySQL&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SHOW&lt;/span&gt; &lt;span class="n"&gt;DATABASES&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Viewing all tables in PostgreSQL
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;table_name&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;schema_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tables&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;table_schema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'sales_data'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  View data from a table.
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Table manipulation and common SQL key words.
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Add a column to a table
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;  
&lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;this adds a new city column to the customers table.&lt;/p&gt;

&lt;h2&gt;
  
  
  Updating existing data
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Updating a single row
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;  
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Nairobi'&lt;/span&gt;  
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Updating multiple rows with a CASE statement.
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;  
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;  
&lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'Nairobi'&lt;/span&gt;  
&lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'Mombasa'&lt;/span&gt;  
&lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'Kisumu'&lt;/span&gt;  
&lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'Nakuru'&lt;/span&gt;  
&lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;  &lt;span class="c1"&gt;-- Retain original city if no match &lt;/span&gt;
&lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Updating specific quantity
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;  
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;quantity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="n"&gt;order_id&lt;/span&gt;  
&lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;  
&lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;  
&lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;  
&lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;  
&lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;  
&lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="n"&gt;quantity&lt;/span&gt; 
&lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Updates orders with specific quantity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Delete columns from SQL
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Remove column from a table.
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;books&lt;/span&gt;  
&lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;published_date&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Delete rows from SQL
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Delete a specific row
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;DELETE&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;  
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;order_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Drop a table
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;books&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Deletes the entire table and all its data. Use with caution!!&lt;/p&gt;

&lt;h2&gt;
  
  
  Rename column
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="k"&gt;RENAME&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;phone_number&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;contact_number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Renames a column.&lt;/p&gt;

&lt;h2&gt;
  
  
  MODIFY COLUMN TYPE - Change a Column’s Data Type
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;  
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;quantity&lt;/span&gt; &lt;span class="k"&gt;TYPE&lt;/span&gt; &lt;span class="nb"&gt;DECIMAL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  SET DEFAULT Value for Column
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;  
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;quantity&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  DROP DEFAULT Value
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;  
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;quantity&lt;/span&gt; &lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Add a NOT NULL Constraint
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;  
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Drop a NOT NULL Constraint
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;  
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt; &lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Add a Foreign Key Constraint
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; 
&lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;CONSTRAINT&lt;/span&gt; &lt;span class="n"&gt;fk_customer&lt;/span&gt; 
&lt;span class="k"&gt;FOREIGN&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Drop a Foreign Key Constraint
&lt;/h2&gt;

&lt;p&gt;You must know the constraint name. You can find it using:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;conname&lt;/span&gt;  
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_constraint&lt;/span&gt;  
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;conrelid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'orders'&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;regclass&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then drop it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;CONSTRAINT&lt;/span&gt; &lt;span class="n"&gt;fk_customer&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  COMMON SQL KEYWORDS
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. SELECT - Retrieve Data
&lt;/h3&gt;

&lt;p&gt;SELECT column1, column2 FROM table_name; &lt;br&gt;
Examples:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;customer_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;author&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;books&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;book_id&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. WHERE - Filter Data
&lt;/h3&gt;

&lt;p&gt;SELECT column FROM table WHERE condition; &lt;br&gt;
Examples:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Nairobi'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;books&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. ORDER BY - Sort Data
&lt;/h3&gt;

&lt;p&gt;SELECT columns FROM table ORDER BY column ASC|DESC; &lt;br&gt;
Examples:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;books&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="k"&gt;ASC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;customer_name&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;customer_name&lt;/span&gt; &lt;span class="k"&gt;ASC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;order_date&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. GROUP BY - Group and Summarize Data
&lt;/h3&gt;

&lt;p&gt;SELECT column, AGG_FUNCTION(column) FROM table GROUP BY column; &lt;br&gt;
Examples:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;author&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;books&lt;/span&gt; &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;author&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5. HAVING - Filter After Grouping
&lt;/h3&gt;

&lt;p&gt;SELECT column, COUNT(&lt;em&gt;) FROM table GROUP BY column HAVING COUNT(&lt;/em&gt;) &amp;gt; 1; &lt;br&gt;
Examples:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;author&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;books&lt;/span&gt; &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;author&lt;/span&gt; &lt;span class="k"&gt;HAVING&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="k"&gt;HAVING&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  6. LIMIT - Restrict Number of Results
&lt;/h3&gt;

&lt;p&gt;SELECT * FROM table LIMIT number; &lt;br&gt;
Examples:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;books&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;books&lt;/span&gt; 
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Summary
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;SELECT -Retrieve data
&lt;/li&gt;
&lt;li&gt;WHERE - Filter records &lt;/li&gt;
&lt;li&gt;ORDER BY - Sort records&lt;/li&gt;
&lt;li&gt;GROUP  BY - Group data &lt;/li&gt;
&lt;li&gt;HAVING - Filter grouped data &lt;/li&gt;
&lt;li&gt;LIMIT - Restrict number of results &lt;/li&gt;
&lt;li&gt;INSERT - Add data &lt;/li&gt;
&lt;li&gt;UPDATE - Modify existing data &lt;/li&gt;
&lt;li&gt;DELETE - Remove rows &lt;/li&gt;
&lt;li&gt;ALTER - Modify table structure&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  SQL Aggregate Functions
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;COUNT() Counts the number of rows (or non-NULL values)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;kisumu_customers&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customer&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Kisumu'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;SUM() Adds up numeric values in a column
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_book_price&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;books&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;AVG() Calculates the average (mean) of numeric values
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_price_2023&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;books&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;published_date&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="s1"&gt;'2023-01-01'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;MAX() Finds the highest value in a column
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;MAX&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;latest_order&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;MIN() Finds the lowest value in a column
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;MIN&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;quantity&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;min_quantity&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  SQL Operators
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Arithmetic Operators - Used for mathematical calculations.
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Addition &lt;code&gt;SELECT title, price, price + 200 AS new_price 
FROM books;&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Subtraction &lt;code&gt;SELECT book_name, price, price - 150 AS discounted_price 
FROM books;&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Multiplication &lt;code&gt;SELECT book_name, price, price * 2 AS double_price 
FROM books;&lt;/code&gt;
/ Division &lt;code&gt;SELECT book_name, price, price / 2 AS half_price 
FROM books;&lt;/code&gt;
% Modulo (Remainder) &lt;code&gt;SELECT book_name, price, price % 3 AS remainder 
FROM books;&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Comparison Operators - Used to compare values, often in &lt;code&gt;WHERE&lt;/code&gt;.
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;= Equals - Selects rows where a column matches an exact value. &lt;br&gt;
&lt;code&gt;SELECT * FROM books WHERE author = 'David Kimani';&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;!= or &amp;lt;&amp;gt; Not equal to - Selects rows where a column value does not match the specified value.&lt;br&gt;
&lt;code&gt;SELECT * FROM orders WHERE order_date &amp;lt;&amp;gt; '2023-04-01';&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;blockquote&gt;
&lt;p&gt;Greater than - Selects rows where column value is greater than the given value.&lt;br&gt;
&lt;code&gt;SELECT * FROM orders WHERE order_date &amp;gt; '2023-04-03';&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&amp;lt; Less than - Selects rows where column value is less than the given value.&lt;br&gt;
&lt;code&gt;SELECT * FROM orders WHERE order_date &amp;lt; '2023-04-04';&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;blockquote&gt;
&lt;p&gt;= Greater than or equal to - Selects rows where column value is greater than or equal to the given value.&lt;br&gt;
&lt;code&gt;SELECT * FROM customer WHERE customer_id &amp;gt;= 2;&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&amp;lt;= Less than or equal to - Selects rows where column value is less than or equal to the given value.&lt;br&gt;
&lt;code&gt;SELECT * FROM books WHERE price &amp;lt;= 2200;&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;BETWEEN Between two values - Selects rows with column values between two values (inclusive).&lt;br&gt;
&lt;code&gt;SELECT * FROM orders WHERE order_date BETWEEN '2023-04-01' AND '2023-04-03';&lt;/code&gt; &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;LIKE Pattern matching - Use LIKE to search for patterns in text data. &lt;br&gt;
&lt;code&gt;SELECT * FROM customer WHERE email LIKE '%gmail.com';&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;IN Matches values in a list - Use IN to filter records by matching any value in a list. &lt;br&gt;
&lt;code&gt;SELECT * FROM books WHERE author IN ('David Kimani', 'Grace Achieng');&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Logical Operators - combine multiple conditions.
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;AND - All conditions must be true &lt;br&gt;
&lt;code&gt;SELECT * FROM customer &lt;br&gt;
WHERE city = 'Kisumu' AND first_name = 'Paul';&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;OR - At least one condition is true &lt;br&gt;
&lt;code&gt;SELECT * FROM customer &lt;br&gt;
WHERE city = 'Nairobi' OR city = 'Kisumu';&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;NOT - Reverses the result of a condition&lt;br&gt;
&lt;code&gt;SELECT * FROM orders &lt;br&gt;
WHERE NOT order_date = '2023-04-01';&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Bitwise Operators - Used for bit-level operations.
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Bitwise AND &lt;/li&gt;
&lt;li&gt; Bitwise OR &lt;/li&gt;
&lt;li&gt; Bitwise XOR &lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  5. Set Operators - Used to combine result sets of two SELECT queries.
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;UNION - Combines results, removes duplicates
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;first_name&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customer&lt;/span&gt; 
&lt;span class="k"&gt;UNION&lt;/span&gt; 
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;author&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;books&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;UNION ALL - Combines results, keeps duplicates
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;first_name&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customer&lt;/span&gt; 
&lt;span class="k"&gt;UNION&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt; 
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;author&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;books&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;INTERSECT - Returns common records
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;first_name&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customer&lt;/span&gt; 
&lt;span class="k"&gt;INTERSECT&lt;/span&gt; 
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;author&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;books&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;EXCEPT - Returns records in first query, not in second
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;first_name&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customer&lt;/span&gt; 
&lt;span class="k"&gt;EXCEPT&lt;/span&gt; 
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;author&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;books&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  6. Other Useful Operators
&lt;/h3&gt;

&lt;p&gt;IS NULL - Checks for missing values&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customer&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;IS NOT NULL - Checks for non-missing values&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customer&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;DISTINCT - Removes duplicates&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Aggregate Functions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. COUNT() - Counting Rows
&lt;/h3&gt;

&lt;p&gt;The COUNT() function counts the number of rows that match a condition.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;kisumu_customers&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customer&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Kisumu'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. SUM() - Total of Numbers
&lt;/h3&gt;

&lt;p&gt;SUM() adds up the values in a numeric column.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_book_price&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;books&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. AVG() - Average Value
&lt;/h3&gt;

&lt;p&gt;AVG() calculates the average (mean) of values in a numeric column.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_price_2023&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;books&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;published_date&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="s1"&gt;'2023-01-01'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. MAX() - Maximum Value
&lt;/h3&gt;

&lt;p&gt;MAX() finds the largest value in a numeric or date column.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;MAX&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;latest_order&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5. MIN() - Minimum Value
&lt;/h3&gt;

&lt;p&gt;MIN() finds the smallest value in a numeric or date column.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;MIN&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;quantity&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;min_quantity&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  String, Date, and Mathematical Functions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  SQL String Functions
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;CONCAT(): Combine Strings - Joins multiple strings into one.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;CONCAT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;' '&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;last_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;full_name&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;SUBSTRING(): Extract Part of a String - Extracts a portion of a string
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;SUBSTRING&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;POSITION&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'@'&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;domain&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;LENGTH(): String Length -  Returns the number of characters in a string.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;LENGTH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;email_length&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;UPPER() and LOWER(): Change Case
UPPER() – Converts text to uppercase.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;UPPER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;upper_name&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;LOWER() – Converts text to lowercase.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;LOWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;standardized_email&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;TRIM(), LTRIM(), RTRIM(): Remove Spaces -  Cleans unwanted spaces from strings.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;TRIM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; 
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;LTRIM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; 
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;RTRIM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;REPLACE(): Replace Text - Replaces all occurrences of a substring
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;REPLACE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;country&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Kenya'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'KE'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;short_country&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  SQL Date Functions
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;NOW(): Current Date and Time
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;current_datetime&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;YEAR(), MONTH(), DAY(): Extract Date Parts
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;YEAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;order_year&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;MONTH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;order_month&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;order_day&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;DATEDIFF() / Subtracting Dates
Note: DATEDIFF() is for MySQL. In PostgreSQL, use subtraction.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;CURRENT_DATE&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;order_date&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;days_since_order&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  SQL Mathematical Functions
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;ROUND(): Round Numbers
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;rounded_price&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;books&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;CEIL() and FLOOR(): Round Up/Down
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CEIL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total_weight&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;quantity&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;shipping_cost_per_item&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; 
&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;MOD(): Get Remainder
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;quantity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;MOD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;quantity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;remainder&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;POWER(): Raise to a Power
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;book_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;POWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;cubed_price&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;books&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;ABS(): Absolute Value
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;book_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;ABS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;price_difference&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;books&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;1: INNER JOIN&lt;/strong&gt;&lt;br&gt;
INNER JOIN returns only matching rows from both tables. If there is no match between the tables the row is excluded from the result. It is used when you only want records that exist in both tables or want to combine related data.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2z1lib7xhjnily5exwe2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2z1lib7xhjnily5exwe2.png" alt=" " width="800" height="177"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;2: LEFT JOIN&lt;/strong&gt;&lt;br&gt;
LEFT JOIN returns all rows from the left table and matching rows from the right table. Also known as left outer join. If no match exists null values are returned for the right table columns.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6od3c9kxxp493ru4ebt7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6od3c9kxxp493ru4ebt7.png" alt=" " width="800" height="186"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;3: RIGHT JOIN&lt;/strong&gt;&lt;br&gt;
RIGHT JOIN returns all rows from the right table and matching rows from the left table. If no match exists null values are returned for the left table columns.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo4s3yhk8j062rl5qbk0r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo4s3yhk8j062rl5qbk0r.png" alt=" " width="800" height="188"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;4: FULL OUTER JOIN&lt;/strong&gt;&lt;br&gt;
FULL OUTER JOIN returns all rows from both tables. If no match exists nulls appear on the missing side. It is used for combining two datasets and finding mismatches between systems.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1kp5ml9gkz5kjep3ptx0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1kp5ml9gkz5kjep3ptx0.png" alt=" " width="800" height="198"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;5: CROSS JOIN&lt;/strong&gt;&lt;br&gt;
CROSS JOIN returns the cartesian product - every row from the first table combined with every row from the second table. No matching is required.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz3wqnbyaijrf52gvsrmr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz3wqnbyaijrf52gvsrmr.png" alt=" " width="800" height="161"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  WINDOW FUNCTIONS IN SQL.
&lt;/h2&gt;

&lt;p&gt;Window functions perform calculations across a set of rows related to the current row without grouping the rows into a single output. They preserve the individual row details while providing additional contextual insights.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key components.&lt;/strong&gt;&lt;br&gt;
Window functions are defined using the mandatory OVER() clause which specifies how the rows are partitioned and ordered for the calculation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Common types of window functions.&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;1: RANKING&lt;/strong&gt; - Row_number, rank, dense_rank - It assigns numbers or ranks to rows based on order.&lt;br&gt;
&lt;strong&gt;Rank&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe2d9ulhdjtdkjeq7u2hh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe2d9ulhdjtdkjeq7u2hh.png" alt=" " width="800" height="179"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Row number&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhtosm2mn3bchyvbergpe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhtosm2mn3bchyvbergpe.png" alt=" " width="800" height="185"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;2: AGGREGATE&lt;/strong&gt; - sum, avg, min, max, count - calculates sum, averages or extremes across the window.&lt;br&gt;
&lt;strong&gt;Total&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F778973hfmmhr2xxny677.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F778973hfmmhr2xxny677.png" alt=" " width="800" height="157"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Average&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faaa1bisp59tzf4kvv3sz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faaa1bisp59tzf4kvv3sz.png" alt=" " width="800" height="140"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Count&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faibxzt9w0iz7v9nus8vl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faibxzt9w0iz7v9nus8vl.png" alt=" " width="800" height="152"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Max&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl30z85z4b781x5j3gvq8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl30z85z4b781x5j3gvq8.png" alt=" " width="800" height="143"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3: VALUE/OFFSET&lt;/strong&gt; - lag, lead - Accesses data from rows before, after or at specific points in the window.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;lag&lt;/strong&gt;&lt;br&gt;
Used to compare rows i.e. comparing current value to previous value&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5f3b4k0nfzmwkesqxsiu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5f3b4k0nfzmwkesqxsiu.png" alt=" " width="800" height="169"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;lead&lt;/strong&gt;&lt;br&gt;
looks forward to the next row or a specified number of rows ahead.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9twpy9wpx3ye5fpgpat2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9twpy9wpx3ye5fpgpat2.png" alt=" " width="800" height="163"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Subqueries
&lt;/h2&gt;

&lt;p&gt;A subquery (or nested query) is a SQL query written inside another query. It allows you to &lt;br&gt;
perform an operation that depends on the result of another query. &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The SELECT clause - Subqueries in the SELECT clause add an extra computed column to the result set. Each subquery 
executes once for every row of the outer query.
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;  
&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;projects&lt;/span&gt;  
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;projects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;employee_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;employee_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_projects&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ol&gt;
&lt;li&gt;The FROM clause - A subquery in the FROM clause creates a temporary or derived table. The outer query can then 
select or filter data from it.
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt; 
&lt;span class="k"&gt;SELECT&lt;/span&gt;  
&lt;span class="n"&gt;department_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  
&lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_employees&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt; 
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;department_id&lt;/span&gt; 
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;dept_summary&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ol&gt;
&lt;li&gt;The WHERE or HAVING clause - Subqueries in the WHERE clause allow the outer query to filter results based on another query’s output.
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;  
&lt;span class="n"&gt;department_name&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;departments&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;department_id&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt; 
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;department_id&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt; 
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;department_id&lt;/span&gt; 
&lt;span class="k"&gt;HAVING&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; 
&lt;span class="p"&gt;);&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Correlated vs Non-Correlated Subqueries
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Non-correlated subquery - A non-correlated subquery does not depend on the outer query. It can be executed on its own.&lt;/li&gt;
&lt;li&gt;Correlated subquery - A correlated subquery depends on the outer query. It is evaluated once per row of the outer query.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  Common Table Expressions (CTEs)
&lt;/h2&gt;

&lt;p&gt;A CTE (Common Table Expression) is like creating a temporary result set (or virtual table) &lt;br&gt;
that exists only during the execution of a single SQL query.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;high_salary&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt; 
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;45000&lt;/span&gt; 
&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;high_salary&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  CTE with Joins
&lt;/h2&gt;

&lt;p&gt;You can use CTEs to simplify joins&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;emp_dept&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt; 
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;department_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;salary&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; 
&lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;departments&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;department_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;department_id&lt;/span&gt; 
&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;emp_dept&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Stored Procedures
&lt;/h2&gt;

&lt;p&gt;A stored procedure is a named block of SQL logic stored inside the database that you can &lt;br&gt;
execute using: &lt;code&gt;CALL procedure_name(parameters);&lt;/code&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Procedures are for actions, such as:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Insert data (add customer, add book, add order) &lt;/li&gt;
&lt;li&gt;Update data (update contact, update email) &lt;/li&gt;
&lt;li&gt;Delete data (delete customer safely) &lt;/li&gt;
&lt;li&gt;Validation and enforcement of business rules &lt;/li&gt;
&lt;li&gt;Multi-step operations (create order + check customer + log)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Procedure Syntax
&lt;/h3&gt;

&lt;p&gt;CREATE OR REPLACE PROCEDURE procedure_name(param_name param_type, ...) &lt;br&gt;
LANGUAGE plpgsql &lt;br&gt;
AS $$ &lt;br&gt;
BEGIN -- SQL statements &lt;br&gt;
END; &lt;br&gt;
$$; &lt;/p&gt;
&lt;h3&gt;
  
  
  Explanation of each keyword
&lt;/h3&gt;

&lt;p&gt;CREATE - creates the procedure &lt;br&gt;
OR REPLACE - updates the procedure if it already exists (no need to drop first) &lt;br&gt;
PROCEDURE procedure_name(...) - name + inputs (parameters) &lt;br&gt;
LANGUAGE plpgsql - enables procedural features (IF, LOOP, variables) &lt;br&gt;
AS $$ ... $$ - defines the body of the procedure &lt;br&gt;
BEGIN ... END; - start/end of procedure logic &lt;br&gt;
Call with: CALL procedure_name(...)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plsql"&gt;&lt;code&gt;&lt;span class="kr"&gt;CREATE&lt;/span&gt; &lt;span class="kr"&gt;OR&lt;/span&gt; &lt;span class="kr"&gt;REPLACE&lt;/span&gt; &lt;span class="kr"&gt;PROCEDURE&lt;/span&gt; &lt;span class="n"&gt;add_customer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; 
&lt;span class="n"&gt;p_first_name&lt;/span&gt; &lt;span class="kt"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
&lt;span class="n"&gt;p_last_name&lt;/span&gt;  &lt;span class="kt"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
&lt;span class="n"&gt;p_email&lt;/span&gt;      &lt;span class="kt"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
&lt;span class="n"&gt;p_contact&lt;/span&gt;    &lt;span class="kt"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
&lt;span class="n"&gt;p_city&lt;/span&gt;       &lt;span class="kt"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;LANGUAGE&lt;/span&gt; &lt;span class="n"&gt;plpgsql&lt;/span&gt; 
&lt;span class="kr"&gt;AS&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt; 
&lt;span class="k"&gt;BEGIN&lt;/span&gt; 
&lt;span class="kr"&gt;INSERT&lt;/span&gt; &lt;span class="kr"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;last_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;contact&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="kr"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p_first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p_last_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p_email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p_contact&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p_city&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; 
&lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;span class="err"&gt;$$&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Explanation
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;CREATE OR REPLACE PROCEDURE add_customer(...) - Creates a reusable procedure named add_customer. &lt;/li&gt;
&lt;li&gt;Parameters like p_first_name VARCHAR - Inputs we pass when calling the procedure. &lt;/li&gt;
&lt;li&gt;LANGUAGE plpgsql - Enables procedural execution. &lt;/li&gt;
&lt;li&gt;BEGIN - Starts execution block. &lt;/li&gt;
&lt;li&gt;INSERT INTO customers(...) -  Specifies the table and columns being inserted. &lt;/li&gt;
&lt;li&gt;VALUES (...) - Inserts the parameter values. &lt;/li&gt;
&lt;li&gt;END;- Ends procedure. &lt;/li&gt;
&lt;li&gt;CALL add_customer(...) - Runs the procedure by inserting values into our table. &lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Delete Customer Safely
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plsql"&gt;&lt;code&gt;&lt;span class="kr"&gt;CREATE&lt;/span&gt; &lt;span class="kr"&gt;OR&lt;/span&gt; &lt;span class="kr"&gt;REPLACE&lt;/span&gt; &lt;span class="kr"&gt;PROCEDURE&lt;/span&gt; &lt;span class="n"&gt;delete_customer_safely&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; 
&lt;span class="n"&gt;p_customer_id&lt;/span&gt; &lt;span class="kt"&gt;INT&lt;/span&gt; 
&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;LANGUAGE&lt;/span&gt; &lt;span class="n"&gt;plpgsql&lt;/span&gt; 
&lt;span class="kr"&gt;AS&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt; 
&lt;span class="k"&gt;BEGIN&lt;/span&gt; 
&lt;span class="kr"&gt;DELETE&lt;/span&gt; &lt;span class="kr"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; 
&lt;span class="kr"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p_customer_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;span class="kr"&gt;DELETE&lt;/span&gt; &lt;span class="kr"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt; 
&lt;span class="kr"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p_customer_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;span class="err"&gt;$$&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>beginners</category>
      <category>database</category>
      <category>sql</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Connecting Power BI to SQL databases.</title>
      <dc:creator>Kelvin</dc:creator>
      <pubDate>Fri, 13 Mar 2026 11:27:30 +0000</pubDate>
      <link>https://forem.com/murimikelvin/connecting-power-bi-to-sql-databases-1kfi</link>
      <guid>https://forem.com/murimikelvin/connecting-power-bi-to-sql-databases-1kfi</guid>
      <description>&lt;p&gt;&lt;strong&gt;Power BI&lt;/strong&gt;&lt;br&gt;
Power BI is a business intelligence and data visualization tool that helps organizations collect, analyze and visualize data so as to make better and informed business decisions. It converts raw data from multiple sources into interactive dashboards, charts and reports that are used in decision making.&lt;/p&gt;

&lt;p&gt;Power BI is used in data analysis and business intelligence because it allows the users  to import data i.e. from MS Excel, databases and cloud services. Power BI has tools like power query which is used for data cleaning and Data Analysis Expressions(DAX) functions which are used to perform calculations. It has charts that are used for data visualization too.&lt;/p&gt;

&lt;p&gt;Databases are used by large organizations to store large volumes of data. These databases are connected to Power BI to retrieve the data which is then used for analysis. Companies connect power bi to databases because the provide data security and integrity, can handle large sets of data and provide real time data access which allows dashboards to show regularly refreshed data.&lt;/p&gt;

&lt;p&gt;SQL databases are important for storing and managing analytical data because they provide data storage, facilitate data management i.e. inserting new data, updating and deleting outdated data which helps maintain data consistency. databases also facilitate data retrieval which is used for analysis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Connecting Power BI to a local postgreSQL database.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F60dyhy91copzmlu4ljs2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F60dyhy91copzmlu4ljs2.png" alt=" " width="800" height="390"&gt;&lt;/a&gt;&lt;br&gt;
then open a blank report&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flnnieci5c4hvtp7gmvav.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flnnieci5c4hvtp7gmvav.png" alt=" " width="800" height="154"&gt;&lt;/a&gt;&lt;br&gt;
from the home tab click on get data&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffztwnhuesqw54nsugbl5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffztwnhuesqw54nsugbl5.png" alt=" " width="800" height="108"&gt;&lt;/a&gt;&lt;br&gt;
from get data click on database then select postgreSQL database then click on connect.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ivz4rezes4o2q4mv4v6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ivz4rezes4o2q4mv4v6.png" alt=" " width="800" height="783"&gt;&lt;/a&gt;&lt;br&gt;
On the connection window that appears enter the server and database details.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqcoa6z565gfp7lk99iyz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqcoa6z565gfp7lk99iyz.png" alt=" " width="800" height="404"&gt;&lt;/a&gt;&lt;br&gt;
from the database window provide the authentication credentials then click on connect &lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9fcqiv1pxpe438bo59so.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9fcqiv1pxpe438bo59so.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;br&gt;
from the navigator select the data you want to upload then load from the available tables. &lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fajsor6n2bknfe89g8itn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fajsor6n2bknfe89g8itn.png" alt=" " width="800" height="637"&gt;&lt;/a&gt;&lt;br&gt;
wait for the data to load&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu6jt5m3nps264lsogd0v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu6jt5m3nps264lsogd0v.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Connecting Power BI to Cloud Databases.&lt;/strong&gt;&lt;br&gt;
From your browser search Aiven&lt;a href="https://aiven.io/" rel="noopener noreferrer"&gt;&lt;/a&gt; then create an account or login if having an existing account.&lt;br&gt;
Once on the Aiven console click on services to create a new service&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyokmadyre43n2qkgj0hq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyokmadyre43n2qkgj0hq.png" alt=" " width="800" height="195"&gt;&lt;/a&gt;&lt;br&gt;
select service type&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff4fzyzmp7oy5yijx9b1h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff4fzyzmp7oy5yijx9b1h.png" alt=" " width="800" height="356"&gt;&lt;/a&gt;&lt;br&gt;
select service tier, cloud, plan then on service basic enter a name then create service.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchn3y46qvv7al3apw773.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchn3y46qvv7al3apw773.png" alt=" " width="800" height="356"&gt;&lt;/a&gt;&lt;br&gt;
open the service and copy the connection information.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnle3rjshkr6ebuspft2q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnle3rjshkr6ebuspft2q.png" alt=" " width="800" height="377"&gt;&lt;/a&gt;&lt;br&gt;
open dbeaver to connect to the cloud database &lt;br&gt;
from dbeaver click on database then new database connection&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4j73hr7pnva29rt1vnt7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4j73hr7pnva29rt1vnt7.png" alt=" " width="800" height="490"&gt;&lt;/a&gt;&lt;br&gt;
select the database you want to connect to. Ensure it matches the cloud database. Then click on next.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frgvbzih9krl0p57ytt27.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frgvbzih9krl0p57ytt27.png" alt=" " width="800" height="679"&gt;&lt;/a&gt;&lt;br&gt;
on the connection settings carefully replace them with the connection information copied from the cloud database (Aiven). i.e. host, database, port, username and password then test connection.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fitt5eqoxbdg1zmfwy8ag.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fitt5eqoxbdg1zmfwy8ag.png" alt=" " width="800" height="678"&gt;&lt;/a&gt;&lt;br&gt;
after testing the connection then click on finish.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frub7n5iqupv9owpgigar.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frub7n5iqupv9owpgigar.png" alt=" " width="800" height="681"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Downloading secure sockets layer (SSL) certificates.&lt;/strong&gt;&lt;br&gt;
From the Aiven connection information click on the download button. A file named ca.pem is downloaded to your local computer. when connecting on Power BI when configuring SSL the SSL mode should be verify.ca then on root certificate select the downloaded ca.pem file from your downloads then click connect. SSL certificates are important because they encrypt data sent between Power BI and the postgreSQL server, verifies authentication and ensures data integrity between the client and the server.&lt;/p&gt;
&lt;h3&gt;
  
  
  Why Is an SSL Certificate Required?
&lt;/h3&gt;

&lt;p&gt;When connecting to a cloud database over the internet, your credentials and data travel across public networks. An SSL (Secure Sockets Layer) certificate encrypts this connection, ensuring that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your data cannot be intercepted by third parties (man-in-the-middle attacks)&lt;/li&gt;
&lt;li&gt;The server you're connecting to is verified and legitimate&lt;/li&gt;
&lt;li&gt;Your username and password are transmitted securely&lt;/li&gt;
&lt;li&gt;Aiven enforces SSL by default on all connections, which is a security best practice.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Using SSL certificate
&lt;/h3&gt;

&lt;p&gt;step 1 -&amp;gt; download the ssh file from Aiven.&lt;br&gt;
step 2 -&amp;gt; search for (certlm.msc from windows R) or (certmgr from search) then click on &amp;gt;Trusted Root Certification Authorities&lt;br&gt;
right on certificates &amp;gt;&amp;gt; all tasks &amp;gt;&amp;gt; browse the downloaded file from your downloads &amp;gt;&amp;gt; next &amp;gt;&amp;gt; import file name &amp;gt;&amp;gt; next &amp;gt;&amp;gt; then finish.&lt;br&gt;
From Power BI you can now connect to your cloud database&lt;br&gt;
check below&lt;/p&gt;
&lt;h2&gt;
  
  
  Step 3: Connect in Power BI
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Follow the same steps as the local connection:&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Click Get Data - PostgreSQL Database&lt;br&gt;
In the Server field, enter your Aiven host:port (e.g., &lt;code&gt;pg-yourservice.aivencloud.com:12345&lt;/code&gt;)&lt;br&gt;
Enter your database name&lt;br&gt;
Expand Advanced Options and paste the following into the Additional Connection Parameters field:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;sslmode&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;require;sslrootcert=C:&lt;/span&gt;&lt;span class="se"&gt;\c&lt;/span&gt;&lt;span class="s"&gt;erts&lt;/span&gt;&lt;span class="se"&gt;\c&lt;/span&gt;&lt;span class="s"&gt;a.pem&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Enter your Aiven username and password when prompted&lt;br&gt;
Click Connect&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Relationships in Power BI.&lt;/strong&gt;&lt;br&gt;
After loading data to Power BI click on model view to view the relationship between the tables&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5tsev833uh84354x9w89.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5tsev833uh84354x9w89.png" alt=" " width="800" height="369"&gt;&lt;/a&gt;&lt;br&gt;
The primary keys connect with the foreign keys in the other tables.&lt;br&gt;
Data modelling is the process of organizing data tables and defining how they relate to each other so that data can be interpreted correctly. Relationships define how records from one table relate to records in another allowing Power BI to correctly combine data. The common types of relationships are one to many, many to one and many to many. A star schema is the most common structure that is used in Power BI. The star schema contains the fact tables which contain measurable data and Dimension tables which contain descriptive data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why SQL skills are important for Analysts.&lt;/strong&gt;&lt;br&gt;
SQL skills enable analysts to retrieve, organize and prepare data efficiently before using it for visualization and reporting. SQL enables analysts extract specific information from large datasets using queries instead of importing the entire datasets which can be tiresome and time consuming. It also enables analysts to filter relevant records that are needed at a particular time using conditions like where. Aggregation functions like sum, count, avg, max and min help analysts to quickly summarize large datasets and identify trends before building visualizations. SQL also enables joining of tables, removing duplicates, sorting and grouping data which improve Power BI performance.&lt;/p&gt;

</description>
      <category>analytics</category>
      <category>database</category>
      <category>microsoft</category>
      <category>sql</category>
    </item>
    <item>
      <title>SQL Joins</title>
      <dc:creator>Kelvin</dc:creator>
      <pubDate>Sun, 01 Mar 2026 16:18:49 +0000</pubDate>
      <link>https://forem.com/murimikelvin/sql-joins-mli</link>
      <guid>https://forem.com/murimikelvin/sql-joins-mli</guid>
      <description>&lt;p&gt;Joins are used to combine rows from two or more tables based on a related column between them. This allows you to retrieve connected data stored across multiple tables in a single result. &lt;/p&gt;

&lt;h2&gt;
  
  
  Types of joins.
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1: INNER JOIN&lt;/strong&gt;&lt;br&gt;
INNER JOIN returns only matching rows from both tables. If there is no match between the tables the row is excluded from the result. It is used when you only want records that exist in both tables or want to combine related data.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2z1lib7xhjnily5exwe2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2z1lib7xhjnily5exwe2.png" alt=" " width="800" height="177"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;2: LEFT JOIN&lt;/strong&gt;&lt;br&gt;
LEFT JOIN returns all rows from the left table and matching rows from the right table. Also known as left outer join. If no match exists null values are returned for the right table columns.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6od3c9kxxp493ru4ebt7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6od3c9kxxp493ru4ebt7.png" alt=" " width="800" height="186"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;3: RIGHT JOIN&lt;/strong&gt;&lt;br&gt;
RIGHT JOIN returns all rows from the right table and matching rows from the left table. If no match exists null values are returned for the left table columns.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo4s3yhk8j062rl5qbk0r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo4s3yhk8j062rl5qbk0r.png" alt=" " width="800" height="188"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;4: FULL OUTER JOIN&lt;/strong&gt;&lt;br&gt;
FULL OUTER JOIN returns all rows from both tables. If no match exists nulls appear on the missing side. It is used for combining two datasets and finding mismatches between systems.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1kp5ml9gkz5kjep3ptx0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1kp5ml9gkz5kjep3ptx0.png" alt=" " width="800" height="198"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;5: CROSS JOIN&lt;/strong&gt;&lt;br&gt;
CROSS JOIN returns the cartesian product - every row from the first table combined with every row from the second table. No matching is required.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz3wqnbyaijrf52gvsrmr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz3wqnbyaijrf52gvsrmr.png" alt=" " width="800" height="161"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  WINDOW FUNCTIONS IN SQL.
&lt;/h2&gt;

&lt;p&gt;Window functions perform calculations across a set of rows related to the current row without grouping the rows into a single output. They preserve the individual row details while providing additional contextual insights.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key components.&lt;/strong&gt;&lt;br&gt;
Window functions are defined using the mandatory OVER() clause which specifies how the rows are partitioned and ordered for the calculation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Common types of window functions.&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;1: RANKING&lt;/strong&gt; - Row_number, rank, dense_rank - It assigns numbers or ranks to rows based on order.&lt;br&gt;
&lt;strong&gt;Rank&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe2d9ulhdjtdkjeq7u2hh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe2d9ulhdjtdkjeq7u2hh.png" alt=" " width="800" height="179"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Row number&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhtosm2mn3bchyvbergpe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhtosm2mn3bchyvbergpe.png" alt=" " width="800" height="185"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;2: AGGREGATE&lt;/strong&gt; - sum, avg, min, max, count - calculates sum, averages or extremes across the window.&lt;br&gt;
&lt;strong&gt;Total&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F778973hfmmhr2xxny677.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F778973hfmmhr2xxny677.png" alt=" " width="800" height="157"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Average&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faaa1bisp59tzf4kvv3sz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faaa1bisp59tzf4kvv3sz.png" alt=" " width="800" height="140"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Count&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faibxzt9w0iz7v9nus8vl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faibxzt9w0iz7v9nus8vl.png" alt=" " width="800" height="152"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Max&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl30z85z4b781x5j3gvq8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl30z85z4b781x5j3gvq8.png" alt=" " width="800" height="143"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3: VALUE/OFFSET&lt;/strong&gt; - lag, lead - Accesses data from rows before, after or at specific points in the window.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;lag&lt;/strong&gt;&lt;br&gt;
Used to compare rows i.e. comparing current value to previous value&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5f3b4k0nfzmwkesqxsiu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5f3b4k0nfzmwkesqxsiu.png" alt=" " width="800" height="169"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;lead&lt;/strong&gt;&lt;br&gt;
looks forward to the next row or a specified number of rows ahead.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9twpy9wpx3ye5fpgpat2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9twpy9wpx3ye5fpgpat2.png" alt=" " width="800" height="163"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>database</category>
      <category>sql</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Translating Messy Data into Action Using Power BI.</title>
      <dc:creator>Kelvin</dc:creator>
      <pubDate>Sun, 08 Feb 2026 10:58:03 +0000</pubDate>
      <link>https://forem.com/murimikelvin/translating-messy-data-into-action-using-power-bi-3a02</link>
      <guid>https://forem.com/murimikelvin/translating-messy-data-into-action-using-power-bi-3a02</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0qk8zrz8lv72fjnodwcg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0qk8zrz8lv72fjnodwcg.png" alt=" " width="152" height="148"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Power BI Desktop is a free Windows application where you create reports and &lt;br&gt;
dashboards. You connect your data, clean it, build visualizations, and save your reports.&lt;/p&gt;

&lt;h2&gt;
  
  
  Power BI Interface.
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Ribbon&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Home tab: Get data, Transform data, New Measure, Publish.&lt;/li&gt;
&lt;li&gt;modelling tab: Manage relationships, Insert tab, Add visuals. &lt;/li&gt;
&lt;li&gt;view tab: Control layout, options.&lt;/li&gt;
&lt;li&gt;Format tab: Customize visuals. &lt;/li&gt;
&lt;li&gt;Insert: used to add visual and design elements to your report page&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Fields pane&lt;/strong&gt;&lt;br&gt;
Displays all data fields in your model. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Visualizations Pane&lt;/strong&gt; &lt;br&gt;
Select and configure charts, tables, visuals.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Views&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Report View: Create and visualize reports.
&lt;/li&gt;
&lt;li&gt;Table View: View data in tabular format.
&lt;/li&gt;
&lt;li&gt;Model View: Manage table relationships. &lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Uploading Excel files on Power BI.
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Load the Data&lt;/strong&gt; &lt;br&gt;
Open Power BI desktop then click on get data then Excel workbook. Select the excel file you intend to use then load the file. &lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftkt9d5qnxol26x2j9shp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftkt9d5qnxol26x2j9shp.png" alt=" " width="800" height="537"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Transform Data&lt;/strong&gt;&lt;br&gt;
Power Query is used to clean, transform, and prepare raw data before analysis.&lt;br&gt;
From the home tab, click on transform data to open power query.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F47eigzgvgonk9htysvv4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F47eigzgvgonk9htysvv4.png" alt=" " width="800" height="249"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Basic Transformations
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Rename Queries&lt;/strong&gt; &lt;br&gt;
Give your query a meaningful name for easier reference.&lt;br&gt;
In Power Query right-click the default query name (e.g., Sheet1). Click rename then type a meaningful name like KenyaCropsData or HospitalAdmissionsData. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Remove Rows&lt;/strong&gt; &lt;br&gt;
Helps clean blank or unwanted rows.&lt;br&gt;&lt;br&gt;
Options under remove rows:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Remove top rows – e.g., remove header rows accidentally repeated.
&lt;/li&gt;
&lt;li&gt;Remove bottom rows – to remove totals or notes at the bottom.
&lt;/li&gt;
&lt;li&gt;Remove blank rows – for completely empty rows.
&lt;/li&gt;
&lt;li&gt;Remove errors – from a column after converting to correct data types.
&lt;/li&gt;
&lt;li&gt;Remove duplicates – ensure data uniqueness. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Keep Rows&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Opposite of "Remove"; helps retain only what you need.&lt;br&gt;&lt;br&gt;
Options under keep rows  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keep Top Rows – e.g., top 10 hospitals or counties.
&lt;/li&gt;
&lt;li&gt;Keep Bottom Rows
&lt;/li&gt;
&lt;li&gt;Keep Range of Rows
&lt;/li&gt;
&lt;li&gt;Keep Duplicates – to analyze duplicate entries. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Filter Rows&lt;/strong&gt; &lt;br&gt;
Exclude invalid entries like “N/A”, “None”, or “Error”.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click the filter icon on the column.
&lt;/li&gt;
&lt;li&gt;Uncheck values like Error, N/A, None.
&lt;/li&gt;
&lt;li&gt;You can also apply Text Filters, Number Filters, or Date Filters for specific conditions.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Remove Blank or Error Rows&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select the first few rows to check for any completely blank rows or column headers repeated.
&lt;/li&gt;
&lt;li&gt;Use "Remove Rows" &amp;gt; "Remove Blank Rows" if any are found.
&lt;/li&gt;
&lt;li&gt;Also use "Remove Errors" on numeric columns like "Yield", "Market Price" etc., after converting them to proper types 
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsc34hmn6kzubkcax5u4j.png" alt=" " width="800" height="190"&gt;
you can check for blank rows by clicking on the dropdown next to the column name and checking from the list.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Data Types&lt;/strong&gt; &lt;br&gt;
For each column, set the correct data type:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Text: Farmer Name, County, Crop Type, Season, Crop Variety, Soil Type, Pest Control, etc.
&lt;/li&gt;
&lt;li&gt;Decimal Number: Planted Area, Yield (Kg), Market Price, Revenue, Cost of Production, Profit.
&lt;/li&gt;
&lt;li&gt;Whole Number or Decimal: Farmer Code, depending on use.
&lt;/li&gt;
&lt;li&gt;Date: Planting Date, Harvest Date.
&lt;/li&gt;
&lt;li&gt;Use Transform &amp;gt; Detect Data Type to auto-detect, then adjust manually if needed. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Rename Columns&lt;/strong&gt;&lt;br&gt;
Click on each column header and rename using friendly names&lt;br&gt;&lt;br&gt;
Fix typos like Crop Varie to crop variety, farmer Na to Farmer Name. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Handling Errors.&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Columns like “Yield”, “Crop Variety”, “Fertilizer Used” have errors like "Error" or invalid values.
&lt;/li&gt;
&lt;li&gt;For each column click the filter dropdown, deselect "Error", "N/A", or "None" if those aren't valid options. &lt;/li&gt;
&lt;li&gt;Or use Replace Values to change "Error" to null.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Replacing problematic values&lt;/strong&gt;&lt;br&gt;
Use Transform - replace Values:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Replace "Error" with "N/A" or null.
&lt;/li&gt;
&lt;li&gt;Replace "None" with null (if not meaningful)
&lt;/li&gt;
&lt;li&gt;"Short Rain" with Short Rains, "Long Rain" with Long Rains for consistency.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Format Text Data
&lt;/h2&gt;

&lt;p&gt;Standardize formatting of names and categories. &lt;br&gt;
Tools under Transform &amp;gt; Format:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Uppercase – for consistency (e.g., county names).
&lt;/li&gt;
&lt;li&gt;Lowercase
&lt;/li&gt;
&lt;li&gt;Capitalize Each Word – best for names like "Crop Type", "County".
&lt;/li&gt;
&lt;li&gt;Trim – removes leading/trailing spaces.
&lt;/li&gt;
&lt;li&gt;Clean – removes non-printable characters.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Standardize text capitalization.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
To capitalize the first letter, whole word in uppercase or lower case, clean or trim.&lt;br&gt;
Use Transform - Format - capitalize each word or select the preferred format.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Remove Duplicates&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
If each row must be unique (e.g., by Farmer + Crop Type + Season), select those columns then remove duplicates. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reorder or group columns&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can move related columns together (e.g., financials: Revenue, Cost, Profit). &lt;/li&gt;
&lt;li&gt;Click and drag columns to reorder&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Remove Columns/Rows&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Right click column header then remove
&lt;/li&gt;
&lt;li&gt;Filter column (e.g. exclude "Paid" rows) &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Split Columns&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
• Split Name column by space to get First and Last Name&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Merge Columns&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
• Select First and Last Name &amp;gt; Merge &amp;gt; Add separator then OK &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Group By&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Aggregate data (e.g., total Revenue by County).  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select the column to group by (e.g., County).
&lt;/li&gt;
&lt;li&gt;Go to Transform &amp;gt; Group By.
&lt;/li&gt;
&lt;li&gt;Choose aggregation: Sum, Count, Average, etc.
&lt;/li&gt;
&lt;li&gt;Add additional groupings if needed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Sort&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Organize your data logically.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click column &amp;gt; Home &amp;gt; Sort Ascending or Descending.
&lt;/li&gt;
&lt;li&gt;Can sort alphabetically (A-Z), numerically, or by date.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How to Deal with Blanks in Power Query
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Identify Blank Cells&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Blanks show up as:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Empty cells (null values).
&lt;/li&gt;
&lt;li&gt;Cells with “N/A”, “None”, “Error” — these are not technically blank, but should be treated as such. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;How to Find Them:&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click the filter icon on any column.
&lt;/li&gt;
&lt;li&gt;If blanks exist, you'll see (null) as a filter option.
&lt;/li&gt;
&lt;li&gt;look for custom errors like "Error", "N/A", "None" especially in text columns. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Replace Blanks with Default or Meaningful Values&lt;/strong&gt; &lt;br&gt;
Use when the column is important and must not be left empty.&lt;br&gt;&lt;br&gt;
Select the column - go to Transform - Replace Values, Replace null with:&lt;br&gt;&lt;br&gt;
"Unknown" for text, 0 for numbers or "No Data" or "Not Provided" for descriptions.&lt;br&gt;&lt;br&gt;
&lt;em&gt;You can also use Transform &amp;gt; Replace Errors for error-based blanks.&lt;/em&gt;  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fill Down or Fill Up&lt;/strong&gt; &lt;br&gt;
Use when the blank value should be copied from the row above or below.&lt;br&gt;&lt;br&gt;
Example Use Case: If a “County” or “Farmer Name” is listed only once and applies to several &lt;br&gt;
rows.&lt;br&gt;&lt;br&gt;
Select the column.&lt;br&gt;&lt;br&gt;
Go to Transform &amp;gt; Fill &amp;gt; Down or Up.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Remove Rows With Blanks&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Use when:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The blank row has missing critical info (like Revenue, Yield, or Crop Type). &lt;/li&gt;
&lt;li&gt;The row has too many blanks and is not usable.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;After cleaning&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click Close &amp;amp; Apply to load your cleaned data into Power BI.
&lt;/li&gt;
&lt;li&gt;Always check the resulting table in the data view for issues.
&lt;/li&gt;
&lt;li&gt;Ensure your work is saved.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  DAX
&lt;/h1&gt;

&lt;p&gt;DAX (Data Analysis Expressions) is a formula language used in Power BI to create &lt;br&gt;
calculations and data analysis logic. DAX is used to build measures, calculated columns, and &lt;br&gt;
calculated tables that help transform raw data into meaningful insights. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Aggregation&lt;/strong&gt;&lt;br&gt;
Aggregation is the process of combining multiple rows of data into a single summarized value. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Types of Aggregation Functions in DAX&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Simple aggregation functions (work directly on a column) &lt;/li&gt;
&lt;li&gt;Iterator aggregation functions (end with X and work row by row on an expression)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;SUM&lt;/strong&gt;&lt;br&gt;
SUM adds all numeric values in a column.&lt;br&gt;
SUM works on one numeric column and returns the total based on the current filters. &lt;br&gt;
&lt;code&gt;Total Revenue = SUM('Kenya Crops'[Revenue (KES)])&lt;/code&gt;&lt;br&gt;
&lt;strong&gt;SUMX&lt;/strong&gt; &lt;br&gt;
SUMX evaluates an expression for each row in a table and then sums those results.&lt;br&gt;
SUMX is used when the value you want to sum is not stored as a single column but must be calculated row by row first. &lt;br&gt;
&lt;code&gt;Total Revenue (SUMX) = SUMX('Kenya Crops', 'Kenya Crops'[Yield (Kg)] * 'Kenya Crops'[Market Price (KES/Kg)])&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AVERAGE&lt;/strong&gt; &lt;br&gt;
AVERAGE calculates the mean of a numeric column. &lt;br&gt;
&lt;code&gt;Average Yield = AVERAGE('Kenya Crops'[Yield (Kg)])&lt;/code&gt;&lt;br&gt;
&lt;strong&gt;AVERAGEX&lt;/strong&gt; &lt;br&gt;
AVERAGEX evaluates an expression for each row and then returns the average of those results.&lt;br&gt;
&lt;code&gt;Average Profit per Acre = &lt;br&gt;
AVERAGEX('Kenya Crops', DIVIDE('Kenya Crops'[Profit (KES)], 'Kenya Crops'[Planted Area (Acres)]))&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MEDIAN&lt;/strong&gt; &lt;br&gt;
MEDIAN returns the middle value in a column when the values are sorted. &lt;br&gt;
&lt;code&gt;Median Yield = MEDIAN('Kenya Crops'[Yield (Kg)])&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MEDIANX&lt;/strong&gt;&lt;br&gt;
MEDIANX evaluates an expression for each row and then returns the middle value of the results&lt;br&gt;
&lt;code&gt;Median Revenue = MEDIANX('Kenya Crops', 'Kenya Crops'[Revenue (KES)])&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MIN&lt;/strong&gt; &lt;br&gt;
MIN returns the smallest value in a column.&lt;br&gt;
 &lt;code&gt;Minimum Yield = MIN('Kenya Crops'[Yield (Kg)])&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MINX&lt;/strong&gt; &lt;br&gt;
MINX evaluates an expression for each row and returns the smallest result. &lt;br&gt;
&lt;code&gt;Minimum Profit per Acre = MINX( 'Kenya Crops', DIVIDE( 'Kenya Crops'[Profit (KES)], 'Kenya Crops'[Planted Area(Acres)]))&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MAX&lt;/strong&gt; &lt;br&gt;
MAX returns the largest value in a column.&lt;br&gt;
&lt;code&gt;Maximum Yield = MAX('Kenya Crops'[Yield (Kg)])&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MAXX&lt;/strong&gt;&lt;br&gt;
MAXX evaluates an expression for each row and returns the largest result. &lt;br&gt;
&lt;code&gt;Maximum Profit per Acre = MAXX('Kenya Crops', DIVIDE('Kenya Crops'[Profit (KES)], 'Kenya Crops'[Planted Area (Acres)]))&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;COUNT&lt;/strong&gt; counts numeric values in a single column.&lt;br&gt;
&lt;strong&gt;COUNTROWS&lt;/strong&gt; counts the number of rows in a table.&lt;br&gt;
&lt;strong&gt;COUNTX&lt;/strong&gt; counts rows where an expression returns a non-blank value.&lt;br&gt;
&lt;strong&gt;ABS&lt;/strong&gt; (Absolute Value) returns the absolute (positive) value of a number. &lt;br&gt;
&lt;strong&gt;POWER&lt;/strong&gt; raises a number to a given power.&lt;br&gt;
&lt;strong&gt;SQRT&lt;/strong&gt; (Square Root) returns the square root of a number.&lt;br&gt;
&lt;strong&gt;MOD&lt;/strong&gt; returns the remainder after division. &lt;/p&gt;

&lt;h1&gt;
  
  
  LOGICAL FUNCTIONS IN DAX.
&lt;/h1&gt;

&lt;p&gt;Logical functions in DAX are used to make decisions based on conditions. They allow Power BI &lt;br&gt;
to answer “yes or no” questions, classify data into categories, apply business rules, and control &lt;br&gt;
how results are calculated and displayed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;IF FUNCTION&lt;/strong&gt;&lt;br&gt;
 Evaluates a condition and returns one value if the condition is true and another value if the condition is false.&lt;br&gt;
&lt;code&gt;Profit Status = IF(SUM('Kenya Crops'[Profit (KES)]) &amp;gt; 0, "Profitable", "Loss")&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NESTED IF STATEMENTS&lt;/strong&gt;&lt;br&gt;
A nested IF occurs when one IF function is placed inside another IF. This is used when more than two outcomes are required. &lt;code&gt;Profit Level = IF('Kenya Crops'[Profit (KES)] &amp;lt; 0, "Loss", IF('Kenya Crops'[Profit (KES)] &amp;lt; 50000, "Low Profit", "High Profit"))&lt;/code&gt;&lt;br&gt;
This logic first checks for losses. If the farm is not making a loss, it then checks whether profit is &lt;br&gt;
low or high. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;IF WITH AND (AND FUNCTION)&lt;/strong&gt; &lt;br&gt;
The AND function is used when all conditions must be true for a result to be returned. &lt;br&gt;
&lt;code&gt;High Yield &amp;amp; Profitable = IF( AND( 'Kenya Crops'[Yield (Kg)] &amp;gt; 2000, 'Kenya Crops'[Profit (KES)] &amp;gt; 0 ), "Yes", "No")&lt;/code&gt;&lt;br&gt;
Here, a farm is marked “Yes” only if it has both high yield and positive profit. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;IF WITH &amp;amp;&amp;amp; (LOGICAL AND OPERATOR)&lt;/strong&gt;&lt;br&gt;
The &amp;amp;&amp;amp; operator performs the same function as AND, but it is more concise and commonly used in professional DAX code.&lt;br&gt;
&lt;code&gt;High Yield Large Farm = IF('Kenya Crops'[Yield (Kg)] &amp;gt; 2000 &amp;amp;&amp;amp;'Kenya Crops'[Planted Area (Acres)] &amp;gt; 10, "Qualified", "Not Qualified")&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;IF WITH OR (OR FUNCTION)&lt;/strong&gt;&lt;br&gt;
The OR function is used when at least one condition must be true. &lt;br&gt;
&lt;code&gt;Risk Category = IF(OR('Kenya Crops'[Profit (KES)] &amp;lt; 0, 'Kenya Crops'[Weather Impact] = "Severe" ), "High Risk", "Normal")&lt;/code&gt;&lt;br&gt;
A farm is classified as high risk if it made a loss or experienced severe weather.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;IF WITH || (LOGICAL OR OPERATOR)&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;High Risk Farm = IF('Kenya Crops'[Profit (KES)] &amp;lt; 0 || 'Kenya Crops'[Weather Impact] = "Severe", "High Risk", "Low Risk")&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;THE SWITCH FUNCTION&lt;/strong&gt;&lt;br&gt;
The SWITCH function is a cleaner and more readable alternative to nested IF statements. It is &lt;br&gt;
ideal when there are many possible outcomes.&lt;br&gt;
&lt;code&gt;Profit Category = SWITCH(TRUE(), 'Kenya Crops'[Profit (KES)] &amp;lt; 0, "Loss", 'Kenya Crops'[Profit (KES)] &amp;lt; 50000, "Low Profit", 'Kenya Crops'[Profit (KES)] &amp;lt; 200000, "Medium Profit", "High Profit")&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NOT FUNCTION&lt;/strong&gt; &lt;br&gt;
The NOT function reverses a logical condition.&lt;br&gt;
&lt;code&gt;Irrigation Type = IF(NOT('Kenya Crops'[Irrigation Method] = "Irrigated"), "Rain-fed", "Irrigated")&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ISBLANK FUNCTION&lt;/strong&gt; &lt;br&gt;
The ISBLANK function checks whether a value is blank.&lt;br&gt;
&lt;code&gt;Yield Availability = IF(ISBLANK('Kenya Crops'[Yield (Kg)]), "Missing", "Available")&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CALCULATE FUNCTION&lt;/strong&gt;&lt;br&gt;
The CALCULATE function is the most important function in DAX. It evaluates an expression under a modified filter context. &lt;br&gt;
In simple terms, CALCULATE changes “what data is being used” for a calculation.&lt;br&gt;
&lt;code&gt;Maize Revenue = CALCULATE( SUM('Kenya Crops'[Revenue (KES)]), 'Kenya Crops'[Crop Type] = "Maize")&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CALCULATE WITH MULTIPLE FILTERS&lt;/strong&gt; &lt;br&gt;
CALCULATE can accept more than one filter. All filters are combined using AND logic, &lt;br&gt;
meaning all conditions must be true.&lt;br&gt;
&lt;code&gt;Maize Long Rains Revenue = CALCULATE(SUM('Kenya Crops'[Revenue (KES)]), 'Kenya Crops'[Crop Type] = "Maize", 'Kenya Crops'[Season] = "Long Rains")&lt;/code&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  DAX TEXT FUNCTIONS
&lt;/h1&gt;

&lt;p&gt;They help clean text, create readable labels, combine fields, and standardize values for analysis and &lt;br&gt;
reporting.&lt;br&gt;
&lt;strong&gt;CONCATENATE / CONCATENATEX&lt;/strong&gt;&lt;br&gt;
CONCATENATE joins two text values into one. Used to create readable labels, combine fields, or build descriptive columns.&lt;br&gt;
&lt;code&gt;County Crop = 'Kenya Crops'[County] &amp;amp; " - " &amp;amp; 'Kenya Crops'[Crop Type]&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LEFT, RIGHT, MID&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LEFT extracts characters from the start. &lt;/li&gt;
&lt;li&gt;RIGHT extracts characters from the end. &lt;/li&gt;
&lt;li&gt;MID extracts text from the middle.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Extract first 3 letters &lt;br&gt;
&lt;code&gt;Crop Code = LEFT('Kenya Crops'[Crop Type], 3)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Extracts last 4 digits&lt;br&gt;
&lt;code&gt;Farmer Code Short = RIGHT('Kenya Crops'[Farmer Code], 4)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;UPPER, LOWER, PROPER&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;UPPER → all caps &lt;/li&gt;
&lt;li&gt;LOWER → all lowercase &lt;/li&gt;
&lt;li&gt;PROPER → first letter capitalized&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Standardize county names&lt;br&gt;
&lt;code&gt;County Clean = UPPER('Kenya Crops'[County])&lt;/code&gt;&lt;br&gt;
Make crop types readable&lt;br&gt;
&lt;code&gt;Crop Type Clean = PROPER('Kenya Crops'[Crop Type])&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LEN, TRIM, CLEAN&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LEN counts characters &lt;/li&gt;
&lt;li&gt;TRIM removes extra spaces &lt;/li&gt;
&lt;li&gt;CLEAN removes hidden non-printable characters &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Removes extra spaces&lt;br&gt;
&lt;code&gt;County Cleaned = TRIM('Kenya Crops'[County])&lt;/code&gt;&lt;br&gt;
Checks text length&lt;br&gt;
&lt;code&gt;County Length = LEN('Kenya Crops'[County])&lt;/code&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  DATE AND TIME FUNCTIONS
&lt;/h1&gt;

&lt;p&gt;Date and time functions are used to work with dates, extract parts of dates, and perform date&lt;br&gt;
based calculations. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DATE, YEAR, MONTH, DAY&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DATE creates a date &lt;/li&gt;
&lt;li&gt;YEAR extracts the year &lt;/li&gt;
&lt;li&gt;MONTH extracts the month &lt;/li&gt;
&lt;li&gt;DAY extracts the day&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Extracts tear from planting date&lt;br&gt;
&lt;code&gt;Planting Year = YEAR('Kenya Crops'[Planting Date])&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TODAY and NOW&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;TODAY returns today’s date &lt;/li&gt;
&lt;li&gt;NOW returns current date and time&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  TIME INTELLIGENCE FUNCTIONS
&lt;/h2&gt;

&lt;p&gt;Time intelligence functions allow you to compare performance across time periods.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DATEADD&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
DATEADD shifts the current date context backward or forward.&lt;br&gt;
&lt;code&gt;Revenue Last Year = CALCULATE([Total Revenue], DATEADD('Date'[Date], -1, YEAR))&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DATEDIFF&lt;/strong&gt;&lt;br&gt;
Calculates the difference between two dates&lt;br&gt;
&lt;code&gt;Growth Days = DATEDIFF( 'Kenya Crops'[Planting Date], 'Kenya Crops'[Harvest Date], DAY )&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SAMEPERIODLASTYEAR&lt;/strong&gt;&lt;br&gt;
Returns the same period as the current one, but last year.&lt;br&gt;
&lt;code&gt;Revenue last year = CALCULATE([Total Revenue], SAMEPERIODLASTYEAR('Date'[Date]))&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TOTALYTD, TOTALMTD, TOTALQTD&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Year-to-date &lt;/li&gt;
&lt;li&gt;Month-to-date &lt;/li&gt;
&lt;li&gt;Quarter-to-date&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Year-to-date revenue&lt;br&gt;
&lt;code&gt;Revenue YTD = TOTALYTD([Total Revenue], 'Date'[Date])&lt;/code&gt;&lt;br&gt;
Month-to-date profit &lt;br&gt;
&lt;code&gt;Profit MTD = TOTALMTD( [Total Profit], 'Date'[Date])&lt;/code&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Dashboards
&lt;/h1&gt;

&lt;p&gt;A dashboard is a single-page, interactive summary view of your most important data and KPIs.&lt;br&gt;
Dashboards helps you monitor performance, track metrics, make quick decisions and share insights easily.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjj0lv39t297mxfsda0n5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjj0lv39t297mxfsda0n5.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From Report view on visualizations select the presentation you would like to use. i.e. column charts, pie charts, slicers, bar charts, tables, etc.&lt;br&gt;
From data select the data you want on the x and y axis. i.e. county, Total revenue, Harvest date.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs5jmtrm78s2k499hyztw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs5jmtrm78s2k499hyztw.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Add KPIs using the card visualization and slicers using the slicer.&lt;/li&gt;
&lt;li&gt;Create a dashboard by copying visualizations from one sheet to the Dashboard sheet. &lt;/li&gt;
&lt;li&gt;Ensure your dashboard is not redundant. The visualizations should show different comparisons and should have a theme.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftpz0q0xbcfo7nu66y2mq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftpz0q0xbcfo7nu66y2mq.png" alt=" " width="800" height="452"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>analytics</category>
      <category>beginners</category>
      <category>microsoft</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Schemas and Data modelling in Power BI.</title>
      <dc:creator>Kelvin</dc:creator>
      <pubDate>Sun, 01 Feb 2026 16:56:17 +0000</pubDate>
      <link>https://forem.com/murimikelvin/schemas-and-data-modelling-in-power-bi-4184</link>
      <guid>https://forem.com/murimikelvin/schemas-and-data-modelling-in-power-bi-4184</guid>
      <description>&lt;p&gt;In Power BI, schemas and data modelling are about how you structure tables and define relationships so your data is accurate, fast, and easy to analyze.&lt;/p&gt;

&lt;p&gt;A &lt;em&gt;&lt;strong&gt;schema&lt;/strong&gt;&lt;/em&gt; describes how tables are organized and related. It is framework that defines how data is structured, organized, and related within a system.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Data modelling&lt;/strong&gt;&lt;/em&gt; is the process of connecting multiple data sources, defining relationships between them, and creating calculations to transform raw data into a structured model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;Dimensional schemas&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Star schema.&lt;/li&gt;
&lt;li&gt;Snowflake schema.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Star schema.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F555cd1cn0p4jlrjgpvng.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F555cd1cn0p4jlrjgpvng.png" alt=" " width="800" height="543"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Star Schema is the industry standard data modelling approach for Power BI. It gets its name from its visual layout: a central fact table surrounded by dimension tables, resembling a star.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft14bj9p3c9mj9rix2aj0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft14bj9p3c9mj9rix2aj0.png" alt=" " width="587" height="579"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Components of a Star schema&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; &lt;strong&gt;Fact table stores&lt;/strong&gt; quantitative or measurable data.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Dimension table&lt;/strong&gt; stores descriptive data e.g. date, product name.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Relationships&lt;/strong&gt; describe how fact tables and dimension tables are connected and how filters flow between them.
Features of Dimension Tables.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Fields.&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Primary Keys&lt;/strong&gt;&lt;br&gt;
Every record in a dimension table is uniquely identified by its main key.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attributes&lt;/strong&gt;&lt;br&gt;
These are descriptive fields that provide context.&lt;br&gt;
Features of Fact Tables&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Foreign Keys&lt;/strong&gt;&lt;br&gt;
The keys connect to dimension tables and offer context for the measures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Measures&lt;/strong&gt;&lt;br&gt;
The provided figures are numerical data that quantify business performance indicators, such as sales income and units sold.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Relationships in star schema&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;One-to-Many - one row from a table is linked to many rows in another table.&lt;/li&gt;
&lt;li&gt;One-to-One - Each row in one table matches one row in another table, and vice versa.&lt;/li&gt;
&lt;li&gt;Many-to-One - many rows in a table relate to one row in another table.&lt;/li&gt;
&lt;li&gt;Many-to-Many - multiple rows from one table match multiple rows in another table.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Normalization and Denormalization.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Normalization&lt;/strong&gt; is the term used to describe data that's stored in a way that reduces redundancy.&lt;br&gt;
&lt;strong&gt;Denormalization&lt;/strong&gt; is the process of flattening related dimension data into single wide tables to improve analytical performance and simplify reporting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Snowflake schema.&lt;/strong&gt;&lt;br&gt;
A snowflake schema is a variation of the star schema where dimension tables are normalized into multiple related tables instead of being kept as one wide table.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2cl82hxfads3l3cf71x6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2cl82hxfads3l3cf71x6.png" alt=" " width="244" height="206"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Features of snowflake schema&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Normalized dimension tables - dimension tables are normalized into third normal form (3NF) or higher splitting them into smaller and granular sub tables.&lt;/li&gt;
&lt;li&gt;Hierarchical structure - dimensions are organized into hierarchical layers (e.g. product - category -manufacturer) which naturally represents complex business relationships.&lt;/li&gt;
&lt;li&gt;Reduced data redundancy - by eliminating repeated values in dimension tables, this schema significantly reduces the risk of data inconsistencies.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Components of snowflake schema.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fact table is the core of the schema storing the primary quantitative data or measures of business processes. It contains multiple foreign keys that link directly to the primary dimension tables. It captures events at their most atomic level, typically resulting in the largest table in the warehouse.&lt;/li&gt;
&lt;li&gt;Normalized dimension tables that surround the fact table and provide descriptive context for the recorded facts.
Unlike the star schema these tables are normalized meaning repeated descriptive values are moved into separate tables reducing redundancy.&lt;/li&gt;
&lt;li&gt;Lookup function that acts as the first level of description for the fact table.&lt;/li&gt;
&lt;li&gt;Sub dimension tables are formed by further splitting or snowflaking the main dimension tables.&lt;/li&gt;
&lt;li&gt;Hierarchical layers represent multiple levels of a hierarchy. e.g. a product dimension may link to a category table which then links to a manufacturer table.&lt;/li&gt;
&lt;li&gt;Maintenance enhance data integrity because updates to a category name that only need to happen in one record rather than multiple records.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key fields in Snowflake schema.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Primary keys&lt;/strong&gt; are unique identifiers in dimension and sub dimension tables e.g. ProductID.&lt;br&gt;
&lt;strong&gt;Foreign Keys&lt;/strong&gt; are fields in the fact table or dimension tables that reference a primary key in a related table to form a join.&lt;br&gt;
&lt;strong&gt;Surrogate keys&lt;/strong&gt; are system generated integers used as keys instead of natural business keys to improve performance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Importance of good modelling.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Proper modelling prevents logical errors in relationships that can lead to double counting or missing data.&lt;/li&gt;
&lt;li&gt;Clear naming conventions and hidden technical fields make the model easier for non technical users to navigate.&lt;/li&gt;
&lt;li&gt;Simple relationships between a central fact table and surrounding dimension tables reduce ambiguity.&lt;/li&gt;
&lt;li&gt;Centralized dimension tables ensure every visual uses the same definitions preventing different charts from showing conflicting totals for the same metrics.&lt;/li&gt;
&lt;li&gt;A modular model allows you to add new data sources or dimensions without rebuilding the entire system.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>analytics</category>
      <category>architecture</category>
      <category>data</category>
      <category>dataengineering</category>
    </item>
    <item>
      <title>Introduction to MS Excel for data analytics</title>
      <dc:creator>Kelvin</dc:creator>
      <pubDate>Fri, 23 Jan 2026 17:24:07 +0000</pubDate>
      <link>https://forem.com/murimikelvin/introduction-to-ms-excel-for-data-analytics-2n1f</link>
      <guid>https://forem.com/murimikelvin/introduction-to-ms-excel-for-data-analytics-2n1f</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzxe73k6elv4t5a7vlw56.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzxe73k6elv4t5a7vlw56.png" alt=" " width="300" height="168"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Overview.&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Microsoft Excel is a spreadsheet program used to organize, analyze, calculate, and visualize data in rows and columns. Its core structure uses cells, where data is entered, allowing for complex calculations and data insights, making it essential for professionals in many fields. &lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Excel interface&lt;/strong&gt;.
&lt;/h3&gt;

&lt;p&gt;Excel's user interface is designed for efficiency, with a grid-based workspace where data is organized into rows and columns. &lt;br&gt;
A &lt;strong&gt;workbook&lt;/strong&gt; is the file containing multiple tabs at the bottom&lt;br&gt;
A &lt;strong&gt;worksheet&lt;/strong&gt; is a grid of cells identified by column letters A,B,C and row numbers 1,2,3.forming addresses like A2,b3.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6uta4ih1olrwm9h3jaqk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6uta4ih1olrwm9h3jaqk.png" alt=" " width="800" height="452"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Ribbon&lt;/strong&gt; is the top toolbar with tabs like Home, Insert, Formulas, Data, and Review.&lt;br&gt;
The &lt;strong&gt;Formula bar&lt;/strong&gt; displays the content of the selected cell.&lt;br&gt;
The &lt;strong&gt;Name box&lt;/strong&gt; shows the active cell's address or named ranges.&lt;br&gt;
The &lt;strong&gt;Status bar&lt;/strong&gt; at the bottom, it provides quick stats like sum or average of selected cells.&lt;/p&gt;

&lt;h2&gt;
  
  
  Removing duplicates.
&lt;/h2&gt;

&lt;p&gt;In Excel, you can remove duplicates by selecting your data, going to the Data tab, clicking Remove Duplicates, choosing the columns to check, and confirming to keep only unique records&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7tsyxyl9nmb10ebjoo0s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7tsyxyl9nmb10ebjoo0s.png" alt=" " width="800" height="453"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conditional formatting.
&lt;/h2&gt;

&lt;p&gt;Conditional formatting in Excel lets you automatically change the appearance of cells (colour, icons, data bars) based on rules, making patterns and issues easy to spot. You apply it by selecting your data, going to Home then conditional Formatting, choosing a rule such as highlighting values greater than a target, and setting the desired format.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgan2gtpynp7hd70thf7o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgan2gtpynp7hd70thf7o.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Freeze panes.
&lt;/h2&gt;

&lt;p&gt;Freeze Panes in Excel keeps specific rows or columns visible while you scroll through your worksheet, making large datasets easier to read. You use it by selecting a cell below the row and to the right of the column you want to keep visible.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd4lm2nj04t67p0mpt9jn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd4lm2nj04t67p0mpt9jn.png" alt=" " width="800" height="452"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Data filter
&lt;/h2&gt;

&lt;p&gt;A data filter in Excel lets you display only the rows that meet specific criteria, making it easier to analyze large datasets. You apply it by selecting your data and clicking Data then Filter, then using the dropdown arrows in the column headers to filter by values, text, numbers, dates, or conditions.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fppd97ro99j6vt2si1mw6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fppd97ro99j6vt2si1mw6.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Data sorting.
&lt;/h2&gt;

&lt;p&gt;Data sorting in Excel arranges data in a specific order—such as A to Z, Z to A, smallest to largest, or by date—to make information easier to analyze. You sort data by selecting a column, going to Data then Sort, choosing the sort order and if needed multiple levels for more complex sorting&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8sgi4evrhe4mliadjbvq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8sgi4evrhe4mliadjbvq.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Excel functions.
&lt;/h2&gt;

&lt;p&gt;Microsoft Excel functions are predefined formulas that perform specific calculations or operations on data. They are essential for data analytics, enabling users to manipulate, analyze, and visualize datasets efficiently without writing custom code from scratch&lt;/p&gt;

&lt;h2&gt;
  
  
  Types of excel functions.
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Aggregate functions
&lt;/h2&gt;

&lt;p&gt;Aggregate functions summarize multiple rows of data into a single meaningful value.&lt;br&gt;
&lt;code&gt;sum =SUM(B2:B100)&lt;/code&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fngma0al0lg58xgrwbqw4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fngma0al0lg58xgrwbqw4.png" alt=" " width="800" height="614"&gt;&lt;/a&gt;&lt;br&gt;
average &lt;code&gt;=AVERAGE(B2:B100)&lt;/code&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdt5xp0yca619lew4ec4i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdt5xp0yca619lew4ec4i.png" alt=" " width="800" height="685"&gt;&lt;/a&gt;&lt;br&gt;
maximum &lt;code&gt;=MAX(B2:B100)&lt;/code&gt;&lt;br&gt;
minimum &lt;code&gt;=MIN(B2:B100)&lt;/code&gt;&lt;br&gt;
count &lt;code&gt;=COUNT(B2:B100)&lt;/code&gt;&lt;br&gt;
&lt;strong&gt;Conditional aggregation&lt;/strong&gt;&lt;br&gt;
Conditional aggregation is the process of summarizing data only when specific conditions are met.&lt;br&gt;
SUMIF() aggregates values based on a single condition.&lt;br&gt;
SUMIFS() aggregates values based on multiple conditions.&lt;br&gt;
COUNTIF Used to count records that meet one or more conditions.&lt;br&gt;
COUNTIFS Used to count records that meet multiple conditions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Logical functions
&lt;/h2&gt;

&lt;p&gt;logical functions automate decision making.&lt;br&gt;
&lt;strong&gt;IF&lt;/strong&gt;&lt;br&gt;
If function performs a logical test and returns value if the test is true.&lt;br&gt;
     &lt;code&gt;=IF(J2&amp;gt;6, "high performance", "low performance")&lt;/code&gt;&lt;br&gt;
&lt;strong&gt;Nested IF&lt;/strong&gt;&lt;br&gt;
A nested if is simply an if formula inside another if formula. It lets you test multiple conditions.&lt;br&gt;
  &lt;code&gt;=IF(R2&amp;gt;30,"Highly Experienced",IF(R2&amp;gt;20,"Moderately Experienced",IF(R2&amp;gt;10,"Low Experience","Very Low Experience")))&lt;/code&gt;&lt;br&gt;
&lt;strong&gt;And&lt;/strong&gt;&lt;br&gt;
And function returns value when two conditions have to be met.&lt;br&gt;
   &lt;code&gt;=IF(AND(R2&amp;gt;30,W2&amp;gt;10),"assign bonus", "do not assign")&lt;/code&gt;&lt;br&gt;
&lt;strong&gt;Or&lt;/strong&gt; &lt;br&gt;
Or function returns value when either of the conditions have been met&lt;br&gt;
   &lt;code&gt;=IF(AND(G2&amp;gt;1/1/2006, L2&amp;gt;6), "promoted", "not promoted")&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Lookup functions.
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;VLOOKUP&lt;/strong&gt; (vertical lookup) function is used to look up a value in the first column of a table and return a related value from another column.&lt;br&gt;
        &lt;code&gt;=VLOOKUP(10871, A2:AA877, 5, FALSE)&lt;/code&gt; 5 is column index number.&lt;br&gt;
&lt;strong&gt;HLOOKUP&lt;/strong&gt; (Horizontal Lookup) is used when your lookup values are in the top row of a table and you want to return a value from a row below.&lt;br&gt;
        &lt;code&gt;=HLOOKUP(10011, A1:AGS21, 4, FALSE)&lt;/code&gt; &lt;br&gt;
&lt;em&gt;10011 is data in a cell&lt;/em&gt;&lt;br&gt;
&lt;strong&gt;INDEX&lt;/strong&gt;&lt;br&gt;
  Index function returns the value from a actual position in a range.&lt;br&gt;
        &lt;code&gt;=INDEX(G2:G877, MATCH(10871, F2:F877, 0))&lt;/code&gt;&lt;br&gt;
&lt;strong&gt;MATCH&lt;/strong&gt;&lt;br&gt;
Match function Finds the position of a value in a range&lt;/p&gt;

&lt;h2&gt;
  
  
  Text functions.
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Trim&lt;/strong&gt; removes extra spaces. &lt;code&gt;=TRIM(A2)&lt;/code&gt;&lt;br&gt;
&lt;strong&gt;Len&lt;/strong&gt; returns length of a string.&lt;code&gt;=LEN(A2)&lt;/code&gt;&lt;br&gt;
&lt;strong&gt;Left&lt;/strong&gt; extracts the leftmost characters of a string. &lt;code&gt;=LEFT(A2,4)&lt;/code&gt;returns first 4 characters.&lt;br&gt;
&lt;strong&gt;Right&lt;/strong&gt; function extracts the rightmost characters of a string. &lt;code&gt;=RIGHT(A2,2)&lt;/code&gt; returns last 2 characters.&lt;br&gt;
&lt;strong&gt;Mid&lt;/strong&gt; extracts characters from the middle of a string. &lt;br&gt;
&lt;code&gt;=MID(b2,2,3)&lt;/code&gt;&lt;br&gt;
&lt;strong&gt;Concatenation&lt;/strong&gt;  used to combine two texts. &lt;br&gt;
&lt;code&gt;=CONCAT(C2, " ", C3&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Date and Time functions.
&lt;/h2&gt;

&lt;p&gt;Now function returns the current date and time and updates automatically. &lt;br&gt;
&lt;code&gt;=NOW()&lt;/code&gt;&lt;br&gt;
today function returns the current date &lt;br&gt;
&lt;code&gt;=today()&lt;/code&gt;&lt;br&gt;
end of month returns end of 3 months &lt;code&gt;=eomonth(f2,3)&lt;/code&gt; i.e. 22/1/2026 returns 30/4/2026&lt;br&gt;
network function returns difference between two days excluding weekends.&lt;code&gt;=networkdays(k2,j2)&lt;/code&gt;&lt;br&gt;
date difference returns difference between two dates. &lt;code&gt;=datedif(k2,j2, "M")&lt;/code&gt;&lt;br&gt;
edate returns today's date added 3 months &lt;code&gt;=edate(f2,3)&lt;/code&gt;  ie 22/1/2026 returns 22/4/2026&lt;/p&gt;

&lt;h2&gt;
  
  
  Pivot tables, Pivot charts &amp;amp; Dashboards
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Pivot table.
&lt;/h2&gt;

&lt;p&gt;A Pivot Table is an Excel tool used to summarize, analyze, and explore data quickly without writing formulas.&lt;br&gt;
To insert a pivot table on excel click on the data you want to create a pivot table then click on insert on the toolbar, then click on pivot table.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fstlolbsbagrr5fkyhkze.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fstlolbsbagrr5fkyhkze.png" alt=" " width="800" height="451"&gt;&lt;/a&gt;&lt;br&gt;
From the pivot table fields on the right, drag fields to the desired area i.e. filter, rows, columns, and values.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwphrhkxe7dmqa6l7un3n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwphrhkxe7dmqa6l7un3n.png" alt=" " width="800" height="593"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Slicers.
&lt;/h2&gt;

&lt;p&gt;A Slicer is a visual filter that lets you interactively filter data in Pivot Tables and using clickable buttons.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhzxoxzfcqdlkuxo1byt6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhzxoxzfcqdlkuxo1byt6.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Pivort charts.
&lt;/h2&gt;

&lt;p&gt;Pivot Charts are charts that are directly linked to Pivot Tables. When the Pivot Table changes the chart updates automatically.&lt;br&gt;
From the pivot table analyze menu click on the pivot chart and select the desired chart.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa3cstmpyot9xzu4qw1r4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa3cstmpyot9xzu4qw1r4.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Dashboards.
&lt;/h2&gt;

&lt;p&gt;A Dashboard is a single screen report that shows key metrics and insights using charts, PivotTables, and visuals so decision makers can understand performance at a glance.&lt;br&gt;
To create a dashboard create a new worksheet, rename it to &lt;em&gt;Dashboard&lt;/em&gt;. open the worksheet and select a wide area or &lt;code&gt;ctrl+A&lt;/code&gt; then fill with your desired background colour. Copy your pivot charts and slicers from the separate worksheets and paste them on your dashboard worksheet. Add shapes to add text e.g. "dashboard title" then resize them to the desired sizes.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjyi881qu9xh1s1wjthhx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjyi881qu9xh1s1wjthhx.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>analytics</category>
      <category>beginners</category>
      <category>microsoft</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
