<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Kaira Kelvin. </title>
    <description>The latest articles on Forem by Kaira Kelvin.  (@gateru).</description>
    <link>https://forem.com/gateru</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1174838%2F5d51c4b7-4a44-4df2-8a04-fd3f2c7b4e14.jpg</url>
      <title>Forem: Kaira Kelvin. </title>
      <link>https://forem.com/gateru</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/gateru"/>
    <language>en</language>
    <item>
      <title>Putting it All together: Dashboard Design Tips</title>
      <dc:creator>Kaira Kelvin. </dc:creator>
      <pubDate>Thu, 18 Dec 2025 07:11:53 +0000</pubDate>
      <link>https://forem.com/gateru/formatting-in-power-bi-4kj9</link>
      <guid>https://forem.com/gateru/formatting-in-power-bi-4kj9</guid>
      <description>&lt;ul&gt;
&lt;li&gt;A complete guide that will help BI developers create powerful dashboards.
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhfcjobwv6vprsb5two3l.png" alt=" " width="600" height="342"&gt;
&lt;/li&gt;
&lt;li&gt;At the top, KPI cards, a pie chart and slicers in tile form are scattered around. Elements with different purposes sit visually on the same level, so we perceive them together, which can be confusing.&lt;/li&gt;
&lt;li&gt;The table seems to have been placed there at the last moment, roughly compressing the column chart, whose labels become rotated, and disproportionately compressing the line chart showing sales dynamics.&lt;/li&gt;
&lt;li&gt;The visuals have different proportions, backgrounds and borders, with uneven spacing between them, creating a sense of chaos.&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The visuals also show a lack of organization: a failure to separate elements into logical blocks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;To fix this, we rearrange the same visualizations into a logical structure from general to specific; see the image below. This layout is quite sufficient for corporate reporting.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzzsjtuw4q1g5mdi3mmd1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzzsjtuw4q1g5mdi3mmd1.png" alt=" " width="600" height="335"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Making the same dashboard more attractive.
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Change the standard cards to new ones with category labels at the top and align all elements to the left edge.&lt;/li&gt;
&lt;li&gt;Add the planned value to the first card.&lt;/li&gt;
&lt;li&gt;Apply conditional formatting to the deviation indicator in the card.&lt;/li&gt;
&lt;li&gt;Apply the same font style to all the charts and the table.&lt;/li&gt;
&lt;li&gt;Restyle the table header, which was left as in the previous version (the table header is not the same as the title).&lt;/li&gt;
&lt;li&gt;Apply conditional formatting to the cell background of the average price indicator to highlight the highest prices, and format minimum values with a different background color.&lt;/li&gt;
&lt;li&gt;Set the background of the table's main area to the background color of the block.&lt;/li&gt;
&lt;li&gt;Add a separator line for the header row and totals.&lt;/li&gt;
&lt;li&gt;Align the totals in all the columns exactly below the values (done in the specific column settings for each indicator separately).&lt;/li&gt;
&lt;li&gt;Set the background for the page.&lt;/li&gt;
&lt;li&gt;Place a block shape below the header row to highlight it against the background.&lt;/li&gt;
&lt;li&gt;After finalizing these steps, we get the dashboard below.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn83xg2h75uqx4fpwox3e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn83xg2h75uqx4fpwox3e.png" alt=" " width="600" height="339"&gt;&lt;/a&gt;&lt;/p&gt;
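&lt;p&gt;As a side note on the conditional-formatting step above: the deviation indicator in a card is usually driven by a measure. A minimal DAX sketch, assuming hypothetical measure names [Sales] and [Plan] (not taken from the original report):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Deviation % =
DIVIDE ( [Sales] - [Plan], [Plan] )

-- A conditional-formatting rule on the card can then color
-- positive deviations green and negative deviations red.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;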

&lt;p&gt;The dashboard is divided into several zones, each designated for specific purposes and charts.&lt;/p&gt;

&lt;p&gt;a. Header (level zero)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This includes the title, navigation buttons and individual slicers (or a button to invoke the slicer panel).&lt;/li&gt;
&lt;li&gt;The header can also contain the company logo, report generation date, and other common elements related to the entire dashboard.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;b. The first level - KPIs&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Here the cards are placed. The rule is that we always start the dashboard with cards at the top, and only cards.&lt;/li&gt;
&lt;li&gt;Remember that users need to see the most important information (the main KPIs) first, before drilling into the details behind them.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;c. The second level - Visualized subtotals, trends&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Below the cards, on the upper half of the working area, sit the high-level visualizations, categorized by items such as regions and timelines. These visuals also serve as filters: clicking on a column reveals details about it on the next level.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;d. The third level - Details&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In this section we present more granular data usually in tables with conditional formatting.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Below is a dashboard information level scheme that guides on how to create a good dashboard.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1l5seg6fre2r0nw7scgq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1l5seg6fre2r0nw7scgq.png" alt=" " width="600" height="389"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step-by-step Guide for a Dashboard.
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Place KPI cards on Top
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;It may seem obvious, but not everyone adheres to it. I recommend placing no more than six cards at the top, as they are the key indicators. Do not overload this section, as the image below shows.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwab2jpxb7a8v1h8rwz3p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwab2jpxb7a8v1h8rwz3p.png" alt=" " width="600" height="212"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Instead, have the KPIs follow the rule below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzpx9k7choeoyaabelel1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzpx9k7choeoyaabelel1.png" alt=" " width="600" height="193"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Define the Dashboard Grid.
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The workspace refers to the page area below the cards, where the tables and charts sit. The grid is the division of the workspace into blocks (modules) where we place visual elements.&lt;/li&gt;
&lt;li&gt;Aim for a symmetric grid, as in the image below.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fidygxf9j7skxoc7gkxrn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fidygxf9j7skxoc7gkxrn.png" alt=" " width="600" height="382"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The second most popular grid for a dashboard consists of six modules, which we call 3X2, meaning three columns by two rows. The image below shows the grid layouts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw1pbkxm3gdqns1t69dsr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw1pbkxm3gdqns1t69dsr.png" alt=" " width="600" height="403"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The image below shows an example of a 3X2 grid. The workspace holds three visual elements: the chart on the left, the sales plan execution chart, is stretched vertically and occupies two modules, while the sales dynamics chart and the table on the right are stretched horizontally and each occupy two modules.&lt;/li&gt;
&lt;li&gt;The KPI cards are also well positioned and grouped into three blocks; they are placed according to the modular grid and aligned relative to the charts below them.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1m0jq1icc4derrtfwtdw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1m0jq1icc4derrtfwtdw.png" alt=" " width="600" height="338"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Adjust the Page Background.
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Power BI is not a tool for advanced UI design; it is for data and visualization.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Customize the Color Theme.
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Under the View tab you will find various premade color theme options. When you click one of them, all the visual elements on all the report pages automatically adjust to match the selected theme.&lt;/li&gt;
&lt;li&gt;Remember the company/corporate colors(brand book) are often used as the working palette.&lt;/li&gt;
&lt;/ul&gt;
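&lt;p&gt;Beyond the premade options, a custom theme can also be imported as a JSON file from the same View tab. A minimal sketch with placeholder brand colors (the hex values are illustrative, not from a real brand book):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "name": "Corporate Theme",
  "dataColors": ["#1F4E79", "#5B9BD5", "#A6A6A6", "#FFC000"],
  "background": "#FFFFFF",
  "foreground": "#252423",
  "tableAccent": "#1F4E79"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;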

&lt;p&gt;This concludes the dashboard assembly.&lt;/p&gt;

</description>
      <category>baddesign</category>
      <category>dataviz</category>
      <category>uidesign</category>
    </item>
    <item>
      <title>Analyzing Machine downtime</title>
      <dc:creator>Kaira Kelvin. </dc:creator>
      <pubDate>Thu, 30 Oct 2025 15:23:52 +0000</pubDate>
      <link>https://forem.com/gateru/analyzing-machine-downtime-21k0</link>
      <guid>https://forem.com/gateru/analyzing-machine-downtime-21k0</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;In manufacturing, keeping machines running efficiently is essential for meeting production targets and maintaining day-to-day operations. By analyzing operational data, manufacturers can identify patterns that lead to downtime and take proactive steps to prevent it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I focused on exploring and understanding machine performance data to uncover insights about what factors contribute to downtime.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Overview.
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The company operates three different machines on its shop floor that produce different sized components, so minimizing the downtime of these machines is vital for meeting production deadlines.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;💾 The data&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The company has stored the machine operating data in a single table.
Each row in the table represents the operational data for a single machine on a given day:&lt;/li&gt;
&lt;li&gt;"Date" - the date the reading was taken on.&lt;/li&gt;
&lt;li&gt;"Machine_ID" - the unique identifier of the machine being read.&lt;/li&gt;
&lt;li&gt;"Assembly_Line_No" - the unique identifier of the assembly line the machine is located on.&lt;/li&gt;
&lt;li&gt;"Hydraulic_Pressure(bar)", "Coolant_Pressure(bar)", and "Air_System_Pressure(bar)" - pressure measurements at different points in the machine.&lt;/li&gt;
&lt;li&gt;"Coolant_Temperature", "Hydraulic_Oil_Temperature", and "Spindle_Bearing_Temperature" - temperature measurements (in Celsius) at different points in the machine.&lt;/li&gt;
&lt;li&gt;"Spindle_Vibration", "Tool_Vibration", and "Spindle_Speed(RPM)" - vibration (measured in micrometers) and rotational speed measurements for the spindle and tool.&lt;/li&gt;
&lt;li&gt;"Voltage(volts)" - the voltage supplied to the machine.&lt;/li&gt;
&lt;li&gt;"Torque(Nm)" - the torque being generated by the machine.&lt;/li&gt;
&lt;li&gt;"Cutting(KN)" - the cutting force of the tool.&lt;/li&gt;
&lt;li&gt;"Downtime" - an indicator of whether the machine was down or not on the given day.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Primary Objectives.
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;What are the first and last dates readings were taken on?&lt;/li&gt;
&lt;li&gt;What is the average Torque?&lt;/li&gt;
&lt;li&gt;Which assembly line has the highest readings of machine downtime?&lt;/li&gt;
&lt;li&gt;Explore correlations between the various operational data in the dataset.&lt;/li&gt;
&lt;li&gt;Do you see a pattern in machine downtime over time?&lt;/li&gt;
&lt;li&gt;Which factors (visually) seem to be connected to machine downtime?&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Findings and insights
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;✅ Average spindle speed across all machines
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select avg(spindle_speed_rpm)::numeric(10,2) as avg_spindle_speed
from machine_downtime;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;✅ Total number of machines on the shop floor
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select count(distinct machine_id) from machine_downtime;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;✅ What are the first and last dates readings were taken on?
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select min(recorded_date) as first_date, max(recorded_date) as last_date
from machine_downtime;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;✅ What is the average Torque?
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select (avg(torque_nm))::numeric(10,2) as average_torque
from machine_downtime;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;✅ Which assembly line has the highest readings of machine downtime?
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select assembly_line_no,count(downtime) as total_downtime 
from machine_downtime
where downtime= 'Machine_Failure'
group by assembly_line_no
order by total_downtime desc;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select assembly_line_no,machine_id,count(downtime) as total_downtime 
from machine_downtime
where downtime= 'Machine_Failure'
group by assembly_line_no, machine_id 
order by total_downtime desc;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F10iqeaws1pxv5jwko0ez.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F10iqeaws1pxv5jwko0ez.png" alt=" " width="800" height="526"&gt;&lt;/a&gt;&lt;/p&gt;
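&lt;ul&gt;
&lt;li&gt;✅ Exploring correlations and downtime patterns over time
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A sketch of how the remaining objectives could be approached in PostgreSQL, assuming the operational readings were renamed to snake_case like the columns above (hydraulic_pressure_bar and torque_nm are assumed names):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- pairwise correlation between two operational readings
select corr(hydraulic_pressure_bar, torque_nm)::numeric(10,2) as pressure_torque_corr
from machine_downtime;

-- downtime pattern over time: machine failures per month
select date_trunc('month', recorded_date) as month,
       count(*) filter (where downtime = 'Machine_Failure') as failures
from machine_downtime
group by month
order by month;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;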

</description>
      <category>analytics</category>
      <category>datascience</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>💡 Understanding Customer Churn Through SQL: A Data-Driven Exploration</title>
      <dc:creator>Kaira Kelvin. </dc:creator>
      <pubDate>Thu, 30 Oct 2025 10:23:41 +0000</pubDate>
      <link>https://forem.com/gateru/understanding-customer-churn-through-sql-a-data-driven-exploration-3c3a</link>
      <guid>https://forem.com/gateru/understanding-customer-churn-through-sql-a-data-driven-exploration-3c3a</guid>
      <description>&lt;h1&gt;
  
  
  Introduction.
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;In today's competitive telecom market, customer retention is everything. Churn, the rate at which customers stop using a service, can make or break profitability.&lt;/li&gt;
&lt;li&gt;I will use SQL to explore a dataset of telecom customers, analyze churn patterns and uncover key insights that can help businesses take action.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Overview.
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The dataset is from a telecom company, Databel, and I used it to investigate customer churn.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;💾 The data&lt;br&gt;
The data consists of four parts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Customer status: the status and reason why customers churned&lt;/li&gt;
&lt;li&gt;Demographics: demographic data on each customer&lt;/li&gt;
&lt;li&gt;Contract information: information on the type of contract&lt;/li&gt;
&lt;li&gt;Subscription types &amp;amp; charges: numerical data on how the customer uses their subscription&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Project Overview.
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The project was done entirely in PostgreSQL, focusing on data cleaning, transformation, and exploratory queries to answer specific business questions, including:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;1️⃣ What is the average churn rate?&lt;/p&gt;

&lt;p&gt;2️⃣ What are the top reasons for churn?&lt;/p&gt;

&lt;p&gt;3️⃣ Do data usage and unlimited plans affect churn?&lt;/p&gt;

&lt;p&gt;4️⃣ Are there demographic differences in churn (gender, age)?&lt;/p&gt;

&lt;p&gt;5️⃣ How do contract types and payment methods influence customer retention?&lt;/p&gt;

&lt;p&gt;🧹 Step 1: Data Cleaning and Preparation.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The first step was to prepare the raw dataset. Many column names were inconsistent (they contained spaces and uppercase letters), so I renamed them using snake_case for easier querying.
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ALTER TABLE databel RENAME COLUMN "Customer ID" TO customer_id;
ALTER TABLE databel RENAME COLUMN "Churn Label" TO churn_label;
ALTER TABLE databel RENAME COLUMN "Account Length (in months)" TO account_length_months;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;I repeated this for all the columns, ensuring the dataset was clean and consistent.&lt;/li&gt;
&lt;li&gt;I handled missing values in churn_reason by replacing blanks with 'not defined'
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;UPDATE databel
SET churn_reason ='not defined'
WHERE churn_reason = '';

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;📊 Step 2: Calculating the Average Churn Rate&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;To understand overall performance, I computed the average churn rate:
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT 
    AVG(CASE WHEN churn_label='Yes' THEN 1 ELSE 0 END)::NUMERIC(10,2) AS avg_churn
FROM databel;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;It shows the proportion of customers who left the company - a key health indicator for any telecom.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🔍 Step 3: Identifying Key Churn Reasons&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;To reveal the top churn drivers, such as poor customer service, high data charges and lack of network reliability, I used this query:
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT churn_reason, COUNT(churn_reason) AS reasons_for_count
FROM databel
GROUP BY churn_reason
ORDER BY reasons_for_count DESC
LIMIT 5;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;📶 Step 4: Churn by Data Usage.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Finding: Customers with low data usage (≤ 3 GB) showed a higher churn rate — possibly due to underutilization or perceived lack of value.
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT usage_category,
       AVG(CASE WHEN churn_label='Yes' THEN 1 ELSE 0 END)::NUMERIC(10,2) AS churned_proportion
FROM (
    SELECT *,
        CASE 
            WHEN avg_montly_gb_download &amp;lt;= 3 THEN 'low usage (&amp;lt;=3)'
            ELSE 'high usage (&amp;gt;3)'
        END AS usage_category
    FROM databel
) sub
GROUP BY usage_category;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;🌐 Step 5: The Impact of Unlimited Data Plans.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT usage_category,
       AVG(CASE WHEN churn_label='Yes' THEN 1 ELSE 0 END)::NUMERIC(10,2) AS churned_proportion
FROM (
    SELECT *,
        CASE 
            WHEN avg_montly_gb_download &amp;lt;= 3 THEN 'low usage (&amp;lt;=3)'
            ELSE 'high usage (&amp;gt;3)'
        END AS usage_category
    FROM databel
    WHERE unlimited_data_plan = 'Yes'
) sub
GROUP BY usage_category;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Customers on unlimited data plans had significantly lower churn, showing the retention power of flexible data offerings.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;👥 Step 6: Demographic Analysis&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;By Gender.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT gender,
       AVG(CASE WHEN churn_label='Yes' THEN 1 ELSE 0 END)::NUMERIC(10,2) AS churned_proportion_per_gender
FROM databel
GROUP BY gender;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;By Age Category.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT age_category,
       AVG(CASE WHEN churn_label='Yes' THEN 1 ELSE 0 END)::NUMERIC(10,2) AS avg_churn
FROM (
    SELECT *,
           CASE 
                WHEN age &amp;lt;= 30 THEN 'Youths'
                WHEN age BETWEEN 31 AND 50 THEN 'Adults'
                ELSE 'Seniors'
           END AS age_category
    FROM databel
) sub
GROUP BY age_category;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Younger customers (aged 30 and below) tended to churn more frequently.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;💳 Step 7: Contract and Payment Insights.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;By Contract Type
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT contract_type,
       AVG(CASE WHEN churn_label='Yes' THEN 1 ELSE 0 END)::NUMERIC(10,2) AS churned_proportion
FROM databel
GROUP BY contract_type
ORDER BY churned_proportion DESC;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;By Payment Method
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT payment_method,
       AVG(CASE WHEN churn_label='Yes' THEN 1 ELSE 0 END)::NUMERIC(10,2) AS avg_churn
FROM databel
GROUP BY payment_method
ORDER BY avg_churn DESC;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Month-to-month contracts and manual payment methods showed higher churn, indicating that automation and long-term contracts help retention.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🗺️ Step 8: Geographic Insights&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT state,
       AVG(CASE WHEN churn_label='Yes' THEN 1 ELSE 0 END)::NUMERIC(10,2) AS avg_churn_location
FROM databel
GROUP BY state
ORDER BY avg_churn_location DESC
LIMIT 2;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Key Takeaways
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Low data users churn more - possibly due to underutilization.&lt;/li&gt;
&lt;li&gt;Unlimited plans reduce churn - highlighting their customer loyalty effect.&lt;/li&gt;
&lt;li&gt;Young customers are more likely to leave - engagement programs could help.&lt;/li&gt;
&lt;li&gt;Contract type and payment method are strong churn predictors.&lt;/li&gt;
&lt;li&gt;Location-based strategies could address regional retention challenges.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Recommendations to Reduce Churn:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Competitive Analysis: Investigate competitor offers and devices to develop more attractive packages.&lt;/li&gt;
&lt;li&gt;Customer Support Training: Enhance support staff training to improve customer interactions.&lt;/li&gt;
&lt;/ol&gt;

</description>
    </item>
    <item>
      <title>Breaking Barriers: Understanding and Improving HIV Prevention Pill Use in Kenya.</title>
      <dc:creator>Kaira Kelvin. </dc:creator>
      <pubDate>Wed, 29 Oct 2025 14:52:48 +0000</pubDate>
      <link>https://forem.com/gateru/breaking-barriers-understanding-and-improving-hiv-prevention-pill-use-in-kenya-1emg</link>
      <guid>https://forem.com/gateru/breaking-barriers-understanding-and-improving-hiv-prevention-pill-use-in-kenya-1emg</guid>
      <description>&lt;h2&gt;
  
  
  Introduction.
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Kenya has made tremendous strides in making HIV preventive pills (often referred to as DP pills or PrEP) freely available across public and private clinics. Despite this accessibility, uptake and long-term adherence remain a pressing concern. While many individuals take the first dose, a significant number discontinue soon after the initial month.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;This report explores behavioral patterns behind these trends and suggests targeted interventions to boost continuation rates and overall effectiveness of the HIV prevention program.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Who Is Using HIV Prevention Pills?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The majority of those who received the pills were young women, particularly in their teens to early thirties. This age group represents Kenya’s most sexually active demographic and therefore plays a critical role in HIV prevention strategies.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Counties such as Nairobi and Kisii recorded the highest distribution numbers, while male participation remained notably lower.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Interestingly, the type of health facility—whether public, private, or community-based—did not significantly affect uptake rates. This suggests that accessibility is not the main barrier to initial adoption.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Drives People to Start—and Stop—Taking the Pills?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Referrals Matter&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Peer referrals and Voluntary Counseling and Testing (VCT) sites were strong drivers of initial interest. Many first-time users were encouraged by friends or transferred from VCT centers. However, despite high initial uptake from these sources, continuation rates dropped sharply after the first refill.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Early Drop-Off Rates&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every patient began with a 30-pill starter pack. The steepest decline occurred immediately after the first month—only a fraction returned for subsequent refills. By the sixth month, continuation was extremely low.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Health Factors: BMI, Blood Pressure, and STIs&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;BMI (Body Mass Index) showed little impact on adherence. Normal-weight individuals had the highest initial refills, but the trend evened out across all BMI groups over time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Blood Pressure (BP) appeared inconsistent as a predictor—low BP correlated with early refills, while medium-to-high BP patients showed better long-term adherence.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;STIs (Sexually Transmitted Infections) temporarily boosted adherence. Patients recently diagnosed with STIs demonstrated high initial motivation, which declined as symptoms improved.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Power—and Limits—of Counseling
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Counseling had one of the strongest early effects on adherence. Patients who received counseling before or during their first month were far more likely to return for refills. Unfortunately, without continued follow-up, motivation waned after the initial phase.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Key Insights
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The data clearly show that behavioral factors, not clinical ones, play the biggest role in continuation rates. While health metrics such as BMI or BP had minimal long-term influence, motivation, support, and social referral networks had the greatest impact.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Low adherence can thus be viewed not as a medical issue, but as a behavioral challenge—one that requires ongoing engagement, not just a one-time intervention.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Recommended Interventions
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Based on the analysis, the most effective strategy is targeted, ongoing counseling and follow-up support, particularly during the critical first month.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Introduce peer mentorship programs to sustain motivation beyond initial referrals.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Implement structured follow-ups via SMS, phone calls, or clinic visits.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Track patient feedback to better understand reasons for discontinuation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If applied systematically, these interventions could increase initial uptake by 25% and continuation rates by up to 30%—a meaningful improvement in Kenya’s fight against HIV.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>discuss</category>
      <category>learning</category>
      <category>science</category>
    </item>
    <item>
      <title>Budget Analysis Report — Wote County Watchdog FY 2019</title>
      <dc:creator>Kaira Kelvin. </dc:creator>
      <pubDate>Wed, 29 Oct 2025 14:03:56 +0000</pubDate>
      <link>https://forem.com/gateru/budget-analysis-report-wote-county-watchdog-fy-2019-2nf2</link>
      <guid>https://forem.com/gateru/budget-analysis-report-wote-county-watchdog-fy-2019-2nf2</guid>
      <description>&lt;h1&gt;
  
  
  Overview
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;The Wote County Watchdog FY 2019 Budget Dashboard provides a detailed financial snapshot of the county's budget allocations, actual expenditures, and variance (over or under budget) across multiple departments and funds.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Purpose of the Dashboard
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Which departments stayed within budget&lt;/li&gt;
&lt;li&gt;Which ones overspent&lt;/li&gt;
&lt;li&gt;How funds were distributed across different county projects&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Findings and Insights
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1s6rgkluiqit6s8jmfrr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1s6rgkluiqit6s8jmfrr.png" alt=" " width="800" height="515"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Amount Over Budget: $325.12 million&lt;/li&gt;
&lt;li&gt;Number of Expenditures: 469,000&lt;/li&gt;
&lt;li&gt;Largest Fund: General Fund (~$4 billion)&lt;/li&gt;
&lt;li&gt;Department with Overspending: Wote County Public School System&lt;/li&gt;
&lt;/ul&gt;
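&lt;p&gt;The variance behind these cards is simple arithmetic: variance = actual spending - budgeted amount, where a positive result means over budget. A minimal Python sketch with illustrative department figures, not the dashboard's actual per-department data:&lt;/p&gt;

```python
# Budget variance per department: positive = over budget, negative = under budget.
# The figures below are illustrative placeholders, not Wote County's real data.
departments = {
    "Public Schools": {"budget": 1500.0, "actual": 1650.0},   # $ millions
    "General Services": {"budget": 800.0, "actual": 760.0},
}

for name, d in departments.items():
    variance = d["actual"] - d["budget"]
    status = "over budget" if variance > 0 else "within budget"
    print(f"{name}: {variance:+,.1f}M ({status})")
```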

&lt;h3&gt;
  
  
  Department Analysis
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Each department is listed with a small symbol showing its budget status:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;✅ Green = within budget&lt;br&gt;
🚫 Red = over budget&lt;/p&gt;

&lt;p&gt;The dashboard tells us that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wote County manages its big funds quite well.&lt;/li&gt;
&lt;li&gt;Smaller or specialized departments might not be tracking budgets as closely.&lt;/li&gt;
&lt;li&gt;Education spending needs better planning and review.&lt;/li&gt;
&lt;li&gt;With over 469,000 expenditure entries, even small overruns can add up fast.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Recommendations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Audit over-budget departments - find out why spending exceeded plans.&lt;/li&gt;
&lt;li&gt;Link budgets to results - especially for education and public services.&lt;/li&gt;
&lt;li&gt;Simplify expense tracking - make it easier for departments to update records.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Conclusion:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The Wote County FY 2019 Budget Dashboard shows both strengths and challenges. While most departments stayed disciplined, the total overspending of $325 million reveals room for tighter control.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>datascience</category>
      <category>showcase</category>
    </item>
    <item>
      <title>🧠 Customer Churn Analysis with Power BI — Understanding Why Customers Leave</title>
      <dc:creator>Kaira Kelvin. </dc:creator>
      <pubDate>Wed, 29 Oct 2025 12:42:58 +0000</pubDate>
      <link>https://forem.com/gateru/customer-churn-analysis-with-power-bi-understanding-why-customers-leave-4gpk</link>
      <guid>https://forem.com/gateru/customer-churn-analysis-with-power-bi-understanding-why-customers-leave-4gpk</guid>
      <description>&lt;h1&gt;
  
  
  Introduction.
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;In today's competitive business world, customer retention is the new growth strategy. It's far cheaper to keep an existing customer than to acquire a new one, yet many companies struggle to understand why customers leave.&lt;/li&gt;
&lt;li&gt;The project Customer Churn Analysis dives deep into real-world customer data using Power BI to uncover the patterns, behaviors, and reasons behind churn.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Project Purpose
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The purpose of this project is to analyze customer churn patterns — understanding what drives customers to discontinue a service and how those insights can improve retention.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Primary Objectives.
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Measure churn rate across various customer segments.&lt;/li&gt;
&lt;li&gt;Identify major churn reasons (competitor influence, pricing, dissatisfaction, etc.).&lt;/li&gt;
&lt;li&gt;Visualize churn by demographics — age, seniority, and usage habits.&lt;/li&gt;
&lt;li&gt;Understand the impact of contracts and data plans on churn.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzwktn92zabtjg3h3txmc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzwktn92zabtjg3h3txmc.png" alt=" " width="800" height="432"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  📊 Key Findings and Visual Insights
&lt;/h3&gt;

&lt;p&gt;1️⃣ Overall Churn Overview&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Total Customers: 6,687&lt;/li&gt;
&lt;li&gt;Churned Customers: 1,796&lt;/li&gt;
&lt;li&gt;Overall Churn Rate: 26.86%&lt;/li&gt;
&lt;/ul&gt;
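&lt;p&gt;The headline rate follows directly from the first two figures; a quick sanity check:&lt;/p&gt;

```python
# Churn rate = churned customers / total customers.
total_customers = 6687
churned_customers = 1796

churn_rate = churned_customers / total_customers * 100
print(f"Churn rate: {churn_rate:.2f}%")  # matches the dashboard's 26.86%
```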

&lt;p&gt;2️⃣ Payment Methods and Churn Behavior&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Customers using digital payments (Direct Debit, Credit Card) show better retention than those paying via paper checks — possibly due to convenience and automation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;3️⃣ Churn by Demographics.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Seniors are most likely to churn, potentially due to lower tech adoption, cost sensitivity, or customer support difficulties.
Younger users (under 30) show more brand loyalty but may still switch for a better deal.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;4️⃣ Contract Type vs. Churn&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Month-to-month customers make up more than half of all accounts and have the highest churn rate — suggesting that longer-term contracts can improve retention.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fru2tzh2cjtfxrmvm0ugk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fru2tzh2cjtfxrmvm0ugk.png" alt=" " width="800" height="438"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  🔍 Insights Summary
&lt;/h3&gt;

&lt;p&gt;From the dashboard insights, a few strong patterns emerge:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Competitor churn dominates — the company must match or beat offers to retain customers.&lt;/li&gt;
&lt;li&gt;Senior customers are most at risk — they need dedicated support and simplified communication.&lt;/li&gt;
&lt;li&gt;Month-to-month contracts are convenient but risky — loyalty incentives are essential.&lt;/li&gt;
&lt;li&gt;Unlimited data doesn’t ensure satisfaction — perceived value and service quality matter more.&lt;/li&gt;
&lt;li&gt;The first 12 months are crucial — early experience determines long-term commitment.&lt;/li&gt;
&lt;li&gt;Customer service performance directly influences retention&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Recommendations.
&lt;/h3&gt;

&lt;p&gt;1️⃣ Build Loyalty Programs: Offer discounts or perks for customers renewing after one year.&lt;/p&gt;

&lt;p&gt;2️⃣ Target Competitor Leavers: Analyze competitor offerings and develop retention-specific counterplans.&lt;/p&gt;

&lt;p&gt;3️⃣ Monitor Early Churn Signals: Use predictive models to flag customers with short tenure or frequent service calls.&lt;/p&gt;
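&lt;p&gt;Recommendation 3️⃣ could start as a simple rule-based filter before any predictive model is trained. A sketch, where the field names (tenure_months, service_calls) and thresholds are hypothetical, not taken from the project's dataset:&lt;/p&gt;

```python
# Flag customers at risk of early churn: short tenure or frequent service calls.
# Field names and thresholds are illustrative assumptions, not the project's schema.
def flag_early_churn(customer, max_tenure=12, max_calls=4):
    short_tenure = max_tenure > customer["tenure_months"]
    many_calls = customer["service_calls"] > max_calls
    return short_tenure or many_calls

customers = [
    {"id": 1, "tenure_months": 3, "service_calls": 1},
    {"id": 2, "tenure_months": 24, "service_calls": 6},
    {"id": 3, "tenure_months": 36, "service_calls": 0},
]
at_risk = [c["id"] for c in customers if flag_early_churn(c)]
print(at_risk)  # [1, 2]
```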

&lt;p&gt;4️⃣ Improve Customer Support: Reduce response time, improve agent training, and use follow-up surveys.&lt;/p&gt;

&lt;p&gt;5️⃣ Segment by Demographics: Tailor communication and offers to different age groups, especially seniors.&lt;/p&gt;

&lt;p&gt;6️⃣ Encourage Long-Term Contracts: Provide bundles, free add-ons, or premium upgrades for multi-year signups.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion.
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Every bar, pie, and map in this dashboard tells a story - of loyalty, competition, and opportunity. Customer churn is not just a number - it is a reflection of trust lost.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>data</category>
      <category>datascience</category>
      <category>tutorial</category>
      <category>functional</category>
    </item>
    <item>
      <title>Analyzing Crime against women in India.</title>
      <dc:creator>Kaira Kelvin. </dc:creator>
      <pubDate>Wed, 29 Oct 2025 11:35:15 +0000</pubDate>
      <link>https://forem.com/gateru/analyzing-crime-against-women-in-india-26og</link>
      <guid>https://forem.com/gateru/analyzing-crime-against-women-in-india-26og</guid>
      <description>&lt;h1&gt;
  
  
  About this project
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Purpose:
&lt;/h2&gt;

&lt;p&gt;The aim of this project is to analyze crime against women in India during 2022 by transforming raw crime statistics into visual insights through Power BI.&lt;/p&gt;

&lt;h3&gt;
  
  
  Objectives.
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;To identify the most common crime types affecting women. &lt;/li&gt;
&lt;li&gt;To analyze the states with the highest and lowest number of reported cases.&lt;/li&gt;
&lt;li&gt;To highlight correlations and patterns in gender-based violence.&lt;/li&gt;
&lt;li&gt;To provide insights that can inform prevention and policy decisions.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Tools and Technologies used.
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Power BI Desktop : for data cleaning, modelling and visualization.&lt;/li&gt;
&lt;li&gt;Dataset : National Crime Records Bureau (NCRB) 2022 - Crimes Against Women.&lt;/li&gt;
&lt;li&gt;Data Model : State-wise and crime-type-wise aggregation with calculated metrics such as crime rate per 100k population and percentage distribution.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz15z2cb8vu9j4d6bcl4s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz15z2cb8vu9j4d6bcl4s.png" alt=" " width="800" height="469"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Insights.
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;a. Overview.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Total Crime Cases Reported: 445,000&lt;/li&gt;
&lt;li&gt;States/UTs Covered: 35&lt;/li&gt;
&lt;li&gt;Total Crime types: 20&lt;/li&gt;
&lt;li&gt;Women Population: 686M&lt;/li&gt;
&lt;li&gt;Overall Death Rate: 1.5%&lt;/li&gt;
&lt;/ul&gt;
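&lt;p&gt;The crime rate per 100k population metric from the data model can be sanity-checked against these headline figures:&lt;/p&gt;

```python
# Crime rate per 100,000 women = total cases / women population * 100,000.
total_cases = 445_000
women_population = 686_000_000

rate_per_100k = total_cases / women_population * 100_000
print(f"National rate: {rate_per_100k:.1f} cases per 100k women")  # about 64.9
```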

&lt;p&gt;&lt;strong&gt;b. Frequent Crimes Against Women&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz3r3npjm9mtiugzac5r2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz3r3npjm9mtiugzac5r2.png" alt=" " width="461" height="191"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reflecting a troubling global pattern, cruelty by husbands and relatives stands out as one of the most persistent and underreported crimes against women, cutting across culture, economy, and geography.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;c. Insights Summary.&lt;/strong&gt;&lt;br&gt;
🔴 Uttar Pradesh consistently ranks highest in total reported cases, demanding urgent intervention.&lt;br&gt;
❗ Domestic abuse is systemic: Found across every Indian state with alarming frequency.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxqwvum2ryqdj7b9sxs3j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxqwvum2ryqdj7b9sxs3j.png" alt=" " width="800" height="468"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Lakshadweep was the safest region in India, reporting no incidents of crime against women.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Such safe regions show that crime reduction is possible through social cohesion, education, and proactive governance.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Conclusion.
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The Crime Analysis in India 2022 project paints a sobering but necessary picture of women’s safety in India. While progress has been made in reporting and awareness, the persistence of domestic abuse, dowry deaths, and crimes against young girls shows that gender-based violence is both a social and cultural challenge. Through data-driven storytelling and visual analytics, this project serves as both a mirror and a call to action — urging policymakers, citizens, and advocacy groups to turn insights into impact.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>discuss</category>
      <category>datascience</category>
      <category>data</category>
      <category>sql</category>
    </item>
    <item>
      <title>Fashion Analysis: Understanding Customer Churn in Fashion Industry</title>
      <dc:creator>Kaira Kelvin. </dc:creator>
      <pubDate>Tue, 28 Oct 2025 15:17:26 +0000</pubDate>
      <link>https://forem.com/gateru/fashion-analysis-understanding-customer-churn-in-fashion-industry-3jge</link>
      <guid>https://forem.com/gateru/fashion-analysis-understanding-customer-churn-in-fashion-industry-3jge</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;The fashion industry is growing daily, and business owners need to understand customer behaviour to make profits and drive operations. This article delves into the fashion industry to understand customer churn.&lt;/p&gt;

&lt;h2&gt;
  
  
  🎯 Project Objective.
&lt;/h2&gt;

&lt;p&gt;The primary objective of this analysis is to replace gut-based decisions with research-backed strategies to improve stock management and customer retention. The analysis aims to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Understand consumer behavior - customer preferences, purchase frequency, and customer demographics.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Analyze inventory performance - identify fast- and slow-moving items to optimize stock levels.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Analyze sales trends over time - determine peak sales days, product categories, and seasonal variations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Time-based analysis - examine how timing affects consumer engagement and purchases using a rolling 3-Month Sum.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
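&lt;p&gt;The rolling 3-Month Sum in objective 4 maps naturally onto a SQL window function. A minimal sketch run through Python's bundled sqlite3, with made-up monthly revenue figures; the real analysis would aggregate the fashion table's order_date and revenue columns instead:&lt;/p&gt;

```python
import sqlite3

# Rolling 3-month revenue: the current month plus the two months before it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_sales (month TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO monthly_sales VALUES (?, ?)",
    [("2024-01", 100.0), ("2024-02", 120.0), ("2024-03", 90.0), ("2024-04", 110.0)],
)

rows = conn.execute("""
    SELECT month,
           SUM(revenue) OVER (
               ORDER BY month
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS rolling_3m
    FROM monthly_sales
""").fetchall()

for month, rolling in rows:
    print(month, rolling)  # e.g. 2024-03 -> 310.0 (100 + 120 + 90)
```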

&lt;h3&gt;
  
  
  Dataset Overview.
&lt;/h3&gt;

&lt;p&gt;The table that stores the dataset was created in SQL with the following statement:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE fashion (
    clothing_type VARCHAR(8),
    category VARCHAR(11),
    price DECIMAL(5,2),
    discount DECIMAL(4,2),
    shop_outlet VARCHAR(50),
    delivery_date DATE,
    order_date DATE,
    revenue DECIMAL(7,2),
    customer_name VARCHAR(50),
    customer_email VARCHAR(50)
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key Fields in the dataset:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;clothing_type - brand of the clothing item&lt;/li&gt;
&lt;li&gt;category - whether the item is for men, women, or children&lt;/li&gt;
&lt;li&gt;price - price of the item&lt;/li&gt;
&lt;li&gt;discount - discount offered on clothes&lt;/li&gt;
&lt;li&gt;shop_outlet - the shop where the customer bought their clothes&lt;/li&gt;
&lt;li&gt;delivery_date - date when the purchased clothes reached the client&lt;/li&gt;
&lt;li&gt;order_date - date when the order was made&lt;/li&gt;
&lt;li&gt;revenue - amount generated from the clothes sold&lt;/li&gt;
&lt;li&gt;customer_name - customer's first and last name&lt;/li&gt;
&lt;li&gt;customer_email - customer's email address&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Analysis and Findings:
&lt;/h2&gt;

&lt;p&gt;🔵 To understand consumer demographics, we analyzed the top 10 customers by revenue and found that women's and children's clothing are the most frequently purchased types and lead in revenue generation. &lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fede05yfm27x0ilkxwqi7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fede05yfm27x0ilkxwqi7.png" alt=" " width="652" height="472"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Recommendation:
&lt;/h3&gt;

&lt;p&gt;🟢 Shop outlets should consider discounting men's items to drive up sales, since discounts are directly related to sales.&lt;/p&gt;

&lt;h3&gt;
  
  
  Findings.
&lt;/h3&gt;

&lt;p&gt;🔵 Analyzing the shop outlets with the highest number of customer visits that resulted in a purchase.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxsv0mqdkoixzejhbq9wn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxsv0mqdkoixzejhbq9wn.png" alt=" " width="416" height="378"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Brightbean recorded the highest number of visits among all shop outlets.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faz39q8znd7twl3axxhda.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faz39q8znd7twl3axxhda.png" alt=" " width="328" height="178"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Brightbean also offered the highest discounts, which likely contributed to its higher foot traffic and engagement.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Interpretation.
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;There’s a positive correlation between the level of discounts offered and the number of visits.&lt;/li&gt;
&lt;li&gt;Outlets providing more promotions and discounts attract significantly more customers.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Recommendations:
&lt;/h3&gt;

&lt;p&gt;🟢 Replicate Brightbean’s Strategy: - Other outlets should adopt similar discount or promotional approaches (e.g., limited-time offers, loyalty discounts).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use data-driven promotion scheduling to align offers with customer visit patterns. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🟢 Improve Outlet Visibility: - Increase online presence and share shop updates on Instagram, WhatsApp, and TikTok to reach a broader audience.&lt;/p&gt;

&lt;h3&gt;
  
  
  Sales Trends by Day:
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4epqcusiwq2g9l2w3ocu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4epqcusiwq2g9l2w3ocu.png" alt=" " width="445" height="297"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🔵 Friday recorded the highest sales, making it the peak shopping day.&lt;br&gt;
🔵 Wednesday had the lowest sales, showing reduced customer engagement mid-week.&lt;/p&gt;

&lt;h3&gt;
  
  
  Interpretation.
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The spike in Friday sales reflects strong end-of-week consumer activity, possibly driven by payday spending or weekend preparation.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Recommendation.
&lt;/h3&gt;

&lt;p&gt;🟢 Enhance Friday Promotions: - Launch “Friday Frenzy” or “Fresh Friday” sales to maintain and grow Friday momentum.&lt;br&gt;
🟢 Boost mid-week sales: - Use mid-week social media campaigns and personalized offers, such as a "Ladies' Wednesday", to attract attention.&lt;br&gt;
🟢 Marketing Implications: - Build pre-Friday hype through digital channels.&lt;/p&gt;

&lt;h2&gt;
  
  
  Time Based Analysis.
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fubsl5wzue7kvpwuas8iw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fubsl5wzue7kvpwuas8iw.png" alt=" " width="366" height="312"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Findings.
&lt;/h3&gt;

&lt;p&gt;🔵 Sales were highest on Fridays and Saturdays, showing that consumers prefer shopping toward the weekend.&lt;/p&gt;

&lt;h3&gt;
  
  
  Interpretation:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The end-of-week sales peak aligns with consumer behavior trends, as most shoppers have more free time or disposable income closer to weekends.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Recommendations:
&lt;/h3&gt;

&lt;p&gt;🟢 Enhance Weekend Engagement: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; Launch "Styling Saturdays" promotions to maximize weekend traffic.&lt;/li&gt;
&lt;li&gt;Offer free outfit styling services or photo-tag discounts (e.g., “Take a pic in-store, tag us, and get 10% off your next visit”).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  🚀 Call to Action.
&lt;/h2&gt;

&lt;p&gt;Having analyzed and explored the financial performance of this project, it's crystal clear that data-driven decision-making is crucial for optimizing sales, driving business growth, and retaining happy customers.&lt;/p&gt;

&lt;p&gt;If you found this analysis insightful, consider sharing it with business stakeholders and your network. You can also reach out or leave a comment if you want to apply the same techniques to your data.&lt;/p&gt;

</description>
      <category>python</category>
      <category>tutorial</category>
      <category>learning</category>
      <category>100daysofcode</category>
    </item>
    <item>
      <title>Electrifying insights with BI.</title>
      <dc:creator>Kaira Kelvin. </dc:creator>
      <pubDate>Mon, 29 Sep 2025 09:40:31 +0000</pubDate>
      <link>https://forem.com/gateru/electrifing-insights-with-bi-3mb</link>
      <guid>https://forem.com/gateru/electrifing-insights-with-bi-3mb</guid>
      <description>&lt;h2&gt;
  
  
  Introduction.
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Devices such as computers and smartphones are widely available today, and the global consumer electronics market generated an impressive $950 billion in revenue in 2024.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The dataset is from GlobalElec, a leading global electronics retailer.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  ⭐ Project overview.
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The goal of this analysis was to uncover electronics sales patterns across three continents.&lt;/li&gt;
&lt;li&gt;I analyzed and visualized the company’s sales data, covering customers, products, sales, stores, and international exchange rates.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Dataset.
&lt;/h3&gt;

&lt;h3&gt;
  
  
  💾 The data
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Group&lt;/th&gt;
&lt;th&gt;Column name&lt;/th&gt;
&lt;th&gt;Dataset&lt;/th&gt;
&lt;th&gt;Definition&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Customers&lt;/td&gt;
&lt;td&gt;CustomerKey&lt;/td&gt;
&lt;td&gt;Customers.csv&lt;/td&gt;
&lt;td&gt;Primary key to identify customers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Customers&lt;/td&gt;
&lt;td&gt;Gender&lt;/td&gt;
&lt;td&gt;Customers.csv&lt;/td&gt;
&lt;td&gt;Customer gender&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Customers&lt;/td&gt;
&lt;td&gt;Name&lt;/td&gt;
&lt;td&gt;Customers.csv&lt;/td&gt;
&lt;td&gt;Customer full name&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Customers&lt;/td&gt;
&lt;td&gt;City&lt;/td&gt;
&lt;td&gt;Customers.csv&lt;/td&gt;
&lt;td&gt;Customer city&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Customers&lt;/td&gt;
&lt;td&gt;State Code&lt;/td&gt;
&lt;td&gt;Customers.csv&lt;/td&gt;
&lt;td&gt;Customer state (abbreviated)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Customers&lt;/td&gt;
&lt;td&gt;State&lt;/td&gt;
&lt;td&gt;Customers.csv&lt;/td&gt;
&lt;td&gt;Customer state (full)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Customers&lt;/td&gt;
&lt;td&gt;Zip Code&lt;/td&gt;
&lt;td&gt;Customers.csv&lt;/td&gt;
&lt;td&gt;Customer zip code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Customers&lt;/td&gt;
&lt;td&gt;Country&lt;/td&gt;
&lt;td&gt;Customers.csv&lt;/td&gt;
&lt;td&gt;Customer country&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Customers&lt;/td&gt;
&lt;td&gt;Continent&lt;/td&gt;
&lt;td&gt;Customers.csv&lt;/td&gt;
&lt;td&gt;Customer continent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Customers&lt;/td&gt;
&lt;td&gt;Birthday&lt;/td&gt;
&lt;td&gt;Customers.csv&lt;/td&gt;
&lt;td&gt;Customer date of birth&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Products&lt;/td&gt;
&lt;td&gt;ProductKey&lt;/td&gt;
&lt;td&gt;Products.csv&lt;/td&gt;
&lt;td&gt;Primary key to identify products&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Products&lt;/td&gt;
&lt;td&gt;Product Name&lt;/td&gt;
&lt;td&gt;Products.csv&lt;/td&gt;
&lt;td&gt;Product name&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Products&lt;/td&gt;
&lt;td&gt;Brand&lt;/td&gt;
&lt;td&gt;Products.csv&lt;/td&gt;
&lt;td&gt;Product brand&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Products&lt;/td&gt;
&lt;td&gt;Color&lt;/td&gt;
&lt;td&gt;Products.csv&lt;/td&gt;
&lt;td&gt;Product color&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Products&lt;/td&gt;
&lt;td&gt;Unit Cost USD&lt;/td&gt;
&lt;td&gt;Products.csv&lt;/td&gt;
&lt;td&gt;Cost to produce the product in USD&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Products&lt;/td&gt;
&lt;td&gt;Unit Price USD&lt;/td&gt;
&lt;td&gt;Products.csv&lt;/td&gt;
&lt;td&gt;Product list price in USD&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Products&lt;/td&gt;
&lt;td&gt;SubcategoryKey&lt;/td&gt;
&lt;td&gt;Products.csv&lt;/td&gt;
&lt;td&gt;Key to identify product subcategories&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Products&lt;/td&gt;
&lt;td&gt;Subcategory&lt;/td&gt;
&lt;td&gt;Products.csv&lt;/td&gt;
&lt;td&gt;Product subcategory name&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Products&lt;/td&gt;
&lt;td&gt;CategoryKey&lt;/td&gt;
&lt;td&gt;Products.csv&lt;/td&gt;
&lt;td&gt;Key to identify product categories&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Products&lt;/td&gt;
&lt;td&gt;Category&lt;/td&gt;
&lt;td&gt;Products.csv&lt;/td&gt;
&lt;td&gt;Product category name&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sales&lt;/td&gt;
&lt;td&gt;Order Number&lt;/td&gt;
&lt;td&gt;Sales.csv&lt;/td&gt;
&lt;td&gt;Unique ID for each order&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sales&lt;/td&gt;
&lt;td&gt;Line Item&lt;/td&gt;
&lt;td&gt;Sales.csv&lt;/td&gt;
&lt;td&gt;Identifies individual products purchased&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sales&lt;/td&gt;
&lt;td&gt;Order Date&lt;/td&gt;
&lt;td&gt;Sales.csv&lt;/td&gt;
&lt;td&gt;Date the order was placed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sales&lt;/td&gt;
&lt;td&gt;Delivery Date&lt;/td&gt;
&lt;td&gt;Sales.csv&lt;/td&gt;
&lt;td&gt;Date the order was delivered&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sales&lt;/td&gt;
&lt;td&gt;CustomerKey&lt;/td&gt;
&lt;td&gt;Sales.csv&lt;/td&gt;
&lt;td&gt;Unique key identifying which customer ordered&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sales&lt;/td&gt;
&lt;td&gt;StoreKey&lt;/td&gt;
&lt;td&gt;Sales.csv&lt;/td&gt;
&lt;td&gt;Unique key identifying which store processed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sales&lt;/td&gt;
&lt;td&gt;ProductKey&lt;/td&gt;
&lt;td&gt;Sales.csv&lt;/td&gt;
&lt;td&gt;Unique key identifying which product purchased&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sales&lt;/td&gt;
&lt;td&gt;Quantity&lt;/td&gt;
&lt;td&gt;Sales.csv&lt;/td&gt;
&lt;td&gt;Number of items purchased&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sales&lt;/td&gt;
&lt;td&gt;Currency Code&lt;/td&gt;
&lt;td&gt;Sales.csv&lt;/td&gt;
&lt;td&gt;Currency used to process the order&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stores&lt;/td&gt;
&lt;td&gt;StoreKey&lt;/td&gt;
&lt;td&gt;Stores.csv&lt;/td&gt;
&lt;td&gt;Primary key to identify stores&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stores&lt;/td&gt;
&lt;td&gt;Country&lt;/td&gt;
&lt;td&gt;Stores.csv&lt;/td&gt;
&lt;td&gt;Store country&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stores&lt;/td&gt;
&lt;td&gt;State&lt;/td&gt;
&lt;td&gt;Stores.csv&lt;/td&gt;
&lt;td&gt;Store state&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stores&lt;/td&gt;
&lt;td&gt;Square Meters&lt;/td&gt;
&lt;td&gt;Stores.csv&lt;/td&gt;
&lt;td&gt;Store footprint in square meters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stores&lt;/td&gt;
&lt;td&gt;Open Date&lt;/td&gt;
&lt;td&gt;Stores.csv&lt;/td&gt;
&lt;td&gt;Store open date&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Exchange Rates&lt;/td&gt;
&lt;td&gt;Date&lt;/td&gt;
&lt;td&gt;Exchange_Rates.csv&lt;/td&gt;
&lt;td&gt;Date&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Exchange Rates&lt;/td&gt;
&lt;td&gt;Currency&lt;/td&gt;
&lt;td&gt;Exchange_Rates.csv&lt;/td&gt;
&lt;td&gt;Currency code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Exchange Rates&lt;/td&gt;
&lt;td&gt;Exchange&lt;/td&gt;
&lt;td&gt;Exchange_Rates.csv&lt;/td&gt;
&lt;td&gt;Exchange rate compared to USD&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Data Model.
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh55emgh0g1tbme4gjgmf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh55emgh0g1tbme4gjgmf.png" alt=" " width="800" height="392"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Questions to answer?
&lt;/h2&gt;

&lt;p&gt;1️⃣ Which countries or regions contribute the most to overall revenue? &lt;br&gt;
          ✅ Toronto.&lt;br&gt;
          🌎 North America &lt;br&gt;
2️⃣ Which product categories and regions yield the highest profit margins?&lt;br&gt;
          💻 Computers &lt;br&gt;
3️⃣ Are there patterns or changes in delivery times across different regions or product categories?&lt;/p&gt;

&lt;h2&gt;
  
  
  Dashboards.
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The product dashboard
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbt12e2kplydpb419w1tc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbt12e2kplydpb419w1tc.png" alt=" " width="800" height="432"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;📌 After data cleaning, the dataset comprised 11 product brands, 16 product varieties, 8 product categories, and 32 subcategories. &lt;/p&gt;

&lt;h3&gt;
  
  
  Insights
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo7iz3fnanoc1atx0lruw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo7iz3fnanoc1atx0lruw.png" alt=" " width="456" height="289"&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;
🔹 White products were the most sold.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmfvexhhat9q1pmibvz2n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmfvexhhat9q1pmibvz2n.png" alt=" " width="441" height="305"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🔹 Computers were the most profitable products.&lt;/p&gt;

&lt;h3&gt;
  
  
  Sales dashboard
&lt;/h3&gt;

&lt;p&gt;📌 Total revenue was 55.76M, total profit was 33.66M, 26K orders were placed, and sales were made in 5 currencies.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdd24tstkcd9evpnx73el.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdd24tstkcd9evpnx73el.png" alt=" " width="800" height="448"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Insights.
&lt;/h3&gt;

&lt;p&gt;🔹 North America had the highest sales, and February recorded the highest profit.&lt;br&gt;
🔹 Female customers purchased more products than their male counterparts.&lt;br&gt;
🔹 Clients transacted most frequently in USD.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fav2qco6mtr7pg1v6t0s8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fav2qco6mtr7pg1v6t0s8.png" alt=" " width="800" height="153"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Customer dashboard.
&lt;/h3&gt;

&lt;p&gt;📌 After data cleaning and creating KPIs: the total number of customers who bought products at Globalelec was 15.3K, and they came from 8 countries. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frpmogwlq65hyxz0c9fdn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frpmogwlq65hyxz0c9fdn.png" alt=" " width="800" height="386"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Insights.
&lt;/h3&gt;

&lt;p&gt;🔹  Canada in North America had the highest number of customers.&lt;br&gt;
🔹  Most customers were above 50 years old.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6qkcr3zvlrgjqls5n3z9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6qkcr3zvlrgjqls5n3z9.png" alt=" " width="800" height="246"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Tools Used
&lt;/h3&gt;

&lt;p&gt;📈 Microsoft Excel - for data cleaning &lt;br&gt;
📊 Power BI - for data visualization and dashboards &lt;/p&gt;

</description>
      <category>webdev</category>
      <category>python</category>
      <category>career</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Top 30 Pandas Interview Questions and Answers.</title>
      <dc:creator>Kaira Kelvin. </dc:creator>
      <pubDate>Sat, 09 Aug 2025 12:02:32 +0000</pubDate>
      <link>https://forem.com/gateru/top-30-pandas-interview-questions-and-answers-1noa</link>
      <guid>https://forem.com/gateru/top-30-pandas-interview-questions-and-answers-1noa</guid>
      <description>&lt;h3&gt;
  
  
  A quick recap of pandas ✨
&lt;/h3&gt;

&lt;p&gt;1️⃣ What is Pandas in Python?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pandas is an open-source Python library with powerful and built-in methods to efficiently clean, analyze, and manipulate datasets. Developed by Wes McKinney in 2008, this powerful package can easily blend with various other data science modules in Python.
Pandas is built on top of the NumPy library, i.e., its data structures Series and DataFrame are the upgraded versions of NumPy arrays.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;2️⃣ Why doesn't DataFrame.shape have parentheses? &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In pandas, shape is an attribute and not a method. &lt;code&gt;df.shape&lt;/code&gt; outputs a tuple with the number of rows and columns in a DataFrame.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;3️⃣ What is an index in pandas?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The index is a series of labels that can uniquely identify each row of a DataFrame. The index can be of any datatype, such as integer, string, or hash.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;4️⃣ What is the difference between &lt;strong&gt;loc&lt;/strong&gt; and &lt;strong&gt;iloc&lt;/strong&gt;?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Both the loc and iloc methods in pandas are used to select subsets of a DataFrame. In practice, they are widely used for filtering a DataFrame based on conditions.
&lt;strong&gt;loc&lt;/strong&gt; is used to select data using the actual labels of rows and columns, while the &lt;strong&gt;iloc&lt;/strong&gt; method is used to extract data based on the integer positions of rows and columns.&lt;/li&gt;
&lt;/ul&gt;
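&lt;p&gt;A minimal, runnable sketch of the difference (the DataFrame and its labels are made up for illustration):&lt;/p&gt;

```python
import pandas as pd

# A tiny DataFrame with string row labels
df = pd.DataFrame(
    {"name": ["Ann", "Ben", "Cara"], "score": [88, 75, 92]},
    index=["a", "b", "c"],
)

# loc selects by labels of rows and columns
by_label = df.loc["b", "score"]      # 75

# iloc selects by integer positions
by_position = df.iloc[1, 1]          # 75

# loc is also the idiomatic way to filter on a condition
high_scorers = df.loc[df["score"] > 80, "name"]
print(list(high_scorers))            # ['Ann', 'Cara']
```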

&lt;p&gt;5️⃣ How do you get the count of all unique values of a categorical column in a DataFrame? &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The value_counts() function returns the count of each unique value in a Series or column.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Series.value_counts() 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
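&lt;p&gt;For example, on a small made-up Series:&lt;/p&gt;

```python
import pandas as pd

colors = pd.Series(["red", "blue", "red", "green", "red"])

counts = colors.value_counts()   # sorted by count, descending
print(counts["red"])             # 3
print(counts["blue"])            # 1
```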



&lt;p&gt;6️⃣ What is Timedelta?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Timedelta represents a duration, i.e. the difference between two dates or times, measured in units such as days, hours, minutes, and seconds.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;7️⃣ What is the difference between the &lt;strong&gt;append&lt;/strong&gt; and &lt;strong&gt;concat&lt;/strong&gt; methods?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We can use the concat method to combine DataFrames either along rows or columns, while append combines DataFrames along rows only. Note that DataFrame.append was deprecated and removed in pandas 2.0, so concat is now the preferred method.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;8️⃣ What is the pandas method to get the statistical summary of all the columns in a DataFrame?&lt;br&gt;
 df.describe()&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;df.describe() generates descriptive statistics, including those that summarize the central tendency, dispersion and shape of the dataset's distribution.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;9️⃣ What is the difference between Series and DataFrame?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;DataFrame: A pandas DataFrame is a two-dimensional labeled data structure in tabular format, with multiple rows and columns, where each column can be of a different data type.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Series: A Series is a one-dimensional labeled array that can store any data type, such as integers, strings, or Python objects, but all of its values should be of the same data type. A Series is more like a single column of a DataFrame, and it consumes less memory than a DataFrame.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;1️⃣0️⃣ How do you access the top 6 rows and the last 7 rows of a pandas DataFrame (also known as viewing data)?&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;head()&lt;/code&gt; method in pandas is used to access the initial rows of a DataFrame; its opposite, &lt;code&gt;tail()&lt;/code&gt;, shows the last n &lt;br&gt;
(5 by default) rows of the DataFrame.&lt;/p&gt;

&lt;p&gt;To access the top 6 rows:  &lt;code&gt;dataframe_name.head(6)&lt;/code&gt;&lt;br&gt;
To access the last 7 rows: &lt;code&gt;dataframe_name.tail(7)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;1️⃣1️⃣ How do you get the index of the max and min values in a DataFrame column?&lt;br&gt;
      &lt;code&gt;df['Column_name'].idxmax()&lt;/code&gt;&lt;br&gt;
      &lt;code&gt;df['Column_name'].idxmin()&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;1️⃣2️⃣ How do you describe the relationship between the items or columns in a DataFrame?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python pandas provides tools and methods to perform the analysis efficiently.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Correlation Analysis - Correlation analysis is a statistical technique used to measure the strength and direction of the linear relationship between two variables.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Interpretation of Correlation Coefficients:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The correlation coefficient (often denoted by the symbol ρ or r) &lt;br&gt;
ranges from -1 to 1:&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A correlation coefficient of 1 indicates a perfect positive &lt;br&gt;
linear relationship, meaning that as one variable increases, the &lt;br&gt;
other variable also increases proportionally.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A correlation coefficient of -1 indicates a perfect negative &lt;br&gt;
linear relationship, meaning that as one variable increases, the &lt;br&gt;
other variable decreases proportionally.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A correlation coefficient of 0 indicates no linear relationship &lt;br&gt;
between the variables. Values between -1 and 1 represent the &lt;br&gt;
strength and direction of the linear relationship:&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Values closer to 1 indicate a &lt;strong&gt;stronger positive correlation.&lt;/strong&gt;&lt;br&gt;
Values closer to -1 indicate a &lt;strong&gt;stronger negative correlation&lt;/strong&gt;.&lt;br&gt;
Values closer to 0 indicate &lt;strong&gt;a weaker or no linear relationship&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;1️⃣3️⃣ How do you get the number of occurrences of each value in a column?&lt;br&gt;
    Use &lt;code&gt;value_counts()&lt;/code&gt; to get the counts.&lt;/p&gt;

&lt;p&gt;1️⃣4️⃣ What does the &lt;code&gt;info()&lt;/code&gt; method print in pandas, and how do you export a DataFrame to a CSV file?&lt;/p&gt;

&lt;p&gt;&lt;code&gt;info()&lt;/code&gt;- This method prints out a concise summary of the data frame including information about the index, data types, columns, non- null values, and memory usage.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;After you have cleaned and preprocessed your data, the next step may be to export the DataFrame to a file using to_csv().&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;iris_data.to_csv("cleaned_iris_data.csv")&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;1️⃣5️⃣  How do you get the highest value of a column in pandas and also how to create a copy of a data set&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; df.[column].max()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;states2 = states.copy()&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;1️⃣6️⃣  What are some effective methods for handling missing data in pandas?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ffill (Forward Fill) - used to fill missing values in data analysis; it propagates the last valid observation forward to the next valid one.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;df.ffill(inplace=True)&lt;/code&gt; (the older &lt;code&gt;df.fillna(method='ffill')&lt;/code&gt; form is deprecated)&lt;/p&gt;

&lt;p&gt;bfill-  (backward fill) is a method used to fill missing values in a DataFrame or Series by propagating the next valid observation backward.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;df.bfill(*, axis=None, inplace=False, limit=None, limit_area=None)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;df.bfill(inplace=True)&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Interpolation: Interpolate missing values based on the values of neighboring data points. This method works well for ordered data such as time series.&lt;br&gt;
&lt;code&gt;df['column_name'] = df['column_name'].interpolate(method='linear')&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Using a Machine Learning model - an ML model can predict missing values based on other features in the dataset. This may be overly complex for small datasets.&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.impute import SimpleImputer
imputer = SimpleImputer(strategy='mean')
df['column_name'] = imputer.fit_transform(df[['column_name']])
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;/ul&gt;
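&lt;p&gt;The fill strategies above can be compared side by side on one small Series (the values are made up for illustration):&lt;/p&gt;

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan])

print(s.ffill().tolist())        # [1.0, 1.0, 1.0, 4.0, 4.0]
print(s.bfill().tolist())        # [1.0, 4.0, 4.0, 4.0, nan]  (no later value to copy back)
print(s.interpolate().tolist())  # [1.0, 2.0, 3.0, 4.0, 4.0]  (linear between known points)
```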

&lt;p&gt;1️⃣7️⃣ Briefly describe the function of df.isnull().sum()?&lt;br&gt;
  &lt;code&gt;df.isnull().sum()&lt;/code&gt; is used to get the total number of missing values per column.&lt;/p&gt;

&lt;p&gt;1️⃣8️⃣  Which syntax is correct for the add method for a set?&lt;br&gt;
     set.add(item)&lt;/p&gt;

&lt;p&gt;1️⃣9️⃣  Which brackets would you use to create a python tuple?&lt;br&gt;
       ()&lt;/p&gt;

&lt;p&gt;2️⃣0️⃣ Which data structure allows you to add duplicate, heterogeneous elements to it?&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;2️⃣1️⃣.You have an if statement consisting of two expressions with a&lt;br&gt;
    logical and operator between them. Which case would lead the Python &lt;br&gt;
    interpreter to run the body of the if statement?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;When the expressions return True and True responses.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;2️⃣2️⃣ In Python, which name would you choose for a variable?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    _count
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>python</category>
      <category>pandas</category>
      <category>datascience</category>
      <category>interview</category>
    </item>
    <item>
      <title>📝 Mastering Date/Time Extraction in Pandas: From Year to Weekend Check.</title>
      <dc:creator>Kaira Kelvin. </dc:creator>
      <pubDate>Sun, 11 May 2025 04:10:39 +0000</pubDate>
      <link>https://forem.com/gateru/mastering-datetime-extraction-in-pandas-from-year-to-weekend-check-38ab</link>
      <guid>https://forem.com/gateru/mastering-datetime-extraction-in-pandas-from-year-to-weekend-check-38ab</guid>
      <description>&lt;p&gt;Time series data is everywhere in stock prices, weather data,server logs, sales records ,website traffic per hour,machine logs per second. Pandas has tools for parsing,indexing,resampling and analyzing datasets.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt; - The .dt accessor works for datetime, period, and timedelta dtypes.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Converting to Datetime.
&lt;/h3&gt;

&lt;p&gt;The first step is to convert the date column to datetime before doing any time series analysis.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df['date'] = pd.to_datetime(df['date'])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Extracting the day of the week from a date column
&lt;/h3&gt;

&lt;p&gt;a. Having the output as a number (0, 1, 2, 3, ...)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df['day_of_week_num'] = df['date'].dt.weekday
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl7hdwiz3ho3bf1uxishg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl7hdwiz3ho3bf1uxishg.png" alt=" " width="359" height="250"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;df['day_of_week_num']: It extracts the numeric day of the week, where Monday is 0 and Sunday is 6.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;b. Having the output as a day name (Monday, Tuesday, Wednesday, Thursday, ...)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df['day_of_week_name'] = df['date'].dt.day_name()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;It returns the name of the day for each date in the Series or DataFrame, such as "Monday", "Tuesday"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;c. Abbreviated day name (e.g., "Mon", "Tue", "Wed")&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df['day_of_week_abbr'] = df['date'].dt.strftime('%a')

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;This gives you a short form for the day name, which is often used when you want to save space or display compact labels.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Check for Weekend.
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df['is_weekend'] = df['date'].dt.weekday &amp;gt;= 5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;This creates a new column in your DataFrame that flags whether a date falls on a weekend (Saturday or Sunday).&lt;/li&gt;
&lt;/ul&gt;
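&lt;p&gt;A quick check on three illustrative dates (a Friday, a Saturday, and a Sunday):&lt;/p&gt;

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["2025-05-09", "2025-05-10", "2025-05-11"])})

# weekday: Monday = 0 ... Sunday = 6, so 5 and 6 are weekend days
df["is_weekend"] = df["date"].dt.weekday >= 5
print(df["is_weekend"].tolist())  # [False, True, True]
```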

&lt;h3&gt;
  
  
  4. Extracting time attributes.
&lt;/h3&gt;

&lt;p&gt;Here we will extract hours, minutes, and seconds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;.dt.hour:&lt;/strong&gt; - Extracts the hour from the datetime (0–23).&lt;br&gt;
&lt;strong&gt;.dt.minute:&lt;/strong&gt; - Extracts the minute from the datetime (0–59).&lt;br&gt;
&lt;strong&gt;.dt.second:&lt;/strong&gt; - Extracts the second from the datetime (0–59). &lt;/p&gt;
&lt;h3&gt;
  
  
  5. Extract Month (Number and Name).
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df['month'] = df['date'].dt.month
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;Extracts the numeric month (1–12) from the date column.
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df['month_name'] = df['date'].dt.month_name() 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;Extracts the full month name (e.g., "January", "February").&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;C. Three-letter abbreviation&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df['month_abbr'] = df['month_name'].str[:3]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  6. To extract the quarter of the year from a datetime column, use .dt.quarter
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; df['quarter'] = df['date'].dt.quarter - 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxehhkj57gpa2naji4y3l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxehhkj57gpa2naji4y3l.png" alt=" " width="800" height="311"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  7.Half of the Year (0 = Jan–Jun, 1 = Jul–Dec)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df['half'] = np.where(df['month'] &amp;lt; 6, 0, 1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It assumes your month column is zero-based (i.e., 0 = January, 11 = December).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Months 0–5 → First Half (0)&lt;/li&gt;
&lt;li&gt;Months 6–11 → Second Half (1)&lt;/li&gt;
&lt;/ul&gt;
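&lt;p&gt;A runnable sketch of the half-year flag, keeping in mind that .dt.month returns 1–12 (the dates and column names are illustrative):&lt;/p&gt;

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["2024-03-15", "2024-06-30", "2024-07-01"])})

df["month"] = df["date"].dt.month              # 1-12
df["half"] = np.where(df["month"] <= 6, 0, 1)  # 0 = Jan-Jun, 1 = Jul-Dec
print(df["half"].tolist())                     # [0, 0, 1]
```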

&lt;h3&gt;
  
  
  8. Week of the Year (.dt.isocalendar().week)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df['week_of_year'] = df['date'].dt.isocalendar().week
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note: As of Pandas 1.1.0, .dt.week is deprecated and replaced by .dt.isocalendar().week.&lt;/p&gt;

&lt;p&gt;Note: What is an ISO week?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ISO weeks follow the ISO 8601 standard:&lt;/li&gt;
&lt;li&gt;Weeks start on Monday&lt;/li&gt;
&lt;li&gt;Week 1 is the week that contains the first Thursday of the year&lt;/li&gt;
&lt;li&gt;Can have 52 or 53 weeks.&lt;/li&gt;
&lt;/ul&gt;
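&lt;p&gt;The year-boundary behaviour of ISO weeks is worth seeing on real dates:&lt;/p&gt;

```python
import pandas as pd

dates = pd.to_datetime(["2021-01-01", "2021-01-04", "2021-12-31"])

# 2021-01-01 is a Friday, so it still belongs to ISO week 53 of 2020;
# week 1 of 2021 starts on Monday 2021-01-04
weeks = dates.isocalendar().week
print([int(w) for w in weeks])  # [53, 1, 52]
```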

&lt;h3&gt;
  
  
  9. Year.
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df['year']=df['date'].dt.year
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Extracts the year as an integer.&lt;/li&gt;
&lt;li&gt;It's useful for grouping, filtering, or feature engineering.&lt;/li&gt;
&lt;li&gt;Splits time series data by year.&lt;/li&gt;
&lt;li&gt;Groups data for annual trends.&lt;/li&gt;
&lt;li&gt;Creates Pivot tables or plots year-wise&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;date&lt;/th&gt;
&lt;th&gt;year&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2022-04-05&lt;/td&gt;
&lt;td&gt;2022&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2023-11-12&lt;/td&gt;
&lt;td&gt;2023&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2024-01-01&lt;/td&gt;
&lt;td&gt;2024&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  10.  📅 Day of the month.
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;This extracts the day of the month (1-31)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df['day_of_month'=df['date'].dt.day
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;date&lt;/th&gt;
&lt;th&gt;day&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2024-01-05&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2024-01-15&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2024-01-31&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  11. How to handle holidays in pandas
&lt;/h3&gt;

&lt;p&gt;You have to manually define your own list of public holidays and pass them to CustomBusinessDay.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from pandas.tseries.offsets import CustomBusinessDay
kenya_holidays = pd.to_datetime([
    '2025-01-01',  # New Year's Day
    '2025-04-18',  # Good Friday
    '2025-05-01',  # Labour Day
    '2025-06-01',  # Madaraka Day
    '2025-10-20',  # Mashujaa Day
    '2025-12-12',  # Jamhuri Day
    '2025-12-25',  # Christmas
    '2025-12-26',  # Boxing Day
])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Create a CustomBusinessDay (holidays=kenya_holidays)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kenya_bday=CustomBusinessDay(holidays=kenya_holidays)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;used to generate business days
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dates=pd.date_range('2025-12-20','2025-12-31' freq=kenya_bday)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
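&lt;p&gt;Putting the steps together in one runnable sketch (using just two of the holidays above for brevity):&lt;/p&gt;

```python
import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay

# Skip weekends plus the listed holidays
kenya_holidays = pd.to_datetime(["2025-12-25", "2025-12-26"])  # Christmas, Boxing Day
kenya_bday = CustomBusinessDay(holidays=kenya_holidays)

# Dec 20-21 and 27-28 are weekends; Dec 25-26 are excluded as holidays
dates = pd.date_range("2025-12-20", "2025-12-31", freq=kenya_bday)
print([d.strftime("%Y-%m-%d") for d in dates])
# ['2025-12-22', '2025-12-23', '2025-12-24', '2025-12-29', '2025-12-30', '2025-12-31']
```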



&lt;h3&gt;
  
  
  12. The .dt.to_period() accessor converts datetime-like columns into period objects.
&lt;/h3&gt;

&lt;p&gt;a. Calendar Frequencies.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This is an example of a calendar frequency:
&lt;code&gt;df['period'] = df['date'].dt.to_period('M')&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Frequency Code&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Example Output&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;'D'&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Daily&lt;/td&gt;
&lt;td&gt;&lt;code&gt;2023-05-11&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;'W'&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Weekly&lt;/td&gt;
&lt;td&gt;&lt;code&gt;2023-05-08/2023-05-14&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;'M'&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Month End&lt;/td&gt;
&lt;td&gt;&lt;code&gt;2023-05&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;'MS'&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Month Start&lt;/td&gt;
&lt;td&gt;&lt;code&gt;2023-05&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;'Q'&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Quarter End&lt;/td&gt;
&lt;td&gt;&lt;code&gt;2023Q2&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;'QS'&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Quarter Start&lt;/td&gt;
&lt;td&gt;&lt;code&gt;2023Q2&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;'A'&lt;/code&gt; or &lt;code&gt;'Y'&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Year End&lt;/td&gt;
&lt;td&gt;&lt;code&gt;2023&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;'AS'&lt;/code&gt; or &lt;code&gt;'YS'&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Year Start&lt;/td&gt;
&lt;td&gt;&lt;code&gt;2023&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;b. Business Frequencies.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;They are special time intervals that consider only working days (Monday to Friday) and can exclude holidays if well defined.&lt;/li&gt;
&lt;li&gt;Finance - No trading on weekends or holidays.&lt;/li&gt;
&lt;li&gt;Payroll/HR - Payments and attendance only on working days.&lt;/li&gt;
&lt;li&gt;Operations - Delivery and logistics exclude weekends.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key differences from calendar frequencies include:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Firf3uy3vwbsec85rmhxz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Firf3uy3vwbsec85rmhxz.png" alt=" " width="800" height="284"&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Business Day Frequencies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvw14mr36tcqrtiocbhvj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvw14mr36tcqrtiocbhvj.png" alt=" " width="800" height="261"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; Business Month Frequencies.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7uuux0agno3yh9uhtbxc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7uuux0agno3yh9uhtbxc.png" alt=" " width="800" height="353"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Get last business day of each month in 2024
&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;bm_end = pd.date_range(start='2024-01-01', end='2024-12-31', freq='BM')
print(bm_end)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output &amp;gt;&amp;gt;&amp;gt; DatetimeIndex(['2024-01-31', '2024-02-29', '2024-03-29', '2024-04-30', ...], dtype='datetime64[ns]', freq='BM')&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Business Quarter Frequencies.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4fy6dzgoqgi648l4390n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4fy6dzgoqgi648l4390n.png" alt=" " width="800" height="376"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Business Year Frequencies.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg7f5wdy5uezzpsqz39sm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg7f5wdy5uezzpsqz39sm.png" alt=" " width="800" height="325"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>productivity</category>
      <category>python</category>
      <category>tutorial</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Cleaning data in PostgreSQL.</title>
      <dc:creator>Kaira Kelvin. </dc:creator>
      <pubDate>Tue, 29 Apr 2025 09:30:28 +0000</pubDate>
      <link>https://forem.com/gateru/cleaning-data-in-postgresql-3jkp</link>
      <guid>https://forem.com/gateru/cleaning-data-in-postgresql-3jkp</guid>
      <description>&lt;h1&gt;
  
  
  Cleaning string data.
&lt;/h1&gt;

&lt;h3&gt;
  
  
  1a. Capitalization of strings, e.g. people's names, cities, countries.
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;INITCAP(input_string) - fixes capitalization.&lt;/strong&gt;
To permanently update the names in a whole column, use UPDATE, e.g.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;UPDATE customers
SET customer_name = INITCAP(customer_name);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  1b. Renaming columns
&lt;/h3&gt;

&lt;p&gt;When renaming columns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use snake_case, i.e. machine_id.&lt;/li&gt;
&lt;li&gt;Use lowercase machine_id rather than MACHINE_ID.&lt;/li&gt;
&lt;li&gt;Do not include parentheses in column names; for instance,
Spindle_Speed(RPM) should be spindle_speed_rpm.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;u&gt;Benefits of renaming columns. &lt;/u&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Easier automation and tooling: many tools (like ORMs, data pipelines, or SQL formatters) expect lowercase, unquoted identifiers.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  There are two ways to rename columns:
&lt;/h3&gt;

&lt;p&gt;1️⃣ Rename Columns in a Query Result (use AS).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you just want different column headings in the output, use an alias:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT 
    first_name AS "First Name",
    last_name  AS "Last Name",
    salary * 12 AS "Annual Salary"
FROM employees;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;This does not change the table structure—only the names in the result set.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;2️⃣ Permanently Rename Columns in a Table.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you need to actually change the column name in the database schema, use ALTER TABLE … RENAME COLUMN.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;alter table databel
rename column"Avg Monthly GB Download" to avg_montly_gb_download;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
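&lt;p&gt;The naming rules above can be sketched as a small Python helper (the function name is illustrative):&lt;/p&gt;

```python
import re

def to_snake_case(name):
    # Replace any run of non-alphanumeric characters with one underscore
    name = re.sub(r'[^0-9a-zA-Z]+', '_', name)
    # Trim stray underscores and lowercase everything
    return name.strip('_').lower()

print(to_snake_case('Spindle_Speed(RPM)'))  # spindle_speed_rpm
```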



&lt;h3&gt;
  
  
  1c. Stripping.
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;For instance, removing $ from a column:
sales['Revenue'] = sales['Revenue'].str.strip('$')&lt;/li&gt;
&lt;/ul&gt;
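&lt;p&gt;The same stripping idea in plain Python (the sample revenue values are made up):&lt;/p&gt;

```python
# Hypothetical raw revenue strings with a currency symbol
revenue = ['$1,200', '$350', '$18']

# Strip the '$', drop thousands separators, then convert to float
cleaned = [float(v.strip('$').replace(',', '')) for v in revenue]
print(cleaned)
```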

&lt;h3&gt;
  
  
  2a. Removing spaces in strings.
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;REPLACE(input_string, to_replace, replacement)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;2b. Standardizing data using the .loc accessor.&lt;br&gt;
The line below finds all rows where tire_sizes is greater than 27 and sets those values to 27.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; ride_sharing.loc[ride_sharing['tire_sizes'] &amp;gt; 27, 'tire_sizes'] = 27
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Set all dates in the future to today's date:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ride_sharing.loc[ride_sharing['ride_dt'] &amp;gt; today, 'ride_dt'] = today
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Making strings a uniform length: &lt;strong&gt;LPAD(input_string, length [, fill_value])&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT 
    LPAD(event_id, 10, '0') as event_id, 
    -- Replace consecutive spaces with a single space
    REGEXP_REPLACE(INITCAP(parking_held), ' +', ' ','g')  as 
    parking_held
    FROM 
    film_permit;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
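&lt;p&gt;Rough Python analogues of LPAD and the space-collapsing REGEXP_REPLACE (the sample strings are illustrative):&lt;/p&gt;

```python
import re

# LPAD(event_id, 10, '0'): left-pad to length 10 with zeros
event_id = '4235'
padded = event_id.rjust(10, '0')

# REGEXP_REPLACE(..., ' +', ' ', 'g'): collapse runs of spaces
parking_held = 'West   54 Street'
collapsed = re.sub(' +', ' ', parking_held)
print(padded, collapsed)
```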



&lt;p&gt;4. Pattern matching using regular expressions (REs)&lt;br&gt;
      - Basics of REs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;~&lt;/code&gt; : Matches regex pattern (case-sensitive)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;~*&lt;/code&gt; : Matches regex pattern (case-insensitive)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;!~&lt;/code&gt; : Does not match regex pattern (case-sensitive)&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;!~*&lt;/code&gt; : Does not match regex pattern (case-insensitive)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Regex Metacharacters&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metacharacter&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;th&gt;Example Query&lt;/th&gt;
&lt;th&gt;Returns&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;\d&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Digit (0–9)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;SELECT '123' ~ '\\d\\d\\d';&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;true&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;\w&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Word character (alphanumeric or underscore)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;SELECT 'abc_123' ~ '\\w+';&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;true&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;\s&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Whitespace character&lt;/td&gt;
&lt;td&gt;&lt;code&gt;SELECT ' ' ~ '\\s';&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;true&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;.&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Any character except newline&lt;/td&gt;
&lt;td&gt;&lt;code&gt;SELECT 'a9c' ~ 'a.c';&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;true&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;^&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Anchors to the beginning of a string&lt;/td&gt;
&lt;td&gt;&lt;code&gt;SELECT 'Hello' ~ '^H';&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;true&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;$&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Anchors to the end of a string&lt;/td&gt;
&lt;td&gt;&lt;code&gt;SELECT 'Hello' ~ 'o$';&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;true&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;*&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Matches 0 or more of the preceding character&lt;/td&gt;
&lt;td&gt;&lt;code&gt;SELECT 'hoop' ~ 'ho*p';&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;true&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;+&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Matches 1 or more of the preceding character&lt;/td&gt;
&lt;td&gt;&lt;code&gt;SELECT 'hoop' ~ 'ho+p';&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;true&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;?&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Matches 0 or 1 of the preceding character&lt;/td&gt;
&lt;td&gt;&lt;code&gt;SELECT 'color' ~ 'colou?r';&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;true&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;[]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Character class: match one of the characters&lt;/td&gt;
&lt;td&gt;&lt;code&gt;SELECT 'a' ~ '[abc]';&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;true&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;[^]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Negated character class: not in the set&lt;/td&gt;
&lt;td&gt;&lt;code&gt;SELECT 'x' ~ '[^0-9]';&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;true&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;|&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Alternation (OR)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;SELECT 'dog' ~ 'cat|dog';&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;true&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;{% raw %}&lt;code&gt;()&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Grouping for subpatterns&lt;/td&gt;
&lt;td&gt;&lt;code&gt;SELECT 'abab' ~ '(ab)+';&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;true&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: In SQL strings, backslashes (&lt;code&gt;\&lt;/code&gt;) must be &lt;strong&gt;escaped&lt;/strong&gt;, so you’ll often use &lt;strong&gt;double backslashes&lt;/strong&gt; (&lt;code&gt;\\d&lt;/code&gt;, &lt;code&gt;\\s&lt;/code&gt;, etc.).&lt;/p&gt;
&lt;/blockquote&gt;
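&lt;p&gt;These metacharacters behave the same way in Python's re module, which makes the table's examples easy to experiment with:&lt;/p&gt;

```python
import re

# A few of the table's patterns, checked with Python's re module
assert re.fullmatch(r'\d\d\d', '123')      # digits
assert re.search(r'a.c', 'a9c')            # . matches any character
assert re.fullmatch(r'colou?r', 'color')   # ? makes the u optional
assert re.search(r'cat|dog', 'dog')        # | alternation
assert re.search(r'(ab)+', 'abab')         # () grouping with +
print('all patterns matched')
```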



&lt;p&gt;&lt;strong&gt;Filtering Rows with Regex&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM users
WHERE email ~* '^[a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,}$';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;5. REGEXP_REPLACE() is used to replace patterns in strings. It is similar to the REPLACE() function.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Its general syntax is:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;REGEXP_REPLACE(source,pattern,replace,flags).
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;pattern - the pattern to match in the source string.&lt;/li&gt;
&lt;li&gt;replace - the replacement string to use in place of the pattern.&lt;/li&gt;
&lt;li&gt;flags - an optional string used to control matching;
for instance:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;REGEXP_REPLACE(cars,'\d','_','g')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
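&lt;p&gt;REGEXP_REPLACE with the 'g' flag behaves like Python's re.sub, which replaces all matches by default (the sample string is made up):&lt;/p&gt;

```python
import re

# Replace every digit with an underscore, as in the SQL example
result = re.sub(r'\d', '_', 'Audi A4 2019')
print(result)  # Audi A_ ____
```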



&lt;p&gt;6. Matching similar strings.&lt;br&gt;
PostgreSQL does not ship the SOUNDEX function by default;&lt;br&gt;
you first have to create the extension:&lt;br&gt;
           CREATE EXTENSION fuzzystrmatch;&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;     SOUNDEX(input_string)  -  4 character code
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT summons_number, vehicle_color
FROM parking_violation
WHERE
  -- Match SOUNDEX codes of vehicle_color and 'GRAY'
DIFFERENCE(vehicle_color, 'GRAY') = 4;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT summons_number,vehicle_color,
    -- Include the DIFFERENCE() value for each color
    DIFFERENCE(vehicle_color, 'RED') AS "red",
    DIFFERENCE(vehicle_color, 'BLUE') AS "blue",
    DIFFERENCE(vehicle_color, 'YELLOW') AS "yellow"
FROM
    parking_violation
WHERE 
    (
        -- Condition records on DIFFERENCE() value of 4
        DIFFERENCE(vehicle_color, 'RED') = 4 OR
        DIFFERENCE(vehicle_color, 'BLUE') = 4 OR
        DIFFERENCE(vehicle_color, 'YELLOW') = 4
    )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT 
    summons_number,
    vehicle_color,
    DIFFERENCE(vehicle_color, 'RED') AS "red",
    DIFFERENCE(vehicle_color, 'BLUE') AS "blue",
    DIFFERENCE(vehicle_color, 'YELLOW') AS "yellow"
FROM
    parking_violation
WHERE
    (
        DIFFERENCE(vehicle_color, 'RED') = 4 OR
        DIFFERENCE(vehicle_color, 'BLUE') = 4 OR
        DIFFERENCE(vehicle_color, 'YELLOW') = 4
    -- Exclude records with 'BL' and 'BLA' vehicle colors
    ) AND vehicle_color NOT SIMILAR TO 'BLA?'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
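&lt;p&gt;For intuition, here is a minimal sketch of the American Soundex idea in Python; PostgreSQL's fuzzystrmatch implementation may differ in edge cases:&lt;/p&gt;

```python
def soundex(word):
    # Letter-to-digit codes; vowels, h, w and y get no code
    codes = {'b': '1', 'f': '1', 'p': '1', 'v': '1',
             'c': '2', 'g': '2', 'j': '2', 'k': '2', 'q': '2',
             's': '2', 'x': '2', 'z': '2',
             'd': '3', 't': '3', 'l': '4',
             'm': '5', 'n': '5', 'r': '6'}
    word = word.lower()
    first = word[0].upper()
    digits = []
    prev = codes.get(word[0], '')
    for ch in word[1:]:
        code = codes.get(ch, '')
        if code and code != prev:
            digits.append(code)
        if ch not in 'hw':  # h and w do not separate equal codes
            prev = code
    # Keep the first letter plus three digits, zero-padded
    return (first + ''.join(digits) + '000')[:4]

print(soundex('Gray'), soundex('Grey'))  # G600 G600
```

Because 'Gray' and 'Grey' share the code G600, DIFFERENCE() reports the maximum similarity of 4 for them.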


&lt;p&gt;7. Handling missing data.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Missing data can be caused by human error or by systematic issues such as a failed replication process.
Types of missing data:
- Missing Completely at Random (MCAR)
- Missing at Random (MAR)
- Missing Not at Random (MNAR)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Representations for missing values :
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt; Null (general)&lt;/li&gt;
&lt;li&gt; '' - empty string (used for string columns)&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  Types of missing data
&lt;/h3&gt;

&lt;p&gt;1️⃣ Missing Completely at Random (MCAR)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The data is missing for no reason at all.&lt;/li&gt;
&lt;li&gt;No systematic relationship between missing data and other values.&lt;/li&gt;
&lt;li&gt;Best kind of missing data - safe to remove or ignore.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;2️⃣ Missing at Random (MAR) &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For example - Older patient skipping recording their 
weight.&lt;/li&gt;
&lt;li&gt;Systematic relationship between missing data and other 
observed values.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;3️⃣ Missing Not at Random (MNAR).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The data is missing because of the missing value itself.&lt;/li&gt;
&lt;li&gt;for example; People with very high incomes don't want to 
share their income.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Identifying missing data.
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;IS NULL is used to check whether a value is NULL. NULL in databases indicates that data is either missing or not applicable, which is different from an empty string or zero.
For example:
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  select title from film 
  where release_year is null;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select inspection_type,count(*) as count
from restaurant_ispection
where score is null
group by inspection_type
order by count desc;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Rectifying missing data.
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Best option: locate and add the missing values; this may not be feasible or worthwhile.&lt;/li&gt;
&lt;li&gt;Provide a value.&lt;/li&gt;
&lt;li&gt;Exclude records.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;u&gt;coalesce()&lt;/u&gt; function is designed to evaluate a list of arguments and return the first non-null argument it encounters.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It enhances the readability and robustness of SQL 
queries.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Features&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It accepts an unlimited number of arguments.&lt;/li&gt;
&lt;li&gt;It evaluates each argument in the order provided and 
returns the first one that is not null.&lt;/li&gt;
&lt;li&gt;It returns NULL if all arguments are null.&lt;/li&gt;
&lt;li&gt;It stops evaluating once it finds a non-null value.&lt;/li&gt;
&lt;li&gt;Coalesce can be employed in JOIN conditions to handle 
nullable columns ensuring that null values do not 
disrupt the join logic. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;replacing null values with unknown&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;UPDATE countries
SET
--replace null countries values with 'Unknown'
country=Coalesce(national_park,'Unknown');
Select Count(*) From countries
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
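&lt;p&gt;COALESCE's behavior is easy to mirror in Python (the function name is illustrative):&lt;/p&gt;

```python
def coalesce(*args):
    # Return the first non-None argument, like SQL's COALESCE
    for value in args:
        if value is not None:
            return value
    return None  # all arguments were null

print(coalesce(None, None, 'Unknown'))  # Unknown
```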



&lt;h3&gt;
  
  
  Handling duplicated data.
&lt;/h3&gt;

&lt;p&gt;Duplicates waste storage resources.&lt;br&gt;
Duplicates distort analysis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NOTE&lt;/strong&gt;&lt;br&gt;
Duplicated values should be addressed with intimate domain knowledge.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;ROW_NUMBER()&lt;/strong&gt; function helps determine where groups of column values are duplicated in a dataset. An OVER clause must follow the ROW_NUMBER() function call. The OVER clause defines the window (or set of results) over which the function is applied.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ROW_NUMBER() OVER( PARTITION BY 
    plate_id, issue_date,violation_time,house_number,street_name
      ) - 1 AS duplicate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
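&lt;p&gt;The ROW_NUMBER() - 1 trick can be sketched in plain Python; the sample rows below are made up:&lt;/p&gt;

```python
from collections import defaultdict

# Each tuple stands in for the (plate_id, issue_date, ...) partition columns
rows = [('GBB9093', '2019-06-07'),
        ('GBB9093', '2019-06-07'),
        ('JDX1234', '2019-06-08')]

seen = defaultdict(int)
duplicate_flags = []
for key in rows:
    # 0 for the first occurrence in a partition, 1+ for repeats
    duplicate_flags.append(seen[key])
    seen[key] += 1

print(duplicate_flags)  # [0, 1, 0]
```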



&lt;p&gt;7a. Visualizing missing values: we use msno.matrix(airquality).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import missingno as msno
import matplotlib.pyplot as plt
msno.matrix(airquality)
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;7b. Isolate missing and complete values into separate DataFrames.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;missing = airquality[airquality['C02'].isna()]
complete= airquality[~airquality['C02'].isna()]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;~airquality['CO2'].isna()&lt;br&gt;
The ~ operator inverts the Boolean values, so True becomes False and vice versa. This selects rows where 'CO2' is NOT missing.&lt;/p&gt;

&lt;p&gt;8. Detecting invalid values.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data may need cleaning when table columns contain values that are not valid for the attribute that the column represents.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Handling invalid data with pattern matching
&lt;/h3&gt;

&lt;p&gt;Using&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;where score NOT SIMILAR TO '\d+';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;There is no restriction on the length of the value.&lt;/li&gt;
&lt;li&gt;The query only excludes values containing non-digit characters.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Using NOT SIMILAR TO to find registration_state values that are not two uppercase letters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT Summons_number,plate_id,registration_state
FROM parking_violation
WHERE registration_state NOT SIMILAR TO '[A-Z{2}'] 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
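&lt;p&gt;The same validity check in Python's re module (the sample state values are made up):&lt;/p&gt;

```python
import re

states = ['NY', 'nj', 'P', '99', 'CA']
# NOT SIMILAR TO '[A-Z]{2}': keep values that are NOT two uppercase letters
invalid = [s for s in states if not re.fullmatch(r'[A-Z]{2}', s)]
print(invalid)  # ['nj', 'P', '99']
```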



&lt;p&gt;9. Determining column types.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Select column_name, data_type
from information_schema.columns
Where
table_name = 'machine_downtime' AND 
column_name = 'recorded_date';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffac42ivy5p6bmu64ck26.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffac42ivy5p6bmu64ck26.png" alt=" " width="504" height="355"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;BIGINT is a data type used to store very large whole numbers safely, much larger than the regular INT type.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;INT can handle about ±2 billion (±2^31)&lt;/li&gt;
&lt;li&gt;BIGINT can handle about ±9 quintillion (±2^63)
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT MAX(summons_number) - MIN(summons_number) AS range_size
FROM parking_violation;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;10. Parsing timestamps with TO_TIMESTAMP().&lt;br&gt;
The TO_TIMESTAMP syntax is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;TO_TIMESTAMP(ts_string,format_string) &amp;gt; TIMESTAMP
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;TO_TIMESTAMP(inspection_datetime,'YYYY-MM-DD HH24:MI'),
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;TO_TIMESTAMP(ts_string, format) - string to timestamp&lt;br&gt;
TO_CHAR(ts_value, format) - timestamp to string&lt;/p&gt;
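&lt;p&gt;The TO_TIMESTAMP/TO_CHAR pair maps roughly onto Python's strptime and strftime (the sample timestamp is made up):&lt;/p&gt;

```python
from datetime import datetime

# 'YYYY-MM-DD HH24:MI' corresponds to '%Y-%m-%d %H:%M'
ts = datetime.strptime('2024-07-01 13:45', '%Y-%m-%d %H:%M')

# TO_CHAR: timestamp back to string
formatted = ts.strftime('%Y-%m-%d %H:%M')
print(formatted)  # 2024-07-01 13:45
```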

&lt;p&gt;11. Combining columns.&lt;br&gt;
Concatenation involves joining individual values end-to-end to create a single combined value.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CONCAT(string[,string2,string3,....]) 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT CONCAT(name, E'\n',
       building, ' ', street, E'\n',
       boro, ', NY ', zip_code) AS mailing_address
FROM restaurant_inspection;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The || double pipe operator, like CONCAT(), joins the values that surround the double pipe:&lt;br&gt;
string1 || string2 [ || string3 || ...]&lt;/p&gt;

&lt;p&gt;SELECT 'data' || ' ' || 'cleaning' || ' ' || 'is' || ' ' || 'fun';&lt;/p&gt;

&lt;p&gt;12. Splitting columns.&lt;br&gt;
- STRPOS(source_string, search_string)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It returns an integer giving the position of the first occurrence of search_string in source_string.&lt;/li&gt;
&lt;li&gt;For instance, to find the position of the first '-' we use:
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  SELECT 
     STRPOS(house_number,'-') AS dash_position
  FROM
     parking_violation;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;-SUBSTRING(source_string FROM start_pos For num_chars)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT
  house_number,
  SUBSTRING(
    house_number
    FROM STRPOS(house_number, '-') + 1
    FOR LENGTH(house_number) - STRPOS(house_number, '-')
  ) AS new_house_number
FROM
parking_violation;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
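&lt;p&gt;The STRPOS/SUBSTRING combination above corresponds to 0-based slicing in Python (the sample house number is made up):&lt;/p&gt;

```python
house_number = '91-07'

# STRPOS is 1-based; str.find is 0-based, so add 1
dash_position = house_number.find('-') + 1

# Everything after the dash, like the SUBSTRING ... FROM ... FOR ... call
new_house_number = house_number[dash_position:]
print(dash_position, new_house_number)  # 3 07
```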



&lt;p&gt;13. Splitting strings using SPLIT_PART()&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SPLIT_PART(source_string, delimiter_string, part_number) - the SPLIT_PART() function requires 3 arguments: a source string, the delimiter on which to split, and the 1-based index of the component to return from the split components.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; SELECT 
      SPLIT_PART ('Cycle Inspection / Re-inspection', ' / ', 2 ); 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;REGEXP_SPLIT_TO_TABLE(source,pattern) - the REGEXP_SPLIT_TO_TABLE() function splits string values in a source column using a regular expression pattern. For instance&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT REGEXP_SPLIT_TO_TABLE('Pizza/Italian', '/' );
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
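&lt;p&gt;Both functions map onto Python's splitting tools; note SPLIT_PART uses 1-based part numbers while Python lists are 0-based:&lt;/p&gt;

```python
import re

# SPLIT_PART('Cycle Inspection / Re-inspection', ' / ', 2)
part = 'Cycle Inspection / Re-inspection'.split(' / ')[1]

# REGEXP_SPLIT_TO_TABLE('Pizza/Italian', '/') yields one row per piece
pieces = re.split('/', 'Pizza/Italian')
print(part, pieces)
```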



&lt;ol&gt;
&lt;li&gt;Duplicate values&lt;/li&gt;
&lt;li&gt;These are rows that have the same values.
We use .duplicated() to find duplicates across all columns; it returns a Series of boolean values that are True for duplicated rows and False for non-duplicated rows.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;   duplicates= ride_sharing.duplicated(subset = 'ride_id', keep =       False)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To drop complete duplicates from ride_sharing.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ride_dup = ride_sharing.drop_duplicates()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Uniformity: treating date data.
A plain conversion to datetime won't work here, since the dates are not all in the same format; we let pandas infer the format and coerce failures to NaT.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;birthdays['Birthday']=pd.to_datetime(birthdays['Birthday'],infer_datetime_format=True,errors='coerce')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;infer_datetime_format=True tells pandas to try to figure out the format of the dates automatically, instead of the coder having to specify it. It makes the conversion easier and faster, especially if the dates are written in different styles such as 2025-05-31, 31/05/2025 or May 31, 2025.&lt;/li&gt;
&lt;/ul&gt;
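&lt;p&gt;What infer_datetime_format and errors='coerce' do can be approximated with the stdlib (the formats and sample values are illustrative):&lt;/p&gt;

```python
from datetime import datetime

def parse_mixed(value):
    # Try a few common formats in turn, like pandas' format inference
    for fmt in ('%Y-%m-%d', '%d/%m/%Y', '%B %d,%Y'):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            pass
    return None  # like errors='coerce' producing NaT

print(parse_mixed('31/05/2025'))
```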

</description>
      <category>sqlserver</category>
      <category>datascience</category>
      <category>data</category>
    </item>
  </channel>
</rss>
