<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Shamso Osman</title>
    <description>The latest articles on Forem by Shamso Osman (@shamso_osman).</description>
    <link>https://forem.com/shamso_osman</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1882104%2Fe17c1904-2b8c-4ee9-8d64-dbab21cda57e.png</url>
      <title>Forem: Shamso Osman</title>
      <link>https://forem.com/shamso_osman</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/shamso_osman"/>
    <language>en</language>
    <item>
      <title>Building a Chatbot with Next.js, Cerebras API and Llama 3.1</title>
      <dc:creator>Shamso Osman</dc:creator>
      <pubDate>Sun, 10 Nov 2024 14:45:33 +0000</pubDate>
      <link>https://forem.com/shamso_osman/building-a-chatbot-with-nextjs-cerebras-api-and-llama-31-1dc7</link>
      <guid>https://forem.com/shamso_osman/building-a-chatbot-with-nextjs-cerebras-api-and-llama-31-1dc7</guid>
      <description>&lt;h2&gt;
  
  
  Abstract
&lt;/h2&gt;

&lt;p&gt;An interactive AI-powered Japanese language learning platform built with Next.js and the Cerebras AI API. The project creates a personalized JLPT N5 study experience by combining modern AI conversation capabilities with structured language learning principles. The system provides real-time Japanese text formatting with furigana support and category-based learning modules.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv8fqn75878ww6im3hgj6.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv8fqn75878ww6im3hgj6.jpeg" alt="Project Demo" width="800" height="453"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Problem Statement
&lt;/h2&gt;

&lt;p&gt;With my JLPT N5 exam looming just weeks away, I found myself needing a study buddy that could help me practice Japanese at any time. Sure, there are plenty of flashcard apps and textbooks out there, but I wanted something more interactive - something that could give me immediate feedback and adapt to my needs. That's when I decided to combine my programming skills with my Japanese studies, leveraging the new Cerebras AI API to create a personalized AI language tutor.&lt;/p&gt;

&lt;p&gt;If you've ever studied Japanese, you know the struggle. Textbooks are static, apps can be rigid, and finding a study partner who's available at 2 AM when you're cramming particles and kanji? Good luck! Plus, most resources either show only romaji (roman letters) or throw you into the deep end with kanji without reading aids.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Create an interactive, AI-powered Japanese tutor focused on JLPT N5 preparation.&lt;/li&gt;
&lt;li&gt;Implement consistent Japanese text formatting with automatic furigana display.&lt;/li&gt;
&lt;li&gt;Provide structured learning paths across vocabulary, grammar, reading, and kanji.&lt;/li&gt;
&lt;li&gt;Deliver immediate, contextual feedback to learners.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Technologies Used
&lt;/h2&gt;

&lt;p&gt;To bring this idea to life, I combined a few technologies that allowed me to quickly prototype and build the platform:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Frontend Framework&lt;/strong&gt;: 
For the frontend, I used Next.js (&lt;code&gt;Next.js 15.0.3&lt;/code&gt;, &lt;code&gt;React 18.2.0&lt;/code&gt;). Its server-side rendering capabilities and API routes make it easy to build dynamic, responsive apps.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI Integration&lt;/strong&gt;: 
The core of the chatbot is the &lt;code&gt;Llama 3.1-70B model&lt;/code&gt;, accessed via &lt;code&gt;Cerebras Cloud SDK&lt;/code&gt;. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;UI Components&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;FontAwesome icons for visual elements&lt;/li&gt;
&lt;li&gt;Custom CSS modules for styling&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;State Management&lt;/strong&gt;: React &lt;code&gt;useState&lt;/code&gt; hooks&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;API Integration&lt;/strong&gt;: Custom Next.js API routes&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  System Architecture
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Frontend
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Single-page application with component-based architecture&lt;/li&gt;
&lt;li&gt;Message handling system for chat interface&lt;/li&gt;
&lt;li&gt;Quick response component for guided learning&lt;/li&gt;
&lt;li&gt;Responsive design with mobile support&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Backend
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;RESTful API endpoint (/api/chat)&lt;/li&gt;
&lt;li&gt;Integration with Cerebras AI model&lt;/li&gt;
&lt;li&gt;Custom message formatting middleware&lt;/li&gt;
&lt;li&gt;Environment-based configuration&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Data Flow
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;User input captured through chat interface&lt;/li&gt;
&lt;li&gt;Messages processed through Next.js API route&lt;/li&gt;
&lt;li&gt;AI response generated via Cerebras API&lt;/li&gt;
&lt;li&gt;Response formatted and displayed with proper Japanese text styling&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Implementation
&lt;/h2&gt;

&lt;p&gt;Since I’m preparing for JLPT N5, I focused primarily on kana (Hiragana &amp;amp; Katakana). The chatbot is designed to return answers primarily in kana, with kanji and romaji as needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Features
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Chat Interface
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;setMessages&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useState&lt;/span&gt;&lt;span class="p"&gt;([]);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;setInput&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  AI Integration
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;completionParams&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;system&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;systemPrompt&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;userMessage&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;llama3.1-8b&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Japanese Text Formatting
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;([\u&lt;/span&gt;&lt;span class="sr"&gt;4E00-&lt;/span&gt;&lt;span class="se"&gt;\u&lt;/span&gt;&lt;span class="sr"&gt;9FFF&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;)\((&lt;/span&gt;&lt;span class="sr"&gt;.*&lt;/span&gt;&lt;span class="se"&gt;?)\)&lt;/span&gt;&lt;span class="sr"&gt;/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;match&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;kanji&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;furigana&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;kanji&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;(&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;furigana&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;)`&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Challenges Overcome
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Implemented a regex-based system for formatting Japanese text with furigana.&lt;/li&gt;
&lt;li&gt;Developed dynamic, category-based quick responses to enhance learning engagement.&lt;/li&gt;
&lt;li&gt;Designed a fully responsive layout compatible with various screen sizes.&lt;/li&gt;
&lt;li&gt;Optimized the AI response handling for improved performance.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Testing and Validation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Conducted manual testing to ensure the chat functionality works as intended.&lt;/li&gt;
&lt;li&gt;Tested for cross-browser compatibility to ensure consistent behavior across different browsers.&lt;/li&gt;
&lt;li&gt;Validated mobile responsiveness to ensure a seamless experience on smartphones and tablets.&lt;/li&gt;
&lt;li&gt;Verified the accuracy of Japanese text formatting and its proper display.&lt;/li&gt;
&lt;li&gt;Ensured proper handling of API responses under various conditions.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Results and Evaluation
&lt;/h3&gt;

&lt;p&gt;The system successfully:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Delivers real-time Japanese language instruction for learners.&lt;/li&gt;
&lt;li&gt;Maintains consistent and accurate formatting of Japanese text with furigana.&lt;/li&gt;
&lt;li&gt;Provides structured learning paths across various language components.&lt;/li&gt;
&lt;li&gt;Ensures responsive performance across all devices.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Performance Metrics
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Average API response time&lt;/strong&gt;: ~2-3 seconds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Japanese text formatting accuracy&lt;/strong&gt;: 100%&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mobile compatibility&lt;/strong&gt;: Fully responsive design&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Browser support&lt;/strong&gt;: Compatible with all modern browsers&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;The project successfully creates an interactive learning environment for Japanese language learners, integrating AI capabilities with structured language instruction. Future improvements will include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Integration of speech synthesis for pronunciation assistance.&lt;/li&gt;
&lt;li&gt;Implementation of a progress tracking system for learners.&lt;/li&gt;
&lt;li&gt;Expansion of content coverage, particularly in vocabulary and kanji.&lt;/li&gt;
&lt;li&gt;Incorporation of a spaced repetition learning system to reinforce knowledge retention.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Code and Documentation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The project repository follows Next.js conventions for structure and organization.&lt;/li&gt;
&lt;li&gt;API documentation includes detailed information about integrating with the Cerebras AI model.&lt;/li&gt;
&lt;li&gt;Custom CSS modules are used for styling specific components of the UI.&lt;/li&gt;
&lt;li&gt;An environment configuration guide is provided for deployment and setup.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try It Yourself!
&lt;/h2&gt;

&lt;p&gt;Want to check it out or contribute? &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/S-Osman4/Cerebras/" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Remember to star the repo if you found this helpful! ⭐&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Live Demo - coming soon!&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  References
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://inference-docs.cerebras.ai/quickstart" rel="noopener noreferrer"&gt;Cerebras Cloud SDK Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://nextjs.org/docs" rel="noopener noreferrer"&gt;Next.js Official Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://react.dev/" rel="noopener noreferrer"&gt;React Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.fontawesome.com/v5/web/use-with/react" rel="noopener noreferrer"&gt;FontAwesome React Component Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jlpt.jp/e/about/levelsummary.html" rel="noopener noreferrer"&gt;Japanese Language Learning Standards (JLPT N5)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>Understanding Your Data: The Essentials of Exploratory Data Analysis</title>
      <dc:creator>Shamso Osman</dc:creator>
      <pubDate>Sun, 11 Aug 2024 18:17:13 +0000</pubDate>
      <link>https://forem.com/shamso_osman/understanding-your-data-the-essentials-of-exploratory-data-analysis-109k</link>
      <guid>https://forem.com/shamso_osman/understanding-your-data-the-essentials-of-exploratory-data-analysis-109k</guid>
      <description>&lt;p&gt;When working with a new dataset, it's important to explore the data to understand its structure, patterns, and anomalies. This process, known as Exploratory Data Analysis (EDA), helps you get familiar with the data before diving into modeling or drawing conclusions.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Exploratory data analysis is one of the basic and essential steps of a data science project. A data scientist involves almost 70% of his work in doing the EDA of the dataset. &lt;a href="https://www.geeksforgeeks.org/what-is-exploratory-data-analysis/" rel="noopener noreferrer"&gt;Source&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Key aspects of EDA include:
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Distribution of Data&lt;/strong&gt;: Examining the distribution of data points to understand their range, central tendencies (mean, median), and dispersion (variance, standard deviation).&lt;br&gt;
&lt;strong&gt;Graphical Representations&lt;/strong&gt;: Utilizing charts such as histograms, box plots, scatter plots, and bar charts to visualize relationships within the data and distributions of variables.&lt;br&gt;
&lt;strong&gt;Outlier Detection&lt;/strong&gt;: Identifying unusual values that deviate from other data points. Outliers can influence statistical analyses and might indicate data entry errors or unique cases.&lt;br&gt;
&lt;strong&gt;Correlation Analysis&lt;/strong&gt;: Checking the relationships between variables to understand how they might affect each other. This includes computing correlation coefficients and creating correlation matrices.&lt;br&gt;
&lt;strong&gt;Handling Missing Values&lt;/strong&gt;: Detecting and deciding how to address missing data points, whether by imputation or removal, depending on their impact and the amount of missing data.&lt;br&gt;
&lt;strong&gt;Summary Statistics&lt;/strong&gt;: Calculating key statistics that provide insight into data trends and nuances.&lt;/p&gt;
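
&lt;p&gt;To make a few of these aspects concrete, here is a minimal sketch on a toy DataFrame. The column names &lt;code&gt;temp_c&lt;/code&gt; and &lt;code&gt;humidity&lt;/code&gt; and all the values are made up for illustration, and the 1.5 * IQR rule is just one common convention for flagging outliers.&lt;/p&gt;

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; names and values are illustrative only.
df = pd.DataFrame({
    "temp_c": [12.0, 14.5, 13.2, 15.1, 40.0, np.nan],
    "humidity": [80.0, 72.0, 76.0, 70.0, 20.0, 75.0],
})

# Handling missing values: count the gaps in each column.
missing_per_column = df.isna().sum()

# Outlier detection: flag values outside 1.5 * IQR of the quartiles.
temps = df["temp_c"].dropna()
q1 = temps.quantile(0.25)
q3 = temps.quantile(0.75)
iqr = q3 - q1
outliers = temps[~temps.between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# Correlation analysis: Pearson correlation, rows with NaN are skipped.
correlation = df["temp_c"].corr(df["humidity"])

print(missing_per_column)
print(outliers)
print(correlation)
```

&lt;p&gt;Whether a flagged value is a data entry error or a genuine extreme is a judgment call that depends on the dataset.&lt;/p&gt;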
&lt;h3&gt;
  
  
  Types of Exploratory Data Analysis (EDA)
&lt;/h3&gt;
&lt;h4&gt;
  
  
  1. Univariate Analysis
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Definition:&lt;/strong&gt; Focuses on analyzing a single variable at a time.&lt;br&gt;
&lt;strong&gt;Purpose:&lt;/strong&gt; To understand the variable's distribution, central tendency, and spread.&lt;br&gt;
&lt;strong&gt;Techniques:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Descriptive statistics (mean, median, mode, variance, standard deviation).&lt;/li&gt;
&lt;li&gt;Visualizations (histograms, box plots, bar charts, pie charts).&lt;/li&gt;
&lt;/ul&gt;
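
&lt;p&gt;As a rough sketch, the descriptive statistics above can be computed for a single variable like this. The temperature readings are invented for the example.&lt;/p&gt;

```python
import pandas as pd

# Hypothetical temperature readings, used only to illustrate the calls.
temps = pd.Series([10, 12, 12, 13, 15, 18], name="temp_c")

summary = {
    "mean": temps.mean(),
    "median": temps.median(),
    "mode": temps.mode().tolist(),  # mode() can return several values
    "variance": temps.var(),        # sample variance (ddof=1)
    "std": temps.std(),             # sample standard deviation (ddof=1)
}
print(summary)
```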
&lt;h4&gt;
  
  
  2. Bivariate Analysis
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Definition:&lt;/strong&gt; Examines the relationship between two variables.&lt;br&gt;
&lt;strong&gt;Purpose:&lt;/strong&gt; To understand how one variable affects or is associated with another.&lt;br&gt;
&lt;strong&gt;Techniques:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scatter plots.&lt;/li&gt;
&lt;li&gt;Correlation coefficients (Pearson, Spearman).&lt;/li&gt;
&lt;li&gt;Cross-tabulations and contingency tables.&lt;/li&gt;
&lt;li&gt;Visualizations (line plots, scatter plots, pair plots).&lt;/li&gt;
&lt;/ul&gt;
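
&lt;p&gt;The sketch below illustrates two of these techniques on invented data: correlation coefficients and a cross-tabulation. Spearman's coefficient is computed here as Pearson's coefficient on the ranks, which is its definition. All column names are hypothetical.&lt;/p&gt;

```python
import pandas as pd

# Hypothetical data: visibility grows with temperature, but not linearly.
df = pd.DataFrame({
    "temp_c": [5, 10, 15, 20, 25],
    "visibility_km": [4, 9, 16, 25, 36],
    "season": ["Winter", "Winter", "Spring", "Spring", "Summer"],
    "weather": ["Fog", "Cloudy", "Cloudy", "Clear", "Clear"],
})

# Pearson measures linear association between two numeric variables.
pearson = df["temp_c"].corr(df["visibility_km"])

# Spearman measures monotonic association: Pearson applied to the ranks.
spearman = df["temp_c"].rank().corr(df["visibility_km"].rank())

# Cross-tabulation counts how often each pair of categories occurs.
table = pd.crosstab(df["season"], df["weather"])

print(pearson, spearman)
print(table)
```

&lt;p&gt;Because the relationship is perfectly monotonic but curved, Spearman comes out higher than Pearson in this toy example.&lt;/p&gt;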
&lt;h4&gt;
  
  
  3. Multivariate Analysis
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Definition:&lt;/strong&gt; Investigates interactions between three or more variables.&lt;br&gt;
&lt;strong&gt;Purpose:&lt;/strong&gt; To understand the complex relationships and interactions in the data.&lt;br&gt;
&lt;strong&gt;Techniques:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multivariate plots (pair plots, parallel coordinates plots).&lt;/li&gt;
&lt;li&gt;Dimensionality reduction techniques (PCA, t-SNE).&lt;/li&gt;
&lt;li&gt;Cluster analysis.&lt;/li&gt;
&lt;li&gt;Heatmaps and correlation matrices.&lt;/li&gt;
&lt;/ul&gt;
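
&lt;p&gt;Here is a minimal sketch of two multivariate techniques: a correlation matrix and a PCA computed directly from a singular value decomposition. The data is randomly generated and the column names are assumptions made for the example. In practice you might reach for a dedicated library for PCA instead.&lt;/p&gt;

```python
import numpy as np
import pandas as pd

# Synthetic data: two strongly related columns and one independent column.
rng = np.random.default_rng(0)
base = rng.normal(size=50)
df = pd.DataFrame({
    "temp_c": base,
    "dew_point_c": base + rng.normal(scale=0.1, size=50),
    "pressure_kpa": rng.normal(size=50),
})

# Correlation matrix: pairwise Pearson correlations between all columns.
corr_matrix = df.corr()

# PCA via SVD on standardized data; squared singular values are
# proportional to the variance captured by each principal component.
standardized = (df - df.mean()) / df.std()
_, singular_values, _ = np.linalg.svd(standardized.to_numpy(), full_matrices=False)
explained_variance_ratio = singular_values**2 / np.sum(singular_values**2)

print(corr_matrix.round(2))
print(explained_variance_ratio.round(2))
```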
&lt;h4&gt;
  
  
  4. Descriptive Statistics
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Definition:&lt;/strong&gt; Summarizes the main features of a data set.&lt;br&gt;
&lt;strong&gt;Purpose:&lt;/strong&gt; To provide a quick overview of the data.&lt;br&gt;
&lt;strong&gt;Techniques:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Measures of central tendency (mean, median, mode).&lt;/li&gt;
&lt;li&gt;Measures of dispersion (range, variance, standard deviation).&lt;/li&gt;
&lt;li&gt;Frequency distributions.&lt;/li&gt;
&lt;/ul&gt;
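
&lt;p&gt;In pandas, most of these summaries are one call away. The following sketch uses &lt;code&gt;describe()&lt;/code&gt; for a numeric column and &lt;code&gt;value_counts()&lt;/code&gt; for a frequency distribution. The data and column names are invented for illustration.&lt;/p&gt;

```python
import pandas as pd

# Hypothetical data with one numeric and one categorical column.
df = pd.DataFrame({
    "temp_c": [10.0, 12.0, 12.0, 13.0, 15.0, 18.0],
    "weather": ["Clear", "Fog", "Clear", "Cloudy", "Clear", "Fog"],
})

# count, mean, std, min, quartiles, and max in a single call.
numeric_summary = df["temp_c"].describe()

# Range is not part of describe(), so compute it separately.
value_range = df["temp_c"].max() - df["temp_c"].min()

# Frequency distribution of a categorical column.
frequencies = df["weather"].value_counts()

print(numeric_summary)
print(value_range)
print(frequencies)
```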
&lt;h4&gt;
  
  
  5. Graphical Analysis
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Definition:&lt;/strong&gt; Uses visual tools to explore data.&lt;br&gt;
&lt;strong&gt;Purpose:&lt;/strong&gt; To identify patterns, trends, and data anomalies through visualization.&lt;br&gt;
&lt;strong&gt;Techniques:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Charts (bar charts, histograms, pie charts).&lt;/li&gt;
&lt;li&gt;Plots (scatter plots, line plots, box plots).&lt;/li&gt;
&lt;li&gt;Advanced visualizations (heatmaps, violin plots, pair plots).&lt;/li&gt;
&lt;/ul&gt;
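
&lt;p&gt;A small sketch of three of these plots with matplotlib follows. The data is made up, and the &lt;code&gt;Agg&lt;/code&gt; backend is selected only so the script also runs on a machine without a display. In a notebook you can drop that line.&lt;/p&gt;

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical readings used only to have something to draw.
df = pd.DataFrame({
    "temp_c": [3, 5, 6, 8, 9, 11, 12, 14, 15, 25],
    "humidity": [90, 88, 85, 80, 78, 74, 70, 66, 63, 40],
})

fig, axes = plt.subplots(1, 3, figsize=(12, 3))

axes[0].hist(df["temp_c"], bins=5)             # distribution of one variable
axes[0].set_title("Histogram")

axes[1].boxplot(df["temp_c"])                  # median, quartiles, outliers
axes[1].set_title("Box plot")

axes[2].scatter(df["temp_c"], df["humidity"])  # relationship of two variables
axes[2].set_title("Scatter plot")

fig.tight_layout()
# Call fig.savefig("some_file.png") or plt.show() to view the result.
```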
&lt;h3&gt;
  
  
  How to perform Exploratory Data Analysis?
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flli2hrnh46h8jop3vsqz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flli2hrnh46h8jop3vsqz.png" alt="Steps of performing EDA by geeksforgeeks" width="800" height="571"&gt;&lt;/a&gt; &lt;a href="https://www.geeksforgeeks.org/what-is-exploratory-data-analysis/" rel="noopener noreferrer"&gt;Source&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this article I'll demonstrate the process using a sample &lt;a href="https://www.kaggle.com/datasets/ayushmi77al/weather-data-set-for-beginners" rel="noopener noreferrer"&gt;weather dataset&lt;/a&gt;. This will be a hands-on approach, so we'll walk through each step with simple explanations.&lt;/p&gt;

&lt;p&gt;Let's get started!&lt;/p&gt;
&lt;h2&gt;
  
  
  Step 1: Loading the Data
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Importing Libraries&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Import Libraries
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;seaborn&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;pandas&lt;/strong&gt; is used for data manipulation and analysis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;numpy&lt;/strong&gt; provides support for large, multi-dimensional arrays and matrices.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;matplotlib&lt;/strong&gt; is a plotting library for creating static, animated, and interactive visualizations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;seaborn&lt;/strong&gt; is built on top of matplotlib and provides a high-level interface for creating attractive and informative statistical graphics.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Loading the data&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The first step in EDA is loading your data into a DataFrame.&lt;br&gt;
We can do this using pandas.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Load the dataset
&lt;/span&gt;&lt;span class="n"&gt;data_frame_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;File_path_to_your_csv_file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flct9058qs9lrc5ww03tt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flct9058qs9lrc5ww03tt.png" alt="Importing libraries - Image by Author" width="682" height="70"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The string inside the parentheses is the path to your CSV file. Alternatively, you can store the path in a variable first and pass that variable to &lt;code&gt;read_csv&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Load the dataset
&lt;/span&gt;&lt;span class="n"&gt;File_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;File_path_to_your_csv_file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;data_frame_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;File_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8bxp23clyr259qrri18a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8bxp23clyr259qrri18a.png" alt="File Path - Image by Author" width="676" height="109"&gt;&lt;/a&gt;&lt;/p&gt;
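
&lt;p&gt;If you want to try &lt;code&gt;read_csv&lt;/code&gt; without downloading anything, it also accepts a file-like object. The sketch below reads a tiny, made-up CSV from an in-memory string. The column names are hypothetical and are not the columns of the weather dataset.&lt;/p&gt;

```python
import io

import pandas as pd

# A made-up CSV with one missing temperature value.
csv_text = """date,temp_c,weather
2012-01-01,-1.8,Fog
2012-01-02,-1.5,Fog
2012-01-03,,Cloudy
"""

# read_csv accepts a path or a file-like object such as StringIO.
# parse_dates converts the listed columns to datetime values.
weather = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

print(weather)
print(weather.dtypes)
```

&lt;p&gt;The empty field in the last row is loaded as &lt;code&gt;NaN&lt;/code&gt;, which is how pandas marks missing numeric values.&lt;/p&gt;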

&lt;p&gt;You can get a quick overview of the data and its structure with the &lt;code&gt;.head()&lt;/code&gt; method. It shows the first few rows so you can see which columns are present, how the data is organized, and what the values look like. By default, it returns the first 5 rows along with the column names.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Display the first five rows
&lt;/span&gt;&lt;span class="n"&gt;data_frame_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Alternatively you can use a print function. Gives you the same results
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data_frame_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7vxcuduovbezpsseaz2l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7vxcuduovbezpsseaz2l.png" alt="Display the first five rows - Image by Author" width="800" height="214"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But wait, what if you want the last rows instead? Use the &lt;code&gt;.tail()&lt;/code&gt; method, which returns the last 5 rows by default.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Display the last five rows
&lt;/span&gt;&lt;span class="n"&gt;data_frame_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tail&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Alternatively you can use a print function. Gives you the same results
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data_frame_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tail&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr3d0y8oprn7gbhlqpifa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr3d0y8oprn7gbhlqpifa.png" alt="Display the last five rows - Image by Author" width="800" height="229"&gt;&lt;/a&gt;&lt;/p&gt;
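
&lt;p&gt;Both methods take an optional row count, so you are not limited to five rows. A short sketch on an invented ten-row DataFrame:&lt;/p&gt;

```python
import pandas as pd

# Hypothetical ten days of temperature readings.
df = pd.DataFrame({
    "day": range(1, 11),
    "temp_c": [3, 4, 6, 5, 7, 9, 8, 10, 12, 11],
})

first_three = df.head(3)        # first 3 rows
last_two = df.tail(2)           # last 2 rows
all_but_last_two = df.head(-2)  # negative n: everything except the last 2

print(first_three)
print(last_two)
print(len(all_but_last_two))
```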

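&lt;p&gt;Hmm, how about both the head and the tail of the DataFrame? Simply evaluate the DataFrame itself. When it has more rows than pandas is configured to display (60 by default), the output is truncated to the first five and last five rows.&lt;br&gt;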
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Display the first five and last five rows
&lt;/span&gt;&lt;span class="n"&gt;data_frame_name&lt;/span&gt;

&lt;span class="c1"&gt;# Alternatively you can use a print function. Gives you the same results
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data_frame_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fooo7oslzyag0f5q4fagn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fooo7oslzyag0f5q4fagn.png" alt="Display the dataset - Image by Author" width="800" height="357"&gt;&lt;/a&gt;&lt;/p&gt;
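
&lt;p&gt;The truncated view comes from pandas display options rather than from the data. As a rough illustration, assuming default settings and an invented 100-row DataFrame, you can temporarily lift the row limit with &lt;code&gt;pd.option_context&lt;/code&gt;:&lt;/p&gt;

```python
import pandas as pd

# Hypothetical DataFrame that is long enough to be truncated by default.
df = pd.DataFrame({"day": range(100), "temp_c": range(100)})

# Default text output: first and last rows plus a dimensions footer.
truncated = repr(df)

# Inside this block the row limit is removed, so every row is rendered.
with pd.option_context("display.max_rows", None):
    full = repr(df)

print(truncated)
print(len(full.splitlines()))
```

&lt;p&gt;The option is restored automatically when the &lt;code&gt;with&lt;/code&gt; block ends, so the rest of your session keeps the compact view.&lt;/p&gt;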

&lt;h2&gt;
  
  
  Step 2: Checking the Data's Structure
&lt;/h2&gt;

&lt;p&gt;Understanding the structure means knowing the number of rows, columns, and data types present in the dataset. This can give you clues about the kind of analysis you'll be able to perform.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Check the structure of the dataset
&lt;/span&gt;&lt;span class="n"&gt;weather_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqqvsj4s7huh6caorbihw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqqvsj4s7huh6caorbihw.png" alt="Display the information of the dataset - Image by Author" width="800" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This command will show the total number of entries (rows), the number of columns, and the data type of each column. It also highlights how many non-null values are present in each column. &lt;/p&gt;
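
&lt;p&gt;Note that &lt;code&gt;.info()&lt;/code&gt; prints its report and returns &lt;code&gt;None&lt;/code&gt;. If you want the text itself, for example to log it, one option is to pass a buffer. The sketch below uses a made-up DataFrame with two missing values.&lt;/p&gt;

```python
import io

import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({
    "temp_c": [1.5, None, 3.0],
    "weather": ["Fog", "Clear", None],
})

# Capture the report as a string instead of printing it.
buffer = io.StringIO()
df.info(buf=buffer)
report = buffer.getvalue()

# count() gives the non-null count per column as numbers you can use.
non_null_counts = df.count()

print(report)
print(non_null_counts)
```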

&lt;p&gt;You can also get just the number of rows and columns with &lt;code&gt;.shape&lt;/code&gt;, which returns a &lt;code&gt;(rows, columns)&lt;/code&gt; tuple.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Display the number of rows and columns in the DataFrame (rows,cols)
&lt;/span&gt;&lt;span class="n"&gt;data_frame_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5x5zbz6ghnksbzo7nizn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5x5zbz6ghnksbzo7nizn.png" alt="Display the rows and columns - Image by Author" width="800" height="127"&gt;&lt;/a&gt;&lt;/p&gt;
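
&lt;p&gt;Since &lt;code&gt;.shape&lt;/code&gt; is a plain tuple, you can unpack it into separate variables. A tiny sketch with invented data:&lt;/p&gt;

```python
import pandas as pd

# Hypothetical DataFrame with 4 rows and 2 columns.
df = pd.DataFrame({
    "temp_c": [1.0, 2.0, 3.0, 4.0],
    "humidity": [80, 75, 70, 65],
})

n_rows, n_cols = df.shape  # unpack the (rows, columns) tuple
n_cells = df.size          # total number of cells: rows * columns

print(n_rows, n_cols, n_cells)
print(len(df))             # len() gives the number of rows
```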

&lt;p&gt;To list the column names, use &lt;code&gt;.columns&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Display the column names of the DataFrame
&lt;/span&gt;&lt;span class="n"&gt;data_frame_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F95un0nn9r886g2etdb7k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F95un0nn9r886g2etdb7k.png" alt="Display the columns- Image by Author" width="800" height="148"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To check the data type of each column, use &lt;code&gt;.dtypes&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Display the data types of each column
&lt;/span&gt;&lt;span class="n"&gt;data_frame_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dtypes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8evdlwfu2esmcixwu359.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8evdlwfu2esmcixwu359.png" alt="Display the data types - Image by Author" width="768" height="415"&gt;&lt;/a&gt;&lt;/p&gt;
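
&lt;p&gt;To see these inspection tools side by side, here is a minimal sketch on a tiny made-up DataFrame. The name &lt;code&gt;sample_data&lt;/code&gt; and its columns are assumptions for illustration, not part of the weather dataset.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import pandas as pd

# Hypothetical stand-in for the weather dataset; the column names are made up.
sample_data = pd.DataFrame({
    "city": ["Nairobi", "Mombasa", "Kisumu"],
    "temperature": [24.5, 30.1, 27.3],
    "humidity": [60, 78, 70],
})

sample_data.info()  # prints the row count, data types, and non-null counts

n_rows, n_cols = sample_data.shape  # shape is a (rows, columns) tuple
column_names = list(sample_data.columns)
dtype_names = sample_data.dtypes.astype(str).to_dict()

print(n_rows, n_cols)
print(column_names)
print(dtype_names)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Note that &lt;code&gt;.shape&lt;/code&gt;, &lt;code&gt;.columns&lt;/code&gt;, and &lt;code&gt;.dtypes&lt;/code&gt; are attributes, so they take no parentheses, while &lt;code&gt;.info()&lt;/code&gt; is a method.&lt;/p&gt;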

&lt;h3&gt;
  
  
  Step 3: Summarizing the Data
&lt;/h3&gt;

&lt;p&gt;Next, you'll want to get a summary of the numerical columns. This provides an overview of the data's central tendency, dispersion, and the shape of the distribution. We do that using &lt;code&gt;.describe()&lt;/code&gt;, which gives a quick statistical summary of each numeric column: the count, mean, standard deviation, minimum, quartiles, and maximum. This helps in spotting outliers or unusual distributions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Get a summary of numerical columns
&lt;/span&gt;&lt;span class="n"&gt;data_frame_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgqfxrg7tu54d43oaknyh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgqfxrg7tu54d43oaknyh.png" alt="Display the summary of the dataset - Image by Author" width="800" height="378"&gt;&lt;/a&gt;&lt;/p&gt;
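
&lt;p&gt;If your dataset also has text columns, &lt;code&gt;.describe()&lt;/code&gt; skips them by default. As a small sketch on made-up values (the &lt;code&gt;readings&lt;/code&gt; DataFrame is hypothetical), passing &lt;code&gt;include="all"&lt;/code&gt; adds a summary of the text columns too:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import pandas as pd

# Hypothetical values for illustration only.
readings = pd.DataFrame({
    "temperature": [20.0, 22.0, 24.0, 26.0, 28.0],
    "condition": ["sunny", "rainy", "sunny", "cloudy", "sunny"],
})

numeric_summary = readings.describe()            # numeric columns only
full_summary = readings.describe(include="all")  # also summarizes text columns

# Compare the mean with the median (the 50% row); a large gap can hint at outliers.
print(numeric_summary.loc[["mean", "50%"], "temperature"])

# For text columns, "top" is the most frequent value.
print(full_summary.loc["top", "condition"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;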

&lt;h2&gt;
  
  
  Step 4: Identifying Missing Values
&lt;/h2&gt;

&lt;p&gt;Missing values can be tricky—they might represent gaps in data collection, or they might be errors. It's essential to identify and decide how to handle them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Check for missing values
&lt;/span&gt;&lt;span class="n"&gt;data_frame_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isnull&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fws06aasholxvl26x8zht.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fws06aasholxvl26x8zht.png" alt="Check for missing values - Image by Author" width="800" height="422"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Luckily, our dataset does not have any missing values.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If any columns have missing values, you'll need to decide whether to remove them or fill them with an appropriate value (like the mean or median).&lt;/p&gt;
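
&lt;p&gt;As a rough sketch of both options, assume a small made-up DataFrame with one gap per column. Here &lt;code&gt;dropna()&lt;/code&gt; removes incomplete rows, while &lt;code&gt;fillna()&lt;/code&gt; fills each gap with that column's median:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import pandas as pd

# Hypothetical data with gaps; None becomes NaN in a numeric column.
df = pd.DataFrame({
    "temperature": [21.0, None, 25.0, 30.0],
    "humidity": [55.0, 60.0, None, 65.0],
})

missing_per_column = df.isnull().sum()

# Option 1: drop every row that has at least one missing value.
dropped = df.dropna()

# Option 2: fill each gap with the median of its column.
filled = df.fillna(df.median(numeric_only=True))

print(missing_per_column)
print(len(dropped))
print(filled)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Dropping rows is simple but loses data, while filling keeps every row at the cost of inventing values, so the right choice depends on how many values are missing and why.&lt;/p&gt;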

&lt;p&gt;We can also check for duplicate rows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Check for and count duplicate rows
&lt;/span&gt;&lt;span class="n"&gt;data_frame_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;duplicated&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fydfmp2jumtqbxz57gcm2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fydfmp2jumtqbxz57gcm2.png" alt="Check for duplicates - Image by Author" width="800" height="120"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Luckily, our dataset does not have any duplicates.&lt;/em&gt;&lt;/p&gt;
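
&lt;p&gt;If the count is not zero, one common option is to keep the first copy of each row and drop the rest. A minimal sketch on made-up data:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import pandas as pd

# Hypothetical data where the last row repeats the first.
df = pd.DataFrame({
    "city": ["Nairobi", "Mombasa", "Nairobi"],
    "temperature": [24.5, 30.1, 24.5],
})

duplicate_count = df.duplicated().sum()  # counts repeats after the first occurrence
deduplicated = df.drop_duplicates().reset_index(drop=True)

print(duplicate_count)
print(deduplicated)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;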
&lt;h2&gt;
  
  
  Step 5: Visualizing the Data
&lt;/h2&gt;

&lt;p&gt;Visualization helps you see patterns, trends, and relationships in the data that might not be obvious from raw numbers. Common chart types include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Histograms: Display the distribution of numerical data.&lt;/li&gt;
&lt;li&gt;Scatter plots: Show the relationship between two numerical variables.&lt;/li&gt;
&lt;li&gt;Bar charts: Compare categorical data.&lt;/li&gt;
&lt;li&gt;Line charts: Visualize data over time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's start with a simple histogram to understand the distribution of a particular column, such as temperature:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Plot a histogram of the 'Column_name' column
&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;hist&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;weather_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Column_name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;bins&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;edgecolor&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;black&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;#you can change the color to anything
&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Distribution of Column_name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Column_name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Label for the x-axis (horizontal axis)
&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Frequency&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Label for the y-axis (vertical axis)
&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;bins=30&lt;/code&gt;: This divides the data into 30 bins (intervals) for counting frequencies. You can adjust this number.&lt;br&gt;
&lt;code&gt;edgecolor='black'&lt;/code&gt;: This adds black outlines to the bars for better visual separation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4cdpa8td5m0n9qdapkvo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4cdpa8td5m0n9qdapkvo.png" alt="Histogram - Image by Author" width="800" height="540"&gt;&lt;/a&gt;&lt;br&gt;
Let's change the &lt;code&gt;edgecolor&lt;/code&gt; to white and add gridlines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Plot a histogram of the 'Column_name' column
&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;hist&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;weather_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Column_name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;bins&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;edgecolor&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;white&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;#you can change the color to anything
&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Distribution of Column_name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Column_name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Label for the x-axis (horizontal axis)
&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Frequency&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Label for the y-axis (vertical axis)
&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;grid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# Add gridlines for better readability
&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcxq438zkbxlcqbam2led.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcxq438zkbxlcqbam2led.png" alt="Histogram with gridlines - Image by Author" width="800" height="553"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Try using different columns and different visualization methods&lt;/em&gt;&lt;/p&gt;
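
&lt;p&gt;As a starting point, here is a sketch of the other three chart types from the list above, drawn side by side. The DataFrame and its column names are made up, and the &lt;code&gt;Agg&lt;/code&gt; backend line is an assumption that only lets the script run without a display.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical weather readings; replace with columns from your own dataset.
df = pd.DataFrame({
    "day": [1, 2, 3, 4, 5],
    "temperature": [22.0, 24.5, 23.0, 27.5, 26.0],
    "humidity": [70, 64, 68, 55, 58],
    "condition": ["rainy", "sunny", "rainy", "sunny", "sunny"],
})

fig, (ax_scatter, ax_bar, ax_line) = plt.subplots(1, 3, figsize=(12, 3.5))

# Scatter plot: relationship between two numerical variables.
ax_scatter.scatter(df["temperature"], df["humidity"])
ax_scatter.set_xlabel("temperature")
ax_scatter.set_ylabel("humidity")

# Bar chart: how often each category occurs.
condition_counts = df["condition"].value_counts()
ax_bar.bar(list(condition_counts.index), condition_counts.values)
ax_bar.set_ylabel("count")

# Line chart: a value over time.
ax_line.plot(df["day"], df["temperature"])
ax_line.set_xlabel("day")
ax_line.set_ylabel("temperature")

fig.tight_layout()
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;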

&lt;h2&gt;
  
  
  Step 6: Finding Correlations
&lt;/h2&gt;

&lt;p&gt;Correlation analysis helps you understand how different variables relate to each other. This is especially useful if you plan to build a predictive model later.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Compute correlation matrix
&lt;/span&gt;&lt;span class="n"&gt;correlation_matrix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;weather_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;corr&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Display the correlation matrix
&lt;/span&gt;&lt;span class="n"&gt;correlation_matrix&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzngau8kdorv83xoojq45.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzngau8kdorv83xoojq45.png" alt="Display correlation matrix error - Image by Author" width="800" height="421"&gt;&lt;/a&gt;&lt;br&gt;
But wait, we get an error. Why is that? The &lt;code&gt;.corr()&lt;/code&gt; method works only on numerical data, and our dataset also contains non-numeric columns. How do we handle this? We extract the numerical columns into a new dataframe, then call &lt;code&gt;.corr()&lt;/code&gt; again on that dataframe.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Extract numerical features for correlation
&lt;/span&gt;&lt;span class="n"&gt;numerical_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data_frame_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select_dtypes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;include&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;number&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Compute correlation matrix on numerical data
&lt;/span&gt;&lt;span class="n"&gt;correlation_matrix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;numerical_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;corr&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Display the correlation matrix
&lt;/span&gt;&lt;span class="n"&gt;correlation_matrix&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ak7669emfsvinay5xyc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ak7669emfsvinay5xyc.png" alt="Display correlation matrix - Image by Author" width="800" height="406"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This matrix shows how each pair of columns relates. Values close to 1 indicate a strong positive relationship, values close to -1 indicate a strong negative one, and values near 0 suggest little to no linear relationship.&lt;/p&gt;
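
&lt;p&gt;In recent pandas versions you can also pass &lt;code&gt;numeric_only=True&lt;/code&gt; to &lt;code&gt;.corr()&lt;/code&gt; instead of building a separate dataframe. The sketch below uses made-up data in which humidity falls in step with temperature, so that pair should come out very close to -1:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import pandas as pd

# Hypothetical data: humidity falls as temperature rises; "city" is text.
df = pd.DataFrame({
    "city": ["A", "B", "C", "D"],
    "temperature": [20.0, 25.0, 30.0, 35.0],
    "humidity": [80.0, 70.0, 60.0, 50.0],
    "wind_speed": [5.0, 9.0, 4.0, 8.0],
})

# numeric_only=True skips text columns instead of raising an error.
correlation_matrix = df.corr(numeric_only=True)

# Look up a single pair by row and column name.
temp_vs_humidity = correlation_matrix.loc["temperature", "humidity"]

print(correlation_matrix.round(2))
print(round(temp_vs_humidity, 2))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;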

&lt;p&gt;One way to visualize the matrix is with a &lt;strong&gt;heatmap&lt;/strong&gt;. Heatmaps are a type of data visualization that uses color to represent the values in a matrix or table.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;heatmap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;annot&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cmap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;coolwarm&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.2f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Heatmap Title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;data&lt;/code&gt;: This is the 2D dataset (e.g., a correlation matrix) that you want to visualize.&lt;br&gt;
&lt;code&gt;annot=True&lt;/code&gt;: This displays the numerical values within each cell of the heatmap.&lt;br&gt;
&lt;code&gt;cmap='coolwarm'&lt;/code&gt;: This sets the color palette for the heatmap. 'coolwarm' is a common choice, but you can explore other options.&lt;br&gt;
&lt;code&gt;fmt=".2f"&lt;/code&gt;: This formats the displayed numerical values to two decimal places.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqeqgpp5s8kvzm18jqqjr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqeqgpp5s8kvzm18jqqjr.png" alt="Display correlation matrix heatmap - Image by Author" width="800" height="667"&gt;&lt;/a&gt;&lt;/p&gt;
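
&lt;p&gt;Putting the pieces together, a self-contained sketch might look like this. The data is made up, and &lt;code&gt;vmin&lt;/code&gt; and &lt;code&gt;vmax&lt;/code&gt; are an optional addition that pins the color scale to the full -1 to 1 range of a correlation matrix:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical numeric data standing in for the weather dataset.
numerical_data = pd.DataFrame({
    "temperature": [20.0, 25.0, 30.0, 35.0],
    "humidity": [80.0, 70.0, 60.0, 50.0],
    "wind_speed": [5.0, 9.0, 4.0, 8.0],
})
correlation_matrix = numerical_data.corr()

# Pass the correlation matrix as the data argument of the heatmap.
ax = sns.heatmap(correlation_matrix, annot=True, cmap="coolwarm",
                 fmt=".2f", vmin=-1, vmax=1)
ax.set_title("Correlation Heatmap")
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;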

&lt;h3&gt;
  
  
  Exploratory Data Analysis Tools
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Python:&lt;/strong&gt; An interpreted, object-oriented programming language with dynamic semantics. Its high-level, built-in data structures, combined with dynamic typing and dynamic binding, make it very attractive for rapid application development, as well as for use as a scripting or glue language to connect existing components together. Python and EDA can be used together to identify missing values in a data set, which is important so you can decide how to handle missing values for machine learning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;R:&lt;/strong&gt; An open-source programming language and free software environment for statistical computing and graphics, supported by the R Foundation for Statistical Computing. The R language is widely used by statisticians and data scientists for statistical analysis.&lt;/p&gt;

&lt;p&gt;Remember:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;An efficient EDA lays the foundation of a successful machine learning pipeline.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;EDA is not just about statistics; it's about understanding the story your data tells.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Visualization is key to uncovering patterns and anomalies.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Domain knowledge is essential for interpreting findings effectively.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By mastering EDA, you lay a strong foundation for building predictive models, making data-driven decisions, and gaining valuable insights from your data.&lt;/p&gt;

&lt;h3&gt;
  
  
  References
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://www.geeksforgeeks.org/what-is-exploratory-data-analysis/" rel="noopener noreferrer"&gt;GeeksforGeeks&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.simplilearn.com/tutorials/data-analytics-tutorial/exploratory-data-analysis" rel="noopener noreferrer"&gt;Simplilearn&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.ibm.com/topics/exploratory-data-analysis" rel="noopener noreferrer"&gt;IBM What is exploratory data analysis (EDA)?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://towardsdatascience.com/a-data-scientists-essential-guide-to-exploratory-data-analysis-25637eee0cf6" rel="noopener noreferrer"&gt;Towards Data Science&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/code/imoore/intro-to-exploratory-data-analysis-eda-in-python" rel="noopener noreferrer"&gt;Kaggle&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>dataanlysis</category>
      <category>python</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Data Analysis Tools and Techniques</title>
      <dc:creator>Shamso Osman</dc:creator>
      <pubDate>Sun, 04 Aug 2024 19:16:49 +0000</pubDate>
      <link>https://forem.com/shamso_osman/data-analysis-tools-and-techniques-83d</link>
      <guid>https://forem.com/shamso_osman/data-analysis-tools-and-techniques-83d</guid>
      <description>&lt;h2&gt;
  
  
  I. Introduction
&lt;/h2&gt;

&lt;p&gt;Hi there! I'm a recent graduate who developed a passion for data and its potential to drive decision-making.&lt;br&gt;&lt;br&gt;
In this article I want to explore various tools and techniques used in data analysis.&lt;/p&gt;

&lt;h2&gt;
  
  
  II. Understanding the Basics
&lt;/h2&gt;

&lt;p&gt;Before diving into techniques and tools, it's crucial to grasp some fundamental concepts:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. What is data analysis?
&lt;/h3&gt;

&lt;p&gt;Simply put, &lt;strong&gt;data analysis&lt;/strong&gt; is the practice of working with data to answer questions and draw insights. It involves collecting, processing, and interpreting data to help make informed decisions.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. The data analysis process
&lt;/h3&gt;

&lt;p&gt;I've found that data analysis typically follows these steps:&lt;/p&gt;

&lt;p&gt;a). &lt;strong&gt;Define the question&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It involves understanding the problem, identifying the data needed to address it, and defining the metrics or indicators to measure the outcomes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;b). &lt;strong&gt;Collect the data&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It involves gathering relevant information from various sources, using methods such as surveys, interviews, observations, or extraction from existing databases.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;c). &lt;strong&gt;Clean the data&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It involves checking the data for errors and inconsistencies and correcting or removing them.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;d). &lt;strong&gt;Analyze the data&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It involves applying statistical or mathematical techniques to the data to discover patterns, relationships, or trends.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;e). &lt;strong&gt;Interpret the results&lt;/strong&gt;: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It involves drawing conclusions and generating insights from your analysis, using visual representations such as charts and graphs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;f). &lt;strong&gt;Communicate findings&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This involves presenting the findings of the analysis in a narrative form that is engaging and easy to understand.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Types of data
&lt;/h3&gt;

&lt;p&gt;There are two main types of data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Quantitative data&lt;/strong&gt;: Numerical information that can be measured and expressed as numbers (e.g., age, income, temperature).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Qualitative data&lt;/strong&gt;: Non-numerical information that describes qualities or characteristics (e.g., color, texture, opinions).&lt;/li&gt;
&lt;/ul&gt;
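
&lt;p&gt;In pandas, one rough way to separate the two is by column data type. This is only a sketch on a made-up survey table, and it is a heuristic: a numeric column such as a postal code can still be qualitative in meaning.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import pandas as pd

# Hypothetical survey responses mixing both kinds of data.
survey = pd.DataFrame({
    "age": [23, 35, 41],
    "income": [32000.0, 54000.0, 61000.0],
    "favorite_color": ["blue", "green", "blue"],
})

# Numeric columns are treated as quantitative, everything else as qualitative.
quantitative_columns = list(survey.select_dtypes(include="number").columns)
qualitative_columns = list(survey.select_dtypes(exclude="number").columns)

print(quantitative_columns)
print(qualitative_columns)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;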

&lt;h2&gt;
  
  
  III. Essential Data Analysis Techniques
&lt;/h2&gt;

&lt;p&gt;As a beginner, I've focused on learning these fundamental techniques:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Descriptive Statistics
&lt;/h3&gt;

&lt;p&gt;This involves summarizing and describing the main features of a dataset. It focuses on what the data already shows, such as past trends, which gives context for thinking about future performance. I've learned to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Measure central tendency&lt;/em&gt;: Mean, median, and mode, which represent the typical value.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Measure dispersion&lt;/em&gt;: Range, variance, and standard deviation, which describe how spread out the data is.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Analyze data distribution&lt;/em&gt;: Shape of the data (normal, skewed, etc.) using histograms and box plots.&lt;/li&gt;
&lt;/ul&gt;
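
&lt;p&gt;As a rough sketch of these measures, here is a small example using Python's built-in &lt;code&gt;statistics&lt;/code&gt; module. The exam scores are made-up values, and the variance and standard deviation shown are the sample versions.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import statistics

# Hypothetical exam scores.
scores = [70, 85, 85, 90, 100]

# Central tendency: the typical value.
mean_score = statistics.mean(scores)
median_score = statistics.median(scores)
mode_score = statistics.mode(scores)

# Dispersion: how spread out the values are.
score_range = max(scores) - min(scores)
sample_variance = statistics.variance(scores)
sample_std = statistics.stdev(scores)

print(mean_score, median_score, mode_score)
print(score_range, sample_variance, round(sample_std, 2))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;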

&lt;h3&gt;
  
  
  2. Exploratory Data Analysis (EDA)
&lt;/h3&gt;

&lt;p&gt;EDA is about exploring data through visual methods. I've practiced creating various charts and graphs to understand patterns and relationships in data. The most commonly used ones include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Histograms&lt;/em&gt;: Display the distribution of numerical data.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Scatter plots&lt;/em&gt;: Show the relationship between two numerical variables.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Bar charts&lt;/em&gt;: Compare categorical data.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Line charts&lt;/em&gt;: Visualize data over time.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Inferential Statistics
&lt;/h3&gt;

&lt;p&gt;This technique allows us to make predictions or inferences about a population based on a sample of data. I'm still getting my head around concepts like &lt;strong&gt;&lt;em&gt;hypothesis testing&lt;/em&gt;&lt;/strong&gt;, which checks whether a sample provides enough evidence to reject a claim about a population, and &lt;strong&gt;&lt;em&gt;confidence intervals&lt;/em&gt;&lt;/strong&gt;, which estimate a range of values that likely contains the true population parameter.&lt;/p&gt;
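
&lt;p&gt;To make the idea of a confidence interval concrete, here is a minimal sketch that estimates a 95% interval for a mean. The sample values are made up, and the code assumes a normal approximation for simplicity; with a sample this small, a t-distribution would usually be the more careful choice.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import math
import statistics

# Hypothetical sample of 10 daily temperatures.
sample = [21.0, 23.5, 22.0, 24.0, 25.5, 23.0, 22.5, 24.5, 23.5, 22.0]

n = len(sample)
sample_mean = statistics.mean(sample)
standard_error = statistics.stdev(sample) / math.sqrt(n)

# z value for a 95% interval under the normal approximation (about 1.96).
z = statistics.NormalDist().inv_cdf(0.975)

lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error
print(f"95% CI for the mean: {lower:.2f} to {upper:.2f}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;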

&lt;h3&gt;
  
  
  4. Regression Analysis
&lt;/h3&gt;

&lt;p&gt;Regression helps in understanding relationships between variables. A regression model can be linear, multiple, logistic, ridge, non-linear, life data, and more. I've started with &lt;strong&gt;&lt;em&gt;simple linear regression&lt;/em&gt;&lt;/strong&gt;, examining how one variable affects another. I'm gradually working my way up to &lt;strong&gt;&lt;em&gt;multiple regression&lt;/em&gt;&lt;/strong&gt;, which involves multiple independent variables.&lt;/p&gt;
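
&lt;p&gt;Simple linear regression can be sketched in plain Python with the least-squares formulas. The hours and scores below are made-up numbers, and in practice a library such as scikit-learn would normally do this work.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical data: hours studied (x) and exam score (y).
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 55.0, 61.0, 64.0, 68.0]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n

# Least-squares slope = covariance(x, y) / variance(x).
covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))
variance_x = sum((x - mean_x) ** 2 for x in hours)
slope = covariance / variance_x
intercept = mean_y - slope * mean_x

# Use the fitted line to predict the score for 6 hours of study.
predicted_for_6_hours = intercept + slope * 6.0
print(slope, intercept, predicted_for_6_hours)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The slope is the expected change in the score for each extra hour, which is what "how one variable affects another" means in this setting.&lt;/p&gt;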

&lt;h3&gt;
  
  
  5. Time Series Analysis
&lt;/h3&gt;

&lt;p&gt;This technique is used for analyzing time-stamped data. I'm learning how to identify trends and seasonality and how to make forecasts.&lt;/p&gt;
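
&lt;p&gt;One simple way to expose a trend is a rolling mean. The sketch below uses made-up daily sales, and the "forecast" is a deliberately naive assumption (tomorrow equals the latest trend value) rather than a real forecasting model.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import pandas as pd

# Hypothetical daily sales over one week.
dates = pd.date_range("2024-01-01", periods=7, freq="D")
sales = pd.Series([10.0, 12.0, 11.0, 15.0, 14.0, 18.0, 17.0], index=dates)

# A 3-day rolling mean smooths day-to-day noise so the trend is easier to see.
# The first two entries are NaN because a full window is not yet available.
trend = sales.rolling(window=3).mean()

# Naive forecast (an assumption for illustration): reuse the latest trend value.
naive_forecast = trend.iloc[-1]

print(trend)
print(naive_forecast)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;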

&lt;h2&gt;
  
  
  IV. Popular Data Analysis Tools
&lt;/h2&gt;

&lt;p&gt;I've experimented with several tools: &lt;/p&gt;

&lt;h3&gt;
  
  
  1. Spreadsheets
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fem72lhcvkoibwwx6w5r6.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fem72lhcvkoibwwx6w5r6.jpeg" alt="Excel" width="800" height="646"&gt;&lt;/a&gt;&lt;br&gt;
I've used both &lt;strong&gt;Microsoft Excel&lt;/strong&gt; and &lt;strong&gt;Google Sheets&lt;/strong&gt;. They're great for basic data manipulation and simple visualizations, and they're often my go-to for quick analyses.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Programming Languages
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa16xx3soxga07n2dtcsn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa16xx3soxga07n2dtcsn.png" alt="Programming Languages" width="800" height="578"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;a). &lt;strong&gt;Python&lt;/strong&gt;: This has been my primary focus. I use libraries like pandas for data manipulation, matplotlib and seaborn for visualization, and scikit-learn for machine learning tasks.&lt;/p&gt;

&lt;p&gt;b). &lt;strong&gt;SQL&lt;/strong&gt;: I've used SQL for querying databases and extracting specific datasets for analysis.&lt;/p&gt;

&lt;p&gt;c). &lt;strong&gt;R&lt;/strong&gt;: While I haven't used R yet, I know it's widely used in statistical computing and graphics. It's on my list to learn in the future.&lt;/p&gt;
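
&lt;p&gt;Python and SQL also work well together. As a small sketch, the built-in &lt;code&gt;sqlite3&lt;/code&gt; module can run a query against an in-memory database; the &lt;code&gt;weather&lt;/code&gt; table and its rows are made up for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import sqlite3

# In-memory database with a hypothetical "weather" table.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE weather (city TEXT, temperature REAL)")
connection.executemany(
    "INSERT INTO weather VALUES (?, ?)",
    [("Nairobi", 24.5), ("Mombasa", 30.1), ("Nairobi", 22.5), ("Kisumu", 27.3)],
)

# Average temperature per city, warmest first.
rows = connection.execute(
    "SELECT city, AVG(temperature) AS avg_temp "
    "FROM weather GROUP BY city ORDER BY avg_temp DESC"
).fetchall()
connection.close()

for city, avg_temp in rows:
    print(city, round(avg_temp, 1))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;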

&lt;h3&gt;
  
  
  3. Data Visualization Tools
&lt;/h3&gt;

&lt;p&gt;I'm currently experimenting with &lt;strong&gt;Power BI&lt;/strong&gt;, which allows me to create interactive dashboards and reports.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F001qu4oszmtgb9n99zs6.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F001qu4oszmtgb9n99zs6.jpeg" alt="Power Bi" width="474" height="266"&gt;&lt;/a&gt; &lt;br&gt;
Other popular tools in this category include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tableau&lt;/strong&gt;:&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvalr99hpc4ndvldzty5v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvalr99hpc4ndvldzty5v.png" alt="Tableau" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
Known for its user-friendly interface and powerful visualization capabilities.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Qlik&lt;/strong&gt;: &lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft4hfrebh5hjhh9qg0xt7.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft4hfrebh5hjhh9qg0xt7.jpeg" alt="Qlik" width="474" height="139"&gt;&lt;/a&gt;&lt;br&gt;
Offers robust data discovery and analytics features.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Statistical Software
&lt;/h3&gt;

&lt;p&gt;While I haven't used these, I'm aware of tools like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;SPSS&lt;/strong&gt;:&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq2cim7xrl6i6mt3l1arw.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq2cim7xrl6i6mt3l1arw.jpeg" alt="SPSS" width="474" height="474"&gt;&lt;/a&gt;&lt;br&gt;
Offers a user-friendly interface for complex statistical analyses.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;SAS&lt;/strong&gt;:&lt;br&gt;
&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F108n0b9xkhdszamh99mj.jpeg" alt="SAS" width="474" height="194"&gt;&lt;br&gt;
A command-driven software package for advanced statistical analysis and data visualization. It offers a wide variety of statistical methods and algorithms, customizable options for analysis and output, and publication-quality graphics.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  References
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://www.simplilearn.com/top-data-analysis-tools-article" rel="noopener noreferrer"&gt;Simplilearn&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.datacamp.com/blog/what-is-data-analysis-expert-guide" rel="noopener noreferrer"&gt;DataCamp&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.geeksforgeeks.org/data-analysis-tutorial/" rel="noopener noreferrer"&gt;GeeksforGeeks&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Which tools or techniques did I miss? Feel free to share them in the comments.&lt;/p&gt;

</description>
      <category>datanalysis</category>
      <category>beginners</category>
    </item>
  </channel>
</rss>
