<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Mugi  Mugendi</title>
    <description>The latest articles on Forem by Mugi  Mugendi (@mugendii_).</description>
    <link>https://forem.com/mugendii_</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F700928%2F5fc59b87-47e5-443c-b358-ea6e03a60bf6.jpg</url>
      <title>Forem: Mugi  Mugendi</title>
      <link>https://forem.com/mugendii_</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/mugendii_"/>
    <language>en</language>
    <item>
      <title>Introduction to I2C communication module</title>
      <dc:creator>Mugi  Mugendi</dc:creator>
      <pubDate>Sun, 04 Feb 2024 15:33:32 +0000</pubDate>
      <link>https://forem.com/mugendii_/introduction-to-i2c-communication-module-50kd</link>
      <guid>https://forem.com/mugendii_/introduction-to-i2c-communication-module-50kd</guid>
      <description>&lt;p&gt;One of the widely used protocols for inter-device communication, especially in the realm of microcontrollers and integrated circuits, is the Inter-Integrated Circuit, or I2C, protocol.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding the Basics
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F86f6h4kl88135esg3x62.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F86f6h4kl88135esg3x62.jpg" alt="Image description" width="323" height="156"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At its core, I2C is a synchronous, multi-master, multi-slave serial communication protocol. This means that multiple devices can communicate with each other over the same bus, with one or more devices acting as masters initiating communication and others as slaves responding to the master's commands. The synchronous nature of the protocol means that data is transferred based on a shared clock signal, ensuring precise timing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hardware Configuration
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj9m3ssw2msmpvdgkaxjl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj9m3ssw2msmpvdgkaxjl.png" alt="Image description" width="321" height="157"&gt;&lt;/a&gt;&lt;br&gt;
I2C communication typically involves two wires: a Serial Data Line (SDA) and a Serial Clock Line (SCL). These wires facilitate bidirectional communication between devices on the bus. Both lines are pulled up to a positive voltage level (usually 3.3V or 5V) using resistors, and devices connected to the bus are equipped with open-drain or open-collector outputs to drive the lines low.&lt;/p&gt;

&lt;h2&gt;
  
  
  Addressing
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhtk5mygm0c19tgnatvwd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhtk5mygm0c19tgnatvwd.png" alt="Image description" width="355" height="142"&gt;&lt;/a&gt;&lt;br&gt;
Each device on the I2C bus is assigned a unique 7-bit or 10-bit address. When initiating communication, the master device sends the address of the slave it wishes to communicate with along with a read/write bit indicating the direction of data transfer. This addressing scheme allows for the connection of multiple devices without conflicts, as each device only responds to its specific address.&lt;/p&gt;
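
&lt;p&gt;As a rough illustration of the addressing step, the sketch below builds the first byte a master sends for a 7-bit address: the seven address bits followed by the read/write bit (1 for read, 0 for write). The function name address_byte and the address 0x48 are made up for this example.&lt;/p&gt;

```python
def address_byte(address, read):
    """Build the first byte sent after START: 7 address bits, then the R/W bit."""
    if address not in range(0x80):
        raise ValueError("a 7-bit address must be between 0x00 and 0x7F")
    # Shifting left by one bit is the same as multiplying by 2;
    # the lowest bit then carries the direction (1 = read, 0 = write).
    return address * 2 + (1 if read else 0)


# 0x48 is only an example address, chosen for illustration.
write_byte = address_byte(0x48, read=False)
read_byte = address_byte(0x48, read=True)
print(hex(write_byte), hex(read_byte))
```

&lt;p&gt;The same device therefore appears on the wire as two different byte values, depending on the direction of the transfer.&lt;/p&gt;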

&lt;h2&gt;
  
  
  Data Transfer
&lt;/h2&gt;

&lt;p&gt;Data transfer in I2C occurs in bytes, sent most significant bit first, and each byte is followed by an acknowledge (ACK) or not-acknowledge (NACK) bit from the receiver. After addressing a specific slave device, the master can send data to it or receive data from it. The SDA line may change only while the clock signal on the SCL line is low; it must remain stable while SCL is high, which is when the receiver samples each bit.&lt;/p&gt;

&lt;h2&gt;
  
  
  Start and Stop Conditions
&lt;/h2&gt;

&lt;p&gt;Communication on the I2C bus begins with a start condition, where the SDA line transitions from high to low while the SCL line remains high. This indicates the start of a new data transfer sequence. Conversely, a stop condition occurs when the SDA line transitions from low to high while the SCL line remains high, signaling the end of the data transfer.&lt;/p&gt;
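
&lt;p&gt;The two conditions described above can be sketched as a small detector that scans a list of sampled (SCL, SDA) levels. This is a simplified model for illustration only: the function name find_conditions and the sample trace are invented here, and a real bus would be observed by hardware rather than by a Python list.&lt;/p&gt;

```python
def find_conditions(samples):
    """Return (index, name) pairs for every START/STOP seen in (scl, sda) samples."""
    events = []
    for i in range(1, len(samples)):
        prev_scl, prev_sda = samples[i - 1]
        scl, sda = samples[i]
        # Both conditions require SCL to stay high across the SDA edge.
        if prev_scl == 1 and scl == 1:
            if prev_sda == 1 and sda == 0:
                events.append((i, "START"))
            elif prev_sda == 0 and sda == 1:
                events.append((i, "STOP"))
    return events


# Idle bus, START, one clocked bit, then STOP (made-up trace).
trace = [(1, 1), (1, 0), (0, 0), (0, 1), (1, 1), (0, 1), (0, 0), (1, 0), (1, 1)]
print(find_conditions(trace))
```

&lt;p&gt;SDA edges that happen while SCL is low are ordinary data changes, so the detector ignores them.&lt;/p&gt;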

&lt;h2&gt;
  
  
  Clock Speed
&lt;/h2&gt;

&lt;p&gt;The speed of communication on the I2C bus is determined by the frequency of the clock signal. Standard mode operates at a maximum speed of 100 kHz, while Fast mode extends this to 400 kHz. Faster modes also exist: Fast-mode Plus (Fm+) supports up to 1 MHz, High-speed mode (Hs-mode) up to 3.4 MHz, and Ultra Fast-mode (UFm) up to 5 MHz, although UFm is unidirectional (write-only).&lt;/p&gt;
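
&lt;p&gt;To get a feel for what these clock speeds mean in practice, the sketch below estimates a lower bound on the time needed to move a few bytes. It assumes 7-bit addressing (one address byte), counts nine clock cycles per byte (eight data bits plus the acknowledge bit), and ignores start/stop overhead and clock stretching. The function name and the 4-byte payload are illustrative choices.&lt;/p&gt;

```python
# Maximum SCL frequencies in Hz for three of the bus modes.
MAX_CLOCK_HZ = {
    "Standard-mode": 100_000,
    "Fast-mode": 400_000,
    "Fast-mode Plus": 1_000_000,
}


def min_transfer_time_us(payload_bytes, clock_hz):
    """Lower bound on the transfer time in microseconds (simplified model)."""
    # One address byte plus the payload, each followed by an ACK/NACK bit.
    clocks = (1 + payload_bytes) * 9
    return clocks * 1_000_000 / clock_hz


for mode, hz in MAX_CLOCK_HZ.items():
    print(mode, min_transfer_time_us(4, hz), "us")
```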

&lt;h2&gt;
  
  
  Advantages and Applications
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F37j2wj8xgn60myt3lylj.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F37j2wj8xgn60myt3lylj.jpg" alt="Image description" width="220" height="165"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The I2C protocol offers several advantages, including simplicity, flexibility, and support for multi-device communication. Its ease of implementation makes it ideal for various applications, including sensor interfacing, real-time clock modules, EEPROM memory, and communication between microcontrollers and peripheral devices.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In the ever-expanding landscape of embedded systems and IoT devices, efficient communication protocols like I2C play a crucial role in enabling seamless data exchange between components. With its simplicity, versatility, and robustness, the I2C protocol continues to be a cornerstone in modern electronics, empowering engineers to design innovative and interconnected systems with ease. Understanding its principles and intricacies is essential for anyone venturing into the realm of embedded systems and microcontroller programming.&lt;/p&gt;

</description>
      <category>iot</category>
    </item>
    <item>
      <title>Learn regression model</title>
      <dc:creator>Mugi  Mugendi</dc:creator>
      <pubDate>Sun, 25 Jun 2023 17:10:36 +0000</pubDate>
      <link>https://forem.com/mugendii_/learn-regression-model-495d</link>
      <guid>https://forem.com/mugendii_/learn-regression-model-495d</guid>
      <description>&lt;h2&gt;
  
  
  Introduction:
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;"Torture the data, and it will confess to anything." – Ronald Coase&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In the vast field of machine learning, regression models play a vital role in understanding and predicting continuous outcomes. Regression is a supervised learning technique that establishes the relationship between a dependent (target) variable and one or more independent variables. It is widely used in finance, marketing, healthcare, and other fields. Which regression model to use depends on the nature of the data involved.&lt;/p&gt;

&lt;p&gt;In this article, we will explore the concept of regression machine learning models, their applications, and popular algorithms used for regression tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Regression analysis
&lt;/h2&gt;

&lt;p&gt;Regression analysis is a predictive modelling technique that models the relationship between a dependent (target) variable and one or more independent (predictor) variables. It helps us understand how the dependent variable changes as the independent variables change. For example, we can predict the number of ice creams sold (the target) from the temperature (the independent variable).&lt;/p&gt;

&lt;p&gt;The primary goal of regression models is to find a mathematical function that best fits the observed data points, allowing us to predict the value of the dependent variable. In regression, the predicted output values are real numbers. Typical problems include predicting the price of a house or the trend in a stock price at a given time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Types of regression models
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Linear Regression&lt;/strong&gt;&lt;br&gt;
This regression technique finds a linear relationship between a dependent variable and the given independent variables. With a single independent variable, the linear regression model is written as:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;y=mx+c+e&lt;/em&gt;&lt;/p&gt;

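&lt;p&gt;where &lt;em&gt;y&lt;/em&gt; is the dependent variable, &lt;em&gt;x&lt;/em&gt; is the independent variable, &lt;em&gt;m&lt;/em&gt; is the slope of the line, &lt;em&gt;c&lt;/em&gt; is the intercept, and &lt;em&gt;e&lt;/em&gt; represents the error in the model.&lt;/p&gt;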
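
&lt;p&gt;As a minimal sketch of what fitting this equation means, the example below recovers the slope and intercept from a handful of points using NumPy's least-squares polynomial fit. The data points are invented and lie exactly on the line y = 2x + 1, so the error term comes out as (numerically) zero; real data would leave non-zero residuals.&lt;/p&gt;

```python
import numpy as np

# Made-up, noise-free points that lie exactly on y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# A degree-1 polynomial fit returns the least-squares slope and intercept.
m, c = np.polyfit(x, y, 1)

# The residuals play the role of e in the equation above.
e = y - (m * x + c)
print("slope:", float(m), "intercept:", float(c))
```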

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8h4jjg6ghv7ydy5afs21.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8h4jjg6ghv7ydy5afs21.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  Training and evaluating linear regression
&lt;/h1&gt;

&lt;p&gt;We start by splitting the dataset into training and test sets:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;

&lt;span class="c1"&gt;# Split data 70%-30% into training set and test set
&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Training Set: %d rows&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Test Set: %d rows&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We then fit the model to the training set:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Train the model
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.linear_model&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LinearRegression&lt;/span&gt;

&lt;span class="c1"&gt;# Fit a linear regression model on the training set
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LinearRegression&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we use the fitted model to predict labels for the test set:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_printoptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;suppress&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Predicted labels: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;predictions&lt;/span&gt;&lt;span class="p"&gt;)[:&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Actual labels   : &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, we evaluate the predictions with common regression metrics:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;mean_squared_error&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r2_score&lt;/span&gt;

&lt;span class="n"&gt;mse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;mean_squared_error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;predictions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MSE:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mse&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;rmse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mse&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RMSE:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rmse&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;r2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;r2_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;predictions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;R2:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
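
&lt;p&gt;To make the three metrics above less of a black box, here is a small sketch that computes them by hand with NumPy. The arrays are invented stand-ins for y_test and predictions; the formulas are the standard definitions of mean squared error, root mean squared error, and the coefficient of determination.&lt;/p&gt;

```python
import numpy as np

# Made-up values standing in for y_test and predictions.
actual = np.array([3.0, 5.0, 7.0, 9.0])
predicted = np.array([2.0, 5.0, 8.0, 9.0])

errors = actual - predicted
mse = np.mean(errors ** 2)                       # average squared error
rmse = np.sqrt(mse)                              # same units as the target
ss_res = np.sum(errors ** 2)                     # variation the model fails to explain
ss_tot = np.sum((actual - actual.mean()) ** 2)   # total variation in the target
r2 = 1 - ss_res / ss_tot

print("MSE:", float(mse), "RMSE:", float(rmse), "R2:", float(r2))
```

&lt;p&gt;A lower MSE or RMSE means smaller errors, while an R2 closer to 1 means the model explains more of the variation in the target.&lt;/p&gt;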



&lt;p&gt;&lt;a href="https://github.com/mugendii/Bicycle-Rentals" rel="noopener noreferrer"&gt;Here's&lt;/a&gt; an example notebook.&lt;/p&gt;

&lt;p&gt;In this &lt;a href="https://github.com/mugendii/Bicycle-Rentals" rel="noopener noreferrer"&gt;notebook&lt;/a&gt;, we focus on regression, using an example based on a real study in which data for a bicycle sharing scheme was collected and used to predict the number of rentals based on seasonality and weather conditions. We use a simplified version of the dataset from that study.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>github</category>
      <category>100daysofcode</category>
    </item>
    <item>
      <title>Exploratory Data Analysis</title>
      <dc:creator>Mugi  Mugendi</dc:creator>
      <pubDate>Sat, 04 Mar 2023 21:36:50 +0000</pubDate>
      <link>https://forem.com/mugendii_/exploratory-data-analysis-391d</link>
      <guid>https://forem.com/mugendii_/exploratory-data-analysis-391d</guid>
      <description>&lt;p&gt;Exploratory data analysis (EDA) is an essential step in the data science process. It helps to uncover patterns, trends and correlations that are not easily visible in a dataset. EDA is especially important if you are dealing with large datasets or if you need to find relationships between variables. In Python, it is possible to use the pandas library to work with data frames, create visualizations and carry out correlation tests. By leveraging data frames, we can easily explore our dataset and gain insights into how different variables interact with each other. Moreover, we can build models based on the insights from our exploratory analysis. This will help us make better predictions or decisions based on our datasets. In this guide, we will cover the essential techniques and tools for EDA in Python.&lt;/p&gt;

&lt;h2&gt;
  
  
  Steps in Exploratory Data Analysis
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Importing and Loading Data
&lt;/h3&gt;

&lt;p&gt;The first step in any data analysis project is to import and load the data. Python has many libraries for reading data from various sources, such as CSV, Excel, SQL databases, and more. Some popular libraries for loading data include pandas, NumPy, and SciPy.&lt;/p&gt;

&lt;p&gt;For example, to load a CSV file in pandas, you can use the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'data.csv'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Understanding the Data
&lt;/h3&gt;

&lt;p&gt;Start by getting a basic picture of the dataset: its shape, the number of rows (samples) and columns (features), the data type of each feature, and how many null values it contains.&lt;/p&gt;

&lt;p&gt;Then get a statistical summary of the data: the count, mean, standard deviation, minimum, median, and maximum of each numeric column.&lt;br&gt;
Here are some of the methods used:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="c1"&gt;#view the first few rows
&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tail&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="c1"&gt;# view the last few rows
&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="c1"&gt;#Gives summary of the data
&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="c1"&gt;# Prints the shape of dataset
&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="c1"&gt;#gives the column names
&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nunique&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="c1"&gt;# counts the unique values in each column
&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;feature&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;unique&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="c1"&gt;# lists the unique values of one column (feature is a placeholder column name)
&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isnull&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nb"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="c1"&gt;# counts the Null values
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Cleaning and Preprocessing Data
&lt;/h3&gt;

&lt;p&gt;Clean the data of problems such as irregular entries, uninformative features, and noisy outliers. This involves removing or filling in missing values, handling outliers, scaling the data, and more. Pandas provides many methods for cleaning and preprocessing data, such as dropna(), fillna(), replace(), apply(), and more.&lt;/p&gt;

&lt;p&gt;For example, to remove missing values from a DataFrame, you can use the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inplace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isnull&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nb"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="c1"&gt;# give the number of missing values for each variable
&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;inplace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="c1"&gt;# remove rows that contain NULL entries, if any exist
&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'column'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'column'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;fillna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'column'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="c1"&gt;# alternatively, fill NULL entries with the mean, the median, or a constant
&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;duplicated&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nb"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="c1"&gt;# return total number of duplicate entries
&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;drop_duplicates&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inplace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="c1"&gt;# remove duplicates
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
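
&lt;p&gt;The snippet above assumes an existing DataFrame. As a self-contained sketch, the example below runs the same kind of cleaning on a tiny invented table with one missing value and one duplicated row; the column names and values are made up for illustration.&lt;/p&gt;

```python
import pandas as pd

# A tiny made-up table with one missing value and one duplicated row.
data = pd.DataFrame({
    "city": ["Nairobi", "Mombasa", "Mombasa", "Kisumu"],
    "temp": [24.0, 30.0, 30.0, None],
})

missing_per_column = data.isnull().sum()        # missing values per column
duplicate_rows = int(data.duplicated().sum())   # rows that repeat an earlier row

# Fill the gap with the column mean, then drop the repeated row.
cleaned = data.copy()
cleaned["temp"] = cleaned["temp"].fillna(cleaned["temp"].mean())
cleaned = cleaned.drop_duplicates().reset_index(drop=True)

print(missing_per_column)
print(duplicate_rows)
print(cleaned)
```

&lt;p&gt;Working on a copy keeps the raw data available, which makes it easier to compare the dataset before and after cleaning.&lt;/p&gt;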



&lt;h3&gt;
  
  
  Visualizing Data
&lt;/h3&gt;

&lt;p&gt;Visualization is a crucial part of EDA, as it allows us to see patterns and relationships that might not be apparent from numerical summaries alone. It helps us convert raw data into a visual form such as a graph.&lt;br&gt;
Visualization makes data easier for us to understand and extract useful insights.&lt;br&gt;
Python has many libraries for data visualization, such as Matplotlib, Seaborn, Plotly, and more.&lt;/p&gt;

&lt;p&gt;For example, to create a scatter plot using Matplotlib, you can use the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scatter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'x'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'y'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://matplotlib.org/stable/tutorials/index.html"&gt;Here's&lt;/a&gt; an introductory tutorial on Matplotlib.&lt;br&gt;
&lt;a href="https://seaborn.pydata.org/tutorial.html"&gt;Here's&lt;/a&gt; one on seaborn&lt;/p&gt;
&lt;h3&gt;
  
  
  Exploring Relationships
&lt;/h3&gt;

&lt;p&gt;Once we have summarized and visualized the data, the next step is to explore relationships between variables. This involves calculating correlations, creating heatmaps, and more. Pandas provides many methods for exploring relationships, such as corr(), pivot_table(), and more.&lt;/p&gt;

&lt;p&gt;For example, to calculate the correlation matrix for a DataFrame, you can use the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;corr&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
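
&lt;p&gt;As a small, self-contained illustration of reading a correlation matrix, the sketch below builds an invented DataFrame in which one column is exactly twice another, so their correlation is 1. The text column is filtered out first, since correlation is only defined for numeric columns; all names and values here are made up.&lt;/p&gt;

```python
import pandas as pd

df = pd.DataFrame({
    "temperature": [18, 22, 25, 30, 33],
    "ice_creams_sold": [36, 44, 50, 60, 66],   # exactly 2 * temperature
    "day": ["Mon", "Tue", "Wed", "Thu", "Fri"],
})

# Keep only numeric columns; text columns cannot be correlated directly.
corr = df.select_dtypes(include="number").corr()
print(corr)
```

&lt;p&gt;Values near 1 or -1 indicate a strong linear relationship, while values near 0 indicate little or no linear relationship.&lt;/p&gt;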



&lt;p&gt;In this guide, we have covered the essential techniques and tools for exploratory data analysis in Python. By using these techniques, you can gain valuable insights from your data and improve the performance of your models. Remember that EDA is an iterative process, and you should always be exploring and testing new ideas.&lt;br&gt;
Below is a link to my GitHub with an example of EDA in Python:&lt;br&gt;
&lt;a href="https://github.com/mugendii/Logistic-regression"&gt;GITHUB&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>datascience</category>
      <category>database</category>
      <category>r</category>
    </item>
    <item>
      <title>Introduction to SQL for Data Analysis</title>
      <dc:creator>Mugi  Mugendi</dc:creator>
      <pubDate>Sun, 19 Feb 2023 09:48:49 +0000</pubDate>
      <link>https://forem.com/mugendii_/introduction-to-sql-for-data-analysis-52pg</link>
      <guid>https://forem.com/mugendii_/introduction-to-sql-for-data-analysis-52pg</guid>
      <description>&lt;h1&gt;
  
  
  Structured Query Language
&lt;/h1&gt;

&lt;p&gt;Structured Query Language (SQL) is a standard language used to manage relational databases. It is used to create, modify, and query databases by managing the data stored in the tables. SQL is used in a variety of settings, from small businesses to large corporations, and it is essential for anyone who works with data to have a basic understanding of SQL.&lt;/p&gt;

&lt;p&gt;This article will provide an introduction to SQL, covering its history, syntax, basic concepts, and some common commands. By the end of this article, readers will have a basic understanding of SQL and be able to start using it to manage data.&lt;/p&gt;

&lt;h2&gt;
  
  
  History of SQL
&lt;/h2&gt;

&lt;p&gt;SQL was first introduced in the 1970s by IBM researchers Donald Chamberlin and Raymond Boyce. At the time, it was called SEQUEL, which stood for Structured English Query Language. The name was later changed to SQL to avoid trademark issues.&lt;/p&gt;

&lt;p&gt;In the 1980s, SQL became the standard language for managing relational databases, and it was adopted by the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO).&lt;/p&gt;

&lt;p&gt;Today, SQL is widely used in the tech industry and is an essential skill for anyone who works with data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Database
&lt;/h2&gt;

&lt;p&gt;A database is a collection of related information.&lt;br&gt;
It can keep track of things such as products, and it helps keep that information secure.&lt;/p&gt;
&lt;h2&gt;
  
  
  Database Management System (DBMS)
&lt;/h2&gt;

&lt;p&gt;A DBMS is a special software program that helps users create and maintain a database.&lt;br&gt;
It manages large amounts of information and handles security, backups, and the import and export of data.&lt;/p&gt;
&lt;h3&gt;
  
  
  Types of Database
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Relational Database&lt;br&gt;
Organizes data into one or more tables.&lt;br&gt;
Each table has columns and rows, and a unique key identifies each row.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Non-Relational Database (NoSQL)&lt;br&gt;
Organizes data in anything other than traditional tables, for example documents such as JSON or XML files.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
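
&lt;p&gt;The idea that a unique key identifies each row can be tried out with a small sketch. It assumes Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module as the database, and the &lt;code&gt;products&lt;/code&gt; table and its rows are made up for illustration. Other database systems report the error differently.&lt;/p&gt;

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table: product_id is the unique key for each row.
conn.execute("CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO products VALUES (1, 'Pen')")

# A second row with the same key breaks the uniqueness rule,
# so the database refuses to store it.
try:
    conn.execute("INSERT INTO products VALUES (1, 'Pencil')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
print(duplicate_rejected, row_count)
```

&lt;p&gt;Because the key must be unique, the table still holds only the first row after the failed insert.&lt;/p&gt;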
&lt;h3&gt;
  
  
  Types of Database Management Systems
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Relational Database Management Systems (RDBMS). They help users create and maintain relational databases. They include:&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;MySQL&lt;/li&gt;
&lt;li&gt;Oracle&lt;/li&gt;
&lt;li&gt;PostgreSQL&lt;/li&gt;
&lt;li&gt;MariaDB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Non-Relational Database Management Systems (Non-RDBMS)&lt;/p&gt;

&lt;p&gt;They help users create and maintain non-relational databases. They include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MongoDB&lt;/li&gt;
&lt;li&gt;DynamoDB&lt;/li&gt;
&lt;li&gt;Apache Cassandra&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  SQL Types
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Data Query Language (DQL)

&lt;ul&gt;
&lt;li&gt;used to query the database for information&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Data Definition Language (DDL)

&lt;ul&gt;
&lt;li&gt;used to define database schemas&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Data Control Language (DCL)

&lt;ul&gt;
&lt;li&gt;used to control access to the data in the database&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Data Manipulation Language (DML)

&lt;ul&gt;
&lt;li&gt;used for inserting, updating, and deleting data&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  SQL syntax
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Some of the Most Important SQL Commands
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;SELECT - extracts data from a database&lt;/li&gt;
&lt;li&gt;UPDATE - updates data in a database&lt;/li&gt;
&lt;li&gt;DELETE - deletes data from a database&lt;/li&gt;
&lt;li&gt;INSERT INTO - inserts new data into a database&lt;/li&gt;
&lt;li&gt;CREATE DATABASE - creates a new database&lt;/li&gt;
&lt;li&gt;ALTER DATABASE - modifies a database&lt;/li&gt;
&lt;li&gt;CREATE TABLE - creates a new table&lt;/li&gt;
&lt;li&gt;ALTER TABLE - modifies a table&lt;/li&gt;
&lt;li&gt;DROP TABLE - deletes a table&lt;/li&gt;
&lt;li&gt;CREATE INDEX - creates an index (search key)&lt;/li&gt;
&lt;li&gt;DROP INDEX - deletes an index&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;SQL is used to perform C.R.U.D. operations:&lt;br&gt;
C - Create&lt;br&gt;
R - Read/Retrieve&lt;br&gt;
U - Update&lt;br&gt;
D - Delete&lt;/p&gt;
&lt;h3&gt;
  
  
  CREATE
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE table_name (
    column1 datatype,
    column2 datatype,
    column3 datatype
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
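
&lt;p&gt;Here is one way the template above might be filled in. This is only a sketch: it assumes Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module as the database, and the &lt;code&gt;students&lt;/code&gt; table, its columns, and their types are hypothetical names chosen for illustration. Available data types vary between database systems.&lt;/p&gt;

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table: the names and types are illustrative only.
conn.execute(
    """
    CREATE TABLE students (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        major      TEXT
    )
    """
)

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(students)")]
print(columns)
```

&lt;p&gt;Listing the columns afterwards is a quick way to confirm that the table was created with the structure you intended.&lt;/p&gt;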

&lt;h3&gt;
  
  
  READ / RETRIEVE
&lt;/h3&gt;

&lt;p&gt;To read data, we use the SELECT statement:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT column1, column2, ...
FROM table_name
WHERE condition;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
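
&lt;p&gt;The SELECT template can be tried against a few sample rows. The sketch below assumes Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module, and the &lt;code&gt;students&lt;/code&gt; table and its data are invented for the example.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT, major TEXT)")
conn.executemany(
    "INSERT INTO students (student_id, name, major) VALUES (?, ?, ?)",
    [(1, "Amina", "Biology"), (2, "Brian", "Physics"), (3, "Chao", "Biology")],
)

# SELECT picks the columns, FROM names the table, WHERE filters the rows.
# The ? placeholder passes the value separately from the SQL text.
rows = conn.execute(
    "SELECT name, major FROM students WHERE major = ? ORDER BY name",
    ("Biology",),
).fetchall()
print(rows)
```

&lt;p&gt;Only the two Biology students come back, and only the two requested columns are included in each row.&lt;/p&gt;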



&lt;h3&gt;
  
  
  UPDATE
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
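
&lt;p&gt;A small sketch of UPDATE in action, again assuming Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module and a made-up &lt;code&gt;students&lt;/code&gt; table. It shows why the WHERE clause matters: it decides which rows are changed.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT, major TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(1, "Amina", "Biology"), (2, "Brian", "Physics")],
)

# Only the row matching the WHERE condition is changed.
# Without WHERE, every row in the table would be updated.
cursor = conn.execute(
    "UPDATE students SET major = ? WHERE student_id = ?",
    ("Chemistry", 2),
)
changed = cursor.rowcount  # number of rows the UPDATE modified

rows = conn.execute(
    "SELECT student_id, major FROM students ORDER BY student_id"
).fetchall()
print(changed, rows)
```

&lt;p&gt;Checking the number of changed rows is a cheap safeguard against a condition that matches more (or fewer) rows than expected.&lt;/p&gt;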



&lt;h3&gt;
  
  
  DELETE
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DELETE FROM table_name WHERE condition;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
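
&lt;p&gt;Since DELETE removes whole rows, it can help to preview what the condition matches first. The following sketch assumes Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module, with a hypothetical &lt;code&gt;students&lt;/code&gt; table and sample data.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT, major TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(1, "Amina", "Biology"), (2, "Brian", "Physics"), (3, "Chao", "Physics")],
)

# Preview how many rows the condition matches before deleting them.
to_delete = conn.execute(
    "SELECT COUNT(*) FROM students WHERE major = ?", ("Physics",)
).fetchone()[0]

# DELETE removes whole rows; omitting WHERE would empty the table.
conn.execute("DELETE FROM students WHERE major = ?", ("Physics",))

remaining = conn.execute("SELECT name FROM students ORDER BY student_id").fetchall()
print(to_delete, remaining)
```

&lt;p&gt;Running the same condition through SELECT first is a simple habit that avoids deleting more than you meant to.&lt;/p&gt;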



&lt;h2&gt;
  
  
  SQL JOINS
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7h1a871jpjd57brnvlfw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7h1a871jpjd57brnvlfw.png" alt=" " width="268" height="188"&gt;&lt;/a&gt;&lt;br&gt;
A JOIN clause is used to combine rows from two or more tables, based on a related column between them.&lt;/p&gt;
&lt;h3&gt;
  
  
  Different Types of SQL JOINs
&lt;/h3&gt;

&lt;p&gt;Here are the different types of JOINs in SQL:&lt;/p&gt;

&lt;p&gt;(INNER) JOIN: Returns records that have matching values in both tables&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT column_name(s)
FROM table1
INNER JOIN table2
ON table1.column_name = table2.column_name;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
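
&lt;p&gt;To see what "matching values in both tables" means, here is a sketch using Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module. The &lt;code&gt;customers&lt;/code&gt; and &lt;code&gt;orders&lt;/code&gt; tables and their rows are assumptions made up for the example.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'Amina'), (2, 'Brian');
    INSERT INTO orders VALUES (10, 1, 'Keyboard'), (11, 3, 'Monitor');
    """
)

# Brian has no order and order 11 has no matching customer,
# so the inner join keeps only the Amina/Keyboard pair.
rows = conn.execute(
    """
    SELECT customers.name, orders.item
    FROM customers
    INNER JOIN orders
    ON customers.customer_id = orders.customer_id
    """
).fetchall()
print(rows)
```

&lt;p&gt;Rows without a partner on the other side are dropped from the result.&lt;/p&gt;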



&lt;p&gt;LEFT (OUTER) JOIN: Returns all records from the left table, and the matched records from the right table&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT column_name(s)
FROM table1
LEFT JOIN table2
ON table1.column_name = table2.column_name;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
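
&lt;p&gt;A LEFT JOIN sketch, assuming Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module and hypothetical &lt;code&gt;customers&lt;/code&gt; and &lt;code&gt;orders&lt;/code&gt; tables. Every customer is kept, and a customer without an order gets an empty value (NULL, shown as &lt;code&gt;None&lt;/code&gt; in Python) for the order columns.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'Amina'), (2, 'Brian');
    INSERT INTO orders VALUES (10, 1, 'Keyboard'), (11, 3, 'Monitor');
    """
)

# customers is the left table, so both customers appear in the result.
# Brian has no order, so his item comes back as NULL (None in Python).
rows = conn.execute(
    """
    SELECT customers.name, orders.item
    FROM customers
    LEFT JOIN orders
    ON customers.customer_id = orders.customer_id
    ORDER BY customers.customer_id
    """
).fetchall()
print(rows)
```

&lt;p&gt;Order 11 does not appear at all, because it belongs to the right table and has no matching customer.&lt;/p&gt;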



&lt;p&gt;RIGHT (OUTER) JOIN: Returns all records from the right table, and the matched records from the left table&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT column_name(s)
FROM table1
RIGHT JOIN table2
ON table1.column_name = table2.column_name;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
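
&lt;p&gt;A RIGHT JOIN gives the same rows as a LEFT JOIN with the two tables swapped. The sketch below relies on that, because it assumes Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module, and SQLite only added native RIGHT JOIN support in version 3.39.0. The &lt;code&gt;customers&lt;/code&gt; and &lt;code&gt;orders&lt;/code&gt; tables are invented for the example.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'Amina'), (2, 'Brian');
    INSERT INTO orders VALUES (10, 1, 'Keyboard'), (11, 3, 'Monitor');
    """
)

# Same rows as "customers RIGHT JOIN orders": every order is kept,
# and an order without a matching customer gets NULL for the name.
rows = conn.execute(
    """
    SELECT customers.name, orders.item
    FROM orders
    LEFT JOIN customers
    ON customers.customer_id = orders.customer_id
    ORDER BY orders.order_id
    """
).fetchall()
print(rows)
```

&lt;p&gt;On a database that supports it, writing the query with RIGHT JOIN and the original table order should return the same rows.&lt;/p&gt;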



&lt;p&gt;FULL (OUTER) JOIN: Returns all records from both tables, matching them where possible and leaving NULLs where a row has no match in the other table&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT column_name(s)
FROM table1
FULL OUTER JOIN table2
ON table1.column_name = table2.column_name
WHERE condition;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
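
&lt;p&gt;Not every database supports FULL OUTER JOIN (MySQL does not, and SQLite only added it in version 3.39.0). A common workaround, sketched here with Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module and made-up &lt;code&gt;customers&lt;/code&gt; and &lt;code&gt;orders&lt;/code&gt; tables, is to combine a LEFT JOIN with the unmatched rows from the other side.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'Amina'), (2, 'Brian');
    INSERT INTO orders VALUES (10, 1, 'Keyboard'), (11, 3, 'Monitor');
    """
)

# First SELECT: every customer, with their order if they have one.
# Second SELECT: only the orders that matched no customer.
# UNION ALL stacks the two results into one.
rows = conn.execute(
    """
    SELECT customers.name, orders.item
    FROM customers
    LEFT JOIN orders
    ON customers.customer_id = orders.customer_id
    UNION ALL
    SELECT customers.name, orders.item
    FROM orders
    LEFT JOIN customers
    ON customers.customer_id = orders.customer_id
    WHERE customers.customer_id IS NULL
    """
).fetchall()
print(rows)
```

&lt;p&gt;The result contains the matched pair plus the unmatched row from each table, which is what a FULL OUTER JOIN would return for this data.&lt;/p&gt;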



</description>
      <category>nuxt</category>
      <category>vercel</category>
      <category>vue</category>
      <category>discuss</category>
    </item>
  </channel>
</rss>
