Forem: Mugi Mugendi

Introduction to I2C communication module

Mugi Mugendi — Sun, 04 Feb 2024 15:33:32 +0000

One of the widely used protocols for inter-device communication, especially in the realm of microcontrollers and integrated circuits, is the Inter-Integrated Circuit, or I2C, protocol.

Understanding the Basics

At its core, I2C is a synchronous, multi-master, multi-slave serial communication protocol. This means that multiple devices can communicate with each other over the same bus, with one or more devices acting as masters initiating communication and others as slaves responding to the master's commands. The synchronous nature of the protocol means that data is transferred based on a shared clock signal, ensuring precise timing.

Hardware Configuration

I2C communication typically involves two wires: a Serial Data Line (SDA) and a Serial Clock Line (SCL). These wires facilitate bidirectional communication between devices on the bus. Both lines are pulled up to a positive voltage level (usually 3.3V or 5V) using resistors, and devices connected to the bus are equipped with open-drain or open-collector outputs to drive the lines low.

Addressing

Each device on the I2C bus is assigned a unique 7-bit or 10-bit address. When initiating communication, the master device sends the address of the slave it wishes to communicate with along with a read/write bit indicating the direction of data transfer. This addressing scheme allows for the connection of multiple devices without conflicts, as each device only responds to its specific address.
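
To make the addressing step concrete, here is a minimal sketch of how the first byte of a 7-bit transfer could be assembled: the address sits in the upper seven bits and the read/write flag in the lowest bit. The function name `i2c_address_byte` and the address `0x68` are made up for illustration and do not belong to any particular library or device.

```python
def i2c_address_byte(address: int, read: bool) -> int:
    """Return the first byte a master sends when using a 7-bit address."""
    if not 0 <= address <= 0x7F:
        raise ValueError("7-bit I2C addresses must be in the range 0x00-0x7F")
    # Upper seven bits carry the address; bit 0 is the R/W flag (1 = read, 0 = write).
    return (address << 1) | (1 if read else 0)


# 0x68 is a hypothetical slave address chosen only for this example.
write_byte = i2c_address_byte(0x68, read=False)
read_byte = i2c_address_byte(0x68, read=True)
print(f"write: 0x{write_byte:02X}, read: 0x{read_byte:02X}")  # write: 0xD0, read: 0xD1
```

This is why the same device often appears under two different "8-bit addresses" in datasheets: they are the write and read forms of one 7-bit address.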

Data Transfer

Data transfer in I2C occurs in bytes. After addressing a specific slave device, the master can send data to or receive data from the slave. Each byte is sent most significant bit first and is followed by an acknowledge (ACK) bit from the receiver. During transmission, the SDA line must remain stable while the clock signal on the SCL line is high, because that is when the receiver samples the data. The SDA line is only allowed to change while SCL is low.

Start and Stop Conditions

Communication on the I2C bus begins with a start condition, where the SDA line transitions from high to low while the SCL line remains high. This indicates the start of a new data transfer sequence. Conversely, a stop condition occurs when the SDA line transitions from low to high while the SCL line remains high, signaling the end of the data transfer.
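
The rule above can be sketched in a few lines of Python. This assumes the bus has already been captured as a list of (SCL, SDA) logic levels, for example by a logic analyzer; the function `find_conditions` and the sample `trace` are hypothetical and only illustrate the idea.

```python
def find_conditions(samples):
    """Return (index, 'START' or 'STOP') for a list of (scl, sda) level pairs."""
    events = []
    for i in range(1, len(samples)):
        prev_scl, prev_sda = samples[i - 1]
        scl, sda = samples[i]
        # Only an SDA edge while SCL stays high counts as a start or stop condition.
        if prev_scl == 1 and scl == 1 and prev_sda != sda:
            events.append((i, "START" if sda == 0 else "STOP"))
    return events


# Made-up capture: idle bus, start, some clocking, then stop.
trace = [(1, 1), (1, 0), (0, 0), (0, 1), (1, 1), (0, 1), (0, 0), (1, 0), (1, 1)]
events = find_conditions(trace)
print(events)  # [(1, 'START'), (8, 'STOP')]
```

SDA changes while SCL is low (indices 3 and 6 in the trace) are ordinary data bits and are ignored.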

Clock Speed

The speed of communication on the I2C bus is determined by the frequency of the clock signal. Standard mode operates at up to 100 kHz, while Fast mode extends this to 400 kHz. Beyond these, Fast-mode Plus (Fm+) supports up to 1 MHz, High-speed mode (Hs-mode) up to 3.4 MHz, and Ultra Fast-mode (UFm) up to 5 MHz, although UFm is unidirectional (write-only).
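
To get a feel for what these clock rates mean in practice, here is a rough estimate of how long a transfer takes. It assumes one address byte plus the payload, nine clock periods per byte (eight data bits plus an acknowledge bit), and ignores the start and stop conditions and any clock stretching. The helper name `transfer_time_us` is made up for this sketch.

```python
def transfer_time_us(payload_bytes: int, clock_hz: int) -> float:
    """Estimate the time in microseconds to move payload_bytes over I2C."""
    # One address byte plus the payload, nine clock periods each (8 bits + ACK).
    clock_periods = (1 + payload_bytes) * 9
    return clock_periods / clock_hz * 1_000_000


for name, clock_hz in [("Standard", 100_000), ("Fast", 400_000), ("Fm+", 1_000_000)]:
    print(f"{name:8s} {transfer_time_us(4, clock_hz):7.1f} us for a 4-byte payload")
```

Real transfers take slightly longer than this estimate, but the ratio between the modes holds.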

Advantages and Applications

The I2C protocol offers several advantages, including simplicity, flexibility, and support for multi-device communication. Its ease of implementation makes it ideal for various applications, including sensor interfacing, real-time clock modules, EEPROM memory, and communication between microcontrollers and peripheral devices.

Conclusion

In the ever-expanding landscape of embedded systems and IoT devices, efficient communication protocols like I2C play a crucial role in enabling seamless data exchange between components. With its simplicity, versatility, and robustness, the I2C protocol continues to be a cornerstone in modern electronics, empowering engineers to design innovative and interconnected systems with ease. Understanding its principles and intricacies is essential for anyone venturing into the realm of embedded systems and microcontroller programming.

Learn regression model

Mugi Mugendi — Sun, 25 Jun 2023 17:10:36 +0000

Introduction:

"Torture the data, and it will confess to anything." – Ronald Coase

In the vast field of machine learning, regression models play a vital role in understanding and predicting continuous outcomes. Regression is a supervised learning technique that establishes the relationship between a dependent (target) variable and one or more independent variables. It is widely used in fields such as finance, marketing, and healthcare, and the choice of regression model depends on the nature of the data involved.

In this article, we will explore the concept of regression machine learning models, their applications, and popular algorithms used for regression tasks.

Regression analysis

Regression analysis is a predictive modelling technique that models the relationship between a dependent (target) variable and one or more independent (predictor) variables. It helps us understand how the dependent variable changes as the independent variables change. For example, we might predict the number of ice creams sold (target) from the temperature (independent variable).

The primary goal of regression models is to find a mathematical function that best fits the observed data points, allowing us to predict the value of the dependent variable. In Regression, the predicted output values are real numbers. It deals with problems such as predicting the price of a house or the trend in the stock price at a given time, etc.

Types of regression models

Linear Regression
This regression technique finds a linear relationship between a dependent variable and the given independent variables. With a single independent variable x, the linear regression model is written as:

y = mx + c + e

where m is the slope of the line, c is the intercept, and e represents the error in the model.
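
To see where m and c come from, here is a small sketch that fits the line by ordinary least squares using the closed-form formulas for a single independent variable. It is written in plain Python so every step is visible; the data points are made up, and `fit_line` is just an illustrative name.

```python
def fit_line(xs, ys):
    """Return (m, c) of the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    c = mean_y - m * mean_x
    return m, c


# Made-up observations that are roughly, but not exactly, linear.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 12]
m, c = fit_line(xs, ys)
# The residuals are the error term e for each observation.
residuals = [y - (m * x + c) for x, y in zip(xs, ys)]
print(f"m = {m:.2f}, c = {c:.2f}")
print("residuals:", [round(r, 2) for r in residuals])
```

The error term e is what is left over after the line has explained as much of y as it can; least squares picks the m and c that make the sum of the squared residuals as small as possible.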

Training and evaluating linear regression

We start by splitting the dataset into training and test sets:

from sklearn.model_selection import train_test_split

# X (features) and y (labels) are assumed to have been prepared earlier
# Split data 70%-30% into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

print(f'Training Set: {X_train.shape[0]} rows\nTest Set: {X_test.shape[0]} rows')

We then fit the model to the training set:

# Train the model
from sklearn.linear_model import LinearRegression

# Fit a linear regression model on the training set
model = LinearRegression().fit(X_train, y_train)
print(model)

Predict

import numpy as np

# Predict labels for the held-out test set
predictions = model.predict(X_test)
np.set_printoptions(suppress=True)
print('Predicted labels: ', np.round(predictions)[:10])
print('Actual labels   : ', y_test[:10])

Evaluate

import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

mse = mean_squared_error(y_test, predictions)
print("MSE:", mse)

rmse = np.sqrt(mse)
print("RMSE:", rmse)

r2 = r2_score(y_test, predictions)
print("R2:", r2)
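
If you are curious what these three numbers measure, the sketch below computes them by hand from their usual definitions on a tiny made-up example. It is not how scikit-learn is implemented internally, just the arithmetic behind the metrics; `regression_metrics` is an illustrative name.

```python
import math


def regression_metrics(actual, predicted):
    """Return (mse, rmse, r2) for two equal-length lists of numbers."""
    n = len(actual)
    # Sum of squared differences between actual and predicted values.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    mse = ss_res / n
    # RMSE is in the same units as the label, which makes it easier to interpret.
    rmse = math.sqrt(mse)
    mean_actual = sum(actual) / n
    # Total variation of the actual values around their mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot
    return mse, rmse, r2


# Made-up labels and predictions.
actual = [3, 5, 7, 9]
predicted = [2, 5, 8, 9]
mse, rmse, r2 = regression_metrics(actual, predicted)
print(f"MSE: {mse:.3f}  RMSE: {rmse:.3f}  R2: {r2:.3f}")
```

A lower MSE or RMSE means the predictions sit closer to the actual values, while an R2 closer to 1 means the model explains more of the variation in the label.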

Here's an example notebook.

In this notebook, we'll focus on regression, using an example based on a real study in which data for a bicycle sharing scheme was collected and used to predict the number of rentals based on seasonality and weather conditions. We'll use a simplified version of the dataset from that study.

Exploratory Data Analysis

Mugi Mugendi — Sat, 04 Mar 2023 21:36:50 +0000

Exploratory data analysis (EDA) is an essential step in the data science process. It helps to uncover patterns, trends and correlations that are not easily visible in a dataset. EDA is especially important if you are dealing with large datasets or if you need to find relationships between variables. In Python, it is possible to use the pandas library to work with data frames, create visualizations and carry out correlation tests. By leveraging data frames, we can easily explore our dataset and gain insights into how different variables interact with each other. Moreover, we can build models based on the insights from our exploratory analysis. This will help us make better predictions or decisions based on our datasets. In this guide, we will cover the essential techniques and tools for EDA in Python.

STEPS IN EXPLORATORY DATA ANALYSIS

Importing and Loading Data

The first step in any data analysis project is to import and load the data. Python has many libraries for reading data from various sources, such as CSV, Excel, SQL databases, and more. Some popular libraries for loading data include pandas, NumPy, and SciPy.

For example, to load a CSV file in pandas, you can use the following code:

import pandas as pd
df = pd.read_csv('data.csv')

Understanding the Data

Before changing anything, get to know the dataset:

Get introductory details: the shape of the data, the number of rows (samples) and columns (features), the data type of each column, and any null values.
Get statistical insight: the count, mean, standard deviation, minimum, median, and maximum of each feature.

Here are some of the methods used:

df.head()          # view the first few rows
df.tail()          # view the last few rows
df.describe()      # summary statistics of the numeric columns
df.shape           # (rows, columns) of the dataset
df.columns         # column names
df.nunique()       # number of unique values in each column
df['feature'].unique()  # unique values of one column ('feature' is a placeholder name)
df.isnull().sum()  # number of null values in each column

Cleaning and Preprocessing Data

Clean the data of problems such as irregular values, uninformative features, and noisy outliers. This involves removing or filling missing values, handling outliers, scaling the data, and more. Pandas provides many methods for cleaning and preprocessing data, such as dropna(), fillna(), replace(), apply(), and more.

For example, to remove missing values from a DataFrame, you can use the following code:

df.dropna(inplace=True)

df.isnull().sum()       # number of missing values for each variable
df.dropna(axis=0, inplace=True)   # remove rows with null entries, if any exist
df["column"] = df["column"].fillna(df["column"].mean())   # or fill null entries with the mean, median, or a constant
df.duplicated().sum()   # total number of duplicate rows
df.drop_duplicates(inplace=True)  # remove duplicates

Visualizing Data

Visualization is a crucial part of EDA, as it allows us to see patterns and relationships that might not be apparent from numerical summaries alone. Converting raw data into a visual form such as a graph makes it easier to understand and to extract useful insights from.
Python has many libraries for data visualization, such as Matplotlib, Seaborn, Plotly, and more.

For example, to create a scatter plot using Matplotlib, you can use the following code:


import matplotlib.pyplot as plt
plt.scatter(df['x'], df['y'])
plt.show()

Here's an introductory tutorial on Matplotlib
Here's one on Seaborn

Exploring Relationships

Once we have summarized and visualized the data, the next step is to explore relationships between variables. This involves calculating correlations, creating heatmaps, and more. Pandas provides many methods for exploring relationships, such as corr(), pivot_table(), and more.

For example, to calculate the correlation matrix for a DataFrame, you can use the following code:


print(df.corr(numeric_only=True))  # numeric_only skips non-numeric columns (pandas 1.5+)
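
By default, corr() reports the Pearson correlation coefficient for each pair of columns. As a rough sketch of what that number means, here it is computed by hand in plain Python on made-up values; the function name `pearson` is just for illustration.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much x and y vary together...
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # ...scaled by how much each varies on its own.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up columns: one rises with x, the other falls.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]
y_down = [10, 8, 6, 4, 2]
r_up = pearson(x, y_up)
r_down = pearson(x, y_down)
print(round(r_up, 3), round(r_down, 3))
```

Values near 1 or -1 indicate a strong linear relationship (positive or negative), and values near 0 indicate little or no linear relationship.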

In this guide, we have covered the essential techniques and tools for exploratory data analysis in Python. By using these techniques, you can gain valuable insights from your data and improve the performance of your models. Remember that EDA is an iterative process, and you should always be exploring and testing new ideas.
Below is the link to my GitHub with an example of EDA in Python:
GitHub

Introduction to SQL for Data Analysis

Mugi Mugendi — Sun, 19 Feb 2023 09:48:49 +0000

Structured Query Language

Structured Query Language (SQL) is a standard language used to manage relational databases. It is used to create, modify, and query databases by managing the data stored in the tables. SQL is used in a variety of settings, from small businesses to large corporations, and it is essential for anyone who works with data to have a basic understanding of SQL.

This article will provide an introduction to SQL, covering its history, syntax, basic concepts, and some common commands. By the end of this article, readers will have a basic understanding of SQL and be able to start using it to manage data.

History of SQL

SQL was first introduced in the 1970s by IBM researchers Donald Chamberlin and Raymond Boyce. At the time, it was called SEQUEL, which stood for Structured English Query Language. The name was later changed to SQL to avoid trademark issues.

In the 1980s, SQL became the standard language for managing relational databases, and it was adopted by the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO).

Today, SQL is widely used in the tech industry and is an essential skill for anyone who works with data.

Database

A database is a collection of related information. It can be used, for example, to keep track of products, and it helps keep that information secure.

Database Management System (DBMS)

A DBMS is a special software program that helps users create and maintain a database. It manages large amounts of information and handles security, backups, and the import and export of data.

Types of Database

Relational Database
Organizes data into one or more tables
Each table has columns and rows, and a unique key identifies each row
Non-Relational Database (NoSQL)
Organizes data in forms other than tables, such as documents (for example JSON or XML), key-value pairs, or graphs

Types of Database management systems

Relational Database Management Systems (RDBMS) help users create and maintain relational databases. They include:

MySQL
Oracle
PostgreSQL
MariaDB

Non-RDBMS

They help users create and maintain non-relational databases. They include:

MongoDB
DynamoDB
Apache

SQL Types

Data Query Language (DQL) - used to query the database for information
Data Definition Language (DDL) - defines database schemas
Data Control Language (DCL) - controls access to data in the database
Data Manipulation Language (DML) - used for inserting, updating, and deleting data

SQL syntax

Some of the most important SQL commands

SELECT - extracts data from a database
UPDATE - updates data in a database
DELETE - deletes data from a database
INSERT INTO - inserts new data into a database
CREATE DATABASE - creates a new database
ALTER DATABASE - modifies a database
CREATE TABLE - creates a new table
ALTER TABLE - modifies a table
DROP TABLE - deletes a table
CREATE INDEX - creates an index (search key)
DROP INDEX - deletes an index

SQL is used to perform C.R.U.D. operations:
C - create
R - Read/retrieve
U - update
D - Delete

CREATE

CREATE TABLE table_name (
    column1 datatype,
    column2 datatype,
    column3 datatype
);

READ/RETRIEVE

We use the SELECT statement:

SELECT column1, column2, ...
FROM table_name
WHERE condition;

UPDATE

UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;

DELETE

DELETE FROM table_name WHERE condition;
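
To try the four operations end to end without installing a database server, one option is Python's built-in sqlite3 module with an in-memory database. The sketch below is only an illustration: the `products` table and its rows are made up, and other database systems may differ slightly in data types and syntax.

```python
import sqlite3

# An in-memory SQLite database that disappears when the connection closes.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create: define a table, then insert some rows.
cur.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
cur.executemany(
    "INSERT INTO products (name, price) VALUES (?, ?)",
    [("pen", 1.5), ("notebook", 3.0), ("stapler", 7.25)],
)

# Read: select the rows that match a condition.
cur.execute("SELECT name, price FROM products WHERE price > ? ORDER BY id", (2,))
expensive = cur.fetchall()

# Update: change a value in the matching rows.
cur.execute("UPDATE products SET price = ? WHERE name = ?", (2.0, "pen"))

# Delete: remove the matching rows.
cur.execute("DELETE FROM products WHERE name = ?", ("stapler",))
conn.commit()

remaining = cur.execute("SELECT name, price FROM products ORDER BY id").fetchall()
print(expensive)  # [('notebook', 3.0), ('stapler', 7.25)]
print(remaining)  # [('pen', 2.0), ('notebook', 3.0)]
conn.close()
```

The `?` placeholders let the database driver insert the values safely, which is generally preferable to building SQL strings by hand.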

SQL JOINS

A JOIN clause is used to combine rows from two or more tables, based on a related column between them.

Different Types of SQL JOINs

Here are the different types of the JOINs in SQL:

(INNER) JOIN: Returns records that have matching values in both tables

SELECT column_name(s)
FROM table1
INNER JOIN table2
ON table1.column_name = table2.column_name;

LEFT (OUTER) JOIN: Returns all records from the left table, and the matched records from the right table

SELECT column_name(s)
FROM table1
LEFT JOIN table2
ON table1.column_name = table2.column_name;

RIGHT (OUTER) JOIN: Returns all records from the right table, and the matched records from the left table


SELECT column_name(s)
FROM table1
RIGHT JOIN table2
ON table1.column_name = table2.column_name;

FULL (OUTER) JOIN: Returns all records from both tables, matching rows from the left and right tables where possible

SELECT column_name(s)
FROM table1
FULL OUTER JOIN table2
ON table1.column_name = table2.column_name
WHERE condition;
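
To see how the join types differ on actual rows, here is a small sketch using Python's built-in sqlite3 module. The `customers` and `orders` tables and their contents are made up for illustration. Only INNER JOIN and LEFT JOIN are shown because, as far as I know, SQLite only supports RIGHT and FULL OUTER JOIN from version 3.39 onward.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customers (id, name) VALUES (1, 'Amina'), (2, 'Brian'), (3, 'Chao');
    INSERT INTO orders (id, customer_id, item)
        VALUES (1, 1, 'keyboard'), (2, 1, 'mouse'), (3, 2, 'monitor');
""")

# INNER JOIN: only customers that have at least one matching order.
inner_rows = conn.execute("""
    SELECT customers.name, orders.item
    FROM customers
    INNER JOIN orders ON customers.id = orders.customer_id
    ORDER BY orders.id
""").fetchall()

# LEFT JOIN: every customer, with None where no order matches.
left_rows = conn.execute("""
    SELECT customers.name, orders.item
    FROM customers
    LEFT JOIN orders ON customers.id = orders.customer_id
    ORDER BY customers.id, orders.id
""").fetchall()

print(inner_rows)  # [('Amina', 'keyboard'), ('Amina', 'mouse'), ('Brian', 'monitor')]
print(left_rows)   # the same rows plus ('Chao', None)
conn.close()
```

Chao has no orders, so the INNER JOIN drops that customer while the LEFT JOIN keeps the row and fills the missing item with NULL (None in Python).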