<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Ann Jeffa</title>
    <description>The latest articles on Forem by Ann Jeffa (@jeffa_jeffa_67f700f712ab7).</description>
    <link>https://forem.com/jeffa_jeffa_67f700f712ab7</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3407918%2F1eeeb732-3805-46e3-9664-aafd3db54a09.png</url>
      <title>Forem: Ann Jeffa</title>
      <link>https://forem.com/jeffa_jeffa_67f700f712ab7</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/jeffa_jeffa_67f700f712ab7"/>
    <language>en</language>
    <item>
      <title>Bayesians and Frequentists</title>
      <dc:creator>Ann Jeffa</dc:creator>
      <pubDate>Fri, 17 Oct 2025 13:34:09 +0000</pubDate>
      <link>https://forem.com/jeffa_jeffa_67f700f712ab7/bayesians-and-frequentists-51fl</link>
      <guid>https://forem.com/jeffa_jeffa_67f700f712ab7/bayesians-and-frequentists-51fl</guid>
      <description>&lt;p&gt;As a statistician ,data analyst or scientist, how do you define your probability. That's the ultimate difference between the Bayesians and  Frequentists. &lt;/p&gt;

&lt;h2&gt;
  
  
  Frequentist
&lt;/h2&gt;

&lt;p&gt;Frequentists define probability in terms of long-run frequencies. To a Frequentist, saying "there's a 30% chance of rain tomorrow" means that out of 100 days with identical atmospheric conditions, it would rain on approximately 30 of them. Probability rests on the idea of repeating a random process. This interpretation works naturally for coin flips and dice rolls but becomes strained when dealing with one-time events. Frequentist methods rely entirely on the data collected, without using any prior information. Common examples include t-tests, ANOVA, and confidence intervals.&lt;/p&gt;
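The long-run frequency reading can be sketched with a quick simulation (a toy illustration with made-up numbers, not a real forecast):

```python
import random

random.seed(42)  # fixed seed so the run is repeatable

# "A 30% chance of rain" read as a long-run frequency: simulate many
# identical days on which rain happens with probability 0.3, then count
# how often it actually rained.
days = 100_000
rainy = sum(random.choices([1, 0], weights=[3, 7], k=days))
print(rainy / days)  # approaches 0.3 as the number of days grows
```

The observed frequency settles near 0.3, which is exactly what a Frequentist means by the probability statement.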

&lt;h2&gt;
  
  
  Bayesian
&lt;/h2&gt;

&lt;p&gt;The Bayesian approach treats probability as a degree of belief or confidence in an event. It considers parameters as random variables that can change as new data becomes available. Bayesian methods start with a prior belief about a parameter, then update this belief using observed data to produce a posterior probability. This process, based on Bayes’ theorem, allows continuous learning as new information is introduced.&lt;/p&gt;
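The prior-to-posterior update can be shown with a tiny Beta-Binomial example (the prior and data below are invented for illustration):

```python
# A Beta(a, b) prior encodes the initial belief about a success rate;
# observing k successes in n trials updates it to Beta(a + k, b + n - k).
prior_a, prior_b = 2, 2        # mild prior belief centered on 0.5
successes, trials = 8, 10      # newly observed data

post_a = prior_a + successes
post_b = prior_b + (trials - successes)

prior_mean = prior_a / (prior_a + prior_b)
post_mean = post_a / (post_a + post_b)
print(prior_mean)           # 0.5
print(round(post_mean, 3))  # 0.714
```

The belief shifts from 0.5 toward the data, and the posterior can serve as the prior for the next batch of observations.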

&lt;h2&gt;
  
  
  Real World Example
&lt;/h2&gt;

&lt;p&gt;Imagine a doctor testing the effectiveness of a new drug.&lt;br&gt;
A Frequentist would collect data from a large number of patients and test whether the difference in recovery rates between the drug and placebo is statistically significant (for example, a p-value &amp;lt; 0.05).&lt;br&gt;
A Bayesian would start with prior knowledge about similar drugs, then update the probability that the drug is effective as more patient data becomes available.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;In summary, the Frequentist approach focuses on objective, long-term patterns that emerge from repeated experiments, while the Bayesian approach centers on updating beliefs as new evidence becomes available. Each method has its own strengths. Frequentist techniques are straightforward and widely used, whereas Bayesian methods are more flexible and allow for continuous learning. Ultimately, the best approach depends on the nature of the data, how much prior information is available, and the specific goals of the analysis.&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>discuss</category>
      <category>science</category>
    </item>
    <item>
      <title>Parametric and Non-Parametric Tests</title>
      <dc:creator>Ann Jeffa</dc:creator>
      <pubDate>Fri, 17 Oct 2025 11:34:25 +0000</pubDate>
      <link>https://forem.com/jeffa_jeffa_67f700f712ab7/parametric-and-non-parametric-tests-p1f</link>
      <guid>https://forem.com/jeffa_jeffa_67f700f712ab7/parametric-and-non-parametric-tests-p1f</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In statistics, different tools, methods, and tests are applied in different ways with the end purpose of deriving meaningful insights from data. Several factors determine which test to conduct, and the type of data is the major one. The tests are divided into parametric and non-parametric tests. Let's differentiate them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Parametric Tests
&lt;/h2&gt;

&lt;p&gt;Parametric tests are statistical methods that work under the assumption that data follows a specific distribution, most often a normal distribution. They rely on numerical parameters like the mean, standard deviation, and variance to describe and analyze the data. Because they make these assumptions, parametric tests are generally more powerful and accurate when the data fits the expected conditions, allowing researchers to detect even small differences or relationships within their datasets.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Examples&lt;/strong&gt;: t-test, ANOVA, Pearson's correlation, linear regression.&lt;/p&gt;
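To make the idea concrete, here is a hand-rolled two-sample t statistic (Welch form) on made-up numbers; a sketch of what the listed tests compute, not a full implementation:

```python
import math

a = [5.1, 4.9, 5.4, 5.0, 5.2]  # toy sample 1
b = [4.6, 4.8, 4.5, 4.7, 4.9]  # toy sample 2

def mean(xs):
    return sum(xs) / len(xs)

def sample_var(xs):
    m = mean(xs)
    # divide by n - 1: the sample variance used by parametric tests
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Difference in means relative to the combined standard error
t = (mean(a) - mean(b)) / math.sqrt(sample_var(a) / len(a) + sample_var(b) / len(b))
print(round(t, 3))  # 3.772
```

A large t value (relative to the t-distribution with the appropriate degrees of freedom) is what leads to a small p-value.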

&lt;h2&gt;
  
  
  Non-Parametric Tests
&lt;/h2&gt;

&lt;p&gt;Non-parametric tests are often called distribution-free tests because they do not require the data to follow any particular distribution. These tests are used when data is ordinal, categorical, or not normally distributed. Instead of using numerical values directly, non-parametric tests often rank the data or use medians, making them less affected by extreme values or outliers. Non-parametric tests are especially useful for small sample sizes, non-linear relationships, or when the assumptions of parametric tests cannot be met. Although they are less powerful than parametric tests, they offer flexibility and robustness, ensuring valid conclusions even under irregular data conditions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Examples&lt;/strong&gt;: chi-square test, Spearman's correlation.&lt;/p&gt;
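As a sketch of the rank-based idea, here is Spearman's correlation computed by hand on toy data with no ties:

```python
# Spearman's correlation: rank each variable, then apply the
# rank-difference formula r_s = 1 - 6 * sum(d**2) / (n * (n**2 - 1)),
# which is valid when there are no tied values.
x = [10, 20, 30, 40, 50]
y = [1, 3, 2, 5, 4]

def ranks(xs):
    ordered = sorted(xs)
    return [ordered.index(v) + 1 for v in xs]

rx, ry = ranks(x), ranks(y)
d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
n = len(x)
r_s = 1 - 6 * d_squared / (n * (n ** 2 - 1))
print(r_s)  # 0.8
```

Because only ranks are used, a single extreme value in x or y would barely change the result.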

&lt;h2&gt;
  
  
  Differences
&lt;/h2&gt;

&lt;p&gt;The main difference between parametric and non-parametric tests lies in the assumptions about the data. Parametric tests assume normal distribution, equal variances, and continuous data measured on an interval or ratio scale. Non-parametric tests, however, do not rely on these assumptions and are suitable for ranked or categorical data.&lt;/p&gt;

&lt;p&gt;When deciding which test to use, researchers should first check the normality of the data. If the data is normally distributed and sample size is large, a parametric test is appropriate. However, if the data is skewed, ordinal, or has outliers, a non-parametric test will produce more reliable results.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Both parametric and non-parametric tests are essential tools in statistical analysis. Parametric tests are preferred when the data meets their assumptions because they are more efficient and provide stronger conclusions. Non-parametric tests, on the other hand, offer flexibility and reliability when data does not meet the strict assumptions of normality or when dealing with ranked or categorical data.&lt;/p&gt;

</description>
      <category>analytics</category>
      <category>beginners</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Degrees Of Freedom</title>
      <dc:creator>Ann Jeffa</dc:creator>
      <pubDate>Fri, 03 Oct 2025 15:24:57 +0000</pubDate>
      <link>https://forem.com/jeffa_jeffa_67f700f712ab7/degrees-of-freedom-1dah</link>
      <guid>https://forem.com/jeffa_jeffa_67f700f712ab7/degrees-of-freedom-1dah</guid>
      <description>&lt;h2&gt;
  
  
  Definition
&lt;/h2&gt;

&lt;p&gt;Degrees of freedom are the number of independent values that a statistical analysis can estimate. You can also think of it as the number of values that are free to vary as you estimate parameters.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scenario
&lt;/h2&gt;

&lt;p&gt;Your hockey team has 20 players registered for a weekend tournament. During each match, only 11 players can be on the pitch, while the rest remain as substitutes. A day before the tournament, your best defender and the only goalkeeper inform the coach that they cannot make it. Since the team has no backup goalkeeper, the coach insists that the goalkeeper must attend. Fortunately, there are three defenders available, so the coach replaces the absent defender with another.&lt;/p&gt;

&lt;p&gt;In your team, 11 players must be on the pitch. That’s a fixed requirement.&lt;br&gt;
If the goalkeeper has no substitute, the coach has no freedom in choosing that position, it’s fixed (0 degrees of freedom for the goalkeeper).&lt;br&gt;
For defenders, there are 3 options to fill 1 spot (since one defender is absent but you still have choices). That means the coach has freedom to decide which defender to bring in (1 degree of freedom for that position).&lt;br&gt;
Similarly, for the other positions, depending on how many substitutes are available, the coach has more freedom of choice.&lt;/p&gt;
&lt;h2&gt;
  
  
  How to calculate Degrees Of Freedom in Statistics
&lt;/h2&gt;

&lt;p&gt;The total degrees of freedom is one less than the total sample size&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DoF = n - 1
where n is the sample size.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the general formula.&lt;/p&gt;
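A quick numeric sketch of why the divisor is n - 1 (toy data):

```python
# Once the sample mean is fixed, the deviations must sum to zero, so only
# n - 1 of them are free to vary; the sample variance divides by n - 1.
data = [4, 8, 6, 5, 7]
n = len(data)
m = sum(data) / n

deviations = [x - m for x in data]
print(sum(deviations))  # 0.0, the constraint that costs one degree of freedom
sample_var = sum(d ** 2 for d in deviations) / (n - 1)
print(sample_var)       # 2.5
```

Knowing any four deviations forces the fifth, which is exactly what "n - 1 degrees of freedom" describes.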

&lt;h2&gt;
  
  
  Importance of Degrees of Freedom
&lt;/h2&gt;

&lt;p&gt;Degrees of freedom matter because they affect the shape of probability distributions used in tests. For instance:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The t-distribution changes depending on the degrees of freedom; with larger degrees of freedom, it approaches the normal distribution.&lt;/li&gt;
&lt;li&gt;In chi-square tests, they determine the critical values needed to decide whether to reject a hypothesis.&lt;/li&gt;
&lt;li&gt;In regression analysis, they help calculate residual variance and test model fit.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In short, degrees of freedom ensure that statistical tests adjust for the amount of information and constraints in your data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Understanding degrees of freedom helps us interpret statistical tests more accurately, ensuring that our conclusions are based on the right amount of free information.&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>datascience</category>
      <category>science</category>
      <category>learning</category>
    </item>
    <item>
      <title>List Comprehension vs Dictionary Comprehension in Python</title>
      <dc:creator>Ann Jeffa</dc:creator>
      <pubDate>Fri, 26 Sep 2025 20:35:20 +0000</pubDate>
      <link>https://forem.com/jeffa_jeffa_67f700f712ab7/list-comprehension-vs-dictionary-comprehension-in-python-40ai</link>
      <guid>https://forem.com/jeffa_jeffa_67f700f712ab7/list-comprehension-vs-dictionary-comprehension-in-python-40ai</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In Python, writing clean and efficient code is highly valued. Instead of using long loops, we often rely on comprehensions: short, elegant ways to build new data structures. The two most common ones are list comprehension and dictionary comprehension. They look similar at first glance, but they serve different purposes.&lt;/p&gt;

&lt;h2&gt;
  
  
  List Comprehension
&lt;/h2&gt;

&lt;p&gt;A list comprehension is a way to create a list by applying an expression to each item in an iterable.&lt;br&gt;
Python loops over the iterable and returns a new list of values.&lt;br&gt;
Example&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;squares = [i**2 for i in range(1, 6)]
print(squares)  
[1, 4, 9, 16, 25]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Dictionary Comprehension
&lt;/h2&gt;

&lt;p&gt;A dictionary comprehension is similar, but instead of producing a list, it produces a dictionary with key–value pairs. Dictionary comprehension produces a mapping of keys to values.&lt;br&gt;
Example&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;squares_dict = {i: i**2 for i in range(1, 6)}
print(squares_dict)
 {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key Difference&lt;/strong&gt;&lt;br&gt;
In a list comprehension, the result is an ordered list of values,&lt;br&gt;
whereas in a dictionary comprehension, the result is a set of key–value pairs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Combining Two Lists into a Dictionary with Comprehension&lt;/strong&gt;&lt;br&gt;
Imagine you have two lists:&lt;br&gt;
Method 1: Using zip()&lt;br&gt;
zip() pairs items from both lists together, and the dictionary comprehension builds key–value pairs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;names = ["Alice", "Bob", "Charlie"]
scores = [85, 90, 95]
result = {name: score for name, score in zip(names, scores)}
print(result)  
{'Alice': 85, 'Bob': 90, 'Charlie': 95}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Method 2: Using Indexing&lt;/p&gt;

&lt;p&gt;Here, the comprehension loops over the index and maps the matching items.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;names = ["Alice", "Bob", "Charlie"]
scores = [85, 90, 95]
result = {names[i]: scores[i] for i in range(len(names))}
print(result)
{'Alice': 85, 'Bob': 90, 'Charlie': 95}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;List comprehension creates a list of values.&lt;br&gt;
Dictionary comprehension creates key–value pairs in a dictionary.&lt;br&gt;
We can combine two lists into a dictionary by using zip() or index-based dictionary comprehension.&lt;br&gt;
Comprehensions make Python code shorter, faster, and more readable.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Skewness and Kurtosis</title>
      <dc:creator>Ann Jeffa</dc:creator>
      <pubDate>Fri, 26 Sep 2025 19:40:16 +0000</pubDate>
      <link>https://forem.com/jeffa_jeffa_67f700f712ab7/skewness-and-kurtosis-2a1b</link>
      <guid>https://forem.com/jeffa_jeffa_67f700f712ab7/skewness-and-kurtosis-2a1b</guid>
      <description>&lt;p&gt;When analyzing data in statistics or data science, it is not enough to only look at measures of central tendency like the mean, median, or mode and variability (like variance or standard deviation). To fully understand the shape of a dataset’s distribution, we use skewness and kurtosis. These two measures describe how data deviates from a perfectly normal (bell-shaped) distribution.&lt;/p&gt;

&lt;h2&gt;
  
  
  Skewness
&lt;/h2&gt;

&lt;p&gt;Skewness measures the asymmetry of a distribution. A perfectly normal distribution has skewness equal to 0, meaning it is symmetric around the mean.&lt;/p&gt;

&lt;p&gt;Types of Skewness:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Positive Skew (Right-skewed):&lt;br&gt;
The right tail is longer than the left.&lt;br&gt;
Most data values are concentrated on the left, but a few very large values pull the mean to the right.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Negative Skew (Left-skewed):&lt;br&gt;
The left tail is longer than the right.&lt;br&gt;
Most data values are concentrated on the right, but a few very small values pull the mean to the left.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Zero Skew:&lt;br&gt;
The distribution is symmetric.&lt;br&gt;
Mean = Median = Mode.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv6sxxcgr37pvod251h75.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv6sxxcgr37pvod251h75.png" alt=" " width="800" height="309"&gt;&lt;/a&gt;&lt;/p&gt;
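Skewness can be computed directly as the standardized third moment; a minimal sketch on invented right-skewed data (population formula, without the small-sample correction):

```python
data = [1, 2, 2, 3, 3, 3, 4, 10]  # one large value drags the mean to the right

n = len(data)
m = sum(data) / n
sd = (sum((x - m) ** 2 for x in data) / n) ** 0.5

# Standardized third moment: positive for a longer right tail
skew = sum(((x - m) / sd) ** 3 for x in data) / n
print(round(skew, 3))  # positive, so the data is right-skewed
```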

&lt;h2&gt;
  
  
  Kurtosis
&lt;/h2&gt;

&lt;p&gt;Kurtosis measures the “tailedness” of a distribution, or how extreme the outliers are compared to a normal distribution. &lt;br&gt;
Types of Kurtosis:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Leptokurtic:
heavy tails and a sharp peak;
more extreme outliers than a normal distribution.&lt;/li&gt;
&lt;li&gt;Platykurtic:
light tails and a flatter peak;
fewer outliers than a normal distribution.&lt;/li&gt;
&lt;li&gt;Mesokurtic:
a normal bell-shaped curve;
moderate tails and peak.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwprc9ndq8o7gwitl8w6t.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwprc9ndq8o7gwitl8w6t.jpg" alt=" " width="580" height="444"&gt;&lt;/a&gt;&lt;/p&gt;
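Excess kurtosis is the standardized fourth moment minus 3 (so a normal distribution scores 0); a minimal sketch on invented heavy-tailed data:

```python
data = [5, 5, 5, 5, 5, 5, 5, 5, 0, 10]  # mostly flat with two extreme values

n = len(data)
m = sum(data) / n
var = sum((x - m) ** 2 for x in data) / n

# Fourth moment scaled by the squared variance, minus 3 for "excess"
kurt = sum((x - m) ** 4 for x in data) / (n * var ** 2) - 3
print(kurt)  # 2.0, positive, so the data is leptokurtic
```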

&lt;h2&gt;
  
  
  Importance of Skewness and Kurtosis in Data Science
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Data Analysis:
They reveal whether data follows assumptions of normality.&lt;/li&gt;
&lt;li&gt;Risk Management:
In finance, skewness and kurtosis help in understanding market risks; highly skewed or leptokurtic data indicates greater uncertainty.&lt;/li&gt;
&lt;li&gt;Decision-Making:
They help analysts avoid misleading conclusions that come from looking at mean and standard deviation alone.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;br&gt;
Skewness tells us about the symmetry of data.&lt;br&gt;
Kurtosis tells us about the outliers and tail heaviness.&lt;br&gt;
Together, they provide a deeper picture of data distribution beyond averages and variability, helping statisticians, data scientists, and decision-makers draw more accurate insights.&lt;/p&gt;

</description>
      <category>analytics</category>
      <category>beginners</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Similarities Between Stored Procedures and Python Functions</title>
      <dc:creator>Ann Jeffa</dc:creator>
      <pubDate>Fri, 26 Sep 2025 18:24:29 +0000</pubDate>
      <link>https://forem.com/jeffa_jeffa_67f700f712ab7/similarities-between-stored-procedures-and-python-functions-5fee</link>
      <guid>https://forem.com/jeffa_jeffa_67f700f712ab7/similarities-between-stored-procedures-and-python-functions-5fee</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In the field of data science, Python and SQL are both tools used in day-to-day work. A stored procedure is a collection of SQL statements that are executed together, while a Python function is a block of reusable code that performs a specific task and runs only when called. They help make code more organized, reusable, and easier to maintain.&lt;/p&gt;

&lt;p&gt;Although they exist in different environments—stored procedures inside a database system, and Python functions inside application code—they share a number of similarities in concept and purpose.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Grouping instructions in a single unit&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Both stored procedures and Python functions provide a way to group a set of instructions into a single named block.&lt;br&gt;
A stored procedure contains several SQL statements to retrieve, insert, or update data.&lt;br&gt;
A Python function is like a container that holds lines of code.&lt;br&gt;
Examples&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE PROCEDURE Get_Employees
AS
BEGIN
    SELECT employee_id, first_name, last_name, department
    FROM employees;
END;

-- Call the procedure
EXEC Get_Employees;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
def add_numbers(a, b):
    return a + b
result = add_numbers(5, 3)
print("The sum is:", result)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Reusability&lt;/strong&gt;&lt;br&gt;
Reusability is one of the strongest similarities between the two. Instead of writing the same block of code multiple times, you can define the logic once and call it whenever needed. This reduces duplication and improves efficiency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Input Parameters&lt;/strong&gt;&lt;br&gt;
Both stored procedures and Python functions can accept input parameters that make them dynamic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Returning Results&lt;/strong&gt;&lt;br&gt;
Stored procedures can return a result set, that is, query results.&lt;br&gt;
Python functions return values using the return statement.&lt;br&gt;
This makes them useful for producing outputs based on given inputs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Control Flow Capabilities&lt;/strong&gt;&lt;br&gt;
Both support control flow logic, such as conditionals and loops.&lt;br&gt;
Stored procedures can use IF…ELSE, CASE, or looping constructs like WHILE.&lt;br&gt;
Python functions can use if, for, and while loops.&lt;br&gt;
This allows developers to write logic that adapts to different conditions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. Modularity and Maintainability&lt;/strong&gt;&lt;br&gt;
By breaking down tasks into smaller reusable blocks, both stored procedures and Python functions promote modularity. This makes applications easier to read, maintain, and debug.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;7. Execution Method&lt;/strong&gt;&lt;br&gt;
Both are executed by calling their name with the required parameters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Stored procedures and Python functions may belong to different environments, but their similarities lie in structure, purpose, and usage. Both serve to encapsulate logic, promote reusability, accept parameters, return results, support control flow, and enhance maintainability.&lt;/p&gt;

&lt;p&gt;In essence, whether you are working inside a database or writing application code, the concepts behind stored procedures and functions remain closely related.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>CROPS ANALYSIS DASHBOARD ON POWER BI</title>
      <dc:creator>Ann Jeffa</dc:creator>
      <pubDate>Wed, 27 Aug 2025 21:07:44 +0000</pubDate>
      <link>https://forem.com/jeffa_jeffa_67f700f712ab7/crops-analysis-dashboard-on-power-bi-513i</link>
      <guid>https://forem.com/jeffa_jeffa_67f700f712ab7/crops-analysis-dashboard-on-power-bi-513i</guid>
      <description>&lt;h2&gt;
  
  
  POWER BI ON AGRICULTURE
&lt;/h2&gt;

&lt;p&gt;In today’s world, most economic activities are data-driven, including agriculture. Agriculture is no longer just about intuition and experience. Farmers and agribusinesses are increasingly relying on data analytics to optimize crop yields, reduce costs, and improve sustainability. One tool for this is Microsoft Power BI, a business intelligence platform that turns raw data into interactive, insightful dashboards.&lt;/p&gt;

&lt;p&gt;When applied to the crops sector, Power BI enables stakeholders to monitor everything from production costs to profits, weather impacts, and market demand. Stakeholders are also able to make meaningful and informed decisions from the insights of the data. Below, we explore how Power BI can be used in agriculture, focusing on crops, and cover all aspects of the process.&lt;/p&gt;

&lt;h2&gt;
  
  
  DATA CLEANING AND TRANSFORMATION
&lt;/h2&gt;

&lt;p&gt;Data cleaning in Power BI is a critical step in ensuring reliable crop analysis because data often comes in biased or inconsistent formats. Power Query can remove duplicates, fill or replace missing values, and standardize measurements.&lt;br&gt;
In this case, these are the steps I took to clean the crops dataset:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Changed the format of each column to the required one, that is, ensured the right data type.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Filled the blanks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Corrected errors&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  KEY METRICS
&lt;/h2&gt;

&lt;p&gt;The KPIs surface the most important information in the data.&lt;br&gt;
The Key Performance Indicators I tracked on the dataset include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Yield in kilograms&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Revenue and Profit by Crop Type&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cost of Production &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Tracking these indicators helps both smallholder farmers and large agribusinesses make data-backed decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  VISUALIZATIONS
&lt;/h2&gt;

&lt;p&gt;Data visualizations enable one to tell a story about the data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frr0y9g5qii40poniugy9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frr0y9g5qii40poniugy9.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  INSIGHTS FROM THE DATA
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Key Performance&lt;/strong&gt;&lt;br&gt;
• Total Revenue: KES 1.19 billion&lt;br&gt;
• Total Profit: KES 1.10 billion&lt;br&gt;
• Average Revenue: KES 121.67 million&lt;br&gt;
• Average Profit: KES 197.13k&lt;br&gt;
&lt;strong&gt;Crop Performance&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The top performing crop is rice. It generated KES 133.01M in revenue with KES 114.94M profit.&lt;/li&gt;
&lt;li&gt;Rice produced more revenue during the rainy season and performed poorly during the dry season.&lt;/li&gt;
&lt;li&gt;Sorghum made KES 121.12M revenue with KES 119.18M profit.&lt;/li&gt;
&lt;li&gt;Tomatoes were the least revenue-generating crop, with total revenue of KES 58.11M and profit of KES 51.99M.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;County Perfomance&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Nyeri County was the most productive county, with KES 162.05M in total revenue and KES 160.09M in profit, even though it did not produce the highest yield.&lt;/li&gt;
&lt;li&gt;Nakuru County was the second best performing, with a total revenue of KES 132.89M and a total profit of KES 98.43M.&lt;/li&gt;
&lt;li&gt;Nairobi produced the highest total crop yield at 142k KG, and Kericho the least at 92k KG.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;More Insights&lt;/strong&gt;&lt;br&gt;
• Farmers who used DAP fertilizer harvested the most yield, while those who used CAN got the least.&lt;br&gt;
• Crops performed better on clay soil, producing more yield than the rest, while loam soil produced the least.&lt;br&gt;
• Organic crops did best while hybrid crops did poorly.&lt;br&gt;
• The average crop profit was KES 197.53M.&lt;/p&gt;

&lt;h2&gt;
  
  
  CONCLUSION
&lt;/h2&gt;

&lt;p&gt;Power BI has the potential to revolutionize crop management by turning raw agricultural data into actionable insights. From individual farmers to national policymakers, the tool empowers all players in the agricultural ecosystem to make data-driven, sustainable, and profitable decisions.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>EXCEL’S STRENGTHS AND WEAKNESSES IN PREDICTIVE ANALYSIS AND THE ROLE OF EXCEL IN MAKING DATA-DRIVEN BUSINESS DECISIONS</title>
      <dc:creator>Ann Jeffa</dc:creator>
      <pubDate>Mon, 11 Aug 2025 13:13:07 +0000</pubDate>
      <link>https://forem.com/jeffa_jeffa_67f700f712ab7/excels-strengths-and-weaknesses-in-predictive-analysis-and-the-role-of-excel-in-making-data-driven-2ofk</link>
      <guid>https://forem.com/jeffa_jeffa_67f700f712ab7/excels-strengths-and-weaknesses-in-predictive-analysis-and-the-role-of-excel-in-making-data-driven-2ofk</guid>
      <description>&lt;h2&gt;
  
  
  INTRODUCTION
&lt;/h2&gt;

&lt;p&gt;Excel offers valuable tools for predictive analysis and plays a significant role in data-driven business decisions, but it also has limitations. Its advantages include ease of use, wide adoption, and the ability to handle data manipulation tasks and a wide range of calculations. However, Excel cannot support complex data models, real-time data, large datasets, or easy collaboration, making it less suitable for large-scale and advanced predictive analysis.&lt;/p&gt;

&lt;h2&gt;
  
  
  EXCEL'S STRENGTHS
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Familiarity and Ease of Use.&lt;br&gt;
Excel is widely adopted across industries for its strengths in project management. Its easy-to-use interface and familiar functionality make it accessible to professionals at all skill levels, so less time goes to training and more to execution.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Data Summarization and Visualizations.&lt;br&gt;
Excel can calculate descriptive statistics, create charts and graphs, and build interactive dashboards, while conditional formatting helps in understanding data patterns and trends.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Advanced Data Analysis Capabilities.&lt;br&gt;
Its pivot tables and complex statistical functions enable deep analysis.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Flexibility and Customization.&lt;br&gt;
Project managers are able to create and setup their projects to their liking without conforming to a specific style.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Integration with other Microsoft Tools.&lt;br&gt;
Excels seamless integration with other Microsoft Office Applications like Word, PowerPoint enhances its utility in project management.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  EXCEL'S WEAKNESSES
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Large Datasets.&lt;br&gt;
Excel's performance degrades when handling very large datasets or complex calculations, wasting time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Difficulty with Advanced Statistical Modelling.&lt;br&gt;
Its functions may not be sufficient for sophisticated predictive modelling techniques, such as regression.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Collaboration.&lt;br&gt;
Managing and collaborating on large, complex spreadsheets can be challenging and prone to errors.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Connections.&lt;br&gt;
Excel integrates poorly with applications outside the Microsoft ecosystem.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  ROLE OF EXCEL IN DATA-DRIVEN BUSINESS DECISIONS
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Informed Decision Making.&lt;br&gt;
It provides tools to analyse data, identify trends and generate insights enabling business managers to make more informed decisions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Financial Analysis.&lt;br&gt;
Excel is used for budgeting, financial modelling, and forecasting. This helps plan for the future of a business.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Performance Tracking.&lt;br&gt;
Excel helps track key performance indicators (KPIs), analyse trends, and identify areas for improvement.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Risk Assessment.&lt;br&gt;
The outcome after analysis can be used to assess potential risks and opportunities.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  CONCLUSION
&lt;/h2&gt;

&lt;p&gt;Excel is a powerful tool for data manipulation, visualization, and predictive analysis, making it suitable for a wide range of business applications. However, its limitations in handling complex models, collaboration, and connections with other applications are a setback.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>POSTGRESQL INSTALLATION ON LINUX SERVER</title>
      <dc:creator>Ann Jeffa</dc:creator>
      <pubDate>Sun, 03 Aug 2025 16:14:43 +0000</pubDate>
      <link>https://forem.com/jeffa_jeffa_67f700f712ab7/postgresql-installation-on-linux-server-4mjf</link>
      <guid>https://forem.com/jeffa_jeffa_67f700f712ab7/postgresql-installation-on-linux-server-4mjf</guid>
      <description>&lt;p&gt;PostgreSQL is a powerful, open-source object-relational database system known for its robustness, extensibility, and standards compliance.&lt;br&gt;
To install PostgreSQL on a Linux server, you need the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You must have a Linux server (like Ubuntu).&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You must have admin rights (called sudo).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You must know how to use the terminal/command line.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;INSTALLATION PROCESS&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; First, update the system using commands on the terminal
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo apt update
sudo apt upgrade -y

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Install PostgreSQL; this installs the main database and its tools.&lt;br&gt;
&lt;code&gt;sudo apt install postgresql postgresql-contrib -y&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Check whether PostgreSQL is running after installation&lt;br&gt;
&lt;code&gt;sudo systemctl status postgresql&lt;br&gt;
&lt;/code&gt;&lt;br&gt;
Ensure that PostgreSQL is running; if not, start it&lt;br&gt;
&lt;code&gt;sudo systemctl start postgresql&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Installation creates a system user called postgres, and we should switch to that user.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;code&gt;sudo -i -u postgres&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;5. Getting into the database&lt;br&gt;
&lt;code&gt;psql&lt;br&gt;
&lt;/code&gt; &lt;br&gt;
 You should set a password for the postgres user while still in the database&lt;br&gt;
&lt;code&gt;ALTER USER postgres PASSWORD 'mypassword';&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;PostgreSQL has now been installed on the Linux server. If you want, you can create your own database.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
