<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Sylvester Promise</title>
    <description>The latest articles on Forem by Sylvester Promise (@sp_the_data_specialist).</description>
    <link>https://forem.com/sp_the_data_specialist</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3613700%2F7d8ae194-c0a6-4e27-a275-69dc106f91a9.png</url>
      <title>Forem: Sylvester Promise</title>
      <link>https://forem.com/sp_the_data_specialist</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/sp_the_data_specialist"/>
    <language>en</language>
    <item>
      <title>Day 39 of improving my Data Science skills</title>
      <dc:creator>Sylvester Promise</dc:creator>
      <pubDate>Sat, 03 Jan 2026 00:11:25 +0000</pubDate>
      <link>https://forem.com/sp_the_data_specialist/day-39-of-improving-my-data-science-skills-27kh</link>
      <guid>https://forem.com/sp_the_data_specialist/day-39-of-improving-my-data-science-skills-27kh</guid>
      <description>&lt;p&gt;A silent struggle in data work that frustrates a lot of people (and no one talks about it): "I'm learning a lot… but everything feels disconnected."&lt;/p&gt;

&lt;p&gt;Today reminded me that the problem isn't learning too little, it's not seeing how the pieces fit while you're learning.&lt;br&gt;
So here's how my day went, 4+ hours deep into data, and honestly, it was beautiful.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F62jcu58wxa2zzli73gbe.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F62jcu58wxa2zzli73gbe.jpg" alt="Learning time today" width="720" height="806"&gt;&lt;/a&gt;&lt;br&gt;
I started with Introduction to Data Science in Python.&lt;br&gt;
The course frames learning as solving mysteries with data, which made even the basics interesting again:&lt;br&gt;
importing and using modules&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffy2r1sxbazcxuuo51egv.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffy2r1sxbazcxuuo51egv.jpg" alt="How to properly import" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
creating variables&lt;br&gt;
setting up the foundation for analysis&lt;/p&gt;

&lt;p&gt;Nothing brand new here, but context made it click differently.&lt;/p&gt;
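&lt;p&gt;As a quick sketch of that basic flow, importing a module, creating variables, and calling a module function (the numbers here are made up):&lt;/p&gt;

```python
# Import a module and give it a short alias
import statistics as stats

# Create variables holding raw data
ratings = [4, 5, 3, 4, 5, 4]

# Call a module function on the variable
average = stats.mean(ratings)
print(average)
```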

&lt;p&gt;Then I continued Introduction to Importing Data in Python&lt;br&gt;
I picked up from yesterday and worked more with relational databases:&lt;br&gt;
querying databases in Python&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9chuchch5s29ev5gvi9a.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9chuchch5s29ev5gvi9a.jpg" alt="querying databases in Python" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
learning that any SELECT statement can be ordered by any column using ORDER BY&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3kflwcg9w43uzdny31bf.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3kflwcg9w43uzdny31bf.jpg" alt="ordering a select statement" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These are small SQL details, but they have a big impact when working with real systems.&lt;/p&gt;
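&lt;p&gt;A minimal sketch of ordering a query from Python, using the standard-library sqlite3 module rather than the course's exact setup, with an invented table:&lt;/p&gt;

```python
import sqlite3

# Build a tiny in-memory database to query against
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE albums (title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO albums VALUES (?, ?)",
    [("Night Drive", 2019), ("Blue Hour", 2015), ("Daybreak", 2021)],
)

# Any SELECT can be sorted with ORDER BY on any column
rows = conn.execute("SELECT title, year FROM albums ORDER BY year").fetchall()
print(rows)  # oldest first: Blue Hour, Night Drive, Daybreak
conn.close()
```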

&lt;p&gt;I moved into data visualization, this time with Seaborn&lt;br&gt;
Here I learned:&lt;br&gt;
how to create scatterplots&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4k9jboq2ajp274j9gbii.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4k9jboq2ajp274j9gbii.jpg" alt="Scatterplot Visualization" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
how to use count plots&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fitfjet5ace1uqmzs02f9.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fitfjet5ace1uqmzs02f9.jpg" alt="Count plot visualization" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
how Seaborn works seamlessly with pandas DataFrames&lt;/p&gt;

&lt;p&gt;It reinforced something important: visualization is more about asking better questions of your data than about just displaying charts.&lt;/p&gt;
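&lt;p&gt;A small sketch of both plot types straight from a DataFrame (column names and values invented):&lt;/p&gt;

```python
import matplotlib
matplotlib.use("Agg")            # draw off-screen, no display needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Seaborn plots straight from a pandas DataFrame: pass the frame
# and name the columns, no manual array wrangling needed
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5],
    "score": [52, 60, 71, 78, 90],
    "level": ["beginner", "beginner", "mid", "mid", "advanced"],
})

ax = sns.scatterplot(data=df, x="hours", y="score")  # relationship
plt.figure()                                         # fresh figure
ax2 = sns.countplot(data=df, x="level")              # category counts
```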

&lt;p&gt;I started Introduction to Functions in Python, and I looked at functions with and without parameters&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8xah1c4vpmvrdz5rls2w.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8xah1c4vpmvrdz5rls2w.jpg" alt="Function with parameter" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Again, not all new, but clearer.&lt;/p&gt;
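&lt;p&gt;The two cases in miniature (names and strings made up):&lt;/p&gt;

```python
# A function without parameters: same behavior every call
def greet():
    return "Hello, data!"

# A function with a parameter: behavior driven by the caller
def describe(count):
    return f"Loaded {count} rows"

print(greet())        # → Hello, data!
print(describe(150))  # → Loaded 150 rows
```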

&lt;p&gt;Under Python Toolbox, I learned about Iterators&lt;br&gt;
This part surprised me: &lt;br&gt;
the difference between iterables and iterators&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frugs0d3vvm2i1w9y0d9e.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frugs0d3vvm2i1w9y0d9e.jpg" alt="difference between iterables and iterators" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
how next() actually works&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7qsctjg456sd1bq1djyw.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7qsctjg456sd1bq1djyw.jpg" alt="Using next() method with iterators" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
and yes, file connections are iterables too&lt;/p&gt;

&lt;p&gt;That explained a lot of "magic" I'd previously taken for granted.&lt;/p&gt;
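&lt;p&gt;The iterable/iterator split in a few lines:&lt;/p&gt;

```python
# A list is an iterable: you can ask it for an iterator
values = [10, 20, 30]
it = iter(values)    # iterator created from the iterable

# next() pulls one element at a time, remembering position
print(next(it))  # → 10
print(next(it))  # → 20

# for-loops do exactly this behind the scenes, and file objects
# are iterables too: "for line in open(path)" works because the
# file object supports the same iter()/next() protocol
```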

&lt;p&gt;Finally, Cleaning Data in Python&lt;br&gt;
This might have been my favorite part today:&lt;br&gt;
unique constraints&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgpls2f7frpjqhi9zl705.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgpls2f7frpjqhi9zl705.jpg" alt="Removing all duplicates" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
handling complete vs incomplete duplicates, and deciding when to drop duplicates or to use groupby with meaningful summary statistics&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft27v7xqok87c77t6tiew.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft27v7xqok87c77t6tiew.jpg" alt="Sorting duplicates using summary Statistics" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is the part of data work that quietly determines whether insights are trusted or ignored.&lt;/p&gt;
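&lt;p&gt;A sketch of both options, assuming a toy frame where one person's duplicate rows agree completely and another's conflict:&lt;/p&gt;

```python
import pandas as pd

df = pd.DataFrame({
    "name":   ["Ada", "Ada", "Grace", "Grace"],
    "height": [160, 160, 170, 172],   # Grace's rows disagree
})

# Complete duplicates (every column identical): safe to drop
deduped = df.drop_duplicates()

# Incomplete duplicates (same key, conflicting values): combine
# them with a meaningful summary statistic instead of guessing
resolved = deduped.groupby("name", as_index=False).agg(height=("height", "mean"))
print(resolved)
```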

&lt;p&gt;What today taught me? Some things weren't new. Some things were. But everything connected.&lt;br&gt;
And that's the part people struggle with:&lt;br&gt;
Learning tools is easy. Learning how they fit together takes intention.&lt;br&gt;
I'm continuing the year doing exactly that, learning deeply, documenting honestly, and respecting the unglamorous parts of data work.&lt;br&gt;
And yeah, I completed another chapter today!&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1pw0opdsg8d3y1da4uvu.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1pw0opdsg8d3y1da4uvu.jpg" alt="An end of a chapter🥂" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Happy New Year once again🎉Here's to cleaner data, clearer thinking, and fewer "why doesn't this make sense?" moments in 2026.&lt;/p&gt;

&lt;p&gt;-SP&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>devjournal</category>
      <category>learning</category>
      <category>python</category>
    </item>
    <item>
      <title>Day 38 of improving my Data Science skills</title>
      <dc:creator>Sylvester Promise</dc:creator>
      <pubDate>Thu, 01 Jan 2026 23:22:01 +0000</pubDate>
      <link>https://forem.com/sp_the_data_specialist/day-38-of-improving-my-data-science-skills-4l0e</link>
      <guid>https://forem.com/sp_the_data_specialist/day-38-of-improving-my-data-science-skills-4l0e</guid>
      <description>&lt;p&gt;Happy New Year 🥂&lt;br&gt;
I'm starting 2026 with a quiet but important realization: A lot of data problems aren't caused by analysis, they're caused by what we do before anyone ever sees the result.&lt;br&gt;
Today was a 4+ hour deep work day, and almost everything I learned sat upstream of insight.&lt;/p&gt;

&lt;p&gt;In Data Visualization, I didn't just "plot charts."&lt;br&gt;
I learned how to prepare figures for real humans:&lt;br&gt;
Why PNG is best for reports and dashboards (lossless, clean)&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz7bntpp55ibc3xjsbd6m.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz7bntpp55ibc3xjsbd6m.jpg" alt="Saving figure as .png" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
Why JPG works for the web but quietly sacrifices detail&lt;br&gt;
Why SVG matters when designs need to be edited later&lt;br&gt;
How dpi, figure size, and quality can change how trustworthy a chart feels&lt;br&gt;
How to automate figure generation from data using loop variables and the .unique() method to create multiple plots from one dataset.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F183is3nza7wnaozqn39c.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F183is3nza7wnaozqn39c.jpg" alt="Automating figures for Data" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
This taught me that sharing data is a responsibility, not a final step.&lt;/p&gt;
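&lt;p&gt;A rough sketch of that automation loop (region names and file names invented):&lt;/p&gt;

```python
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "region": ["north", "north", "south", "south"],
    "sales":  [10, 12, 7, 9],
})

# One figure per category: loop over the unique values
saved = []
for region in df["region"].unique():
    subset = df[df["region"] == region]
    fig, ax = plt.subplots()
    ax.plot(subset["sales"].values)
    ax.set_title(region)
    # PNG is lossless; dpi controls the resolution of the saved file
    fname = f"{region}.png"
    fig.savefig(fname, dpi=300)
    saved.append(fname)
    plt.close(fig)
```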

&lt;p&gt;Then, in Importing Data, I stepped into relational databases:&lt;br&gt;
What a relational database actually is&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff98gpw05fkdafnb441rr.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff98gpw05fkdafnb441rr.jpg" alt="Relational Database" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
How to create a database engine with SQLAlchemy&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhswfk5n9dug7cb3wmyd1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhswfk5n9dug7cb3wmyd1.jpg" alt="Create Database engine" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
How data is queried, fetched, and controlled long before analysis begins&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqwmmotc7zo59weui93df.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqwmmotc7zo59weui93df.jpg" alt="Querying and fetching data" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
And in Cleaning Data, I learned about Data Range Constraints. Data often breaks the rules we set for it.&lt;br&gt;
Ratings exceed their limits. Subscriptions appear in the future. And suddenly you're forced to choose:&lt;br&gt;
Do you drop the data?&lt;br&gt;
Or do you correct it to preserve meaning?&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frvaxx7tf05u6z7sszup5.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frvaxx7tf05u6z7sszup5.jpg" alt="Handling data range constraints" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
Those aren't technical decisions. They're judgment calls.&lt;/p&gt;
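&lt;p&gt;Both choices in pandas, on an invented ratings column meant to stay between 1 and 5:&lt;/p&gt;

```python
import pandas as pd

# Hypothetical ratings that should live in the range 1-5
ratings = pd.DataFrame({"stars": [4, 5, 6, 3, 9]})

# Option 1: drop rows that break the constraint
dropped = ratings[ratings["stars"].le(5)]

# Option 2: correct them, capping at the maximum to keep the rows
capped = ratings.copy()
capped.loc[capped["stars"].gt(5), "stars"] = 5
print(list(capped["stars"]))  # → [4, 5, 5, 3, 5]
```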

&lt;p&gt;That's what tied everything together for me today: Saving figures, querying databases, enforcing data ranges, they all decide whether the story we tell later is honest or misleading.&lt;/p&gt;

&lt;p&gt;I wrapped up the Data Visualization with Matplotlib course today, and I'm continuing the remaining two, with four more interactive courses starting tomorrow.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fio9585r22zuuff4ssbg0.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fio9585r22zuuff4ssbg0.jpg" alt="Statement of accomplishment" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
New year, new standards for how I work with data.&lt;/p&gt;

&lt;p&gt;Happy New Year to everyone building carefully, not just quickly.&lt;/p&gt;

&lt;p&gt;-SP&lt;/p&gt;

</description>
      <category>codenewbie</category>
      <category>tooling</category>
      <category>datascience</category>
      <category>development</category>
    </item>
    <item>
      <title>Day 37 of improving my Data Science skills</title>
      <dc:creator>Sylvester Promise</dc:creator>
      <pubDate>Wed, 31 Dec 2025 10:12:46 +0000</pubDate>
      <link>https://forem.com/sp_the_data_specialist/day-37-of-improving-my-data-science-skills-63n</link>
      <guid>https://forem.com/sp_the_data_specialist/day-37-of-improving-my-data-science-skills-63n</guid>
      <description>&lt;p&gt;One reason your data insights don't land (even when the analysis is correct)&lt;br&gt;
A small but frustrating struggle I keep seeing among data users, analysts, founders, and hiring managers is this: "The data is right, but the output is confusing, misleading, or unusable."&lt;br&gt;
This usually shows up in two places:&lt;br&gt;
1️⃣ Data visualizations that look fine… but don't communicate&lt;br&gt;
I've been learning data visualization with Matplotlib, and one key lesson stood out: good charts are not about aesthetics, they're about accessibility and decision clarity.&lt;br&gt;
Some practical fixes I learned:&lt;br&gt;
Scatterplots are powerful for relationships, but when there's a third variable, encoding it with color (c=) instantly adds context.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8bzkzmv3vwz7nvs4319v.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8bzkzmv3vwz7nvs4319v.jpg" alt="Scatterplot with third variable c" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
Dark backgrounds may look cool, but they reduce readability when shared.&lt;br&gt;
If color matters, use colorblind-friendly styles like:&lt;br&gt;
✔️seaborn-colorblind&lt;br&gt;
✔️tableau-colorblind10&lt;br&gt;
These preserve meaning for everyone, not just people with perfect color vision.&lt;/p&gt;

&lt;p&gt;If your work might be printed:&lt;br&gt;
✔️Use less ink&lt;br&gt;
✔️Consider grayscale styles for black-and-white printers&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9xdef7eh2s4c5znv8veb.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9xdef7eh2s4c5znv8veb.jpg" alt="Effective method for visualization" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
A visualization that excludes part of your audience is a broken visualization.&lt;/p&gt;
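&lt;p&gt;A sketch putting the two fixes together, a colorblind-friendly style plus a third variable encoded with c= (data invented; note that newer Matplotlib versions renamed seaborn-colorblind to seaborn-v0_8-colorblind):&lt;/p&gt;

```python
import matplotlib
matplotlib.use("Agg")             # render off-screen
import matplotlib.pyplot as plt

# A colorblind-friendly style keeps color meaningful for everyone
plt.style.use("tableau-colorblind10")
# plt.style.use("grayscale")      # for black-and-white printers

hours = [1, 2, 3, 4, 5]
score = [50, 58, 69, 75, 88]
age   = [21, 34, 28, 45, 39]      # the third variable

fig, ax = plt.subplots()
sc = ax.scatter(hours, score, c=age)   # c= encodes age as color
fig.colorbar(sc, label="age")
```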

&lt;p&gt;I wrapped up the Intermediate Importing Data in Python course today, including exercises that involved scraping data via the Twitter API. That felt like closing a chapter: pulling data from real systems, understanding authentication, and working with messy, real-world responses instead of clean examples.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Futj9ciyoi24kt00wwg96.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Futj9ciyoi24kt00wwg96.jpg" alt="Statement of Accomplishment" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
But instead of moving on, I did something intentional: I enrolled in a new track, Importing &amp;amp; Cleaning Data in Python, and immediately started the Cleaning Data in Python course, which led to point 2️⃣&lt;br&gt;
2️⃣ "Clean" data that isn't actually clean&lt;br&gt;
When cleaning data, type constraints quietly decide what analyses are even possible. Sometimes the problem isn't the model, the visualization, or the question, it's that the data is pretending to be something it's not.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo1534w09ikp6nbp5s6a6.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo1534w09ikp6nbp5s6a6.jpg" alt="Common data types" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
Numbers that should be categories.&lt;br&gt;
Categories treated like numbers.&lt;br&gt;
Decisions made on assumptions no one stopped to question.&lt;br&gt;
What looks like a number doesn't always behave like one.&lt;br&gt;
For example, codes, ratings, and categories stored as numbers can break analysis if you don't enforce data type constraints. In this case, pandas lets you store them properly as 'category'.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0irt9g6lcshkhk1qfly7.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0irt9g6lcshkhk1qfly7.jpg" alt="Changing numeric data to category" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
Learning this made one thing clear: Cleaning data is about protecting meaning.&lt;br&gt;
This is especially important when importing:&lt;br&gt;
HDF5 files (hierarchical, complex structures)&lt;br&gt;
MATLAB (.mat) files using scipy.io&lt;br&gt;
Data pulled from APIs (like Twitter), where structure doesn't guarantee quality&lt;/p&gt;
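&lt;p&gt;The type fix itself is one line (the column name here is invented):&lt;/p&gt;

```python
import pandas as pd

# Marriage-status codes stored as integers look numeric, but
# averaging them would be meaningless; they are categories
df = pd.DataFrame({"status_code": [1, 2, 1, 3, 2]})

df["status_code"] = df["status_code"].astype("category")
print(df["status_code"].dtype)  # → category
```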

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F07bhgluj24phsnh56j8o.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F07bhgluj24phsnh56j8o.jpg" alt="Importing HDF5 files" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
Why this matters for decision-makers and hiring managers&lt;br&gt;
Anyone can load data. Anyone can plot a chart. Anyone can scrape an API.&lt;br&gt;
But not everyone:&lt;br&gt;
Preserves meaning during import&lt;br&gt;
Enforces correct data types&lt;br&gt;
Designs visuals that work for real humans&lt;br&gt;
Thinks about how insights will be consumed, printed, or acted on&lt;/p&gt;

&lt;p&gt;That gap is what silently kills trust in data. If you have ever stared at a chart or dataset thinking "Why doesn't this sit right?", this might be why.&lt;/p&gt;

&lt;p&gt;The work today reminded me that good data work isn't louder or fancier. It's quieter. More intentional. More honest.&lt;/p&gt;

&lt;p&gt;That's the kind of data work I want to be known for!&lt;/p&gt;

&lt;p&gt;-SP&lt;/p&gt;

</description>
      <category>ai</category>
      <category>tutorial</category>
      <category>productivity</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Day 36 of improving my Data Science skills</title>
      <dc:creator>Sylvester Promise</dc:creator>
      <pubDate>Mon, 29 Dec 2025 20:05:09 +0000</pubDate>
      <link>https://forem.com/sp_the_data_specialist/day-36-of-improving-my-data-science-skills-2pa4</link>
      <guid>https://forem.com/sp_the_data_specialist/day-36-of-improving-my-data-science-skills-2pa4</guid>
      <description>&lt;p&gt;If you work with data long enough, you stop wishing for fancier models and start wishing for something simpler - Confidence.&lt;br&gt;
Confidence that what you're seeing is real.&lt;br&gt;
Confidence that what you're reporting won't fall apart under questions.&lt;br&gt;
That was the thread running through my learning today.&lt;br&gt;
In data visualization, I wasn't just drawing charts, I was learning how easily visuals can mislead if we're careless.&lt;br&gt;
Histograms taught me how distributions can hide or exaggerate patterns depending on bin size. &lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F83jc5c516xkib9m8n8qt.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F83jc5c516xkib9m8n8qt.jpg" alt="Histogram" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
Box plots forced me to confront variability, outliers, and spread, not just averages.&lt;br&gt;
Error bars forced me to admit uncertainty instead of hiding it. Instead of pretending a value is exact, I show how much it can realistically vary. That small visual choice makes a big difference, because decisions aren't made on perfect numbers, they're made within ranges.&lt;/p&gt;
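&lt;p&gt;Both ideas in a few lines, with made-up numbers; varying bins is the quickest honesty check for a histogram:&lt;/p&gt;

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 8]

fig, (ax1, ax2) = plt.subplots(1, 2)

# Bin count changes the story: try several before trusting one
ax1.hist(data, bins=4)

# Error bars show the range a value can realistically take
ax2.errorbar([1, 2, 3], [5.0, 6.2, 5.8], yerr=[0.4, 0.9, 0.3])
```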

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F47dtw809ot0wtpdpat8w.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F47dtw809ot0wtpdpat8w.jpg" alt="Error bar" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
Then came importing data, where many data problems are quietly born.&lt;br&gt;
I worked with SAS and Stata files using pandas, and it reinforced something uncomfortable: reliable analysis doesn't start with models or plots. It starts with respecting how data was originally structured.&lt;br&gt;
Knowing how to read SAS and Stata files means:&lt;br&gt;
You can preserve meaning instead of guessing it&lt;br&gt;
You can catch assumptions early&lt;br&gt;
You're less likely to build insights on silently altered data&lt;br&gt;
And that's exactly the kind of quiet skill that separates using data from understanding data.&lt;/p&gt;
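&lt;p&gt;A minimal round-trip sketch: pandas can both write a .dta file and read it back, and SAS files load the same way via pd.read_sas (file names and values here are invented):&lt;/p&gt;

```python
import pandas as pd

# Round-trip a Stata file: pandas preserves the column structure
# instead of forcing a manual parse
original = pd.DataFrame({"income": [1200, 3400], "region": ["n", "s"]})
original.to_stata("sample.dta", write_index=False)

restored = pd.read_stata("sample.dta")
print(restored["income"].tolist())  # → [1200, 3400]

# SAS files read similarly: pd.read_sas("file.sas7bdat")
```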

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiesbnbqjqxtjjaojlpom.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiesbnbqjqxtjjaojlpom.jpg" alt="Stata file" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
And finally, I stepped into the world of Twitter APIs and authentication. Not scraping. Not downloading files. But asking a live system for data, with permissions, rate limits, and constraints.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhoi93vl44kmvkl3x53l1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhoi93vl44kmvkl3x53l1.jpg" alt="Twitter API" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
It made one thing clear:&lt;br&gt;
Real-world data doesn't wait for us. We negotiate access to it.&lt;/p&gt;

&lt;p&gt;Here's the insight that stuck with me most today:&lt;br&gt;
Most data failures don't happen at the "advanced" stage. They happen when we underestimate the basics.&lt;br&gt;
A misleading histogram.&lt;br&gt;
An ignored error bar.&lt;br&gt;
An imported dataset we never questioned.&lt;br&gt;
An API response we assumed was complete.&lt;/p&gt;

&lt;p&gt;If you're building products, making decisions, or hiring people who work with data, this is the real differentiator. Not who knows the most tools, but who knows where trust can break.&lt;br&gt;
That's the skill I'm deliberately building.&lt;/p&gt;

&lt;p&gt;And tomorrow, I'm pushing deeper, more practice, more questioning, more discomfort.&lt;br&gt;
Because trustworthy insights are never accidental. &lt;/p&gt;

&lt;p&gt;-SP&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Febrwjnaoc9ia5tlg5gex.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Febrwjnaoc9ia5tlg5gex.jpg" alt="Profile" width="800" height="1066"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>tutorial</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Day 35 of improving my Data Science skills</title>
      <dc:creator>Sylvester Promise</dc:creator>
      <pubDate>Sat, 27 Dec 2025 07:52:42 +0000</pubDate>
      <link>https://forem.com/sp_the_data_specialist/day-35-of-improving-my-data-science-skills-1lcb</link>
      <guid>https://forem.com/sp_the_data_specialist/day-35-of-improving-my-data-science-skills-1lcb</guid>
      <description>&lt;p&gt;Lately, I've noticed something changing in how I learn.&lt;br&gt;
I'm no longer excited just because something works.&lt;br&gt;
I'm more interested in why it works, and what breaks quietly when I don't pay attention.&lt;br&gt;
Today made that very clear.&lt;/p&gt;

&lt;p&gt;While working with time series in Matplotlib, I annotated a point on a chart where something meaningful happened. It wasn't just a label on a line; it felt like saying, "This moment matters."&lt;br&gt;
That's when it hit me: visualization isn't decoration. It's judgment. What you choose to highlight says what you believe is important.&lt;/p&gt;
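&lt;p&gt;The annotation itself is one call (the series and label are invented):&lt;/p&gt;

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

days  = [1, 2, 3, 4, 5]
users = [100, 110, 150, 290, 310]

fig, ax = plt.subplots()
ax.plot(days, users)

# The annotation is the judgment call: "this moment matters"
ax.annotate("launch spike", xy=(4, 290), xytext=(2, 280),
            arrowprops={"arrowstyle": "-|>"})
```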

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb4te0lgkvmo290ckssfi.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb4te0lgkvmo290ckssfi.jpg" alt="Annotating charts" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
At the same time, I was importing data in different forms: Excel files with multiple sheets, pickled files meant only for machines, JSON data pulled from an API.&lt;br&gt;
That's where I felt the most tension.&lt;br&gt;
Because importing data looks simple… until you realize how much trust you place in it without questioning:&lt;br&gt;
Did I choose the right sheet?&lt;br&gt;
Did I understand the missing values?&lt;br&gt;
Did I assume the structure was "clean" just because it loaded?&lt;/p&gt;
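&lt;p&gt;Here's a small sketch of the kind of checks I mean, with made-up data and hypothetical file names (the Excel call stays a comment because it needs a real workbook and an engine like openpyxl):&lt;/p&gt;

```python
import json
import os
import tempfile

import pandas as pd

# Tiny stand-in dataset; note the sentinel string "n/a" in the city column
df = pd.DataFrame({"city": ["Lagos", "Abuja", "n/a"], "sales": [120, None, 95]})

# Pickled files: fast, Python-only - round-trip one and verify the structure
path = os.path.join(tempfile.mkdtemp(), "sales.pkl")
df.to_pickle(path)
restored = pd.read_pickle(path)
assert restored.shape == df.shape  # did it load what I think it did?

# Did I understand the missing values? Only the real NaN is counted;
# the literal string "n/a" slips through silently.
missing = restored.isna().sum()

# JSON from an API usually arrives as nested dicts, not tidy rows
payload = json.loads('{"records": [{"city": "Lagos", "sales": 120}]}')
api_df = pd.DataFrame(payload["records"])

# Excel forces an explicit sheet choice (hypothetical file and sheet name):
# pd.read_excel("report.xlsx", sheet_name="Q3")  # did I choose the right sheet?
```

&lt;p&gt;Shapes, missing counts, and sheet names are cheap to check and expensive to skip.&lt;/p&gt;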

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fizd7963ec0k56j7g7pm5.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fizd7963ec0k56j7g7pm5.jpg" alt="Importing pickled data into Python" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
Then came APIs: data that doesn't live in files at all. Data that exists somewhere else, shaped by decisions I didn't make, exposed through endpoints I have to respect.&lt;br&gt;
That was really humbling.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx5pl68473dfgd3eorvhy.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx5pl68473dfgd3eorvhy.jpg" alt="Using API" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
Here's what I believe now, more strongly than before: Most mistakes in data work don't come from lack of skill. They come from moving too fast through the early steps.&lt;br&gt;
Annotating taught me to slow down and ask, "What deserves attention?" &lt;br&gt;
Importing taught me that structure is never neutral. Working with APIs reminded me that real-world data is messy by default, and that's normal.&lt;/p&gt;

&lt;p&gt;I'm still learning. Still breaking things. Still fixing them. But I'm becoming more intentional, and that feels like real progress.&lt;/p&gt;

&lt;p&gt;If you're on a similar path, I'm curious: What part of your process do you rush through because it feels "basic"… but probably deserves more care?&lt;/p&gt;

&lt;p&gt;-SP&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>tutorial</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Day 34 of improving my Data Science skills</title>
      <dc:creator>Sylvester Promise</dc:creator>
      <pubDate>Fri, 26 Dec 2025 13:06:11 +0000</pubDate>
      <link>https://forem.com/sp_the_data_specialist/day-34-of-improving-my-data-science-skills-24hg</link>
      <guid>https://forem.com/sp_the_data_specialist/day-34-of-improving-my-data-science-skills-24hg</guid>
<description>&lt;p&gt;I caught myself today staring at a chart and thinking: "This isn't just a plot anymore. It's a conversation."&lt;/p&gt;

&lt;p&gt;That moment didn't come from theory. It came from practice: breaking things, fixing them, and noticing patterns I used to ignore.&lt;br&gt;
Today's learning sat at an interesting intersection for me:&lt;br&gt;
Seeing data over time&lt;/p&gt;

&lt;p&gt;Understanding how data enters Python&lt;/p&gt;

&lt;p&gt;Realizing how structured data quietly shapes everything downstream&lt;/p&gt;

&lt;p&gt;In Matplotlib, I worked with time series and learned how to compare two variables over the same timeline using .twinx(). &lt;/p&gt;

&lt;p&gt;Instead of cluttering one axis, I learned how to let each variable speak in its own scale, clearly and honestly. I also built a small plot_timeseries function so I wouldn't repeat myself every time. That felt like progress: not just plotting, but designing how I work.&lt;/p&gt;
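&lt;p&gt;Roughly what that looked like, sketched with synthetic data (the series names and values here are invented, not the course's dataset):&lt;/p&gt;

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import pandas as pd

def plot_timeseries(ax, x, y, color, xlabel, ylabel):
    """Plot one series on the given Axes, coloring the y-axis to match it."""
    ax.plot(x, y, color=color)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel, color=color)
    ax.tick_params("y", colors=color)

dates = pd.date_range("2025-01-01", periods=12, freq="MS")
temperature = [20 + i for i in range(12)]
co2 = [400 + 0.5 * i for i in range(12)]

fig, ax = plt.subplots()
plot_timeseries(ax, dates, temperature, "blue", "Time", "Temperature")

# twinx: a second y-axis that shares the same timeline,
# so each variable speaks in its own scale
ax2 = ax.twinx()
plot_timeseries(ax2, dates, co2, "red", "Time", "CO2 (ppm)")
```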

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmb7rj5d11q5zn5g0jn6k.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmb7rj5d11q5zn5g0jn6k.jpg" alt="Plot Time-series function" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
In Importing Data with Pandas, I went deeper into .read_csv(), not just loading files, but understanding how arguments like nrows, sep, header, and na_values quietly determine what kind of story your dataset will tell before you even visualize it.&lt;/p&gt;
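&lt;p&gt;A small illustration of how those arguments change the story, using an inline stand-in for a file (a real call would pass a path; the sentinel value -1 is invented):&lt;/p&gt;

```python
import io

import pandas as pd

# Inline stand-in for a semicolon-separated flat file
raw = "name;score\nAda;91\nGrace;-1\nEdsger;88\n"

df = pd.read_csv(
    io.StringIO(raw),   # a real call would pass a file path instead
    sep=";",            # the delimiter decides where columns begin and end
    header=0,           # which row holds the column names
    nrows=3,            # read only the first rows while exploring
    na_values=["-1"],   # treat -1 as "missing", not as a real score
)
# Grace's score is now NaN instead of a misleading -1
```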

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqr6w7bncd1m8ercgbncw.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqr6w7bncd1m8ercgbncw.jpg" alt="Using keywords to import in pandas" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
Then, in Intermediate Importing Data, I shifted gears and met data where it lives today: APIs and JSONs. Loading JSON locally felt simple on the surface, but it unlocked something bigger, the realization that much of the data we analyze isn't born in spreadsheets at all.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpm56bfi4qzn14ho6zlhh.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpm56bfi4qzn14ho6zlhh.jpg" alt="Exploring APIs" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
Here's the uncomfortable truth I ran into: Most data mistakes don't happen during analysis. They happen much earlier, when we import, structure, or visualize without thinking deeply enough.&lt;br&gt;
A mislabeled column.&lt;br&gt;
A hidden missing value.&lt;br&gt;
Two variables plotted on the same axis when they shouldn't be.&lt;/p&gt;

&lt;p&gt;They are small choices, but they lead to big consequences.&lt;/p&gt;

&lt;p&gt;What changed for me today was intention.&lt;br&gt;
Visualization became less about "making a chart" and more about respecting scale and meaning.&lt;/p&gt;

&lt;p&gt;Importing data became less about "getting it into Python" and more about preserving truth.&lt;/p&gt;

&lt;p&gt;Working with JSONs stopped feeling abstract and started feeling like a bridge to real-world systems.&lt;/p&gt;

&lt;p&gt;Data doesn't speak clearly by default. We make it clear, through how we import it, structure it, and choose to show it.&lt;/p&gt;

&lt;p&gt;Before I wrap this up, happy Boxing Day to everyone reading 🎁&lt;br&gt;
I hope today finds you resting, reflecting, and maybe even quietly sharpening skills that will compound long after the holidays fade.&lt;/p&gt;

&lt;p&gt;If you work with data, build systems around it, or make decisions from it, here's a question: what part of your workflow do you trust too quickly? Importing, visualizing, or interpreting?&lt;br&gt;
That's the question I'm sitting with today.&lt;/p&gt;

&lt;p&gt;-SP&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>tutorial</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Day 33 of improving my Data Science skills🎄</title>
      <dc:creator>Sylvester Promise</dc:creator>
      <pubDate>Thu, 25 Dec 2025 11:12:07 +0000</pubDate>
      <link>https://forem.com/sp_the_data_specialist/day-33-of-improving-my-data-science-skills-mkl</link>
      <guid>https://forem.com/sp_the_data_specialist/day-33-of-improving-my-data-science-skills-mkl</guid>
      <description>&lt;p&gt;Most datasets don't exist until someone decides they should.&lt;br&gt;
So here's the thought that shaped my learning this Christmas morning: how do you turn time, text, and messy web pages into data that actually tells a story?&lt;br&gt;
That question is why I spent today practicing three things that look unrelated on the surface, but aren't.&lt;/p&gt;

&lt;p&gt;First, I worked with time series data in Matplotlib, because patterns only matter when you can see how they change over time. Plotting with a time index isn't just visualization; it's how trends, seasonality, and anomalies reveal themselves without explanation.&lt;/p&gt;
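&lt;p&gt;A minimal sketch of that idea with synthetic data (for a real CSV, the parse_dates and index_col arguments of pd.read_csv would build the same kind of index):&lt;/p&gt;

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic daily series indexed by time
index = pd.date_range("2025-01-01", periods=90, freq="D")
series = pd.Series(np.linspace(0.0, 10.0, 90), index=index)

fig, ax = plt.subplots()
ax.plot(series.index, series.values)  # Matplotlib understands the dates
ax.set_xlabel("Date")
ax.set_ylabel("Value")
```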

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhum32oilrqgy993c96hh.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhum32oilrqgy993c96hh.jpg" alt="DateTime Index" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
Second, I practiced importing flat files with Pandas, because most real-world data doesn't arrive polished. Flat files are the raw material: simple, scalable, and foundational to almost every data workflow.&lt;br&gt;
Finally, I moved into web scraping with Requests and BeautifulSoup, because some of the most valuable datasets aren't downloadable at all. They live inside HTML, waiting to be structured, cleaned, and interpreted.&lt;/p&gt;

&lt;p&gt;Today wasn't about "learning tools." &lt;br&gt;
It was about learning how data professionals create meaning from what already exists, and from what hasn't been formalized yet.&lt;/p&gt;

&lt;p&gt;Here's the evidence behind those lessons:&lt;br&gt;
I plotted time series data using Pandas with a DateTimeIndex, letting Matplotlib automatically handle time-based labeling and trends.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2r1urdqzy74rk1i9dq76.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2r1urdqzy74rk1i9dq76.jpg" alt="Plotting Time Series Data" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
I imported flat files using Pandas, reinforcing why clean indexing and data types matter before any visualization or modeling.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9i5899anyfa3qm1kukl4.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9i5899anyfa3qm1kukl4.jpg" alt="Importing using Pandas" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
I scraped a live website, using BeautifulSoup methods like .find_all(), .get_text(), and .title to transform unstructured HTML into structured data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp4q5jqnf3fbrfy6y40oc.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp4q5jqnf3fbrfy6y40oc.jpg" alt="Scraping using BeautifulSoup" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
Exploring this reminded me that data work starts long before dashboards, and often before the dataset even exists.&lt;/p&gt;

&lt;p&gt;And on a day about reflection, giving, and meaning: Merry Christmas to everyone building quietly, learning deeply, and turning raw information into insight🎄&lt;/p&gt;

&lt;p&gt;-SP&lt;/p&gt;

</description>
      <category>development</category>
      <category>tooling</category>
      <category>datascience</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Day 32 of improving my Data Science skills</title>
      <dc:creator>Sylvester Promise</dc:creator>
      <pubDate>Wed, 24 Dec 2025 19:12:24 +0000</pubDate>
      <link>https://forem.com/sp_the_data_specialist/day-32-of-improving-my-data-science-skills-4b16</link>
      <guid>https://forem.com/sp_the_data_specialist/day-32-of-improving-my-data-science-skills-4b16</guid>
      <description>&lt;p&gt;Today was one of those days where everything quietly stacked on top of each other.&lt;/p&gt;

&lt;p&gt;I worked across three areas:&lt;br&gt;
Data Visualization (Matplotlib): learning how to create subplots using small multiples, one figure, multiple related stories.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm47vrttktmq9j8l38p10.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm47vrttktmq9j8l38p10.jpg" alt="Plotting multiple graphs" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
Introduction to Importing Data: loading flat files with np.loadtxt(), fast, simple, and perfect for numeric data.&lt;/p&gt;
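&lt;p&gt;For example (an inline stand-in for a flat file; a real call would take a filename):&lt;/p&gt;

```python
import io

import numpy as np

# Numeric flat file: whitespace-separated, '#' lines treated as comments
raw = "# t  value\n0.0  1.0\n1.0  2.5\n2.0  4.0\n"

data = np.loadtxt(io.StringIO(raw))  # in practice: np.loadtxt("file.txt")
# data is a plain (3, 2) NumPy array - fast and simple, numbers only
```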

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fijyjqg11g6gih988zny3.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fijyjqg11g6gih988zny3.jpg" alt="Loading flat files" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
Intermediate Importing Data (my main focus): scraping the web using BeautifulSoup.&lt;/p&gt;

&lt;p&gt;Recall that yesterday I fetched raw HTML with requests. Today, I learned how to make sense of it. That brings me to four questions 👇&lt;/p&gt;

&lt;p&gt;1️⃣ Why is BeautifulSoup important?&lt;br&gt;
If you work with data, here's a question for you:&lt;br&gt;
How much valuable information do you rely on that doesn't come neatly packaged in CSVs or databases? Job postings? Market prices? Customer reviews? Competitor insights? Public reports? Most of it lives on the web, messy, inconsistent, and unstructured.&lt;/p&gt;

&lt;p&gt;BeautifulSoup matters because it helps you turn public web pages into usable data, without needing to be a full-blown web developer.&lt;br&gt;
Now that we have gotten that out of the way, I would like to know:&lt;br&gt;
Where does your organization still manually copy data from websites?&lt;br&gt;
What decisions could be faster if that data was structured automatically?&lt;/p&gt;

&lt;p&gt;2️⃣ What is BeautifulSoup actually about?&lt;br&gt;
In web development, there's a term called "tag soup."&lt;br&gt;
It refers to HTML that's poorly structured, inconsistent, and syntactically messy. That's what most of the web looks like.&lt;/p&gt;

&lt;p&gt;BeautifulSoup exists to make tag soup beautiful again. It parses messy HTML, organizes it into a tree structure, and lets you extract exactly what you need, calmly and predictably.&lt;/p&gt;

&lt;p&gt;The core object is called BeautifulSoup, and one of its most helpful methods is prettify(), which formats ugly HTML into clean, readable, indented structure.&lt;br&gt;
Think of it as: turning a noisy room into an organized library.&lt;/p&gt;

&lt;p&gt;3️⃣ How does BeautifulSoup work? (The practical framework)&lt;br&gt;
Here's the simple workflow I practiced today:&lt;/p&gt;

&lt;p&gt;Fetch the page (using requests from yesterday)&lt;br&gt;
Parse the HTML with BeautifulSoup&lt;br&gt;
Navigate the structure&lt;br&gt;
Extract what matters&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fige903s6vlje3trpm1iz.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fige903s6vlje3trpm1iz.jpg" alt="Extracting data using BeautifulSoup" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
Some methods I used:&lt;br&gt;
.prettify() to see clean, indented HTML&lt;/p&gt;

&lt;p&gt;.title to get the page title&lt;/p&gt;

&lt;p&gt;.get_text() to extract all readable text&lt;/p&gt;

&lt;p&gt;.find_all() to collect all links or repeated elements&lt;/p&gt;
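&lt;p&gt;Those methods in one small sketch, parsing an inline HTML snippet instead of a live page (the page title and URLs are invented):&lt;/p&gt;

```python
from bs4 import BeautifulSoup

# Inline stand-in for the HTML that requests.get(url).text would return
html = (
    "<html><head><title>Tag Soup Demo</title></head>"
    "<body><p>First <a href='https://example.com/a'>link A</a></p>"
    "<p>Second <a href='https://example.com/b'>link B</a></p></body></html>"
)

soup = BeautifulSoup(html, "html.parser")

pretty = soup.prettify()           # clean, indented HTML
title = soup.title.get_text()      # the page title
text = soup.get_text()             # all readable text, tags stripped
links = [a["href"] for a in soup.find_all("a")]  # every link on the page
```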

&lt;p&gt;This is where scraping stops being "guesswork" and starts being systematic analysis.&lt;/p&gt;

&lt;p&gt;4️⃣ What happens when you apply this to your world?&lt;br&gt;
Now imagine:&lt;br&gt;
Tracking competitor pricing changes automatically&lt;/p&gt;

&lt;p&gt;Monitoring job market trends weekly&lt;/p&gt;

&lt;p&gt;Extracting customer sentiment from reviews&lt;/p&gt;

&lt;p&gt;Building datasets that don't officially "exist"&lt;/p&gt;

&lt;p&gt;What questions could you answer if the web became queryable?&lt;br&gt;
And more importantly: What data are you currently ignoring because it looks too messy to touch?&lt;/p&gt;

&lt;p&gt;Today's lesson for me was simple but powerful:&lt;br&gt;
Getting data isn't the hard part anymore; understanding and structuring it is.&lt;/p&gt;

&lt;p&gt;Tomorrow, I'll keep pushing deeper into implementation and real use cases. &lt;br&gt;
Still learning. Still experimenting. Still curious.&lt;/p&gt;

&lt;p&gt;If you've ever wondered how raw web pages turn into insights, this is one of the first real steps.&lt;/p&gt;

&lt;p&gt;-SP&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>tutorial</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Day 31 of improving my Data Science skills</title>
      <dc:creator>Sylvester Promise</dc:creator>
      <pubDate>Tue, 23 Dec 2025 16:18:58 +0000</pubDate>
      <link>https://forem.com/sp_the_data_specialist/day-31-of-improving-my-data-science-skills-18o2</link>
      <guid>https://forem.com/sp_the_data_specialist/day-31-of-improving-my-data-science-skills-18o2</guid>
      <description>&lt;p&gt;One thought kept crossing my mind today: How many things look simple… until you actually try to understand what's underneath?&lt;br&gt;
That was my mood as I settled into my space and started learning.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq6fuu9qjfjyvgro6j0bk.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq6fuu9qjfjyvgro6j0bk.jpg" alt="Me in my perfect environment" width="800" height="1067"&gt;&lt;/a&gt;&lt;br&gt;
I spent time customizing my plots, changing labels, adding markers and titles until the visuals actually said something. It reminded me that data isn't just numbers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx4or5p3d6nn7m1547rfx.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx4or5p3d6nn7m1547rfx.jpg" alt="Customizing plots" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
I also worked through exercises pulling HTML pages using the Requests package. Seeing raw web content arrive in my editor felt oddly grounding, like peeking behind the curtain before any importing magic begins.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm5bpd44gejtc8e7v0re9.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm5bpd44gejtc8e7v0re9.jpg" alt="Importing files using Request package" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
Then there were flat files. Simple. Unflashy. Powerful. The kind of files that quietly hold together so much of data science work, easy to share, easy to load, easy to overlook until you truly understand why they matter.&lt;br&gt;
Nothing about today felt rushed. Nothing felt forced.&lt;br&gt;
Just small, steady moments of understanding stacking up.&lt;br&gt;
Tomorrow, I'll keep building on this. More curiosity. More exercises. More implementation.&lt;/p&gt;

&lt;p&gt;-SP&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>tutorial</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Day 30 of improving my Data Science skills</title>
      <dc:creator>Sylvester Promise</dc:creator>
      <pubDate>Mon, 22 Dec 2025 22:10:37 +0000</pubDate>
      <link>https://forem.com/sp_the_data_specialist/day-30-of-improving-my-data-science-skills-5af2</link>
      <guid>https://forem.com/sp_the_data_specialist/day-30-of-improving-my-data-science-skills-5af2</guid>
      <description>&lt;p&gt;Today felt like one of those quiet but powerful learning days, the kind where things don't just work, but I understand why and how they work.&lt;/p&gt;

&lt;p&gt;I spent today deep in importing data in Python. It started with files. Not fancy models. Not dashboards. Just learning how data enters Python.&lt;br&gt;
I learned how to:&lt;br&gt;
Read entire text files safely using with open()&lt;br&gt;
Import Excel files with multiple sheets using pd.read_excel()&lt;br&gt;
I passed sheet_name=None and realized: "Oh… Python just handed me every sheet as a dictionary." That was a small win that felt big.&lt;br&gt;
Suddenly, Excel wasn't just a file anymore. It was a structured collection of DataFrames, each one accessible by name. Simple, clean, yet powerful.&lt;/p&gt;
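&lt;p&gt;Both moves side by side, sketched with a throwaway file (the Excel call stays a comment because it needs a real workbook and an engine like openpyxl; the file and sheet names are hypothetical):&lt;/p&gt;

```python
import os
import tempfile

# Reading a whole text file safely: the context manager closes it for us
path = os.path.join(tempfile.mkdtemp(), "notes.txt")
with open(path, "w") as f:
    f.write("line one\nline two\n")

with open(path) as f:
    contents = f.read()
# the file is closed here even if read() had raised

# Excel, every sheet at once (hypothetical file):
# import pandas as pd
# sheets = pd.read_excel("report.xlsx", sheet_name=None)
# sheets is a dict of DataFrames, one per sheet, accessible by name:
# q3 = sheets["Q3"]
```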

&lt;p&gt;Then came the visuals. Data isn't meant to stay silent. Using Matplotlib, I began turning numbers into pictures: visual patterns that explained more than raw values ever could.&lt;/p&gt;

&lt;p&gt;This is where curiosity really took over. Because when you see the data:&lt;br&gt;
You start asking better questions&lt;br&gt;
You notice patterns you didn't expect&lt;br&gt;
You stop guessing and start observing&lt;br&gt;
Today really taught me that learning isn't just about new functions or syntax. It's about realizing:&lt;br&gt;
Data doesn't magically appear, it's imported intentionally&lt;br&gt;
Files aren't scary, they're just formats waiting to be read&lt;br&gt;
Visualization isn't decoration, it's understanding&lt;/p&gt;

&lt;p&gt;Tomorrow, I'm going deeper:&lt;br&gt;
More data sources&lt;br&gt;
More visuals&lt;br&gt;
More exercises that stretch understanding further&lt;/p&gt;

&lt;p&gt;If today was about opening the door to data, tomorrow is about walking confidently through it.&lt;/p&gt;

&lt;p&gt;And I'm just getting started.&lt;/p&gt;

&lt;p&gt;-SP&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>tutorial</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Day 29 of improving my Data Science skills</title>
      <dc:creator>Sylvester Promise</dc:creator>
      <pubDate>Mon, 22 Dec 2025 18:50:06 +0000</pubDate>
      <link>https://forem.com/sp_the_data_specialist/day-29-of-improving-my-data-science-skills-39d9</link>
      <guid>https://forem.com/sp_the_data_specialist/day-29-of-improving-my-data-science-skills-39d9</guid>
      <description>&lt;p&gt;Over the weekend, I crossed two big learning milestones🥳🎉&lt;br&gt;
✅Completed Web Scraping in Python&lt;br&gt;
✅Completed Introduction to Statistics in Python&lt;br&gt;
Web scraping taught me how to actually collect data from the web, building spiders, navigating pages, extracting structured information, and understanding how raw data is born.&lt;/p&gt;

&lt;p&gt;Statistics helped me make sense of that data: probability distributions, correlation, causation versus spurious relationships, design of experiments, and how data behaves in the real world.&lt;/p&gt;

&lt;p&gt;To start the week, I moved into:&lt;br&gt;
Introduction to Data Visualization with Matplotlib&lt;br&gt;
Intermediate Importing Data in Python&lt;/p&gt;

&lt;p&gt;Now it's about turning data into clear visuals and learning how to bring data in from different sources: files, formats, and real-world datasets.&lt;br&gt;
Starting tomorrow, I'll also be adding Introduction to Importing Data in Python, to complete the pipeline.&lt;br&gt;
This phase feels exciting because everything is finally connecting: get the data, clean it, analyze it, visualize it, and explain it.&lt;/p&gt;

&lt;p&gt;Slowly but surely, I'm building the full data workflow, not just learning concepts, but practicing how they fit together.&lt;/p&gt;

&lt;p&gt;Onward to more charts, more datasets, and deeper questions.&lt;/p&gt;

&lt;p&gt;-SP&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>tutorial</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Day 28 of improving my Data Science skills</title>
      <dc:creator>Sylvester Promise</dc:creator>
      <pubDate>Fri, 19 Dec 2025 20:34:04 +0000</pubDate>
      <link>https://forem.com/sp_the_data_specialist/day-28-of-improving-my-data-science-skills-15dg</link>
      <guid>https://forem.com/sp_the_data_specialist/day-28-of-improving-my-data-science-skills-15dg</guid>
      <description>&lt;p&gt;Today, I switched things up. Instead of just talking about it, I recorded a short video showing how I solved a real Web Scraping exercise using Scrapy.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.loom.com/share/c70bb3df17ba4e34b0db37a794d401cc" rel="noopener noreferrer"&gt;https://www.loom.com/share/c70bb3df17ba4e34b0db37a794d401cc&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Specifically, I used response.follow() to navigate through the links I scraped.&lt;/p&gt;

&lt;p&gt;Alongside this, I also worked through several exercises in Introduction to Statistics in Python, focusing on:&lt;br&gt;
Using scatterplots to see relationships&lt;br&gt;
Computing correlation coefficients&lt;br&gt;
Knowing when to use correlation after visualization&lt;/p&gt;
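&lt;p&gt;The visualize-then-summarize order in a short sketch, with synthetic data (the slope and noise level are arbitrary):&lt;/p&gt;

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=200)
y = 2 * x + rng.normal(scale=0.5, size=200)  # roughly linear relationship

# Step 1: look first - a scatterplot reveals shape, outliers, non-linearity
fig, ax = plt.subplots()
ax.scatter(x, y, alpha=0.5)

# Step 2: only then reduce the relationship to a single number
r = np.corrcoef(x, y)[0, 1]
```

&lt;p&gt;A correlation near 1 here is only trustworthy because the scatterplot already showed a straight-line pattern.&lt;/p&gt;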

&lt;p&gt;Even though I didn't record the statistics exercises, they strongly complemented today's scraping work. One helps you collect data, the other helps you reason about what that data actually means.&lt;/p&gt;

&lt;p&gt;I'm enjoying documenting not just what I'm learning, but how I approach problems, step by step.&lt;/p&gt;

&lt;p&gt;-SP&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>tutorial</category>
      <category>beginners</category>
    </item>
  </channel>
</rss>
