Forem: Iman

Stop Using .iterrows(). Here's What Actually Fast Looks Like

Iman — Thu, 21 May 2026 15:24:09 +0000

You're looping over a DataFrame. It feels natural. It's killing your performance.

# What most tutorials say
for index, row in df.iterrows():
    df.at[index, 'tax'] = row['price'] * 0.17

Here's the progression you should actually know:

# Level 1: Vectorization — 10-100x faster 
df['tax'] = df['price'] * 0.17

# Level 2: .apply() when logic is conditional 
df['tax'] = df['price'].apply(lambda x: x * 0.17 if x > 0 else 0)

# Level 3: np.where — the fastest option 
import numpy as np
df['tax'] = np.where(df['price'] > 0, df['price'] * 0.17, 0)

Method	1M rows
`.iterrows()`	~480s
`.apply()`	~3s
Vectorized / `np.where`	~0.04s

Pandas wraps NumPy. NumPy operates on entire arrays at the C level. The moment you loop row by row, you throw that away.
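
If you want to check the gap on your own machine, a minimal timing sketch looks like this. The data is synthetic and the row count is an arbitrary assumption; absolute numbers will differ by hardware and pandas version.

```python
import time

import numpy as np
import pandas as pd

# Synthetic prices (hypothetical data); some are negative so the condition matters.
rng = np.random.default_rng(0)
df = pd.DataFrame({"price": rng.uniform(-10, 100, size=100_000)})

# Level 2: Python-level function call per element.
start = time.perf_counter()
tax_apply = df["price"].apply(lambda x: x * 0.17 if x > 0 else 0)
apply_seconds = time.perf_counter() - start

# Level 3: one vectorized pass over the whole column.
start = time.perf_counter()
tax_where = np.where(df["price"] > 0, df["price"] * 0.17, 0)
where_seconds = time.perf_counter() - start

# Both approaches must agree before the timings mean anything.
assert np.allclose(tax_apply.to_numpy(), tax_where)
print(f".apply():  {apply_seconds:.4f}s")
print(f"np.where:  {where_seconds:.4f}s")
```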

The shift: stop asking "what do I do to each row?" and start asking "what transformation applies to this column?"

That's it. Notebooks that took minutes will now run in seconds.

Why Your Windows Paths Break Inside a Docker Container (and How to Fix It in .NET)

Iman — Sun, 17 May 2026 15:59:22 +0000

If you have ever deployed a .NET app inside a Docker container on a Windows host, you have probably run into a situation where a path that looks perfectly valid on the host machine causes subtle, hard-to-debug failures inside the container. This post walks through the exact problem, the runtime behaviour that causes it, and a one-line fix.

The Setup

I was building DevMetrics, a self-hosted developer productivity dashboard. Users add local Git repositories through a web form by pasting in the path. The app then calls LibGit2Sharp to scan commits from that path.

On Windows, running via dotnet run, everything worked fine. After dockerizing the app and running it as a Linux container, paths started silently mangling themselves.

The Symptom

A user pastes this into the form:

D:\Users\Downloads\my-project

The app logs show the path it actually tried to open:

/app/D:\Users\Downloads\my-project

The working directory /app had been prepended to the Windows path. LibGit2Sharp then throws because that path obviously does not exist.

Why It Happens

The culprit is Path.GetFullPath. In .NET, calling GetFullPath on a relative path resolves it against the current working directory. The relevant question is: what counts as "rooted" on Linux?

// On Windows: returns true
Path.IsPathRooted("D:\\Users\\Downloads\\my-project");

// On Linux: returns FALSE
Path.IsPathRooted("D:\\Users\\Downloads\\my-project");

Linux has no concept of Windows drive letters. To the Linux runtime, D:\Users\Downloads\my-project is not an absolute path starting with a drive letter. It is a relative path that happens to start with the character D.

So when you call:

Path.GetFullPath("D:\\Users\\Downloads\\my-project")

on Linux, the runtime treats it as relative and prepends the process working directory, giving you /app/D:\Users\Downloads\my-project.

No exception is thrown. No warning is logged. The path just silently becomes wrong.
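
The same rootedness rule can be reproduced without a container. The sketch below is a Python analogy, not the .NET API: Python ships both rule sets as `ntpath` (Windows) and `posixpath` (Linux), so you can compare them side by side on any OS. The `/app` working directory is taken from the log line above.

```python
import ntpath      # Windows path rules, importable on any OS
import posixpath   # Linux/macOS path rules, importable on any OS

user_input = r"D:\Users\Downloads\my-project"

rooted_on_windows = ntpath.isabs(user_input)
rooted_on_linux = posixpath.isabs(user_input)

# What "resolve against the working directory" produces under POSIX rules.
resolved_on_linux = posixpath.join("/app", user_input)

print(rooted_on_windows)   # True
print(rooted_on_linux)     # False
print(resolved_on_linux)   # /app/D:\Users\Downloads\my-project
```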

The Fix

Guard with IsPathRooted before calling GetFullPath:

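private static string NormalisePath(string path)
{
    var trimmed = path.Trim();

    // Only normalise paths the current OS recognises as rooted.
    // Anything else (including a Windows path on Linux) is left
    // untouched so it can be validated and rejected with a clear error.
    var absolute = Path.IsPathRooted(trimmed)
        ? Path.GetFullPath(trimmed)
        : trimmed;

    return absolute.TrimEnd(
        Path.DirectorySeparatorChar,
        Path.AltDirectorySeparatorChar);
}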

If IsPathRooted returns false (which it will for any Windows-style path on Linux), skip GetFullPath entirely and use the trimmed value as-is. The path will still be wrong in the sense that D:\... is not a valid Linux path, but at least you have not silently corrupted it further. You can then validate it properly and return a clear error to the user.
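
The "validate it properly" step could look something like the sketch below. It is written in Python for brevity and is only an illustration of the check, not code from DevMetrics; the function names and the error wording are hypothetical.

```python
import re

# Drive letter, colon, then a slash or backslash, e.g. "D:\" or "C:/".
_WINDOWS_DRIVE = re.compile(r"^[A-Za-z]:[\\/]")


def looks_like_windows_path(path: str) -> bool:
    """Return True for drive-letter paths and UNC-style paths."""
    trimmed = path.strip()
    return bool(_WINDOWS_DRIVE.match(trimmed)) or trimmed.startswith("\\\\")


def validate_repo_path(path: str) -> str:
    """Reject Windows-style input early, with a message the user can act on."""
    if looks_like_windows_path(path):
        raise ValueError(
            "Windows path given; in Docker, use the container path "
            "(e.g. /repos/my-project)."
        )
    return path.strip()
```

Rejecting the input before it reaches the Git library turns a confusing "path does not exist" failure into a message that tells the user what to type instead.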

The Deeper Problem: Host Paths vs Container Paths

Even with the fix above, there is a second issue worth understanding. When you run Docker on Windows, the paths inside the container are Linux paths, not Windows paths.

If your docker-compose.yml mounts a host directory like this:

volumes:
  - D:\Users\Downloads\my-project:/repos/my-project

Inside the container, that directory is available at /repos/my-project. The Windows path D:\Users\Downloads\my-project does not exist from the container's perspective at all.

So the correct path for a user to enter in your app's form is /repos/my-project, not the Windows path they see in File Explorer.

This is worth making explicit in your UI. In DevMetrics I added a hint directly on the Add Repository form:

In Docker, use the container path (e.g. /repos/my-project).

A one-line hint that prevents a lot of confusion.

The rule is simple: IsPathRooted is OS-aware. A Windows drive-letter path is not considered rooted on Linux, and GetFullPath will silently corrupt it as a result.

How Analyzing Stock Market Data Taught Me What Time Series Textbooks Couldn't

Iman — Sat, 16 May 2026 10:57:05 +0000

I spent a semester analyzing OGDC — Pakistan's largest oil and gas company — on the Pakistan Stock Exchange. Financial data breaks every assumption you were taught to rely on. This is what that actually looks like in practice.

Classroom Time Series: Deceptively Plain
When you first learn time series analysis, you get clean examples. A seasonal temperature dataset. A sales series that trends upward. The examples are chosen because the methods work on them.
Real financial data doesn't do that.
The first thing I did was plot OGDC's daily returns and run a normality test. The Jarque-Bera statistic came back at 172,348. The p-value was effectively zero. Excess kurtosis was 41.7.
A normally distributed series has excess kurtosis of 0 (raw kurtosis of 3). An excess kurtosis of 41.7 means the tails are so fat that standard deviation becomes a nearly meaningless risk metric. Extreme daily moves, the kind that would be essentially impossible under a Gaussian model, were happening regularly.
That number forced me to actually understand what kurtosis means instead of just knowing it's the standardized fourth moment of a distribution. There's a difference.
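
For reference, the Jarque-Bera statistic is JB = n/6 · (S² + K²/4), where S is sample skewness and K is excess kurtosis. Here is a minimal NumPy sketch of that formula. The two samples are synthetic stand-ins, not OGDC data, so the values will not match the numbers above.

```python
import numpy as np


def jarque_bera(x):
    """Return (JB statistic, skewness, excess kurtosis) for a 1-D sample."""
    x = np.asarray(x, dtype=float)
    n = x.size
    centered = x - x.mean()
    m2 = np.mean(centered**2)
    skew = np.mean(centered**3) / m2**1.5
    excess_kurt = np.mean(centered**4) / m2**2 - 3.0
    jb = n / 6.0 * (skew**2 + excess_kurt**2 / 4.0)
    return jb, skew, excess_kurt


rng = np.random.default_rng(42)
normal_sample = rng.normal(size=5000)                # thin tails
fat_tailed_sample = rng.standard_t(df=3, size=5000)  # heavy tails

jb_normal, _, kurt_normal = jarque_bera(normal_sample)
jb_fat, _, kurt_fat = jarque_bera(fat_tailed_sample)
print(f"normal:     JB={jb_normal:.1f}, excess kurtosis={kurt_normal:.2f}")
print(f"fat-tailed: JB={jb_fat:.1f}, excess kurtosis={kurt_fat:.2f}")
```

Because the kurtosis term is squared and scaled by n, a long series with heavy tails produces a very large statistic quickly.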

Stationarity Is Not Academic
Every time series textbook starts with stationarity. Run the ADF test, check the p-value, and proceed. I did this robotically for two years before working with financial data made it concrete.
OGDC's price series has a unit root — it's non-stationary. The returns series is stationary. This distinction matters enormously because:

You cannot apply ARIMA to a non-stationary series without differencing it
If you use price levels as your ML target, you're teaching your model to predict a random walk with drift — it will learn "tomorrow's price ≈ today's price," score well on R², and be completely useless
Feature engineering on price levels creates look-ahead bias in ways that are subtle and easy to miss

Once I understood why we model returns instead of prices — not because a textbook said so, but because I watched what happened when I tried to model prices — stationarity became a tool rather than a checkbox.
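
The "scores well on R² and is useless" trap is easy to reproduce. This sketch simulates a pure random walk (synthetic data, an assumption for illustration) and scores the naive "tomorrow equals today" forecast on price levels and on returns.

```python
import numpy as np


def r_squared(actual, predicted):
    """Coefficient of determination for a set of forecasts."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


rng = np.random.default_rng(7)
returns = rng.normal(0.0, 1.0, size=2000)   # i.i.d. shocks, unpredictable by design
prices = 100.0 + np.cumsum(returns)         # random walk built from those shocks

# Naive forecast: the previous observation.
r2_prices = r_squared(prices[1:], prices[:-1])
r2_returns = r_squared(returns[1:], returns[:-1])

print(f"R² on price levels: {r2_prices:.3f}")
print(f"R² on returns:      {r2_returns:.3f}")
```

The level forecast looks excellent only because consecutive prices are close to each other. On returns, where the actual prediction problem lives, the same forecast has no skill.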

Autocorrelation Actually Tells You Something
I ran a Ljung-Box test on the return series and found significant autocorrelation at lags 1, 3, 12, 22, and 29. Then I ran a Runs Test and found that the signs of returns — whether each day is up or down — are completely random (Z = 0.043, p = 0.97).
These two results together are more interesting than either one alone.
There is short-horizon autocorrelation in the return series itself, but no predictability in the direction of returns. The Runs Test is a simple check of weak-form market efficiency at the binary level, and OGDC passes it: you cannot predict whether tomorrow is an up day from knowing today was an up day.
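
For readers who have not met it, the Runs Test (Wald-Wolfowitz) compares the observed number of runs of same-sign days with the number expected under randomness. Below is a minimal standard-library sketch. Dropping exact-zero returns is one possible convention, assumed here for simplicity.

```python
import math


def runs_test_z(returns):
    """Z statistic of the Wald-Wolfowitz runs test on the signs of returns."""
    signs = [r > 0 for r in returns if r != 0]   # drop exact zeros
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both up and down days")

    # A new run starts every time the sign changes.
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))

    n = n_pos + n_neg
    expected = 2.0 * n_pos * n_neg / n + 1.0
    variance = (2.0 * n_pos * n_neg * (2.0 * n_pos * n_neg - n)) / (n**2 * (n - 1.0))
    return (runs - expected) / math.sqrt(variance)


# Toy inputs: perfectly alternating signs vs. two long streaks.
z_alternating = runs_test_z([1, -1] * 10)
z_clustered = runs_test_z([1] * 10 + [-1] * 10)
print(f"alternating: Z = {z_alternating:.2f}")
print(f"clustered:   Z = {z_clustered:.2f}")
```

A Z near zero, like the 0.043 reported above, means the number of runs is about what randomness would produce. Large positive values mean too many sign flips, large negative values mean streaks.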
The ARMA(0,3) model I fit finds genuine structure: three moving-average terms capturing the short-lag autocorrelation, with residuals that pass all diagnostic tests. The model is adequate. It just can't forecast direction, only structure.
That distinction — structure vs predictability — is something I didn't appreciate until I saw it in data where it actually mattered.

Volatility Is A Time Series Too
The most important thing I built in this project was not a price prediction model. It was a volatility model.
The Ljung-Box test on squared returns came back with Q(10) = 226.96 (p ≈ 0). Volatility was clustering — large moves followed by large moves, regardless of direction. This is the ARCH effect. It means a constant-variance assumption in any model you build on this data is simply wrong.
I implemented four GARCH family models from scratch using scipy.optimize for maximum likelihood estimation — ARCH(1), GARCH(1,1), EGARCH(1,1), and GJR-GARCH(1,1). No external library.
GJR-GARCH won by AIC. The leverage parameter γ = 0.33 means a negative shock adds an extra 0.33 times its squared size to next-period variance, on top of the response it shares with a positive shock of equal magnitude. Seeing that emerge from your own MLE implementation, watching the optimizer converge to a parameter that confirms something the financial econometrics literature established decades ago, is a different kind of understanding than reading about it.
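
To make the leverage term concrete, here is a sketch of the GJR-GARCH(1,1) variance recursion and a Gaussian negative log-likelihood of the kind an optimizer would minimise. This is not the project's code: the Gaussian likelihood, the sample-variance start value, and every parameter except γ = 0.33 are assumptions chosen for illustration.

```python
import numpy as np


def gjr_garch_variance(eps, omega, alpha, gamma, beta):
    """Conditional variance: omega + (alpha + gamma * 1[eps<0]) * eps^2 + beta * sigma2."""
    eps = np.asarray(eps, dtype=float)
    sigma2 = np.empty_like(eps)
    sigma2[0] = eps.var()   # assumed start value: sample variance
    for t in range(1, eps.size):
        leverage = gamma if eps[t - 1] < 0 else 0.0
        sigma2[t] = omega + (alpha + leverage) * eps[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2


def negative_log_likelihood(params, eps):
    """Gaussian negative log-likelihood for the recursion above."""
    eps = np.asarray(eps, dtype=float)
    omega, alpha, gamma, beta = params
    sigma2 = gjr_garch_variance(eps, omega, alpha, gamma, beta)
    return 0.5 * np.sum(np.log(2 * np.pi * sigma2) + eps**2 / sigma2)


# Hypothetical parameters; only gamma is taken from the text.
params = (0.1, 0.05, 0.33, 0.6)

# Same-sized shock, opposite sign: the negative one raises next-period variance more.
after_negative = gjr_garch_variance([-2.0, 0.0, 0.0], *params)
after_positive = gjr_garch_variance([2.0, 0.0, 0.0], *params)
extra_variance = after_negative[1] - after_positive[1]   # gamma * shock^2
print(f"extra variance after the negative shock: {extra_variance:.2f}")
```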
The persistence parameter came out at α+β = 0.78, implying a shock half-life of 2.8 trading days. Developed market equities typically show half-lives of 20–30 days. PSX stocks mean-revert faster, which is consistent with thinner liquidity and more retail-driven price discovery.
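
The half-life follows directly from the persistence: a shock decays by a factor of the persistence each period, so the half-life h solves persistence^h = 0.5. A small sketch of that arithmetic:

```python
import math


def shock_half_life(persistence):
    """Periods until a volatility shock decays to half its initial size."""
    if not 0.0 < persistence < 1.0:
        raise ValueError("persistence must be strictly between 0 and 1")
    return math.log(0.5) / math.log(persistence)


half_life = shock_half_life(0.78)
print(round(half_life, 2))  # 2.79
```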

The Machine Learning Part Humbled Me
I built a Random Forest classifier to predict daily price direction. It achieved 99.7% accuracy and ROC-AUC above 0.999.
I spent about an hour thinking I had done something impressive before I figured out what had actually happened.
The raw data from Investing.com includes a Change % column — the same-day percentage change. I had included it as a feature. It is numerically equivalent to the classification target I was trying to predict. The model had learned to read the answer from the question paper.
When I removed it and all other same-day OHLC data, accuracy dropped to 53–54%. That's the honest number. Still above the 51.3% majority-class baseline — a real but modest edge consistent with the short-horizon autocorrelation the ARMA model detected. But nowhere near 99.7%.
Data leakage is easy to understand abstractly and much harder to catch in practice, especially when it comes from a column that looks obviously useful and has a name that doesn't immediately suggest it contains the target variable. I caught it because the accuracy was too high. If it had been 72%, I might never have looked.
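
One cheap guard against this specific kind of leak is to check how well each feature predicts the target on its own. The sketch below is a rough heuristic under stated assumptions: the data is synthetic, the column names are hypothetical, and the 0.95 threshold is arbitrary. It only catches leaks that show up in a feature's sign.

```python
import numpy as np


def single_feature_sign_accuracy(features, target):
    """Accuracy of predicting an up/down target from the sign of each feature alone."""
    target = np.asarray(target).astype(bool)
    return {
        name: float(np.mean((np.asarray(values) > 0) == target))
        for name, values in features.items()
    }


rng = np.random.default_rng(1)
change_pct = rng.normal(0.0, 1.5, size=1000)   # hypothetical same-day change
target_up = change_pct > 0                     # the label is derived from it

features = {
    "change_pct": change_pct,                             # leaks the label
    "prev_day_return": rng.normal(0.0, 1.5, size=1000),   # independent noise
}

scores = single_feature_sign_accuracy(features, target_up)
suspicious = [name for name, acc in scores.items() if acc > 0.95]
print(scores)
print("check these columns:", suspicious)
```

A single column that nearly solves the problem alone is almost always the answer key, not a signal.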

What Financial Data Forces You To Learn
Working with market data has specific advantages over other domains as a learning environment.
Everything is measurable. You can immediately test whether a finding is statistically meaningful — run a Diebold-Mariano test, get a direct answer. The ground truth is public and continuous, producing a new observation every trading day. The feedback loop is tight in a way most applied ML projects aren't.
More importantly, the assumptions of standard methods are genuinely violated. Non-normality, heteroscedasticity, autocorrelation, structural breaks — financial data has all of them simultaneously. You cannot take shortcuts and get away with it the way you can with cleaner data. And the cost of ignoring violations is concrete: a risk model assuming Normal returns will underestimate the probability of a -10% day by an order of magnitude. That's not theoretical. It's the kind of mistake that has real consequences.

What I Would Tell Myself Before Starting
The statistical tests are not bureaucratic checkboxes before you get to the interesting modelling. They are the foundation that determines whether your model is even asking the right question.
The ADF test result determined the entire structure of the ML pipeline. The Jarque-Bera result determined which distribution to fit. The Ljung-Box result determined which lag features to include. The ARCH effect test justified the GARCH modelling. Each test was load-bearing.
If you skip them and go straight to building a model on price levels, you will get results that look plausible and are wrong in ways that are hard to diagnose after the fact. Financial data will make you do the statistics properly — not because a course requires it, but because it will break your model if you don't.

Calling Your First API Using Its OpenAPI Spec — A Python Walkthrough

Iman — Fri, 15 May 2026 10:26:02 +0000

Originally published on imalaitech.com

Most public APIs come with documentation. But some APIs go further and ship an OpenAPI spec. It's a single JSON (or YAML) file that describes every endpoint, every parameter, and every possible response. Read it once and you know how to use the API before writing a single line of code.
Let's do that right now.

What Is an OpenAPI Spec?
It's a standardized JSON (or YAML) file that describes an API completely. Endpoints, required parameters, response shapes — all in one place. Tools like Swagger UI turn it into interactive documentation, but the raw file is what matters here.

Step 1 — Get the Spec
We're using the Swagger Petstore. It's a fake pet store API built purely for learning. Open this in your browser:
https://petstore3.swagger.io/api/v3/openapi.json
Save it locally. Don't try to memorize anything — just scan the structure. You'll notice three things that matter:

paths — all available endpoints
parameters — what each endpoint accepts
responses — what comes back

That's the whole mental model.
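
You can also let Python do the scanning. The sketch below walks the `paths` object and lists each operation. To stay runnable offline it uses a tiny inline stand-in for the spec (an assumption for illustration); with the real file you would load it with `json.load` instead. The `list_operations` helper is a hypothetical name, not part of any library.

```python
# Keys inside a path item that are HTTP methods (others, like "parameters", are not).
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def list_operations(spec):
    """Return (METHOD, path, summary) for every operation in an OpenAPI document."""
    operations = []
    for path, path_item in spec.get("paths", {}).items():
        for method, operation in path_item.items():
            if method in HTTP_METHODS:
                operations.append((method.upper(), path, operation.get("summary", "")))
    return operations


# Trimmed stand-in for the saved file. With the real spec:
#   import json
#   with open("openapi.json") as f:
#       spec = json.load(f)
spec = {
    "paths": {
        "/pet/findByStatus": {
            "parameters": [],   # path-level key, skipped by the method filter
            "get": {"summary": "Finds Pets by status"},
        }
    }
}

for method, path, summary in list_operations(spec):
    print(f"{method:6} {path}  {summary}")
```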

Step 2 — Find Your Endpoint
Search for /pet/findByStatus in the spec. Here's what it looks like:
"/pet/findByStatus": {
  "get": {
    "summary": "Finds Pets by status",
    "parameters": [
      {
        "name": "status",
        "in": "query",
        "required": false,
        "schema": {
          "type": "string",
          "enum": ["available", "pending", "sold"]
        }
      }
    ]
  }
}
The spec tells you everything. One query parameter called status, three allowed values. No guessing.
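
Because the allowed values live in the spec, you can read them programmatically and validate input before making a request. This is a minimal sketch using the excerpt above as an inline dictionary. `query_parameters` is a hypothetical helper, and it does not resolve `$ref` pointers, which real specs often use.

```python
def query_parameters(spec, path, method="get"):
    """Map each query parameter name to its 'required' flag and enum values (if any)."""
    operation = spec["paths"][path][method]
    return {
        param["name"]: {
            "required": param.get("required", False),
            "enum": param.get("schema", {}).get("enum"),
        }
        for param in operation.get("parameters", [])
        if param.get("in") == "query"
    }


# The excerpt shown above, as a Python dict.
spec = {
    "paths": {
        "/pet/findByStatus": {
            "get": {
                "summary": "Finds Pets by status",
                "parameters": [
                    {
                        "name": "status",
                        "in": "query",
                        "required": False,
                        "schema": {
                            "type": "string",
                            "enum": ["available", "pending", "sold"],
                        },
                    }
                ],
            }
        }
    }
}

params = query_parameters(spec, "/pet/findByStatus")
allowed = params["status"]["enum"]
print(allowed)   # ['available', 'pending', 'sold']

requested = "pending"
if requested not in allowed:
    raise ValueError(f"status must be one of {allowed}")
```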

Step 3 — Call It in Python
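import requests

response = requests.get(
    "https://petstore3.swagger.io/api/v3/pet/findByStatus",
    params={"status": "available"},
    timeout=10,  # don't hang forever if the demo server is slow
)
response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error body

data = response.json()
print(f"Status: {response.status_code}")
print(f"Pets returned: {len(data)}")
if data:
    print(data[0])  # peek at the first result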
Run it. Then try swapping available for pending or sold and see what changes.

Step 4 — Read the Response
You'll get back a list of pet objects. Something like:
{
  "id": 1,
  "name": "doggie",
  "status": "available",
  "photoUrls": []
}
The spec told you this was coming — check the responses section for /pet/findByStatus. It describes the exact shape of what you just received.

Why This Actually Matters
Most developers go straight to tutorial blog posts when they want to use an API. The spec is better. It's maintained alongside the API, so it's usually more current than any tutorial, it's authoritative, and it tells you things blog posts skip, like which parameters are optional, what the valid enum values are, and what error responses look like.
Get comfortable reading specs now. As your projects grow more complex, this habit saves hours.

Stop Breaking Your System Python: A Practical Guide to Managing Multiple Python Versions

Iman — Fri, 15 May 2026 10:17:08 +0000

Originally published on imalaitech.com

Every Python developer eventually hits the same wall: one project needs Python 3.9, another requires 3.11, and the new one won't run on anything below 3.12. The instinct is to upgrade globally — which promptly breaks something else.
There's a better way. Two tools worth knowing: pyenv and conda. They solve the same problem differently, and knowing which to reach for matters.

pyenv — Lightweight, Automatic Version Switching
pyenv lets you install and switch between Python versions at the global (per-user), project, or shell level. Critically, it never touches your system Python.
Installing pyenv
Linux / Mac:
curl https://pyenv.run | bash
Add this to your ~/.bashrc or ~/.zshrc:
export PYENV_ROOT="$HOME/.pyenv"
export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init -)"
Windows — use pyenv-win:
Invoke-WebRequest -UseBasicParsing -Uri "https://raw.githubusercontent.com/pyenv-win/pyenv-win/master/pyenv-win/install-pyenv-win.ps1" -OutFile "./install-pyenv-win.ps1"; &"./install-pyenv-win.ps1"
Basic Usage
# See available versions
pyenv install --list

# Install a specific version
pyenv install 3.11.9

# Set global default
pyenv global 3.11.9

# Set version for current project only
pyenv local 3.10.14
pyenv local creates a .python-version file in your project directory. Every time you cd into that folder, pyenv switches automatically — no manual toggling.
Per-Project Workflow
cd my-project
pyenv local 3.10.14
python --version # Python 3.10.14

cd ../other-project
pyenv local 3.12.0
python --version # Python 3.12.0
Clean and automatic.
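
If you want a script to confirm it is running under the pinned interpreter, a small check like the one below can help. It is a sketch, not part of pyenv: the helper names are hypothetical, and it assumes the `.python-version` file holds a single version or version prefix on its first line.

```python
import platform
from pathlib import Path


def version_matches(pinned, running):
    """True if 'running' equals the pin or extends it (a '3.10' pin matches '3.10.14')."""
    return running == pinned or running.startswith(pinned + ".")


def pinned_version(project_dir="."):
    """First entry of .python-version in project_dir, or None if absent or empty."""
    version_file = Path(project_dir) / ".python-version"
    if not version_file.exists():
        return None
    entries = version_file.read_text().split()
    return entries[0] if entries else None


pinned = pinned_version()
running = platform.python_version()
if pinned is None:
    print(f"No .python-version here; running {running}")
elif version_matches(pinned, running):
    print(f"OK: running {running}, pinned {pinned}")
else:
    print(f"Mismatch: running {running}, pinned {pinned}")
```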

conda — Full Environment Management
conda handles Python versions and packages together. It's heavier than pyenv but earns its weight when dependencies get complex — particularly in data science and ML.
Installing conda
Download Miniconda — the minimal install — for Windows, Linux, or Mac.
Basic Usage
# Create an environment with a specific Python version
conda create -n myenv python=3.10

# Activate it
conda activate myenv

# Install packages
conda install numpy pandas

# Deactivate
conda deactivate
Per-Project Workflow
cd my-project
conda activate project-39 # Python 3.9 environment

cd ../other-project
conda activate project-312 # Python 3.12 environment
Unlike pyenv, switching isn't automatic — you activate manually. Small trade-off for what you get in return.

pyenv vs conda — Which One?
Feature	pyenv	conda
Python version switching	Automatic (per directory)	Manual activation
Package management	No (use pip + venv)	Yes, built-in
Best for	General / backend / web dev	Data science / ML
Windows support	Via pyenv-win (limited)	Excellent
Overhead	Lightweight	Heavier
Use pyenv if you want lightweight, automatic version switching per project.
Use conda if you're in data science, need environment and package management together, or are primarily on Windows.

The Ideal Setup: pyenv + venv Together
For most developers, this combination covers everything:
# Set Python version with pyenv
pyenv local 3.11.9

# Create a virtual environment
python -m venv .venv

# Activate it
source .venv/bin/activate # Linux/Mac
.venv\Scripts\activate # Windows

# Install dependencies
pip install -r requirements.txt
pyenv handles the version. venv handles dependency isolation. Best of both.
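
To confirm from inside Python that the virtual environment is actually active, you can compare `sys.prefix` with `sys.base_prefix`; they differ inside a venv. A minimal sketch (the function name is hypothetical):

```python
import sys


def in_virtualenv():
    """True when the interpreter is running inside a venv-style environment."""
    return sys.prefix != sys.base_prefix


print(f"Interpreter: {sys.executable}")
print(f"Inside a virtual environment: {in_virtualenv()}")
```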

Common Issues Worth Knowing
pyenv: command not found after install
Your shell config wasn't reloaded. Run source ~/.bashrc (or ~/.zshrc) after editing it, or close and reopen your terminal.
conda: environment not activating in scripts
Run conda init once after installing, then restart your terminal. This adds the necessary hooks to your shell profile.
python --version still shows system Python after pyenv local
Make sure the pyenv init lines are actually in your shell config and that you've reloaded it. Running pyenv versions should list your installed versions.

Quick Reference
# pyenv
pyenv install 3.11.9
pyenv local 3.11.9 # project-level
pyenv global 3.11.9 # user-wide default
pyenv versions # list installed versions

# conda
conda create -n myenv python=3.11
conda activate myenv
conda deactivate
conda env list # list all environments

Managing Python versions correctly is one of those things that saves you hours of debugging later. Set it up once, stop thinking about it.
If you're running into an issue not covered here, drop it in the comments.