<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Oleh Koren</title>
    <description>The latest articles on Forem by Oleh Koren (@oleh_koren).</description>
    <link>https://forem.com/oleh_koren</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3757399%2F53ebb113-e65e-442e-ae64-84431a479e62.jpg</url>
      <title>Forem: Oleh Koren</title>
      <link>https://forem.com/oleh_koren</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/oleh_koren"/>
    <language>en</language>
    <item>
      <title>Hidden Problem That Makes Most Load Tests Unrealistic</title>
      <dc:creator>Oleh Koren</dc:creator>
      <pubDate>Wed, 18 Mar 2026 09:45:02 +0000</pubDate>
      <link>https://forem.com/oleh_koren/hidden-problem-that-makes-most-load-tests-unrealistic-2o33</link>
      <guid>https://forem.com/oleh_koren/hidden-problem-that-makes-most-load-tests-unrealistic-2o33</guid>
      <description>&lt;p&gt;You execute a load test using 100 virtual users.&lt;/p&gt;

&lt;p&gt;The test generates 1000 requests per second.&lt;/p&gt;

&lt;p&gt;In production, the same system only receives around 80 requests per second.&lt;/p&gt;

&lt;p&gt;So what happened?&lt;/p&gt;

&lt;p&gt;The answer is simple — &lt;strong&gt;think time.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you don't use it, your virtual users might act like &lt;strong&gt;robots instead of real users&lt;/strong&gt;, and the results could be totally wrong.&lt;/p&gt;

&lt;p&gt;Let’s break down why think time is important and how to use it correctly during performance testing.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Think Time?
&lt;/h2&gt;

&lt;p&gt;Think time is the amount of time a real user takes between performing actions in an application.&lt;/p&gt;

&lt;p&gt;In real life, users don't immediately perform the next request after receiving a response. &lt;/p&gt;

&lt;p&gt;They usually:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;read the page&lt;/li&gt;
&lt;li&gt;scroll through content&lt;/li&gt;
&lt;li&gt;think about what to click&lt;/li&gt;
&lt;li&gt;enter data&lt;/li&gt;
&lt;li&gt;compare options&lt;/li&gt;
&lt;li&gt;navigate through menus&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All these actions take time.&lt;/p&gt;

&lt;p&gt;Performance tests that ignore this behavior produce &lt;strong&gt;unrealistic load patterns.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem Without Think Time
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;User opens a page&lt;br&gt;
User clicks a button&lt;br&gt;
User searches for a product&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If your script runs without pauses, the virtual user will send requests &lt;strong&gt;as quickly as it can.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;Without think time:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;100 virtual users → ~1000 requests per second (RPS)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But real users behave differently.&lt;/p&gt;

&lt;p&gt;With realistic pauses:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;100 real users → ~80 requests per second&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s more than a 12× difference.&lt;/p&gt;

&lt;p&gt;You can think about it using a simple model:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;RPS ≈ Users / (Response Time + Think Time)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When think time increases, the number of requests per second naturally drops — even with the same number of users.&lt;/p&gt;
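&lt;p&gt;Plugging illustrative numbers into this model shows where the 1000 vs. 80 RPS gap comes from (a minimal sketch; the 100 ms response time and 1.15 s think time are assumptions picked for the arithmetic, not measurements):&lt;/p&gt;

```javascript
// Simple throughput model: each virtual user loops through
// (wait for response) + (think), so one user completes
// 1 / (responseTime + thinkTime) requests per second.
function estimateRps(users, responseTimeSec, thinkTimeSec) {
  return users / (responseTimeSec + thinkTimeSec);
}

// 100 scripted users hitting a 100 ms endpoint with no pauses:
console.log(estimateRps(100, 0.1, 0));    // about 1000 RPS

// The same 100 users with roughly 1.15 s of reading/typing between requests:
console.log(estimateRps(100, 0.1, 1.15)); // about 80 RPS
```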

&lt;p&gt;The same number of users produces completely different load depending on think time:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Users&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;No think time&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;~1000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;With think time&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;~80&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This is why tests that skip think time often report far more traffic than the system ever sees in production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why This Matters&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Ignoring think time can lead to incorrect conclusions.&lt;/p&gt;

&lt;p&gt;You might think:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the system cannot handle the load&lt;/li&gt;
&lt;li&gt;the infrastructure needs scaling&lt;/li&gt;
&lt;li&gt;performance is worse than expected&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In reality, the test was simply &lt;strong&gt;unrealistic.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Virtual users perform actions continuously without any human behavior.&lt;/p&gt;

&lt;p&gt;Real users don’t behave like that.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Does Think Time Come From?
&lt;/h2&gt;

&lt;p&gt;Good performance tests aim to mimic how real people behave when using an application.&lt;/p&gt;

&lt;p&gt;Think time values can be gathered from various sources.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. User Analytics&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;User analytics tools offer important information about how people use your application.&lt;/p&gt;

&lt;p&gt;Examples of such tools include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Google Analytics&lt;/li&gt;
&lt;li&gt;Mixpanel&lt;/li&gt;
&lt;li&gt;Amplitude&lt;/li&gt;
&lt;li&gt;Pendo&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These tools allow you to look at several metrics, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Average time spent on a page&lt;/li&gt;
&lt;li&gt;Session duration&lt;/li&gt;
&lt;li&gt;Click intervals&lt;/li&gt;
&lt;li&gt;Page navigation patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These metrics can help estimate how long a user usually stays on a page before taking another action.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Real User Monitoring (RUM)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Real User Monitoring tools track how users interact with the application in production.&lt;/p&gt;

&lt;p&gt;They can provide data such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Time between user interactions&lt;/li&gt;
&lt;li&gt;Navigation timing&lt;/li&gt;
&lt;li&gt;User session behavior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This type of data is often one of the most reliable sources for defining think times.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Business Knowledge&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In some cases, the most effective source of think time information comes from understanding the product and its typical use scenarios.&lt;/p&gt;

&lt;p&gt;For instance:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Login page → 2–5 seconds&lt;br&gt;
Search page → 5–10 seconds&lt;br&gt;
Checkout → 15–30 seconds&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Users require time to read, compare, and make decisions.&lt;/p&gt;

&lt;p&gt;Different user journeys naturally involve different think times.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Think Time Is Implemented in Performance Tests
&lt;/h2&gt;

&lt;p&gt;Performance testing tools offer various methods to simulate think time, the pause users take between actions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;JMeter&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Common timers available in JMeter include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Constant Timer&lt;/li&gt;
&lt;li&gt;Uniform Random Timer&lt;/li&gt;
&lt;li&gt;Gaussian Random Timer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Random timers are generally preferred because real users do not wait the exact same amount of time between actions.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;An example of a random pause would be a delay between 3 and 8 seconds.&lt;/p&gt;
&lt;/blockquote&gt;
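&lt;p&gt;As a rough illustration of how these three timers differ, here is a plain-JavaScript sketch of the delay each one would produce (this is not JMeter code; the offsets and ranges are made-up examples):&lt;/p&gt;

```javascript
// Constant Timer: always the same pause.
const constantDelay = (ms) => ms;

// Uniform Random Timer: constant offset plus a uniformly
// distributed random part.
const uniformDelay = (offsetMs, rangeMs) =>
  offsetMs + Math.random() * rangeMs;

// Gaussian Random Timer: constant offset plus a normally
// distributed part (Box-Muller transform), similar to
// JMeter's "deviation" setting; clamped at zero.
function gaussianDelay(offsetMs, deviationMs) {
  const u1 = 1 - Math.random(); // avoid log(0)
  const u2 = Math.random();
  const z = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
  return Math.max(0, offsetMs + z * deviationMs);
}

console.log(constantDelay(5000));       // always 5000 ms
console.log(uniformDelay(3000, 5000));  // 3000-8000 ms
console.log(gaussianDelay(5000, 1000)); // mostly 3000-7000 ms
```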

&lt;p&gt;&lt;strong&gt;k6&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In k6, think time is typically simulated using the &lt;code&gt;sleep()&lt;/code&gt; function.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;randomBetween&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
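&lt;p&gt;Note that randomBetween is not a k6 built-in: k6's jslib utils provide a similar randomIntBetween helper, or you can define one yourself. A plain-JavaScript sketch of such a helper (runnable outside k6 as well):&lt;/p&gt;

```javascript
// Uniform random delay in whole seconds, inclusive of both
// bounds. Inside a k6 script the result would feed sleep():
//   sleep(randomBetween(3, 8));
function randomBetween(min, max) {
  return Math.floor(Math.random() * (max - min + 1)) + min;
}

[1, 2, 3, 4, 5].forEach(() =>
  console.log(randomBetween(3, 8)) // always between 3 and 8
);
```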



&lt;p&gt;&lt;strong&gt;Gatling&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Gatling includes built-in pause functions to simulate think time.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pause(3,8)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates a random delay between actions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common Think Time Mistakes
&lt;/h2&gt;

&lt;p&gt;Even experienced engineers can sometimes use think time incorrectly.&lt;/p&gt;

&lt;p&gt;Here are some common problems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No think time at all&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The script runs without any pauses.&lt;/p&gt;

&lt;p&gt;This creates traffic patterns that don't match real-world behavior.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fixed pauses&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="nb"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Real users never wait exactly the same amount of time between actions.&lt;/p&gt;

&lt;p&gt;Random delays are typically more accurate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Unrealistic pause values&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Pauses that are too short or too long can affect the test results in an unrealistic way.&lt;/p&gt;

&lt;p&gt;Think time should be based on how real users actually behave.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Key Idea
&lt;/h2&gt;

&lt;p&gt;Virtual users execute scripts.&lt;/p&gt;

&lt;p&gt;Real users think.&lt;/p&gt;

&lt;p&gt;If your performance test does not simulate this thinking time, the results may not represent real-world traffic.&lt;/p&gt;

&lt;p&gt;A realistic load test is not just about the number of users.&lt;/p&gt;

&lt;p&gt;It is about how those users behave.&lt;/p&gt;

&lt;p&gt;If you're interested in learning more about realistic load modeling, think time, and performance testing in practice — I cover these topics in detail in my course.&lt;/p&gt;

&lt;p&gt;You can check it out here: &lt;br&gt;
👉 &lt;a href="https://www.udemy.com/course/performance-testing-fundamentals-from-basics-to-hands-on/?couponCode=DEV-COMMUNITY-COUPON" rel="noopener noreferrer"&gt;Performance Testing Fundamentals: From Basics to Hands-On (Udemy)&lt;/a&gt;&lt;/p&gt;

</description>
      <category>performance</category>
      <category>softwaretesting</category>
      <category>loadtesting</category>
    </item>
    <item>
      <title>What Does a Performance QA Engineer Actually Do?</title>
      <dc:creator>Oleh Koren</dc:creator>
      <pubDate>Mon, 09 Mar 2026 11:37:00 +0000</pubDate>
      <link>https://forem.com/oleh_koren/what-does-a-performance-qa-engineer-actually-do-311</link>
      <guid>https://forem.com/oleh_koren/what-does-a-performance-qa-engineer-actually-do-311</guid>
      <description>&lt;p&gt;When I tell people that I work as a Performance QA Engineer, the reaction is often the same:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Do you always have tasks in that role?"&lt;/li&gt;
&lt;li&gt;"Isn’t it mostly functional testing and only partly performance?"&lt;/li&gt;
&lt;li&gt;"Or is it closer to DevOps and infrastructure work?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The confusion is understandable.&lt;/p&gt;

&lt;p&gt;Performance QA engineers are relatively rare. In many companies, there’s no dedicated role, and developers only run quick load tests before big releases.&lt;/p&gt;

&lt;p&gt;But in teams where performance really matters, there are specialists responsible for making sure the system can actually handle real-world traffic.&lt;/p&gt;

&lt;p&gt;As someone working in this area, I can say one thing clearly: &lt;strong&gt;there is always work to do.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Performance QA Engineers Are Rare
&lt;/h2&gt;

&lt;p&gt;When I started learning QA automation during my company’s training program, I hardly heard anyone mention performance testing.&lt;/p&gt;

&lt;p&gt;Most discussions were about manual testing, UI automation, frameworks, and CI/CD.&lt;/p&gt;

&lt;p&gt;Performance testing felt occasional at first, but over time I realized it’s not just QA — it’s a mix of automation, backend, frontend, and infrastructure skills that lets you see the system as a whole.&lt;/p&gt;

&lt;p&gt;To work effectively, you need to understand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;APIs and system architecture&lt;/li&gt;
&lt;li&gt;Backend and frontend behavior&lt;/li&gt;
&lt;li&gt;Infrastructure and monitoring&lt;/li&gt;
&lt;li&gt;Data analysis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This combination explains why there are fewer specialists in this field.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Performance QA Engineers Actually Do
&lt;/h2&gt;

&lt;p&gt;Many think performance testing is just running a load test and generating a report. &lt;/p&gt;

&lt;p&gt;In reality, it’s a continuous process that touches multiple areas. &lt;br&gt;
Typical tasks include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Implementing and maintaining test scenarios — designing realistic user flows and scripts.&lt;/li&gt;
&lt;li&gt;Running different types of performance tests — load, stress, spike, endurance.&lt;/li&gt;
&lt;li&gt;Investigating test results — identifying bottlenecks, such as slow database queries or caching issues.&lt;/li&gt;
&lt;li&gt;Analyzing metrics and presenting insights — interpreting response times, throughput, and errors; presenting findings to developers, DevOps, and stakeholders.&lt;/li&gt;
&lt;li&gt;Collaborating across teams — working with backend, frontend, QA, and infrastructure teams to improve system performance.&lt;/li&gt;
&lt;li&gt;Monitoring test environments — ensuring environments can handle realistic traffic.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Performance testing isn’t a solo task — it’s a mix of technical investigation, communication, and collaboration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Backend and Frontend Performance Testing
&lt;/h2&gt;

&lt;p&gt;When we begin performance testing, the journey almost always starts behind the scenes — in the &lt;strong&gt;backend&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Much of our work focuses on API testing, because APIs handle the majority of system load.&lt;/p&gt;

&lt;p&gt;To understand how the backend behaves under load, we use JMeter, Gatling, and k6.&lt;/p&gt;

&lt;p&gt;But backend testing is only half the story.&lt;/p&gt;

&lt;p&gt;User experience also depends on &lt;strong&gt;frontend&lt;/strong&gt; performance — page load, rendering, and responsiveness matter. &lt;/p&gt;

&lt;p&gt;Tools like PageSpeed Insights and Sitespeed.io help us measure this, because even if the server responds immediately, users might still be waiting on a loading screen.&lt;/p&gt;

&lt;h2&gt;
  
  
  Infrastructure Matters a Lot
&lt;/h2&gt;

&lt;p&gt;One of the biggest challenges in performance testing is &lt;strong&gt;environment setup.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Ideally, performance tests should run in an environment that is as close to production as possible.&lt;/p&gt;

&lt;p&gt;In practice, this often means creating a &lt;strong&gt;dedicated performance testing environment&lt;/strong&gt; that mirrors the real system architecture.&lt;/p&gt;

&lt;p&gt;In projects where this is done properly, we can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;scale services up and down during testing&lt;/li&gt;
&lt;li&gt;simulate realistic traffic&lt;/li&gt;
&lt;li&gt;avoid impacting production systems&lt;/li&gt;
&lt;li&gt;turn environments off outside working hours to save infrastructure costs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without a realistic environment, performance testing results can be misleading.&lt;/p&gt;

&lt;h2&gt;
  
  
  CI/CD and Automation
&lt;/h2&gt;

&lt;p&gt;Modern performance testing is rarely manual. We integrate tests into CI/CD pipelines (&lt;strong&gt;Jenkins, TeamCity, GitHub Actions, GitLab CI&lt;/strong&gt;) to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run tests regularly&lt;/li&gt;
&lt;li&gt;Detect regressions early&lt;/li&gt;
&lt;li&gt;Ensure performance testing is continuous&lt;/li&gt;
&lt;/ul&gt;
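&lt;p&gt;As one hedged sketch of what this looks like in practice, a scheduled GitHub Actions job that runs a k6 script nightly (the script path, schedule, and job name are illustrative assumptions):&lt;/p&gt;

```yaml
# Nightly performance job; all names and paths are examples.
name: nightly-performance
on:
  schedule:
    - cron: "0 2 * * *"   # every night at 02:00 UTC
jobs:
  load-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Run the k6 script via the official Docker image so the
      # runner does not need k6 installed.
      - name: Run k6 load test
        run: docker run --rm -v "$PWD":/work grafana/k6 run /work/tests/load.js
```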

&lt;h2&gt;
  
  
  Collecting and Visualizing Results
&lt;/h2&gt;

&lt;p&gt;Running a test is just the beginning.&lt;/p&gt;

&lt;p&gt;The real value comes from &lt;strong&gt;analyzing results.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In many projects, we store performance metrics in &lt;strong&gt;time-series databases&lt;/strong&gt; such as &lt;strong&gt;InfluxDB or Prometheus&lt;/strong&gt;, and then visualize them using &lt;strong&gt;Grafana dashboards.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This setup allows us to track things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;response times&lt;/li&gt;
&lt;li&gt;throughput&lt;/li&gt;
&lt;li&gt;error rates&lt;/li&gt;
&lt;li&gt;hardware resource usage — both on the load generator and the application servers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Having all this data in one place makes it much easier to understand how the system behaves under load and identify performance trends over time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Communication Is Key
&lt;/h2&gt;

&lt;p&gt;One thing that many people underestimate is &lt;strong&gt;how much communication is involved.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Performance engineers often act as a bridge between different teams.&lt;/p&gt;

&lt;p&gt;We regularly work with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;developers&lt;/li&gt;
&lt;li&gt;DevOps engineers&lt;/li&gt;
&lt;li&gt;QA teams&lt;/li&gt;
&lt;li&gt;architects&lt;/li&gt;
&lt;li&gt;product/delivery managers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Sometimes the hardest part is not running the test — it’s &lt;strong&gt;explaining the results.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For example, telling the team that the system handles 5,000 users instead of the expected 10,000 can have serious business implications. So it’s important to explain not only the numbers but also &lt;strong&gt;why the system behaves that way and what can be improved.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How Performance Teams Are Organized
&lt;/h2&gt;

&lt;p&gt;Team structures can vary a lot.&lt;/p&gt;

&lt;p&gt;In some companies, a performance engineer works directly inside a product team together with developers and QA engineers.&lt;/p&gt;

&lt;p&gt;But quite often there are &lt;strong&gt;dedicated performance testing teams&lt;/strong&gt; that support multiple projects at once.&lt;/p&gt;

&lt;p&gt;I’ve seen setups where a single performance team works across several systems, helping different teams prepare for production traffic or major releases.&lt;/p&gt;

&lt;p&gt;In that case, the role becomes somewhat similar to an internal consultant who helps teams understand and improve system performance.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Work Never Really Stops
&lt;/h2&gt;

&lt;p&gt;One thing I quickly realized about performance engineering is that the work never really ends.&lt;/p&gt;

&lt;p&gt;Applications are constantly changing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;new features are added&lt;/li&gt;
&lt;li&gt;traffic grows&lt;/li&gt;
&lt;li&gt;infrastructure evolves&lt;/li&gt;
&lt;li&gt;architecture becomes more complex&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every change introduces new performance risks.&lt;/p&gt;

&lt;p&gt;That’s why performance testing is not just something you do before a release — it’s something that needs to happen continuously.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Performance QA Engineers might not be the most common role, but their impact can be huge. This role uniquely combines testing, automation, database knowledge, backend and frontend understanding, DevOps practices, data analysis, and collaboration.&lt;/p&gt;

&lt;p&gt;And for me personally, that’s exactly what makes this role interesting.&lt;/p&gt;

&lt;p&gt;If this resonates and you want to dive deeper — I have a &lt;a href="https://www.udemy.com/course/performance-testing-fundamentals-from-basics-to-hands-on/?couponCode=DEV-COMMUNITY-COUPON" rel="noopener noreferrer"&gt;course on Udemy&lt;/a&gt; that covers performance testing from scratch.&lt;/p&gt;

</description>
      <category>career</category>
      <category>performance</category>
      <category>testing</category>
    </item>
    <item>
      <title>Black Friday: Would You Choose the Right Performance Test?</title>
      <dc:creator>Oleh Koren</dc:creator>
      <pubDate>Fri, 27 Feb 2026 09:50:40 +0000</pubDate>
      <link>https://forem.com/oleh_koren/black-friday-would-you-choose-the-right-performance-test-4j80</link>
      <guid>https://forem.com/oleh_koren/black-friday-would-you-choose-the-right-performance-test-4j80</guid>
      <description>&lt;p&gt;I recently ran a poll on LinkedIn in a Software Testing group.&lt;/p&gt;

&lt;p&gt;Here are the results:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ffuw5wen4p0w3kq5h8i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ffuw5wen4p0w3kq5h8i.png" alt=" " width="798" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Spike Testing — ~55% ✅&lt;/p&gt;

&lt;p&gt;Which means:&lt;/p&gt;

&lt;p&gt;👉 ~45% of respondents chose the wrong answer.&lt;/p&gt;

&lt;p&gt;And that’s not surprising.&lt;/p&gt;

&lt;p&gt;Many engineers mix up load, volume, scalability, endurance, and spike testing — especially when the scenario sounds “real production-like”.&lt;/p&gt;

&lt;p&gt;Let’s break it down.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Spike Testing?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Spike testing&lt;/strong&gt; is a type of performance testing where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Load increases &lt;strong&gt;suddenly and significantly&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;System behavior is observed during the spike&lt;/li&gt;
&lt;li&gt;System recovery is measured after the spike drops&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq1ogua7d8exesytp0h4p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq1ogua7d8exesytp0h4p.png" alt=" " width="800" height="489"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It answers questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can the system handle a sudden traffic burst?&lt;/li&gt;
&lt;li&gt;Does it crash or degrade gracefully?&lt;/li&gt;
&lt;li&gt;Does it recover automatically?&lt;/li&gt;
&lt;li&gt;Are there cascading failures?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Black Friday traffic jumping 4x in minutes?&lt;br&gt;
That’s a textbook spike scenario.&lt;/p&gt;
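&lt;p&gt;That shape is easy to express as a staged load profile. The sketch below models k6-style ramping stages in plain JavaScript (the durations and user counts are illustrative assumptions, not recommendations):&lt;/p&gt;

```javascript
// Spike profile: hold a baseline, jump 4x in one minute,
// hold the spike, then drop back down.
const stages = [
  { duration: 300, target: 100 }, // 5 min baseline
  { duration: 60,  target: 400 }, // sudden 4x spike
  { duration: 180, target: 400 }, // hold the spike
  { duration: 60,  target: 100 }, // recovery
];

// Target virtual users at second t, interpolating linearly
// inside each stage (this mirrors how ramping executors behave).
function targetVus(stages, t, startVus = 100) {
  let from = startVus;
  let elapsed = 0;
  for (const s of stages) {
    if (elapsed + s.duration >= t) {
      const frac = (t - elapsed) / s.duration;
      return Math.round(from + (s.target - from) * frac);
    }
    elapsed += s.duration;
    from = s.target;
  }
  return stages[stages.length - 1].target;
}

console.log(targetVus(stages, 300)); // end of baseline: 100
console.log(targetVus(stages, 330)); // mid-spike ramp: 250
console.log(targetVus(stages, 400)); // during the spike: 400
```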

&lt;h2&gt;
  
  
  Why It’s NOT Volume Testing
&lt;/h2&gt;

&lt;p&gt;Volume testing checks how the system behaves with &lt;strong&gt;large amounts of data&lt;/strong&gt; (e.g., millions of records in the database).&lt;/p&gt;

&lt;p&gt;It’s about &lt;strong&gt;data size&lt;/strong&gt;, not sudden traffic bursts.&lt;/p&gt;

&lt;p&gt;Black Friday is not about data growth.&lt;br&gt;
It’s about concurrent users arriving fast.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why It’s NOT Endurance Testing
&lt;/h2&gt;

&lt;p&gt;Endurance (soak) testing verifies system stability over long periods of sustained load.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpsfr3mh27nksrejkh2wm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpsfr3mh27nksrejkh2wm.png" alt=" " width="800" height="489"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;50–70% of expected load&lt;/li&gt;
&lt;li&gt;6–14 hours&lt;/li&gt;
&lt;li&gt;Monitoring memory leaks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A Black Friday spike is short-term chaos, not long-term stability.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why It’s Not Primarily Scalability Testing
&lt;/h2&gt;

&lt;p&gt;Scalability testing evaluates how well the system scales when load increases gradually.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fojme4eep1yulxuudv4k6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fojme4eep1yulxuudv4k6.png" alt=" " width="800" height="489"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It checks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Linear resource growth&lt;/li&gt;
&lt;li&gt;Auto-scaling behavior (rules)&lt;/li&gt;
&lt;li&gt;Cost efficiency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But Black Friday is not gradual.&lt;br&gt;
It’s explosive.&lt;/p&gt;

&lt;p&gt;That difference matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-World Insight
&lt;/h2&gt;

&lt;p&gt;In real production systems, spike failures often happen because of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cold caches&lt;/li&gt;
&lt;li&gt;Connection pool limits&lt;/li&gt;
&lt;li&gt;Thread pool exhaustion&lt;/li&gt;
&lt;li&gt;Database lock contention&lt;/li&gt;
&lt;li&gt;Autoscaling delays&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And here’s the critical part:&lt;/p&gt;

&lt;p&gt;Many teams test average load.&lt;br&gt;
Some test peak load.&lt;br&gt;
Very few test sudden load jumps.&lt;/p&gt;

&lt;p&gt;That’s where production incidents live.&lt;/p&gt;

</description>
      <category>performance</category>
      <category>loadtesting</category>
      <category>softwaretesting</category>
      <category>testing</category>
    </item>
    <item>
      <title>How to Find Your API’s Breaking Point (Before Your Users Do) - Capacity Testing with JMeter</title>
      <dc:creator>Oleh Koren</dc:creator>
      <pubDate>Thu, 19 Feb 2026 06:45:45 +0000</pubDate>
      <link>https://forem.com/oleh_koren/how-to-find-your-apis-breaking-point-before-your-users-do-capacity-testing-with-jmeter-1jma</link>
      <guid>https://forem.com/oleh_koren/how-to-find-your-apis-breaking-point-before-your-users-do-capacity-testing-with-jmeter-1jma</guid>
      <description>&lt;p&gt;When you build an API service, it’s crucial to know how many users or requests it can handle before things start breaking. This is where capacity testing comes in.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is Capacity Testing?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Capacity testing identifies the maximum load your system can handle before performance degrades or errors appear.&lt;/p&gt;

&lt;p&gt;It helps detect bottlenecks and verify performance requirements under heavy load.&lt;/p&gt;

&lt;h2&gt;
  
  
  Capacity Testing an E-Commerce API with Apache JMeter
&lt;/h2&gt;

&lt;p&gt;Let’s say we have a REST API for an online store. We want to see how many requests it can handle before it starts failing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Organize Thread Groups
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Option 1:&lt;/strong&gt; Separate Thread Groups per endpoint&lt;/p&gt;

&lt;p&gt;Why? Because if one endpoint starts to degrade (high response time or errors), it can affect the throughput of other endpoints if tested together in the same group.&lt;/p&gt;

&lt;p&gt;Separate Thread Groups allow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Independent load for each endpoint&lt;/li&gt;
&lt;li&gt;Clear identification of which endpoint is the bottleneck&lt;/li&gt;
&lt;li&gt;Easier reporting and analysis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Option 2:&lt;/strong&gt; Mixed workload in one Thread Group&lt;/p&gt;

&lt;p&gt;In this case:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multiple endpoints are executed together&lt;/li&gt;
&lt;li&gt;Load is distributed based on user behavior percentages&lt;/li&gt;
&lt;li&gt;The goal is to identify overall system capacity under real-world conditions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;In this example, we use Option 1.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Load Configuration
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Number of Threads (Users)&lt;/strong&gt;&lt;br&gt;
Set per Thread Group based on expected usage (e.g., browsing endpoints typically require more users than checkout).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ramp-Up Period&lt;/strong&gt;&lt;br&gt;
Defines how quickly users are added.&lt;br&gt;
A short ramp-up creates a spike; a longer one (e.g., 10 minutes) simulates gradual traffic growth.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Loop Count and Duration&lt;/strong&gt;&lt;br&gt;
Loop Count = Infinite, with a 10-minute test duration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Startup Delay&lt;/strong&gt;&lt;br&gt;
Used for sequential Thread Groups to allow system recovery before applying the next load stage (the first Thread Group starts immediately).&lt;/p&gt;

&lt;p&gt;Increase users step-by-step to determine the capacity limit.&lt;/p&gt;
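&lt;p&gt;The stepped increase can be sketched as a simple schedule: each stage starts after all previous stages (plus a recovery gap) have finished, which is what the Startup Delay on sequential Thread Groups achieves. All numbers below are illustrative:&lt;/p&gt;

```javascript
// Build a stepped capacity-search schedule: stage i runs
// startUsers + i * usersIncrement users, and its startup delay
// equals the combined duration of all earlier stages plus the
// recovery gaps between them.
function buildSteps(startUsers, usersIncrement, stagesCount, holdSec, recoverySec) {
  return Array.from({ length: stagesCount }, (_, i) => ({
    users: startUsers + i * usersIncrement,
    startupDelaySec: i * (holdSec + recoverySec),
    durationSec: holdSec,
  }));
}

// 4 stages of 50, 100, 150, 200 users; 10 min each, 2 min recovery.
console.log(buildSteps(50, 50, 4, 600, 120));
```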

&lt;h2&gt;
  
  
  Step 3: Add Configuration Elements
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Add User-Defined Variables&lt;/strong&gt;&lt;br&gt;
Create a User-Defined Variables element to store values you’ll use across your test plan:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnhpkfc1houpw9m2ro4kf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnhpkfc1houpw9m2ro4kf.png" alt=" " width="713" height="330"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why use variables?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Makes it easy to adjust load or URLs without editing each Thread Group&lt;/li&gt;
&lt;li&gt;Supports parameterization for multiple environments (dev, staging, production)&lt;/li&gt;
&lt;li&gt;Keeps the test plan organized and maintainable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now pass the created variables into the Thread Groups (example):&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh83hm4wqzlu5crv2u6nc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh83hm4wqzlu5crv2u6nc.png" alt=" " width="773" height="431"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Add HTTP Request Defaults&lt;/strong&gt;&lt;br&gt;
Add an HTTP Request Defaults element to set common request parameters, so you don’t have to repeat them in every HTTP Request sampler:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq163nwbu9yzpegzcgf8a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq163nwbu9yzpegzcgf8a.png" alt=" " width="800" height="248"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Add HTTP Requests
&lt;/h2&gt;

&lt;p&gt;Add an HTTP Request sampler to each Thread Group.&lt;br&gt;
Set the API endpoint and HTTP method (GET, POST, etc.) accordingly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Thread Group 1 → &lt;code&gt;GET /products&lt;/code&gt; – browse products&lt;br&gt;
Thread Group 2 → &lt;code&gt;GET /products/{id}&lt;/code&gt; – view product details&lt;br&gt;
Thread Group 3 → &lt;code&gt;GET /products/search?q={searchPhrase}&lt;/code&gt; – search products by phrase&lt;br&gt;
Thread Group 4 → &lt;code&gt;GET /orders/{orderId}&lt;/code&gt; – order details&lt;br&gt;
… and so on&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Current Test Plan structure:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F10r4hb1pqe496grh168i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F10r4hb1pqe496grh168i.png" alt=" " width="800" height="379"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5: Add Think Time
&lt;/h2&gt;

&lt;p&gt;Think Time simulates real user pauses between actions. &lt;br&gt;
In other words, it simulates the time a real user spends "thinking," reading, or interacting with a page before making the next request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why It Matters&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Without think time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;JMeter sends requests back-to-back, which is not realistic&lt;/li&gt;
&lt;li&gt;The load pattern may overwhelm the system compared to real users&lt;/li&gt;
&lt;li&gt;Metrics like response time, throughput, and error rate can be misleading&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With think time, the test more accurately reflects real user behavior, helping identify bottlenecks under realistic traffic conditions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to Implement in JMeter&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Use Timers such as the Constant Timer, Uniform Random Timer, or Gaussian Random Timer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Placement:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Place the Timer at the same hierarchical level as configuration elements in your Test Plan (above all Thread Groups).&lt;/li&gt;
&lt;li&gt;When positioned here, the Timer applies globally to all Thread Groups.&lt;/li&gt;
&lt;li&gt;This ensures consistent think time across all endpoints without repeating the Timer in each group.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx9hjr8b85vbjz58mxhlv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx9hjr8b85vbjz58mxhlv.png" alt=" " width="800" height="335"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Constant Delay Offset = 2000 ms → every request waits at least 2 seconds before executing.&lt;/li&gt;
&lt;li&gt;Random Delay Maximum = 4000 ms → adds a random delay between 0 and 4 seconds.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each request will randomly wait somewhere between 2 and 6 seconds.&lt;/p&gt;
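&lt;p&gt;The arithmetic behind those two fields can be sketched outside JMeter. This is an illustrative Python simulation of the Uniform Random Timer's delay calculation, not JMeter code:&lt;/p&gt;

```python
import random

# Mimic JMeter's Uniform Random Timer: a fixed Constant Delay Offset
# plus a uniformly distributed random component.
def think_time_ms(constant_offset_ms=2000, random_max_ms=4000):
    return constant_offset_ms + random.uniform(0, random_max_ms)

# Sample many delays: every value stays within offset .. offset + random max.
delays = [think_time_ms() for _ in range(10_000)]
assert min(delays) >= 2000          # never below the 2 s offset
assert not max(delays) > 6000       # never above 6 s total
```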

&lt;h2&gt;
  
  
  Step 6: Add Listeners
&lt;/h2&gt;

&lt;p&gt;Listeners in Apache JMeter are used to collect, display, and export test results. They help you analyze performance metrics during and after test execution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Add Aggregate Report&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Aggregate Report&lt;/strong&gt; – summary metrics (average, median, 90/95/99th percentile lines, min, max, throughput, error %).&lt;br&gt;
Useful for quick performance evaluation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Add View Results Tree&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;View Results Tree&lt;/strong&gt; – for debugging only (inspect individual requests and responses). Disable it during real load runs, since it keeps every sample in memory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Add Backend Listener&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Backend Listener&lt;/strong&gt; – send metrics to storage (InfluxDB) and analyze in Grafana for real-time dashboards and historical comparison.&lt;/p&gt;

&lt;p&gt;This approach allows you to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Monitor performance in real time&lt;/li&gt;
&lt;li&gt;Store historical test data&lt;/li&gt;
&lt;li&gt;Build dashboards for trend analysis&lt;/li&gt;
&lt;li&gt;Compare multiple test runs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Using the Backend Listener with InfluxDB and Grafana provides a much more professional, production-ready analysis than basic JMeter listeners.&lt;/p&gt;

&lt;p&gt;Current Test Plan:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3rhgfm5e98y6y4sy8p9n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3rhgfm5e98y6y4sy8p9n.png" alt=" " width="800" height="381"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 7: Run Test and Observe
&lt;/h2&gt;

&lt;p&gt;Start with a low number of users and increase gradually.&lt;/p&gt;

&lt;p&gt;Monitor server resources (CPU, memory, database connections) while testing.&lt;/p&gt;

&lt;p&gt;Capacity is reached when throughput stops increasing, response times spike, or errors appear. Any of these signals marks the capacity limit for that endpoint.&lt;/p&gt;
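&lt;p&gt;As a rough illustration of that heuristic (a sketch with assumed thresholds, not JMeter output parsing):&lt;/p&gt;

```python
# Illustrative capacity-point check over per-step load test summaries.
# The 1% error threshold and the input shape are assumptions for this sketch.
def find_capacity_point(steps):
    """steps: list of dicts with 'users', 'throughput_rps', 'error_rate'.
    Returns the user count where the endpoint stops scaling, or None."""
    prev_tp = 0.0
    for step in steps:
        errors_appeared = step["error_rate"] > 0.01            # over 1% errors
        throughput_dropped = prev_tp > step["throughput_rps"]  # stopped increasing
        if errors_appeared or throughput_dropped:
            return step["users"]
        prev_tp = step["throughput_rps"]
    return None

steps = [
    {"users": 50,  "throughput_rps": 120, "error_rate": 0.0},
    {"users": 100, "throughput_rps": 230, "error_rate": 0.0},
    {"users": 150, "throughput_rps": 225, "error_rate": 0.04},
]
# Throughput dips and errors exceed 1% at 150 users.
```

&lt;p&gt;In practice you would read these numbers from the Aggregate Report or a Grafana dashboard rather than hard-code them.&lt;/p&gt;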

&lt;h2&gt;
  
  
  Step 8: Example Result
&lt;/h2&gt;

&lt;p&gt;This example shows a 10-minute test with a 10-minute ramp-up to 100 users for the first endpoint: &lt;br&gt;
&lt;strong&gt;GET /products&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffgaoshloo9sbzexr5fnt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffgaoshloo9sbzexr5fnt.png" alt=" " width="800" height="353"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since ramp-up equals test duration, users were added gradually throughout the entire test. Throughput increased steadily as concurrency grew.&lt;/p&gt;

&lt;p&gt;However, 500 errors appeared and increased with higher load, indicating endpoint degradation before reaching a stable throughput plateau. As a result, the capacity point could not be clearly determined, and a defect should be created for further investigation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvh2b405b88t0i9d18sy4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvh2b405b88t0i9d18sy4.png" alt=" " width="800" height="294"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This graph represents the response time metrics for the same 10-minute test.&lt;/p&gt;

&lt;p&gt;At the beginning of the test, response time is higher (~1 second) due to warm-up and cache initialization.&lt;/p&gt;

&lt;p&gt;After the initial minute, response time stabilizes around:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;P50 ≈ 510 ms&lt;/li&gt;
&lt;li&gt;P90 ≈ 540 ms&lt;/li&gt;
&lt;li&gt;P95 ≈ 580 ms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Although latency remained relatively stable, the increasing error rate indicates system instability under higher concurrency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In summary:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Errors were present from the early stage of the test and increased with higher load&lt;/li&gt;
&lt;li&gt;Although response times remained relatively stable, the growing error rate indicates system instability&lt;/li&gt;
&lt;li&gt;A clear capacity point could not be determined&lt;/li&gt;
&lt;li&gt;Further investigation and defect creation are required before continuing capacity validation&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Capacity testing should be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Incremental&lt;/li&gt;
&lt;li&gt;Endpoint-specific&lt;/li&gt;
&lt;li&gt;Data-driven&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Testing endpoints separately provides clearer insights and prevents one bottleneck from hiding another.&lt;/p&gt;

&lt;p&gt;Only after stabilizing individual endpoints should you move to mixed workload testing for full system capacity validation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Want to Go Further?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you’d like to dive deeper into other performance testing types and learn how to build a complete testing stack using JMeter, InfluxDB, and Grafana for real-time monitoring and analysis, you can explore the full course here:&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://www.udemy.com/course/performance-testing-fundamentals-from-basics-to-hands-on/?couponCode=DEV-COMMUNITY-COUPON" rel="noopener noreferrer"&gt;Performance Testing Fundamentals: From Basics to Hands-On (Udemy)&lt;/a&gt;&lt;/p&gt;

</description>
      <category>testing</category>
      <category>performance</category>
      <category>jmeter</category>
      <category>grafana</category>
    </item>
    <item>
      <title>Behind the Scenes: Why I Created a Performance Testing Course</title>
      <dc:creator>Oleh Koren</dc:creator>
      <pubDate>Fri, 13 Feb 2026 23:18:42 +0000</pubDate>
      <link>https://forem.com/oleh_koren/behind-the-scenes-why-i-created-a-performance-testing-course-22i0</link>
      <guid>https://forem.com/oleh_koren/behind-the-scenes-why-i-created-a-performance-testing-course-22i0</guid>
      <description>&lt;p&gt;&lt;strong&gt;Performance testing&lt;/strong&gt; is one of those skills many engineers think they understand.&lt;/p&gt;

&lt;p&gt;Until production says otherwise.&lt;/p&gt;

&lt;p&gt;Over the years, I kept seeing the same pattern:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Load test reports showing "PASSED"&lt;/li&gt;
&lt;li&gt;Average response time within limits&lt;/li&gt;
&lt;li&gt;Zero errors during the test&lt;/li&gt;
&lt;li&gt;And then… production incidents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The problem wasn’t tools.&lt;br&gt;
The problem was understanding.&lt;/p&gt;

&lt;h2&gt;
  
  
  The gap I keep noticing
&lt;/h2&gt;

&lt;p&gt;When I looked at available learning materials, I saw a few common issues.&lt;/p&gt;

&lt;p&gt;1️⃣ &lt;strong&gt;Outdated content disguised as "updated"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Some courses are technically refreshed — new thumbnail, new title, small edits.&lt;/p&gt;

&lt;p&gt;But then you open the video and see an MS Word document on the screen for 15 minutes while the instructor reads text aloud.&lt;/p&gt;

&lt;p&gt;Performance testing is practical.&lt;/p&gt;

&lt;p&gt;It requires scenarios, metrics interpretation, trade-offs, and production context.&lt;/p&gt;

&lt;p&gt;Not just theory.&lt;/p&gt;

&lt;p&gt;2️⃣ &lt;strong&gt;Too tool-focused&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A lot of courses focus heavily on:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Here’s how to use Tool X."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Buttons. Config fields. How to run a test.&lt;/p&gt;

&lt;p&gt;But very little about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How to design a meaningful workload model&lt;/li&gt;
&lt;li&gt;How to connect performance metrics to real user experience&lt;/li&gt;
&lt;li&gt;How to interpret test results correctly&lt;/li&gt;
&lt;li&gt;How to prevent false confidence from "green" reports&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Tools change.&lt;br&gt;
Principles don’t.&lt;/p&gt;

&lt;p&gt;If you only learn the tool, you’re limited.&lt;br&gt;
If you understand performance engineering thinking, you can use any tool.&lt;/p&gt;

&lt;p&gt;3️⃣ &lt;strong&gt;Limited structured material&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Performance testing isn’t as popular as automation testing or manual testing.&lt;/p&gt;

&lt;p&gt;Finding structured, end-to-end material is surprisingly hard.&lt;/p&gt;

&lt;p&gt;You’ll find:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Blog posts&lt;/li&gt;
&lt;li&gt;Isolated tutorials&lt;/li&gt;
&lt;li&gt;Tool documentation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But rarely a complete path from fundamentals → metrics → workload modeling → test executions → result analysis → reporting.&lt;/p&gt;

&lt;h2&gt;
  
  
  The turning point
&lt;/h2&gt;

&lt;p&gt;After multiple production-related discussions and post-incident analyses, I realized something:&lt;/p&gt;

&lt;p&gt;Many teams don’t fail because they don’t run load tests.&lt;/p&gt;

&lt;p&gt;They fail because they don’t know how to think about performance correctly.&lt;/p&gt;

&lt;p&gt;That’s when the idea started forming — not to create “another tool course,” but to structure performance testing the way I believe it should be taught:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Start with core principles and the real purpose of performance testing in modern systems&lt;/li&gt;
&lt;li&gt;Explain why performance testing matters for business, not just for engineering&lt;/li&gt;
&lt;li&gt;Break down different types of performance tests and when to use each of them&lt;/li&gt;
&lt;li&gt;Dive deep into performance metrics and how to analyze results correctly&lt;/li&gt;
&lt;li&gt;Show how to choose the right tools instead of blindly following trends&lt;/li&gt;
&lt;li&gt;Demonstrate how to design and execute practical tests using JMeter and BlazeMeter&lt;/li&gt;
&lt;li&gt;Build a simple but complete performance testing setup with JMeter, InfluxDB, and Grafana&lt;/li&gt;
&lt;li&gt;Teach how to communicate results clearly to both technical and non-technical stakeholders&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What it actually took
&lt;/h2&gt;

&lt;p&gt;It took around 3 months to create — evenings and weekends after my main job.&lt;/p&gt;

&lt;p&gt;The hardest part wasn’t just recording the videos.&lt;/p&gt;

&lt;p&gt;It was everything around it.&lt;/p&gt;

&lt;p&gt;I had to figure out recording tools, experiment with setups, and learn how to make the content look and sound professional. The first versions didn’t even pass the platform’s quality review because the audio wasn’t good enough. I had to invest in a proper microphone, re-record several lessons, adjust sound settings, and rethink the whole setup.&lt;/p&gt;

&lt;p&gt;Good audio matters more than most people expect.&lt;/p&gt;

&lt;p&gt;And while solving the technical side of recording, I was also trying to solve a different challenge — how to simplify complex topics without oversimplifying them.&lt;/p&gt;

&lt;p&gt;Performance testing sits at the intersection of infrastructure, backend architecture, system design, monitoring, and even statistics. Turning that into something structured, practical, and clear required a lot of iteration — reorganizing sections, refining explanations, replacing vague theory with concrete examples.&lt;/p&gt;

&lt;p&gt;Recording was just the visible part.&lt;/p&gt;

&lt;p&gt;The real work was making sure the content was both accurate and understandable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I’m sharing this
&lt;/h2&gt;

&lt;p&gt;Not as an announcement.&lt;/p&gt;

&lt;p&gt;But because performance testing deserves more attention.&lt;/p&gt;

&lt;p&gt;If you work in QA or backend engineering and you’ve ever seen:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;"Average response time looks fine"&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;"It passed in staging"&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;"We didn’t see that coming"&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then you already know why this topic matters.&lt;/p&gt;

&lt;p&gt;I decided to organize my experience into a structured course. If you're curious, you can find it here:&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://www.udemy.com/course/performance-testing-fundamentals-from-basics-to-hands-on/?couponCode=DEV-COMMUNITY-COUPON" rel="noopener noreferrer"&gt;Performance Testing Fundamentals: From Basics to Hands-On (Udemy)&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Either way — I hope more engineers move beyond just running tests and start truly understanding performance.&lt;/p&gt;

&lt;p&gt;Because that’s where the real difference is made.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I’m Writing About This on Dev.to
&lt;/h2&gt;

&lt;p&gt;I’m not here just to publish a one-time announcement.&lt;/p&gt;

&lt;p&gt;My goal is to regularly share practical, sometimes uncomfortable topics related to performance testing.&lt;/p&gt;

&lt;p&gt;Performance testing is still a niche area compared to automation or backend/frontend development. But when systems fail, performance is often at the center of the problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Through Dev.to, I want to:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Break down real-world performance issues&lt;/li&gt;
&lt;li&gt;Explain concepts in a practical way&lt;/li&gt;
&lt;li&gt;Share lessons learned from production discussions&lt;/li&gt;
&lt;li&gt;Encourage deeper thinking beyond just “running a tool”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If performance engineering is relevant to your work, you’ll see more content here focused on fundamentals, interpretation, and real system behavior.&lt;/p&gt;

&lt;p&gt;Because the industry doesn’t need more button-click tutorials.&lt;/p&gt;

&lt;p&gt;It needs better performance thinking.&lt;/p&gt;

</description>
      <category>testing</category>
      <category>performance</category>
    </item>
    <item>
      <title>Your Load Test Passed. Production Still Failed. Why?</title>
      <dc:creator>Oleh Koren</dc:creator>
      <pubDate>Wed, 11 Feb 2026 18:02:36 +0000</pubDate>
      <link>https://forem.com/oleh_koren/your-load-test-passed-production-still-failed-why-408</link>
      <guid>https://forem.com/oleh_koren/your-load-test-passed-production-still-failed-why-408</guid>
      <description>&lt;p&gt;Your load test report says:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;90th percentile&lt;/td&gt;
&lt;td&gt;1.7 s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Errors&lt;/td&gt;
&lt;td&gt;0 %&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Test result&lt;/td&gt;
&lt;td&gt;PASSED&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Two weeks later — production incident.&lt;/p&gt;

&lt;p&gt;CPU spikes 🔺&lt;/p&gt;

&lt;p&gt;Users complain about 12-second response times ⏳&lt;/p&gt;

&lt;p&gt;What went wrong?&lt;/p&gt;

&lt;h2&gt;
  
  
  1️⃣ Unrealistic workload model
&lt;/h2&gt;

&lt;p&gt;In your test:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;100% of users hit “Search”&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;No browsing&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;No login/logout mix&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;No background jobs impact&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In reality:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Search + Login + Cart + Background jobs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Scheduled tasks&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Third-party API calls&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Performance issues rarely happen because of one endpoint.&lt;br&gt;
They happen because multiple flows compete for shared resources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;DB connections&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Thread pools&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CPU&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Memory&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I/O&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your workload model does not reflect real traffic distribution,&lt;br&gt;
you are not testing the system — you are testing a simplified demo.&lt;/p&gt;

&lt;p&gt;That’s not load testing.&lt;/p&gt;

&lt;h2&gt;
  
  
  2️⃣ No think time
&lt;/h2&gt;

&lt;p&gt;🟥 Without think time, your test becomes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Request     │→│ Request     │→│ Request     │→│ Request     │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This artificially increases request rate per user.&lt;/p&gt;

&lt;p&gt;🟩 Real User:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Click       │→│ Read        │→│ Think       │→│ Click       │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without think time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;You simulate robots, not humans&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You overload backend artificially&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This changes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CPU usage patterns&lt;/li&gt;
&lt;li&gt;DB lock behavior&lt;/li&gt;
&lt;li&gt;Thread scheduling&lt;/li&gt;
&lt;li&gt;Cache efficiency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Under realistic traffic, resource contention increases non-linearly.&lt;br&gt;
Once thread pools are saturated or DB connections are exhausted, response time doesn’t degrade gradually — it spikes.&lt;/p&gt;

&lt;p&gt;Most production incidents are not caused by load. They are caused by saturation.&lt;/p&gt;
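&lt;p&gt;The arithmetic behind this is worth spelling out. A back-of-the-envelope sketch, assuming each looping virtual user completes one request per (response time + think time):&lt;/p&gt;

```python
# Effective request rate for looping virtual users.
# Each user fires one request every (response time + think time) seconds.
def requests_per_second(users, response_time_s, think_time_s=0.0):
    return users / (response_time_s + think_time_s)

# 100 users, 100 ms responses:
no_think = requests_per_second(100, 0.1)         # 1000 rps: robot traffic
with_think = requests_per_second(100, 0.1, 5.0)  # about 19.6 rps: human-like
```

&lt;p&gt;With a 5-second think time, the same 100 users generate roughly 20 requests per second instead of 1000.&lt;/p&gt;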

&lt;h2&gt;
  
  
  3️⃣ No real production analytics
&lt;/h2&gt;

&lt;p&gt;Did you build your load model based on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Real traffic distribution?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Real endpoint usage ratios?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Peak hour data?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Seasonal spikes?&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Or just:&lt;/p&gt;

&lt;p&gt;“We expect around 1000 users.”&lt;/p&gt;

&lt;p&gt;Capacity planning without production analytics is guesswork.&lt;/p&gt;

&lt;p&gt;And guesswork doesn’t survive Black Friday traffic.&lt;/p&gt;

&lt;h2&gt;
  
  
  4️⃣ Test duration too short
&lt;/h2&gt;

&lt;p&gt;30 minutes ≠ production reality.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;0–30m&lt;/strong&gt; ✅ Everything looks fine&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2h&lt;/strong&gt; ✖ Memory pressure · Connection pool fragmentation&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4h&lt;/strong&gt; ✖ Cache eviction thrashing · GC pauses grow longer&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6h&lt;/strong&gt; ✖ Thread pool starvation · Response times double&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;12h+&lt;/strong&gt; ✖ OOM kills begin 🔴 · Silent data corruption&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you test only for 30 minutes, you only validate startup behavior.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;Load testing is not about running tests.&lt;/p&gt;

&lt;p&gt;It’s about modeling reality.&lt;/p&gt;

&lt;p&gt;And reality is always more complex than your script.&lt;/p&gt;

&lt;p&gt;If you want to move from “running load tests” to actually understanding system behavior under load, I cover workload modeling, performance criteria, monitoring, and real-world strategy step-by-step in my course:&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://www.udemy.com/course/performance-testing-fundamentals-from-basics-to-hands-on/?couponCode=DEV-COMMUNITY-COUPON" rel="noopener noreferrer"&gt;Performance Testing Fundamentals: From Basics to Hands-On (Udemy)&lt;/a&gt;&lt;/p&gt;

</description>
      <category>performance</category>
      <category>loadtesting</category>
    </item>
    <item>
      <title>Why Percentiles Matter More Than Average Response Time in Performance Testing</title>
      <dc:creator>Oleh Koren</dc:creator>
      <pubDate>Mon, 09 Feb 2026 09:22:01 +0000</pubDate>
      <link>https://forem.com/oleh_koren/why-percentiles-matter-more-than-average-response-time-in-performance-testing-37d7</link>
      <guid>https://forem.com/oleh_koren/why-percentiles-matter-more-than-average-response-time-in-performance-testing-37d7</guid>
      <description>&lt;p&gt;When analyzing load test results, many teams highlight a single number:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Average response time = 1.2 seconds&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And that number is often presented as the main indicator of system performance.&lt;/p&gt;

&lt;p&gt;The problem?&lt;br&gt;
&lt;strong&gt;Average response time can lie.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you rely only on the mean, you might completely miss serious performance issues affecting real users.&lt;/p&gt;

&lt;p&gt;Let’s break this down.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why Average Response Time Is Misleading
&lt;/h2&gt;

&lt;p&gt;The average (mean) is calculated as:&lt;br&gt;
&lt;code&gt;Sum of all response times / Total number of requests&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Simple.&lt;/p&gt;

&lt;p&gt;But averages are highly sensitive to outliers.&lt;/p&gt;

&lt;p&gt;Imagine this response time distribution (in milliseconds):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csvs"&gt;&lt;code&gt;&lt;span class="mf"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;110&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;120&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;130&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;140&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;150&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;5000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2rj40a5orw67p2ghavkv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2rj40a5orw67p2ghavkv.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now calculate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Average&lt;/strong&gt; ≈ 821 ms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Median&lt;/strong&gt; = 130 ms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s a huge difference.&lt;br&gt;
One slow request (5000 ms) drastically shifts the average, even though most users experienced fast responses.&lt;/p&gt;

&lt;p&gt;Now imagine the opposite situation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csvs"&gt;&lt;code&gt;&lt;span class="mf"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;110&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;120&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;130&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;4420&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;4620&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;4920&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiaranoc00gjcynrjguq2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiaranoc00gjcynrjguq2.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Average&lt;/strong&gt; ≈ 2060 ms&lt;/li&gt;
&lt;li&gt;But 3 out of 7 users waited 4+ seconds.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Does the average really represent user experience?&lt;/p&gt;

&lt;p&gt;Not even close.&lt;/p&gt;
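&lt;p&gt;Both distributions are easy to verify. A quick sketch using Python's statistics module:&lt;/p&gt;

```python
from statistics import mean, median

# One extreme outlier: the average jumps, the median stays low.
one_outlier = [100, 110, 120, 130, 140, 150, 5000]
print(round(mean(one_outlier)), median(one_outlier))   # 821 130

# Bimodal distribution: nearly half the users waited over 4 seconds.
bimodal = [100, 110, 120, 130, 4420, 4620, 4920]
print(round(mean(bimodal)), median(bimodal))           # 2060 130
```

&lt;p&gt;Notice that the median is 130 ms in both cases, so even the median hides the second distribution's slow half. That is exactly why higher percentiles like P90 and P95 matter.&lt;/p&gt;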

&lt;h2&gt;
  
  
  What Percentiles Actually Show
&lt;/h2&gt;

&lt;p&gt;Percentiles answer a much more meaningful question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"How fast were responses for most users?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Definition&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The P95 response time means:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;95% of all requests were completed in this time or faster.&lt;br&gt;
Only 5% were slower.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Similarly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;P90 → 90% of requests are faster than this value&lt;/li&gt;
&lt;li&gt;P99 → 99% of requests are faster than this value&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How Percentiles Are Calculated
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Sort all response times from smallest to largest.&lt;/li&gt;
&lt;li&gt;Determine the percentile position.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Basic intuition formula:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Position = (Percentile / 100) × N
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Percentile = for example 95 (for P95)&lt;/li&gt;
&lt;li&gt;Dividing by 100 converts the percentage into a decimal (95% → 0.95)&lt;/li&gt;
&lt;li&gt;N = total number of requests&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;br&gt;
If you have 1000 requests:&lt;br&gt;
&lt;code&gt;P95 position = 0.95 × 1000 = 950&lt;/code&gt;&lt;br&gt;
The value around the 950th position in the sorted list represents your P95.&lt;/p&gt;

&lt;p&gt;In practice, different tools may use slightly different formulas and interpolation methods, but the core idea remains the same:&lt;/p&gt;

&lt;p&gt;Percentiles describe distribution, not averages.&lt;/p&gt;

&lt;p&gt;No magic. Just distribution awareness.&lt;/p&gt;
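&lt;p&gt;The nearest-rank version of this calculation fits in a few lines (a simplified sketch; as noted, real tools may interpolate between positions):&lt;/p&gt;

```python
import math

def percentile(values, p):
    # Nearest-rank method: sort, then take the value at
    # position ceil(p/100 * N), 1-indexed.
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank - 1, 0)]

data = [100, 110, 120, 130, 140, 150, 5000]
p95 = percentile(data, 95)   # rank = ceil(6.65) = 7 → 5000
p50 = percentile(data, 50)   # rank = ceil(3.5) = 4 → 130
```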

&lt;h2&gt;
  
  
  Real-World Example from Load Testing
&lt;/h2&gt;

&lt;p&gt;Let’s say during a load test you get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Average response time = 3981 ms&lt;/li&gt;
&lt;li&gt;Median (P50) = 3451 ms&lt;/li&gt;
&lt;li&gt;P90 = 7325 ms&lt;/li&gt;
&lt;li&gt;P95 = 9212 ms&lt;/li&gt;
&lt;li&gt;P99 = 12760 ms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyza6m6b96052894suvy4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyza6m6b96052894suvy4.png" alt=" "&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn71h3la6ukbhk3vsbxfz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn71h3la6ukbhk3vsbxfz.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you only report the average:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“The system responds in about 4 seconds.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Sounds acceptable.&lt;/p&gt;

&lt;p&gt;But reality:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;10% of users wait more than 7.3 seconds&lt;/li&gt;
&lt;li&gt;5% wait more than 9.2 seconds&lt;/li&gt;
&lt;li&gt;1% wait more than 12.7 seconds&lt;/li&gt;
&lt;/ul&gt;
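&lt;p&gt;You can reproduce this effect with a made-up sample (the numbers below are synthetic, not the test data above): a mostly fast distribution with a slow tail produces a comfortable-looking mean while the high percentiles tell a very different story:&lt;/p&gt;

```python
import statistics

# Synthetic latencies (ms): 94% fast, 4.5% slow, 1.5% very slow
latencies = [200] * 940 + [5000] * 45 + [13000] * 15
latencies.sort()

mean = statistics.mean(latencies)
p50 = latencies[499]   # nearest-rank position 500 of 1000
p95 = latencies[949]   # position 950
p99 = latencies[989]   # position 990

print(f"mean={mean:.0f} ms  P50={p50} ms  P95={p95} ms  P99={p99} ms")
# mean=608 ms  P50=200 ms  P95=5000 ms  P99=13000 ms
```

The mean sits around 0.6 seconds and looks healthy, yet 1.5% of these synthetic requests take 13 seconds, the same kind of gap as between "about 4 seconds" and the 12.7-second P99 in the report above.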

&lt;p&gt;Now ask yourself:&lt;/p&gt;

&lt;p&gt;Would 1% of users waiting 12+ seconds be acceptable in your production system?&lt;/p&gt;

&lt;p&gt;For e-commerce during checkout?&lt;br&gt;
For login?&lt;br&gt;
For payment processing?&lt;/p&gt;

&lt;p&gt;Probably not.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Percentiles Represent User Experience Better
&lt;/h2&gt;

&lt;p&gt;Users don’t experience averages. They experience their own request.&lt;/p&gt;

&lt;p&gt;If your P95 is high, that means a noticeable portion of users is suffering.&lt;/p&gt;

&lt;p&gt;In modern systems, especially:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High-concurrency APIs&lt;/li&gt;
&lt;li&gt;Distributed microservices&lt;/li&gt;
&lt;li&gt;Cloud-native environments&lt;/li&gt;
&lt;li&gt;Systems with auto-scaling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latency spikes are normal.&lt;/p&gt;

&lt;p&gt;Percentiles help you detect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Queue buildup&lt;/li&gt;
&lt;li&gt;Thread pool saturation&lt;/li&gt;
&lt;li&gt;Garbage collection pauses&lt;/li&gt;
&lt;li&gt;Network bottlenecks&lt;/li&gt;
&lt;li&gt;Lock contention&lt;/li&gt;
&lt;li&gt;Cold starts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The average hides all of that.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;Next time someone reports only the average response time, ask: &lt;/p&gt;

&lt;p&gt;What does P95 look like? &lt;br&gt;
What about P99?&lt;/p&gt;

&lt;p&gt;Because performance is about distribution —&lt;br&gt;
and users feel the slowest moments.&lt;/p&gt;

&lt;p&gt;If you want to better understand performance testing and go beyond just running tools, I cover this topic in more depth in my course:&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://www.udemy.com/course/performance-testing-fundamentals-from-basics-to-hands-on/?couponCode=DEV-COMMUNITY-COUPON" rel="noopener noreferrer"&gt;Performance Testing Fundamentals: From Basics to Hands-On (Udemy)&lt;/a&gt;&lt;/p&gt;

</description>
      <category>performance</category>
      <category>loadtesting</category>
      <category>performancetesting</category>
      <category>testing</category>
    </item>
  </channel>
</rss>
