<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Jamie Gray</title>
    <description>The latest articles on Forem by Jamie Gray (@jamie_gray_ai).</description>
    <link>https://forem.com/jamie_gray_ai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3804105%2Fe71b12d7-60fc-4dd0-8ff9-11d486cae224.jpg</url>
      <title>Forem: Jamie Gray</title>
      <link>https://forem.com/jamie_gray_ai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/jamie_gray_ai"/>
    <language>en</language>
    <item>
      <title>How I Approach Evaluation When Building AI Features</title>
      <dc:creator>Jamie Gray</dc:creator>
      <pubDate>Mon, 23 Mar 2026 09:59:22 +0000</pubDate>
      <link>https://forem.com/jamie_gray_ai/how-i-approach-evaluation-when-building-ai-features-4elh</link>
      <guid>https://forem.com/jamie_gray_ai/how-i-approach-evaluation-when-building-ai-features-4elh</guid>
      <description>&lt;p&gt;Building an AI feature is not the same as shipping traditional software.&lt;/p&gt;

&lt;p&gt;In classic software, you write code, test it, and deploy it. Deployment is usually a finish line.&lt;/p&gt;

&lt;p&gt;With AI features, deployment is just the beginning.&lt;/p&gt;

&lt;p&gt;That is one of the biggest mindset shifts I have had while working on AI systems. The question is not only whether a feature works during development. The bigger question is whether it keeps working well when real users, messy inputs, changing data, and production constraints enter the picture.&lt;/p&gt;

&lt;p&gt;That is why I take evaluation seriously.&lt;/p&gt;

&lt;p&gt;Not as a one-time quality check.&lt;br&gt;
Not as something to do right before launch.&lt;br&gt;
But as an ongoing part of building the product.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why evaluation has to be continuous
&lt;/h2&gt;

&lt;p&gt;AI systems are different because their behavior is not fully fixed.&lt;/p&gt;

&lt;p&gt;Even if the code around the model does not change, the outputs can still shift because of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;new user inputs&lt;/li&gt;
&lt;li&gt;different context data&lt;/li&gt;
&lt;li&gt;retrieval quality changes&lt;/li&gt;
&lt;li&gt;prompt changes&lt;/li&gt;
&lt;li&gt;model updates&lt;/li&gt;
&lt;li&gt;distribution drift in real-world usage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means evaluation cannot be treated as a checkbox.&lt;/p&gt;

&lt;p&gt;It has to be part of the product lifecycle.&lt;/p&gt;

&lt;p&gt;I want to know not just whether the feature looked good in a demo, but whether it remains useful, stable, and trustworthy as conditions change.&lt;/p&gt;

&lt;p&gt;That is the real test.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I actually care about when evaluating AI features
&lt;/h2&gt;

&lt;p&gt;When I evaluate an AI feature, I usually care most about five things:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Accuracy
&lt;/h3&gt;

&lt;p&gt;Is the output correct?&lt;/p&gt;

&lt;p&gt;This sounds obvious, but it is still the first thing I check. If the system produces wrong answers, wrong classifications, wrong summaries, or wrong structured data, nothing else matters much.&lt;/p&gt;

&lt;p&gt;That said, accuracy in AI systems is often contextual. Sometimes “correct” means factually correct. Sometimes it means aligned with a business rule. Sometimes it means sufficiently useful for the task.&lt;/p&gt;

&lt;p&gt;So I try to define accuracy in a way that matches the real product outcome, not just a vague technical idea of correctness.&lt;/p&gt;
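&lt;p&gt;As a small sketch of what that looks like in code (the task, test cases, and the &lt;code&gt;is_correct&lt;/code&gt; rule here are all hypothetical), I define correctness as a product-specific predicate and score against it, rather than using raw string equality:&lt;/p&gt;

```python
# Sketch: accuracy measured against a product-specific rule,
# not raw string equality. Cases and rule are illustrative.

def is_correct(output: str, expected_category: str) -> bool:
    # "Correct" here means the output names the expected category,
    # regardless of casing or surrounding text.
    return expected_category.lower() in output.lower()

def accuracy(cases):
    hits = sum(1 for output, expected in cases if is_correct(output, expected))
    return hits / len(cases)

cases = [
    ("Category: Billing", "billing"),
    ("This looks like a billing issue.", "billing"),
    ("Unrelated text", "refund"),
]
print(accuracy(cases))  # 2 of 3 cases pass
```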

&lt;h3&gt;
  
  
  2. Relevance
&lt;/h3&gt;

&lt;p&gt;Even technically correct output can still be unhelpful.&lt;/p&gt;

&lt;p&gt;A feature might produce something reasonable, but if it does not solve the user’s actual need, it is not high quality.&lt;/p&gt;

&lt;p&gt;That is why I evaluate whether the output is relevant to the request, the workflow, and the context in which the feature is being used.&lt;/p&gt;

&lt;p&gt;This matters a lot in AI systems because models are often capable of producing plausible but slightly off-target results.&lt;/p&gt;

&lt;p&gt;Those are dangerous because they look good at first glance.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Consistency
&lt;/h3&gt;

&lt;p&gt;If two similar inputs produce wildly different output quality, the product will feel unreliable.&lt;/p&gt;

&lt;p&gt;Consistency matters because users form expectations fast.&lt;/p&gt;

&lt;p&gt;If a feature works beautifully once and then performs poorly the next time, trust drops quickly.&lt;/p&gt;

&lt;p&gt;So I pay attention to whether the system behaves predictably across similar cases, especially around formatting, decision logic, quality level, and error handling.&lt;/p&gt;

&lt;p&gt;Consistency is one of the most underrated parts of AI quality.&lt;/p&gt;
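&lt;p&gt;A minimal way to make consistency measurable (the scorer and the scores here are illustrative stand-ins for a real quality metric or rubric grader) is to compare quality across near-duplicate inputs and look at the spread:&lt;/p&gt;

```python
# Sketch: a consistency check over near-duplicate inputs.
# score() is a hypothetical quality scorer returning 0.0 to 1.0.

def consistency_gap(score, variants):
    # Spread between best and worst quality across paraphrased inputs.
    scores = [score(v) for v in variants]
    return max(scores) - min(scores)

variants = [
    "Summarize this invoice.",
    "Please summarize the invoice.",
    "Give me a summary of this invoice.",
]

# Illustrative scores: one paraphrase does much worse than the others.
fake_scores = {variants[0]: 0.9, variants[1]: 0.85, variants[2]: 0.4}
gap = consistency_gap(fake_scores.get, variants)
print(round(gap, 2))  # 0.5: a large gap flags inconsistent behavior
```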

&lt;h3&gt;
  
  
  4. Safety and failure behavior
&lt;/h3&gt;

&lt;p&gt;A feature is not only defined by when it works.&lt;/p&gt;

&lt;p&gt;It is also defined by how it behaves when it does not work.&lt;/p&gt;

&lt;p&gt;I want to know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;does it fail clearly?&lt;/li&gt;
&lt;li&gt;does it avoid unsafe output?&lt;/li&gt;
&lt;li&gt;does it avoid pretending to know more than it knows?&lt;/li&gt;
&lt;li&gt;does it return a controlled result when confidence is low?&lt;/li&gt;
&lt;li&gt;does it trigger fallback logic appropriately?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is part of evaluation too.&lt;/p&gt;

&lt;p&gt;A system that occasionally says “I cannot produce a reliable result here” may be much better than a system that always returns something confident but questionable.&lt;/p&gt;
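&lt;p&gt;A sketch of that idea (the threshold and field names are hypothetical, not from any specific framework): gate the response on confidence and return a controlled fallback otherwise.&lt;/p&gt;

```python
# Sketch: confidence-gated responses with an explicit fallback.
# Threshold and field names are illustrative.

FALLBACK = {"status": "unavailable",
            "message": "I cannot produce a reliable result here."}

def gated_response(result: dict, threshold: float = 0.7) -> dict:
    # Return a controlled fallback instead of a low-confidence answer.
    if result.get("confidence", 0.0) >= threshold:
        return {"status": "ok", "answer": result["answer"]}
    return FALLBACK

print(gated_response({"answer": "42", "confidence": 0.95})["status"])  # ok
print(gated_response({"answer": "42", "confidence": 0.3})["status"])   # unavailable
```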

&lt;h3&gt;
  
  
  5. Usability
&lt;/h3&gt;

&lt;p&gt;Even a technically strong model can create a bad product if the experience is clumsy.&lt;/p&gt;

&lt;p&gt;That is why I also evaluate things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;response format&lt;/li&gt;
&lt;li&gt;readability&lt;/li&gt;
&lt;li&gt;latency&lt;/li&gt;
&lt;li&gt;whether the output is actionable&lt;/li&gt;
&lt;li&gt;whether the UI can use the response cleanly&lt;/li&gt;
&lt;li&gt;whether the feature helps the user move forward&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Usability is not separate from model quality.&lt;/p&gt;

&lt;p&gt;In product terms, usability is quality.&lt;/p&gt;

&lt;h2&gt;
  
  
  Automated evaluation is useful, but limited
&lt;/h2&gt;

&lt;p&gt;I like automated tests.&lt;br&gt;
I use them often.&lt;br&gt;
But I do not think they are enough for AI systems.&lt;/p&gt;

&lt;p&gt;Automated evaluation is great for checking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;known test cases&lt;/li&gt;
&lt;li&gt;regression behavior&lt;/li&gt;
&lt;li&gt;output shape&lt;/li&gt;
&lt;li&gt;schema compliance&lt;/li&gt;
&lt;li&gt;business rules&lt;/li&gt;
&lt;li&gt;scoring against benchmark datasets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are all valuable.&lt;/p&gt;

&lt;p&gt;They help catch obvious breakages early.&lt;br&gt;
They make iteration safer.&lt;br&gt;
They create a baseline.&lt;/p&gt;

&lt;p&gt;But automated evaluation usually has limits.&lt;/p&gt;

&lt;p&gt;It may miss subtle quality issues.&lt;br&gt;
It may fail to capture user expectations.&lt;br&gt;
It may not notice when output is technically valid but practically weak.&lt;/p&gt;

&lt;p&gt;So I treat automation as necessary, but not sufficient.&lt;/p&gt;
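&lt;p&gt;The automated layer can be as plain as a few assertions. This sketch (field names and rules are illustrative) checks output shape, schema compliance, and one business rule in a single pass:&lt;/p&gt;

```python
# Sketch: automated checks for output shape, schema compliance,
# and a business rule. The fields and allowed values are illustrative.

REQUIRED_FIELDS = {"summary", "category", "confidence"}
ALLOWED_CATEGORIES = {"billing", "shipping", "other"}

def check_output(result: dict) -> list:
    problems = []
    missing = REQUIRED_FIELDS - set(result)
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if result.get("category") not in ALLOWED_CATEGORIES:
        problems.append("category outside allowed set")
    conf = result.get("confidence")
    if not (isinstance(conf, float) and 1.0 >= conf >= 0.0):
        problems.append("confidence not in [0, 1]")
    return problems

good = {"summary": "ok", "category": "billing", "confidence": 0.9}
bad = {"summary": "ok", "category": "unknown"}
print(check_output(good))  # []
print(check_output(bad))   # three problems reported
```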

&lt;h2&gt;
  
  
  Human evaluation still matters a lot
&lt;/h2&gt;

&lt;p&gt;One of the most important lessons in AI product work is that human review still matters.&lt;/p&gt;

&lt;p&gt;AI output quality is often contextual, and context is hard to fully encode in automated checks.&lt;/p&gt;

&lt;p&gt;That is why I like including some kind of human evaluation loop, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;manual review of outputs&lt;/li&gt;
&lt;li&gt;comparison between versions&lt;/li&gt;
&lt;li&gt;user feedback collection&lt;/li&gt;
&lt;li&gt;spot checks on edge cases&lt;/li&gt;
&lt;li&gt;domain expert review for sensitive use cases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Human evaluation helps catch issues that metrics alone can miss.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;tone feels wrong&lt;/li&gt;
&lt;li&gt;reasoning is shallow&lt;/li&gt;
&lt;li&gt;output is technically correct but not useful&lt;/li&gt;
&lt;li&gt;the answer misses the most important point&lt;/li&gt;
&lt;li&gt;the result feels inconsistent with user expectations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are real product issues, even if an automated score does not flag them.&lt;/p&gt;

&lt;h2&gt;
  
  
  My favorite way to think about AI evaluation
&lt;/h2&gt;

&lt;p&gt;I usually break evaluation into three layers:&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 1: Component correctness
&lt;/h3&gt;

&lt;p&gt;This is the most basic layer.&lt;/p&gt;

&lt;p&gt;I ask:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;does the endpoint work?&lt;/li&gt;
&lt;li&gt;is the request valid?&lt;/li&gt;
&lt;li&gt;is the output schema correct?&lt;/li&gt;
&lt;li&gt;does the system parse and return the result properly?&lt;/li&gt;
&lt;li&gt;do rules and validations work as expected?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This layer is mostly about engineering correctness.&lt;/p&gt;
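&lt;p&gt;These checks need no model at all. A sketch, with hypothetical &lt;code&gt;validate_request&lt;/code&gt; and &lt;code&gt;parse_response&lt;/code&gt; steps standing in for the real ones:&lt;/p&gt;

```python
# Sketch: layer-1 checks as plain tests. The point is that request
# validation and output parsing are testable without any model call.
import json

def validate_request(req: dict) -> bool:
    return isinstance(req.get("text"), str) and len(req["text"]) > 0

def parse_response(raw: str) -> dict:
    data = json.loads(raw)
    if "summary" not in data:
        raise ValueError("response missing 'summary'")
    return data

assert validate_request({"text": "hello"})
assert not validate_request({"text": ""})
assert parse_response('{"summary": "ok"}')["summary"] == "ok"
print("layer 1 checks passed")
```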

&lt;h3&gt;
  
  
  Layer 2: Workflow quality
&lt;/h3&gt;

&lt;p&gt;Here I ask whether the full feature works in practice.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;does retrieval bring in the right context?&lt;/li&gt;
&lt;li&gt;does the prompt produce the intended behavior?&lt;/li&gt;
&lt;li&gt;does the output fit the product need?&lt;/li&gt;
&lt;li&gt;does fallback behavior work when needed?&lt;/li&gt;
&lt;li&gt;does latency stay in an acceptable range?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where many real issues appear.&lt;/p&gt;

&lt;p&gt;The model may work fine, but the workflow around it may be weak.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 3: Real user value
&lt;/h3&gt;

&lt;p&gt;This is the highest layer.&lt;/p&gt;

&lt;p&gt;I ask:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;is this actually helping users?&lt;/li&gt;
&lt;li&gt;is it reducing effort?&lt;/li&gt;
&lt;li&gt;is it improving speed or quality?&lt;/li&gt;
&lt;li&gt;are people trusting it?&lt;/li&gt;
&lt;li&gt;are they using it again?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the layer that matters most in the long run.&lt;/p&gt;

&lt;p&gt;A feature can pass technical tests and still fail to create value.&lt;/p&gt;

&lt;h2&gt;
  
  
  I care a lot about edge cases
&lt;/h2&gt;

&lt;p&gt;A lot of AI features look strong on normal examples.&lt;/p&gt;

&lt;p&gt;That is not enough.&lt;/p&gt;

&lt;p&gt;The real test is how they behave when inputs are incomplete, ambiguous, messy, repetitive, or just strange.&lt;/p&gt;

&lt;p&gt;That is why I deliberately evaluate edge cases such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;missing context&lt;/li&gt;
&lt;li&gt;contradictory input&lt;/li&gt;
&lt;li&gt;unexpected formatting&lt;/li&gt;
&lt;li&gt;overly long input&lt;/li&gt;
&lt;li&gt;low-signal input&lt;/li&gt;
&lt;li&gt;near-duplicate requests&lt;/li&gt;
&lt;li&gt;malformed documents&lt;/li&gt;
&lt;li&gt;empty or partial results from upstream systems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Edge cases are where reliability becomes visible.&lt;/p&gt;

&lt;p&gt;They also reveal whether the system is truly engineered or just loosely connected around a model call.&lt;/p&gt;
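&lt;p&gt;In practice I run edge cases like these as a small parameterized suite. A sketch (the &lt;code&gt;handle&lt;/code&gt; pipeline is a stand-in) asserting only that every case yields a controlled result instead of an unhandled exception:&lt;/p&gt;

```python
# Sketch: deliberately pushing edge cases through the pipeline.
# handle() is a hypothetical entry point; the check is that every
# case produces a controlled status, never a crash.

EDGE_CASES = [
    "",                      # missing context
    "yes no yes no",         # contradictory input
    "x" * 100_000,           # overly long input
    "\x00\x01 garbled",      # unexpected formatting
]

def handle(text: str) -> dict:
    # Stand-in pipeline: reject what it cannot process, never crash.
    if not text.strip() or len(text) > 10_000:
        return {"status": "rejected"}
    return {"status": "ok"}

results = [handle(case)["status"] for case in EDGE_CASES]
print(results)  # ['rejected', 'ok', 'rejected', 'ok']
```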

&lt;h2&gt;
  
  
  Feedback loops are part of evaluation
&lt;/h2&gt;

&lt;p&gt;Evaluation should not stop once a feature goes live.&lt;/p&gt;

&lt;p&gt;After launch, I want to learn from real usage.&lt;/p&gt;

&lt;p&gt;That means looking at signals like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;user ratings&lt;/li&gt;
&lt;li&gt;correction patterns&lt;/li&gt;
&lt;li&gt;support complaints&lt;/li&gt;
&lt;li&gt;failed requests&lt;/li&gt;
&lt;li&gt;fallback frequency&lt;/li&gt;
&lt;li&gt;manual review findings&lt;/li&gt;
&lt;li&gt;drift in quality over time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These signals help answer an important question:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is the system getting better, staying flat, or quietly getting worse?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Without feedback loops, it is very easy to assume an AI feature is healthy just because no one is actively reporting disaster.&lt;/p&gt;

&lt;p&gt;That is not a strong standard.&lt;/p&gt;
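&lt;p&gt;Even one tracked signal helps answer that question. A sketch (the numbers are illustrative) watching fallback frequency week over week:&lt;/p&gt;

```python
# Sketch: tracking one post-launch signal (fallback frequency) over
# time to answer "better, flat, or quietly worse?". Numbers are
# illustrative.

def fallback_rate(week: list) -> float:
    # week: one boolean per request, True when it hit the fallback path
    return sum(week) / len(week)

weeks = [
    [False] * 95 + [True] * 5,    # 5% fallbacks
    [False] * 90 + [True] * 10,   # 10%
    [False] * 80 + [True] * 20,   # 20%: quietly getting worse
]

rates = [round(fallback_rate(w), 2) for w in weeks]
print(rates)  # [0.05, 0.1, 0.2]
```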

&lt;h2&gt;
  
  
  Metrics matter, but I try not to worship them
&lt;/h2&gt;

&lt;p&gt;Metrics are useful.&lt;br&gt;
I rely on them.&lt;br&gt;
But I also think it is easy to over-trust them.&lt;/p&gt;

&lt;p&gt;A number can look clean while the user experience is getting worse.&lt;/p&gt;

&lt;p&gt;For example, a feature may have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;strong request success rate&lt;/li&gt;
&lt;li&gt;good average latency&lt;/li&gt;
&lt;li&gt;valid JSON output&lt;/li&gt;
&lt;li&gt;stable infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;and still be underperforming from a product perspective.&lt;/p&gt;

&lt;p&gt;Maybe the answers are too generic.&lt;br&gt;
Maybe the model is missing nuance.&lt;br&gt;
Maybe users are redoing the work manually.&lt;br&gt;
Maybe the feature is technically “working” but not actually helping.&lt;/p&gt;

&lt;p&gt;So I like metrics, but I always want them paired with real qualitative review.&lt;/p&gt;

&lt;h2&gt;
  
  
  My practical rule
&lt;/h2&gt;

&lt;p&gt;When I evaluate an AI feature, I usually come back to one simple question:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If this feature became important to users tomorrow, would I trust the current evaluation process to catch quality problems early?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If the answer is no, the evaluation setup is probably too weak.&lt;/p&gt;

&lt;p&gt;That usually means one of these is missing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;representative test cases&lt;/li&gt;
&lt;li&gt;regression checks&lt;/li&gt;
&lt;li&gt;edge-case coverage&lt;/li&gt;
&lt;li&gt;human review&lt;/li&gt;
&lt;li&gt;real-user feedback loops&lt;/li&gt;
&lt;li&gt;monitoring for drift&lt;/li&gt;
&lt;li&gt;clear definitions of quality&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Evaluation is not just about measuring the system.&lt;/p&gt;

&lt;p&gt;It is about building confidence that the system can keep improving without quietly breaking.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thought
&lt;/h2&gt;

&lt;p&gt;When building AI features, I do not think evaluation is something you do after the work.&lt;/p&gt;

&lt;p&gt;I think it is part of the work.&lt;/p&gt;

&lt;p&gt;It shapes how you design the system.&lt;br&gt;
It affects how safely you can iterate.&lt;br&gt;
It influences how much trust the product earns.&lt;br&gt;
And it determines whether the feature can survive real-world usage instead of just looking good in a test environment.&lt;/p&gt;

&lt;p&gt;To me, strong AI evaluation means combining engineering discipline with product thinking.&lt;/p&gt;

&lt;p&gt;It means checking correctness, usefulness, consistency, safety, and real user value.&lt;br&gt;
It means using both automated checks and human review.&lt;br&gt;
And it means accepting that quality is something you keep managing, not something you permanently finish.&lt;/p&gt;

&lt;p&gt;That is how I approach evaluation when building AI features.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Closing question for DEV readers:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you evaluate AI features, what do you trust more: automated test coverage, human review, or real user feedback after launch?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>testing</category>
      <category>evaluation</category>
    </item>
    <item>
      <title>Why I Insist on Clear Contracts for Robust AI Systems</title>
      <dc:creator>Jamie Gray</dc:creator>
      <pubDate>Fri, 13 Mar 2026 02:45:31 +0000</pubDate>
      <link>https://forem.com/jamie_gray_ai/why-i-insist-on-clear-contracts-for-robust-ai-systems-10if</link>
      <guid>https://forem.com/jamie_gray_ai/why-i-insist-on-clear-contracts-for-robust-ai-systems-10if</guid>
      <description>&lt;p&gt;One of the fastest ways to make an AI system fragile is to leave too much undefined.&lt;/p&gt;

&lt;p&gt;Undefined inputs.&lt;br&gt;
Undefined outputs.&lt;br&gt;
Undefined expectations.&lt;br&gt;
Undefined failure behavior.&lt;/p&gt;

&lt;p&gt;That is why I insist on &lt;strong&gt;clear contracts&lt;/strong&gt; when building AI systems.&lt;/p&gt;

&lt;p&gt;A lot of people think contracts are mostly a backend or API concern. They think about them as documentation, schemas, or something engineers add later after the “interesting” AI work is done.&lt;/p&gt;

&lt;p&gt;I see it very differently.&lt;/p&gt;

&lt;p&gt;In AI systems, contracts are one of the most important tools we have for reducing chaos.&lt;/p&gt;

&lt;p&gt;Because the truth is simple: the model is already probabilistic enough.&lt;/p&gt;

&lt;p&gt;If the rest of the system is vague too, reliability drops fast.&lt;/p&gt;
&lt;h2&gt;
  
  
  What I mean by “clear contracts”
&lt;/h2&gt;

&lt;p&gt;When I say “clear contracts,” I do not just mean an API spec.&lt;/p&gt;

&lt;p&gt;I mean every important boundary in the system should be explicit:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what input is accepted&lt;/li&gt;
&lt;li&gt;what shape the output must follow&lt;/li&gt;
&lt;li&gt;what fields are required&lt;/li&gt;
&lt;li&gt;what happens when context is missing&lt;/li&gt;
&lt;li&gt;what errors are possible&lt;/li&gt;
&lt;li&gt;what fallback behavior exists&lt;/li&gt;
&lt;li&gt;what downstream systems can safely assume&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A contract is really an agreement between parts of the system.&lt;/p&gt;

&lt;p&gt;It says:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;“If you give me this, I will give you that.”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;And if that agreement is weak, every layer starts making guesses.&lt;/p&gt;

&lt;p&gt;That is when systems become brittle.&lt;/p&gt;
&lt;h2&gt;
  
  
  AI systems get messy faster than normal systems
&lt;/h2&gt;

&lt;p&gt;In traditional software, bad contracts already cause pain.&lt;/p&gt;

&lt;p&gt;In AI systems, they cause even more pain because there is already more uncertainty in the stack.&lt;/p&gt;

&lt;p&gt;You may have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;user-generated input&lt;/li&gt;
&lt;li&gt;retrieval context from multiple sources&lt;/li&gt;
&lt;li&gt;prompt construction logic&lt;/li&gt;
&lt;li&gt;one or more model providers&lt;/li&gt;
&lt;li&gt;output parsing&lt;/li&gt;
&lt;li&gt;business rules&lt;/li&gt;
&lt;li&gt;UI rendering&lt;/li&gt;
&lt;li&gt;monitoring and evaluation layers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If even two or three of those layers are loosely defined, bugs become harder to track and failures become harder to contain.&lt;/p&gt;

&lt;p&gt;That is why I care so much about strong boundaries.&lt;/p&gt;

&lt;p&gt;The model should be the only place where uncertainty is allowed.&lt;br&gt;
Everything around it should be as disciplined as possible.&lt;/p&gt;
&lt;h2&gt;
  
  
  Weak contracts create hidden bugs
&lt;/h2&gt;

&lt;p&gt;One thing I have learned over time is that weak contracts often do not fail loudly.&lt;/p&gt;

&lt;p&gt;They fail quietly.&lt;/p&gt;

&lt;p&gt;And that is what makes them dangerous.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a missing field gets interpreted as an empty value&lt;/li&gt;
&lt;li&gt;a model response changes format slightly and breaks parsing&lt;/li&gt;
&lt;li&gt;the frontend assumes a value always exists when it does not&lt;/li&gt;
&lt;li&gt;a service sends partial context and no one notices&lt;/li&gt;
&lt;li&gt;a fallback path returns a different structure than the primary path&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are the kinds of issues that do not always cause immediate crashes.&lt;/p&gt;

&lt;p&gt;Instead, they create inconsistent behavior.&lt;/p&gt;

&lt;p&gt;The product feels unstable.&lt;br&gt;
The team wastes time debugging.&lt;br&gt;
Trust drops.&lt;br&gt;
And everyone starts blaming the model when the real issue is poor system discipline.&lt;/p&gt;

&lt;p&gt;That is why I would rather define contracts early than debug ambiguous behavior later.&lt;/p&gt;
&lt;h2&gt;
  
  
  Clear contracts make AI systems easier to trust
&lt;/h2&gt;

&lt;p&gt;Trust is a huge part of AI product design.&lt;/p&gt;

&lt;p&gt;Users do not trust a system because it sounds smart.&lt;/p&gt;

&lt;p&gt;They trust it because it behaves in a way that feels consistent and understandable.&lt;/p&gt;

&lt;p&gt;Strong contracts help create that experience.&lt;/p&gt;

&lt;p&gt;If the system has well-defined inputs and outputs, it becomes easier to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;validate data before processing&lt;/li&gt;
&lt;li&gt;reject malformed requests&lt;/li&gt;
&lt;li&gt;keep UI behavior consistent&lt;/li&gt;
&lt;li&gt;apply fallback logic safely&lt;/li&gt;
&lt;li&gt;measure quality over time&lt;/li&gt;
&lt;li&gt;trace failures back to specific layers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of that improves reliability.&lt;/p&gt;

&lt;p&gt;And reliability is what users experience as trust.&lt;/p&gt;
&lt;h2&gt;
  
  
  Contracts protect teams too
&lt;/h2&gt;

&lt;p&gt;I think engineers sometimes talk about contracts only in technical terms, but they also help teams collaborate better.&lt;/p&gt;

&lt;p&gt;When contracts are clear, people do not have to guess what another service or component is supposed to do.&lt;/p&gt;

&lt;p&gt;That helps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;backend engineers&lt;/li&gt;
&lt;li&gt;frontend engineers&lt;/li&gt;
&lt;li&gt;ML engineers&lt;/li&gt;
&lt;li&gt;product teams&lt;/li&gt;
&lt;li&gt;QA teams&lt;/li&gt;
&lt;li&gt;platform teams&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Clear contracts reduce coordination overhead.&lt;/p&gt;

&lt;p&gt;They make integration faster.&lt;br&gt;
They make reviews clearer.&lt;br&gt;
They make debugging less emotional because people can inspect behavior against a known agreement instead of arguing from assumptions.&lt;/p&gt;

&lt;p&gt;That matters even more in AI products, where multiple disciplines usually need to work closely together.&lt;/p&gt;
&lt;h2&gt;
  
  
  I want inputs to be strict, not “flexible”
&lt;/h2&gt;

&lt;p&gt;A common mistake in early AI systems is trying to make inputs too flexible.&lt;/p&gt;

&lt;p&gt;The thinking usually sounds like this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;“The model is smart. It can figure it out.”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Sometimes it can.&lt;/p&gt;

&lt;p&gt;But that does not mean the system should rely on that.&lt;/p&gt;

&lt;p&gt;I would much rather define:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;required fields&lt;/li&gt;
&lt;li&gt;allowed types&lt;/li&gt;
&lt;li&gt;size limits&lt;/li&gt;
&lt;li&gt;optional vs mandatory context&lt;/li&gt;
&lt;li&gt;accepted enum values&lt;/li&gt;
&lt;li&gt;clear validation errors&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Strict inputs are not a limitation.&lt;/p&gt;

&lt;p&gt;They are protection.&lt;/p&gt;

&lt;p&gt;They reduce noisy requests.&lt;br&gt;
They improve consistency.&lt;br&gt;
They make failures easier to understand.&lt;br&gt;
And they stop the model from wasting effort dealing with avoidable mess.&lt;/p&gt;

&lt;p&gt;In my experience, flexible inputs often feel convenient at first and expensive later.&lt;/p&gt;
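&lt;p&gt;A minimal sketch of such a contract, framework-free for clarity (the fields, limits, and enum values are all illustrative; in a real system a pydantic model is a natural fit for the same job):&lt;/p&gt;

```python
# Sketch: a strict input contract. Required fields, size limits,
# accepted enum values, and clear validation errors. All values
# here are illustrative.

ALLOWED_TASKS = {"summarize", "classify"}
MAX_TEXT_LEN = 8000

def validate_request(req: dict) -> list:
    # Returns a list of clear validation errors; empty means accepted.
    errors = []
    if req.get("task") not in ALLOWED_TASKS:
        errors.append("task must be one of: " + ", ".join(sorted(ALLOWED_TASKS)))
    text = req.get("text")
    if not isinstance(text, str) or not text:
        errors.append("text is required and must be a non-empty string")
    elif len(text) > MAX_TEXT_LEN:
        errors.append("text exceeds size limit")
    return errors

print(validate_request({"task": "summarize", "text": "hello"}))  # []
print(validate_request({"task": "translate", "text": ""}))       # two clear errors
```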
&lt;h2&gt;
  
  
  Output contracts are even more important
&lt;/h2&gt;

&lt;p&gt;If an AI system produces output that is going to be read by another part of the product, then output contracts matter even more than input contracts.&lt;/p&gt;

&lt;p&gt;This is where a lot of systems become fragile.&lt;/p&gt;

&lt;p&gt;The model returns something “close enough.”&lt;br&gt;
Then parsing logic tries to interpret it.&lt;br&gt;
Then downstream systems assume the result is valid.&lt;br&gt;
Then edge cases start breaking everything.&lt;/p&gt;

&lt;p&gt;That is why I strongly prefer structured outputs whenever possible.&lt;/p&gt;

&lt;p&gt;For example, instead of treating the response as a loose paragraph, I want something like:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ValidationError&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AIResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;validate_result&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;AIResult&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;AIResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;ValidationError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This does a few important things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;makes the output predictable&lt;/li&gt;
&lt;li&gt;prevents malformed data from spreading&lt;/li&gt;
&lt;li&gt;simplifies downstream code&lt;/li&gt;
&lt;li&gt;makes monitoring easier&lt;/li&gt;
&lt;li&gt;makes fallback behavior easier to implement&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A strong output contract turns a fuzzy model response into something the product can safely use.&lt;/p&gt;

&lt;p&gt;That is a big difference.&lt;/p&gt;

&lt;h2&gt;
  
  
  Contracts make failure handling better
&lt;/h2&gt;

&lt;p&gt;Another reason I insist on clear contracts is that they make failure behavior much more intentional.&lt;/p&gt;

&lt;p&gt;Without clear contracts, failure handling often becomes random.&lt;/p&gt;

&lt;p&gt;One endpoint returns &lt;code&gt;null&lt;/code&gt;.&lt;br&gt;
Another returns a partial response.&lt;br&gt;
Another returns plain text.&lt;br&gt;
Another silently retries.&lt;br&gt;
Another sends an error the frontend cannot interpret.&lt;/p&gt;

&lt;p&gt;That kind of inconsistency is painful.&lt;/p&gt;

&lt;p&gt;A better system defines failure behavior as part of the contract:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what errors are expected&lt;/li&gt;
&lt;li&gt;what error shape is returned&lt;/li&gt;
&lt;li&gt;when fallbacks are used&lt;/li&gt;
&lt;li&gt;when retries happen&lt;/li&gt;
&lt;li&gt;what the user sees&lt;/li&gt;
&lt;li&gt;what gets logged for investigation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This makes the whole application feel more stable.&lt;/p&gt;

&lt;p&gt;Users may accept a limitation.&lt;br&gt;
They rarely accept confusing behavior.&lt;/p&gt;
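&lt;p&gt;One concrete version of that contract (the shape and error codes are illustrative): every path, success or failure, primary or fallback, returns the same envelope.&lt;/p&gt;

```python
# Sketch: one error envelope shared by every path, so the frontend
# never has to interpret ad-hoc failures. Shape and codes are
# illustrative.

def error_envelope(code: str, user_message: str, retryable: bool) -> dict:
    return {
        "ok": False,
        "error": {"code": code, "retryable": retryable},
        "user_message": user_message,
    }

def success_envelope(payload: dict) -> dict:
    return {"ok": True, "data": payload}

# Primary path and fallback path return the same structure:
print(error_envelope("LOW_CONFIDENCE", "Try rephrasing your request.", False)["ok"])
print(success_envelope({"summary": "..."})["ok"])
```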

&lt;h2&gt;
  
  
  Contracts help you change systems safely
&lt;/h2&gt;

&lt;p&gt;AI systems evolve quickly.&lt;/p&gt;

&lt;p&gt;Prompts change.&lt;br&gt;
Providers change.&lt;br&gt;
Models change.&lt;br&gt;
Retrieval logic changes.&lt;br&gt;
Business rules change.&lt;/p&gt;

&lt;p&gt;That is exactly why contracts matter so much.&lt;/p&gt;

&lt;p&gt;When the inside of the system changes, the boundaries around it should stay stable whenever possible.&lt;/p&gt;

&lt;p&gt;That way, you can improve the implementation without constantly breaking everything connected to it.&lt;/p&gt;

&lt;p&gt;This is one of the biggest advantages of good contracts:&lt;/p&gt;

&lt;p&gt;they let you move faster &lt;strong&gt;without spreading instability everywhere&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;To me, that is one of the best signs of strong engineering.&lt;/p&gt;

&lt;p&gt;Not just speed.&lt;br&gt;
Not just flexibility.&lt;br&gt;
But controlled change.&lt;/p&gt;

&lt;h2&gt;
  
  
  My rule of thumb
&lt;/h2&gt;

&lt;p&gt;Whenever I look at an AI system, I usually ask:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where are the assumptions hiding?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most of the time, those hidden assumptions point directly to weak contracts.&lt;/p&gt;

&lt;p&gt;Maybe one service assumes a field is always present.&lt;br&gt;
Maybe the frontend assumes a confidence score always exists.&lt;br&gt;
Maybe the parser assumes the model always follows the same format.&lt;br&gt;
Maybe the monitoring layer assumes every successful request is a good request.&lt;/p&gt;

&lt;p&gt;These assumptions are where fragile behavior starts.&lt;/p&gt;

&lt;p&gt;So I try to make them explicit.&lt;/p&gt;

&lt;p&gt;If a system depends on something, I want that dependency defined.&lt;br&gt;
Not implied.&lt;br&gt;
Not guessed.&lt;br&gt;
Not “usually true.”&lt;/p&gt;

&lt;p&gt;Defined.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thought
&lt;/h2&gt;

&lt;p&gt;I insist on clear contracts for robust AI systems because contracts create clarity, and clarity creates reliability.&lt;/p&gt;

&lt;p&gt;They reduce ambiguity.&lt;br&gt;
They protect downstream systems.&lt;br&gt;
They make debugging easier.&lt;br&gt;
They improve team coordination.&lt;br&gt;
They make user experience more stable.&lt;br&gt;
And they make it possible to evolve AI products without turning every change into a risk.&lt;/p&gt;

&lt;p&gt;Models can be flexible.&lt;/p&gt;

&lt;p&gt;System design should not be.&lt;/p&gt;

&lt;p&gt;That is why I keep coming back to the same idea:&lt;/p&gt;

&lt;p&gt;if you want AI systems to feel dependable, the boundaries between components need to be stronger than the uncertainty inside the model.&lt;/p&gt;

&lt;p&gt;And clear contracts are one of the best ways to make that happen.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Closing question for DEV readers:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Do you think the most fragile part of AI systems is usually the model itself, or the unclear assumptions between the layers around it?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>backend</category>
      <category>api</category>
      <category>software</category>
    </item>
    <item>
      <title>How I Think About Reliability in LLM Applications</title>
      <dc:creator>Jamie Gray</dc:creator>
      <pubDate>Thu, 12 Mar 2026 01:25:21 +0000</pubDate>
      <link>https://forem.com/jamie_gray_ai/how-i-think-about-reliability-in-llm-applications-2ocp</link>
      <guid>https://forem.com/jamie_gray_ai/how-i-think-about-reliability-in-llm-applications-2ocp</guid>
      <description>&lt;p&gt;A lot of people evaluate LLM applications by asking one question:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;“Does it give a good answer?”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That matters, of course.&lt;/p&gt;

&lt;p&gt;But once you start shipping LLM-powered features to real users, a different question becomes much more important:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;“Can this system be trusted to behave well over time?”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That is how I think about reliability in LLM applications.&lt;/p&gt;

&lt;p&gt;Reliability is not just about uptime. It is not just about whether the model provider is available. And it is definitely not just about whether the prompt worked on your favorite test case.&lt;/p&gt;

&lt;p&gt;Reliability is about whether the full system can consistently produce useful outcomes in the messy conditions of real product usage.&lt;/p&gt;

&lt;p&gt;That means handling weak inputs, inconsistent context, variable model behavior, latency spikes, provider issues, partial failures, and changing user expectations without turning the feature into a trust problem.&lt;/p&gt;

&lt;p&gt;In my experience, that is where most of the real engineering work lives.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reliability starts before the model call
&lt;/h2&gt;

&lt;p&gt;One of the easiest mistakes in LLM product work is thinking that reliability begins at inference time.&lt;/p&gt;

&lt;p&gt;It does not.&lt;/p&gt;

&lt;p&gt;It starts much earlier.&lt;/p&gt;

&lt;p&gt;Before a model ever sees a request, the system should already be doing important work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;validating inputs&lt;/li&gt;
&lt;li&gt;normalizing structure&lt;/li&gt;
&lt;li&gt;checking required context&lt;/li&gt;
&lt;li&gt;trimming unnecessary noise&lt;/li&gt;
&lt;li&gt;routing the request correctly&lt;/li&gt;
&lt;li&gt;enforcing limits&lt;/li&gt;
&lt;/ul&gt;
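&lt;p&gt;As a minimal sketch of what that pre-processing layer can look like (the limit, field names, and function name here are illustrative, not from any particular codebase):&lt;/p&gt;

```python
# Illustrative pre-processing that runs before any model call.
# MAX_INPUT_CHARS and the required context keys are hypothetical examples.

MAX_INPUT_CHARS = 4000

def prepare_request(user_input: str, context: dict) -> dict:
    """Validate and normalize a request before it reaches the model."""
    text = user_input.strip()
    if not text:
        raise ValueError("empty input")
    # Enforce limits instead of letting the model absorb oversized prompts.
    if len(text) > MAX_INPUT_CHARS:
        text = text[:MAX_INPUT_CHARS]
    # Check required context up front, before spending tokens.
    missing = [k for k in ("user_id", "locale") if k not in context]
    if missing:
        raise ValueError(f"missing context: {missing}")
    return {"input": text, "context": context}
```

&lt;p&gt;The point is not the specific checks. It is that rejection and normalization happen at a boundary you control, not inside the model call.&lt;/p&gt;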

&lt;p&gt;If this layer is weak, the model ends up absorbing too much chaos.&lt;/p&gt;

&lt;p&gt;And that usually leads to one of two bad outcomes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;the model produces low-quality output&lt;/li&gt;
&lt;li&gt;the system produces inconsistent behavior that is hard to debug&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I want the model to solve the right problem, not waste effort compensating for sloppy application design.&lt;/p&gt;

&lt;p&gt;That is why I think of pre-processing as part of reliability engineering, not just convenience.&lt;/p&gt;

&lt;h2&gt;
  
  
  “Usually works” is not reliable enough
&lt;/h2&gt;

&lt;p&gt;A lot of LLM systems feel good in internal testing because they work most of the time.&lt;/p&gt;

&lt;p&gt;But “most of the time” is not a strong standard once real users depend on the feature.&lt;/p&gt;

&lt;p&gt;Users remember the moments when the system feels unreliable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;when it ignores important context&lt;/li&gt;
&lt;li&gt;when it returns a badly structured answer&lt;/li&gt;
&lt;li&gt;when it times out&lt;/li&gt;
&lt;li&gt;when it confidently says something weak&lt;/li&gt;
&lt;li&gt;when it behaves differently for similar inputs&lt;/li&gt;
&lt;li&gt;when the UI does not know how to handle the response&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is why I care less about best-case output and more about consistency.&lt;/p&gt;

&lt;p&gt;A reliable LLM application should not just be capable of producing a good answer.&lt;/p&gt;

&lt;p&gt;It should be engineered to reduce the chance of bad outcomes and contain the damage when they happen.&lt;/p&gt;

&lt;p&gt;That sounds obvious, but it changes how you design the whole stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  I separate model quality from system reliability
&lt;/h2&gt;

&lt;p&gt;This distinction matters a lot.&lt;/p&gt;

&lt;p&gt;A model can be strong while the application around it is unreliable.&lt;/p&gt;

&lt;p&gt;Likewise, a model can be imperfect while the product still feels dependable because the surrounding system is well designed.&lt;/p&gt;

&lt;p&gt;For me, model quality is about things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;relevance&lt;/li&gt;
&lt;li&gt;reasoning quality&lt;/li&gt;
&lt;li&gt;factual alignment&lt;/li&gt;
&lt;li&gt;formatting quality&lt;/li&gt;
&lt;li&gt;task completion&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;System reliability is about things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;request success rate&lt;/li&gt;
&lt;li&gt;latency stability&lt;/li&gt;
&lt;li&gt;validation behavior&lt;/li&gt;
&lt;li&gt;fallback handling&lt;/li&gt;
&lt;li&gt;error containment&lt;/li&gt;
&lt;li&gt;monitoring&lt;/li&gt;
&lt;li&gt;repeatability of output shape&lt;/li&gt;
&lt;li&gt;resilience under bad inputs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These two areas affect each other, but they are not the same.&lt;/p&gt;

&lt;p&gt;A lot of teams blur them together, and that makes debugging much harder.&lt;/p&gt;

&lt;p&gt;If the product feels unstable, I want to be able to answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is the issue in retrieval?&lt;/li&gt;
&lt;li&gt;Is the issue in prompt construction?&lt;/li&gt;
&lt;li&gt;Is the issue in provider latency?&lt;/li&gt;
&lt;li&gt;Is the issue in output parsing?&lt;/li&gt;
&lt;li&gt;Is the issue in business logic after the model call?&lt;/li&gt;
&lt;li&gt;Is the issue in the frontend contract?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That level of clarity is critical if you want to improve a production LLM system instead of just guessing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Structured output makes reliability much easier
&lt;/h2&gt;

&lt;p&gt;One of my strongest opinions in applied AI is this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If an LLM response needs to drive product behavior, it should be structured whenever possible.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Free-form text is flexible, but flexibility creates risk when the output feeds other parts of the system.&lt;/p&gt;

&lt;p&gt;If the response is going to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;populate a UI&lt;/li&gt;
&lt;li&gt;trigger a workflow&lt;/li&gt;
&lt;li&gt;update a database&lt;/li&gt;
&lt;li&gt;drive automation&lt;/li&gt;
&lt;li&gt;affect downstream decisions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;then the system needs predictable shape.&lt;/p&gt;

&lt;p&gt;That usually means defining a schema and validating the response against it.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ValidationError&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DecisionResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;
    &lt;span class="n"&gt;explanation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;parse_result&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;DecisionResult&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;DecisionResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;ValidationError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This kind of pattern adds reliability in several ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;malformed responses are caught early&lt;/li&gt;
&lt;li&gt;downstream code becomes simpler&lt;/li&gt;
&lt;li&gt;failure states become explicit&lt;/li&gt;
&lt;li&gt;monitoring becomes clearer&lt;/li&gt;
&lt;li&gt;fallback logic becomes easier to implement&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model may still be probabilistic, but the system around it becomes more disciplined.&lt;/p&gt;

&lt;p&gt;That is a big win.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fallbacks are a core reliability feature
&lt;/h2&gt;

&lt;p&gt;I do not see fallback paths as optional polish.&lt;/p&gt;

&lt;p&gt;I see them as part of the product contract.&lt;/p&gt;

&lt;p&gt;If the AI path fails, the product should still behave in a controlled way.&lt;/p&gt;

&lt;p&gt;Depending on the feature, that might mean:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retrying the request&lt;/li&gt;
&lt;li&gt;returning a cached result&lt;/li&gt;
&lt;li&gt;switching to a smaller or faster model&lt;/li&gt;
&lt;li&gt;using a rules-based path for simple cases&lt;/li&gt;
&lt;li&gt;showing a limited but safe response&lt;/li&gt;
&lt;li&gt;asking the user for clearer input&lt;/li&gt;
&lt;li&gt;returning a transparent failure state instead of weak output&lt;/li&gt;
&lt;/ul&gt;
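&lt;p&gt;Several of those options can be combined into one explicit chain. Here is a rough sketch (the function names and the shape of the result dict are made up for illustration):&lt;/p&gt;

```python
# Hypothetical fallback chain: try the primary model path, then cheaper
# alternatives, and finally return an explicit failure state.

def answer_with_fallbacks(query: str, primary, cache: dict, rules) -> dict:
    try:
        return {"source": "model", "answer": primary(query)}
    except Exception:
        pass  # the model path failed; move down the chain
    if query in cache:
        return {"source": "cache", "answer": cache[query]}
    simple = rules(query)
    if simple is not None:
        return {"source": "rules", "answer": simple}
    # A transparent failure state beats a weak or fabricated answer.
    return {"source": "none", "answer": None, "error": "unavailable"}
```

&lt;p&gt;Tagging each response with its source also makes fallback rates easy to monitor later.&lt;/p&gt;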

&lt;p&gt;A fallback is not an admission that the AI failed.&lt;/p&gt;

&lt;p&gt;It is evidence that the system was designed responsibly.&lt;/p&gt;

&lt;p&gt;In fact, I trust LLM products more when they clearly show that the team expected imperfect conditions and designed around them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Latency is part of reliability
&lt;/h2&gt;

&lt;p&gt;This is something AI teams sometimes underestimate.&lt;/p&gt;

&lt;p&gt;If an application technically works but feels slow and unpredictable, users often experience it as unreliable.&lt;/p&gt;

&lt;p&gt;That is why I treat latency as part of reliability, not just performance.&lt;/p&gt;

&lt;p&gt;For every LLM feature, I want to know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what response time users will tolerate&lt;/li&gt;
&lt;li&gt;whether the task should be synchronous or asynchronous&lt;/li&gt;
&lt;li&gt;whether partial streaming would improve experience&lt;/li&gt;
&lt;li&gt;whether caching makes sense&lt;/li&gt;
&lt;li&gt;what happens when a provider becomes slow&lt;/li&gt;
&lt;li&gt;how the product behaves near timeout thresholds&lt;/li&gt;
&lt;/ul&gt;
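&lt;p&gt;One concrete way to treat latency as a reliability concern is to give every model call an explicit time budget and route timeouts to a controlled response. A small sketch (the 2-second default and function names are arbitrary examples):&lt;/p&gt;

```python
import asyncio

# Cap how long a model call may take, and degrade predictably on timeout
# instead of leaving the user with an open-ended wait.

async def generate_with_budget(call_model, prompt: str, budget_s: float = 2.0):
    try:
        return await asyncio.wait_for(call_model(prompt), timeout=budget_s)
    except asyncio.TimeoutError:
        # A known degraded state the UI can render deliberately.
        return {"status": "timeout", "answer": None}
```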

&lt;p&gt;A feature that returns strong results in 12 seconds may still feel worse than a feature that returns good-enough results in 2 seconds.&lt;/p&gt;

&lt;p&gt;Reliability is not just about correctness.&lt;br&gt;
It is also about dependable experience.&lt;/p&gt;

&lt;h2&gt;
  
  
  Monitoring needs to go beyond errors
&lt;/h2&gt;

&lt;p&gt;Traditional backend monitoring is necessary, but it is not enough for LLM systems.&lt;/p&gt;

&lt;p&gt;A request can succeed technically and still fail from a product perspective.&lt;/p&gt;

&lt;p&gt;That means I want visibility into more than uptime and exceptions.&lt;/p&gt;

&lt;p&gt;I care about things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;malformed output rate&lt;/li&gt;
&lt;li&gt;fallback rate&lt;/li&gt;
&lt;li&gt;validation failure rate&lt;/li&gt;
&lt;li&gt;latency distribution&lt;/li&gt;
&lt;li&gt;token usage patterns&lt;/li&gt;
&lt;li&gt;low-confidence outcomes&lt;/li&gt;
&lt;li&gt;retrieval misses&lt;/li&gt;
&lt;li&gt;prompt version changes&lt;/li&gt;
&lt;li&gt;user correction patterns&lt;/li&gt;
&lt;li&gt;output quality drift over time&lt;/li&gt;
&lt;/ul&gt;
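&lt;p&gt;Even a very small in-process tracker makes some of these signals visible. A toy sketch (in a real system these counters would feed a metrics backend; the names are illustrative):&lt;/p&gt;

```python
from collections import Counter

# Minimal product-level health signals beyond uptime and exceptions.

class LLMHealth:
    def __init__(self):
        self.counts = Counter()
        self.latencies = []

    def record(self, valid: bool, used_fallback: bool, latency_s: float):
        self.counts["requests"] += 1
        if not valid:
            self.counts["malformed_output"] += 1
        if used_fallback:
            self.counts["fallback"] += 1
        self.latencies.append(latency_s)

    def fallback_rate(self) -> float:
        total = self.counts["requests"]
        return self.counts["fallback"] / total if total else 0.0
```

&lt;p&gt;A rising fallback rate or malformed-output rate with zero exceptions is exactly the kind of degradation traditional monitoring misses.&lt;/p&gt;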

&lt;p&gt;Without that kind of visibility, it is very easy to assume the system is healthy when it is actually degrading in subtle ways.&lt;/p&gt;

&lt;p&gt;For LLM applications, “no crash” is a very weak health signal.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reliability improves when responsibilities are clear
&lt;/h2&gt;

&lt;p&gt;As systems grow, I find reliability gets much better when each layer has a narrow responsibility.&lt;/p&gt;

&lt;p&gt;A healthy LLM request path often looks something like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;accept and validate request&lt;/li&gt;
&lt;li&gt;normalize input&lt;/li&gt;
&lt;li&gt;gather trusted context&lt;/li&gt;
&lt;li&gt;assemble prompt in a predictable format&lt;/li&gt;
&lt;li&gt;call model provider&lt;/li&gt;
&lt;li&gt;validate response shape&lt;/li&gt;
&lt;li&gt;apply business rules&lt;/li&gt;
&lt;li&gt;return structured result&lt;/li&gt;
&lt;li&gt;log the full path&lt;/li&gt;
&lt;li&gt;route failures to fallback logic&lt;/li&gt;
&lt;/ol&gt;
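&lt;p&gt;The steps above can be expressed as one explicit handler where every stage is a separate, swappable component. This is only a sketch; each function name is a placeholder for a real piece of your system:&lt;/p&gt;

```python
# The request path as explicit stages. "deps" bundles the components
# (validator, retriever, model client, parser, logger, fallback).

def handle_request(raw: dict, deps) -> dict:
    req = deps.validate(raw)                   # 1-2: accept, validate, normalize
    context = deps.gather_context(req)         # 3: trusted context only
    prompt = deps.build_prompt(req, context)   # 4: predictable prompt format
    try:
        raw_out = deps.call_model(prompt)      # 5: provider call
        result = deps.parse(raw_out)           # 6: validate response shape
        result = deps.apply_rules(result)      # 7: business rules
        deps.log(req, prompt, result)          # 9: log the full path
        return result                          # 8: structured result
    except Exception as exc:
        deps.log(req, prompt, {"error": str(exc)})
        return deps.fallback(req)              # 10: controlled failure path
```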

&lt;p&gt;This flow is not exciting, but that is exactly why it works.&lt;/p&gt;

&lt;p&gt;The more explicit the boundaries are, the easier it becomes to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;debug failures&lt;/li&gt;
&lt;li&gt;swap providers&lt;/li&gt;
&lt;li&gt;improve retrieval&lt;/li&gt;
&lt;li&gt;test edge cases&lt;/li&gt;
&lt;li&gt;observe regressions&lt;/li&gt;
&lt;li&gt;maintain the product over time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Unclear boundaries create unreliable systems.&lt;/p&gt;

&lt;p&gt;Clear boundaries create systems that can evolve without constant fear.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reliability is also a UX decision
&lt;/h2&gt;

&lt;p&gt;I think engineers sometimes talk about reliability as if it lives only in backend architecture.&lt;/p&gt;

&lt;p&gt;But a lot of reliability is really about user experience.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Does the user know what the feature is supposed to do?&lt;/li&gt;
&lt;li&gt;Does the product make confidence visible when appropriate?&lt;/li&gt;
&lt;li&gt;Does it avoid pretending to know more than it knows?&lt;/li&gt;
&lt;li&gt;Does it recover gracefully when a request fails?&lt;/li&gt;
&lt;li&gt;Does it set the right expectations about timing and behavior?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A feature can be technically sophisticated and still feel unreliable if the UX creates false confidence or hides system limits.&lt;/p&gt;

&lt;p&gt;That is why I think product design and engineering discipline have to work together in AI applications.&lt;/p&gt;

&lt;p&gt;The most reliable systems are usually the ones that align model behavior, system constraints, and user expectations.&lt;/p&gt;

&lt;h2&gt;
  
  
  My rule of thumb
&lt;/h2&gt;

&lt;p&gt;When I look at an LLM application, I usually ask a simple question:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If this feature becomes important to users tomorrow, would I trust the current system design to hold up?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If the answer is no, the issue is usually not the model alone.&lt;/p&gt;

&lt;p&gt;It is usually one of these:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;weak contracts&lt;/li&gt;
&lt;li&gt;weak validation&lt;/li&gt;
&lt;li&gt;weak observability&lt;/li&gt;
&lt;li&gt;weak fallback design&lt;/li&gt;
&lt;li&gt;weak latency planning&lt;/li&gt;
&lt;li&gt;weak separation of responsibilities&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why I think reliability in LLM applications is mostly a systems problem.&lt;/p&gt;

&lt;p&gt;The model matters.&lt;br&gt;
But the surrounding engineering matters just as much, and often more.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thought
&lt;/h2&gt;

&lt;p&gt;Reliable LLM applications are not built by hoping the model behaves well.&lt;/p&gt;

&lt;p&gt;They are built by designing systems that reduce uncertainty, constrain risk, and recover gracefully when imperfect things happen.&lt;/p&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;clear inputs&lt;/li&gt;
&lt;li&gt;structured outputs&lt;/li&gt;
&lt;li&gt;strong validation&lt;/li&gt;
&lt;li&gt;careful monitoring&lt;/li&gt;
&lt;li&gt;fallback paths&lt;/li&gt;
&lt;li&gt;thoughtful UX&lt;/li&gt;
&lt;li&gt;disciplined architecture&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To me, that is what separates an AI demo from a real product.&lt;/p&gt;

&lt;p&gt;A demo proves a model can do something interesting.&lt;/p&gt;

&lt;p&gt;A reliable application proves users can depend on it.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Closing question for DEV readers:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;What do you think contributes most to reliability in LLM applications: structured output, fallback design, monitoring, or better UX around model behavior?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>softwareengineering</category>
      <category>backend</category>
    </item>
    <item>
      <title>What Building AI in Healthcare Taught Me About Engineering Discipline</title>
      <dc:creator>Jamie Gray</dc:creator>
      <pubDate>Tue, 10 Mar 2026 19:16:03 +0000</pubDate>
      <link>https://forem.com/jamie_gray_ai/what-building-ai-in-healthcare-taught-me-about-engineering-discipline-4367</link>
      <guid>https://forem.com/jamie_gray_ai/what-building-ai-in-healthcare-taught-me-about-engineering-discipline-4367</guid>
      <description>&lt;p&gt;When people talk about AI, the conversation usually goes straight to model quality, prompt design, or the latest tooling.&lt;/p&gt;

&lt;p&gt;But building AI in healthcare changes your perspective very quickly.&lt;/p&gt;

&lt;p&gt;In healthcare, you do not get to treat software as “mostly fine.” You cannot hide behind a good demo, a clever prototype, or a model that performs well on average. The standard is different. Systems need to be reliable, understandable, and disciplined because the environment itself leaves very little room for ambiguity.&lt;/p&gt;

&lt;p&gt;That is one of the biggest lessons I took away from working on AI systems in healthcare-related environments.&lt;/p&gt;

&lt;p&gt;It was not just a lesson about compliance or process. It was a lesson about engineering discipline.&lt;/p&gt;

&lt;p&gt;And honestly, I think it made me a better engineer overall.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. “Good enough” stops being a comfortable mindset
&lt;/h2&gt;

&lt;p&gt;In a lot of software environments, teams can move fast, release early, and improve through feedback. That approach is often the right one.&lt;/p&gt;

&lt;p&gt;But healthcare forces you to think much more carefully about what “good enough” really means.&lt;/p&gt;

&lt;p&gt;If your product touches sensitive workflows, patient-related data, or any kind of clinical context, sloppy assumptions become expensive very quickly. A vague output, a weak edge-case path, or poor traceability is not just a technical inconvenience. It becomes a trust problem.&lt;/p&gt;

&lt;p&gt;That changes how you build.&lt;/p&gt;

&lt;p&gt;You start asking better questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What exactly is this system supposed to do?&lt;/li&gt;
&lt;li&gt;Where can it fail?&lt;/li&gt;
&lt;li&gt;How do we know when it fails?&lt;/li&gt;
&lt;li&gt;What does the user see when confidence is low?&lt;/li&gt;
&lt;li&gt;Can we explain the result?&lt;/li&gt;
&lt;li&gt;Can we audit the path that produced it?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That kind of thinking is valuable everywhere, but healthcare makes it unavoidable.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Reliability matters more than novelty
&lt;/h2&gt;

&lt;p&gt;One of the easiest traps in AI is chasing impressive output.&lt;/p&gt;

&lt;p&gt;A system can look exciting in a controlled demo and still be weak in practice. It might handle the happy path beautifully but struggle with inconsistent inputs, poor data quality, missing context, or workflow friction.&lt;/p&gt;

&lt;p&gt;Healthcare taught me to care much more about dependable behavior than flashy behavior.&lt;/p&gt;

&lt;p&gt;That means I pay close attention to things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;validation before any AI step&lt;/li&gt;
&lt;li&gt;well-defined input and output contracts&lt;/li&gt;
&lt;li&gt;fallback behavior&lt;/li&gt;
&lt;li&gt;clear failure states&lt;/li&gt;
&lt;li&gt;predictable latency&lt;/li&gt;
&lt;li&gt;monitoring around the full workflow&lt;/li&gt;
&lt;li&gt;human-readable logs and audit trails&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of those things are glamorous. But together, they are what make a system usable in the real world.&lt;/p&gt;

&lt;p&gt;A product becomes valuable when people can rely on it repeatedly, not when it produces one amazing example in a meeting.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Sensitive domains punish vague system design
&lt;/h2&gt;

&lt;p&gt;AI systems already contain uncertainty because model behavior is probabilistic.&lt;/p&gt;

&lt;p&gt;So if the rest of the application is also vague, things get messy fast.&lt;/p&gt;

&lt;p&gt;Healthcare pushed me toward more structured system design. I became much more careful about defining boundaries between:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;data ingestion&lt;/li&gt;
&lt;li&gt;preprocessing&lt;/li&gt;
&lt;li&gt;model or rules-based logic&lt;/li&gt;
&lt;li&gt;post-processing&lt;/li&gt;
&lt;li&gt;validation&lt;/li&gt;
&lt;li&gt;storage&lt;/li&gt;
&lt;li&gt;user-facing presentation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That separation matters because when something goes wrong, you need to know where it went wrong.&lt;/p&gt;

&lt;p&gt;If everything is mixed together, debugging becomes painful. Observability becomes weak. Auditing becomes harder. And trust drops.&lt;/p&gt;

&lt;p&gt;In practice, disciplined engineering often means making the system a little more boring in the best possible way.&lt;/p&gt;

&lt;p&gt;It means fewer hidden assumptions.&lt;br&gt;
More explicit contracts.&lt;br&gt;
Cleaner service boundaries.&lt;br&gt;
Less magic.&lt;/p&gt;

&lt;p&gt;That is especially important in AI products, where the model itself already introduces enough variability.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Trust is not a “nice to have” feature
&lt;/h2&gt;

&lt;p&gt;One thing I think engineers sometimes underestimate is how much trust is part of product quality.&lt;/p&gt;

&lt;p&gt;In healthcare, trust is not built by branding or UI polish alone. It is built by system behavior.&lt;/p&gt;

&lt;p&gt;Users trust products that feel:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;consistent&lt;/li&gt;
&lt;li&gt;transparent&lt;/li&gt;
&lt;li&gt;stable&lt;/li&gt;
&lt;li&gt;understandable&lt;/li&gt;
&lt;li&gt;recoverable when something goes wrong&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That affects technical decisions more than people realize.&lt;/p&gt;

&lt;p&gt;For example, sometimes the right engineering decision is not to maximize model flexibility. Sometimes it is to constrain the output, add stronger validation, or reduce the feature scope so the user experience becomes more predictable.&lt;/p&gt;

&lt;p&gt;That might sound less ambitious, but in practice it often produces a better product.&lt;/p&gt;

&lt;p&gt;A smaller, clearer, more dependable AI feature is usually more valuable than a broader feature that behaves inconsistently.&lt;/p&gt;

&lt;p&gt;Healthcare made that tradeoff feel obvious.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Data quality is a product concern, not just a pipeline concern
&lt;/h2&gt;

&lt;p&gt;Another major lesson: bad data does not stay in the data layer.&lt;/p&gt;

&lt;p&gt;It shows up everywhere.&lt;/p&gt;

&lt;p&gt;It affects model quality, retrieval quality, downstream logic, user trust, and operational load. In healthcare-related systems, where data may come from multiple sources and vary in structure or quality, this becomes very visible.&lt;/p&gt;

&lt;p&gt;That is why engineering discipline cannot stop at the model layer.&lt;/p&gt;

&lt;p&gt;You need discipline in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;schema design&lt;/li&gt;
&lt;li&gt;ingestion pipelines&lt;/li&gt;
&lt;li&gt;normalization logic&lt;/li&gt;
&lt;li&gt;validation rules&lt;/li&gt;
&lt;li&gt;metadata handling&lt;/li&gt;
&lt;li&gt;storage decisions&lt;/li&gt;
&lt;li&gt;monitoring for drift or inconsistency&lt;/li&gt;
&lt;/ul&gt;
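&lt;p&gt;In practice, that discipline often starts with ingestion-time checks that reject or flag records before they reach any model or index. A toy sketch (the field names and rules are made up for illustration):&lt;/p&gt;

```python
# Ingestion-time validation: surface problems early instead of letting
# bad records quietly degrade retrieval and model quality downstream.

REQUIRED_FIELDS = ("record_id", "source", "body")

def validate_record(record: dict) -> list:
    """Return a list of problems; an empty list means the record is acceptable."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing {field}")
    body = record.get("body", "")
    if isinstance(body, str) and len(body.strip()) == 0:
        problems.append("empty body")
    return problems
```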

&lt;p&gt;A lot of AI issues that look like “model problems” are really system design problems upstream.&lt;/p&gt;

&lt;p&gt;Once you work in an environment where the stakes are higher, that becomes easier to see.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Auditability changes how you think
&lt;/h2&gt;

&lt;p&gt;One of the healthiest engineering habits I picked up from healthcare-related work was thinking more carefully about traceability.&lt;/p&gt;

&lt;p&gt;If a system produces a result, can you answer basic questions later?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What input was used?&lt;/li&gt;
&lt;li&gt;What context was retrieved?&lt;/li&gt;
&lt;li&gt;Which rules or model path ran?&lt;/li&gt;
&lt;li&gt;What output was generated?&lt;/li&gt;
&lt;li&gt;What was shown to the user?&lt;/li&gt;
&lt;li&gt;What version of the system handled the request?&lt;/li&gt;
&lt;/ul&gt;
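&lt;p&gt;Answering those questions later is only possible if every request writes a structured record at the time it happens. A hypothetical audit record might look like this (the field names are illustrative, not a standard):&lt;/p&gt;

```python
import json
import time
import uuid

# One audit record per request, so each question above maps to a field.

def audit_record(system_version: str, user_input: str, context_ids: list,
                 path: str, output: str, shown_to_user: str) -> str:
    record = {
        "request_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "system_version": system_version,  # which version handled it
        "input": user_input,               # what input was used
        "context_ids": context_ids,        # what context was retrieved
        "path": path,                      # which rules or model path ran
        "output": output,                  # what was generated
        "shown": shown_to_user,            # what the user actually saw
    }
    return json.dumps(record)
```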

&lt;p&gt;Those questions matter a lot in sensitive environments, but they also matter in normal product engineering.&lt;/p&gt;

&lt;p&gt;Auditability makes systems easier to debug.&lt;br&gt;
It makes incident response faster.&lt;br&gt;
It makes quality reviews more grounded.&lt;br&gt;
It makes teams less dependent on memory and guesswork.&lt;/p&gt;

&lt;p&gt;In other words, it turns engineering from intuition-driven to evidence-driven.&lt;/p&gt;

&lt;p&gt;That is a huge shift.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. Guardrails are part of product design
&lt;/h2&gt;

&lt;p&gt;Before working closely on AI systems in more sensitive domains, I think I saw guardrails mostly as a safety layer.&lt;/p&gt;

&lt;p&gt;Now I see them as part of product design.&lt;/p&gt;

&lt;p&gt;Guardrails are not just about preventing disaster. They are about shaping the behavior of the product so it remains useful, understandable, and aligned with user expectations.&lt;/p&gt;

&lt;p&gt;That can include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;input constraints&lt;/li&gt;
&lt;li&gt;output schema validation&lt;/li&gt;
&lt;li&gt;confidence thresholds&lt;/li&gt;
&lt;li&gt;fallback responses&lt;/li&gt;
&lt;li&gt;role-based access patterns&lt;/li&gt;
&lt;li&gt;content restrictions&lt;/li&gt;
&lt;li&gt;review workflows for certain actions&lt;/li&gt;
&lt;/ul&gt;
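&lt;p&gt;A confidence threshold is a good example of a guardrail that is really a product decision. A minimal sketch, assuming the model (or a scoring step) reports a confidence value; the 0.7 cutoff is an arbitrary example:&lt;/p&gt;

```python
# Below the cutoff, route to a fallback rather than showing a shaky answer.
CONFIDENCE_THRESHOLD = 0.7

def apply_guardrail(answer: str, confidence: float) -> dict:
    if confidence >= CONFIDENCE_THRESHOLD:
        return {"action": "show", "answer": answer}
    # Low confidence: degrade explicitly rather than pretend certainty.
    return {"action": "fallback", "answer": None}
```

&lt;p&gt;Where that threshold sits shapes the user experience as much as any UI choice does.&lt;/p&gt;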

&lt;p&gt;These choices define the real experience of the product.&lt;/p&gt;

&lt;p&gt;In healthcare, this mindset becomes natural because the cost of ambiguity is more obvious. But I think it applies to nearly every serious AI product.&lt;/p&gt;

&lt;h2&gt;
  
  
  8. Speed still matters — but reckless speed hurts more
&lt;/h2&gt;

&lt;p&gt;Working in healthcare does &lt;strong&gt;not&lt;/strong&gt; mean moving slowly by default.&lt;/p&gt;

&lt;p&gt;It means being intentional about where speed is safe and where discipline is non-negotiable.&lt;/p&gt;

&lt;p&gt;That distinction matters.&lt;/p&gt;

&lt;p&gt;You can still move quickly in architecture, iteration, and delivery if you are building on strong foundations. In fact, disciplined systems often let you move faster later because they are easier to reason about.&lt;/p&gt;

&lt;p&gt;I have seen the opposite too: teams move fast early, skip structure, and then lose months later untangling fragile workflows, poor validation, weak logging, and unclear responsibilities between components.&lt;/p&gt;

&lt;p&gt;Healthcare taught me that speed is only useful when it compounds.&lt;/p&gt;

&lt;p&gt;And discipline is what allows speed to compound.&lt;/p&gt;

&lt;h2&gt;
  
  
  9. AI does not remove the need for strong engineering
&lt;/h2&gt;

&lt;p&gt;If anything, it increases it.&lt;/p&gt;

&lt;p&gt;When a system includes probabilistic behavior, the surrounding software needs to be even more deliberate.&lt;/p&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;better contracts&lt;/li&gt;
&lt;li&gt;better observability&lt;/li&gt;
&lt;li&gt;better evaluation&lt;/li&gt;
&lt;li&gt;better fallback paths&lt;/li&gt;
&lt;li&gt;better user-facing clarity&lt;/li&gt;
&lt;li&gt;better system boundaries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model is only one piece of the product.&lt;/p&gt;

&lt;p&gt;I think that is one of the biggest misconceptions in AI discussions today. People talk as if model capability is the main differentiator.&lt;/p&gt;

&lt;p&gt;In real products, especially in high-trust environments, engineering quality is often the real differentiator.&lt;/p&gt;

&lt;p&gt;The teams that win are usually the teams that can make AI capabilities dependable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thought
&lt;/h2&gt;

&lt;p&gt;Building AI in healthcare taught me that engineering discipline is not bureaucracy.&lt;/p&gt;

&lt;p&gt;It is not unnecessary caution.&lt;br&gt;
It is not the opposite of innovation.&lt;/p&gt;

&lt;p&gt;It is what makes innovation usable.&lt;/p&gt;

&lt;p&gt;It is what turns a promising model into a reliable product.&lt;br&gt;
It is what protects user trust.&lt;br&gt;
It is what helps teams scale without losing clarity.&lt;br&gt;
And it is what makes systems hold up when real people start depending on them.&lt;/p&gt;

&lt;p&gt;That lesson has stayed with me far beyond healthcare.&lt;/p&gt;

&lt;p&gt;Because once you learn to build in an environment where trust, traceability, and reliability truly matter, it becomes hard to go back to casual engineering habits.&lt;/p&gt;

&lt;p&gt;And I think that is a good thing.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Closing question for DEV readers:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Has working in a high-trust or highly regulated environment changed the way you think about software quality?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>healthcare</category>
      <category>softwareengineering</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Why FastAPI Is a Great Fit for AI Products</title>
      <dc:creator>Jamie Gray</dc:creator>
      <pubDate>Fri, 06 Mar 2026 21:13:46 +0000</pubDate>
      <link>https://forem.com/jamie_gray_ai/why-fastapi-is-a-great-fit-for-ai-products-1on6</link>
      <guid>https://forem.com/jamie_gray_ai/why-fastapi-is-a-great-fit-for-ai-products-1on6</guid>
      <description>&lt;p&gt;A lot of AI product discussions focus on models, prompts, and retrieval.&lt;/p&gt;

&lt;p&gt;But once you start building real features, the backend matters just as much as the model.&lt;/p&gt;

&lt;p&gt;You still need clean APIs, input validation, error handling, observability, authentication, background jobs, and predictable response shapes. In other words, you need the same software engineering discipline as any other production system, with even more attention to reliability because AI behavior is already probabilistic by nature.&lt;/p&gt;

&lt;p&gt;That is one reason I keep coming back to &lt;strong&gt;FastAPI&lt;/strong&gt; when building AI products.&lt;/p&gt;

&lt;p&gt;It is not the only good option in Python, and it will not solve architecture problems for you. But if you are building AI-powered APIs, internal ML services, evaluation tools, or product backends that need to expose model-driven capabilities, FastAPI gives you plenty of useful structure without much unnecessary weight.&lt;/p&gt;

&lt;p&gt;In this post, I want to break down why FastAPI works so well for AI applications and where I think it really shines.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. AI products need clear contracts
&lt;/h2&gt;

&lt;p&gt;One of the biggest backend challenges in AI systems is dealing with uncertainty.&lt;/p&gt;

&lt;p&gt;Your model output may vary. Your retrieval results may vary. Your latency may vary. The last thing you want is for your API layer to add even more ambiguity.&lt;/p&gt;

&lt;p&gt;That is where FastAPI helps immediately.&lt;/p&gt;

&lt;p&gt;Its request and response models make it easy to define strict contracts around inputs and outputs. That becomes very important when you are exposing AI features to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;frontend applications&lt;/li&gt;
&lt;li&gt;internal services&lt;/li&gt;
&lt;li&gt;external customers&lt;/li&gt;
&lt;li&gt;automation pipelines&lt;/li&gt;
&lt;li&gt;evaluation workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even if the model is probabilistic, your API should still be predictable.&lt;/p&gt;

&lt;p&gt;With FastAPI, you can define schemas that make your service behavior explicit.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;PromptRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;PromptResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;

&lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/generate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;PromptResponse&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;PromptRequest&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Processed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;PromptResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ok&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That level of structure is simple, but it matters a lot.&lt;/p&gt;

&lt;p&gt;It means the rest of your stack does not need to guess what the AI backend will return.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Validation is especially important in AI systems
&lt;/h2&gt;

&lt;p&gt;In a traditional backend, validation protects your application from bad inputs.&lt;/p&gt;

&lt;p&gt;In an AI backend, validation protects both your application and your model usage.&lt;/p&gt;

&lt;p&gt;That matters because AI requests often include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;long text inputs&lt;/li&gt;
&lt;li&gt;optional context blocks&lt;/li&gt;
&lt;li&gt;file metadata&lt;/li&gt;
&lt;li&gt;user configuration parameters&lt;/li&gt;
&lt;li&gt;model-specific settings&lt;/li&gt;
&lt;li&gt;workflow state from previous steps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without strong validation, it becomes easy to waste tokens, trigger bad outputs, or break downstream logic.&lt;/p&gt;

&lt;p&gt;FastAPI makes validation feel natural instead of bolted on.&lt;/p&gt;

&lt;p&gt;I like that because it encourages better engineering habits by default.&lt;/p&gt;

&lt;p&gt;For example, you can validate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;required fields&lt;/li&gt;
&lt;li&gt;type correctness&lt;/li&gt;
&lt;li&gt;input length limits&lt;/li&gt;
&lt;li&gt;allowed enum values&lt;/li&gt;
&lt;li&gt;optional vs required context&lt;/li&gt;
&lt;li&gt;nested payload structure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is not flashy work, but it is exactly the kind of work that makes AI features stable.&lt;/p&gt;
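<p>As a rough sketch, here is how a few of those rules can be expressed declaratively with Pydantic. The model, field names, and limits are illustrative, not from a real service:</p>

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from enum import Enum
from pydantic import BaseModel, Field

class Mode(str, Enum):
    summarize = "summarize"
    classify = "classify"

class AnalyzeRequest(BaseModel):
    # Reject empty or oversized inputs before any tokens are spent.
    text: str = Field(min_length=1, max_length=4000)
    # Only known task modes are accepted.
    mode: Mode = Mode.summarize
    # Optional context stays optional, but its type is still checked.
    context: list[str] = []
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

<p>With a model like this, a request that fails any of these checks is rejected with a clear error before it ever reaches the model.</p>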

&lt;h2&gt;
  
  
  3. FastAPI is a strong fit for service-oriented AI architecture
&lt;/h2&gt;

&lt;p&gt;A lot of useful AI systems are not giant monoliths.&lt;/p&gt;

&lt;p&gt;They are usually built as a set of focused services such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;inference API&lt;/li&gt;
&lt;li&gt;document ingestion service&lt;/li&gt;
&lt;li&gt;retrieval service&lt;/li&gt;
&lt;li&gt;evaluation pipeline&lt;/li&gt;
&lt;li&gt;feedback collection API&lt;/li&gt;
&lt;li&gt;internal admin or model-testing tools&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;FastAPI fits this style really well.&lt;/p&gt;

&lt;p&gt;It is lightweight enough to use for small services, but structured enough that it still feels maintainable as things grow. That balance is important in AI teams because you often need to move quickly at the beginning without creating something impossible to manage later.&lt;/p&gt;

&lt;p&gt;I have found it especially useful when building services that need to do one thing clearly, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;accept a request&lt;/li&gt;
&lt;li&gt;enrich it with context&lt;/li&gt;
&lt;li&gt;call a model or ranking layer&lt;/li&gt;
&lt;li&gt;validate the result&lt;/li&gt;
&lt;li&gt;return structured output&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That pattern shows up constantly in AI product development.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Automatic docs are more useful than people think
&lt;/h2&gt;

&lt;p&gt;This is one of the most underrated FastAPI features.&lt;/p&gt;

&lt;p&gt;The automatic OpenAPI docs are extremely helpful when you are building AI systems across multiple teams.&lt;/p&gt;

&lt;p&gt;AI product work often involves close collaboration between backend engineers, frontend engineers, ML engineers, product teams, and sometimes operations or domain specialists. Clear API docs reduce friction between all of them.&lt;/p&gt;

&lt;p&gt;When your service is changing quickly, good documentation stops being a nice bonus and becomes part of the development workflow.&lt;/p&gt;

&lt;p&gt;FastAPI gives you that documentation almost for free.&lt;/p&gt;

&lt;p&gt;That helps with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;faster frontend integration&lt;/li&gt;
&lt;li&gt;easier internal testing&lt;/li&gt;
&lt;li&gt;better debugging across teams&lt;/li&gt;
&lt;li&gt;cleaner handoffs between product and engineering&lt;/li&gt;
&lt;li&gt;simpler onboarding for new engineers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For AI teams moving fast, small reductions in coordination overhead add up quickly.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Async support is useful for modern AI workflows
&lt;/h2&gt;

&lt;p&gt;AI products often depend on I/O-heavy operations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;calling model providers&lt;/li&gt;
&lt;li&gt;reading from vector databases&lt;/li&gt;
&lt;li&gt;fetching documents&lt;/li&gt;
&lt;li&gt;hitting external APIs&lt;/li&gt;
&lt;li&gt;storing logs and traces&lt;/li&gt;
&lt;li&gt;saving outputs for later review&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;FastAPI’s async-first design is helpful in this environment.&lt;/p&gt;

&lt;p&gt;That does not mean every endpoint should be async just because it can be. But when you do have concurrent I/O work, FastAPI makes it easier to design around it cleanly.&lt;/p&gt;

&lt;p&gt;This is especially relevant for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;streaming LLM responses&lt;/li&gt;
&lt;li&gt;orchestration across multiple services&lt;/li&gt;
&lt;li&gt;document processing pipelines&lt;/li&gt;
&lt;li&gt;AI features with retrieval and re-ranking steps&lt;/li&gt;
&lt;li&gt;external provider integrations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In many AI systems, the model is only one part of the full request path. A lot of the latency lives in the surrounding workflow. Good async support gives you more flexibility when optimizing those paths.&lt;/p&gt;
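<p>As a small illustration, here is a sketch of two independent I/O calls running concurrently with <code>asyncio.gather</code>. The helper functions are stand-ins for real retrieval and profile services; inside an async FastAPI endpoint the pattern looks the same:</p>

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio

async def fetch_documents(query):
    # Placeholder for a vector database lookup.
    await asyncio.sleep(0.05)
    return ["doc-1", "doc-2"]

async def fetch_user_profile(user_id):
    # Placeholder for a profile service call.
    await asyncio.sleep(0.05)
    return {"id": user_id}

async def build_context(query, user_id):
    # Both calls run concurrently, so the slower of the two
    # dominates latency instead of their sum.
    docs, profile = await asyncio.gather(
        fetch_documents(query), fetch_user_profile(user_id)
    )
    return {"documents": docs, "user": profile}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;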

&lt;h2&gt;
  
  
  6. FastAPI works well with the broader Python AI ecosystem
&lt;/h2&gt;

&lt;p&gt;This one is obvious, but still important.&lt;/p&gt;

&lt;p&gt;Python remains the most practical language for a lot of AI work because its ecosystem is so deep. Whether you are working with data pipelines, model inference, evaluation tooling, or orchestration, the libraries you need are usually already there.&lt;/p&gt;

&lt;p&gt;FastAPI fits naturally into that world.&lt;/p&gt;

&lt;p&gt;You can use it alongside tools like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;NumPy and pandas&lt;/li&gt;
&lt;li&gt;PyTorch and TensorFlow&lt;/li&gt;
&lt;li&gt;scikit-learn&lt;/li&gt;
&lt;li&gt;Hugging Face transformers&lt;/li&gt;
&lt;li&gt;Celery or task queues&lt;/li&gt;
&lt;li&gt;Redis&lt;/li&gt;
&lt;li&gt;PostgreSQL&lt;/li&gt;
&lt;li&gt;vector databases&lt;/li&gt;
&lt;li&gt;observability tooling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That makes it easier to keep your AI application logic and service layer close together when it makes sense.&lt;/p&gt;

&lt;p&gt;Sometimes that means faster iteration. Sometimes it means fewer translation layers between experimentation and production.&lt;/p&gt;

&lt;p&gt;That can be a real advantage for smaller teams and fast-moving product groups.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. It encourages better API design for AI features
&lt;/h2&gt;

&lt;p&gt;One mistake I see in early AI projects is exposing model calls too directly.&lt;/p&gt;

&lt;p&gt;The API becomes a thin wrapper over a prompt or inference call, and that usually creates problems later.&lt;/p&gt;

&lt;p&gt;A better pattern is to design the API around the product behavior, not around the model itself.&lt;/p&gt;

&lt;p&gt;For example, instead of creating an endpoint that feels like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;send prompt&lt;/li&gt;
&lt;li&gt;get raw model output&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is often better to create an endpoint that reflects the user-facing intent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;summarize this document&lt;/li&gt;
&lt;li&gt;classify this message&lt;/li&gt;
&lt;li&gt;extract structured fields&lt;/li&gt;
&lt;li&gt;generate recommendations&lt;/li&gt;
&lt;li&gt;score this content for relevance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;FastAPI supports that style well because it makes endpoint design, schemas, and documentation easy to keep aligned.&lt;/p&gt;
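<p>As a sketch of that idea, here is an endpoint named after the product behavior rather than the model call. The route, schemas, and placeholder logic are illustrative:</p>

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class SummarizeRequest(BaseModel):
    document_text: str
    max_sentences: int = 3

class SummaryResponse(BaseModel):
    summary: str
    sentence_count: int

# The route describes the product behavior, not the model call behind it.
@app.post("/documents/summarize", response_model=SummaryResponse)
def summarize_document(request: SummarizeRequest):
    # Placeholder logic standing in for a real model call.
    sentences = request.document_text.split(". ")[: request.max_sentences]
    summary = ". ".join(sentences)
    return SummaryResponse(summary=summary, sentence_count=len(sentences))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

<p>Swapping the model or provider later changes the inside of this function, not the contract your product depends on.</p>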

&lt;p&gt;That alignment matters.&lt;/p&gt;

&lt;p&gt;It helps you build systems that are easier to test, easier to observe, and easier to evolve when the underlying model or provider changes.&lt;/p&gt;

&lt;h2&gt;
  
  
  8. Great for prototypes, but still solid for production
&lt;/h2&gt;

&lt;p&gt;I think one reason FastAPI has become so popular is that it works across multiple stages of product maturity.&lt;/p&gt;

&lt;p&gt;You can use it for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a quick internal proof of concept&lt;/li&gt;
&lt;li&gt;an MVP for a startup product&lt;/li&gt;
&lt;li&gt;an internal ML platform endpoint&lt;/li&gt;
&lt;li&gt;a production API that supports user-facing features&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That flexibility is valuable because AI products often evolve fast.&lt;/p&gt;

&lt;p&gt;The first version may be a basic prompt flow. The next version may add retrieval. Then evaluation. Then logging. Then guardrails. Then background processing. Then admin tools. Then multiple models.&lt;/p&gt;

&lt;p&gt;FastAPI does not block that evolution.&lt;/p&gt;

&lt;p&gt;It gives you a clean starting point while still supporting production patterns like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;dependency injection&lt;/li&gt;
&lt;li&gt;authentication&lt;/li&gt;
&lt;li&gt;middleware&lt;/li&gt;
&lt;li&gt;typed request and response models&lt;/li&gt;
&lt;li&gt;background tasks&lt;/li&gt;
&lt;li&gt;health checks&lt;/li&gt;
&lt;li&gt;metrics integration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That does not mean every FastAPI codebase is automatically well designed. But the framework makes good structure easier to maintain.&lt;/p&gt;

&lt;h2&gt;
  
  
  9. Where I would still be careful
&lt;/h2&gt;

&lt;p&gt;I like FastAPI a lot, but I do not think it is magic.&lt;/p&gt;

&lt;p&gt;There are still common mistakes teams make when using it for AI systems.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;putting too much business logic directly in route handlers&lt;/li&gt;
&lt;li&gt;mixing experimental model logic with stable production paths&lt;/li&gt;
&lt;li&gt;skipping background jobs for slow workflows&lt;/li&gt;
&lt;li&gt;exposing raw provider behavior without normalization&lt;/li&gt;
&lt;li&gt;treating validation as optional because “the model can handle it”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The framework helps, but architecture still matters.&lt;/p&gt;

&lt;p&gt;For production AI systems, I still want:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;service boundaries&lt;/li&gt;
&lt;li&gt;clean abstractions around providers&lt;/li&gt;
&lt;li&gt;observability&lt;/li&gt;
&lt;li&gt;strong error handling&lt;/li&gt;
&lt;li&gt;versioned APIs when needed&lt;/li&gt;
&lt;li&gt;separation between synchronous and asynchronous flows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;FastAPI makes these patterns possible. It does not enforce them for you.&lt;/p&gt;

&lt;h2&gt;
  
  
  10. My practical rule of thumb
&lt;/h2&gt;

&lt;p&gt;When I think about backend frameworks for AI products, I usually ask one question:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Will this help me build a system that is both flexible for experimentation and disciplined enough for production?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;FastAPI is one of the few tools that consistently answers yes.&lt;/p&gt;

&lt;p&gt;It gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;speed during development&lt;/li&gt;
&lt;li&gt;strong schema support&lt;/li&gt;
&lt;li&gt;clean Python integration&lt;/li&gt;
&lt;li&gt;useful documentation&lt;/li&gt;
&lt;li&gt;good async capabilities&lt;/li&gt;
&lt;li&gt;enough structure to keep a growing service maintainable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That combination is hard to beat for a lot of AI applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thought
&lt;/h2&gt;

&lt;p&gt;AI products do not just need smart models.&lt;/p&gt;

&lt;p&gt;They need dependable systems around those models.&lt;/p&gt;

&lt;p&gt;That means the backend layer has to do real work: validate inputs, shape outputs, protect downstream systems, expose clear contracts, and keep the product stable even when the model behavior is not fully predictable.&lt;/p&gt;

&lt;p&gt;That is why FastAPI is such a strong fit.&lt;/p&gt;

&lt;p&gt;It helps turn AI capabilities into usable product infrastructure.&lt;/p&gt;

&lt;p&gt;And in my experience, that is where a lot of the real engineering value gets created.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Closing question for DEV readers:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you are building AI products in Python, what matters most to you in a backend framework: speed of development, schema validation, async support, or ecosystem fit?&lt;/p&gt;

</description>
      <category>fastapi</category>
      <category>python</category>
      <category>backend</category>
      <category>ai</category>
    </item>
    <item>
      <title>What “Production-Ready LLM Feature” Really Means</title>
      <dc:creator>Jamie Gray</dc:creator>
      <pubDate>Fri, 06 Mar 2026 20:14:22 +0000</pubDate>
      <link>https://forem.com/jamie_gray_ai/what-production-ready-llm-feature-really-means-1hh</link>
      <guid>https://forem.com/jamie_gray_ai/what-production-ready-llm-feature-really-means-1hh</guid>
      <description>&lt;p&gt;When people talk about LLM features, they usually talk about prompts, models, and demos.&lt;/p&gt;

&lt;p&gt;But in real products, that is only the beginning.&lt;/p&gt;

&lt;p&gt;A feature does not become production-ready because it generated a few impressive outputs during testing. It becomes production-ready when it can survive messy user input, system failures, inconsistent model behavior, latency spikes, and changing business expectations without breaking trust.&lt;/p&gt;

&lt;p&gt;That gap between &lt;strong&gt;"it works in a demo"&lt;/strong&gt; and &lt;strong&gt;"it works for real users"&lt;/strong&gt; is where most of the engineering effort actually lives.&lt;/p&gt;

&lt;p&gt;Over the last several years, I have worked across AWS, startups, and AI-focused teams building systems that had to be reliable in real environments. One of the biggest lessons I learned is that an LLM feature is never just a model integration. It is a product surface, a backend system, a reliability problem, and a user trust problem all at the same time.&lt;/p&gt;

&lt;p&gt;In this post, I want to break down what I think production-ready actually means when you are shipping an LLM feature.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. A good prompt is not a production strategy
&lt;/h2&gt;

&lt;p&gt;A lot of early LLM work starts with a prompt that performs well on a few test cases.&lt;/p&gt;

&lt;p&gt;That is a good start, but it is not enough.&lt;/p&gt;

&lt;p&gt;Prompts are fragile. User input changes. Business rules change. Context formatting changes. Upstream data changes. The model provider may even update model behavior underneath you. Something that looked stable on day one can become noisy very quickly.&lt;/p&gt;

&lt;p&gt;That is why I do not think about prompts as the product. I think about them as one layer inside a larger system.&lt;/p&gt;

&lt;p&gt;A production system needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;structured inputs&lt;/li&gt;
&lt;li&gt;validation before the model call&lt;/li&gt;
&lt;li&gt;post-processing after the response&lt;/li&gt;
&lt;li&gt;fallback behavior when output quality drops&lt;/li&gt;
&lt;li&gt;logging and evaluation around the full workflow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model should not be the only thing holding the feature together.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Reliability matters more than cleverness
&lt;/h2&gt;

&lt;p&gt;One of the easiest mistakes in AI product work is over-optimizing for impressive output instead of dependable behavior.&lt;/p&gt;

&lt;p&gt;Users usually do not judge a feature by its best response. They judge it by whether it is consistently useful.&lt;/p&gt;

&lt;p&gt;That changes how I design LLM features.&lt;/p&gt;

&lt;p&gt;I care less about whether the model can occasionally produce something amazing, and more about whether the system can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;return a result within an acceptable time&lt;/li&gt;
&lt;li&gt;avoid obviously wrong or unsafe output&lt;/li&gt;
&lt;li&gt;recover gracefully from provider or network failures&lt;/li&gt;
&lt;li&gt;handle empty or incomplete inputs&lt;/li&gt;
&lt;li&gt;produce results in a format the rest of the product can use&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In practice, this means adding engineering layers that are not very glamorous but matter a lot:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retries with limits&lt;/li&gt;
&lt;li&gt;timeouts&lt;/li&gt;
&lt;li&gt;schema validation&lt;/li&gt;
&lt;li&gt;output guards&lt;/li&gt;
&lt;li&gt;confidence checks&lt;/li&gt;
&lt;li&gt;deterministic fallbacks&lt;/li&gt;
&lt;li&gt;feature flags&lt;/li&gt;
&lt;li&gt;monitoring and alerting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the part many demos skip.&lt;/p&gt;
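<p>Most of those layers are small. For example, here is a minimal retry helper with a capped number of attempts and exponential backoff. It is a sketch of the idea, not a library recommendation:</p>

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import time

def call_with_retries(fn, attempts=3, base_delay=0.5):
    # Retry a flaky call with exponential backoff;
    # re-raise after the final attempt so failures stay visible.
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;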

&lt;h2&gt;
  
  
  3. Structured output is a huge unlock
&lt;/h2&gt;

&lt;p&gt;I think one of the most important shifts in applied LLM engineering is moving from free-form output to constrained, structured output.&lt;/p&gt;

&lt;p&gt;As soon as an LLM response needs to feed another part of the product, structure becomes critical.&lt;/p&gt;

&lt;p&gt;If a feature needs to power a UI, trigger a workflow, populate a database field, or drive downstream logic, you cannot rely on vague paragraphs and hope everything works out.&lt;/p&gt;

&lt;p&gt;You need predictable output.&lt;/p&gt;

&lt;p&gt;That usually means defining a schema up front and forcing the system to validate against it.&lt;/p&gt;

&lt;p&gt;A simple example in Python might look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ValidationError&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SummaryResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;risk_level&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;parse_llm_output&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;SummaryResult&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;SummaryResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;ValidationError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is not fancy, but it changes everything.&lt;/p&gt;

&lt;p&gt;Once output is structured, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;reject malformed responses&lt;/li&gt;
&lt;li&gt;add fallback behavior&lt;/li&gt;
&lt;li&gt;keep your UI stable&lt;/li&gt;
&lt;li&gt;write cleaner tests&lt;/li&gt;
&lt;li&gt;measure failure rates more clearly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For me, production readiness starts increasing the moment the system becomes easier to validate.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Evaluation should be continuous, not one-time
&lt;/h2&gt;

&lt;p&gt;A lot of teams evaluate LLM quality once, feel good about the results, and then move on.&lt;/p&gt;

&lt;p&gt;That is risky.&lt;/p&gt;

&lt;p&gt;LLM systems drift in subtle ways. Sometimes the model changes. Sometimes your retrieval layer changes. Sometimes user behavior changes. Sometimes your own prompt edits introduce regressions.&lt;/p&gt;

&lt;p&gt;You need an evaluation loop that continues after launch.&lt;/p&gt;

&lt;p&gt;That does not have to be complicated at first. A practical starting point is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;define a small set of representative test cases&lt;/li&gt;
&lt;li&gt;score outputs against the behaviors you care about&lt;/li&gt;
&lt;li&gt;review failures manually&lt;/li&gt;
&lt;li&gt;track quality over time after prompt or model changes&lt;/li&gt;
&lt;/ul&gt;
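<p>A first version of that loop can be very small. This sketch scores outputs by checking for required phrases; the test cases and scoring rule are placeholders you would replace with the behaviors you actually care about:</p>

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def phrase_score(output, required_phrases):
    # Fraction of required phrases present in the output.
    hits = [p for p in required_phrases if p.lower() in output.lower()]
    return len(hits) / max(len(required_phrases), 1)

TEST_CASES = [
    {"input": "What is your refund policy?", "required": ["refund", "30 days"]},
    {"input": "How do I reset my password?", "required": ["reset", "password"]},
]

def run_eval(generate_fn):
    # Average score across the suite; track this number over time
    # and re-run it after every prompt or model change.
    scores = []
    for case in TEST_CASES:
        scores.append(phrase_score(generate_fn(case["input"]), case["required"]))
    return sum(scores) / len(scores)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;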

&lt;p&gt;I like to treat evaluation as part of the product lifecycle, not just part of experimentation.&lt;/p&gt;

&lt;p&gt;If you do not have a repeatable way to measure quality, you are mostly relying on intuition.&lt;/p&gt;

&lt;p&gt;And intuition does not scale well.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Latency is part of the user experience
&lt;/h2&gt;

&lt;p&gt;Sometimes teams focus so much on output quality that they forget speed is part of quality.&lt;/p&gt;

&lt;p&gt;A response that is technically good but takes too long can still feel broken.&lt;/p&gt;

&lt;p&gt;That is especially true in user-facing products where people expect immediate feedback.&lt;/p&gt;

&lt;p&gt;When I think about production readiness, I always ask:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what is the acceptable latency budget?&lt;/li&gt;
&lt;li&gt;what happens if the provider is slow?&lt;/li&gt;
&lt;li&gt;can we stream partial output?&lt;/li&gt;
&lt;li&gt;do we need caching for repeated requests?&lt;/li&gt;
&lt;li&gt;should this be synchronous or asynchronous?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are product questions as much as backend questions.&lt;/p&gt;

&lt;p&gt;A great LLM feature is not just intelligent. It feels responsive and dependable.&lt;/p&gt;
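<p>The latency budget question in particular is easy to make concrete. This sketch wraps a model call with <code>asyncio.wait_for</code> so a slow provider degrades into a fallback signal instead of a hung request:</p>

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio

async def generate_with_budget(call_model, prompt, budget_seconds):
    # Enforce a hard latency budget around the model call.
    try:
        return await asyncio.wait_for(call_model(prompt), timeout=budget_seconds)
    except asyncio.TimeoutError:
        # Signal the caller to use a cached answer or a fallback path.
        return None
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;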

&lt;h2&gt;
  
  
  6. Fallbacks are not a weakness
&lt;/h2&gt;

&lt;p&gt;I actually think fallback logic is one of the clearest signs that a team understands production engineering.&lt;/p&gt;

&lt;p&gt;Not every request needs to go through the full AI path.&lt;/p&gt;

&lt;p&gt;Sometimes the best experience is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a rules-based response for simple cases&lt;/li&gt;
&lt;li&gt;a cached answer for repeated requests&lt;/li&gt;
&lt;li&gt;a smaller model for speed-sensitive tasks&lt;/li&gt;
&lt;li&gt;a human-readable error state when confidence is low&lt;/li&gt;
&lt;li&gt;a safe default when validation fails&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Fallbacks protect the user experience.&lt;/p&gt;

&lt;p&gt;They also protect trust.&lt;/p&gt;

&lt;p&gt;A feature that occasionally says, "I could not generate a reliable result for this input" is often better than one that confidently returns something weak or incorrect.&lt;/p&gt;
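<p>Here is a rough sketch of that kind of guarded path, with a cache check, a safe default on failure, and a validation gate. The <code>llm_fn</code> and <code>validate</code> callables are placeholders for real components:</p>

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;FALLBACK_MESSAGE = "I could not generate a reliable result for this input."

def answer_with_fallback(question, llm_fn, cache, validate):
    # Serve repeated requests from the cache before paying for a model call.
    if question in cache:
        return cache[question]
    try:
        result = llm_fn(question)
    except Exception:
        # Provider or network failure: degrade to a safe default.
        return FALLBACK_MESSAGE
    if not validate(result):
        # Low-confidence or malformed output: be honest rather than wrong.
        return FALLBACK_MESSAGE
    cache[question] = result
    return result
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;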

&lt;h2&gt;
  
  
  7. The real job is reducing uncertainty
&lt;/h2&gt;

&lt;p&gt;This is the biggest mindset shift for me.&lt;/p&gt;

&lt;p&gt;Building an LLM feature is not just about adding intelligence. It is about reducing uncertainty across the system.&lt;/p&gt;

&lt;p&gt;You are dealing with a probabilistic component inside a product that users expect to behave predictably.&lt;/p&gt;

&lt;p&gt;So the engineering work becomes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;narrowing input variation&lt;/li&gt;
&lt;li&gt;constraining output shape&lt;/li&gt;
&lt;li&gt;measuring quality&lt;/li&gt;
&lt;li&gt;isolating failures&lt;/li&gt;
&lt;li&gt;protecting the UI and downstream systems&lt;/li&gt;
&lt;li&gt;creating graceful paths when the model underperforms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is what turns AI from a cool experiment into a dependable feature.&lt;/p&gt;

&lt;h2&gt;
  
  
  A simple architecture I like
&lt;/h2&gt;

&lt;p&gt;When I build or review LLM-backed systems, I usually want the flow to look something like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User input enters the API&lt;/li&gt;
&lt;li&gt;Input is validated and normalized&lt;/li&gt;
&lt;li&gt;Context is gathered from trusted sources&lt;/li&gt;
&lt;li&gt;Prompt is assembled in a predictable format&lt;/li&gt;
&lt;li&gt;Model response is generated&lt;/li&gt;
&lt;li&gt;Output is validated against schema&lt;/li&gt;
&lt;li&gt;Business rules are applied&lt;/li&gt;
&lt;li&gt;Result is logged, scored, and returned&lt;/li&gt;
&lt;li&gt;Failures are routed to a fallback path&lt;/li&gt;
&lt;/ol&gt;
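<p>The steps above can be sketched as a single function, with each stage injected as a callable. Every name here is a placeholder for a real component:</p>

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def handle_request(raw_input, validate, gather_context, build_prompt,
                   call_model, parse_output, apply_rules, fallback):
    # 1-2. Validate and normalize the incoming input.
    cleaned = validate(raw_input)
    if cleaned is None:
        return fallback("invalid_input")
    # 3-4. Gather trusted context and assemble a predictable prompt.
    prompt = build_prompt(cleaned, gather_context(cleaned))
    # 5-6. Call the model and validate the output against a schema.
    parsed = parse_output(call_model(prompt))
    if parsed is None:
        return fallback("unparseable_output")
    # 7-9. Apply business rules; logging and scoring would hook in here.
    return apply_rules(parsed)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;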

&lt;p&gt;Nothing in that flow is magical.&lt;/p&gt;

&lt;p&gt;That is the point.&lt;/p&gt;

&lt;p&gt;The more predictable the system design is, the easier it becomes to maintain quality as the product grows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thought
&lt;/h2&gt;

&lt;p&gt;A production-ready LLM feature is not defined by how exciting the demo looks.&lt;/p&gt;

&lt;p&gt;It is defined by whether the feature is reliable, measurable, maintainable, and useful when real users start depending on it.&lt;/p&gt;

&lt;p&gt;That usually means the most important work is not the prompt itself. It is the surrounding engineering discipline.&lt;/p&gt;

&lt;p&gt;And honestly, that is what makes applied AI interesting to me.&lt;/p&gt;

&lt;p&gt;The challenge is not just generating output. The challenge is building systems that people can trust.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Closing question for DEV readers:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;What do you think is the biggest gap between an LLM demo and a real production feature: evaluation, reliability, latency, or product design?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>python</category>
      <category>software</category>
    </item>
    <item>
      <title>Navigating Healthcare as a Founding Engineer</title>
      <dc:creator>Jamie Gray</dc:creator>
      <pubDate>Thu, 05 Mar 2026 14:26:17 +0000</pubDate>
      <link>https://forem.com/jamie_gray_ai/navigating-healthcare-as-a-founding-engineer-30gf</link>
      <guid>https://forem.com/jamie_gray_ai/navigating-healthcare-as-a-founding-engineer-30gf</guid>
      <description>&lt;p&gt;When I joined a healthcare startup as a founding engineer, I knew the technical challenges would be significant. What I underestimated was just how different the healthcare ecosystem is compared to traditional software environments.&lt;/p&gt;

&lt;p&gt;Despite having years of experience building production systems, healthcare introduced an entirely new layer of complexity—regulatory requirements, sensitive patient data, clinical workflows, and the responsibility of building technology that directly impacts real people’s lives.&lt;/p&gt;

&lt;p&gt;Early on, the learning curve was steep. Building in healthcare isn’t just about writing code; it requires understanding compliance frameworks, data governance, and how clinicians actually interact with technology in real-world environments. There were moments where early implementations had to be rethought, especially when navigating requirements like HIPAA compliance, data security, and system reliability.&lt;/p&gt;

&lt;p&gt;Rather than viewing those challenges as obstacles, I treated them as an opportunity to deepen my understanding of the domain. I spent time working closely with healthcare professionals, studying regulatory requirements, and designing systems that could balance innovation with the strict reliability standards healthcare demands.&lt;/p&gt;

&lt;p&gt;Over time, those efforts translated into a platform the team and our users could trust—one capable of securely processing sensitive data, supporting clinical workflows, and delivering meaningful insights.&lt;/p&gt;

&lt;p&gt;What this experience reinforced for me is that building technology in healthcare requires more than strong engineering skills. It demands patience, domain understanding, and a commitment to building systems that are not only scalable but also responsible.&lt;/p&gt;

&lt;p&gt;Looking back, the early uncertainty was simply part of the process. It pushed me to grow as an engineer and ultimately helped me build technology that contributes to better healthcare outcomes.&lt;/p&gt;

</description>
      <category>startup</category>
      <category>ai</category>
      <category>beginners</category>
      <category>engineer</category>
    </item>
    <item>
      <title>What Is the First Step When You're Stuck? A Practical Guide to Getting Started</title>
      <dc:creator>Jamie Gray</dc:creator>
      <pubDate>Thu, 05 Mar 2026 14:14:22 +0000</pubDate>
      <link>https://forem.com/jamie_gray_ai/what-is-the-first-step-when-youre-stuck-a-practical-guide-to-getting-started-30of</link>
      <guid>https://forem.com/jamie_gray_ai/what-is-the-first-step-when-youre-stuck-a-practical-guide-to-getting-started-30of</guid>
      <description>&lt;p&gt;&lt;strong&gt;Key Takeaways&lt;/strong&gt;&lt;br&gt;
When you're stuck, the solution isn't more thinking—it's taking action. Here are the essential insights for breaking through paralysis and building momentum:&lt;/p&gt;

&lt;p&gt;• Acknowledge you're stuck first: Recognition without judgment is the crucial starting point in any decision-making process.&lt;/p&gt;

&lt;p&gt;• Start with the tiniest possible action: Set a 5-10 minute timer and begin—momentum builds naturally once you're in motion.&lt;/p&gt;

&lt;p&gt;• Shift from "I don't know" to "I can learn": This simple reframe opens possibilities instead of creating dead ends.&lt;/p&gt;

&lt;p&gt;• Track behaviors, not just results: People who monitor progress are up to 70% more likely to achieve their goals.&lt;/p&gt;

&lt;p&gt;• Use your past wins as evidence: You already have a track record of figuring things out—resourcefulness is a skill you've already proven.&lt;/p&gt;

&lt;p&gt;The key insight? Clarity emerges through action, not analysis. Stop waiting for perfect conditions and take one small step today. Your future self will thank you for starting now rather than waiting for the "right" moment that may never come.&lt;/p&gt;

&lt;p&gt;When you're stuck and wondering what the first step is, the hardest part isn't figuring out the answer. It's convincing yourself to move at all. Whether you're facing a big decision, starting a new habit, or working through a problem, that initial moment of action feels impossibly heavy.&lt;/p&gt;

&lt;p&gt;Here's what I've learned: clarity doesn't come from more thinking. The only way to get clarity on something is to get into action. You don't need perfect conditions or complete information. You need a practical system for identifying the first step in your decision-making process and a mindset that values progress over perfection.&lt;/p&gt;

&lt;p&gt;In this guide, we'll walk through why starting feels so difficult, how to identify your true first step, and actionable strategies to build momentum even when you don't have all the answers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A. Why Taking the First Step Feels So Hard&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Psychology of Starting vs. Continuing&lt;/strong&gt;&lt;br&gt;
Starting something demands a cognitive shift that continuing doesn't require. When I begin a new task, my brain must change states, make decisions, and tolerate uncertainty. Before any progress occurs, effort has already been spent.&lt;/p&gt;

&lt;p&gt;Ambiguity sits at the heart of this friction. At the beginning, the path forward remains unclear. Where do I start? What does success look like? How long will this take? Each unanswered question activates additional mental processing. The brain prefers clarity because it reduces cognitive load. Continuing a task, by contrast, benefits from real-time feedback. Abstraction collapses into something concrete, and momentum replaces hesitation.&lt;/p&gt;

&lt;p&gt;Emotionally, beginnings create vulnerable spaces. Before I start, outcomes stay theoretical. Once I begin, they become measurable. This shift triggers subtle threat responses in the nervous system, which doesn't sharply differentiate between social and physical risks. Evaluation, potential failure, and uncertainty all register as dangers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Common Reasons People Get Stuck&lt;/strong&gt;&lt;br&gt;
Procrastination reflects struggles with self-control rather than poor time management. For habitual procrastinators, who represent approximately 20 percent of the population, "I don't feel like it" takes precedence over goals and responsibilities.&lt;/p&gt;

&lt;p&gt;The process involves self-deception. At some level, I'm aware of my actions and their consequences, but changing habits requires greater effort than completing the task itself. Procrastinators often lean toward perfectionism. It becomes psychologically more acceptable to never tackle a job than to face the possibility of not doing it well.&lt;/p&gt;

&lt;p&gt;Fear of failure drives much of this avoidance. Some people worry so intensely about others' judgments that they risk their futures to avoid evaluation. Others convince themselves they perform better under pressure, though research shows this generally isn't the case.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Cost of Waiting for the 'Perfect' Time&lt;/strong&gt;&lt;br&gt;
Perfectionism disguises itself as a noble pursuit, but it only holds us back. What stops us is fear cloaked as perfectionism, because perfectionism supplies a seemingly logical reason not to ship the work.&lt;/p&gt;

&lt;p&gt;Nothing is ever completely perfect. If I wait until something reaches that impossible standard before putting it into the world, I'll never do it. The quest for perfection keeps us paralyzed. Instead of taking action, we remain frozen by the fear of making mistakes or not being good enough.&lt;/p&gt;

&lt;p&gt;As a result, waiting for the right time becomes dangerous. We get so focused on finding the perfect moment that we never actually start. The regret of waiting too long stings, especially when we know we've missed opportunities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;B. What Is the First Step in the Decision-Making Process&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Acknowledge That You're Stuck&lt;/strong&gt;&lt;br&gt;
The first step in the decision-making process is recognizing you need to make a decision. This sounds obvious, but most of us skip right past it. We stay busy, convince ourselves things aren't that bad, or wait for circumstances to magically improve.&lt;/p&gt;

&lt;p&gt;Acknowledging the problem doesn't mean dwelling in it. Ignoring what needs attention invites denial, and denial only allows issues to grow. I need to note the problem exists, accept it without judgment, and shift my focus toward finding solutions. Once identified, I can start looking for answers rather than staring at the obstacle.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Identify What You Actually Want&lt;/strong&gt;&lt;br&gt;
Knowing what I want is the first and most important step in creating a better future. When I ask myself what I want my life to look like, the answer might not come easily. Most people have a clearer idea of what they don't want.&lt;/p&gt;

&lt;p&gt;I can flip that around. If I don't want to work in a job where no one appreciates me, then I want a job that stimulates me intellectually and allows me to be creative. The more specific I get, the better. Deciding what I want now doesn't lock me in forever. As I learn more, I can adjust course. That's not giving up; it's making informed decisions with new data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Separate the Real Problem from the Excuse&lt;/strong&gt;&lt;br&gt;
Excuses avoid personal responsibility. Explanations help me grow. An excuse shifts blame to something outside my control, keeping me stuck. An explanation builds self-awareness and reveals how I arrived at this point.&lt;/p&gt;

&lt;p&gt;When I catch myself saying "I don't have time" or "I'm not ready," I need to ask whether that's a reason or an excuse designed to remove my accountability. What looks like a problem often masks something deeper. By approaching the situation differently and changing my perspective, I can identify what's actually blocking progress.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;C. Adopting a Resourcefulness Mindset to Move Forward&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You Already Have a Track Record of Figuring Things Out&lt;/strong&gt;&lt;br&gt;
Resourcefulness isn't something you need to develop from scratch. You already possess this skill. Think back to challenges you've faced in your life. How did you handle them? What clever solutions did you find? Even if these moments happened years ago, they prove you know how to figure things out.&lt;/p&gt;

&lt;p&gt;The great news is that resourcefulness combines creativity, adaptability, and forward thinking to find practical solutions, especially when facing limitations or uncertainty. You've already demonstrated this capacity. Every challenge you overcame started as something you hadn't solved before.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Shifting from 'I Don't Know' to 'I Can Learn'&lt;/strong&gt;&lt;br&gt;
Fixed mindset thinking says "I can't do this." Growth mindset thinking adds one powerful word: "yet". This small shift opens a door of possibility instead of a dead end. When your inner voice says "I'm not good at this," change it to "I haven't learned it yet".&lt;/p&gt;

&lt;p&gt;This reframe isn't about ignoring reality. It's about choosing a more useful interpretation that keeps you in motion instead of stuck. The belief in your capacity to improve actually changes how your neural pathways form during learning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Building Confidence Through Past Wins&lt;/strong&gt;&lt;br&gt;
Confidence builds through evidence. When you reflect on what you've accomplished, you create your own archive of competence and resilience. Neuroscience shows that acknowledging achievements triggers dopamine release, which creates a positive feedback loop of motivation and self-belief.&lt;/p&gt;

&lt;p&gt;Regularly remind yourself of previous obstacles you navigated. This helps turn isolated successes into a pattern of competence your confidence can stand on.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reframing Uncertainty as Opportunity&lt;/strong&gt;&lt;br&gt;
Uncertainty and possibility are two sides of the same coin. History shows that some of the greatest breakthroughs emerge during uncertain times. Instead of seeing change as a threat, ask yourself what opportunities it presents. This mindset shift won't eliminate the ambiguity, but it changes how you respond to it and helps you move forward despite uncertainty.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;D. Practical Strategies for Taking Your First Step&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Break the Goal into Smaller Segments&lt;/strong&gt;&lt;br&gt;
In one study, breaking down big goals into more manageable chunks had a meaningful and sustained impact: participants who focused on smaller subgoals ended up volunteering 7 to 8 percent more than peers who simply aimed for the big goal. The more flexible your framing, the more durable the benefits over time. Instead of "write the report," try "open the document and type the title".&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Start with the Smallest Possible Action&lt;/strong&gt;&lt;br&gt;
A tiny step lowers the barrier to entry significantly. Set a timer for just 5 to 10 minutes and work until it rings. Nine times out of ten, you'll be in flow and decide to continue. If you struggle to get going, you're not lazy; you just need the right tools to build momentum.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Create a Loose Plan Based on What You Know&lt;/strong&gt;&lt;br&gt;
Action plans transform high-level objectives into concrete, assignable steps so you know exactly what to do and when. Explicitly identify tasks needed to complete milestones, then allocate resources and prioritize by importance and sequence. Keep your plan visible and revise it as circumstances change.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Get Into Action Before You Have All the Answers&lt;/strong&gt;&lt;br&gt;
Rather than wait on information you don't have, gather the contextual information you already possess to narrow your scope of options. Layer in what you know about your skills, resources, and constraints. When you commit early, stakes stay lower and you gain freedom to course correct.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use the 'Next Obvious Step' Method&lt;/strong&gt;&lt;br&gt;
If you always have an obvious next task to return to, you keep momentum and work through your task list. Having clear priorities is half the battle. Keeping these visible ensures you spend less time working out what to do next and more time actually working.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Track Your Progress to Build Momentum&lt;/strong&gt;&lt;br&gt;
People who track their progress are significantly more likely to achieve their goals, with regular monitoring increasing goal achievement rates by up to 70 percent. Tracking reveals patterns and makes cumulative effort visible and motivating. Focus on tracking your behaviors rather than just results, because you can control your choices and actions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;br&gt;
Getting unstuck doesn't require perfect conditions or complete information. As I have noted, clarity comes from action, not endless planning. The strategies we've covered give you a practical system for identifying your first step and building momentum from there. Start small, track what works, and adjust as you learn. Waiting for the perfect moment keeps you paralyzed. Take one tiny action today, and you'll be closer to your goal than you were yesterday.&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>motivation</category>
      <category>productivity</category>
      <category>psychology</category>
    </item>
    <item>
      <title>The Journey From Uncertainty to AI Innovation</title>
      <dc:creator>Jamie Gray</dc:creator>
      <pubDate>Thu, 05 Mar 2026 13:57:28 +0000</pubDate>
      <link>https://forem.com/jamie_gray_ai/the-journey-from-uncertainty-to-ai-innovation-210l</link>
      <guid>https://forem.com/jamie_gray_ai/the-journey-from-uncertainty-to-ai-innovation-210l</guid>
      <description>&lt;p&gt;When I first started my career, I knew I was fascinated by technology—but I wasn’t entirely sure where to begin.&lt;/p&gt;

&lt;p&gt;About ten years ago, I was standing at a crossroads. I wanted to become a software developer, but like many people early in their careers, I didn’t yet know what path would take me there. That changed when I landed an internship at Amazon Web Services (AWS), one of the world’s leading cloud platforms.&lt;/p&gt;

&lt;p&gt;My time at AWS was both challenging and incredibly motivating. I was suddenly surrounded by talented engineers working on systems at a massive scale. Every day pushed me to learn faster—about software development, distributed systems, and how high-performing teams collaborate to build reliable technology.&lt;/p&gt;

&lt;p&gt;What started as an internship quickly became a defining experience for me. I wasn’t just learning theory; I was contributing to real systems used around the world. It gave me a deeper understanding of how large technology platforms operate behind the scenes and what it takes to build production-grade infrastructure.&lt;/p&gt;

&lt;p&gt;After that experience, I felt far more confident stepping into the broader tech industry. Over the years, I continued refining my skills and eventually found myself drawn toward startups—environments where things move quickly and where building something meaningful from the ground up is part of everyday work.&lt;/p&gt;

&lt;p&gt;More recently, my focus has shifted toward applied AI. I’ve worked on generative AI systems, machine learning platforms, and data-driven products that bring advanced technology into real-world use. What excites me most is turning complex AI systems into practical tools that people can actually use.&lt;/p&gt;

&lt;p&gt;Today, with more than a decade of experience across large tech companies and fast-moving startups, I still look back at my early days at AWS as a foundational moment in my career.&lt;/p&gt;

&lt;p&gt;I started with uncertainty, like many engineers do. But that experience helped me build the confidence, skills, and curiosity that continue to drive my work today.&lt;/p&gt;

&lt;p&gt;And the journey is still just getting started. 🚀&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>developer</category>
      <category>beginners</category>
    </item>
  </channel>
</rss>
