<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Jose Crespo, PhD</title>
    <description>The latest articles on Forem by Jose Crespo, PhD (@jose_crespo_phd).</description>
    <link>https://forem.com/jose_crespo_phd</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3592257%2Fc78c73bf-6bd3-499e-9590-1325927e1223.png</url>
      <title>Forem: Jose Crespo, PhD</title>
      <link>https://forem.com/jose_crespo_phd</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/jose_crespo_phd"/>
    <language>en</language>
    <item>
      <title>The Two Programming Styles of AI — and Why Everyone Uses the Wrong One</title>
      <dc:creator>Jose Crespo, PhD</dc:creator>
      <pubDate>Wed, 26 Nov 2025 11:04:37 +0000</pubDate>
      <link>https://forem.com/jose_crespo_phd/the-two-programming-styles-of-ai-and-why-everyone-uses-the-wrong-one-c0n</link>
      <guid>https://forem.com/jose_crespo_phd/the-two-programming-styles-of-ai-and-why-everyone-uses-the-wrong-one-c0n</guid>
      <description>&lt;h2&gt;
  
  
  AI keeps crashing against the same walls
&lt;/h2&gt;

&lt;p&gt;Yep, everybody, even Tesla, is using the same damn math from two centuries ago, so it’s not surprising to find scenes like this all over YouTube when you let the AI drive your Tesla:&lt;/p&gt;

&lt;p&gt;Apparently, not even Tesla - with its $1.4 trillion valuation and army of PhDs - knows about this math. Or maybe they do, and just enjoy watching their cars perform interpretive dance routines at 60 mph.&lt;/p&gt;

&lt;p&gt;Either way, here’s the greatest hits compilation you’ve seen all over YouTube:&lt;/p&gt;

&lt;p&gt;The Tesla Self-Driving Blooper Reel:&lt;/p&gt;

&lt;p&gt;🎬 &lt;a href="https://www.reddit.com/r/TeslaUK/comments/1lzojc5/phantom_braking_in_2025/" rel="noopener noreferrer"&gt;Phantom Braking&lt;/a&gt; - The car slams the brakes for a shadow. Because apparently, shadows are the #1 threat to highway safety in the 21st century.&lt;/p&gt;

&lt;p&gt;🎬 &lt;a href="https://static.nhtsa.gov/odi/rcl/2023/RCLRPT-23V838-8276.PDF" rel="noopener noreferrer"&gt;The Surprise Party Turn&lt;/a&gt; - Takes curves at full speed, then goes “OH SHIT A CURVE!” and throws a mini-chicane out of nowhere. Comedy for everyone, except your neck.&lt;/p&gt;

&lt;p&gt;🎬 &lt;a href="https://static.nhtsa.gov/odi/rcl/2025/RCLRPT-25V092-6812.PDF" rel="noopener noreferrer"&gt;The Seizure Shuffle&lt;/a&gt; - Steering adjustments so jerky you’d think the car is having an existential crisis. Left, right, left, right… it’s not driving, it’s vibrating down the highway.&lt;/p&gt;

&lt;p&gt;🎬 &lt;a href="https://www.tesla.com/fsd/safety" rel="noopener noreferrer"&gt;The “Why Did It Do That?” &lt;/a&gt;- Does something so inexplicable that even the AI researchers watching the logs just shrug and mutter “gradient descent, probably.”&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;If this post is sparking your curiosity, you may enjoy exploring deeper analyses, geometric AI concepts, and ongoing research on my personal site:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
👉 &lt;a href="https://josecrespo.substack.com" rel="noopener noreferrer"&gt;https://josecrespo.substack.com&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Fix That Nobody’s Using
&lt;/h2&gt;

&lt;p&gt;Tesla could solve this - easily - by using second derivatives (Hessian-vector products, or HVP for the cool kids).&lt;/p&gt;

&lt;p&gt;So could Google, Meta, OpenAI, and pretty much every company with an “AI Strategy” PowerPoint deck.&lt;/p&gt;

&lt;p&gt;But they’re not. See the table below - notice a pattern?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqx37hrvq7hzvd3d9y1vx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqx37hrvq7hzvd3d9y1vx.png" alt=" " width="556" height="366"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Wait — These Are Different Problems, Right?
&lt;/h2&gt;

&lt;p&gt;Not exactly. They are different symptoms, but the same disease.&lt;/p&gt;

&lt;p&gt;They’re all using math that can answer “Which way should I go?”&lt;/p&gt;

&lt;p&gt;but not “How sharply is this about to change?”&lt;/p&gt;

&lt;p&gt;It’s like asking a GPS for directions but never checking if there’s a cliff ahead.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Root Cause: Your Great-great-grandfather’s Calculus
&lt;/h2&gt;

&lt;p&gt;As noted, in Tesla’s case the cars are reacting to what’s happening right now, not anticipating what’s about to happen.&lt;/p&gt;

&lt;p&gt;It’s like playing chess by only looking at the current board position - no planning, no strategy, just “I see a piece, I move a piece.”&lt;/p&gt;

&lt;p&gt;Chess players call this “beginner level.” Tesla calls it “Full Self-Driving.”&lt;/p&gt;

&lt;p&gt;Ready for the diagnosis? Tesla engineers, like everyone else in Silicon Valley, are still using 19th-century limit-based calculus — the math equivalent of trying to stream Netflix on a telegraph machine.&lt;/p&gt;

&lt;p&gt;Meanwhile, the solution has been sitting on the shelf for 60 years: dual/jet numbers.&lt;/p&gt;

&lt;p&gt;Nobody thought to check the manual. Seriously, who bothers with that “wacko, exotic math” they don’t teach in university CS programs?&lt;/p&gt;

&lt;p&gt;And yet, these hyperreal-related algebras (duals and jets) make second derivatives (HVP) a computationally trivial operation through the elegant composition of two first-order operators (JVP ∘ VJP).&lt;/p&gt;

&lt;h2&gt;
  
  
  Hold Up — Are You Telling Me…
&lt;/h2&gt;

&lt;p&gt;that what’s computationally intractable with the traditional h-limit calculus, the one so many Ivy-League courses treat as the gold standard, is trivial with dual/jet numbers - and that this can fix most of those damn curve-related problems in our current AI?&lt;/p&gt;

&lt;p&gt;Yes. Exactly that.&lt;/p&gt;

&lt;p&gt;And it gets worse.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hyperreal Revolution: Your Calculus Professor Never Told You This
&lt;/h2&gt;

&lt;p&gt;The calculus you learned in college — the one that got you through differential equations, optimization theory, and machine learning courses — isn’t wrong. It’s just incomplete.&lt;/p&gt;

&lt;p&gt;It’s like learning arithmetic but never being taught that multiplication is just repeated addition. You can still do math, but you’re doing it the hard way.&lt;/p&gt;

&lt;p&gt;Here’s the specific problem:&lt;/p&gt;

&lt;p&gt;Traditional calculus (the h-limit approach):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;f'(x) = lim[h→0] (f(x+h) - f(x)) / h
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This defines derivatives as limits — which means:&lt;/p&gt;

&lt;p&gt;✅ Mathematically rigorous&lt;br&gt;
✅ Great for proving theorems&lt;br&gt;
❌ Computationally nightmarish for anything beyond first derivatives&lt;/p&gt;

&lt;p&gt;Want a second derivative? You need:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;f''(x) = lim[h→0] (f'(x+h) - f'(x)) / h
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;But f'(x+h) itself requires computing its own limit:&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;f'(x+h) = lim[h'→ 0] (f(x+h+h') - f(x+h)) / h'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So, summing up: either you end up with nested limits and two step sizes (h,h′) that interact unstably, or you resort to higher-order stencils that are exquisitely sensitive to step size and noise. In both cases you lose derivative structure, so two first-derivative passes (JVP → VJP) don’t compose into a true second derivative - you’re rebuilding guesses instead of carrying derivatives.&lt;/p&gt;

&lt;p&gt;For a third derivative? Three nested limits, or even higher-order stencils.&lt;/p&gt;

&lt;p&gt;For the k-th derivative: either nest k layers or use wider stencils - noise blows up as O(h^-k), truncation depends on stencil order, and you still lose derivative structure, so JVP→VJP won’t compose into HVP in an FD pipeline.&lt;/p&gt;
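&lt;p&gt;A quick way to feel this in practice: nest central differences in plain Python and watch the step-size guesswork backfire (the function, the point x = 2, and both step sizes below are illustrative choices, not anything from a real pipeline):&lt;/p&gt;

```python
# A sketch (illustrative function and step sizes) of why nested finite
# differences fall apart: the step size h is guesswork, and the errors
# of the inner approximation are amplified by the outer one.

def fd(f, x, h):
    # central difference for f'(x)
    return (f(x + h) - f(x - h)) / (2.0 * h)

def fd2(f, x, h):
    # "second derivative" as a finite difference of finite differences
    return (fd(f, x + h, h) - fd(f, x - h, h)) / (2.0 * h)

f = lambda x: x ** 3      # exact second derivative is 6x
exact = 6.0 * 2.0         # 12 at x = 2

err_ok = abs(fd2(f, 2.0, 1e-4) - exact)    # reasonable h: small error
err_tiny = abs(fd2(f, 2.0, 1e-8) - exact)  # tiny h: cancellation ruins it

print(err_ok, err_tiny)
```

&lt;p&gt;With a reasonable h the error is tiny; shrink h further, and cancellation noise dominates - exactly the instability described above.&lt;/p&gt;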

&lt;p&gt;So your self-driving car keeps crashing against sunset-lit walls.&lt;/p&gt;

&lt;p&gt;And for GPT-5’s approximately 1.8 trillion parameters? Computational impossibility.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sharp Readers Will Notice:
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;“Hold on, if we know the function f, can’t we just compute f’ and f’’ analytically? Why do we need any of this limit or dual number stuff?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Great question!&lt;/p&gt;

&lt;p&gt;Here’s why that doesn’t work for neural networks:&lt;/p&gt;

&lt;p&gt;The Problem: Neural Networks Are Black Boxes&lt;br&gt;
When you write a simple function, you can compute derivatives analytically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Simple case - analytic derivatives work fine
f(x) = x² + 3x + 5
f'(x) = 2x + 3      # Easy to derive by hand
f''(x) = 2          # Even easier
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But a neural network with 1.8 trillion parameters looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;f(x) = σ(W₁₇₅·σ(W₁₇₄·σ(...σ(W₂·σ(W₁·x))...)))


Where:
- Each `W` is a matrix with billions of parameters
- Each `σ` is a nonlinear activation function
- There are hundreds of layers (GPT-style)
- The composition is **dynamically computed** during runtime

You literally cannot write down the analytic form of f'(x) because:
1. The function changes every time you update parameters (every training step)
2. It's too large to express symbolically
3. It contains billions of nested compositions


### Why Traditional Calculus Fails Here

**The h-limit formula:**

f''(x) = lim[h→0] (f'(x+h) - f'(x)) / h


**Requires you to evaluate 

f'(x+h)`**, which means:
f'(x+h) = lim[h'→0] (f(x+h+h') - f(x+h)) / h'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And here’s the trap:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can’t compute f' analytically (the function is too complex)&lt;/li&gt;
&lt;li&gt;So you approximate it using finite differences (the h-limit)&lt;/li&gt;
&lt;li&gt;Now you need f'(x+h) for the second derivative&lt;/li&gt;
&lt;li&gt;So you approximate that using another finite difference (with step size h’)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Result: You’re approximating an approximation — errors compound catastrophically.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The skeptical reader might continue objecting: &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“But can’t we use something like SymPy or Mathematica to compute derivatives symbolically?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In theory, yes. In practice, we face a similar problem.&lt;/p&gt;

&lt;p&gt;For a 1.8-trillion-parameter model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The symbolic expression for f' would be larger than the model itself 👀&lt;/li&gt;
&lt;li&gt;Computing it would take years&lt;/li&gt;
&lt;li&gt;Storing it would require more memory than exists&lt;/li&gt;
&lt;li&gt;Simplifying it would be computationally intractable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example: Even for a tiny 3-layer network with 1000 neurons per layer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Symbolic f' lands in the millions of terms.&lt;/li&gt;
&lt;li&gt;Symbolic f'' jumps to the billions of terms.&lt;/li&gt;
&lt;li&gt;Growth is combinatorial with depth/width; common-subexpression tricks don’t save you enough.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For hundreds of layers? Forget it.&lt;/p&gt;

&lt;p&gt;Clear now?&lt;/p&gt;

&lt;p&gt;Let’s bring back our hyperreal flavor of AI computing and see what happens when hyperreals face similar scenarios:&lt;/p&gt;

&lt;p&gt;What Dual/Jet Numbers Do Differently: Automatic Differentiation&lt;br&gt;
Dual numbers don’t use limits at all. Instead, they:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Encode the differentiation rules in the arithmetic&lt;/li&gt;
&lt;li&gt;Evaluate f with special numbers that carry derivative info&lt;/li&gt;
&lt;li&gt;Let derivatives emerge through rule-following arithmetic&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Jets generalize this: a k-jet carries truncated Taylor lanes up to order k (nilpotent ε^(k+1) = 0), so higher-order derivatives fall out in one pass.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Here’s the key: The calculus rules (power rule, chain rule, etc.) are built into the jet arithmetic operations, not applied symbolically! So you get all the advantages of analytical solution without using them!&lt;/p&gt;
&lt;/blockquote&gt;
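&lt;p&gt;As a minimal sketch of that idea (the k = 1 case only; the &lt;code&gt;Dual&lt;/code&gt; class and &lt;code&gt;dsin&lt;/code&gt; helper are illustrative, not a real library API):&lt;/p&gt;

```python
import math

# A minimal dual-number sketch (k = 1): a pair (a, b) stands for a + b*eps
# with eps**2 = 0. Each operation updates the derivative lane with the
# matching calculus rule, so the chain rule happens as plain arithmetic.

class Dual:
    def __init__(self, a, b=0.0):
        self.a, self.b = a, b            # value lane, derivative lane
    def __add__(self, other):
        return Dual(self.a + other.a, self.b + other.b)
    def __mul__(self, other):
        # product rule as arithmetic: (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)

def dsin(d):
    # chain rule for sin, baked into the operation itself
    return Dual(math.sin(d.a), d.b * math.cos(d.a))

# f(x) = sin(x*x) at x = 0.5, with the derivative lane seeded to 1
x = Dual(0.5, 1.0)
y = dsin(x * x)
print(y.a, y.b)   # sin(0.25) and its derivative 2*0.5*cos(0.25) = cos(0.25)
```

&lt;p&gt;No limit was taken anywhere; the power, product, and chain rules were consumed as arithmetic, not applied as symbolic rewrites.&lt;/p&gt;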

&lt;p&gt;&lt;strong&gt;The Three Fundamental Differences&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Calculus with Symbolic Rule Application (impractical at modern AI scale)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Process:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Write down the function: f(x) = x³&lt;/li&gt;
&lt;li&gt;Recall the power rule: d/dx[xⁿ] = n·xⁿ⁻¹&lt;/li&gt;
&lt;li&gt;Apply it symbolically: f’(x) = 3x²&lt;/li&gt;
&lt;li&gt;Store both formulas separately&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For neural networks: Must build the entire derivative expression — exponential memory explosion.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Traditional h-Limit Calculus: Numerical Approximation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Process:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Choose a step size h (guesswork)&lt;/li&gt;
&lt;li&gt;Evaluate: (f(x+h) - f(x))/h&lt;/li&gt;
&lt;li&gt;Get an approximation with error&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Not exact (always has truncation or roundoff error)&lt;/li&gt;
&lt;li&gt;Can’t compose cleanly&lt;/li&gt;
&lt;li&gt;Breaks down at higher orders&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Dual/Jet Numbers Algebra: Evaluation with Augmented Arithmetic (practical at modern AI scale)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Process:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Extend the number system with ε where ε² = 0&lt;/li&gt;
&lt;li&gt;Evaluate f at (x + ε) using this arithmetic&lt;/li&gt;
&lt;li&gt;Derivatives appear as ε-coefficients automatically&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;For neural networks&lt;/strong&gt;: No expression built — just evaluate once with special numbers. Linear memory scaling.&lt;/p&gt;
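&lt;p&gt;A toy sketch of that linear scaling, with made-up scalar weights standing in for the matrices: every intermediate stores exactly one extra number, however deep the network gets.&lt;/p&gt;

```python
import math

# Sketch of the linear-memory claim (made-up weights): a forward pass that
# drags a (value, derivative) pair through each layer. Whatever the depth,
# storage is exactly 2x the plain forward pass - no expression is built.

def dtanh(a, b):
    t = math.tanh(a)
    return t, b * (1.0 - t * t)       # tanh value and its derivative lane

def layer(w, pair):
    a, b = pair
    return dtanh(w * a, w * b)        # scalar "linear layer" then activation

x, direction = 0.3, 1.0               # seed the derivative lane with v = 1
pair = (x, direction)
for w in [1.7, -2.2, 0.9]:            # three layers of illustrative weights
    pair = layer(w, pair)

value, jvp = pair                     # f(x) and f'(x)*v from one pass
print(value, jvp)
```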
&lt;h2&gt;
  
  
  How It Actually Works: The Binomial Magic with dual numbers
&lt;/h2&gt;

&lt;p&gt;Let’s see, as a toy example, how the power rule emerges without applying any calculus.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example: compute the derivative of f(x) = x³&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Step 1: Evaluate at the augmented input&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;f(x + ε) = (x + ε)³
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Step 2: Expand using binomial theorem (combinatorics, not calculus)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(x + ε)³ = x³ + 3x²ε + 3xε² + ε³
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Step 3: Apply nilpotent algebra (ε² = 0)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;= x³ + 3x²ε + 0 + 0
= x³ + 3x²ε
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Step 4: Read the dual number&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;x³ + 3x²ε = (x³) + ε·(3x²)
            ↑         ↑
         value   derivative
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The derivative f’(x) = 3x² emerged through:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Binomial expansion (algebra)&lt;/li&gt;
&lt;li&gt;Nilpotent simplification (ε² = 0)&lt;/li&gt;
&lt;li&gt;Coefficient reading&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;NOT through:&lt;/p&gt;

&lt;p&gt;❌ Power rule application&lt;br&gt;
❌ h-limit formula&lt;br&gt;
❌ Symbolic differentiation&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You don’t apply the power rule — you let binomial expansion reveal it.&lt;/p&gt;
&lt;/blockquote&gt;
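&lt;p&gt;The four steps above can be run directly. A bare-bones sketch (the class name and values are illustrative):&lt;/p&gt;

```python
# The four steps above, run as code: a bare-bones dual type where the
# nilpotent rule eps**2 = 0 is simply the term __mul__ never produces.

class Dual:
    def __init__(self, a, b=0.0):
        self.a, self.b = a, b
    def __mul__(self, other):
        # (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps; the b*d term vanishes
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)

x = Dual(2.0, 1.0)        # evaluate at x = 2 with the eps lane seeded
cube = x * x * x          # plain multiplication - no power rule applied
print(cube.a, cube.b)     # value x**3 = 8.0, derivative 3*x**2 = 12.0
```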

&lt;h2&gt;
  
  
  Why This Scales When Symbolic Differentiation Doesn’t
&lt;/h2&gt;

&lt;p&gt;Symbolic Differentiation (Analytical):&lt;br&gt;
With neural networks, you must build explicit derivative expressions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Layer 1 derivative: thousands of terms&lt;/li&gt;
&lt;li&gt;Layer 2 derivative: millions of terms (combinatorial explosion)&lt;/li&gt;
&lt;li&gt;Hundreds of layers: expression size grows exponentially in depth/width; even with common-subexpression elimination it becomes intractable to construct, store, or simplify.
&lt;strong&gt;Memory required: more than all atoms in the universe 👀&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Dual Number Evaluation:&lt;/strong&gt;&lt;br&gt;
Never builds expressions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each instrumented tensor stores value + ε·derivative&lt;/li&gt;
&lt;li&gt;Memory: 2× base model (for k=1)&lt;/li&gt;
&lt;li&gt;Or 3× base model with Jets (for k=2 with second derivative)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For GPT-5 (1.8T parameters):&lt;br&gt;
&lt;a href="https://www.nvidia.com/en-us/data-center/h100/" rel="noopener noreferrer"&gt;k=1: ~14.4 TB → 18.0 TB (totally practical)&lt;/a&gt;&lt;br&gt;
&lt;a href="https://arxiv.org/abs/2203.15556" rel="noopener noreferrer"&gt;k=2: ~14.4 TB → 21.6 TB (fits on ~34 H100 nodes)&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  BUT WAIT — YOU’RE FLYING FIRST CLASS IN AI MATH
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;And there’s still more.&lt;/strong&gt;&lt;br&gt;
The algebra of dual/jet numbers lets you use composition of functions (yup, if you want to do yourself a favor and write real AI that works, learn category theory now!).&lt;/p&gt;

&lt;p&gt;Here’s your genius move:&lt;/p&gt;

&lt;p&gt;With composition of functions, we can get second derivatives for the price of a first derivative!&lt;/p&gt;

&lt;p&gt;Woah. 🤯&lt;/p&gt;

&lt;p&gt;How? Just by using composition of functions — otherwise structurally impossible with limit-based calculus.&lt;/p&gt;

&lt;h2&gt;
  
  
  In Plain English: Why Composition Fails With h-Limits
&lt;/h2&gt;

&lt;p&gt;Traditional calculus can’t do JVP∘VJP = HVP because:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;JVP via finite differences gives you a number (an approximation of f’(x)·v)&lt;/li&gt;
&lt;li&gt;That number has no derivative structure for VJP to differentiate&lt;/li&gt;
&lt;li&gt;You must start over with a new finite-difference approximation&lt;/li&gt;
&lt;li&gt;The operations don’t chain — each one discards the structure the next one needs&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Dual numbers CAN do JVP∘VJP = HVP because:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;JVP with duals gives you a dual number (f(x), f'(x)·v)&lt;/li&gt;
&lt;li&gt;That dual number carries derivative structure in its ε-coefficient&lt;/li&gt;
&lt;li&gt;VJP can differentiate it directly by treating it as input&lt;/li&gt;
&lt;li&gt;The operations chain naturally — each preserves the structure the next needs&lt;/li&gt;
&lt;/ol&gt;
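&lt;p&gt;Here is a scalar sketch of that closure property: nest the dual construction and a second derivative falls out of two first-order passes. (This is forward-over-forward, standing in for the JVP∘VJP pairing described in the text; all names are illustrative.)&lt;/p&gt;

```python
# Sketch of closure under composition, scalar case: because a Dual's lanes
# may themselves be Duals, applying one first-order operator twice yields
# a second derivative - no finite differences, no symbolic expressions.

class Dual:
    def __init__(self, a, b=0.0):
        self.a, self.b = a, b
    def _wrap(self, o):
        return o if isinstance(o, Dual) else Dual(o)
    def __add__(self, o):
        o = self._wrap(o)
        return Dual(self.a + o.a, self.b + o.b)
    __radd__ = __add__
    def __mul__(self, o):
        o = self._wrap(o)
        return Dual(self.a * o.a, self.a * o.b + self.b * o.a)
    __rmul__ = __mul__

def derivative(f):
    # a first-order pass: seed the eps lane, read it back out
    return lambda x: f(Dual(x, 1.0)).b

f = lambda x: x * x * x
fp = derivative(f)               # f'(x) = 3x^2
fpp = derivative(derivative(f))  # composed: f''(x) = 6x
print(fp(3.0), fpp(3.0))         # 27.0 18.0
```

&lt;p&gt;The outer pass differentiates the inner pass because the inner pass returns an object that still carries derivative structure - exactly the property finite differences throw away.&lt;/p&gt;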

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Dual numbers are algebraically closed under composition.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Practical Consequence&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here’s what the new paradigm can compute that the old one can’t:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7cs4zhk6prctiqcemfuf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7cs4zhk6prctiqcemfuf.png" alt=" " width="566" height="307"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why This Is the Key to Fixing AI&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Current AI (k=1 only):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can answer: “Which direction should I go?”&lt;/li&gt;
&lt;li&gt;Cannot answer: “How sharply is this direction changing?”&lt;/li&gt;
&lt;li&gt;Result: Reactive, not anticipatory&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With composition (JVP∘VJP):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Get second derivatives for 2× the cost of first derivatives&lt;/li&gt;
&lt;li&gt;Can anticipate curves, detect trajectory changes&lt;/li&gt;
&lt;li&gt;Result: one of many examples, &lt;a href="https://www.reuters.com/business/autos-transportation/us-probes-416000-tesla-vehicles-over-unexpected-braking-reports-2022-02-17/" rel="noopener noreferrer"&gt;Tesla stops phantom braking&lt;/a&gt;; &lt;a href="https://www.reuters.com/business/autos-transportation/us-probes-416000-tesla-vehicles-over-unexpected-braking-reports-2022-02-17/" rel="noopener noreferrer"&gt;AI stops hallucinating&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With explicit k=3 jets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Get third derivatives for 3× the cost&lt;/li&gt;
&lt;li&gt;Can verify topological consistency (winding numbers)&lt;/li&gt;
&lt;li&gt;Result: Mathematically certified AI outputs&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Functors + Composition Advantage
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Why Hyperreal Algebra Matters:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Without it (finite differences):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each derivative order requires starting from scratch&lt;/li&gt;
&lt;li&gt;Errors accumulate with each nesting&lt;/li&gt;
&lt;li&gt;No compositional structure to exploit&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;With it (dual/jet numbers):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Higher-order derivatives = compose lower-order operations&lt;/li&gt;
&lt;li&gt;Exact (within floating-point)&lt;/li&gt;
&lt;li&gt;Automatic (chain rule built into ε-arithmetic)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is why:&lt;/p&gt;

&lt;p&gt;✅ Dual/jet numbers scale to hundreds of layers (linear memory)&lt;/p&gt;

&lt;p&gt;✅ Composition works (JVP∘VJP = HVP automatically)&lt;/p&gt;

&lt;p&gt;✅ Higher orders accessible with jet numbers (k=3, k=4 feasible)&lt;/p&gt;

&lt;p&gt;And why:&lt;/p&gt;

&lt;p&gt;❌ Symbolic differentiation explodes (exponential expressions)&lt;br&gt;
❌ Finite differences can’t compose (no functoriality)&lt;br&gt;
❌ h-limit methods break at higher orders (error compounds)&lt;/p&gt;




&lt;h2&gt;
  
  
  SUMMING UP
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The entire AI industry is stuck at first-order optimization because:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;They learned calculus as h-limits (doesn’t scale)&lt;/li&gt;
&lt;li&gt;They implement derivatives as finite differences (doesn’t compose)&lt;/li&gt;
&lt;li&gt;They never learned about Group Theory and Hyperreal Numbers (not in CS curricula)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Meanwhile:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Dual numbers make derivatives algebraic objects (not approximations)&lt;/li&gt;
&lt;li&gt;Jets make higher orders linear in cost (not exponential)&lt;/li&gt;
&lt;li&gt;Functorial composition makes second derivatives cheap (JVP∘VJP)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The math to fix Tesla’s phantom braking, OpenAI’s hallucinations, and Meta’s moderation chaos has been sitting in textbooks since the 1960s.&lt;/p&gt;

&lt;p&gt;Waiting for someone to connect the dots among the binomial theorem (~400 years old), nilpotent algebra (~150 years old), and functorial composition + hyperreals (~60 years old)…&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;…to the biggest unsolved problems in AI.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Now you know what Silicon Valley doesn’t and see what they cannot.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;NOTE&lt;/strong&gt;: &lt;em&gt;In this article, “traditional calculus” means the finite-difference (h-limit) implementation used in practice — pick an h, approximate, repeat — not analytic/symbolic derivatives.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If this post has sparked your curiosity, you may enjoy exploring deeper analyses, geometric AI concepts, and ongoing research on my personal site:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
👉 &lt;a href="https://josecrespo.substack.com" rel="noopener noreferrer"&gt;https://josecrespo.substack.com&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>algorithms</category>
    </item>
    <item>
      <title>LLMs Are Dying - The New AI Is Killing Them</title>
      <dc:creator>Jose Crespo, PhD</dc:creator>
      <pubDate>Sat, 15 Nov 2025 09:07:15 +0000</pubDate>
      <link>https://forem.com/jose_crespo_phd/llms-are-dying-the-new-ai-is-killing-them-1aa4</link>
      <guid>https://forem.com/jose_crespo_phd/llms-are-dying-the-new-ai-is-killing-them-1aa4</guid>
      <description>&lt;h2&gt;
  
  
  LLMs are already museum pieces
&lt;/h2&gt;

&lt;p&gt;Yes, ChatGPT, Claude, Gemini, all of them. Brilliant fossils of a linguistic age that’s already ending. They’re decomposing in public while billions are still being spent to polish their coffins: bigger models, longer contexts, more hallucinations per watt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The “predict-the-next-word” LLM era is over.&lt;/strong&gt;&lt;br&gt;
The new killer on the stage isn’t language, it’s world modeling.&lt;br&gt;
An AI that understands reality like a conceptual puzzle, where you don’t need every piece to see the whole picture.&lt;/p&gt;

&lt;p&gt;We just haven’t dragged the old LLM stars off the stage yet.&lt;br&gt;
If you’re still building your career around prompting chatbots, you’re preparing for a world that’s already gone.&lt;/p&gt;

&lt;p&gt;We don’t need machines that shovel tokens like brainless parrots. We need machines that carry objects through time, that bind cause to effect, that keep the scene consistent. “Stochastically parroted” was cute for 2023. It’s not intelligence. It’s probability cosplay.&lt;/p&gt;

&lt;p&gt;And here’s what many of us have seen before in the tech industry: if AI keeps stitching likelihood instead of reality, the industry that burned a trillion dollars to make chatty cockatoos will see the bubble burst. Soon.&lt;/p&gt;

&lt;h2&gt;
  
  
  Is JEPA the New AI Hope?
&lt;/h2&gt;

&lt;p&gt;In a galaxy not so far from Menlo Park, a new acronym rises from the ashes of the chatbot empire: JEPA!&lt;/p&gt;

&lt;p&gt;The promised one. The architecture said to restore balance to the Force after the LLM wars.&lt;/p&gt;

&lt;p&gt;Before we let Meta’s marketing department baptize it as the next Messiah, let’s decode the acronym.&lt;br&gt;
JEPA stands for Joint Embedding Predictive Architecture - four words that sound profound until you realize they describe a fairly old idea wearing a new coat of GPU varnish:&lt;/p&gt;

&lt;p&gt;Joint, because it trains two halves of a scene — one visible, one masked — and forces them to agree in a shared latent space.&lt;br&gt;
Embedding, because instead of dealing with raw pixels or words like LLMs do, it operates in dense vector representations: the hidden space where modern AI stores meaning.&lt;br&gt;
Predictive, because its only trick is to predict the missing chunk from the one it still sees.&lt;br&gt;
Architecture, because every new AI concept needs an impressive noun at the end to look academic.&lt;/p&gt;

&lt;p&gt;As Yann LeCun - the new high priest of the post-LLM cult - likes to put it, in plain English:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Intelligence isn’t language, nor is it predicting the next word. Intelligence is predicting what will happen in the world.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And that’s exactly what JEPA tries to do.&lt;br&gt;
It learns to guess what’s missing in a world it only half perceives — not by generating text or pixels, but by aligning internal representations so that context can explain absence.&lt;/p&gt;

&lt;p&gt;It doesn’t write; it completes.&lt;br&gt;
It doesn’t imagine; it infers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Too Good to Be True ?
&lt;/h2&gt;

&lt;p&gt;But you’re probably wondering whether this is the real deal - or just another half-baked tech product with brighter LEDs and duller intelligence.&lt;/p&gt;

&lt;p&gt;Let’s cut through the grand talk for a second.&lt;br&gt;
Meta presents JEPA like a revelation carved in silicon: the dawn of “true intelligence,” the end of chatbots, the start of world-model gods.&lt;/p&gt;

&lt;p&gt;And at first glance, it really does look that way — until you strip off the marketing halo and see what’s actually there.&lt;br&gt;
Not quite the miracle LeCun sells, but still something valuable: a step in the right direction, with a long way still to go.&lt;/p&gt;

&lt;p&gt;For now, JEPA lives mostly in videos and physical-world scenarios, while the wording layer still leans heavily on the same stochastic parrot LLMs they claim to have surpassed (see chart 1 below).&lt;/p&gt;

&lt;p&gt;Of course, Meta just prefers not to mention that part in the brochures. No surprise there.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F71b5lelm423yljqem1ag.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F71b5lelm423yljqem1ag.png" alt=" " width="717" height="762"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So yes, in the blueprint JEPA looks clever - new modules, shiny arrows, and a fresh sense of purpose - but underneath, we’re still stirring the same pot.&lt;br&gt;
The word-soup problem just got an upgrade to concept-soup.&lt;br&gt;
Different flavor, same indigestion.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;To be fair, JEPA isn’t pure marketing incense.&lt;br&gt;
As the chart below shows — humorously stripped of Meta’s triumphal glow — its context encoder, target encoder, and predictor form a neat little triad that genuinely breaks from the token tyranny of LLMs.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdbob66ps3jnq9n9mjraj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdbob66ps3jnq9n9mjraj.png" alt=" " width="800" height="529"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So what can you actually do with JEPA, you’re asking?&lt;br&gt;
Well - you can show it half an image, and it predicts the missing part for you, not by painting pixels, but by reasoning in latent space.&lt;br&gt;
That’s progress: perception over parroting.&lt;/p&gt;

&lt;p&gt;And here’s the lineup:&lt;/p&gt;

&lt;p&gt;I-JEPA recognized ImageNet objects with only a few labeled samples - impressive. But it still fails under fixed distractor noise and never touches language.&lt;/p&gt;

&lt;p&gt;Then comes V-JEPA, trained on videos to learn motion intuition: what moves, when, and why. But not real physics - collisions, forces, and contact are still out of reach.&lt;/p&gt;

&lt;p&gt;More into robotics? V-JEPA 2 guides robotic arms, predicting how objects behave before acting. But it still trails humans on physical benchmarks and, irony of ironies, needs an LLM to talk about what it sees.&lt;/p&gt;

&lt;p&gt;So yes, progress — enough to declare LLM technology a fossil on artificial life support — but still flatland thinking dressed up as revelation.&lt;/p&gt;

&lt;p&gt;And that’s the part Meta doesn’t want you to see.&lt;/p&gt;

&lt;p&gt;Here’s the fatal secret hiding behind JEPA’s slick diagrams and LeCun’s confident declarations: they solved the linguistic problem by creating an architectural one that’s mathematically far worse.&lt;/p&gt;

&lt;p&gt;They escaped the word-prison only to build a concept-prison with thicker walls.&lt;/p&gt;

&lt;p&gt;Think about it: LLMs were bad because they treated reality as a linear sequence of tokens: a one-dimensional march through probability space. JEPA says “we’re smarter than that” and jumps into high-dimensional representation space, where concepts float as vectors in a 768-dimensional cloud.&lt;/p&gt;

&lt;p&gt;Sounds impressive, right?&lt;/p&gt;

&lt;p&gt;Wrong.&lt;/p&gt;

&lt;p&gt;They just traded a bad neighborhood for a worse one: the kind where the laws of mathematics themselves guarantee failure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Fatal Mathematical Doom of the New AI Architectures: They Still See a Flat World&lt;/strong&gt;&lt;br&gt;
And now, get ready, dear reader, to learn what you won’t read anywhere else: how the mathematical poison is silently killing every AI architecture — from the LLM dinosaurs to the new kids who swear they’ll fix everything.&lt;/p&gt;

&lt;p&gt;Stop. Breathe 😄 This is where the story gets dangerous.&lt;/p&gt;

&lt;p&gt;But don’t worry, we have the antidote. And you can use it to your own advantage.&lt;/p&gt;

&lt;p&gt;Before you go on, a small recommendation.&lt;br&gt;
If you want to dive deeper - into the gritty, nerdy, beautiful mathematics that prove beyond doubt how the dual-number toroidal model cures hallucinations and the myopic doom haunting today’s AI architectures - I’ve left a few doors open for you:&lt;/p&gt;

&lt;p&gt;→ &lt;a href="https://josecrespo.substack.com/p/mathematical-core-equations-ai-jepas" rel="noopener noreferrer"&gt;[JEPA-AI: Core Technologies &amp;amp; Programming Stack]&lt;/a&gt;&lt;br&gt;
→ &lt;a href="https://josecrespo.substack.com/p/the-mathematical-myopia-of-new-ai" rel="noopener noreferrer"&gt;The Mathematical Myopia of New AI Architectures&lt;/a&gt;&lt;br&gt;
→ &lt;a href="https://josecrespo.substack.com/p/mathematical-core-equations-ai-jepas" rel="noopener noreferrer"&gt;Mathematical Core Equations: AI JEPA’s Failures vs. AI-Toroidal Truth&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Take your time there if you wish.&lt;br&gt;
If not, stay with us.&lt;br&gt;
We’re about to cross the event horizon.&lt;/p&gt;

&lt;p&gt;Now, let’s go on.&lt;/p&gt;

&lt;p&gt;Imagine you’re trying to organize your massive music library. You could throw all your songs into one giant folder and hope your computer can tell the difference between “Stairway to Heaven” and “Highway to Hell” based on… vibes? That’s basically what JEPA and the new AI architectures are doing.&lt;/p&gt;

&lt;p&gt;Here’s the problem: these systems live in what mathematicians call “Euclidean space”, basically a flat, infinite spreadsheet where everything is a bunch of numbers floating around. Sounds reasonable, right? Wrong.&lt;/p&gt;

&lt;p&gt;This is why you’ll find the same mathematical doom baked right into the so-called “next generation” AI systems: the ones sold as the antidote to LLM poison.&lt;br&gt;
They promise salvation but inherit the same broken math.&lt;/p&gt;

&lt;h2&gt;
  
  
  — Welcome again to the Hall of AI Shame. Here they are. —
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Birthday Party Disaster&lt;/strong&gt;&lt;br&gt;
You know the birthday paradox? Put 23 random people in a room, and there’s a 50% chance two share a birthday. Now imagine your AI has to store a million concepts in this flat number-space. The math says: collision guaranteed. “White truck” ends up looking almost identical to “bright sky” because they landed in nearly the same spot in this giant number soup, and that’s how you get those self-driving Tesla killer cars.&lt;/p&gt;

&lt;p&gt;It’s like trying to organize a library by randomly throwing books into a warehouse and hoping you can find them later by remembering their approximate coordinates… you can’t.&lt;/p&gt;
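&lt;p&gt;The birthday-paradox arithmetic is easy to verify yourself. A minimal sketch using the standard exponential approximation; the million-concept bucket count below is an illustrative assumption, not a measured embedding statistic:&lt;/p&gt;

```python
import math

def collision_probability(n_items, n_slots):
    # Birthday-paradox approximation: P(at least one collision)
    # when n_items land uniformly at random in n_slots.
    return 1.0 - math.exp(-n_items * (n_items - 1) / (2.0 * n_slots))

# Classic case: 23 people, 365 birthdays -- roughly a coin flip.
print(round(collision_probability(23, 365), 2))      # 0.5

# A million concepts hashed into ~4.3 billion buckets: collision is certain.
print(collision_probability(10**6, 2**32) > 0.999)   # True
```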

&lt;p&gt;&lt;strong&gt;The Gradient Descent Hamster Wheel&lt;/strong&gt;&lt;br&gt;
Current AI architectures use something called “gradient descent” to find the minimum of an error function — which is a fancy way of saying they stumble around in the dark, poking things with a stick, hoping to eventually find the exit.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fluyxfmd8wximf43a20v7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fluyxfmd8wximf43a20v7.png" alt=" " width="716" height="637"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The problem? They use fake infinitesimals with point-wise myopic vision. They can’t see the shape of the hill they’re trying to climb down, just one pebble at a time. It’s like trying to navigate San Francisco with a blindfold and a magnifying glass that only shows you one square inch of sidewalk.&lt;/p&gt;

&lt;p&gt;But wait, it gets dumber: you have to pick your epsilon (the step size) before you start stumbling. Pick it too big? You’re that drunk guy at a party taking huge steps and crashing into walls. Too small? You’re inching forward like a paranoid snail, and you’ll die of old age before you get anywhere. Yup, this whole mess comes from 19th-century calculus and its epsilon–delta limit formalism.&lt;/p&gt;

&lt;p&gt;But the craziest thing of all happens during training: AI runs billions of these tiptoeing optimization steps trying to minimize its loss function. Billions! Each one either jumps off the cliff like a buggy robot or advances like Windows loading at 1%. The computational waste is absolutely bonkers - all because this outdated framework, born in the 19th century, forces you to pick an epsilon value beforehand.&lt;/p&gt;

&lt;p&gt;The second error caused by this outdated way of doing infinitesimal calculus is the compounding effect of tiny approximation errors. You start with something like 10^-8 and think, “eh, close enough to zero, right?” Wrong. Square it and you get 10^-16. Still. Not. Zero. After billions of iterations, these pretend infinitesimals pile up like compound interest from hell, spawning numerical explosions, instabilities, and rounding errors that eventually turn into full-blown AI hallucinations.&lt;/p&gt;
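&lt;p&gt;You can watch the “pretend infinitesimals” leak in any Python shell. A tiny illustration of floating-point residue, not a claim about any specific training run:&lt;/p&gt;

```python
# 1e-8 squared is "small", but it is not zero -- and residue accumulates.
eps = 1e-8
print(eps ** 2 == 0.0)   # False: still 1e-16, not zero

# Even adding 0.1 ten times misses 1.0 -- the textbook rounding-drift example.
acc = 0.0
for _ in range(10):
    acc += 0.1
print(acc == 1.0)        # False
print(acc)               # 0.9999999999999999
```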

&lt;p&gt;Yup, there is an easy solution: switch to a dual-number system that laughs at this entire clown show. No limits. No epsilon guessing games. No billion-step hamster wheel. When ε² = 0 by definition — not by approximation, but by actual mathematical law — derivatives are exact, and topology just tells you where everything belongs. No stumbling required.&lt;/p&gt;
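&lt;p&gt;Here is what that looks like in practice: a minimal sketch of forward-mode automatic differentiation with dual numbers (a standard construction; this toy class is illustrative, not production code):&lt;/p&gt;

```python
class Dual:
    """Dual number a + b*eps, where eps**2 = 0 by definition."""
    def __init__(self, real, eps=0.0):
        self.real, self.eps = real, eps

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.real + other.real, self.eps + other.eps)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps; the eps**2 term vanishes
        return Dual(self.real * other.real,
                    self.real * other.eps + self.eps * other.real)
    __rmul__ = __mul__

def derivative(f, x):
    # Seed the eps-part with 1; the exact derivative rides along.
    return f(Dual(x, 1.0)).eps

# f(x) = 3x^2 + 2x, so f'(5) = 6*5 + 2 = 32 -- exact, no step size to tune.
print(derivative(lambda x: 3 * x * x + 2 * x, 5.0))   # 32.0
```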

&lt;p&gt;&lt;strong&gt;The Attention Apocalypse&lt;/strong&gt;&lt;br&gt;
Transformers (the tech behind ChatGPT, not the robots) use something called “attention”, where every word looks at every other word. That’s N-squared complexity, which means if you double your text length, the computation goes up 4×.&lt;/p&gt;

&lt;p&gt;With 1,000 words? That’s a million comparisons. With 10,000 words? 100 million comparisons. Your AI is basically reading a book by comparing every single word to every other word simultaneously. Exhausting and expensive.&lt;/p&gt;
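&lt;p&gt;The arithmetic is trivial to check with a two-line sketch of the quadratic blow-up:&lt;/p&gt;

```python
def attention_comparisons(n_tokens):
    # Full self-attention: every token attends to every token.
    return n_tokens * n_tokens

print(attention_comparisons(1_000))     # 1000000
print(attention_comparisons(10_000))    # 100000000
# Doubling the sequence length quadruples the work.
print(attention_comparisons(2_000) == 4 * attention_comparisons(1_000))  # True
```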

&lt;p&gt;&lt;strong&gt;How Our Toroidal Model Fixes the AI Flatland Doom&lt;/strong&gt;&lt;br&gt;
Instead of a flat spreadsheet, we use a donut (mathematically, a torus). Stay with me here.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft57qx68qs535wqxfs4yg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft57qx68qs535wqxfs4yg.png" alt=" " width="721" height="601"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On a donut, you can wrap string around it in different ways: around the hole, through the hole, or both. These “winding patterns” give every concept a unique address that cannot collide. It’s not probability, it’s topology. Different winding patterns are as different as a circle and a figure-8. They literally cannot become each other.&lt;/p&gt;
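&lt;p&gt;A toy illustration of the addressing idea (our sketch for intuition, not the actual toroidal model’s encoding): give every concept a distinct pair of winding counts, and collisions become impossible by construction rather than merely improbable:&lt;/p&gt;

```python
def winding_address(concept_id, grid=1_000):
    # Map a concept to (p, q): turns around the hole, turns through the hole.
    # Distinct (p, q) pairs are distinct winding classes on the torus,
    # so two concepts with different pairs cannot drift into each other.
    return (concept_id // grid, concept_id % grid)

addresses = {winding_address(i) for i in range(1_000_000)}
print(len(addresses))   # 1000000 -- one unique address per concept
```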

&lt;h2&gt;
  
  
  The Real Infinitesimals
&lt;/h2&gt;

&lt;p&gt;We use dual numbers where ε² = 0 isn’t an approximation - it’s the definition. This means our layers are separated by actual infinitesimals, not fake ones. No numerical explosions. No gradient descent needed. The topology just… works.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sparse by Design&lt;/strong&gt;&lt;br&gt;
Our attention mechanism only connects compatible winding patterns. Most connections are exactly zero — not “close to zero” but structurally impossible. This drops complexity from N-squared to linear. Those 100 million comparisons? Now it’s 10,000.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Bottom Line&lt;/strong&gt;&lt;br&gt;
JEPA and the new AI architectures are the right idea for moving past our very primitive state toward true AI. But, like LLMs, they keep repeating the same error: trying to navigate a flat world with a bad compass and approximations.&lt;/p&gt;

&lt;p&gt;The real leap won’t come from another tweak in parameters.&lt;br&gt;
It will come from changing the space itself.&lt;/p&gt;

&lt;p&gt;We need to abandon the Euclidean coffin that traps intelligence in two dimensions, and build in a topology where meaning can breathe.&lt;br&gt;
In our toroidal model, concepts don’t collide or blur. They live in separate, protected neighborhoods: each one infinitesimally distinct, each one safe from the chaos of false merges.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Toroidal AI Is Not Being Built — Yet&lt;/strong&gt;&lt;br&gt;
Now, the sharp reader may ask:&lt;br&gt;
“If this is so obvious, why hasn’t anyone built it yet?”&lt;/p&gt;

&lt;p&gt;Fair question - and the answer itself proves the point.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Institutional Inertia&lt;br&gt;
The entire AI establishment has sunk trillions into Euclidean architectures. Every framework, every GPU kernel, every optimization routine assumes a flat world. Replacing that geometry would mean rebuilding the cathedral from its foundations - and few engineers dare shake the pillars of their own temple.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Workforce Barrier&lt;br&gt;
A full generation of machine-learning engineers has been trained to think in gradients, not geometry. Retraining them to reason with curvature, continuity, and dual numbers is not a weekend tutorial - it’s a civilizational shift in mathematical literacy.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Patents and IP Locks&lt;br&gt;
Big Tech doesn’t innovate; it defends its moat. Any true geometric paradigm shift would invalidate entire layers of existing intellectual property and licensing chains. The system isn’t optimized for truth - it’s optimized for control.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Sunk-Cost Fallacy&lt;br&gt;
From cloud infrastructure to AI chips, everything has been built for the flatland paradigm. Even when engineers know it’s broken, the machinery keeps running - because admitting it would collapse too many balance sheets and too many egos.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So yes - it hasn’t been built yet.&lt;br&gt;
Not because it’s wrong.&lt;br&gt;
But because it’s too right, too disruptive, and too costly for a world addicted to its own mistakes.&lt;/p&gt;

&lt;p&gt;And that’s precisely why it will happen.&lt;br&gt;
Because math doesn’t care who resists it.&lt;br&gt;
It simply wins - always.&lt;/p&gt;

&lt;p&gt;And soon, the new startups will notice the gap. You’ll watch Toroidal AI evolve exactly as every disruptive technology has before it:&lt;br&gt;
First ignored — too threatening to a trillion dollars in sunk investments.&lt;br&gt;
Then ridiculed — “crackpot” accusations from people who don’t understand topology.&lt;br&gt;
And finally, triumphantly accepted — “Of course! We always knew Euclidean space was wrong.”&lt;/p&gt;

&lt;p&gt;History doesn’t repeat itself.&lt;br&gt;
It curves. 😁&lt;/p&gt;

&lt;h2&gt;
  
  
  Top 10 Essential References
&lt;/h2&gt;

&lt;p&gt;On JEPA Architectures:&lt;br&gt;
1.- LLM-JEPA: Large Language Models Meet Joint Embedding Predictive Architectures (September 2025)&lt;/p&gt;

&lt;p&gt;2.- ACT-JEPA: Novel Joint-Embedding Predictive Architecture for Efficient Policy Representation Learning (April 2025)&lt;/p&gt;

&lt;p&gt;3.- Point-JEPA: A Joint Embedding Predictive Architecture for Self-Supervised Learning on Point Cloud (February 2025)&lt;/p&gt;

&lt;p&gt;On Transformer Attention Complexity:&lt;br&gt;
4.- The End of Transformers? On Challenging Attention and the Rise of Sub-Quadratic Architectures (October 2024)&lt;/p&gt;

&lt;p&gt;5.- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning (October 2023, still widely cited in 2024–2025)&lt;/p&gt;

&lt;p&gt;On Toroidal/Topological Neural Networks:&lt;br&gt;
6.- Toroidal Topology of Population Activity in Grid Cells (January 2022, Nature — still foundational for 2024–25 work)&lt;/p&gt;

&lt;p&gt;7.- Deep Networks on Toroids: Removing Symmetries Reveals the Structure of Flat Regions in the Landscape Geometry (June 2022)&lt;/p&gt;

&lt;p&gt;On Dual Numbers &amp;amp; Automatic Differentiation:&lt;br&gt;
8.- Dual Numbers for Arbitrary Order Automatic Differentiation (January 2025)&lt;/p&gt;

&lt;p&gt;On LLM Hallucinations:&lt;br&gt;
9.- Why Language Models Hallucinate (September 2025)&lt;/p&gt;

&lt;p&gt;Bonus - Yann LeCun’s Vision:&lt;br&gt;
10.- Navigation World Models (April 2025)&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>mathematic</category>
      <category>llm</category>
    </item>
    <item>
      <title>Why the Oldest, Simplest Algorithms Are Beating AI</title>
      <dc:creator>Jose Crespo, PhD</dc:creator>
      <pubDate>Sat, 01 Nov 2025 11:12:45 +0000</pubDate>
      <link>https://forem.com/jose_crespo_phd/why-the-oldest-simplest-algorithms-are-beating-ai-1n13</link>
      <guid>https://forem.com/jose_crespo_phd/why-the-oldest-simplest-algorithms-are-beating-ai-1n13</guid>
      <description>&lt;p&gt;Let’s put numbers to the AI overhype. Companies are burning more than $200 billion every year by choosing AI over simple, proven algorithms. That’s not an opinion. That’s math. See the chart below if you’re still rubbing your eyes in disbelief.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1huqxxzsdk4qv5ft3wym.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1huqxxzsdk4qv5ft3wym.png" alt="Chart 1 – The AI machine burning more than $200B yearly" width="800" height="469"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Chart 1: The numbers behind the $200B conservative burn.&lt;/strong&gt; Independent studies point the same way: MIT reports &lt;strong&gt;95% of GenAI implementations show no measurable P&amp;amp;L impact&lt;/strong&gt; (&lt;a href="https://www.tomshardware.com/tech-industry/artificial-intelligence/95-percent-of-generative-ai-implementations-in-enterprise-have-no-measurable-impact-on-p-and-l-says-mit-flawed-integration-key-reason-why-ai-projects-underperform" rel="noopener noreferrer"&gt;Tom’s Hardware coverage&lt;/a&gt;), BCG finds &lt;strong&gt;74% of companies struggle to achieve and scale value&lt;/strong&gt; (&lt;a href="https://www.bcg.com/press/24october2024-ai-adoption-in-2024-74-of-companies-struggle-to-achieve-and-scale-value" rel="noopener noreferrer"&gt;BCG press release&lt;/a&gt;), HBR documents &lt;strong&gt;2.5–14% profit drag&lt;/strong&gt; from AI misuse (&lt;a href="https://hbr.org" rel="noopener noreferrer"&gt;Harvard Business Review&lt;/a&gt;), and CFO surveys show &lt;strong&gt;6–16% gross-margin hits&lt;/strong&gt; (&lt;a href="https://www.cfodive.com/news/one-in-four-firms-miss-ai-cost-projections-50percent-or-more-survey/760197/" rel="noopener noreferrer"&gt;CFO Dive&lt;/a&gt;). Brookings adds the macro verdict: &lt;strong&gt;no measurable productivity gains to date&lt;/strong&gt; (&lt;a href="https://www.brookings.edu" rel="noopener noreferrer"&gt;Brookings&lt;/a&gt;). Against an &lt;a href="https://www.idc.com" rel="noopener noreferrer"&gt;IDC&lt;/a&gt; baseline of &lt;strong&gt;~$307B in 2025 AI spend&lt;/strong&gt;, even conservative failure/no-value multipliers imply &lt;strong&gt;≈$200B/year&lt;/strong&gt; in burned spend that simpler algorithms could have avoided. &lt;em&gt;Plot created by the author using Matplotlib library.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  But here’s the question: Is AI the villain — or are CEOs burning money on systems they barely understand?
&lt;/h2&gt;

&lt;p&gt;The truth is more brutal than either narrative. &lt;a href="https://medium.com/ai-advances/ai-is-producing-more-garbage-code-than-ever-97a37104e22c?sk=80d1da92b6a3a5e0de1f004b40613d9e" rel="noopener noreferrer"&gt;AI isn’t the disaster — it’s the excuse for the disaster&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Look at the chart below. This is your American Horror Story in three acts:&lt;/p&gt;

&lt;p&gt;&lt;u&gt;&lt;strong&gt;Act I&lt;/strong&gt;&lt;/u&gt;: The Promise. Solo AI vendors promise revolutionary decision-making. Fortune 500 CEOs, hypnotized by NVIDIA’s stock price and OpenAI’s demos, write checks their companies can’t cash.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;Act II&lt;/u&gt;&lt;/strong&gt;: The Bloodbath. That red line? That’s $500 billion annually achieving -45% ROI (Return on Investment — how much you get back compared to what you spend). Not a typo. Negative forty-five percent. Companies are paying premium prices to make worse decisions slower. It’s like hiring McKinsey to tell you what you already know, except McKinsey occasionally delivers positive value.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;Act III&lt;/u&gt;&lt;/strong&gt;: The Coverup. Nobody admits failure. The AI system becomes “strategic infrastructure.” The losses get buried in “digital transformation.” The vendors — Anthropic, OpenAI, the whole choir — keep singing about the future while cashing checks in the present.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fad2aifdnkk4wivfbry9b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fad2aifdnkk4wivfbry9b.png" alt="Chart 2 – ROI of simple algorithms vs solo AI vs hybrid AI" width="800" height="506"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Chart 2: The $2.2 Trillion Reality Check.&lt;/strong&gt; This is what burning money looks like in three colors. The red line isn’t a projection — it’s &lt;a href="https://spectrum.ieee.org/how-ibm-watson-overpromised-and-underdelivered-on-ai-health-care" rel="noopener noreferrer"&gt;IBM Watson’s $4 billion loss&lt;/a&gt;, &lt;a href="https://www.cnbc.com" rel="noopener noreferrer"&gt;Zillow’s $500M quarterly disaster&lt;/a&gt;, and hundreds of other AI failures achieving a median &lt;strong&gt;-45% ROI&lt;/strong&gt; (&lt;a href="https://www.gartner.com/en/documents/4598199" rel="noopener noreferrer"&gt;Gartner 2023&lt;/a&gt;). Meanwhile, that green line soaring at 1200%? That’s &lt;a href="https://investor.costco.com/financials/annual-reports-and-proxy-statements/default.aspx" rel="noopener noreferrer"&gt;Costco’s one-line markup rule&lt;/a&gt;, Southwest’s 47-year profit streak, and algorithms older than your grandparents still crushing it (&lt;a href="https://info.mheducation.com/rs/128-SJW-347/images/Preface_Hillier_Intro_Operations_Research_11e.pdf" rel="noopener noreferrer"&gt;Operations Research, Hillier &amp;amp; Lieberman&lt;/a&gt;). The blue hybrid line shows the only scenario where AI doesn’t destroy value — when it’s kept on a tight leash by old-school algorithms. Even then, &lt;strong&gt;450%&lt;/strong&gt; is less than half what simple algorithms achieve alone (&lt;a href="https://www.bcg.com/publications/2023/how-people-create-and-destroy-value-with-gen-ai" rel="noopener noreferrer"&gt;MIT–BCG Study 2023&lt;/a&gt;). &lt;em&gt;Plot created by the author using Matplotlib library.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Remember the California Gold Rush from the history books? You know who got rich in 1849. Not the miners. The pickaxe sellers.&lt;/p&gt;

&lt;p&gt;Today’s pickaxe merchants sell GPU clusters and API tokens to CEOs panning for digital gold. They whisper sweet promises: “AI will revolutionize your business.” What they don’t mention is that graph showing AI never achieves positive ROI for operational decisions.&lt;/p&gt;

&lt;p&gt;Instead they boast about having more billions of parameters than the competition’s AI monster. Of course theirs is bigger, more insanely convoluted — an expanding minefield of compounding errors, &lt;a href="https://medium.com/ai-advances/the-math-that-cures-ai-blindness-3aa99b05a3f2?sk=dfb2db83bc16ec154460e12388673fe9" rel="noopener noreferrer"&gt;Types I through IV, each cascade multiplying the disaster&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Meanwhile, that green line in the chart above — simple algorithms from 1913 — keeps printing money at 1200% ROI. But nobody’s selling conferences about the Economic Order Quantity formula. There’s no TED talk about Little’s Law and the other veteran algorithms.&lt;/p&gt;

&lt;p&gt;Yep, unfortunately you can’t raise billions in funding rounds by selling an algorithm that fits on a napkin from that expensive restaurant where you’re shaking hands with investors interested in a bigger AI beast.🤔&lt;/p&gt;

&lt;p&gt;The tragedy is more nuanced than AI sellers admit — and more complex than CEOs with their broken quarterly-profit-maximization algorithms want to hear.&lt;/p&gt;

&lt;p&gt;Yes, there’s a hybrid approach where AI becomes just one component alongside those ancient algorithms when complexity grows. But that requires two things the AI revolution explicitly avoids:&lt;/p&gt;

&lt;p&gt;First, you need to understand your business from the molecular level to the stratosphere — every intricacy of your model, what you’re actually selling versus what you think you’re selling, where costs hide, where value emerges. These insights come from human minds. AI won’t solve what you can’t articulate, despite the sales pitch.&lt;/p&gt;

&lt;p&gt;Second, you need to hire and respect professionals who combine programming excellence with mathematical rigor and the rare ability to translate both into business value. But your AI recruiting system will never surface these people. It’s optimized for commodities, not talent — screening for keywords, not capability.&lt;/p&gt;

&lt;p&gt;Here’s the brutal reality of massively failed AI that most companies are facing right now:&lt;br&gt;
Look at the carnage in the table below (Chart 3). This isn’t speculation — it’s documented history:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Costco&lt;/strong&gt;: Running on a decision that fits on an index card. Investment: approximately zero. Return: Industry dominance for four decades.&lt;br&gt;
&lt;strong&gt;IBM Watson Health&lt;/strong&gt;: The crown jewel of AI healthcare. Investment: $4 billion. Return: Sold for parts at a 95% loss.&lt;br&gt;
&lt;strong&gt;Southwest Airlines&lt;/strong&gt;: One simple rule about load factors. Still the only major US airline to avoid bankruptcy.&lt;br&gt;
&lt;strong&gt;Zillow iBuying&lt;/strong&gt;: Cutting-edge AI pricing models. Burned $500 million in one quarter before shutting down entirely.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhlg7bw98pty2vzbimw2z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhlg7bw98pty2vzbimw2z.png" alt="Chart 3 – ROI table showing algorithmic vs AI vs hybrid approaches" width="800" height="480"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Chart 3: The companies destroying their competition aren’t using AI&lt;/strong&gt; — they’re using formulas older than your grandparents. That $4 billion IBM lost? Costco made 100× that with a rule that fits on a Post-it. Look at the ROI column. Now look again. That’s not a typo. For references see Chart 2 above. &lt;em&gt;Table created by the author using Plotly.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;The only success story with AI? Amazon’s hybrid approach — yup, not pure AI, it’s old-school EOQ with some ML sprinkled on top when absolutely necessary. Even then, the ROI is a fraction of what simple algorithms achieve alone.&lt;/p&gt;

&lt;p&gt;But let’s be honest here, let’s not throw the baby out with the bathwater: when your business complexity genuinely grows — meaning the actual number of variables you must account for — you need more flexibility. That might be AI/ML, or more likely, it’s just good programmers who understand your business and can architect the math you actually need.&lt;/p&gt;

&lt;p&gt;Here’s what 20 years in the trenches taught most of us: around 90% of “complex” business problems can be solved by a competent programmer with decent math skills and simple algorithms, properly wired together.&lt;/p&gt;

&lt;p&gt;Occam’s Razor still cuts: The simplest solution that works is usually right.&lt;/p&gt;

&lt;p&gt;But simplicity doesn’t sell conference tickets. Simplicity doesn’t raise Series B funding. Simplicity doesn’t get you on the cover of Wired. 😂&lt;/p&gt;

&lt;p&gt;So instead, companies take the lazy, prestigious route: throw their problems at whatever AI the vendors are pushing this quarter. It’s like hiring a top surgeon to apply a Band-Aid — expensive, unnecessary, and probably going to make things worse.&lt;/p&gt;

&lt;p&gt;The real tragedy? That competent programmer with the simple solution costs $150K/year. The AI system that fails costs $15M. But the programmer doesn’t come with a sales team, a PowerPoint deck, or a promise to “transform your digital future.”&lt;/p&gt;

&lt;p&gt;Guess which one the board approves.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Seven Heroes of Business History
&lt;/h2&gt;

&lt;p&gt;Our position shouldn’t surprise you by now, dear reader. We’re betting on what’s worked brilliantly for decades through every crisis, disaster, and market shift while delivering massive profits: the algorithm way.&lt;/p&gt;

&lt;p&gt;The formula is simple: Hire, value, and respect your best asset — &lt;a href="https://medium.com/ai-advances/the-maths-you-need-to-survive-ai-and-nobodys-teaching-you-9167a2ccd885?sk=a3989896ff7822a627b3022f6d1570f8" rel="noopener noreferrer"&gt;skilled programmers with solid math foundations&lt;/a&gt; who convert business problems into algorithmic solutions. Add modern cloud infrastructure and, when genuinely appropriate (maybe 5–10% of cases), deploy the hybrid Algorithm + AI/ML approach.&lt;/p&gt;

&lt;p&gt;But let’s be clear: forget the magic they’re selling — the fantasy of throwing data at AI and getting perfect solutions every time. That’s not happening.&lt;/p&gt;

&lt;p&gt;Yes, LLMs are useful as search tools, data navigators, even coding assistants. But they’re far from autonomous. In non-trivial cases, you need to spend a lot of time cross-referencing to catch their false positives and, worse, the errors nobody notices — Type III and IV errors that spaghettify and collide concepts into dangerous nonsense.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Meet the Magnificent Seven&lt;/strong&gt;&lt;br&gt;
More than 500 years of combined service. Trillions in value created. Under 1K lines of code total.&lt;/p&gt;

&lt;p&gt;Look at this table (Chart 4). These seven algorithms have never failed. For decades — in some cases over a century — they’ve consistently delivered business value to most of mankind. The epitome of simplicity.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5fvu6kcfphp8wwe4r4sf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5fvu6kcfphp8wwe4r4sf.png" alt="Chart 4 – Timeline of the most profitable algorithms ever invented" width="800" height="608"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Chart 4: The Magnificent 7.&lt;/strong&gt; Still blooming today, hidden inside most business software and powering modern networks. They quietly run the world’s critical domains — inventory, finance, perishables, risk, projects, operations, information — and they keep compounding value. Think about it: 468 years of proven success, trillions of dollars created, and all it takes is less than 1K lines of code. A handful of math functions generating returns that outshine most of today’s AI hype. This isn’t the full list — just a curated glimpse at the most profitable algorithms ever invented. Reference: &lt;a href="https://arxiv.org/pdf/2407.13839" rel="noopener noreferrer"&gt;Algorithms and ROI&lt;/a&gt;. &lt;em&gt;Plot created by the author using Matplotlib library. Table created by the author using Plotly.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Each one is a specialist with embarrassingly simple code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;EOQ (1913): The inventory gunslinger.&lt;/strong&gt; Square root of (2 × demand × order cost / holding cost). One line. Tells you exactly how much to order.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;DuPont (1920): The financial sharpshooter.&lt;/strong&gt; ROE = Profit Margin × Asset Turnover × Leverage. Three numbers multiplied. Instant diagnosis of what’s broken.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Newsvendor (1950s): The perishables ranger.&lt;/strong&gt; Order up to the point where the chance of selling out matches the cost of running short vs. overstocking. A single threshold for “how much to make.”&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Kelly (1956): The risk-sizing marshal.&lt;/strong&gt; Bet a fraction of your bankroll based on edge and odds — when your advantage is bigger, size up; when it’s small, size down. Never overbet.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;CPM (1957): The project management tracker.&lt;/strong&gt; Find the longest path through your network. That’s your deadline. Everything else can slip; this can’t.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Little’s Law (1961): The operations enforcer.&lt;/strong&gt; Items in system = arrival rate × time in system. It’s physics, not statistics. Works for everything.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;PageRank (1998): The young gun who built an empire.&lt;/strong&gt; A page’s importance is the sum of votes from important pages, each vote split by their outlinks, with a small random-jump factor to keep it stable. Keep iterating. Built Google.&lt;/li&gt;
&lt;/ul&gt;
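&lt;p&gt;Four of the seven, in runnable form. The formulas are the classic textbook versions; the sample inputs are made up for illustration:&lt;/p&gt;

```python
import math

def eoq(demand, order_cost, holding_cost):
    # Economic Order Quantity (Harris, 1913): optimal order size.
    return math.sqrt(2 * demand * order_cost / holding_cost)

def dupont_roe(profit_margin, asset_turnover, leverage):
    # DuPont (1920): ROE decomposed into three ratios.
    return profit_margin * asset_turnover * leverage

def kelly_fraction(p_win, odds):
    # Kelly (1956): fraction of bankroll to stake at the given odds.
    return (p_win * (odds + 1) - 1) / odds

def littles_law(arrival_rate, time_in_system):
    # Little's Law (1961): average items in system, L = lambda * W.
    return arrival_rate * time_in_system

# 10,000 units/year demand, $100 per order, $5/unit/year holding cost:
print(round(eoq(10_000, 100, 5)))            # 632 units per order
print(round(dupont_roe(0.05, 2.0, 1.5), 2))  # 0.15 -> 15% return on equity
print(round(kelly_fraction(0.6, 1), 2))      # 0.2 -> bet 20% at even odds
print(littles_law(12, 0.5))                  # 6.0 items in the system
```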

&lt;blockquote&gt;
&lt;p&gt;Total code for all seven: comfortably &amp;lt;1K lines. The math that runs the world fits in a single GitHub gist.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;u&gt;&lt;strong&gt;Is a programming career still viable? Looking at this table — absolutely. Learning algorithms and programming built the past and will build the future, despite this transient AI-hypnotized present.&lt;/strong&gt;&lt;/u&gt;&lt;/p&gt;

&lt;p&gt;The AI hype has its place — pattern finding, searching, identifying. But it needs years of improvement to correct its statistical errors from Type I through IV. Even then, algorithms in the hands of competent developers remain irreplaceable and far from commodities.&lt;/p&gt;

&lt;p&gt;The bottom line: While everyone chases AI complexity, these seven simple formulas keep generating thousands of percent returns. They don’t need updates. They don’t hallucinate. They just work.&lt;/p&gt;

&lt;p&gt;That’s not outdated. That’s immortal. Let’s keep programming with algorithms while the true researchers work on fixing AI errors with better math.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>algorithms</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
