<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: DJ Leamen</title>
    <description>The latest articles on Forem by DJ Leamen (@djleamen).</description>
    <link>https://forem.com/djleamen</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2952280%2Fbf14fd90-c9c3-4e4d-a8f8-8b341660511a.png</url>
      <title>Forem: DJ Leamen</title>
      <link>https://forem.com/djleamen</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/djleamen"/>
    <language>en</language>
    <item>
      <title>GPT-5 Is Here: OpenAI's Biggest Leap Toward AGI Yet</title>
      <dc:creator>DJ Leamen</dc:creator>
      <pubDate>Thu, 07 Aug 2025 19:19:40 +0000</pubDate>
      <link>https://forem.com/djleamen/gpt-5-is-here-openais-biggest-leap-toward-agi-yet-462b</link>
      <guid>https://forem.com/djleamen/gpt-5-is-here-openais-biggest-leap-toward-agi-yet-462b</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0wjjkz8a6agba5n03vjy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0wjjkz8a6agba5n03vjy.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.youtube.com/live/0Uu_VJeVVfo?si=URCJlMHm1wmfrOd_" rel="noopener noreferrer"&gt;OpenAI has officially unveiled &lt;strong&gt;GPT-5&lt;/strong&gt;&lt;/a&gt;, marking what CEO Sam Altman calls a “major upgrade” and “significant step toward AGI.” The model is available &lt;strong&gt;starting today&lt;/strong&gt; across ChatGPT and the OpenAI API.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/2jqS7JD0hrY"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Here’s a breakdown of the most important announcements:&lt;/p&gt;

&lt;h1&gt;Smarter, Faster, More Reliable&lt;/h1&gt;

&lt;p&gt;GPT-5 is like “talking to a PhD-level expert in any field,” said Altman.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  It merges &lt;strong&gt;fast responses&lt;/strong&gt; and &lt;strong&gt;deep reasoning&lt;/strong&gt;, eliminating the trade-off users faced with earlier models.&lt;/li&gt;
&lt;li&gt;  Built on OpenAI’s “reasoning paradigm,” GPT-5 chooses how long to “think” for the optimal response.&lt;/li&gt;
&lt;li&gt;  Available to &lt;strong&gt;free users&lt;/strong&gt;, with tiered access for Plus, Team, Enterprise, and EDU.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;Best-in-Class Benchmarks&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Highest scores ever&lt;/strong&gt; on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;SWEBench&lt;/strong&gt; (real-world coding tasks)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;AIME 2025&lt;/strong&gt; (math reasoning)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;MMMU&lt;/strong&gt; (multimodal understanding)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;HealthBench&lt;/strong&gt; (evaluated by 250 physicians)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;GPT-5 also shows &lt;strong&gt;significantly reduced hallucinations&lt;/strong&gt; and factual errors.&lt;/p&gt;

&lt;p&gt;GPT-5 introduces &lt;strong&gt;“safe completions”&lt;/strong&gt;: instead of outright refusals, the model offers safer, helpful partial responses in sensitive scenarios.&lt;/p&gt;

&lt;h1&gt;Next-Gen Coding Capabilities&lt;/h1&gt;

&lt;p&gt;GPT-5 was described as &lt;em&gt;“the best coding model in the world”&lt;/em&gt;. It can build full-stack apps, dashboards, games, and tools.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Excels in &lt;strong&gt;autonomous coding&lt;/strong&gt;: executes builds, lints, handles bugs, iterates, and reasons through dev tasks without supervision.&lt;/li&gt;
&lt;li&gt;  Offers &lt;strong&gt;tool calling preambles&lt;/strong&gt;, &lt;strong&gt;custom tool formats&lt;/strong&gt;, and &lt;strong&gt;verbosity tuning&lt;/strong&gt; for developers.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;Personalized Learning &amp;amp; Communication&lt;/h1&gt;

&lt;p&gt;GPT-5 supports &lt;strong&gt;extended memory&lt;/strong&gt;, now integrated with Gmail and Google Calendar for Pro and Enterprise users.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Study mode&lt;/strong&gt; and &lt;strong&gt;voice-based language learning&lt;/strong&gt; allow real-time, interactive lessons.&lt;/li&gt;
&lt;li&gt;  New &lt;strong&gt;ChatGPT personalities&lt;/strong&gt; let users adjust tone: supportive, professional, sarcastic, etc.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Voice mode&lt;/strong&gt; is now natural, fast, multilingual, and free.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;Transformative for Healthcare&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;  GPT-5 is the most accurate model for medical questions, empowering patients to understand, navigate, and advocate in complex care situations.&lt;/li&gt;
&lt;li&gt;  A breast cancer survivor shared how GPT-4 helped her during diagnosis and how GPT-5 now feels like a “thought partner,” offering nuance, context, and emotional support.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;API Upgrades &amp;amp; Developer Tools&lt;/h1&gt;

&lt;p&gt;New GPT-5 models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;GPT-5&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;GPT-5 Mini&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;GPT-5 Nano&lt;/strong&gt; (25x cheaper, latency-optimized)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The new models support a &lt;strong&gt;400K context window&lt;/strong&gt; with state-of-the-art long-context reasoning.&lt;/p&gt;

&lt;p&gt;API improvements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Reasoning-effort control&lt;/strong&gt; (e.g., “minimal thinking” mode)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Structured output via regex/grammars&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Tool-calling customization and explanation&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Expanded memory and personalization&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;
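&lt;p&gt;As a rough sketch of how these controls fit together, here is a small helper that assembles a GPT-5 request payload. The parameter names (&lt;code&gt;reasoning_effort&lt;/code&gt;, &lt;code&gt;verbosity&lt;/code&gt;) follow the announced API options, but the exact field names and accepted values are assumptions here; check the official API reference before relying on them.&lt;/p&gt;

```python
# Hedged sketch: assembling a GPT-5 chat request with the new controls.
# Field names and accepted values are assumptions based on the announcement,
# not a verified SDK signature.

def build_gpt5_request(prompt: str,
                       reasoning_effort: str = "minimal",
                       verbosity: str = "low") -> dict:
    """Build a chat-completion style payload for GPT-5.

    reasoning_effort: how long the model may "think" before answering.
    verbosity: how long the final answer should be.
    """
    allowed_effort = {"minimal", "low", "medium", "high"}
    allowed_verbosity = {"low", "medium", "high"}
    if reasoning_effort not in allowed_effort:
        raise ValueError(f"unknown reasoning_effort: {reasoning_effort!r}")
    if verbosity not in allowed_verbosity:
        raise ValueError(f"unknown verbosity: {verbosity!r}")
    return {
        "model": "gpt-5",
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": reasoning_effort,
        "verbosity": verbosity,
    }

payload = build_gpt5_request("Summarize this changelog.", "minimal", "low")
```

&lt;p&gt;The same payload shape would then be handed to the client’s chat-completions call; validating up front keeps a typo from silently falling back to defaults.&lt;/p&gt;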

&lt;h1&gt;Enterprise &amp;amp; Government Adoption&lt;/h1&gt;

&lt;p&gt;OpenAI’s models are already used by 5M+ businesses; GPT-5 brings significant enhancements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Amgen&lt;/strong&gt; uses it for drug design analysis.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;BBVA&lt;/strong&gt; slashed financial analysis time from weeks to hours.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Oscar Health&lt;/strong&gt; found GPT-5 excels at clinical reasoning.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;U.S. federal government&lt;/strong&gt;: 2M employees now have access.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;The Vision&lt;/h1&gt;

&lt;p&gt;OpenAI’s leadership emphasized that GPT-5 is more than just a model. It represents a future where &lt;strong&gt;intelligence is accessible, personalized, safe, and empowering&lt;/strong&gt;. Research Chief Jakub Pachocki described it as “a glimpse into deeper ideas,” hinting at recursive self-improving training loops and AGI-level reasoning on the horizon.&lt;/p&gt;

&lt;p&gt;As Greg Brockman summarized:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“There’s no excuse for ugly internal apps anymore.”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt;&lt;br&gt;
GPT-5 is OpenAI’s most intelligent, reliable, and capable model yet — redefining software development, healthcare, learning, and everyday productivity, and it’s available &lt;strong&gt;now&lt;/strong&gt;. Welcome to the GPT-5 era!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;DJ Leamen is a Machine Learning and Generative AI Developer and Computer Science student with an interest in emerging technology and ethical development.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Stay up to date on all the latest tech news for free:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://djleamen.substack.com/subscribe" rel="noopener noreferrer"&gt;Subscribe to my newsletter!&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>chatgpt</category>
      <category>openai</category>
      <category>gpt5</category>
    </item>
    <item>
      <title>Tech Breakthroughs of July 2025</title>
      <dc:creator>DJ Leamen</dc:creator>
      <pubDate>Tue, 05 Aug 2025 23:33:19 +0000</pubDate>
      <link>https://forem.com/djleamen/tech-breakthroughs-of-july-2025-jed</link>
      <guid>https://forem.com/djleamen/tech-breakthroughs-of-july-2025-jed</guid>
      <description>&lt;h2&gt;
  
  
  July’s Rapid-Fire Milestones Hint at a Hot H2
&lt;/h2&gt;

&lt;p&gt;In just four weeks, we saw generative agents leave beta, hyperscalers flaunt agentic AI infrastructure, and quantum labs smash coherence records. Capital kept pouring into big bets, regulators sharpened their pencils, and the line between cloud and AI is blurring even further. Buckle up for the second half of the year, where scale, safety, and sovereignty will dominate every roadmap.&lt;/p&gt;




&lt;h1&gt;Artificial Intelligence&lt;/h1&gt;

&lt;h2&gt;ChatGPT Agent goes live&lt;/h2&gt;

&lt;p&gt;OpenAI unveiled &lt;a href="https://www.techradar.com/news/live/openai-july-17-announcement-live-event" rel="noopener noreferrer"&gt;ChatGPT Agent&lt;/a&gt;, a unified model that autonomously selects tools, fetches data, and delivers step-by-step plans, rolling out immediately to Pro, Plus, and Team tiers. Early adopters report 40 percent faster workflow completion and smoother plug-in chaining. Privacy groups are pressing OpenAI to clarify how the agent logs third-party API calls.&lt;/p&gt;

&lt;h2&gt;Apple investors push for a megadeal&lt;/h2&gt;

&lt;p&gt;With shares sliding, &lt;a href="https://www.macrumors.com/2025/07/31/apple-open-to-ai-acquisition/" rel="noopener noreferrer"&gt;Tim Cook told CNBC&lt;/a&gt; he is “open” to a record-size AI acquisition, renewing chatter about a possible Perplexity AI buy that could eclipse Beats many times over. Analysts say a purchase would slot directly into Apple Intelligence for on-device search and sidestep Google traffic-acquisition fees. Board discussions are said to hinge on whether Perplexity’s data licenses meet Apple’s stringent privacy standards.&lt;/p&gt;

&lt;h2&gt;Meta stalls ‘Behemoth’ Llama 4&lt;/h2&gt;

&lt;p&gt;Internal &lt;a href="https://timesofindia.indiatimes.com/technology/artificial-intelligence/how-facebook-parent-metas-ai-engineers-may-be-behind-company-delaying-launch-of-llama-4-behemoth-model/articleshow/121220670.cms" rel="noopener noreferrer"&gt;doubts forced Meta to delay&lt;/a&gt; its monster Llama 4 “Behemoth” model to the fall, fuelling industry worries that simply scaling parameters is hitting diminishing returns. Engineers are now testing sparsity techniques to tame compute costs and energy draw. The pause also gives Meta time to harden safety layers after recent red-team leaks.&lt;/p&gt;

&lt;h2&gt;xAI lines up US$12 billion in debt&lt;/h2&gt;

&lt;p&gt;Elon Musk’s &lt;a href="https://www.reuters.com/business/musks-xai-raise-up-12-billion-debt-ai-expansion-wsj-reports-2025-07-22/" rel="noopener noreferrer"&gt;xAI is working with bankers&lt;/a&gt; to raise up to US$12 billion, signalling that the capital arms race for GPU clusters is far from over. The debt package is expected to fund a 100-hectare Texas compute campus powered by solar and modular nuclear reactors. Banks reportedly priced the deal assuming future government AI-security contracts.&lt;/p&gt;

&lt;h2&gt;Thinking Machines Lab’s seed buzz continues&lt;/h2&gt;

&lt;p&gt;Coverage of Mira Murati’s US$2 billion seed round kept reverberating, underscoring investor appetite for &lt;a href="https://www.eweek.com/news/thinking-machines-2b-multimodal-ai/" rel="noopener noreferrer"&gt;agentic multimodal research&lt;/a&gt;. The six-month-old lab has already demoed a reasoning engine that chains video, code, and simulation data in one prompt. Competitors worry the funding will drain senior talent from Big Tech labs.&lt;/p&gt;

&lt;h2&gt;Regulations&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://digital-strategy.ec.europa.eu/en/news/eu-rules-general-purpose-ai-models-start-apply-bringing-more-transparency-safety-and-accountability" rel="noopener noreferrer"&gt;EU AI Act&lt;/a&gt; makes Europe’s risk-based rules the default standard for any AI system that reaches EU users, forcing global developers to document training data, prove human oversight, and meet tiered safety tests or face fines up to seven percent of worldwide revenue. &lt;/p&gt;




&lt;h1&gt;Cloud Computing&lt;/h1&gt;

&lt;h2&gt;OpenAI signs US$30 billion Oracle deal&lt;/h2&gt;

&lt;p&gt;Project Stargate will tap 4.5GW of OCI datacentre capacity, &lt;a href="https://aibusiness.com/data/openai-oracle-sign-30b-cloud-deal-to-fuel-ai-project" rel="noopener noreferrer"&gt;vaulting Oracle into the top tier&lt;/a&gt; of AI infrastructure providers. A new Nevada campus will come online in 2026 with direct fiber to OpenAI’s San Francisco HQ. Oracle claims the agreement doubles its high-density GPU footprint overnight.&lt;/p&gt;

&lt;h2&gt;AWS debuts Bedrock AgentCore&lt;/h2&gt;

&lt;p&gt;At AWS Summit New York, Amazon launched &lt;a href="https://www.aboutamazon.com/news/aws/aws-summit-agentic-ai-innovations-2025" rel="noopener noreferrer"&gt;AgentCore&lt;/a&gt; plus a US$100 million fund to help enterprises deploy agentic AI at scale and with governance baked in. The toolkit ships with Guardrails for policy checks and native connectors to ServiceNow and Salesforce. Early pilot customers include BMW and Pfizer.&lt;/p&gt;

&lt;h2&gt;Google Cloud sets Guinness record&lt;/h2&gt;

&lt;p&gt;A 2,000-developer hackathon on July 27 earned Google Cloud the title for the &lt;a href="https://timesofindia.indiatimes.com/technology/tech-news/google-cloud-sets-guinness-world-records-title-for-largest-ai-agent-training-event/articleshow/123043770.cms" rel="noopener noreferrer"&gt;world’s largest AI-agent training event&lt;/a&gt;, spotlighting its growing developer mindshare. Participants generated more than 5,000 open-source agent templates now hosted in Agentspace. Google plans quarterly repeats to cement ecosystem momentum.&lt;/p&gt;

&lt;h2&gt;Microsoft sovereignty doubts surface&lt;/h2&gt;

&lt;p&gt;Under Senate questioning, &lt;a href="https://www.theregister.com/2025/07/25/microsoft_admits_it_cannot_guarantee/" rel="noopener noreferrer"&gt;Microsoft admitted&lt;/a&gt; it “cannot guarantee” data sovereignty for EU customers if compelled by U.S. authorities, raising fresh cloud-trust questions. German regulators called the statement “troubling” and signaled stricter procurement reviews. Rivals such as OVHcloud are seizing the moment to market fully EU-controlled stacks.&lt;/p&gt;




&lt;h1&gt;Quantum Computing&lt;/h1&gt;

&lt;h2&gt;Finland shatters coherence record&lt;/h2&gt;

&lt;p&gt;Aalto University researchers measured &lt;a href="https://www.sciencedaily.com/releases/2025/07/250724040459.htm" rel="noopener noreferrer"&gt;millisecond-scale transmon coherence&lt;/a&gt;, leapfrogging prior 0.6ms marks and edging closer to error-corrected qubits. The team credits fluxonium shielding and ultra-pure aluminum films. Modeling suggests the advance could cut logical-qubit overhead by 30 percent.&lt;/p&gt;

&lt;h2&gt;Rigetti hits 99.5% two-qubit fidelity&lt;/h2&gt;

&lt;p&gt;Rigetti’s modular architecture delivered a new &lt;a href="https://www.fool.com/investing/2025/08/02/this-quantum-computing-company-just-unlocked-a-new/" rel="noopener noreferrer"&gt;gate-fidelity high&lt;/a&gt;, validating its tiled-chip roadmap toward larger fault-tolerant systems. The 128-qubit Ankaa-3 processor beat DARPA’s latest benchmark by a full percentage point. Rigetti says commercial customers will access the chip via its QCS cloud before year-end.&lt;/p&gt;

&lt;h2&gt;IonQ raises US$1 billion&lt;/h2&gt;

&lt;p&gt;A fresh equity offering &lt;a href="https://investors.ionq.com/news/news-details/2025/IonQ-Announces-Pricing-of-1-0-Billion-Equity-Offering/default.aspx" rel="noopener noreferrer"&gt;boosts IonQ’s&lt;/a&gt; cash pile to nearly US$1.7 billion, funding its push for 100k high-fidelity qubits by decade’s end. Intel Capital joined as a strategic investor to co-develop barium ion traps. The raise also earmarks funds for a new Maryland fab slated for 2027.&lt;/p&gt;

&lt;h2&gt;Lucy photonic computer on track&lt;/h2&gt;

&lt;p&gt;France’s Quandela and Germany’s Attocube advanced the EU’s &lt;a href="https://thequantuminsider.com/2025/07/31/french-national-quantum-update-july-2025/" rel="noopener noreferrer"&gt;Lucy project&lt;/a&gt;, aiming to demo scalable photonic qubits that run at room temperature. The latest milestone integrated on-chip interferometers with telecom-band emitters, a key step for fiber-based quantum networks. A public demo is scheduled for December.&lt;/p&gt;

&lt;h2&gt;Orbit-ready quantum demonstrator&lt;/h2&gt;

&lt;p&gt;A July 30-31 roundup confirmed plans for &lt;a href="https://ts2.tech/en/quantum-in-orbit-100k-qubit-ambitions-more-quantum-computing-roundup-july-30-31-2025/" rel="noopener noreferrer"&gt;the first quantum processor launch into low-Earth orbit&lt;/a&gt;, opening a path to space-based error correction. The 12-qubit payload will hitch a ride on an ESA Vega flight in early 2026. Engineers will test radiation hardening and laser-link entanglement between satellite and ground stations.&lt;/p&gt;




&lt;p&gt;In summary, agents are out of the lab and into the wild, cloud vendors are racing to own sovereign and agentic stacks, and quantum milestones keep shrinking error budgets. Expect more cash, more compliance heat, and sharper performance records as 2025 barrels toward its final quarter.&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;DJ Leamen is a Machine Learning and Generative AI Developer and Computer Science student with an interest in emerging technology and ethical development.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Stay up to date on all the latest tech news for free:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://djleamen.substack.com/subscribe" rel="noopener noreferrer"&gt;Subscribe to my newsletter!&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>cloud</category>
      <category>quantum</category>
      <category>news</category>
    </item>
    <item>
      <title>RAG Document Q&amp;A System</title>
      <dc:creator>DJ Leamen</dc:creator>
      <pubDate>Fri, 25 Jul 2025 00:45:11 +0000</pubDate>
      <link>https://forem.com/djleamen/rag-document-qa-system-2m7a</link>
      <guid>https://forem.com/djleamen/rag-document-qa-system-2m7a</guid>
      <description>&lt;p&gt;Struggling to find answers in massive documents? 🤯&lt;/p&gt;

&lt;p&gt;Tired of sifting through hundreds of pages to find that one piece of information?&lt;/p&gt;

&lt;p&gt;I've been there! &lt;/p&gt;

&lt;p&gt;That's why I dove deep into Retrieval-Augmented Generation (RAG) and built a Large Document Q&amp;amp;A AI Agent that can process, index, and accurately answer questions from documents of 800,000+ words!&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/L57GOpgZ720"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;I'm a hands-on learner, so I built this RAG pipeline from scratch. Every chunking strategy, vector database integration, and UI element was a challenge I personally solved. And now, it's all powered by a Django web interface for seamless scalability and clean data management.&lt;/p&gt;

&lt;p&gt;Key Features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Versatile Document Support: Handles PDF, DOCX, TXT, and Markdown.&lt;/li&gt;
&lt;li&gt;Flexible Vector Search: Integrates with FAISS, ChromaDB, or Pinecone.&lt;/li&gt;
&lt;li&gt;Context-Aware: Tracks conversational context for more natural interactions.&lt;/li&gt;
&lt;li&gt;Multiple Access Points: Features both REST API and CLI tools for diverse querying needs.&lt;/li&gt;
&lt;li&gt;Modern User Interface: Built with a sleek Bootstrap UI via Django.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Built with Python, LangChain, Django, OpenAI GPT-4, FAISS, and Bootstrap&lt;/p&gt;
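&lt;p&gt;For readers curious about the core loop, here is a toy sketch of chunking and retrieval. It is illustrative only: the actual project uses LangChain embeddings with FAISS/ChromaDB/Pinecone rather than the bag-of-words cosine scoring below, and the function names are my own.&lt;/p&gt;

```python
# Toy RAG retrieval sketch: chunk a document, then pull the chunks most
# similar to a question. Real systems swap the Counter "embedding" for a
# learned embedding model and a vector index such as FAISS.
import math
from collections import Counter

def chunk(text: str, size: int = 40, overlap: int = 10) -> list:
    """Split text into overlapping windows of `size` words."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(question: str, chunks: list, k: int = 2) -> list:
    """Return the k chunks most similar to the question."""
    q = Counter(question.lower().split())
    ranked = sorted(chunks,
                    key=lambda c: cosine(q, Counter(c.lower().split())),
                    reverse=True)
    return ranked[:k]

doc = ("FAISS is a library for efficient similarity search. " * 5
       + "Django is a high-level Python web framework. " * 5)
top = retrieve("what is django", chunk(doc, size=12, overlap=4), k=1)
```

&lt;p&gt;In the full pipeline, the retrieved chunks are injected into the LLM prompt so that answers stay grounded in the source document.&lt;/p&gt;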


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/djleamen" rel="noopener noreferrer"&gt;
        djleamen
      &lt;/a&gt; / &lt;a href="https://github.com/djleamen/doc-reader" rel="noopener noreferrer"&gt;
        doc-reader
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Large document Q&amp;amp;A agent using RAG
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;doc-reader&lt;/h1&gt;
&lt;/div&gt;
&lt;p&gt;A Django-based RAG document Q&amp;amp;A system for large text corpora.&lt;/p&gt;
&lt;p&gt;This project lets you upload long documents, index them into chunks, retrieve relevant context with vector search, and generate answers grounded in those retrieved sections. It includes a web UI, REST API, CLI, and an experimental semantic coherence layer that checks whether retrieved context and generated answers stay meaningfully aligned.&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;What it does&lt;/h2&gt;
&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;Ingests PDF, DOCX, TXT, and Markdown documents&lt;/li&gt;
&lt;li&gt;Chunks and indexes large documents for retrieval&lt;/li&gt;
&lt;li&gt;Answers questions over indexed content through
&lt;ul&gt;
&lt;li&gt;a Django web interface&lt;/li&gt;
&lt;li&gt;REST API endpoints&lt;/li&gt;
&lt;li&gt;a CLI&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Supports conversational querying&lt;/li&gt;
&lt;li&gt;Tracks semantic coherence across retrieval and generation&lt;/li&gt;
&lt;li&gt;Includes an experimental Azure-based pipeline alongside the standard local/OpenAI flow&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Why this project exists&lt;/h2&gt;

&lt;/div&gt;
&lt;p&gt;This repo was built around a practical long-document retrieval problem: asking useful questions over very large documents, including book-length text. The focus is less on “chat with a PDF” and more on building…&lt;/p&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/djleamen/doc-reader" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


</description>
      <category>ai</category>
      <category>programming</category>
      <category>python</category>
      <category>rag</category>
    </item>
    <item>
      <title>Tech Breakthroughs of June 2025</title>
      <dc:creator>DJ Leamen</dc:creator>
      <pubDate>Wed, 25 Jun 2025 04:53:48 +0000</pubDate>
      <link>https://forem.com/djleamen/tech-breakthroughs-of-june-2025-4ld</link>
      <guid>https://forem.com/djleamen/tech-breakthroughs-of-june-2025-4ld</guid>
      <description>&lt;h2&gt;
  
  
  June’s Breakneck Tech Advances Signal a New Phase of Acceleration
&lt;/h2&gt;

&lt;p&gt;June 2025 has delivered a rush of headline-worthy moves across artificial intelligence, cloud infrastructure, and quantum computing. Editors shut down an AI pilot at Wikipedia, a possible trillion-dollar “Crystal Land” robotics hub took shape in Arizona, Apple weighed a record AI acquisition, and IBM sketched a credible path to a fault-tolerant quantum computer by &lt;em&gt;2029&lt;/em&gt;. Meanwhile, hyperscalers doubled down on classified workloads, and researchers unveiled a cheaper way to create “&lt;em&gt;magic states&lt;/em&gt;,” a key ingredient for reliable quantum circuits. Together, the month’s events hint at a second half of 2025 in which capital, regulation, and science will collide at unprecedented speed.&lt;/p&gt;




&lt;h1&gt;Artificial Intelligence&lt;/h1&gt;

&lt;h2&gt;Wikipedia community rebukes AI summaries&lt;/h2&gt;

&lt;p&gt;Volunteer editors forced the Wikimedia Foundation to &lt;a href="https://techcrunch.com/2025/06/11/wikipedia-pauses-ai-generated-summaries-pilot-after-editors-protest/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;suspend a pilot&lt;/a&gt; that pinned large-language-model (LLM) blurbs atop articles just two weeks in, after editors described the auto-generated text as “yuck,” “truly ghastly,” and a threat to the site’s credibility. Facing hundreds of critical comments on the &lt;a href="https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)?ref=404media.co#Simple_summaries:_editor_survey_and_2-week_mobile_study" rel="noopener noreferrer"&gt;Village Pump discussion board&lt;/a&gt;, the Wikimedia Foundation pledged no further roll-out without community consensus.&lt;/p&gt;

&lt;p&gt;Editors warned that the AI blurbs introduced factual errors, obscured human citations, and risked turning years of volunteer labour into AI slop, while also raising thorny licensing questions about how LLMs reuse Wikipedia content. Turns out, even open projects can bristle at generative AI when it risks sidelining human expertise.&lt;/p&gt;

&lt;h2&gt;SoftBank’s $1 trillion “Project Crystal Land”&lt;/h2&gt;

&lt;p&gt;Masayoshi Son, the CEO of SoftBank, is pitching a Shenzhen-scale AI and robotics complex for Arizona that could cost &lt;a href="https://www.reuters.com/business/media-telecom/softbanks-son-pitches-1-trillion-us-ai-hub-tsmc-trump-team-bloomberg-news-2025-06-20/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;up to $1 trillion&lt;/a&gt;. Early talks involve TSMC and Samsung on the chip side, and both federal and state incentives are currently being explored. If even partially realized, the hub would dwarf Son’s previous bets and solidify North America’s hardware supply chain for AI.&lt;/p&gt;

&lt;h2&gt;Apple eyes a $14 billion Perplexity AI buy&lt;/h2&gt;

&lt;p&gt;Apple executives Eddy Cue and Adrian Perica have held internal discussions about &lt;a href="https://www.bloomberg.com/news/articles/2025-06-20/apple-executives-have-held-internal-talks-about-buying-ai-startup-perplexity?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;acquiring Perplexity AI&lt;/a&gt; to shore up on-device conversational search and reduce long-term dependence on Google. The price tag would eclipse even Apple’s 2014 Beats deal, signalling the company’s staunch willingness to pay for strategic AI talent and data. It could also intensify competition in the AI assistant space and prompt other tech giants to make their own big-ticket AI investments or partnerships.&lt;/p&gt;

&lt;h2&gt;Meta &amp;amp; Oakley launch performance smart glasses&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://about.fb.com/news/2025/06/introducing-oakley-meta-glasses-a-new-category-of-performance-ai-glasses/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Meta’s latest EssilorLuxottica collaboration&lt;/a&gt;, Oakley Meta HSTN, ships with a Meta AI voice assistant, 3K video capture, and open-ear audio. The frames aim at athletes and sell for US$399 (standard) or US$499 (limited edition), pushing AR and AI further into the mainstream of wearable technology.&lt;/p&gt;

&lt;h2&gt;Thinking Machines Lab raises a record seed&lt;/h2&gt;

&lt;p&gt;Former OpenAI CTO Mira Murati closed a &lt;a href="https://techcrunch.com/2025/06/20/mira-muratis-thinking-machines-lab-closes-on-2b-at-10b-valuation/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;US$2 billion seed round&lt;/a&gt; that values her six-month-old Thinking Machines Lab at US$10 billion, the largest seed round in history. Backers include Andreessen Horowitz and Conviction. The lab will pursue agentic AI able to reason and plan autonomously, laying down a marker for post-chatbot research.&lt;/p&gt;

&lt;h2&gt;Regulatory and competitive undercurrents&lt;/h2&gt;

&lt;p&gt;Outside North America, the EU AI Act’s general-purpose model obligations &lt;a href="https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;start on August 2nd&lt;/a&gt;, a year ahead of full enforcement, reminding US builders that global compliance clocks are ticking.&lt;/p&gt;




&lt;h1&gt;Cloud Computing&lt;/h1&gt;

&lt;h2&gt;AWS opens “Secret-West” for classified workloads&lt;/h2&gt;

&lt;p&gt;Amazon Web Services will &lt;a href="https://aws.amazon.com/blogs/publicsector/amazon-to-launch-second-secret-cloud-region-in-2025/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;add a second US region&lt;/a&gt; accredited to the Secret classification level, enabling multi-region failover for defence customers and giving them room to train secure AI models without moving data. This move reinforces AWS’s dominance in the public sector cloud market by addressing the highest security tier needs, and is likely to pressure competitors (e.g., Azure, which also serves US government cloud needs) to keep pace in compliance and capacity.&lt;/p&gt;

&lt;h2&gt;1Password joins forces with AWS Secrets Manager&lt;/h2&gt;

&lt;p&gt;Toronto-based 1Password &lt;a href="https://1password.com/press/2025/jun/aws-strategic-collaboration-agreement?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;signed a strategic collaboration agreement&lt;/a&gt; that syncs its Extended Access Management platform with AWS Secrets Manager. The move automates credential rotation and policy enforcement across CI/CD pipelines, easing security burdens for DevOps teams adopting AI agents.&lt;/p&gt;

&lt;h2&gt;Microsoft–TCS alliance scales Azure AI solutions&lt;/h2&gt;

&lt;p&gt;Tata Consultancy Services (TCS) will &lt;a href="https://www.tcs.com/who-we-are/newsroom/news-alert/tcs-build-new-ai-led-solutions-business-transformation-collaboration-microsoft?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;re-skill 100,000 staff in generative AI&lt;/a&gt; and co-build sector-specific applications on Azure, bringing Microsoft’s cloud AI tools to global enterprise clients in finance, health, and retail.&lt;/p&gt;

&lt;h2&gt;Google Cloud doubles down on AI supercomputing&lt;/h2&gt;

&lt;p&gt;At Google Cloud Next, the firm unveiled its seventh-generation TPU &lt;a href="https://www.techradar.com/pro/live/google-cloud-next-2025-all-the-news-and-updates-as-it-happens?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Ironwood and “Agentspace”&lt;/a&gt; for multi-agent orchestration, adding pressure on rivals’ specialized silicon strategies.&lt;/p&gt;




&lt;h1&gt;Quantum Computing&lt;/h1&gt;

&lt;h2&gt;IBM’s fault-tolerant roadmap&lt;/h2&gt;

&lt;p&gt;IBM’s “&lt;a href="https://www.ibm.com/quantum/blog/large-scale-ftqc?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Quantum Starling&lt;/a&gt;” aims for 200 logical qubits executing 100 million operations by 2029, using low-density-parity-check (LDPC) codes to cut physical-qubit overhead by 90 percent. Earlier processors, including Loon (2025), Kookaburra (2026), and Cockatoo (2027), will prove out the technology.&lt;/p&gt;

&lt;p&gt;That 100-million-operation, 200-logical-qubit target represents a 20,000x improvement over existing quantum computers. If IBM fulfills these milestones, quantum computers may be able to solve problems in areas like drug discovery, materials science, and optimization.&lt;/p&gt;

&lt;h2&gt;IonQ to buy Oxford Ionics for US$1.075 billion&lt;/h2&gt;

&lt;p&gt;On June 9th, &lt;a href="https://investors.ionq.com/news/news-details/2025/IonQ-Announces-Agreement-to-Acquire-Oxford-Ionics-Accelerating-Path-to-Pioneering-Breakthroughs-in-Quantum-Computing/default.aspx?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;it was announced&lt;/a&gt; that IonQ, a Maryland-based quantum computing company, has agreed to acquire Oxford Ionics. The all-stock deal combines IonQ’s control stack with Oxford’s ion-trap chips, targeting 256 high-fidelity qubits in 2026 and 10,000 in 2027. Observers expect further consolidation as vendors race toward fault-tolerant architectures.&lt;/p&gt;

&lt;h2&gt;
  
  
  Osaka reaches breakthrough with “magic state” overhead
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.sciencedaily.com/releases/2025/06/250621233816.htm?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Researchers demonstrated a distillation method&lt;/a&gt; that drastically reduces the qubit count and time needed to produce high-fidelity “magic states,” a cornerstone of universal quantum logic. The finding could accelerate the arrival of error-corrected quantum machines, bringing us significantly closer to scalable quantum computing systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  D-Wave ships Advantage2 with Zephyr topology
&lt;/h2&gt;

&lt;p&gt;D-Wave’s newest &lt;a href="https://www.dwavequantum.com/company/newsroom/press-release/d-wave-announces-general-availability-of-advantage2-quantum-computer-its-most-advanced-and-performant-system/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;annealing system&lt;/a&gt; offers 20-way qubit connectivity, higher energy scale, and lower noise, broadening its niche in industrial optimization and AI hybrid workflows.&lt;/p&gt;




&lt;p&gt;In summary, capital is flowing at seed-round levels previously reserved for late-stage unicorns, hyperscale clouds are hardening for classified AI, and quantum roadmaps are converging on error-correction milestones. Expect sharper regulatory scrutiny as the EU AI Act deadlines approach, more M&amp;amp;A as incumbents buy specialized talent, and fresh debates over human oversight after Wikipedia’s pushback.&lt;/p&gt;

&lt;p&gt;Developers will gain new tools, but they will also face higher bars for security, reliability, and ethical design. The convergence of AI, secure cloud, and quantum hardware is no longer theoretical. It is arriving faster than most roadmaps predicted, and the second half of 2025 will test which players can hang on.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;DJ Leamen is a Machine Learning and Generative AI Developer and Computer Science student with an interest in emerging technology and ethical development.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Stay up to date on all the latest tech news for free!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://djleamen.substack.com/subscribe" rel="noopener noreferrer"&gt;Subscribe to my newsletter!&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>cloud</category>
      <category>news</category>
      <category>quantum</category>
    </item>
    <item>
      <title>Sorry tech bros, Agentic AI will not kill SaaS</title>
      <dc:creator>DJ Leamen</dc:creator>
      <pubDate>Sat, 14 Jun 2025 00:34:47 +0000</pubDate>
      <link>https://forem.com/djleamen/sorry-tech-bros-agentic-ai-will-not-kill-saas-287m</link>
      <guid>https://forem.com/djleamen/sorry-tech-bros-agentic-ai-will-not-kill-saas-287m</guid>
      <description>&lt;h2&gt;
  
  
  Exploring the strengths and weaknesses of agentic AI and the plausibility of its widespread adoption.
&lt;/h2&gt;

&lt;p&gt;Tech circles are buzzing about whether autonomous AI agents will kill traditional SaaS or just make it smarter. Even &lt;a href="https://www.outlookbusiness.com/artificial-intelligence/microsoft-ceo-satya-nadella-reveals-how-ai-agents-will-disrupt-saas-models" rel="noopener noreferrer"&gt;Satya Nadella has suggested&lt;/a&gt; that agentic AI could upend the entire SaaS model. Others argue that AI will &lt;em&gt;enhance&lt;/em&gt; SaaS rather than replace it, and that technological shifts typically create &lt;a href="https://www.bain.com/insights/the-great-debate-will-agentic-ai-kill-saas/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;hybrids rather than extinctions.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Before we predict SaaS’s fate, let’s understand what agentic AI actually brings to the table, where it stumbles, and how it might play out across industries.&lt;/p&gt;

&lt;h1&gt;
  
  
  What Makes Agentic AI Different
&lt;/h1&gt;

&lt;p&gt;Agentic AI refers to AI systems with agency. They &lt;a href="https://www.cmswire.com/ai-technology/will-agentic-ai-mean-the-end-of-saas/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;autonomously perform tasks on behalf of users&lt;/a&gt; by planning workflows and utilizing tools rather than just responding to single prompts. These agents can analyze data, make decisions, and execute actions with minimal human input. A well-designed agent can continuously monitor and react to events. The idea is that they can take initiative, chain together multi-step processes, and self-adapt based on feedback or changing conditions (and, most importantly, get things done on their own).&lt;/p&gt;


&lt;h1&gt;
  
  
  Automation at Scale &amp;amp; Adaptability
&lt;/h1&gt;

&lt;p&gt;Unlike conventional software that follows preset functions, AI agents independently and dynamically determine what needs doing and how to do it. A single agent can hop between multiple applications to complete end-to-end workflows without someone clicking a button. Picture this: you ask for a monthly compliance report. The agent logs into your financial systems, retrieves data from spreadsheets, extracts information from emails, and generates the report automatically. This cross-system orchestration extends far beyond what any single SaaS app can manage alone.&lt;/p&gt;
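&lt;p&gt;The compliance-report workflow above can be sketched as a simple orchestration script. This is a hypothetical illustration only: the connector functions (&lt;code&gt;fetch_financial_records&lt;/code&gt;, &lt;code&gt;fetch_email_notes&lt;/code&gt;) are stand-ins for real system APIs, not part of any actual agent SDK.&lt;/p&gt;

```python
# Hypothetical sketch of an agent assembling a monthly compliance report.
# The connector functions are illustrative stand-ins, not a real SDK.

def fetch_financial_records():
    # Stand-in for a call into an accounting system's API.
    return [{"vendor": "Acme", "amount": 1200.0},
            {"vendor": "Globex", "amount": 845.5}]

def fetch_email_notes():
    # Stand-in for extracting confirmations from a mailbox.
    return {"Acme": "invoice confirmed"}

def generate_report(records, notes):
    # The "agent" step: join data from two systems into one artifact.
    lines = ["Monthly Compliance Report"]
    for rec in records:
        note = notes.get(rec["vendor"], "no correspondence")
        lines.append(f"{rec['vendor']}: {rec['amount']:.2f} ({note})")
    return "\n".join(lines)

report = generate_report(fetch_financial_records(), fetch_email_notes())
print(report)
```

&lt;p&gt;In a real agent, an LLM planner would decide which connectors to call and in what order; the point of the sketch is only that the orchestration spans systems no single SaaS app sees on its own.&lt;/p&gt;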

&lt;p&gt;Scalability is another strength. These agents work around the clock, handle multiple tasks simultaneously, and scale up without additional human labour. A well-designed agent can manage your marketing budget 24/7, continuously monitoring ad performance and adjusting strategies in real time.&lt;/p&gt;

&lt;p&gt;But perhaps the biggest selling point is &lt;em&gt;adaptability&lt;/em&gt;. Agentic AI generalizes from its knowledge and determines solutions through probabilistic reasoning and pattern recognition rather than relying on deterministic logic. If a workflow changes or new data appears, the agent adjusts instead of breaking. &lt;a href="https://www.theverge.com/decoder-podcast-with-nilay-patel/643562/uipath-ceo-daniel-dines-interview-ai-agents-rpa?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;UiPath&lt;/a&gt; (a leader in automation) describes agentic AI as “enabling machines to understand context, adapt to new information, and collaborate with humans to solve complex challenges, essentially redefining what automation can achieve.” To summarize, agentic AI brings together the flexibility of AI (to handle nuance and variability) with the efficiency of software (to execute at high speed and volume).&lt;/p&gt;

&lt;h1&gt;
  
  
  The Serious Limitations
&lt;/h1&gt;

&lt;p&gt;For all its promise, agentic &lt;a href="https://www.techradar.com/computing/artificial-intelligence/love-and-hate-tech-pros-overwhelmingly-like-ai-agents-but-view-them-as-a-growing-security-risk?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;AI has critical flaws&lt;/a&gt; that prevent it from taking over every software job. These revolve around trust, reliability, and control.&lt;/p&gt;

&lt;p&gt;Advanced AI agents often behave like black boxes. They make decisions in ways that aren’t transparent to users. Why did the agent choose strategy A over B? What assumption led to that choice? In regulated industries like healthcare and finance, you &lt;em&gt;must&lt;/em&gt; explain and justify your decisions. An AI agent that can’t provide audit trails is a non-starter for critical use cases.&lt;/p&gt;

&lt;p&gt;Reliability is another issue, especially in edge cases. AI agents can be brittle when faced with scenarios outside their training. In the real world, edge cases are the norm, not the exception. A slight deviation can cause an autonomous agent to misfire or behave erratically. For example, an agent tasked with optimizing supply orders might misinterpret unusual inventory data and place incorrect orders across dozens of vendors. (Or order a lot of meat… Anyone remember &lt;a href="https://youtu.be/m0b_D2JgZgY" rel="noopener noreferrer"&gt;&lt;em&gt;Son of Anton&lt;/em&gt;&lt;/a&gt;?)&lt;/p&gt;

&lt;p&gt;Or a research assistant might confidently incorporate misinformation into a report since it lacks true common-sense judgment. One mistake can compound into many before anyone notices.&lt;/p&gt;

&lt;p&gt;Right now, &lt;a href="https://www.theverge.com/decoder-podcast-with-nilay-patel/643562/uipath-ceo-daniel-dines-interview-ai-agents-rpa?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;human oversight remains absolutely &lt;em&gt;necessary&lt;/em&gt;&lt;/a&gt;. As these systems gain autonomy, the stakes of their actions rise dramatically. An agent might technically achieve your goal, but not how you intended. Told to reduce customer service resolution time, an unchecked agent might start closing tickets prematurely, solving the metric but hurting actual customer satisfaction.&lt;/p&gt;

&lt;p&gt;Without human guidance, an agent’s pursuit of an objective can easily diverge from human values or business intent. Many organizations view “human-in-the-loop” as non-negotiable: the AI can draft or execute tasks, but a human supervisor acts as both a safety net and a moral compass.&lt;/p&gt;

&lt;h1&gt;
  
  
  Where AI Agents Might Replace SaaS
&lt;/h1&gt;

&lt;p&gt;If agentic AI can automate much of what SaaS does, which industries might see outright displacement? The answer varies by sector.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Marketing &amp;amp; Sales:&lt;/strong&gt; Teams juggle multiple SaaS platforms today, and an agentic AI could orchestrate across all of them. It might manage digital ad campaigns autonomously, continuously monitoring performance, adjusting budgets, refining targets, and generating new creative assets. In sales, a similar agent could qualify leads, draft outreach emails, and alert human reps when hot prospects need personal attention. But this &lt;em&gt;doesn’t eliminate humans&lt;/em&gt; in marketing and sales. Creative strategy and relationship-based selling remain human strengths. However, it could replace many monotonous SaaS-driven tasks with self-driving processes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Customer Service:&lt;/strong&gt; &lt;a href="https://www.globenewswire.com/news-release/2025/06/12/3098390/0/en/Zoom-unveils-Virtual-Agent-2-0-to-power-smarter-autonomous-customer-support-via-next-gen-agentic-AI.html?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Zoom recently introduced an “Agentic Virtual Agent”&lt;/a&gt; that can autonomously handle returns or schedule appointments without human intervention. An AI agent in customer service can understand requests, look up customer info across backend systems, take appropriate action, and respond all within one automated workflow. This goes beyond your static FAQ bot. It’s a flexible service rep that works across your SaaS tools. We can imagine AI agents handling routine queries while human agents focus on complex, high-empathy interactions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Finance &amp;amp; Accounting:&lt;/strong&gt; Finance teams spend an enormous amount of time on data reconciliation, report generation, and compliance checks across multiple software tools. Now, picture an agent that extracts information from invoices, updates the accounting system, and emails a summary to the relevant manager automatically. It sounds great, but the finance sector is heavily regulated, and any AI agent would be under strict oversight. We’re likely to see agentic AI integrated into fintech as intelligent assistants rather than completely replacing core financial systems in the near future.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Software Development:&lt;/strong&gt; AI copilots already assist developers, but &lt;a href="https://www.businessinsider.com/software-companies-squeezed-by-ai-alixpartners-2025-4?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;agentic AI takes it a step further&lt;/a&gt;. Tools like &lt;a href="https://devin.ai/" rel="noopener noreferrer"&gt;Devin&lt;/a&gt; are positioned as AI software engineers that can &lt;em&gt;independently&lt;/em&gt; complete coding tasks. Devin can plan and execute complex development work requiring thousands of decisions by reading documentation, writing code across multiple files, running tests, debugging, and deploying applications. This suggests that agentic AI could take over monotonous grunt work, such as boilerplate coding, bug fixing, and API integration. Developers would be able to focus more on high-level architecture and creative problem-solving.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Healthcare:&lt;/strong&gt; Agentic AI could streamline processes across administrative SaaS systems, and on the clinical side, early agents monitor real-time patient data and help with care coordination. An AI agent may continuously watch vitals and lab results, and if it detects a concerning trend, it can automatically adjust treatment recommendations or alert clinicians. Due to high stakes, though, any AI actions in healthcare &lt;em&gt;must be overseen by medical professionals&lt;/em&gt;. Near-term, we can look for AI agents in supporting or secretarial roles, like automating paperwork, triaging inquiries, and assisting clinicians by analyzing data (like Ellipsis Health’s &lt;a href="https://www.wsj.com/articles/ellipsis-health-raises-45-million-seeks-to-fill-healthcare-gaps-with-ai-930ae901?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Sage&lt;/a&gt;.)&lt;/p&gt;

&lt;h1&gt;
  
  
  The Pattern That’s Emerging
&lt;/h1&gt;

&lt;p&gt;Across industries, agentic AI is likely to replace the UI and workflow layer of SaaS. Instead of people manually using applications, an AI agent with backend access could accomplish tasks faster and more fluidly. The SaaS applications might still exist in the background, but the AI agent becomes the new interface layer. This is the “&lt;a href="https://www.thepsi.com/evolution-of-headless-saas-platforms/#:~:text=Unlike%20traditional%20SaaS%2C%20which%20integrates%20both%20layers,development%20and%20reducing%20dependency%20on%20monolithic%20architectures.&amp;amp;text=Headless%20architecture%20provides%20the%20flexibility%2C%20scalability%2C%20and,integration%20&amp;amp;%20data%20exchange%20with%20enterprise%20systems." rel="noopener noreferrer"&gt;headless SaaS&lt;/a&gt;” concept. Launching a sales campaign can be as simple as telling an AI agent your criteria and message, and it handles updating the CRM, email marketing tool, and analytics setup behind the scenes.&lt;/p&gt;

&lt;p&gt;One area where agentic AI truly excels is taking over the monotonous, repetitive tasks that suck up human time. These are the tasks that come with using SaaS tools: clicking, copying data, pasting data, running reports, sending routine emails, and testing. AI agents don’t get bored, and they always operate at computer speed.&lt;/p&gt;

&lt;p&gt;Web research and aggregation is another example. Gathering information from dozens of websites is mind-numbing for humans but trivial for AI. An agent can check shipping availability across hundreds of supplier websites and instantly compile an accurate delivery plan: a task that might take humans hours is completed in minutes.&lt;/p&gt;

&lt;p&gt;Report generation follows the same pattern. Instead of manually querying multiple tables and exporting data, you ask in plain language for the analysis you need, and the agent produces it in seconds rather than hours.&lt;/p&gt;

&lt;h1&gt;
  
  
  Why Humans Remain Essential
&lt;/h1&gt;

&lt;p&gt;Even the most optimistic AI experts acknowledge that human input, creativity, ethics, and relationship-building remain essential. &lt;strong&gt;The human touch isn’t going anywhere.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;AI lacks true originality and abstract reasoning. It works off patterns. It can remix but not invent. Humans &lt;em&gt;can&lt;/em&gt;, and we excel at making creative leaps and thinking in big-picture terms.&lt;/p&gt;

&lt;p&gt;AI agents also lack moral compasses and emotional intelligence. They’ll do exactly what they’re told, which means human oversight is required to ensure outcomes align with ethical standards. Many business interactions benefit hugely from empathy and relationship-building, which AI &lt;strong&gt;cannot genuinely replicate.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Humans also provide contextual judgment and common sense. A human in the loop can catch when AI veers into nonsense or when decisions don’t make sense in a broader context. Human critical thinking ensures AI suggestions get sanity-checked.&lt;/p&gt;

&lt;h1&gt;
  
  
  Augmentation, Not Obliteration
&lt;/h1&gt;

&lt;p&gt;History suggests that major tech revolutions and shifts &lt;em&gt;expand ecosystems&lt;/em&gt; rather than destroy them. The cloud didn’t kill on-premise software overnight. Mobile apps didn’t kill the web. I believe &lt;a href="https://www.forbes.com/councils/forbestechcouncil/2025/06/13/ai-wont-kill-b2b-saas-but-it-will-upend-how-we-work-and-spend/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;agentic AI will coexist with and reshape SaaS&lt;/a&gt; rather than replace it in one fell swoop.&lt;/p&gt;

&lt;p&gt;We’ll undoubtedly see convergence and hybrid models, where SaaS vendors incorporate AI agents and AI platforms leverage existing SaaS infrastructure. In the near term, many agentic AIs will sit on top of SaaS tools. Over time, if those agents prove their worth, some SaaS interfaces may fade into the background. But the core business logic won’t vanish. The companies that succeed will embrace this change early and establish a &lt;a href="https://www.ft.com/content/36785ec8-6f9f-455f-ac74-645bcaa9e221?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;new balance between AI automation and human expertise&lt;/a&gt;. They’ll utilize agents for pain points that everyone dislikes (such as repetitive busywork) while keeping humans heavily involved in creative, strategic, and interpersonal areas.&lt;/p&gt;

&lt;p&gt;We’ll still have Software-as-a-Service, but it’ll be smarter, more autonomous, and built with AI capabilities. Instead of users adapting to software, software will finally adapt to users. It’s less ‘replacement’ than it is augmentation and evolution.&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;DJ Leamen is a Machine Learning and Generative AI Developer and Computer Science student with an interest in emerging technology and ethical development.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Stay up to date on all the latest tech news for free by &lt;a href="https://djleamen.substack.com/subscribe?" rel="noopener noreferrer"&gt;subscribing to my newsletter!&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Special thanks to Melike Ceylan-Leamen for the topic of this week’s article!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>saas</category>
    </item>
    <item>
      <title>Queer History Slideshow 🏳️‍🌈</title>
      <dc:creator>DJ Leamen</dc:creator>
      <pubDate>Fri, 13 Jun 2025 19:58:09 +0000</pubDate>
      <link>https://forem.com/djleamen/queer-history-slideshow-nj4</link>
      <guid>https://forem.com/djleamen/queer-history-slideshow-nj4</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for &lt;a href="https://dev.to/challenges/frontend-2025-06-04"&gt;Frontend Challenge - June Celebrations, Perfect Landing: June Celebrations&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Excited to share my interactive slideshow to celebrate Pride Month and highlight significant milestones in LGBTQ+ history. My project serves as a hopeful reminder of the progress we've made and honours the remarkable individuals who've significantly contributed to LGBTQ+ rights and visibility.&lt;/p&gt;

&lt;p&gt;&lt;iframe height="600" src="https://codepen.io/djleamen/embed/YPXvqzj?height=600&amp;amp;default-tab=result&amp;amp;embed-version=2"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;You can also view and explore the project on GitHub:&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/djleamen" rel="noopener noreferrer"&gt;
        djleamen
      &lt;/a&gt; / &lt;a href="https://github.com/djleamen/june-celebrations" rel="noopener noreferrer"&gt;
        june-celebrations
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Submission for DEV frontend challenge
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;Queer History Slideshow&lt;/h1&gt;

&lt;/div&gt;
&lt;p&gt;&lt;a class="mentioned-user" href="https://dev.to/djleamen"&gt;@djleamen&lt;/a&gt;'s submission to the DEV Frontend Challenge: June Celebrations!&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;About&lt;/h3&gt;

&lt;/div&gt;
&lt;p&gt;This project is a web-based slideshow highlighting important moments and figures in queer history. It was created as part of the DEV Frontend Challenge for June Celebrations.&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;Features&lt;/h3&gt;

&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;Interactive slideshow navigation&lt;/li&gt;
&lt;li&gt;Responsive design for all devices&lt;/li&gt;
&lt;li&gt;Accessible color schemes and text&lt;/li&gt;
&lt;li&gt;Brief descriptions and images for each slide&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;Getting Started&lt;/h3&gt;

&lt;/div&gt;
&lt;ol&gt;
&lt;li&gt;Clone the repository:
&lt;div class="highlight highlight-source-shell notranslate position-relative overflow-auto js-code-highlight"&gt;
&lt;pre&gt;git clone https://github.com/djleamen/june-celebrations.git&lt;/pre&gt;

&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;Navigate to the project directory:
&lt;div class="highlight highlight-source-shell notranslate position-relative overflow-auto js-code-highlight"&gt;
&lt;pre&gt;&lt;span class="pl-c1"&gt;cd&lt;/span&gt; june-celebrations&lt;/pre&gt;

&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;Open &lt;code&gt;index.html&lt;/code&gt; in your browser to view the slideshow.&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;Contributing&lt;/h3&gt;

&lt;/div&gt;
&lt;p&gt;Contributions are welcome! Please open an issue or submit a pull request with your suggestions.&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;License&lt;/h3&gt;

&lt;/div&gt;
&lt;p&gt;This project is licensed under the MIT License.&lt;/p&gt;
&lt;/div&gt;



&lt;/div&gt;
&lt;br&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/djleamen/june-celebrations" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;br&gt;
&lt;/div&gt;
&lt;br&gt;


&lt;p&gt;This is my first-ever submission to DEV, and I was excited to participate! I've been trying to dabble more in responsive frontend development and through this project, I gained so much insight into designing visually appealing landing pages, enhanced my skills in JavaScript, and deepened my appreciation for the historical achievements and key figures within the LGBTQ+ community.&lt;/p&gt;

&lt;p&gt;I'm particularly proud of how the slideshow effectively combines educational content with a clean and intuitive user interface. I'd love to expand this project in the future by incorporating additional interactive elements, such as timeline animations and deeper informational resources.&lt;/p&gt;

&lt;p&gt;Happy Pride Month! 🏳️‍🌈&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>frontendchallenge</category>
      <category>javascript</category>
      <category>webdev</category>
    </item>
    <item>
      <title>A Comprehensive History of Text-to-Speech (TTS) and Speech-to-Text (STT) Technologies</title>
      <dc:creator>DJ Leamen</dc:creator>
      <pubDate>Wed, 04 Jun 2025 20:40:15 +0000</pubDate>
      <link>https://forem.com/djleamen/a-comprehensive-history-of-text-to-speech-tts-and-speech-to-text-stt-technologies-139i</link>
      <guid>https://forem.com/djleamen/a-comprehensive-history-of-text-to-speech-tts-and-speech-to-text-stt-technologies-139i</guid>
      <description>&lt;h2&gt;
  
  
  Tracking the Evolution from Mechanical Voices and Pattern Matching to Deep Neural Networks and Human-Like Speech
&lt;/h2&gt;

&lt;p&gt;Text-to-Speech (TTS) and Speech-to-Text (STT) are complementary technologies that enable voice-based human-computer interaction. TTS converts written text into synthetic speech, while STT (automatic speech recognition) transcribes spoken language into text. The pursuit of these capabilities spans many decades and disciplines, from early mechanical speaking machines in the 18th century to the sophisticated AI-driven systems of today. This report follows the major developments in both TTS and STT in roughly chronological order, highlighting key technical methods (rule-based synthesis, formant and concatenative techniques, hidden Markov models, neural networks, vocoders, end-to-end and transformer-based architectures) and explaining why each paradigm shift occurred. We also note influential contributors and organizations (e.g. Bell Labs, IBM, academia, and modern AI labs like Google DeepMind and OpenAI) that have pioneered breakthroughs along the way. The history of speech technology reflects a steady move towards data-driven and learning-based approaches, driven by the desire for more natural and accurate voice interfaces.&lt;/p&gt;




&lt;h1&gt;
  
  
  Early Developments (18th Century — 1950s)
&lt;/h1&gt;

&lt;p&gt;Efforts to artificially produce or recognize speech date back centuries. As early as the 18th century, inventors built mechanical devices to mimic the human vocal tract. Notably, in 1779 &lt;a href="https://en.wikipedia.org/wiki/Christian_Gottlieb_Kratzenstein#Speech_synthesis" rel="noopener noreferrer"&gt;Christian Gottlieb Kratzenstein&lt;/a&gt; created functional models of human throat anatomy that could produce sustained vowel sounds. Building on such ideas, &lt;a href="https://en.wikipedia.org/wiki/Wolfgang_von_Kempelen%27s_speaking_machine" rel="noopener noreferrer"&gt;Wolfgang von Kempelen&lt;/a&gt; (yes, the one who built &lt;a href="https://en.wikipedia.org/wiki/Mechanical_Turk" rel="noopener noreferrer"&gt;&lt;em&gt;The Turk&lt;/em&gt;&lt;/a&gt;) demonstrated in 1791 an “acoustic-mechanical speech machine”: a bellows-operated apparatus with modelled tongue and lips capable of producing consonant-vowel combinations. These early contraptions, along with others like Joseph Faber’s 1846 &lt;a href="https://en.wikipedia.org/wiki/Euphonia_(device)" rel="noopener noreferrer"&gt;&lt;em&gt;Euphonia&lt;/em&gt;&lt;/a&gt;, showed it was possible to synthesize  crude speech sounds by mechanical means. They were precursors to modern TTS in concept, though true electronic speech synthesis had to await the 20th century.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ful3ox9c79g3qg6p4e05k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ful3ox9c79g3qg6p4e05k.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By the 1930s, with the advent of electronic signal processing, speech research gained a new foundation. Bell Labs (now Nokia Bell Labs) developed the &lt;a href="https://en.wikipedia.org/wiki/Vocoder" rel="noopener noreferrer"&gt;&lt;em&gt;vocoder&lt;/em&gt;&lt;/a&gt; (voice coder) in the mid-1930s, which analyzed speech into fundamental frequency tones and resonances for transmission. Bell Labs engineer Homer Dudley then created the &lt;a href="https://en.wikipedia.org/wiki/Voder" rel="noopener noreferrer"&gt;&lt;em&gt;Voder&lt;/em&gt;&lt;/a&gt;, a keyboard-operated speech synthesizer showcased at the 1939 World’s Fair, which could generate recognizable speech sounds from electrical signals. Around the same time, early attempts at speech &lt;em&gt;recognition&lt;/em&gt; also emerged. In 1952, researchers at Bell Labs built a system nicknamed &lt;a href="https://pubs.aip.org/asa/jasa/article-abstract/24/6/637/618458/Automatic-Recognition-of-Spoken-Digits?redirectedFrom=fulltext" rel="noopener noreferrer"&gt;&lt;em&gt;Audrey&lt;/em&gt;&lt;/a&gt; that could recognize spoken digits (0–9) from a single speaker by analyzing the spectral formant frequencies of each utterance. This was the first effective STT device, albeit extremely limited in vocabulary and sensitive to the speaker. In 1962, IBM demonstrated its own early recognizer, the &lt;a href="https://en.wikipedia.org/wiki/IBM_Shoebox" rel="noopener noreferrer"&gt;IBM Shoebox&lt;/a&gt;, which could understand 16 spoken English words (notably digits and commands) at the World’s Fair.&lt;/p&gt;

&lt;p&gt;Another foundational concept was the &lt;a href="https://en.wikipedia.org/wiki/Source%E2%80%93filter_model" rel="noopener noreferrer"&gt;source–filter model of speech production&lt;/a&gt;, articulated by Gunnar Fant in 1960. This theory separated speech into a sound source (vocal cord vibration or noise) and a filter (the shaping by the vocal tract), and it deeply influenced speech synthesis design. Throughout the 1950s and 60s, academic labs pushed TTS forward: for example, Noriko Umeda and colleagues in Japan developed one of the first general &lt;a href="https://pubs.aip.org/asa/jasa/article-abstract/82/3/737/678048/Review-of-text-to-speech-conversion-for?redirectedFrom=fulltext" rel="noopener noreferrer"&gt;English text-to-speech software systems&lt;/a&gt; by 1968. At Bell Labs in 1961, John Larry Kelly Jr. &lt;a href="https://en.wikipedia.org/wiki/IBM_704#Landmarks" rel="noopener noreferrer"&gt;used an IBM 704 mainframe&lt;/a&gt; to synthesize the song “Daisy Bell”…&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/yIwhx3NQSLg"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;… a landmark in computer speech that so impressed author Arthur C. Clarke that he worked it into his novel &lt;em&gt;2001: A Space Odyssey&lt;/em&gt; (the HAL 9000 computer’s famous &lt;a href="https://www.youtube.com/watch?v=E7WQ1tdxSqI&amp;amp;pp=0gcJCdgAo7VqN5tD" rel="noopener noreferrer"&gt;singing scene&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;On the whole, by the late 1960s, electronic speech synthesis could produce intelligible (if robotic-sounding) speech from text, and speech recognition devices could handle very small vocabularies under constrained conditions. These accomplishments set the stage for more systematic, theory-driven progress in subsequent decades.&lt;/p&gt;

&lt;h1&gt;
  
  
  Rule-Based Synthesis and Template-Based Recognition (1960s-1970s)
&lt;/h1&gt;

&lt;p&gt;During the 1960s and 1970s, research in TTS and STT followed largely separate tracks but shared common limitations in computing power and linguistic knowledge. Text-to-Speech systems of this era were predominantly rule-based. A prominent approach was &lt;a href="https://acoustics.org/pressroom/httpdocs/148th/schroeter.html#:~:text=Formant%20synthesis%20uses%20a%20set,memory%20footprint%20and%20only%20moderate" rel="noopener noreferrer"&gt;formant synthesis&lt;/a&gt;, grounded in Fant’s source-filter theory. Engineers manually designed rules to control a formant synthesizer (effectively a highly simplified vocal tract model) by specifying formant frequencies, amplitudes, and noise bursts to simulate phones (speech sounds). The formant synthesizer generates speech by creating a periodic waveform for voiced sounds and filtered noise for unvoiced sounds, with adjustable parameters corresponding to vowel and consonant resonances. Famous formant-based TTS systems (e.g. the &lt;a href="https://dl.acm.org/doi/10.5555/28587" rel="noopener noreferrer"&gt;&lt;em&gt;MITalk system&lt;/em&gt;&lt;/a&gt; and later the &lt;a href="https://en.wikipedia.org/wiki/DECtalk" rel="noopener noreferrer"&gt;&lt;em&gt;DECtalk synthesizer&lt;/em&gt;&lt;/a&gt; by Digital Equipment Corp.) demonstrated that rule-based control could yield highly intelligible speech, though often with a monotonic or robotic quality. The upside was that these systems required minimal memory and could run in real time even on early computers, since they did not rely on large prerecorded datasets. For example, &lt;em&gt;DECtalk&lt;/em&gt; was a portable device using Dennis Klatt’s formant synthesis techniques to produce speech; it became well known as the voice of Stephen Hawking’s computer interface.&lt;/p&gt;
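&lt;p&gt;The formant approach can be sketched in a few lines: an impulse-train glottal source filtered through a cascade of second-order resonators, one per formant. The formant frequencies and bandwidths below are illustrative values for an /a/-like vowel, not parameters from MITalk or DECtalk:&lt;/p&gt;

```python
import numpy as np
from scipy.signal import lfilter

def resonator(freq_hz, bw_hz, fs):
    """Coefficients of a second-order IIR resonator modelling one formant."""
    r = np.exp(-np.pi * bw_hz / fs)           # pole radius set by bandwidth
    theta = 2 * np.pi * freq_hz / fs          # pole angle set by formant frequency
    b = [1.0 - r]                             # rough gain normalization
    a = [1.0, -2.0 * r * np.cos(theta), r * r]
    return b, a

fs = 16000
f0 = 120                                      # pitch of the voiced source
source = (np.arange(fs) % (fs // f0) == 0).astype(float)  # 1 s impulse train

# Cascade the source through three formants (illustrative /a/-like values).
speech = source
for freq, bw in [(700, 130), (1220, 70), (2600, 160)]:
    b, a = resonator(freq, bw, fs)
    speech = lfilter(b, a, speech)
```

&lt;p&gt;A real rule-based system then drove these formant targets over time from phonetic rules; the sketch only produces one static vowel.&lt;/p&gt;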

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8nju6oami6mfuegw73k1.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8nju6oami6mfuegw73k1.webp" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Researchers also explored &lt;a href="https://en.wikipedia.org/wiki/Articulatory_synthesis" rel="noopener noreferrer"&gt;articulatory synthesis&lt;/a&gt; (directly modelling the physics of the vocal tract) but found it computationally intractable and not yet able to produce natural-sounding fluent speech (early articulatory models could generate recognizable sustained vowels, but most consonants and transitions sounded unnatural). Thus, formant synthesizers with carefully tuned rules remained the workhorse for TTS through the 1970s, achieving understandable output albeit with limited naturalness.&lt;/p&gt;

&lt;p&gt;In speech recognition during the same period, systems were initially dominated by template-based and rule-based methods. Early STT often required users to speak &lt;em&gt;isolated words&lt;/em&gt; with pauses, so that each word could be matched against a stored template. One common technique was &lt;a href="https://en.wikipedia.org/wiki/Dynamic_time_warping#Spoken-word_recognition" rel="noopener noreferrer"&gt;Dynamic Time Warping&lt;/a&gt; (DTW), which was invented by Soviet researchers in the late 1960s to align and compare speech patterns regardless of speed differences. A speech recognizer could record a prototype template (e.g. a recording of a word) and then use DTW to find the best match between an unknown input utterance and the stored templates. These methods worked for small vocabularies (dozens of words) but became exponentially harder as vocabulary grew, since every new word required a template. Moreover, early systems relied on hand-crafted acoustic-phonetic rules or grammars. For instance, &lt;a href="https://amturing.acm.org/award_winners/reddy_6247682.cfm" rel="noopener noreferrer"&gt;Raj Reddy&lt;/a&gt;, then at Stanford and later at CMU, built some of the first continuous speech systems in the late 1960s, enabling voice control of a chess game, but even his system had to constrain the problem (users spoke commands in a limited domain).&lt;/p&gt;
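&lt;p&gt;The DTW alignment at the heart of these recognizers fits in a short function. A sketch with toy one-dimensional “features” standing in for real spectral frames:&lt;/p&gt;

```python
import numpy as np

def dtw_distance(x, y):
    """Dynamic Time Warping distance between two feature sequences."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # cheapest way to reach (i, j): match, insertion, or deletion
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]

slow = [1, 1, 2, 2, 3, 3]   # the same "word" spoken slowly...
fast = [1, 2, 3]            # ...and quickly
print(dtw_distance(slow, fast))   # 0.0: the tempo difference is warped away
```

&lt;p&gt;Recognition was then a nearest-template search with this distance, which is exactly why vocabularies had to stay small.&lt;/p&gt;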

&lt;p&gt;Notably, in 1969 Bell Labs executive John Pierce &lt;a href="https://pubs.aip.org/asa/jasa/article-abstract/46/4B/1049/746573/Whither-Speech-Recognition?redirectedFrom=fulltext" rel="noopener noreferrer"&gt;wrote an influential letter&lt;/a&gt; casting doubt on speech recognition, temporarily cutting off funding for the area. This skepticism stemmed from the immense difficulty of the task using the pattern-matching and knowledge-based approaches then available. It wasn’t until the early 1970s, when the U.S. Defense Advanced Research Projects Agency (DARPA) launched its &lt;a href="https://www.cio.com/article/220152/the-invention-of-voice-recognition-this-centurys-phenomenon.html" rel="noopener noreferrer"&gt;Speech Understanding Research program&lt;/a&gt;, that STT research regained momentum. DARPA’s five-year project (1971–1976) invested in several teams (Carnegie Mellon, IBM, Stanford Research Institute, etc.) to push vocabulary sizes to 1,000 words and beyond. The outcome included &lt;a href="https://stacks.stanford.edu/file/druid:rq916rn6924/rq916rn6924.pdf" rel="noopener noreferrer"&gt;CMU’s &lt;em&gt;Harpy system&lt;/em&gt;&lt;/a&gt; (1976), which could handle a 1,000-word vocabulary using finite-state grammars and heuristic search, setting a new benchmark. Still, through the 1970s, most recognition systems were speaker-dependent (trained on a specific speaker’s voice) and struggled with natural, continuous speech. Progress was steady but limited by the need for expert-defined templates/rules and by computational costs.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/32KKg3aP3Vw"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;A parallel development relevant to both TTS and STT was the advancement of speech coding and vocoders. In 1966, Fumitada Itakura and Shuzo Saito in Japan pioneered &lt;a href="https://en.wikipedia.org/wiki/Linear_predictive_coding#:~:text=Linear%20predictive%20coding%20(LPC)%20is,speech%20coding%20and%20speech%20synthesis." rel="noopener noreferrer"&gt;Linear Predictive Coding&lt;/a&gt; (LPC) as a method to compactly represent speech signals. LPC models the speech waveform through a linear predictive filter (approximating the vocal tract) and a simple excitation signal, essentially a computerized descendant of the vocoder concept. Throughout the 1970s, Bell Labs researchers (notably Bishnu Atal and Manfred Schroeder) refined LPC for efficient encoding of speech. LPC became extremely important in low-bitrate speech transmission and also found its way into speech synthesis: for example, the first widely available electronic toy that “spoke,” the &lt;a href="https://en.wikipedia.org/wiki/Speak_%26_Spell_(toy)" rel="noopener noreferrer"&gt;Texas Instruments &lt;em&gt;Speak &amp;amp; Spell&lt;/em&gt;&lt;/a&gt; (1978), used an LPC synthesizer chip to generate speech from text programmed into the toy. This showed that even consumer devices could perform rudimentary TTS with limited vocabulary using a vocoder-like approach.&lt;/p&gt;
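&lt;p&gt;The heart of LPC, fitting a linear predictor to the waveform, can be sketched with the autocorrelation method; the AR(2) signal below is synthetic, standing in for a frame of speech:&lt;/p&gt;

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_coefficients(signal, order):
    """Estimate LPC coefficients via the autocorrelation method."""
    n = len(signal)
    r = np.array([signal[:n - k] @ signal[k:] for k in range(order + 1)])
    # Solve the Toeplitz (Yule-Walker) normal equations R a = r[1:]
    return solve_toeplitz((r[:order], r[:order]), r[1:])

# Synthetic "speech-like" frame: s[t] = 1.3 s[t-1] - 0.6 s[t-2] + noise
rng = np.random.default_rng(0)
s = np.zeros(4000)
e = rng.normal(scale=0.1, size=4000)
for t in range(2, 4000):
    s[t] = 1.3 * s[t - 1] - 0.6 * s[t - 2] + e[t]

a = lpc_coefficients(s, order=2)
print(np.round(a, 2))   # close to [1.3, -0.6], the true predictor
```

&lt;p&gt;A coder like the one in the &lt;em&gt;Speak &amp;amp; Spell&lt;/em&gt; chip stored only such coefficients plus a simple excitation, which is why the audio fit in a few kilobytes.&lt;/p&gt;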

&lt;p&gt;Overall, by 1980 we had a mix of techniques: rule-based formant synthesizers and LPC vocoders for TTS, and template/DTW or simple probabilistic models for STT. The stage was set for more data-driven statistical methods that would emerge in the 1980s as both computing hardware and theoretical tools improved.&lt;/p&gt;

&lt;h1&gt;
  
  
  Statistical Models and Data-Driven Approaches (1980s-1990s)
&lt;/h1&gt;

&lt;p&gt;The 1980s marked a major transition in speech technology, particularly in speech recognition, with the adoption of statistical modelling. Researchers moved from manually designed rules to probabilistic models trained on data. Central to this transition was the introduction of &lt;a href="https://en.wikipedia.org/wiki/Hidden_Markov_model" rel="noopener noreferrer"&gt;Hidden Markov Models&lt;/a&gt; (HMMs) for acoustic modelling. In the late 1970s, James Baker and Janet Baker &lt;a href="https://www.cs.cmu.edu/~rsingh/homepage/sphinx_history.html" rel="noopener noreferrer"&gt;pioneered the use of HMMs&lt;/a&gt;, probabilistic state machines capable of learning sound sequences and their variability through example data. HMMs unified acoustic patterns, pronunciation variations, and basic grammar into one mathematical framework. Rather than explicitly programming pronunciation rules or storing templates, researchers now estimated sound sequence probabilities from real speech data. By the mid-1980s, HMM-based recognizers had decisively overtaken older methods such as dynamic time warping and template matching.&lt;/p&gt;
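&lt;p&gt;What an HMM recognizer actually computes can be sketched with the forward algorithm: the probability of an acoustic observation sequence under the model, summed over all hidden state paths. Every number here is invented for illustration:&lt;/p&gt;

```python
import numpy as np

# Toy HMM: two hidden "phone" states, two observable acoustic symbols.
pi = np.array([0.6, 0.4])          # initial state probabilities
A = np.array([[0.7, 0.3],          # state transition matrix
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],          # emission probabilities per state
              [0.2, 0.8]])

def forward_likelihood(obs):
    """Forward algorithm: P(observation sequence | model), over all paths."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

p = forward_likelihood([0, 1, 0])
print(p)   # ≈ 0.1089
```

&lt;p&gt;Training with the Baum-Welch algorithm adjusts &lt;code&gt;A&lt;/code&gt; and &lt;code&gt;B&lt;/code&gt; to raise this likelihood on real speech data, which is precisely what replaced hand-written pronunciation rules.&lt;/p&gt;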

&lt;p&gt;IBM, under &lt;a href="https://en.wikipedia.org/wiki/Frederick_Jelinek" rel="noopener noreferrer"&gt;Frederick Jelinek&lt;/a&gt;’s leadership, significantly advanced statistical speech recognition. Jelinek famously remarked, “Every time a linguist leaves the group, the recognition rate goes up,” underscoring his team’s focus on data-driven methods over hand-crafted linguistic rules. By 1985, IBM developed &lt;a href="https://www.ibm.com/think/topics/speech-to-text" rel="noopener noreferrer"&gt;&lt;em&gt;Tangora&lt;/em&gt;&lt;/a&gt;, capable of recognizing a vocabulary of 20,000 words using HMM acoustics and statistical &lt;a href="https://web.stanford.edu/~jurafsky/slp3/3.pdf" rel="noopener noreferrer"&gt;n-gram language models&lt;/a&gt;. This marked a dramatic increase from the 1,000-word systems of the previous decade. &lt;a href="https://en.wikipedia.org/wiki/Dragon_NaturallySpeaking" rel="noopener noreferrer"&gt;Dragon Systems&lt;/a&gt;, founded by the Bakers in 1982, also produced competitive HMM-based speech recognizers for personal computers. By the late 1980s, most research groups adopted HMM/Gaussian-mixture models, steadily improving accuracy.&lt;/p&gt;
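&lt;p&gt;The n-gram side is simple to sketch: a bigram model turns counts from a text corpus into conditional word probabilities (toy corpus; real systems like Tangora estimated counts from millions of words):&lt;/p&gt;

```python
from collections import Counter

corpus = "call home please call the home office please call home".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def bigram_prob(prev, word):
    """Maximum-likelihood bigram estimate P(word | prev)."""
    return bigrams[(prev, word)] / unigrams[prev]

# The language model prefers word sequences it has seen often, helping the
# recognizer pick "call home" over an acoustically similar alternative.
print(bigram_prob("call", "home"))   # 2/3 of "call" tokens are followed by "home"
```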

&lt;p&gt;A major milestone occurred in 1992 with CMU’s development of &lt;a href="https://en.wikipedia.org/wiki/CMU_Sphinx" rel="noopener noreferrer"&gt;Sphinx-II&lt;/a&gt;, the first system capable of speaker-independent, large-vocabulary continuous speech recognition. Developed by &lt;a href="https://en.wikipedia.org/wiki/Xuedong_Huang" rel="noopener noreferrer"&gt;Xuedong Huang&lt;/a&gt; under Raj Reddy, Sphinx-II could transcribe fluent, uninterrupted speech from any speaker, handling thousands of words. Huang subsequently joined Microsoft in 1993, marking the entry of software giants into speech recognition technology. (Huang later served as CTO for Azure AI and, subsequently, Zoom.)&lt;/p&gt;

&lt;p&gt;Parallel advancements occurred in text-to-speech (TTS) synthesis, though at a slower pace initially. Bell Labs’ 1980s TTS systems combined extensive &lt;a href="https://en.wikipedia.org/wiki/Natural_language_processing" rel="noopener noreferrer"&gt;natural language processing&lt;/a&gt; with rule-based synthesis. Concurrently, Bell Labs researcher &lt;a href="https://en.wikipedia.org/wiki/Lawrence_Rabiner" rel="noopener noreferrer"&gt;Lawrence Rabiner&lt;/a&gt; developed influential digital signal processing algorithms used in speech analysis and synthesis, significantly impacting technologies like Dragon Dictate and AT&amp;amp;T’s voice recognition systems.&lt;/p&gt;

&lt;p&gt;A significant leap came in the mid-1990s with &lt;a href="https://www.cs.cmu.edu/~awb/papers/IEEE2002/allthetime/node1.html" rel="noopener noreferrer"&gt;unit selection synthesis&lt;/a&gt;. Researchers at ATR in Japan &lt;a href="https://www.sciencedirect.com/science/article/abs/pii/016763939090011W" rel="noopener noreferrer"&gt;demonstrated&lt;/a&gt; the effectiveness of large, carefully annotated speech databases. Unlike earlier diphone methods, unit selection dynamically selected optimal speech segments from extensive databases, matching context, intonation, and rhythm, achieving highly natural-sounding results. By the late 1990s, unit selection systems by companies like AT&amp;amp;T and Microsoft, branded as “natural voices,” approached human-like quality, particularly for limited-domain applications such as weather forecasts or announcements.&lt;/p&gt;
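&lt;p&gt;The selection step balances two costs: a target cost (how well a candidate unit fits the desired context) and a join cost (how audible the splice would be). Real systems run a Viterbi search over a large unit lattice; the toy sketch below, with invented costs, searches exhaustively:&lt;/p&gt;

```python
import itertools

# Candidate units per target phone: (unit_id, target_cost) — toy values.
candidates = [
    [("a1", 0.10), ("a2", 0.50)],
    [("b1", 0.30), ("b2", 0.20)],
    [("c1", 0.45), ("c2", 0.10)],
]

def join_cost(u, v):
    """Toy concatenation cost: units with the same numeric suffix are
    pretended to splice smoothly (as if from the same recording)."""
    return 0.0 if u[-1] == v[-1] else 0.6

def total_cost(path):
    cost = sum(tc for _, tc in path)                       # target costs
    cost += sum(join_cost(a, b)                            # join costs
                for (a, _), (b, _) in zip(path, path[1:]))
    return cost

def best_sequence(candidates):
    """Exhaustive min-cost search; real systems use Viterbi over the lattice."""
    return min(itertools.product(*candidates), key=total_cost)

best = best_sequence(candidates)
print([unit for unit, _ in best])   # ['a2', 'b2', 'c2']: smooth joins win
```

&lt;p&gt;Note how the winning sequence accepts worse individual units to avoid audible joins, which is exactly the tradeoff that made unit selection sound natural.&lt;/p&gt;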

&lt;p&gt;In parallel, statistical methods emerged in TTS with HMM-based or &lt;a href="https://www.sciencedirect.com/science/article/abs/pii/S0167639309000648" rel="noopener noreferrer"&gt;statistical parametric speech synthesis&lt;/a&gt;. Developed by researchers including Tokuda, Zen, and Black, this approach used trained statistical models to generate speech acoustics. Early HMM-based synthesis, though robotic, offered advantages in memory usage and voice modulation flexibility compared to unit selection. These systems found use in resource-constrained mobile and embedded applications, setting the stage for later neural network-driven innovations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdof3snrzg2vihwfyyomj.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdof3snrzg2vihwfyyomj.webp" alt=" "&gt;&lt;/a&gt;&lt;br&gt;
“Statistical Parametric Speech Synthesis,” Alan W. Black, Heiga Zen, Keiichi Tokuda&lt;/p&gt;

&lt;p&gt;By the late 1990s, both STT and TTS had fully embraced data-driven approaches: speech recognition was dominated by HMMs combined with n-gram language models, and TTS increasingly relied on large, annotated speech datasets. Improvements in both fields were propelled by growing computational power, larger speech corpora, and advances in statistical signal processing. During this period, we also saw the speech technology industry consolidate. Notably, &lt;a href="https://en.wikipedia.org/wiki/Lernout_%26_Hauspie" rel="noopener noreferrer"&gt;Lernout &amp;amp; Hauspie&lt;/a&gt; (L&amp;amp;H), a Belgian company, acquired competitors like Kurzweil and Dragon Systems to dominate the market until an accounting scandal in 2001. L&amp;amp;H’s assets eventually became part of &lt;a href="https://en.wikipedia.org/wiki/Nuance_Communications" rel="noopener noreferrer"&gt;Nuance Communications&lt;/a&gt;, a leading provider in the 2000s and 2010s, powering systems like &lt;a href="https://en.wikipedia.org/wiki/Siri" rel="noopener noreferrer"&gt;Apple’s Siri&lt;/a&gt;.&lt;/p&gt;
&lt;h1&gt;
  
  
  The Deep Learning Revolution (2010s)
&lt;/h1&gt;

&lt;p&gt;Entering the 21st century, speech technology benefited from steady increases in computing power and the advent of big data. Still, by the early 2010s there was a sense that the traditional architectures had plateaued in performance; for instance, even the best speaker-independent HMM-based speech recognizers were still making significant errors, and unit-selection TTS, while natural for neutral read-out speech, was inflexible. The &lt;a href="https://en.wikipedia.org/wiki/Deep_learning#Deep_learning_revolution" rel="noopener noreferrer"&gt;deep learning revolution&lt;/a&gt; of the 2010s dramatically shifted both STT and TTS by leveraging large neural network models trained on massive datasets, enabled by modern GPUs.&lt;/p&gt;

&lt;p&gt;In speech recognition, a pivotal breakthrough came around 2010–2012. Although neural networks had been experimented with in speech as far back as the 1980s (e.g. simple multi-layer perceptrons for phoneme classification), they hadn’t surpassed the carefully optimized HMM+Gaussian systems. This changed when researchers including &lt;a href="https://en.wikipedia.org/wiki/Geoffrey_Hinton#Education" rel="noopener noreferrer"&gt;Geoffrey Hinton&lt;/a&gt; and colleagues at the University of Toronto, in collaboration with Microsoft Research (Deng, Dahl, Yu, etc.), applied deep neural networks (DNNs) to acoustic modelling and achieved a dramatic drop in error rates. &lt;a href="https://ieeexplore.ieee.org/document/6296526" rel="noopener noreferrer"&gt;In a 2012 paper&lt;/a&gt; they showed that replacing the GMM in a speech recognizer with a deep feed-forward neural net (trained on lots of speech data with the then-new &lt;a href="https://en.wikipedia.org/wiki/Rectifier_(neural_networks)" rel="noopener noreferrer"&gt;&lt;em&gt;ReLU&lt;/em&gt;&lt;/a&gt; activation and better weight initialization) reduced word error rates by 30% relative — a seismic improvement in a field where 1–2% gains were noteworthy. This prompted a &lt;a href="https://www.nytimes.com/2012/11/24/science/scientists-see-advances-in-deep-learning-a-part-of-artificial-intelligence.html#:~:text=Rashid%2C%20who%20oversees%20Microsoft's%20worldwide,that%20characterize%20high%2Dtechnology%20fields." rel="noopener noreferrer"&gt;Microsoft executive&lt;/a&gt; to call it “the most dramatic change in accuracy since 1979,” underscoring that an entire generation of incremental HMM improvements had been eclipsed by deep learning virtually overnight. Soon, IBM and Google too had adopted DNNs in their production STT systems, leading to significantly more reliable voice input for applications like smartphone voice search and dictation.&lt;/p&gt;

&lt;p&gt;Deep neural networks were first used to improve the &lt;a href="https://en.wikipedia.org/wiki/Acoustic_model#:~:text=An%20acoustic%20model%20is%20used,recordings%20and%20their%20corresponding%20transcripts." rel="noopener noreferrer"&gt;acoustic model&lt;/a&gt; (the part that maps audio features to phonetic probabilities), but researchers quickly began to re-imagine the entire speech recognition pipeline with &lt;a href="https://tedai-sanfrancisco.ted.com/glossary/end-to-end-learning/" rel="noopener noreferrer"&gt;end-to-end deep learning&lt;/a&gt;. Around 2014, &lt;a href="https://en.wikipedia.org/wiki/Baidu" rel="noopener noreferrer"&gt;Baidu&lt;/a&gt; introduced &lt;a href="https://arxiv.org/abs/1412.5567" rel="noopener noreferrer"&gt;DeepSpeech&lt;/a&gt;, a system that leveraged an end-to-end approach: it used a recurrent neural network (RNN) trained with connectionist temporal classification (CTC) to map input audio spectrograms directly to text, without an explicit phoneme dictionary or grammar model. (Mozilla released an open-source DeepSpeech engine in 2017, which you can find &lt;a href="https://github.com/mozilla/DeepSpeech?tab=readme-ov-file" rel="noopener noreferrer"&gt;here&lt;/a&gt;.)&lt;/p&gt;
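&lt;p&gt;The CTC decoding rule itself is simple enough to sketch: merge consecutive repeated labels, then drop the blank symbol, turning a per-frame labeling into a transcript:&lt;/p&gt;

```python
def ctc_greedy_decode(frame_labels, blank="_"):
    """Collapse a CTC frame-level labeling: merge repeats, then drop blanks."""
    out, prev = [], None
    for tok in frame_labels:
        if tok != prev:          # repeated frames belong to one symbol
            if tok != blank:     # blanks separate genuine repeats ("ll" in "hello")
                out.append(tok)
            prev = tok
    return "".join(out)

# One label per audio frame, as an end-to-end model might emit:
print(ctc_greedy_decode(list("_hh_e_ll_l_oo_")))   # prints "hello"
```

&lt;p&gt;The blank symbol is what lets the network output the same transcript for utterances of any duration, without a hand-aligned phoneme dictionary.&lt;/p&gt;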

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzy3cb04c1f4op15khcnp.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzy3cb04c1f4op15khcnp.webp" alt=" "&gt;&lt;/a&gt;&lt;br&gt;
“What is DeepSpeech?” Stephen M. Walker II, Co-Founder and CEO of Klu Inc.&lt;/p&gt;

&lt;p&gt;This was inspired by earlier academic work from &lt;a href="https://en.wikipedia.org/wiki/Alex_Graves_(computer_scientist)" rel="noopener noreferrer"&gt;Alex Graves&lt;/a&gt; and others on &lt;a href="https://en.wikipedia.org/wiki/Sequence_learning" rel="noopener noreferrer"&gt;sequence learning&lt;/a&gt;. At the same time, at Google, researchers built &lt;a href="https://research.google/pubs/listen-attend-and-spell/" rel="noopener noreferrer"&gt;&lt;em&gt;Listen, Attend and Spell&lt;/em&gt;&lt;/a&gt;, a sequence-to-sequence model with attention for STT, treating speech recognition like a translation task from audio to text. These end-to-end models initially performed on par with, and later surpassed, the traditional HMM systems by mid-decade. A notable example is Google’s deployment of RNN-based models for voice search: by 2015, an &lt;a href="https://mediatum.ub.tum.de/doc/1292048/file.pdf" rel="noopener noreferrer"&gt;LSTM-based acoustic model trained with CTC&lt;/a&gt; yielded a 49% relative error reduction in Google’s English speech recognition compared to their previous model. These advances were fuelled by massive datasets (for example, the collection of anonymized voice search queries, or the publicly released &lt;a href="https://www.openslr.org/12" rel="noopener noreferrer"&gt;LibriSpeech corpus&lt;/a&gt; with 1000 hours of transcribed audiobooks) and by improved algorithms for training very deep networks. It also helped that around this time GPUs and distributed computing made it feasible to train networks on tens of thousands of hours of audio. In short, the 2010s saw STT transition to deep neural network dominance — first as hybrid systems (DNN/HMM combinations) and then as purely neural end-to-end systems.
By the late 2010s, error rates on benchmarks like &lt;a href="https://www1.icsi.berkeley.edu/Speech/stp/description.html" rel="noopener noreferrer"&gt;Switchboard telephone speech&lt;/a&gt; fell to near-human levels (&lt;a href="https://awni.github.io/future-speech/#:~:text=The%20decade%20from%202010%20to,were%20also%20released%20and%20proliferated" rel="noopener noreferrer"&gt;around 5% word error rate&lt;/a&gt;), something unimaginable a decade prior. This enabled the explosion of voice assistants (Apple’s Siri, Amazon’s Alexa, Google Assistant, etc.), all of which rely on deep learning-based speech recognition under the hood.&lt;/p&gt;

&lt;p&gt;Text-to-Speech followed a similar transformation in the 2010s, albeit a few years behind ASR. The high naturalness of unit selection synthesis was hard to beat, but it lacked flexibility and required laborious data preparation. Deep learning opened new possibilities to make TTS both more flexible and eventually more natural. A watershed development was &lt;a href="https://en.wikipedia.org/wiki/WaveNet" rel="noopener noreferrer"&gt;WaveNet&lt;/a&gt;, introduced by Google DeepMind in 2016. WaveNet is a deep generative model that produces speech waveforms directly, one audio sample at a time, using a convolutional neural network trained on raw audio. This was a radical departure from previous vocoders or concatenation methods as WaveNet essentially learned to mimic a human voice by learning the distribution of sound wave patterns. In tests, WaveNet synthesized voices that were significantly more natural in listening tests than the best unit-selection systems used by Google at that time. For example, in English and Mandarin, the WaveNet-generated speech achieved mean opinion scores closer to human recordings than the earlier parametric and concatenative systems. This proved that neural networks could generate highly realistic speech, capturing subtleties like breathing and mouth sounds which previous methods missed. WaveNet-style models (called &lt;a href="https://en.wikipedia.org/wiki/Deep_learning_speech_synthesis#Neural_vocoder" rel="noopener noreferrer"&gt;neural vocoders&lt;/a&gt;) were soon adopted as the new gold-standard method to generate high-quality audio from intermediate representations (e.g., from a spectrogram or from linguistic features). The main drawback of WaveNet was efficiency (generating audio sample-by-sample is slow) so researchers addressed this with optimized architectures and distillation (leading to faster variants like Parallel WaveNet, WaveGlow, and others by 2018).&lt;/p&gt;
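&lt;p&gt;WaveNet’s trick for covering long audio contexts cheaply is stacking causal convolutions with doubling dilation, so the receptive field grows exponentially with depth. A minimal numpy sketch with kernel size 2 and untrained placeholder weights:&lt;/p&gt;

```python
import numpy as np

def causal_dilated_conv(x, w, dilation):
    """1-D causal convolution: each output sample sees only current and past input."""
    taps = len(w)
    pad = dilation * (taps - 1)
    xp = np.concatenate([np.zeros(pad), x])
    y = np.zeros(len(x))
    for k, wk in enumerate(w):
        shift = (taps - 1 - k) * dilation   # tap k looks this far into the past
        y += wk * xp[pad - shift : pad - shift + len(x)]
    return y

x = np.zeros(64)
x[10] = 1.0                                 # impulse input
for d in [1, 2, 4, 8]:                      # doubling dilations, as in WaveNet
    x = causal_dilated_conv(x, [0.5, 0.5], d)

receptive_field = 1 + sum(1 * d for d in [1, 2, 4, 8])
print(receptive_field)                      # 16 samples from only 4 layers
```

&lt;p&gt;The real model adds gated activations, residual connections, and a softmax over sample values, but the exponential receptive-field growth above is the core idea that made sample-level audio modelling tractable.&lt;/p&gt;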

&lt;p&gt;Researchers also attacked the TTS problem from the text side using deep learning. Google presented &lt;a href="https://arxiv.org/abs/1703.10135" rel="noopener noreferrer"&gt;&lt;em&gt;Tacotron&lt;/em&gt;&lt;/a&gt;, a fully end-to-end neural TTS model. &lt;em&gt;Tacotron&lt;/em&gt; uses an encoder-decoder recurrent network with attention (very similar to sequence-to-sequence models in machine translation) to convert character input sequences directly into a spectrogram representation of speech, essentially learning the entire mapping from text to sound in one model. This spectrogram can then be converted to a waveform using a vocoder (initially the &lt;a href="https://paperswithcode.com/method/griffin-lim-algorithm" rel="noopener noreferrer"&gt;Griffin-Lim algorithm&lt;/a&gt;, later a neural vocoder like WaveNet or its descendants for higher quality). &lt;em&gt;Tacotron&lt;/em&gt; demonstrated that a single neural network could learn pronunciation, emphasis, and some aspects of intonation directly from example text-audio pairs.&lt;/p&gt;
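&lt;p&gt;The Griffin-Lim step is worth sketching: given only a magnitude spectrogram, as Tacotron predicts, it alternates inverse and forward STFTs until it finds a phase consistent with the magnitudes. A minimal version with SciPy (window size and iteration count chosen arbitrarily):&lt;/p&gt;

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(mag, n_iter=32, nperseg=256):
    """Estimate a waveform whose STFT magnitude matches `mag` (phase unknown)."""
    rng = np.random.default_rng(0)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))   # start from random phase
    for _ in range(n_iter):
        _, x = istft(mag * phase, nperseg=nperseg)       # invert with current phase
        _, _, Z = stft(x, nperseg=nperseg)               # re-analyze the result
        phase = np.exp(1j * np.angle(Z))                 # keep only the new phase
    _, x = istft(mag * phase, nperseg=nperseg)
    return x

# Stand-in "predicted" spectrogram: the STFT magnitude of a 440 Hz tone.
tone = np.sin(2 * np.pi * 440 * np.arange(2048) / 16000)
_, _, Z = stft(tone, nperseg=256)
audio = griffin_lim(np.abs(Z))
```

&lt;p&gt;Griffin-Lim phase tends to sound slightly metallic, which is why &lt;em&gt;Tacotron 2&lt;/em&gt; swapped it for a WaveNet vocoder.&lt;/p&gt;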

&lt;p&gt;A subsequent improved version, &lt;a href="https://arxiv.org/abs/1712.05884v2" rel="noopener noreferrer"&gt;&lt;em&gt;Tacotron 2&lt;/em&gt;&lt;/a&gt; (2018), combined the &lt;em&gt;Tacotron&lt;/em&gt; spectrogram prediction network with a WaveNet vocoder, yielding naturalness comparable to human speech in some evaluations. (NVIDIA has created an open-source PyTorch implementation that you can find &lt;a href="https://github.com/NVIDIA/tacotron2" rel="noopener noreferrer"&gt;here&lt;/a&gt;.)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2mezn2lo3fwuh3h7awnh.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2mezn2lo3fwuh3h7awnh.webp" alt=" "&gt;&lt;/a&gt;&lt;br&gt;
“Text-to-Speech with Tacotron2,” Yao-Yuan Yang, Moto Hira&lt;/p&gt;

&lt;p&gt;Around the same time, other companies and labs introduced similar architectures (e.g. &lt;a href="https://arxiv.org/pdf/1702.07825" rel="noopener noreferrer"&gt;Deep Voice&lt;/a&gt; by Baidu in 2017, which was a multi-component deep learning TTS pipeline, and Transformer TTS models slightly later). The success of these systems was due to their ability to learn features like pronunciation and prosody from data rather than relying on manual rules. It also became easier to train new voices: one simply needed recordings of a target voice and the neural model could be trained (or fine-tuned) to produce that voice, without manually redesigning a speech unit database. By the late 2010s, it became routine for state-of-the-art TTS systems to use a neural network for the main synthesis (whether end-to-end or as components for prosody and vocoder), thereby surpassing the old unit-selection methods in generality. Neural TTS can generate expressive speech (by training on expressive datasets or using fine-grained controls) and even do things like &lt;em&gt;voice cloning&lt;/em&gt;, mimicking a person’s voice with limited samples, which were extremely challenging before. The tradeoff initially was that training these models requires a lot of data and compute, and they were harder to deploy on-device due to heavy models, but those issues have gradually been mitigated with model compression techniques and more efficient architectures.&lt;/p&gt;

&lt;p&gt;In summary, the 2010s revolutionized both STT and TTS through deep learning. The transition happened because earlier technologies were hitting accuracy or quality ceilings: HMM-based STT couldn’t easily push past certain error rates, and unit-selection TTS couldn’t generate anything beyond its recorded domain or style. Neural networks offered a way to learn the complexities of speech directly from examples, without as many simplifying assumptions. This allowed modelling subtleties of pronunciation, accents, intonation, and voice qualities that previously had to be ignored or averaged out. The shift was also enabled by big data (e.g., thousands of hours of speech for training) and powerful GPUs. Without these, the complex neural models couldn’t have been trained. By 2019, deep learning had fully permeated commercial speech tech: Google’s voice input and text-to-speech services, Amazon &lt;a href="https://en.wikipedia.org/wiki/Amazon_Alexa" rel="noopener noreferrer"&gt;Alexa’s speech engine&lt;/a&gt;, Microsoft’s Azure cognitive services (now &lt;a href="https://azure.microsoft.com/en-us/products/ai-services" rel="noopener noreferrer"&gt;Azure AI services&lt;/a&gt;), and many others were all driven by neural models (often based on research papers from just a few years prior).&lt;/p&gt;
&lt;h1&gt;
  
  
  Transformers and Self-Supervised Models (late 2010s–2020s)
&lt;/h1&gt;

&lt;p&gt;As deep learning matured, the next wave of innovation integrated &lt;a href="https://en.wikipedia.org/wiki/Transformer_(deep_learning_architecture)" rel="noopener noreferrer"&gt;transformer architectures&lt;/a&gt; and &lt;a href="https://en.wikipedia.org/wiki/Self-supervised_learning" rel="noopener noreferrer"&gt;self-supervised learning&lt;/a&gt; (SSL), further propelling speech technology to new heights. The transformer model (&lt;a href="https://arxiv.org/abs/1706.03762" rel="noopener noreferrer"&gt;Vaswani et al. 2017&lt;/a&gt; in the context of NLP) relies on self-attention mechanisms instead of recurrence, and it has proven exceptionally powerful for sequential data. Speech researchers soon began adapting transformers for both STT and TTS tasks, especially as datasets continued to grow.&lt;/p&gt;

&lt;p&gt;In speech recognition, transformers enabled larger and more accurate models, but they also came alongside a paradigm shift: &lt;em&gt;self-supervised pre-training&lt;/em&gt;. One of the most influential developments was &lt;a href="https://ai.meta.com/research/impact/wav2vec/" rel="noopener noreferrer"&gt;wav2vec&lt;/a&gt; by Meta AI. The initial wav2vec (&lt;a href="https://arxiv.org/abs/1904.05862" rel="noopener noreferrer"&gt;Schneider et al. 2019&lt;/a&gt;) and its successor wav2vec 2.0 (&lt;a href="https://arxiv.org/abs/2006.11477" rel="noopener noreferrer"&gt;Baevski et al. 2020&lt;/a&gt;) introduced a method to pre-train a deep model on unlabeled audio by having it learn to predict parts of the audio from other parts (a sort of audio analogy to language model pre-training). Wav2vec 2.0 in particular used a transformer-based architecture and showed that the representations learned from tens of thousands of hours of raw audio can be fine-tuned to achieve state-of-the-art speech recognition with &lt;a href="https://jonathanbgn.com/2021/06/29/illustrated-wav2vec.html#:~:text=Wav2vec%20is%20a%20speech%20encoder,languages%20in%20a%20multilingual%20version" rel="noopener noreferrer"&gt;much less labeled data than traditionally required&lt;/a&gt;. In fact, wav2vec 2.0 achieved top performance on benchmarks like LibriSpeech, even outperforming carefully engineered hybrid systems, and it sparked a wave of research into leveraging unlabeled data for speech. The significance is that the data-collection bottleneck for STT was alleviated: one can use vast amounts of public audio (like podcasts or YouTube videos) without transcriptions to pre-train, and only a smaller supervised set to teach the model to map to text. This approach has become standard in ASR by the early 2020s.&lt;/p&gt;

&lt;p&gt;Another transformative system is OpenAI’s &lt;a href="https://openai.com/index/whisper/" rel="noopener noreferrer"&gt;Whisper&lt;/a&gt; (see on GitHub &lt;a href="https://github.com/openai/whisper" rel="noopener noreferrer"&gt;here&lt;/a&gt;) released in 2022. Whisper is a large-scale encoder-decoder transformer model trained in a &lt;em&gt;weakly supervised&lt;/em&gt; fashion on &lt;a href="https://en.wikipedia.org/wiki/Whisper_(speech_recognition_system)" rel="noopener noreferrer"&gt;680,000 hours of multilingual audio-text pairs collected from the web&lt;/a&gt;. Notably, much of the training data is “noisy” or imperfect (hence weak supervision), but the sheer scale allowed Whisper to generalize extremely well. When OpenAI open-sourced Whisper, it was found to be remarkably robust to accents, background noise, and even able to handle around 100 languages for transcription and translation. According to OpenAI, Whisper approaches human-level reliability on English tasks and has excellent robustness to varying input conditions. Technically, Whisper uses a transformer encoder to process audio features and a transformer decoder to output text, jointly modelling speech recognition and translation tasks. Its release meant that anyone could use a powerful pre-trained model (on GitHub or via APIs) to achieve high-quality STT without training a model from scratch. This reflects a general trend in AI development: large pre-trained models becoming the foundation for many tasks. In STT, we see that by the mid-2020s, the best systems are those that combine advanced architectures (like transformers or conformers) with huge-scale training on diverse data. The error rates continue to drop, and capabilities increase (e.g. automatic punctuation, speaker identification, multilingual code-switching) as models like Whisper set new standards.&lt;/p&gt;

&lt;p&gt;For text-to-speech, transformers have also played a role, and modern TTS has continued to advance in naturalness and adaptability. After &lt;em&gt;Tacotron&lt;/em&gt; and &lt;em&gt;WaveNet&lt;/em&gt;, researchers looked to improve efficiency and speed. Models like &lt;a href="https://www.microsoft.com/en-us/research/blog/fastspeech-new-text-to-speech-model-improves-on-speed-accuracy-and-controllability/" rel="noopener noreferrer"&gt;&lt;em&gt;FastSpeech&lt;/em&gt;&lt;/a&gt; (2019) from Microsoft used transformer networks to achieve parallel generation of speech (avoiding the autoregressive bottleneck of &lt;em&gt;Tacotron&lt;/em&gt;) while maintaining quality, by predicting duration for each phoneme and then generating all frames in parallel. Another line of work integrated generative models like &lt;a href="https://en.wikipedia.org/wiki/Generative_adversarial_network" rel="noopener noreferrer"&gt;GANs&lt;/a&gt; or &lt;a href="https://paperswithcode.com/method/normalizing-flows#:~:text=Normalizing%20Flows%20are%20a%20method,the%20sequence%20of%20invertible%20mappings." rel="noopener noreferrer"&gt;normalizing flows&lt;/a&gt;: e.g., &lt;a href="https://arxiv.org/abs/2005.11129" rel="noopener noreferrer"&gt;&lt;em&gt;Glow-TTS&lt;/em&gt;&lt;/a&gt; and NVIDIA’s &lt;a href="https://arxiv.org/abs/2005.05957" rel="noopener noreferrer"&gt;&lt;em&gt;Flowtron&lt;/em&gt;&lt;/a&gt; (see &lt;a href="https://github.com/NVIDIA/flowtron" rel="noopener noreferrer"&gt;here&lt;/a&gt;), which use flow-based models for generating speech features, and &lt;a href="https://arxiv.org/abs/2010.05646" rel="noopener noreferrer"&gt;&lt;em&gt;HiFi-GAN&lt;/em&gt;&lt;/a&gt; (2020), a GAN-based vocoder achieving very high-fidelity speech. Many of these models use transformer components or attention mechanisms under the hood for modeling long text sequences or aligning text and speech. The result is that by the early 2020s, TTS systems can synthesize long-form speech with proper intonation and even emotion, often in real time or faster.&lt;/p&gt;
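FastSpeech's parallel trick hinges on what its paper calls a "length regulator," and the mechanism is simple enough to sketch directly. The snippet below is an illustrative numpy version (not the paper's code): each phoneme's hidden state is repeated for its predicted number of frames, so a decoder can then produce every frame at once rather than one step at a time.

```python
import numpy as np

def length_regulate(phoneme_states, durations):
    """Expand phoneme-level hidden states to frame level by repeating each
    state for its predicted frame count (the length-regulator idea)."""
    return np.repeat(phoneme_states, durations, axis=0)

# 3 phonemes with 4-dim hidden states and predicted durations of 2, 3, 1 frames
states = np.arange(12, dtype=float).reshape(3, 4)
frames = length_regulate(states, np.array([2, 3, 1]))
print(frames.shape)  # (6, 4): 2 + 3 + 1 frame-level states, decoded in parallel
```

Because the frame count is known up front, generation no longer depends on the previous frame, which is where the large speedup over autoregressive models like Tacotron comes from.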

&lt;p&gt;An important contemporary development is the rise of commercial AI voice platforms that leverage these research advances. For instance, &lt;a href="https://en.wikipedia.org/wiki/ElevenLabs" rel="noopener noreferrer"&gt;ElevenLabs&lt;/a&gt; (founded 2022) is known for its highly natural and expressive speech synthesis service. ElevenLabs uses AI models (not fully disclosed, but likely large transformer-based or similar architectures) that can capture vocal emotion and intonation based on context, and even perform &lt;a href="https://elevenlabs.io/voice-cloning" rel="noopener noreferrer"&gt;voice cloning&lt;/a&gt; from a short sample. The system analyzes the input text for emotional cues and adjusts the speech prosody (pitch, pacing, emphasis) to sound more human-like, rather than reading in a flat tone. It supports multiple languages and long-form speech generation, showing how far TTS has come in terms of flexibility — one can generate an audiobook narration, complete with expressive delivery, entirely by AI voice.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/hX1Oa8Mqfvg"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Other notable modern TTS systems include Google’s latest WaveNet-based &lt;a href="https://cloud.google.com/text-to-speech?authuser=2" rel="noopener noreferrer"&gt;Cloud TTS&lt;/a&gt; (which by 2020 offered dozens of voices built on WaveNet and Tacotron 2 technology), &lt;a href="https://aws.amazon.com/polly/" rel="noopener noreferrer"&gt;&lt;em&gt;Amazon Polly&lt;/em&gt;&lt;/a&gt;, Microsoft’s &lt;a href="https://azure.microsoft.com/en-us/products/ai-services/ai-speech" rel="noopener noreferrer"&gt;&lt;em&gt;Azure AI Speech&lt;/em&gt;&lt;/a&gt;, and research projects like &lt;a href="https://arxiv.org/abs/2106.06103" rel="noopener noreferrer"&gt;&lt;em&gt;VITS&lt;/em&gt;&lt;/a&gt; (2021, an end-to-end model combining variational inference, normalizing flows, and adversarial training to generate waveforms directly from text).&lt;/p&gt;

&lt;p&gt;It’s also worth noting the convergence of TTS and STT in some respects: technologies like speech-to-speech &lt;em&gt;translation&lt;/em&gt; combine both by listening to speech in one language (STT), then generating speech in another language (TTS). The progress in each component (ASR and speech synthesis) makes such applications feasible. We also see shared techniques (for example, the same transformer that can model text sequences in ASR might be used to model spectrogram sequences in TTS). Furthermore, the concept of a vocoder has evolved: historically a vocoder was a hand-designed method to encode and synthesize speech (like LPC or channel vocoders), but today WaveNet and its descendants serve as “neural vocoders” for both TTS and low-bitrate coding. Even STT models sometimes use ideas from vocoders (e.g., learning internal representations of phonetic content akin to a coding of the speech).&lt;/p&gt;

&lt;p&gt;By 2025, we’re seeing automatic transcription with near-human accuracy used everywhere, from voicemail and video captioning to virtual assistants. Synthetic voices are used in gaming, film, education, and accessibility, with quality such that end users sometimes cannot distinguish AI speech from real speech. Yet each evolutionary step did not make the previous one entirely obsolete. For example, HMM-based methods and unit selection are still taught and occasionally used for specialized cases, but they have been largely supplanted in practice by deep learning approaches that simply achieve better results. The transitions occurred because each new paradigm &lt;em&gt;addressed the shortcomings of the previous&lt;/em&gt;. Rule-based systems couldn’t handle variability, statistical HMMs could learn variability but still had limiting assumptions and needed lots of fine-tuning, and neural networks could further learn abstract representations given enough data, overcoming many of the earlier limitations in naturalness and robustness.&lt;/p&gt;

&lt;p&gt;From mechanical speaking heads and rudimentary “digit recognizers” to AI models that can mimic human speech or transcribe conversations in real time, the evolution of TTS and STT technologies has been driven by the pursuit of natural, accurate communication between humans and machines. Historically, progress came in waves: an early wave of rule-based designs that established the feasibility of speech synthesis and recognition, a second wave of statistical models (HMMs and related methods) that brought these technologies from toy demos to practical applications, and the latest wave of deep learning and end-to-end architectures that has achieved truly human-like performance in many scenarios. Transitions between paradigms were motivated by the need for improvements in quality and capability: for instance, unit selection synthesis emerged once larger memory made it possible to drastically improve TTS naturalness, and neural end-to-end models arose when accumulated data and computation enabled learning speech patterns directly, thereby outperforming carefully layered statistical systems.&lt;/p&gt;

&lt;p&gt;Throughout this journey, key organizations and people played pivotal roles. Bell Labs stands out in the early and mid 20th century: from Homer Dudley’s &lt;em&gt;Voder&lt;/em&gt; and vocoder work to the earliest speech recognizers, Bell Labs fostered many foundational ideas in speech science. It was a Bell Labs team that achieved the &lt;em&gt;Audrey&lt;/em&gt; digit recognizer in 1952, and Bell Labs researchers (e.g., Itakura and Rabiner) contributed to breakthroughs like LPC and statistical modelling. In the late 20th century, academia and government labs (Carnegie Mellon, Stanford, MIT, NTT, etc., often under DARPA projects) pushed the envelope, exemplified by Raj Reddy’s students who pioneered HMM speech recognition and by MIT’s Dennis Klatt who advanced formant TTS. Companies like IBM and Dragon Systems commercialized speech recognition in the 80s/90s, while Microsoft and Apple invested in the 90s (Apple’s PlainTalk speech software and Microsoft’s SAPI engine). The 21st century saw tech giants like Google and Facebook (Meta) assume leadership: Google’s research produced novel architectures (e.g., the sequence-to-sequence model in &lt;em&gt;Tacotron&lt;/em&gt;) and leveraged deep learning at massive scales to deploy voice in every Android phone, while Facebook’s contributions like &lt;em&gt;wav2vec&lt;/em&gt; and multilingual models expanded speech tech across languages. DeepMind, Google’s AI research arm, blended academic depth with industrial might to create &lt;em&gt;WaveNet&lt;/em&gt; and other seminal models. And OpenAI showed with &lt;em&gt;Whisper&lt;/em&gt; that even as newcomers, they could open-source a model that sets a new standard for robust STT. Meanwhile, startups like ElevenLabs demonstrate that there’s still room for innovation and productization in speech, focusing on hyper-realistic voice cloning and expressiveness.&lt;/p&gt;

&lt;p&gt;The evolution of TTS and STT technologies illustrates how increased knowledge of speech, more data, and better algorithms have incrementally (and sometimes in leaps) brought synthetic and recognized speech ever closer to natural human performance. Today’s systems can not only read text aloud nearly indistinguishably from a human voice, but also transcribe spontaneous speech with very high accuracy, even across multiple languages and speakers. These capabilities are the result of decades of interdisciplinary research spanning signal processing, linguistics, and machine learning.&lt;/p&gt;

&lt;p&gt;As we stand in the mid-2020s, speech interfaces are ubiquitous, yet challenges remain for the next chapters. Truly understanding context and meaning (beyond just transcribing words), generating &lt;em&gt;fully expressive&lt;/em&gt; and emotionally nuanced speech, and doing all this in a privacy-preserving, computationally efficient way are still a ways away. The historical trend suggests that further integration with language understanding models (like large language models) and multimodal learning could drive the next set of breakthroughs. The journey of TTS and STT thus far gives confidence that these technologies will continue to advance, making computers ever better listeners and speakers.&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;DJ Leamen is a Machine Learning and Generative AI Developer and Computer Science student with an interest in emerging technology and ethical development.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://djleamen.substack.com/subscribe" rel="noopener noreferrer"&gt;Subscribe to my free newsletter!&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Special thanks to Tomasz Puzio for the topic of this week’s article!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>deeplearning</category>
      <category>opensource</category>
      <category>openai</category>
    </item>
    <item>
      <title>Quantum Dawn: Quantum Breakthroughs this Year Propel Computing into a New Era</title>
      <dc:creator>DJ Leamen</dc:creator>
      <pubDate>Sun, 25 May 2025 19:13:07 +0000</pubDate>
      <link>https://forem.com/djleamen/quantum-dawn-quantum-breakthroughs-this-year-propel-computing-into-a-new-era-3pgk</link>
      <guid>https://forem.com/djleamen/quantum-dawn-quantum-breakthroughs-this-year-propel-computing-into-a-new-era-3pgk</guid>
      <description>&lt;p&gt;The most thrilling advancements in quantum computing from January to May 2025, from real-world quantum teleportation to record-breaking chips and exotic topological qubits.&lt;/p&gt;




&lt;h1&gt;
  
  
  Turning Science Fiction Into Reality in 2025
&lt;/h1&gt;

&lt;p&gt;In labs across the globe, quantum computers are achieving feats that sound straight out of science fiction. A flurry of breakthroughs between January and May 2025 has signalled that we may be on the cusp of a quantum computing revolution.&lt;/p&gt;

&lt;p&gt;In one striking example, Google’s Quantum AI lab announced that its latest chip, nicknamed “Willow,” performed a calculation in 5 minutes that would have taken a top classical supercomputer &lt;a href="https://blog.google/technology/research/google-willow-quantum-chip/" rel="noopener noreferrer"&gt;longer than the age of the universe (10^25 years)&lt;/a&gt;. This isn’t an isolated stunt; it’s dramatic proof that quantum machines are now pushing beyond classical limits. Researchers from North America, the EU, and Asia report advances once deemed several decades away. And fittingly, the United Nations has declared 2025 the &lt;a href="https://www.unesco.org/en/years/quantum-science-technology" rel="noopener noreferrer"&gt;International Year of Quantum Science and Technology&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Experts say these achievements mark a turning point. “Together, these milestones mark a pivotal moment in quantum computing as we advance from scientific exploration to technological innovation,” wrote Microsoft quantum hardware chief &lt;a href="https://azure.microsoft.com/en-us/blog/quantum/2025/02/19/microsoft-unveils-majorana-1-the-worlds-first-quantum-processor-powered-by-topological-qubits/#:~:text=Scale%20Quantum%20Computing%20%28US2QC%29%20program" rel="noopener noreferrer"&gt;Chetan Nayak&lt;/a&gt;. Rather than inspire fear of the unknown, these developments fuel wonder and curiosity. What new possibilities might a world with powerful quantum computers unlock? Given that we’ve solved “unsolvable” science problems and laid the foundation of a quantum internet, the first months of 2025 have provided tantalizing hints. Let’s explore some of the most exciting and surprising quantum leaps and why scientists are buzzing about a coming societal transformation.&lt;/p&gt;

&lt;h1&gt;
  
  
  Breaking the Classical Barrier: Quantum Advantage Achieved
&lt;/h1&gt;

&lt;p&gt;For years, scientists have chased &lt;a href="https://en.wikipedia.org/wiki/Quantum_supremacy" rel="noopener noreferrer"&gt;quantum advantage&lt;/a&gt; — a demonstration that a quantum computer can &lt;em&gt;definitively outperform&lt;/em&gt; classical computers at a useful task. That long-sought goal now appears within reach. &lt;a href="https://www.theverge.com/2024/12/9/24317382/google-willow-quantum-computing-chip-breakthrough#:~:text=Google%E2%80%99s%20quantum%20computing%20lab%20just,the%20age%20of%20the%20universe" rel="noopener noreferrer"&gt;Google’s Willow&lt;/a&gt; chip set the tone by vaulting past its 2019 quantum supremacy benchmark with a vastly more complex calculation. The task it tackled in minutes would have taken ordinary computers hundreds of trillions of years (effectively impossible). Unlike the simpler random-number sampling used in earlier demonstrations, Google’s new experiment incorporated error-correction techniques to reduce mistakes, addressing “one of the greatest challenges in quantum computing,” according to the company. In other words, Willow not only computed faster but did so with improved accuracy, hinting that advantageous quantum computing may arrive sooner than skeptics expected.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.nature.com/articles/d41586-025-00765-1?error=cookies_not_supported&amp;amp;code=c0b115a6-4618-471b-b948-97e391a9539b#:~:text=A%20%E2%80%98quantum%20processor%E2%80%99%20has%20solved,%E2%80%98quantum%20advantage%E2%80%99%20over%20classical%20computers" rel="noopener noreferrer"&gt;Canada’s D-Wave Systems&lt;/a&gt; delivered another eye-opening milestone in the great white North. In March 2025, D-Wave researchers announced their quantum annealer had solved a practical physics problem, simulating the magnetic behaviour of certain quantum materials, that would have also taken classical supercomputers essentially forever (hundreds of thousands of years). It’s the first claim that a quantum processor cracked a problem of &lt;em&gt;real scientific relevance&lt;/em&gt; faster than any conventional method. This is an unusual quantum achievement, an annealing machine completing a complex simulation outright. If confirmed, it would move quantum computing from lab benchmarks toward practical tasks like designing novel materials and understanding high-temperature superconductors. It’s exactly the kind of sci-fi-like breakthrough that hints at transformative applications in science and engineering.&lt;/p&gt;

&lt;p&gt;Quantum hardware startups are also contributing to this momentum. Maryland-based IonQ, known for its trapped-ion quantum computers, revealed that one of its systems teamed up with engineering firm Ansys to design parts of a medical device &lt;a href="https://investors.ionq.com/news/news-details/2025/IonQ-and-Ansys-Achieve-Major-Quantum-Computing-Milestone--Demonstrating-Quantum-Outperforming-Classical-Computing/default.aspx" rel="noopener noreferrer"&gt;faster than a classical computer could&lt;/a&gt;. It’s an early glimpse of hybrid quantum-classical computing tackling real-world challenges. And industry-wide, there’s a sense that quantum’s “arrival” is accelerating. “It feels like quantum computing is turning an important corner… beyond the turning back point,” &lt;a href="https://www.chattanoogaquantum.com/resources/hpc-wire-quantum-computing-2025----is-it-turning-the-corner#:~:text=are%20all%20good%20guarantors,is%20one%20of%202024%E2%80%99s%20highlights" rel="noopener noreferrer"&gt;wrote one HPC analyst&lt;/a&gt; as 2025 began. The ability to do things classical computers cannot, especially in scientifically relevant arenas, galvanizes the field. Researchers are now racing to widen this advantage to more valuable tasks like optimizing complex supply chains and cracking molecular simulations for drug discovery. Each new demonstration reinforces that we have crossed a threshold where quantum machines can genuinely surprise us.&lt;/p&gt;

&lt;h1&gt;
  
  
  Exotic Qubits and Hardware Leaps Toward Scalability
&lt;/h1&gt;

&lt;p&gt;These headline-grabbing achievements are driven by rapid advances in quantum hardware, often arising from novel (or even bizarre) areas of physics. Perhaps the most surprising hardware breakthrough of early 2025 came from &lt;a href="https://news.microsoft.com/azure-quantum/" rel="noopener noreferrer"&gt;Microsoft&lt;/a&gt;, which unveiled a prototype quantum processor dubbed Majorana 1. Rather than the fragile qubits of today’s devices, Majorana 1 is built on exotic topological particles called “anyons” that &lt;a href="https://www.constellationr.com/blog-news/insights/2025-year-quantum-computing-already#:~:text=computing%20can%27t%20solve.%20,162" rel="noopener noreferrer"&gt;behave like new states of matter&lt;/a&gt;. Microsoft’s team reported in February that they had engineered a topological qubit that is inherently protected from many errors and can be controlled digitally. Using specially designed materials termed “topoconductors,” this approach could make qubits far more stable. Majorana 1’s “Topological Core” is designed to scale to a &lt;a href="https://azure.microsoft.com/en-us/blog/quantum/2025/02/19/microsoft-unveils-majorana-1-the-worlds-first-quantum-processor-powered-by-topological-qubits/#:~:text=,tolerant%20prototype%20%28FTP%29%20based%20on" rel="noopener noreferrer"&gt;million qubits on a single chip&lt;/a&gt;. This is a mind-boggling leap, given today’s top devices have, at most, a few hundred qubits. A previously elusive piece of quantum hardware, once confined to science fiction (qubits that naturally resist decoherence), has finally been realized in the lab. &lt;a href="https://www.nature.com/articles/s41586-024-08445-2" rel="noopener noreferrer"&gt;Researchers published evidence in &lt;em&gt;Nature&lt;/em&gt;&lt;/a&gt; that these topological qubits can be created and their quantum states measured reliably. 
If this technology works as hoped, it could solve the biggest hurdle in quantum computing by enabling fault-tolerant machines that don’t crash when scaled up. Microsoft even announced it will deliver a prototype error-corrected quantum computer within a few &lt;em&gt;years&lt;/em&gt;, not decades.&lt;/p&gt;

&lt;p&gt;Meanwhile, IBM is attacking the scaling challenge with brute-force engineering and networked design. In late 2024, IBM debuted Condor, a record-breaking 1,121-qubit superconducting processor, and as 2025 unfolds, they are preparing a follow-up named &lt;a href="https://www.ibm.com/quantum/blog/ibm-quantum-roadmap-2025#:~:text=realize%20their%20full%20potential,quantum%20communication%20for%20our%20users" rel="noopener noreferrer"&gt;Kookaburra&lt;/a&gt;. The Kookaburra system will link three chips with quantum communication links, creating a single 4,158-qubit computer. It functions similarly to a multicore processor but on a quantum level, distributing entangled qubits across separate modules. Getting thousands of qubits to cooperate is daunting, but IBM’s roadmap reflects growing hardware maturity (they’ve steadily increased qubit counts from 127 to 433 to 1000+ in just a few years and improved their quality.) By combining quantum chips, IBM hopes to sidestep yield and fabrication limits and continue scaling up. Other tech giants similarly push hardware boundaries: Amazon Web Services revealed a prototype chip called &lt;a href="https://www.caltech.edu/about/news/new-ocelot-chip-makes-strides-in-quantum-computing" rel="noopener noreferrer"&gt;Ocelot&lt;/a&gt;, its first in-house quantum processor developed with Caltech. In a dramatic illustration of the global race, China announced a 504-qubit superconducting quantum computer called &lt;a href="https://english.cas.cn/newsroom/cas_media/202412/t20241206_893281.shtml#:~:text=The%20%22Tianyan,aspects%2C%20according%20to%20the%20CTQG" rel="noopener noreferrer"&gt;Tianyan-504&lt;/a&gt; in late 2024, setting a new domestic record and claiming performance parity with IBM’s devices on key metrics like qubit coherence and fidelity. The Chinese system’s chip, nicknamed “Xiaohong,” surpasses the 500-qubit threshold and will be accessible via a cloud platform to users worldwide. 
Quantum hardware progress is a worldwide endeavour, with North American companies leading but others not far behind.&lt;/p&gt;

&lt;p&gt;Researchers are also finding ways to improve speed and reliability without solely adding more qubits. At MIT, &lt;a href="https://news.mit.edu/2025/mit-engineers-advance-toward-fault-tolerant-quantum-computer-0430" rel="noopener noreferrer"&gt;engineers debuted a novel component&lt;/a&gt; whimsically named the “quarton coupler.” This device dramatically strengthens the interaction between qubits and photons, enabling quantum information to be read out and processed about 10 times faster. The MIT team can &lt;a href="https://www.openaccessgovernment.org/mit-engineers-achieve-breakthrough-in-fault-tolerant-quantum-computing/192288/#:~:text=MIT%20engineers%20have%20achieved%20a,path%20to%20practical%20quantum%20computation" rel="noopener noreferrer"&gt;perform qubit measurements in mere nanoseconds&lt;/a&gt; by achieving an order-of-magnitude increase in nonlinear light-matter coupling. “This would really eliminate one of the bottlenecks in quantum computing,” says Yufeng “Bright” Ye, lead author of the study. Faster readouts mean errors can be corrected more frequently, which could accelerate the path to fault-tolerant quantum computing. The quarton coupler is essentially a new wiring trick in superconducting circuits that coaxes qubits to talk more strongly with their measurement resonators. Such behind-the-scenes innovations are vital for scaling up: it’s not just how many qubits you have but how quickly and accurately you can use them. By overcoming long-standing hardware bottlenecks (like slow, noisy readout), these advances make quantum processors more practical for complex algorithms.&lt;/p&gt;

&lt;p&gt;All these developments, from topological qubits to mega-chips to super-fast couplers, show that quantum hardware matures rapidly in diverse ways. “Most of the useful interactions in quantum computing come from nonlinear coupling of light and matter… increase the coupling strength, [and] you can essentially increase the processing speed,” explains MIT’s Ye. It’s an exciting confluence of science and engineering. Techniques that seemed speculative are now proven in experiments, fueling optimism that we can build much larger and stabler quantum machines. Arvind Krishna, CEO of IBM, &lt;a href="https://www.uc.edu/news/articles/2025/04/tech-pioneers-predict-future-breakthroughs-in-ai-and-quantum.html#:~:text=Krishna%20views%20quantum%20computing%20as,both%20environmental%20and%20financial%20resources" rel="noopener noreferrer"&gt;has even predicted&lt;/a&gt; that these advances could one day merge with AI to yield revolutionary results. Quantum computing, he says, will be a “game-changer” for everything from drug discovery to climate modelling, and could help bring about trustworthy artificial general intelligence “that can answer questions today unanswerable”. This kind of bullish vision from industry leaders is thanks to 2025’s hardware breakthroughs. What once felt like a distant dream is starting to come into focus.&lt;/p&gt;

&lt;h1&gt;
  
  
  Teleportation and the Rise of the Quantum Network
&lt;/h1&gt;

&lt;p&gt;Another area that delivered science-fiction-like news in early 2025 is quantum communication and networking. In February, a team at the University of Oxford accomplished a world first: they &lt;a href="https://www.sciencealert.com/teleportation-achieved-between-quantum-computers-in-a-world-first#:~:text=In%20a%20groundbreaking%20use%20of,without%20compromising%20on%20their%20performance" rel="noopener noreferrer"&gt;quantum-teleported data&lt;/a&gt; directly between two separated quantum processors.&lt;/p&gt;

&lt;p&gt;In their laboratory, qubits on one quantum computer were entangled with qubits on a second system two meters away. With a clever sequence of measurements and quantum entanglement, the state of a qubit was teleported from one machine to the other, effectively “sharing” a qubit across both processors instantaneously. “In a groundbreaking use of teleportation, critical units of a quantum processor have been successfully spread across multiple computers, proving the potential of distributing quantum modules without compromising on their performance,” &lt;em&gt;ScienceAlert&lt;/em&gt; reported. It’s a bit like linking two quantum brains so they function as one larger mind. While the distance was small (across a lab bench), the implication is that it is highly feasible to scale quantum computers by networking them together. The teleported qubit was accurate enough (about 86% fidelity) to perform a simple computation (Grover’s search algorithm) across the two processors as a single unit. The &lt;a href="https://www.nature.com/articles/s41586-024-08404-x?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Oxford team’s achievement, published in &lt;em&gt;Nature&lt;/em&gt;&lt;/a&gt;, demonstrates that “wiring together” quantum chips via photonic links is possible without degrading their quantum information. It’s a crucial step toward the vision of a quantum internet, where distant quantum devices connect to expand computing power and enable virtually unhackable communications.&lt;/p&gt;
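For readers curious what Grover's search looks like at this scale, here is a minimal statevector simulation of the two-qubit case. This is our idealized, noise-free sketch of the algorithm the linked processors ran, not the Oxford experiment itself: with a single oracle call, the marked item among four possibilities is found with certainty.

```python
import numpy as np

# Two-qubit Grover search over 4 basis states, simulated as a statevector.
marked = 3                          # index of the "winning" state |11>
n = 4                               # 2 qubits -> 4 basis states

state = np.full(n, 0.5)             # uniform superposition from Hadamards
oracle = np.eye(n)
oracle[marked, marked] = -1         # oracle flips the sign of |11>
diffuser = 2 * np.full((n, n), 1 / n) - np.eye(n)  # inversion about the mean

state = diffuser @ (oracle @ state) # one Grover iteration suffices for n=4
probs = np.abs(state) ** 2
print(probs)                        # [0, 0, 0, 1]: measuring yields |11>
```

A classical search over four items needs about two or three queries on average; Grover's algorithm finds the marked one in a single oracle call here, which is why even a tiny distributed demonstration of it is meaningful.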

&lt;p&gt;North America has seen parallel breakthroughs in quantum networking. In one jaw-dropping experiment, &lt;a href="https://news.northwestern.edu/stories/2024/12/first-demonstration-of-quantum-teleportation-over-busy-internet-cables/#:~:text=Northwestern%20University%20engineers%20are%20the,cable%20already%20carrying%20Internet%20traffic" rel="noopener noreferrer"&gt;Northwestern University&lt;/a&gt; engineers showed they could teleport quantum information through a busy fibre optic cable simultaneously carrying ordinary internet traffic. This late 2024 demonstration (heralded in 2025 tech circles) introduced a quantum signal into a 30-kilometre loop of fibre already filled with classical data and managed to teleport a qubit state from one end to the other with high fidelity. “This is incredibly exciting because nobody thought it was possible,” said Prem Kumar, the Northwestern professor who led the study. By combining quantum communication with existing internet cables, the team essentially showed that a quantum network can piggyback on our infrastructure, “greatly simplifying the infrastructure required for advanced sensing technologies or quantum computing applications.” In Kumar’s words, it opens the door to next-generation networks where quantum and classical data flow side by side over the same fibre. This result sparked wonder because it overcomes an assumed incompatibility (fragile single-photon quantum signals surviving the noisy environment of an operational data cable). It hints that building a nationwide (or global) quantum network might not require starting from scratch with new fibres, which would dramatically accelerate adoption. No longer confined to isolated lab experiments, quantum teleportation is moving into real-world contexts.&lt;/p&gt;

&lt;p&gt;Supporting these feats is progress in the nuts and bolts of quantum networking. In the U.S., &lt;a href="https://scitechdaily.com/quantum-networking-breakthrough-as-entangled-photons-transmit-without-interruption-for-30-hours/#:~:text=,on%20the%20University%20of" rel="noopener noreferrer"&gt;a collaboration in Tennessee continuously transmitted&lt;/a&gt; entangled photons between nodes for over 30 hours without interruption, using a robust stabilization method on local fibre networks. Long-lived, stable entanglement distribution is essential for quantum signals to span cities reliably. And companies are investing heavily in this future: IonQ announced in May that it is &lt;a href="https://investors.ionq.com/news/news-details/2025/IonQ-Announces-Intention-to-Acquire-Lightsynq-Expediting-Quantum-Computing-Quantum-Internet-and-Offering-Clear-Path-to-Millions-of-Qubits/default.aspx#:~:text=COLLEGE%20PARK%2C%20Md.,networking%20and%20quantum%20computing%20roadmaps" rel="noopener noreferrer"&gt;acquiring startups&lt;/a&gt; focused on quantum networking and memory technologies. “IonQ’s vision has always been to scale our quantum networks through quantum repeaters, and scale our compute power through photonic interconnects,” said CEO Niccolo de Masi, discussing the acquisition of Boston-based Lightsynq Technologies. Lightsynq’s team, comprised of former Harvard and AWS quantum networking experts, brings expertise in connecting quantum modules with light. Their “groundbreaking technology will accelerate IonQ’s commercial quantum computer delivery to 10,000s and ultimately millions of qubits,” de Masi noted. Companies are betting that modular quantum computing (linking many smaller quantum processors into one giant machine) will be the path to scalability. Quantum repeaters and memory devices which store entangled states will be the linchpins of a future quantum internet, ensuring entanglement can be extended over long distances despite the loss. 
In a few years, we might have cloud quantum computing services where the “quantum cloud” isn’t a single machine but a web of entangled processors working in concert.&lt;/p&gt;

&lt;p&gt;Such networks would enable ultra-secure communications (using quantum key distribution, where eavesdropping is fundamentally detectable) and, one day, a distributed quantum computing service where anyone can tap into quantum processing power remotely. The early 2025 breakthroughs in teleportation and networking, achieved both in North America and Europe, have an optimistic hue: they show that even the spookiest quantum phenomena, “teleporting” information instantly, intertwining distant nodes with entanglement, can be harnessed with existing technology. Each experiment fuels the feeling that a Quantum Internet is coming together piece by piece. &lt;a href="https://www.popularmechanics.com/science/a64670356/quantum-teleportation-internet-breakthrough/#:~:text=Internet%20www,toward%20the%20quantum%20internet" rel="noopener noreferrer"&gt;As one science outlet quipped&lt;/a&gt;, these results mean “special [quantum] lines may not be required for quantum communication,” bringing quantum closer to our everyday infrastructure. It’s a reminder for curious minds that quantum physics’ strangest powers can be put to work in the service of innovation, not just thought experiments.&lt;/p&gt;

&lt;h1&gt;
  
  
  From Lab to Life: Early Applications and What’s Next
&lt;/h1&gt;

&lt;p&gt;Perhaps most encouraging, we are starting to see quantum computers tackle problems that matter in the wider world and take a step towards broad societal impact. A standout example arrived in March 2025, when &lt;a href="https://www.quantinuum.com/blog/quantinuum-introduces-first-commercial-application-for-quantum-computers#:~:text=Quantinuum%20Introduces%20First%20Commercial%20Application,for%20Quantum%20Computers" rel="noopener noreferrer"&gt;Quantinuum&lt;/a&gt; (a leading quantum computing company born from Honeywell and Cambridge Quantum) announced the first commercial application for quantum computers. In partnership with JPMorgan Chase and U.S. national labs, Quantinuum used its flagship trapped-ion machine to generate certifiable random numbers for cryptography. Random number generation might not sound thrilling, but it is the bedrock of secure communications, and producing provably unbiased randomness has enormous implications for cybersecurity. Classical algorithms struggle to produce randomness that can be certified as truly unpredictable. Quantum mechanics, on the other hand, is inherently random: the outcome of measuring a qubit’s state cannot be predicted, even in principle. Quantinuum’s team demonstrated a way to harness this quantum unpredictability at scale, creating “random seeds” for encryption that passed stringent statistical tests and could be officially certified. &lt;a href="https://www.nature.com/articles/s41586-025-08737-1" rel="noopener noreferrer"&gt;&lt;em&gt;Nature&lt;/em&gt; reported&lt;/a&gt; the successful proof-of-concept, calling it a new quantum-enhanced path to stronger security. “This year, Quantinuum will introduce a product based on this development… long anticipated, but until now thought to be years away from reality,” the company stated, heralding it as a significant milestone that will reshape commercial technology and cybersecurity by generating certifiable randomness.
In short, quantum computers are now doing something immediately useful: making our data safer. It’s an early sign that quantum advances can translate into tangible tools, in this case helping protect sensitive information in finance, government, and beyond. And it comes not a moment too soon, as experts warn that advancing quantum computers will eventually threaten traditional encryption (a challenge being met by deploying quantum-resistant cryptography in parallel). Here, quantum is part of the solution and the catalyst for change.&lt;/p&gt;
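&lt;p&gt;The physics behind this is compact: a qubit prepared in an equal superposition collapses to 0 or 1 with probability 1/2 each under the Born rule, so each measurement yields one fundamentally unpredictable bit. The Python sketch below is a classical simulation of that process (illustrative only; genuinely certified randomness needs real quantum hardware plus a verification protocol like the one in the Quantinuum work), followed by the NIST-style monobit frequency check, one example of the stringent statistical tests such output must pass.&lt;/p&gt;

```python
import math
import random

def measure_superposition(n, rng):
    """Simulate n measurements of a qubit with amplitudes
    (1/sqrt(2), 1/sqrt(2)): the Born rule gives P(0) = P(1) = 0.5."""
    p0 = (1 / math.sqrt(2)) ** 2
    return [1 if rng.random() >= p0 else 0 for _ in range(n)]

def monobit_pvalue(bits):
    """NIST SP 800-22 monobit (frequency) test: a p-value below 0.01
    means the 0/1 balance is suspiciously far from fair."""
    s = sum(1 if b else -1 for b in bits)
    return math.erfc(abs(s) / math.sqrt(2 * len(bits)))

rng = random.Random(42)
bits = measure_superposition(10_000, rng)
print(f"monobit p-value: {monobit_pvalue(bits):.3f}")
print(f"monobit p-value (all zeros): {monobit_pvalue([0] * 200):.2e}")  # fails badly
```

&lt;p&gt;Passing tests like this one is necessary but not sufficient; the certification in the Quantinuum result additionally rules out the possibility that the bits were secretly precomputed, something no purely statistical check can do.&lt;/p&gt;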

&lt;p&gt;Beyond security, quantum devices are beginning to show promise in industry and science applications. In addition to the IonQ/Ansys medical design example, &lt;a href="https://www.constellationr.com/blog-news/insights/2025-year-quantum-computing-already#:~:text=Why%20is%20it%20the%20year,DARPA%2C%20Astra%20Zeneca%20and%20others" rel="noopener noreferrer"&gt;IonQ also reported&lt;/a&gt; working with partners like chemical giant BASF and biotech firm AstraZeneca on quantum algorithms for materials discovery and drug design. While these projects are in the early stages, they suggest a future where quantum computers accelerate R&amp;amp;D for new catalysts, batteries, or pharmaceuticals by simulating molecular interactions that are infeasible for classical computers. We’re also seeing progress in &lt;em&gt;quantum machine learning&lt;/em&gt;. For instance, in February, Quantinuum launched a “Generative Quantum AI” framework to blend quantum randomness and classical AI to tackle complex data problems. This echoes a broader trend of hybrid quantum-classical computing, where quantum processors handle specialized sub-tasks (like sampling complex probability distributions) to boost classical AI or simulation workflows. While still experimental, some experts believe quantum machine learning could one day find hidden patterns in data that classical AI cannot, especially as qubit counts grow.&lt;/p&gt;
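&lt;p&gt;The hybrid division of labour is easy to picture in code: the quantum processor only draws samples, and a classical optimizer in the outer loop adjusts the circuit based on those samples. The Python sketch below simulates that pattern with an invented one-qubit “device” (all names and parameters here are illustrative, not any vendor’s API), tuning a rotation angle by finite-difference gradient descent until the sampled distribution matches a target.&lt;/p&gt;

```python
import math
import random

def quantum_sample(theta, shots, rng):
    """Stand-in for the quantum sub-task: estimate P(measure 1) for the
    one-qubit circuit RY(theta) applied to state 0, where
    P(1) = sin^2(theta / 2). On real hardware this sampling is the step
    a QPU would perform."""
    p1 = math.sin(theta / 2) ** 2
    return sum(p1 > rng.random() for _ in range(shots)) / shots

def hybrid_optimize(target_p1, rng, steps=60, lr=2.0, eps=0.1, shots=2000):
    """Classical outer loop: finite-difference gradient descent on theta,
    seeing nothing but samples from the (simulated) quantum device."""
    theta = 0.3
    for _ in range(steps):
        loss_hi = (quantum_sample(theta + eps, shots, rng) - target_p1) ** 2
        loss_lo = (quantum_sample(theta - eps, shots, rng) - target_p1) ** 2
        theta -= lr * (loss_hi - loss_lo) / (2 * eps)
    return theta

rng = random.Random(0)
theta = hybrid_optimize(target_p1=0.5, rng=rng)
print(quantum_sample(theta, 20_000, rng))  # should land near the 0.5 target
```

&lt;p&gt;Production frameworks work the same way at much larger scale, with the gradient estimates and circuit structure doing the heavy lifting; the point is that the classical and quantum halves communicate only through measurement samples.&lt;/p&gt;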

&lt;p&gt;Importantly, these strides have come with a refreshing tone of optimism from the scientific community. Rather than hype fearsome scenarios (like quantum computers breaking all encryption overnight), the narrative in 2025 has emphasized wonder, opportunity, and preparation. Yes, experts acknowledge the need for post-quantum cryptography (new encryption that quantum algorithms can’t crack), and standards bodies are actively rolling out quantum-proof encryption schemes, a proactive move to secure communications &lt;em&gt;before&lt;/em&gt; quantum code-breakers come online. But the mood is far from panic. Instead, there’s excitement about what positive breakthroughs quantum computing will enable. “Quantum technology will dramatically accelerate discovery of new molecules… extending the periodic table” in chemistry, &lt;a href="https://phys.org/news/2025-03-quantum-frontier.html#:~:text=Quantum%20leap%3A%20Computing%27s%20next%20frontier,periodic%20table%20learned%20at%20school" rel="noopener noreferrer"&gt;notes one phys.org feature&lt;/a&gt;. Researchers talk about revolutionizing materials science, creating better solar cells or carbon capture materials, by letting quantum computers do the heavy lifting of quantum chemistry calculations. In climate modelling and agriculture, faster quantum simulations could help optimize systems with many variables in ways classical models can’t. And leaders like IBM’s Arvind Krishna paint a hopeful picture of quantum computing working hand-in-hand with AI to solve humanity’s thorniest problems: “Quantum computing could stimulate faster innovation… It could also help identify sustainable solutions for AI’s energy use,” &lt;a href="https://www.uc.edu/news/articles/2025/04/tech-pioneers-predict-future-breakthroughs-in-ai-and-quantum.html#:~:text=Krishna%20views%20quantum%20computing%20as,both%20environmental%20and%20financial%20resources" rel="noopener noreferrer"&gt;Krishna suggests&lt;/a&gt;. 
He even envisions quantum computers as a key to advancing AI toward a form of artificial general intelligence that is “completely reliable and [can] answer questions that are today unanswerable.” Such predictions verge on the utopian, but they underscore an essential shift: quantum computing is increasingly seen not as a threat but as a tool of empowerment that, if guided responsibly, could unlock incredible benefits.&lt;/p&gt;

&lt;p&gt;Scientists are careful to note that major hurdles remain. Today’s quantum processors still have error rates that require significant mitigation. Truly fault-tolerant quantum computers that can run indefinitely without accumulating errors likely need further innovation and thousands more physical qubits per logical qubit. Some experts, like &lt;a href="https://www.constellationr.com/blog-news/insights/2025-year-quantum-computing-already#:~:text=January" rel="noopener noreferrer"&gt;NVIDIA’s CEO Jensen Huang&lt;/a&gt;, have cautioned that practical, at-scale quantum computing might be “15 to 30 years away”. (His January comments sparked a minor controversy, leading quantum companies to publicly rebut that timeline by showing progress “proving Huang wrong.”) While 2025’s breakthroughs are remarkable, broad commercial adoption is still on the horizon. Companies and governments are using this time to get quantum-ready: training personnel, investing in research, and exploring initial use cases so they aren’t caught flat-footed when the technology fully matures. The enthusiasm is tempered with realism: quantum computers won’t replace classical supercomputers tomorrow, and many experts see a future of hybrid systems where quantum and classical computing each handle the tasks they’re best at. Still, the flurry of advances in early 2025 has clearly galvanized the field. “The quantum computing developments are flying, [with] a new development almost weekly… it’s ok to say it’s the year of quantum — or maybe qubits — with 10 months to go in the year,” observed tech analyst Larry Dignan in May.&lt;/p&gt;

&lt;p&gt;Looking ahead, the sense of wonder is palpable. Researchers speak of “major societal transformation” not with dread but with excitement about what quantum computing could unlock. Problems that were intractable last year are being solved this year; what might next year bring? Could we soon simulate entire proteins for drug development, optimize global logistics for efficiency, or even probe the deepest mysteries of physics (like quantum gravity) using these machines? Each advance in 2025 has expanded the realm of the possible. “Fault-tolerant quantum computing is in fact a reality,” &lt;a href="https://www.quantinuum.com/blog/a-new-breakthrough-in-logical-quantum-computing-reveals-the-scale-of-our-industry-leadership#:~:text=Historically%2C%20there%20have%20been%20widely,is%20in%20fact%20a%20reality" rel="noopener noreferrer"&gt;declared a joint Microsoft-Quantinuum team&lt;/a&gt; after achieving record-low error rates and successfully encoding logical qubits. They emphasized that this milestone (previously thought to be years away) has been pulled forward into the present. The message is clear: quantum computing’s exploratory phase is ending, and an era of quantum innovation is beginning.&lt;/p&gt;

&lt;p&gt;As we embrace this new era, expert voices urge a spirit of curiosity and preparation. &lt;a href="https://www.psu.edu/news/research/story/qa-will-microsofts-quantum-breakthrough-revolutionize-computing#:~:text=computing%3F%20www,theoretical%2C%20basic%20research%20fuels%20innovation" rel="noopener noreferrer"&gt;Jainendra Jain, a physicist commenting on Microsoft’s quantum chip&lt;/a&gt;, noted that breakthroughs like topological qubits stem from decades of fundamental research: a reminder that continued support for basic science is crucial to sustaining this progress. Meanwhile, companies are launching “quantum-ready” initiatives to educate engineers and students, ensuring a quantum-trained workforce will be ready to apply these marvels. The tone is hopeful: the world need not be afraid of quantum technology if we approach it with knowledge and imagination. After all, every tool can be used for good when guided by human values. The wonder lies in &lt;em&gt;what&lt;/em&gt; we choose to do with this new power.&lt;/p&gt;

&lt;p&gt;In a sense, the story of quantum computing in early 2025 reads like a high-quality science magazine feature come to life, complete with moonshot engineering, bizarre physics made real, and characters (the qubits, anyons, photons and scientists) overcoming challenge after challenge. It’s a story that is still unfolding, faster with each passing month. And for science enthusiasts, it’s nothing short of exhilarating. The quantum dawn is breaking. With enthusiasm and curiosity, we watch as today’s breakthroughs pave the way for tomorrow’s transformative technologies — confident that what was once science fiction is steadily becoming science fact, one extraordinary quantum leap at a time.&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;DJ Leamen is a Machine Learning and Generative AI Developer and Computer Science student with an interest in emerging technology and ethical development.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://djleamen.substack.com/subscribe" rel="noopener noreferrer"&gt;Stay up to date on all the latest tech news for free!&lt;/a&gt;&lt;/p&gt;

</description>
      <category>quantum</category>
      <category>news</category>
      <category>science</category>
    </item>
    <item>
      <title>Weekly Insights: AI, Cloud, and Quantum Advances (May 4–16, 2025)</title>
      <dc:creator>DJ Leamen</dc:creator>
      <pubDate>Sat, 17 May 2025 03:42:38 +0000</pubDate>
      <link>https://forem.com/djleamen/weekly-insights-ai-cloud-and-quantum-advances-may-4-16-2025-4oi3</link>
      <guid>https://forem.com/djleamen/weekly-insights-ai-cloud-and-quantum-advances-may-4-16-2025-4oi3</guid>
      <description>&lt;p&gt;In the past two weeks (May 4–16, 2025), the tech world saw a flurry of cutting-edge announcements across artificial intelligence, cloud computing, and quantum technology. Major industry players and research institutions unveiled new AI models and tools, struck big cloud partnerships, and edged quantum computing closer to real-world use. Below we recap the top developments, explaining what happened &lt;em&gt;and&lt;/em&gt; why these breakthroughs matter to developers and the broader tech community.&lt;/p&gt;




&lt;h1&gt;
  
  
  AI Advancements
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Google’s Gemini 2.5 Pro boosts coding intelligence
&lt;/h2&gt;

&lt;p&gt;Google made waves on May 6 by launching an updated flagship AI model, &lt;a href="https://techcrunch.com/2025/05/06/google-debuts-an-updated-gemini-2-5-pro-ai-model-ahead-of-i-o/#:~:text=Google%20on%20Tuesday%20announced%20the,number%20of%20widely%20used%20benchmarks" rel="noopener noreferrer"&gt;Gemini 2.5 Pro (Preview, “I/O edition”), ahead of its I/O conference&lt;/a&gt;. The company claims this model tops several standard benchmarks and brings “massively improved coding capabilities” (a boon for software developers).&lt;/p&gt;

&lt;p&gt;Available via Google’s Vertex AI cloud and Gemini API, the model is offered at the same price as its predecessor, making the upgrade frictionless for users. Why is this exciting? For one, Gemini 2.5 Pro is specifically tuned for programming tasks: Google DeepMind’s CEO noted it’s the &lt;em&gt;“best coding model we’ve ever built,”&lt;/em&gt; ranking #1 on code-generation leaderboards. In practice, developers can expect more accurate code suggestions, better debugging assistance, and improved ability to generate web apps from specs.&lt;/p&gt;

&lt;p&gt;This ups the ante in the AI model race, as Google is clearly positioning Gemini against OpenAI and others. It also highlights a trend: big tech rapidly iterating AI models to put ever-smarter tools into developers’ hands for integration into apps and services.&lt;/p&gt;

&lt;h2&gt;
  
  
  Slack brings generative AI apps to the workplace
&lt;/h2&gt;

&lt;p&gt;Collaboration platform Slack (owned by Salesforce) took a significant step to infuse AI into everyday workflows. On May 6, &lt;a href="https://analyticsindiamag.com/ai-news-updates/over-25-new-genai-apps-arrive-on-slack-marketplace/#:~:text=Slack%2C%20the%20workspace%20communication%20platform,AI%20apps%20to%20its%20marketplace" rel="noopener noreferrer"&gt;Slack announced&lt;/a&gt; the addition of over 25 new generative AI apps to its App Marketplace, spanning use cases from content creation to HR and DevOps.&lt;/p&gt;

&lt;p&gt;These third-party apps (from companies like Asana, Adobe, UiPath, Cohere, and more) act as AI assistants directly within Slack’s chat interface. For example, employees can now draft content, retrieve knowledge base answers, or summarize documents without leaving Slack. Salesforce’s leadership described this as &lt;em&gt;“a powerful shift in how work gets done,”&lt;/em&gt; highlighting how Slack effectively gives every user a personal, always-on digital AI helper directly within the conversational interface they already use daily.&lt;/p&gt;

&lt;p&gt;For developers, this move is twofold: it’s an opportunity to build and monetize Slack AI plugins for the enterprise market, and it’s a signal that AI-enhanced productivity is becoming standard in software platforms. By lowering the barrier to integrate AI (no separate tools or context-switching needed), Slack is accelerating adoption of AI in offices, which in turn drives demand for skilled AI developers and robust API integrations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Big funding fuels AI startups — $900M for coding AI
&lt;/h2&gt;

&lt;p&gt;The AI boom isn’t just technological: it’s financial. In this period, investors poured record sums into AI startups, especially those building developer tools. Most eye-opening was a report on May 5 that Anysphere, a small startup behind a popular AI coding assistant called Cursor, &lt;a href="https://news.crunchbase.com/ai/anysphere-ai-coding-cursor-funding-valuation/#:~:text=Anysphere%2C%20which%20sells%20the%20popular,at%20a%20%249%20billion%20valuation" rel="noopener noreferrer"&gt;raised a staggering $900 million funding round at a $9 billion valuation&lt;/a&gt;. The round, led by Thrive Capital with participation from Andreessen Horowitz and Accel, is one of the largest ever for a Series C in AI.&lt;/p&gt;

&lt;p&gt;What does a nine-figure infusion for an AI code tool imply? First, it validates AI-powered development tools as a hot market. Anysphere’s Cursor helps generate and refactor code, so investors clearly see huge potential in accelerating programming with AI. Second, it underscores intensifying competition: even OpenAI has been eyeing this space (it reportedly tried to acquire Anysphere before pivoting to another code assistant deal).&lt;/p&gt;

&lt;p&gt;For developers, such war chests mean we’ll likely see faster evolution of AI dev tools. Expect more powerful code suggestion engines, deeper IDE integrations, and perhaps entire AI-driven software pipelines emerging. The broader takeaway is that venture capital is betting big that the next wave of software innovation will be built not just &lt;em&gt;with&lt;/em&gt; AI, but &lt;em&gt;by&lt;/em&gt; AI. All this investment ultimately translates to more choice and better AI products for developers to incorporate into their workflow.&lt;/p&gt;




&lt;h1&gt;
  
  
  Evolving Cloud Landscape
&lt;/h1&gt;

&lt;h2&gt;
  
  
  AWS commits $5B for a Saudi AI cloud hub
&lt;/h2&gt;

&lt;p&gt;The cloud computing giants are extending their reach both geographically and technologically. &lt;a href="https://www.aboutamazon.com/news/company-news/amazon-aws-humain-ai-investment-in-saudi-arabia#:~:text=Amazon%20Web%20Services%20,Amazon%20Q%20to%20advance%20Saudi" rel="noopener noreferrer"&gt;Amazon Web Services (AWS) announced&lt;/a&gt; a strategic partnership involving an investment of over $5 billion with Saudi Arabia’s new AI company ‘HUMAIN’. Together they will build a dedicated “AI Zone” cloud in the Kingdom, featuring specialized AWS AI infrastructure (including cutting-edge chips and UltraCluster networks for AI) along with AWS’s managed services like SageMaker and Bedrock.&lt;/p&gt;

&lt;p&gt;The goal is to make Saudi Arabia a “global leader in AI” with local cloud capacity for training and deploying models. For developers, this has a few implications: regions like the Middle East will get local, high-performance AI cloud resources, reducing latency and complying with data sovereignty needs. It also highlights how cloud vendors are pouring resources into AI-specific infrastructure globally, effectively competing on who can offer the fastest, most scalable AI platform.&lt;/p&gt;

&lt;p&gt;Strategically, AWS’s massive investment shows it doubling down to court international markets amid intense cloud competition. Microsoft and Google have been touting multi-cloud and AI features; AWS is responding with big bets abroad to ensure that when companies deploy AI-driven apps, they do it on AWS. The bottom line is an increasingly rich, globally distributed menu of cloud options for developers with AI as the centerpiece.&lt;/p&gt;

&lt;h2&gt;
  
  
  AWS launches an AI tool to modernize legacy apps
&lt;/h2&gt;

&lt;p&gt;Cloud providers don’t just do shiny new services. They’re also attacking the less glamorous pain points for developers. On May 15, &lt;a href="https://siliconangle.com/2025/05/15/aws-ai-powered-workload-modernization-service-is-now-generally-available/#:~:text=Amazon%20Web%20Services%20Inc,workloads%20using%20agentic%20artificial%20intelligence" rel="noopener noreferrer"&gt;AWS announced general availability of AWS Transform,&lt;/a&gt; a service that uses agentic AI to automate workload migration and modernization.&lt;/p&gt;

&lt;p&gt;First previewed last year, Transform targets those decades-old systems running on legacy VMware, mainframe, or Windows/.NET servers, and intelligently converts them for AWS’s cloud (even moving .NET apps from Windows to Linux). AWS says early users saw migration projects completed 4x faster (and up to 80x faster for certain VMware tasks) compared to manual efforts. Under the hood, specialized AI “agents” analyze source code and configurations, then automate much of the refactoring and infrastructure provisioning. For developers and IT teams, this directly addresses a real-world headache: how to bring legacy code into modern cloud architectures without endless manual rewriting. If Transform lives up to its claims, it means fewer tedious codebase audits and hacky rewrites, and more confidence that critical old apps can gain cloud benefits (scalability, maintainability) with minimal disruption.&lt;/p&gt;

&lt;p&gt;With Transform, AWS is leveraging AI to lower the barrier for enterprises to fully embrace cloud. By easing cloud adoption for even the stodgiest legacy systems, AWS both expands its market and frees developers to focus on new features rather than porting old code. AI isn’t only about new products; it’s also being applied to optimize developer workflows and migrations in the cloud era.&lt;/p&gt;

&lt;h2&gt;
  
  
  The cloud landscape remains fiercely competitive
&lt;/h2&gt;

&lt;p&gt;Recent earnings showed Azure growing faster than AWS, thanks in part to its deep Microsoft 365 and OpenAI integrations. In response, AWS and others are racing to roll out more value-add services like the above, and reaching into new regions and partnerships. For developers, this competition means more choice of cutting-edge services (from advanced databases to integrated AI APIs) and often better pricing or free tiers. And as multi-cloud strategies gain ground, tools that ensure interoperability (Kubernetes, Terraform, cross-cloud AI toolkits) continue to be vital in 2025.&lt;/p&gt;




&lt;h1&gt;
  
  
  Quantum: Bridging Theory and Implementation
&lt;/h1&gt;

&lt;h2&gt;
  
  
  IonQ marries quantum computing with generative AI
&lt;/h2&gt;

&lt;p&gt;Quantum computing took a concrete step toward useful application in AI. &lt;a href="https://ionq.com/news/ionq-to-join-2025-q2b-tokyo-quantum-technologies-conference#:~:text=Arakawa%20and%20Yamada%E2%80%99s%20session%20will,of%20cases%20versus%20classical%20models" rel="noopener noreferrer"&gt;IonQ announced&lt;/a&gt; on May 14 that its researchers have demonstrated hybrid quantum-classical techniques to improve generative AI models. In a presentation at the Q2B Tokyo conference, IonQ detailed how adding a quantum component can boost the accuracy of machine learning tasks: for example, using a quantum algorithm to fine-tune large language models (LLMs) improved their performance, and a quantum-assisted method for image generation achieved higher quality outputs in up to 70% of cases versus purely classical methods. This is a big deal. It’s experimental &lt;em&gt;evidence&lt;/em&gt; that quantum computers today can enhance AI tasks (like better training on limited or “rare” data) and not just in theory.&lt;/p&gt;

&lt;p&gt;For ML engineers, it hints at a future where quantum resources might integrate into AI pipelines for certain optimizations, perhaps via cloud APIs. IonQ’s work bridges two red-hot domains (quantum and generative AI) and shows a path toward “quantum-enhanced AI.” While still early-stage, it’s exciting that quantum hardware (IonQ’s trapped-ion systems, in this case) is tackling problems relevant to industry AI, not just toy problems. It suggests that in coming years, forward-looking developers might tap quantum cloud services to boost AI model training or combinatorial optimization — heralding a new era of hybrid computing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Rigetti enables a new quantum cloud service in South Korea
&lt;/h2&gt;

&lt;p&gt;Quantum computing is also expanding geographically and commercially. California-based &lt;a href="https://biz.chosun.com/en/en-science/2025/05/15/ZLUKML6ZJNBS7AVWCXMSP6W76I/?outputType=amp#:~:text=Quantum%20computing%20specialist%20Norma%20announced,to%20provide%20a%20cloud%20service" rel="noopener noreferrer"&gt;Rigetti Computing has partnered with South Korean firm Norma&lt;/a&gt; to launch an 84-qubit quantum cloud service in Korea, as unveiled at the Q2B Tokyo conference on May 15.&lt;/p&gt;

&lt;p&gt;Norma, a quantum tech startup, signed an MOU with Rigetti to integrate Rigetti’s latest 84-qubit superconducting processor (nicknamed Ankaa-3) into Norma’s cloud platform (“Q Platform”) and offer access to Korean industry and researchers. This marks South Korea’s first major domestically hosted quantum computing service, and highlights a few trends. It’s quantum going global: until now, users in Asia have accessed qubits primarily via U.S. or European cloud regions, but infrastructure is localizing. It also underscores cross-border collaboration in quantum tech: U.S. hardware paired with local talent and distribution.&lt;/p&gt;

&lt;p&gt;Having a quantum datacenter on Korean soil may reduce latency and regulatory barriers, encouraging more experimentation by universities and enterprises there. More broadly, Rigetti’s 84-qubit system is one of the larger gate-based quantum processors publicly available, and the company touts improved stability (99%+ two-qubit gate fidelities) since its last generation. While 84 qubits still aren’t enough to tackle previously impossible problems, having them accessible via the cloud allows for developing quantum algorithms, testing small-scale quantum AI hybrids, and training new quantum developers. The launch in Korea is a reminder that quantum computing is steadily moving from isolated labs into cloud platforms worldwide, much like AI did, meaning curious developers anywhere can begin to tinker on real quantum hardware.&lt;/p&gt;
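&lt;p&gt;What does that tinkering look like in practice? It usually starts with a Bell pair: apply a Hadamard gate and then a CNOT, and sample the two-qubit register. The pure-Python sketch below simulates the resulting outcome statistics (a hypothetical stand-in for submitting the same two-gate circuit through a cloud SDK): measurements return only the perfectly correlated strings “00” and “11”, the signature of entanglement.&lt;/p&gt;

```python
import math
import random

def bell_counts(shots, rng):
    """Sample measurement outcomes of the two-gate 'hello world' circuit:
    H on qubit 0, then CNOT(0, 1). The resulting Bell state assigns
    equal amplitude 1/sqrt(2) to "00" and "11" and zero amplitude to
    "01" and "10", so only the correlated strings are ever observed."""
    p00 = (1 / math.sqrt(2)) ** 2               # Born-rule probability of "00"
    counts = {"00": 0, "11": 0}
    for _ in range(shots):
        outcome = "00" if p00 > rng.random() else "11"
        counts[outcome] += 1
    return counts

print(bell_counts(1000, random.Random(1)))      # roughly even split, no "01" or "10"
```

&lt;p&gt;On real hardware the same experiment also reveals the device’s noise: a small fraction of “01” and “10” counts appears, and watching that fraction shrink generation over generation is one concrete way developers track progress like Rigetti’s reported fidelity gains.&lt;/p&gt;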

&lt;h2&gt;
  
  
  New investments bet on bringing quantum to the masses
&lt;/h2&gt;

&lt;p&gt;It’s not just research and regional clouds; real money is flowing into making quantum computing practical. On May 14, Quantinuum (a leading quantum hardware/software company) announced a joint venture with Al Rabban Capital to &lt;a href="https://www.quantinuum.com/press-releases/quantinuum-and-microsoft-announce-new-era-in-quantum-computing-with-breakthrough-demonstration-of-reliable-qubits#:~:text=Doha%2C%20Qatar%2C%20May%2014,quantum%20computing%20adoption%20in%20Qatar" rel="noopener noreferrer"&gt;accelerate quantum adoption in Qatar and the Middle East&lt;/a&gt;. The Qatari-funded venture will promote training, R&amp;amp;D, and infrastructure deployment for quantum tech in the region. This follows other significant quantum investments and signals a global vote of confidence — from government-backed funds in Asia and Europe to IBM’s recently announced $150 billion investment (over 5 years) toward U.S.-based quantum and AI development (announced at its May 8 Think conference).&lt;/p&gt;

&lt;p&gt;These moves mean the ecosystem around quantum is maturing quickly. Expect to see more university programs, local incubators, and even cloud credits or sandboxes for startups to experiment with quantum algorithms. As capital pours in, we’re also seeing consolidation and partnerships (like Quantinuum working with Microsoft’s Azure Quantum, or quantum startups collaborating with cloud providers) aimed at tackling quantum’s biggest hurdles (error correction, scaling up qubits) faster. The key takeaway is that quantum computing is no longer science fiction or confined to academia. With strong backing, it’s being pulled toward practical use.&lt;/p&gt;

&lt;p&gt;In the near future, developers might leverage quantum APIs just as they do AI APIs, perhaps initially for niche tasks like ultra-secure encryption (quantum key distribution networks are rolling out commercially) or specialized optimizations. The continued influx of funding and strategic alliances in May 2025 shows a collective push to make quantum computing useful sooner, translating cutting-edge theory into tools that tech professionals can actually use.&lt;/p&gt;




&lt;p&gt;From AI models coding with ever-greater skill, to cloud alliances reshaping how and where we deploy applications, to quantum computers inching into real workloads, the first half of May 2025 did not disappoint tech watchers. Crucially, each announcement carries immediate relevance for developers and the industry. Open-source and proprietary AI models are evolving rapidly, meaning engineers must stay agile in adopting new APIs and capabilities (or risk falling behind competitors who do). Cloud platforms are racing to deliver the best AI and multi-cloud experiences, providing developers with an expanding toolkit but also creating a need to design systems that efficiently exploit these advanced services. Quantum computing’s steady progression from theory to practical implementation suggests forward-thinking teams should monitor quantum APIs and learning resources closely, as early advantages could emerge in specific domains (especially in security and optimization tasks). The common thread is an acceleration in innovation: tech that once seemed futuristic is becoming available, often through the cloud, to everyday developers. This democratization, whether through Slack’s AI app integrations or cloud-based quantum processors, empowers individuals and smaller companies to build things previously only possible at tech giants or research labs.&lt;/p&gt;

&lt;p&gt;It’s an exciting, if challenging, time. Staying ahead of the curve now means not only following these headline announcements, but understanding their strategic context. Major industry players are heavily investing to push technological boundaries, creating new frontiers for everyone else. For the broader tech industry, these developments promise more intelligent software, more efficient computing infrastructure, and a continued blurring of lines between once-distinct fields (AI+cloud, AI+quantum, cloud+edge, etc.). In short, the past two weeks’ news underscores that the future is arriving faster than ever — and it’s a future where developers who leverage these breakthroughs will shape what’s next.&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;DJ Leamen is a Machine Learning and Generative AI Developer and Computer Science student with an interest in emerging technology and ethical development.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://djleamen.substack.com/subscribe" rel="noopener noreferrer"&gt;Stay up to date on all the latest tech news for free!&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>cloud</category>
      <category>quantum</category>
      <category>news</category>
    </item>
    <item>
      <title>Weekly Insights: AI, Cloud, and Quantum Advances (Apr 27 - May 3, 2025)</title>
      <dc:creator>DJ Leamen</dc:creator>
      <pubDate>Mon, 05 May 2025 20:49:00 +0000</pubDate>
      <link>https://forem.com/djleamen/weekly-insights-ai-cloud-and-quantum-advances-apr-27-may-3-2025-4ppf</link>
      <guid>https://forem.com/djleamen/weekly-insights-ai-cloud-and-quantum-advances-apr-27-may-3-2025-4ppf</guid>
      <description>&lt;p&gt;Last week, breakthroughs surged across artificial intelligence, cloud computing, and quantum technologies, reshaping how developers build, deploy, and envision the future of computing. From Meta’s ambitious bid to democratize powerful AI tools to Google’s massive bet on multi-cloud cybersecurity and IonQ’s groundbreaking quantum network, we’ve seen an electrifying shift: technologies once thought futuristic are rapidly becoming today’s developer toolkits. Here’s why these advancements matter and why tech professionals should pay close attention.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;AI: Advancements in Models, Tools, and Funding&lt;/strong&gt;
&lt;/h2&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Meta’s LlamaCon empowers open-source AI&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Meta hosted its first-ever &lt;em&gt;LlamaCon&lt;/em&gt; developer conference (April 29th) dedicated to its Llama AI models. &lt;a href="https://www.ibm.com/think/news/meta-llamacon-2025#:~:text=On%20Tuesday%2C%20Meta%20hosted%20LlamaCon%2C,currently%20only%20available%20in%20preview" rel="noopener noreferrer"&gt;Meta announced&lt;/a&gt; an upcoming Llama API: a customizable, “no-lock-in” API (now in preview) that lets developers integrate Llama models with just one line of code.&lt;/p&gt;

&lt;p&gt;The API features one-click API key creation, an interactive playground for trying out models, and compatibility with OpenAI’s SDK, making it easy to plug Llama into existing apps. This emphasis on developer experience (speedy integration and full model flexibility) signals Meta’s commitment to an open AI ecosystem. Meta also revealed that Llama models have been downloaded over &lt;em&gt;1.2 billion times&lt;/em&gt; by the community and even launched a standalone “Meta AI” assistant app for end-users.&lt;/p&gt;
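Meta says the Llama API is compatible with OpenAI's SDK, so switching an existing app over should amount to swapping the client configuration. Here is a minimal sketch of that idea; the endpoint URL and model identifier are illustrative assumptions, not values confirmed by the announcement, and the actual network call is left commented out:

```python
# Hypothetical sketch of the "one line of code" integration Meta describes,
# using the OpenAI-style chat message format the Llama API is said to accept.
import json

def build_chat_request(prompt, model="Llama-4-Maverick"):
    # Assemble a chat-completion payload in the OpenAI message format.
    # The default model name here is an assumption for illustration only.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# With the OpenAI Python SDK, the swap is essentially just the client
# construction (the base_url below is an assumed placeholder):
#   client = OpenAI(base_url="https://api.llama.com/v1/", api_key=key)
#   reply = client.chat.completions.create(**build_chat_request("Hi"))

payload = build_chat_request("Summarize this week in AI news.")
print(json.dumps(payload, indent=2))
```

The point of the sketch is that "no lock-in" here means wire-format compatibility: only the client construction changes, while the request payload stays in the format existing OpenAI-based code already produces.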

&lt;p&gt;These moves lower barriers to experimenting with state-of-the-art models (avoiding infrastructure headaches and vendor lock-in) and foster a vibrant open-source AI community. Essentially, Meta is trying to build a “real developer ecosystem” around Llama (complete with platforms, tools, and community support) rather than just dropping model weights on the internet. This could accelerate innovation by enabling developers to customize and deploy advanced AI models in their own products with greater ease and without dependency on a single cloud provider.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;OpenAI adds shopping savvy to ChatGPT&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;OpenAI rolled out a significant update that enhances ChatGPT’s web browsing capabilities for online &lt;a href="https://www.reuters.com/business/media-telecom/openai-rolls-out-new-shopping-features-with-chatgpt-search-update-2025-04-28/#:~:text=April%2028%20%28Reuters%29%20,reviews%2C%20and%20direct%20purchase%20links" rel="noopener noreferrer"&gt;shopping assistance&lt;/a&gt;. Announced April 28, the update enables ChatGPT to provide personalized product recommendations complete with images, reviews, and direct purchase links when users ask for shopping advice.&lt;/p&gt;

&lt;p&gt;OpenAI is not serving ads or affiliate links in these results; the recommendations are purely based on user queries and third-party product metadata (price, descriptions, reviews) rather than paid placement. This feature positions ChatGPT as a &lt;em&gt;user-centric&lt;/em&gt; alternative to traditional search engines, which often prioritize advertising.&lt;/p&gt;

&lt;p&gt;This development showcases an AI system seamlessly integrating with live web data and structured results for a practical use case (personal shopping assistant). It also hints at new opportunities for AI-driven features that combine natural language understanding with domain-specific data. The fact that ChatGPT handled over 1 billion web searches in the past week underscores the demand for such AI-enhanced search experiences. In practical terms, this update could inspire developers to incorporate similar AI-driven recommendation or search features in their own applications, improving user experience by delivering context-aware results without invasive ads. It also pushes the envelope in the AI-as-search trend, directly challenging the status quo of web search.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Microsoft to host Musk’s Grok model on Azure&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;In a notable partnership, &lt;a href="https://www.crescendo.ai/news/latest-ai-news-and-updates#:~:text=Date%3A%20May%201%2C%202025%20Summary%3A,industry%2C%20Microsoft%20has%20announced%20plans" rel="noopener noreferrer"&gt;Microsoft announced&lt;/a&gt; plans to host Elon Musk’s upcoming “Grok” AI model on Azure. (Musk’s new AI startup, xAI, has been developing Grok as a competitor in the large language model space.) This Azure-xAI collaboration is striking because Microsoft is already deeply invested in OpenAI, yet it’s extending support to another AI venture. The deal suggests that Azure aims to be a neutral, go-to cloud platform for &lt;em&gt;any&lt;/em&gt; cutting-edge AI models, not just OpenAI’s: a sign of how cloud providers are competing to attract top AI workloads.&lt;/p&gt;

&lt;p&gt;For developers and AI startups, Microsoft’s cloud infrastructure backing means greater access to scalable compute power for training and deploying advanced models. It also implies more choice of AI models available as hosted services on Azure. In the bigger picture, this move underscores the convergence of AI and cloud. Cloud platforms are racing to offer the best hardware, GPUs, and tools to attract leading AI projects, which in turn drive developer adoption. Consequently, developers can expect improved performance, easier AI model deployment, and potentially more interoperable frameworks. In short, Azure securing Grok’s deployment is exciting because it fosters a more open AI marketplace where developers might soon access multiple high-end AI models (OpenAI, Meta, xAI, etc.) under one roof, picking the best tool for their use case.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;AI meets enterprise automation via UiPath&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;The week also saw AI making inroads into enterprise software via &lt;a href="https://www.crescendo.ai/news/latest-ai-news-and-updates#:~:text=Date%3A%20April%2030%2C%202025%20Summary%3A,Source%3A%20TechTarget" rel="noopener noreferrer"&gt;UiPath’s new AI-powered automation platform&lt;/a&gt;. UiPath, a leader in robotic process automation (RPA), launched “UiPath Maestro” with built-in AI agents, aiming to blend advanced AI into RPA workflows. This means the bots that automate repetitive tasks can now incorporate AI-driven decision-making, computer vision, and natural language understanding to handle more complex and unstructured tasks.&lt;/p&gt;

&lt;p&gt;For example, an AI agent might observe how human employees handle exceptions or interpret documents, and then replicate those actions autonomously. This promises to supercharge developer productivity in enterprise settings: routine processes (from invoice processing to customer support workflows) could become &lt;em&gt;self-driving&lt;/em&gt; to a greater extent, freeing developers and IT teams from writing brittle rules for every edge case.&lt;/p&gt;

&lt;p&gt;UiPath’s integration points to improved performance and scalability of automation. It brings advanced AI capabilities (like language understanding) directly into the tools that many businesses use daily, potentially enabling new use cases (e.g. automated customer email triage or intelligent document processing) with minimal additional development effort. The result is bots that can adapt to new scenarios using AI rather than breaking when something unexpected happens. However, industry experts have urged caution, noting skepticism about the ROI and scalability of such AI-RPA fusions. It’s a reminder that while the technology is exciting, enterprises will need best practices to ensure these AI-enhanced workflows are reliable and cost-effective at scale.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Thinking Machines’ record AI funding&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;The AI boom continued to attract enormous investment, signalling optimism for new AI tools and platforms. Notably, Mira Murati, the former CTO of OpenAI, made headlines as her new startup Thinking Machines Lab is speculated to be &lt;a href="https://siliconangle.com/2025/04/10/mira-muratis-thinking-machines-reportedly-raising-2b-funding/#:~:text=Thinking%20Machines%20Lab%20Inc,a%20%242%20billion%20seed%20round" rel="noopener noreferrer"&gt;seeking a $2 billion seed funding round&lt;/a&gt;. (For perspective, that scale is usually seen in late-stage investments, not seed rounds.) If secured, this would value the fledgling company at over $10 billion.&lt;/p&gt;

&lt;p&gt;Thinking Machines, founded by a cadre of OpenAI veterans (including researchers who helped build ChatGPT and DALL-E), aims to develop next-generation multimodal AI models with advanced reasoning capabilities. Such a massive war chest for a startup underscores the huge expectations around AI, with investors betting that delivering more powerful and general AI systems will unlock transformative new applications (and markets).&lt;/p&gt;

&lt;p&gt;This trend means a proliferation of AI platforms and services in the near future, beyond the familiar big-tech offerings. A well-funded startup like Murati’s could, for example, produce a new model or developer framework that becomes part of a programmer’s toolkit. In addition, competition from richly funded newcomers pushes incumbents (Google, OpenAI, Meta, etc.) to accelerate their own innovations, which we’re already seeing in the rapid rollout of model improvements and APIs. In short, the flood of capital into AI (global venture funding for AI startups exceeded $73 billion in the first months of 2025) will likely yield more choices and better tools for developers, from specialized models tuned to industries, to improved AI reasoning and reliability that make building on AI easier and more robust.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Evolving Cloud Landscape&lt;/strong&gt;
&lt;/h2&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Google’s $32B Wiz acquisition&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Google Cloud signed a definitive agreement to &lt;a href="https://blog.google/inside-google/company-announcements/google-agreement-acquire-wiz/#:~:text=announced%20it%20has%20signed%20a,Wiz%20will%20join%20Google%20Cloud" rel="noopener noreferrer"&gt;acquire Wiz, Inc. for $32 billion&lt;/a&gt;, one of the largest cybersecurity acquisitions in history. Wiz is a fast-growing cloud security startup known for its platform that secures workloads across AWS, Azure, and Google Cloud environments. By connecting to all major clouds and even on-prem code, Wiz provides a unified view of security risks in cloud applications. Google’s investment here is a strategic push to “accelerate improved cloud security and multicloud capabilities in the AI era.” In other words, Google Cloud is doubling down on tools that let customers run applications across multiple clouds with strong, automated security, something increasingly important as companies avoid single-vendor lock-in.&lt;/p&gt;

&lt;p&gt;For developers and DevOps teams, this promises easier management of security across hybrid and multi-cloud deployments. Google + Wiz will aim to “vastly improve how security is designed, operated and automated” for cloud apps, including lowering the cost and complexity of implementing defences.&lt;/p&gt;

&lt;p&gt;The deal also highlights how AI and cloud are intersecting: Wiz’s tools will benefit from Google’s AI expertise to detect threats and misconfigurations, and one stated goal is to protect against new threats emerging due to AI. In practical terms, developers might soon see more built-in security scanning and automated remediation in Google Cloud (and possibly as standalone tools) that work across their entire stack. The acquisition also underscores a trend: as cloud providers compete, they’re expanding beyond raw infrastructure into value-add services (like security, data analytics, AI APIs), effectively giving developers more integrated capabilities out-of-the-box. Bottom line: expect more secure-by-design cloud platforms, where much of the heavy security lifting is handled by the platform, enabling devs to focus on building features rather than fighting exploits.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;IBM &amp;amp; BNP Paribas forge a 10-year cloud+AI partnership&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;In the enterprise cloud arena, &lt;a href="https://group.bnpparibas/en/press-release/bnp-paribas-signs-a-new-multi-year-partnership-agreement-with-ibm-cloud#:~:text=Published%20On%2029" rel="noopener noreferrer"&gt;IBM renewed and expanded its cloud partnership with European banking giant BNP Paribas&lt;/a&gt; for another 10 years. Announced April 29, this partnership will see BNP Paribas dedicate a new area of its own data centers to IBM Cloud, extending the private/dedicated IBM Cloud infrastructure it’s hosted since 2019. The aim is to bolster the bank’s operational resilience and compliance: the setup is designed for uninterrupted continuity of critical banking services (like payments), with full control over data security to meet strict European regulations (notably the EU’s Digital Operational Resilience Act).&lt;/p&gt;

&lt;p&gt;What’s exciting is that beyond just traditional hosting, generative AI is a key focus of this collaboration: the bank will leverage GPUs on IBM Cloud to develop AI models and applications securely within its regulated environment. Essentially, BNP Paribas wants to adopt AI (for things like risk analysis, customer service, fraud detection) but in a way that maintains compliance and trust. IBM’s solution gives them cloud scalability and advanced hardware on-premises, plus likely access to IBM’s AI software stack, all under the bank’s control.&lt;/p&gt;

&lt;p&gt;For developers in finance and other regulated industries, this is an encouraging sign: cloud providers are tailoring offerings to overcome compliance hurdles, meaning you can use modern cloud tools (containers, serverless, AI services) &lt;em&gt;and&lt;/em&gt; meet governance requirements. This deal exemplifies how multi-cloud strategies are being adopted by enterprises: BNP will use IBM Cloud alongside other providers, choosing the right tool for each workload. The implication for devs is more freedom to innovate (e.g. deploying a new microservice with AI capabilities) without being blocked by IT policy, because the cloud infrastructure itself is being engineered to satisfy those policies.&lt;/p&gt;

&lt;p&gt;Partnerships like this often lead to new solutions (e.g. finance-specific cloud services, better encryption or auditing features) that eventually become available to other customers. In short, IBM’s win here highlights the maturing of cloud for mission-critical, AI-driven workloads in banking, a trend likely to expand to healthcare, government, and other sensitive sectors.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Cloud giants’ growth face-off – Azure surges, AWS steady&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;The &lt;a href="https://www.crn.com/news/cloud/2025/microsoft-vs-aws-vs-google-cloud-earnings-q1-2025-face-off#:~:text=Microsoft%E2%80%99s%20Intelligent%20Cloud%20group%20generated,33%20percent%20year%20over%20year" rel="noopener noreferrer"&gt;latest earnings reports&lt;/a&gt; underscored an ongoing shift in the cloud provider landscape. Microsoft Azure posted impressive growth (approximately 33% year-over-year increase in revenue for its cloud services), significantly outpacing AWS’s 17% YoY growth in the same quarter. (Google Cloud also grew around 28% YoY, above the market average.)&lt;/p&gt;

&lt;p&gt;While AWS remains the largest cloud provider by market share, the faster growth of Azure (and to an extent, Google) indicates that many organizations are spreading out their cloud investments, often motivated by Azure’s strengths in enterprise integration and the infusion of AI services. Microsoft’s CEO Satya Nadella highlighted that they are “infusing AI across every layer” of the tech stack: Azure’s OpenAI Service and other AI offerings have been a big draw, potentially contributing to that surge in usage.&lt;/p&gt;

&lt;p&gt;For developers and CTOs, this intense competition is largely positive. First, it’s driving a rapid rollout of new features. We’re seeing cloud vendors constantly one-upping each other with new capabilities (from AI/ML toolkits to database improvements to developer experience enhancements), and the surge in demand for GPU instances and AI platforms is being met with expanded cloud offerings for AI training and inference. Second, competitive pressure helps with pricing and flexibility (e.g., discounts, free tiers, and favourable licensing to attract startups and projects). Notably, Amazon’s slight slowdown has prompted it to emphasize efficiency and price-performance (as evidenced by AWS’s cost-optimization tools and new Graviton3 instances), whereas Microsoft and Google are touting high-end services that deliver more value per dollar (like advanced analytics, AI APIs, etc.). Another implication of this multi-cloud momentum for developers is the rise of interoperability tools such as Kubernetes and Terraform, as companies demand the freedom to move workloads.&lt;/p&gt;

&lt;p&gt;The bottom line is that the cloud “big three” are in a heated race, and developers stand to benefit from better services: whether it’s more global regions to deploy in, improved performance (e.g. faster networks, specialized silicon), or scalability features that simplify building globally distributed applications. The numbers this quarter show that whoever delivers the best blend of performance and developer-friendly features (especially around emerging needs like AI) will win market share – so we can expect all providers to keep innovating aggressively.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Quantum: Bridging Theory and Implementation&lt;/strong&gt;
&lt;/h2&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;IonQ &amp;amp; EPB create a first-of-its-kind Quantum Hub in the U.S.&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;In a milestone for real-world quantum adoption, &lt;a href="https://ionq.com/news/ionq-announces-usd22m-deal-with-epb-establishing-chattanooga-tennessee-as#:~:text=IonQ%20Announces%20%2422M%20Deal%20with,S" rel="noopener noreferrer"&gt;IonQ announced&lt;/a&gt; a $22 million deal with EPB of Chattanooga (Tennessee) to establish America’s first quantum computing and networking hub. EPB is a municipal utility known for its citywide fibre-optic network, and now that fibre will underpin a quantum network built around a state-of-the-art IonQ &lt;em&gt;Forte Enterprise&lt;/em&gt; quantum computer at the new EPB Quantum Center. This means Chattanooga will host a dedicated quantum datacenter + quantum communication network available for commercial and research use. The hub will allow businesses, startups, and researchers to experiment with quantum computing on-site (rather than via the cloud only) and leverage a quantum network for ultra-secure communications (quantum key distribution) across the city.&lt;/p&gt;

&lt;p&gt;Importantly, IonQ and EPB are also investing in developing a quantum-ready workforce: IonQ will open an office there and provide training, while both partners will work with local institutions to educate developers in quantum programming. They’re even collaborating on quantum algorithms for optimizing the energy grid, aligning with EPB’s utility focus.&lt;/p&gt;

&lt;p&gt;This hub will bring quantum tech out of the lab and into a community, essentially creating a living testbed for quantum applications. Developers in the region (and via remote collaborations) will get hands-on experience with a 36-qubit trapped-ion quantum computer and a live quantum network, something previously only accessible in select research labs. The implications are significant: we’ll learn how quantum systems perform in a sustained, real-world environment, and uncover practical use cases (and challenges) of integrating quantum computing with classical infrastructure (for example, hooking the quantum computer into cloud workflows or using it for specific optimization tasks in tandem with EPB’s classical supercomputing resources).&lt;/p&gt;

&lt;p&gt;For the broader tech ecosystem, this initiative demonstrates a model that other cities or companies might follow, combining regional strengths (like Chattanooga’s fibre network) with quantum tech to create innovation hubs. It’s a step toward making quantum computing tangible for developers, who can start developing quantum algorithms for real problems (energy distribution, material science, finance, etc.) in a semi-production setting, gaining skills for the coming quantum era.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Cisco’s vision: networked quantum data centres for scalable QaaS&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;On the cutting-edge infrastructure front, &lt;a href="https://www.linkedin.com/posts/tegodata_cisco-sets-its-sights-on-quantum-data-centres-activity-7262494825726316544-5F3c#:~:text=Cisco%20is%20making%20strides%20in,Cisco%27s%20suggested" rel="noopener noreferrer"&gt;Cisco shared details&lt;/a&gt; of its ambitious plans to enable quantum computing as a cloud service by networking many quantum processors. At a recent Quantum Summit, Cisco’s head of quantum research, Dr. Reza Nejabati, argued that trying to build a single mega–quantum computer with millions of qubits is not practical in the near term; instead, he proposed an approach where smaller quantum computers are interconnected in a data centre via high-speed quantum networks. Cisco introduced &lt;em&gt;QFabric&lt;/em&gt;, a specialized quantum network using standard fibre-optic infrastructure to distribute entangled photons between quantum nodes. This would allow multiple quantum computers to function as one large virtual quantum machine, coordinating via entanglement and classical sync, much like distributed computing in classical cloud, but for qubits.&lt;/p&gt;

&lt;p&gt;They are also developing Quantum Orchestra software to orchestrate entanglement routing and resource allocation across this network. If realized, this approach could make scaling quantum power more modular: data centers could add quantum “tiles” and link them, rather than waiting for leaps in single-chip qubit counts.&lt;/p&gt;

&lt;p&gt;For developers, the prospect of cloud-based quantum computing that scales like cloud VMs is thrilling: it means down the road one could request quantum computing time with certain specs (say, a couple hundred qubits with a certain entanglement topology) much as we request GPU instances today.&lt;/p&gt;

&lt;p&gt;Cisco’s vision also emphasizes heterogeneity and flexibility: in their scenario, different types of quantum processors (superconducting, photonic, trapped-ion, etc.) could be connected as needed, and developers could tap into the strengths of each for different tasks. Moreover, security is a built-in benefit: such a network would inherently support quantum key distribution for ultra-secure communication between nodes and clients. While Cisco’s plans are forward-looking (much of this work is still in simulation and R&amp;amp;D), they underscore a key trend that has shown up several times throughout this article: massive convergence with the cloud. It hints that the future of cloud computing might include quantum resources as first-class citizens, orchestrated alongside classical compute.&lt;/p&gt;

&lt;p&gt;Keeping an eye on efforts like QFabric is worthwhile, as they will shape the APIs and frameworks with which developers might interact with quantum computers at scale (perhaps via extensions to cloud SDKs or new quantum network programming models). In short, Cisco is preparing the groundwork so that once quantum hardware is more mature, it can be deployed akin to cloud infrastructure (scalable, shared, and accessible) bringing quantum capabilities to developers worldwide… without each needing a PhD in quantum physics.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Proof of new quantum advantage – a breakthrough in research&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;A notable scientific development last week provided a fresh example of a provable quantum speed-up, offering guidance on useful problems quantum computers can tackle. A Caltech-led research team (with collaborators from the AWS Quantum Computing Center) reported in &lt;em&gt;Nature Physics&lt;/em&gt; a quantum algorithm that can efficiently find the lowest-energy states of certain materials, a common physics and chemistry problem, whereas all known classical algorithms would take exponentially longer for the same task. In simpler terms, they identified a problem in simulating how a material cools (finding its “local minima” energy configurations) where &lt;a href="https://phys.org/news/2025-04-quantum-edge.html#:~:text=depicts%20a%20quantum%20computer%20,Yun%20%28Claudia%29%20Cheng" rel="noopener noreferrer"&gt;a quantum computer can outperform classical computers&lt;/a&gt; in principle.&lt;/p&gt;

&lt;p&gt;This result is significant because most prior proofs of “quantum advantage” were either contrived mathematical scenarios or the famous case of factoring large numbers (Shor’s algorithm) which threatens cryptography but isn’t a routine industrial problem. Here, the problem (stabilizing states of a physical system) is directly relevant to materials science, chemistry, and even optimization. The researchers developed a novel quantum algorithm and showed theoretically that as the problem size grows, a quantum machine would handle it efficiently while a classical one would bog down infeasibly.&lt;/p&gt;

&lt;p&gt;This kind of result illuminates which kinds of real-world problems could see breakthroughs from quantum computing. It encourages software developers in those domains to start formulating their challenges in ways that are compatible with quantum algorithms. For example, finding ground states is analogous to optimization tasks (like optimizing a supply chain or machine learning model), so quantum approaches could perhaps eventually contribute there as well. This research also underscores the importance of quantum algorithm work happening ahead of hardware availability: by the time quantum computers with enough qubits and low error rates exist to run such algorithms, we could already have a library of quantum routines ready to solve meaningful problems.&lt;/p&gt;

&lt;p&gt;In summary, the Caltech result is a reminder that quantum isn’t just about hardware; it’s also about clever algorithms. Each new proven advantage builds confidence that quantum computers will eventually deliver unique value, guiding developers on what applications to prepare for (in this case, complex simulations and optimizations that classical computers struggle with).&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Surge in quantum funding signals confidence&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;The broader quantum industry is enjoying a wave of investment momentum. &lt;a href="https://thequantuminsider.com/2025/04/25/quantum-startups-secure-1-billion-in-q1-as-commercial-race-accelerates/#:~:text=,year%2C%20signaling%20confidence" rel="noopener noreferrer"&gt;Over $1.2 billion&lt;/a&gt; in private funding poured into quantum computing startups in the first quarter of 2025 alone (a 125% increase year-over-year). Last week, for instance, saw news of large funding rounds being secured: Quantum Machines, which makes control systems for quantum hardware, raised &lt;a href="https://fortune.com/2025/02/25/quantum-machines-computing-startup-170-million-series-c-funding/#:~:text=Quantum%20computing%20startup%20secures%20%24170M,rounds%20in%20the%20quantum%20industry" rel="noopener noreferrer"&gt;$170 million in fresh capital&lt;/a&gt;, and Alphabet’s spin-off SandboxAQ (focused on post-quantum cryptography and AI) added $150 million from investors including Google and NVIDIA. The fact that investors, including tech giants, are willing to put such sizeable bets on quantum tech indicates a strong belief that commercial payoffs are on the horizon.&lt;/p&gt;

&lt;p&gt;This influx of funding is promising—it means rapid progress on the supporting technology that makes quantum computing usable: better software development kits, more stable hardware, cloud access to quantum processors, and robust error-correction techniques. We’re likely to see startups accelerating the availability of developer-friendly quantum tools, such as higher-level quantum programming languages, libraries for specific domains (quantum machine learning, quantum chemistry), and cloud platforms that let you integrate quantum workflows into classical applications. Increased funding also fosters a healthy competitive environment where multiple approaches (superconducting qubits, trapped ions, photonic qubits, etc.) are explored in parallel, raising the odds of breakthroughs that improve performance (more qubits, less error) and scalability of quantum systems.&lt;/p&gt;

&lt;p&gt;In short, the rising tide of investment suggests that the industry expects quantum computing to transition from experimental to practical in the coming years. For tech professionals, now is a great time to start familiarizing oneself with quantum development, as the tools will rapidly evolve and opportunities to innovate (or even join well-funded ventures) are growing. The confidence shown by investors adds pressure and incentive for quantum tech companies to deliver on real use cases sooner than later, meaning the long-term promises of quantum may start materializing in tangible ways that developers can leverage, perhaps faster than many anticipated.&lt;/p&gt;




&lt;p&gt;The week’s developments across AI, Cloud, and Quantum demonstrate an increasingly intertwined tech landscape: AI advancements are driving cloud adoption and innovation, cloud platforms are essential for training and deploying AI (and one day quantum) at scale, and quantum computing is emerging as the next frontier that both AI and cloud fields are preparing to integrate. For developers, keeping abreast of these trends is crucial. The exciting product launches (like Meta’s Llama API or OpenAI’s new ChatGPT capabilities) offer new tools to build smarter applications right now. The strategic partnerships and cloud evolutions (multi-cloud security, industry-specific clouds) mean more reliable and scalable infrastructure to deploy those applications globally. And the breakthroughs in research and the surge in funding hint at the technologies on the horizon (from quantum algorithms to new AI startups) that could become part of our everyday development toolkit in the near future.&lt;/p&gt;

&lt;p&gt;It’s an exhilarating time to be a tech professional, as each week brings innovations that can fundamentally enhance how we design, build, and secure systems for the world.&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;DJ Leamen is a Machine Learning and Generative AI Developer and Computer Science student with an interest in emerging technology and ethical development.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Stay up to date on all the latest tech news for free!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://djleamen.substack.com/subscribe" rel="noopener noreferrer"&gt;Subscribe to my newsletter!&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>cloud</category>
      <category>quantum</category>
      <category>technology</category>
    </item>
    <item>
      <title>Deep Research for Everyone: OpenAI’s Lightweight Tool Democratizes Knowledge</title>
      <dc:creator>DJ Leamen</dc:creator>
      <pubDate>Sun, 27 Apr 2025 20:49:31 +0000</pubDate>
      <link>https://forem.com/djleamen/deep-research-for-everyone-openais-lightweight-tool-democratizes-knowledge-3907</link>
      <guid>https://forem.com/djleamen/deep-research-for-everyone-openais-lightweight-tool-democratizes-knowledge-3907</guid>
      <description>&lt;h2&gt;
  
  
  OpenAI’s new lightweight deep research mode brings advanced AI capabilities to a much wider audience, including non-paying users.
&lt;/h2&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuit448o1y361yb40ks9l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuit448o1y361yb40ks9l.png" alt="Image description" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;OpenAI has taken a bold step to make advanced AI research capabilities more accessible than ever.&lt;/p&gt;

&lt;p&gt;This week, &lt;a href="https://x.com/OpenAI/status/1915505964142514321" rel="noopener noreferrer"&gt;the company announced&lt;/a&gt; a new “lightweight” version of its &lt;em&gt;deep research&lt;/em&gt; tool — an AI capability that scours the web to synthesize detailed reports with citations. Previously, this in-depth research mode was limited to paid subscribers, but the lightweight model is now being rolled out to ChatGPT Plus, Team, and Pro users, and even to those on the free tier. It’s the first time that free users have access to any form of ChatGPT’s deep research feature, marking a significant milestone in AI accessibility and opening the door for many more people to benefit from AI-driven deep research.&lt;/p&gt;

&lt;p&gt;In other words, what was once a premium feature reserved for paid plans is now reaching everyday users as well. This democratization of access means more people than ever can leverage AI to dig up information and insights, without the former barriers of cost or limited availability.&lt;/p&gt;

&lt;p&gt;At the heart of this expanded access is OpenAI’s latest model, o4-mini, which powers the lightweight deep research mode. Impressively, &lt;a href="https://www.theverge.com/news/656142/chatgpt-lightweight-deep-research-free-plus-team-pro" rel="noopener noreferrer"&gt;OpenAI says&lt;/a&gt; o4-mini is “nearly as intelligent” as the original full-scale deep research model. The key difference is that it’s much more efficient (&lt;em&gt;significantly cheaper to serve&lt;/em&gt;), enabling OpenAI to offer deep research capabilities to more users without breaking the bank. In fact, the company introduced o4-mini specifically to raise deep research usage limits by lowering costs.&lt;/p&gt;

&lt;p&gt;Note that there is a small trade-off: answers from the lightweight model tend to be a bit &lt;a href="https://techcrunch.com/2025/04/24/openai-rolls-out-a-lightweight-version-of-its-chatgpt-deep-research-tool/#:~:text=%E2%80%9CResponses%20will%20typically%20be%20shorter,%E2%80%9D" rel="noopener noreferrer"&gt;shorter&lt;/a&gt;. Regardless, they still maintain the rigorous depth and quality that users have come to expect from deep research. In short, the lightweight tool punches above its weight, delivering high-quality insights with just a slightly more concise output.&lt;/p&gt;

&lt;h1&gt;
  
  
  Expanding Usage Limits
&lt;/h1&gt;

&lt;p&gt;Beyond reaching more users, the lightweight mode also means everyone can dive deeper, more often. OpenAI has &lt;a href="https://www.tomsguide.com/ai/openai-is-rolling-out-a-new-version-of-chatgpt-deep-research#:~:text=In%20a%20post%20on%20X,to%20increase%20current%20rate%20limits%E2%80%9D" rel="noopener noreferrer"&gt;expanded the usage limits&lt;/a&gt; for deep research tasks on paid plans, thanks to the efficiency of o4-mini. In essence, all Plus, Team, and Pro subscribers now have higher monthly quotas for deep research queries. If a user hits the limit of the original (full-power) deep research mode on their plan, ChatGPT will seamlessly switch over to the lightweight version, allowing them to continue researching without interruption. This automatic fallback ensures that hitting a usage cap no longer halts your flow; the system simply uses the lighter model to keep delivering.&lt;/p&gt;

&lt;p&gt;In practical terms, a Plus or Team member can now run up to 25 detailed deep research queries each month, and Pro users up to 250. Even &lt;em&gt;free&lt;/em&gt; users get to perform five in-depth research queries per month using the new lightweight tool — a drastic change from having no access to deep research before. &lt;em&gt;(All limits reset monthly.)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Enterprise and educational accounts are slated to receive the lightweight mode next week, with usage limits matching those of the Team tier. In effect, whether you’re a paying subscriber or a free user, you can now explore complex topics with ChatGPT far more extensively than was possible just a short time ago.&lt;/p&gt;

&lt;h1&gt;
  
  
  Democratizing Knowledge, Empowering Innovation
&lt;/h1&gt;

&lt;p&gt;Advanced AI research tools, like ChatGPT’s deep research mode, can now find their way into classrooms, startups, newsrooms, and beyond. By lowering the barriers to advanced research assistance, OpenAI is effectively &lt;em&gt;democratizing knowledge&lt;/em&gt;: putting a powerful investigative tool into the hands of many more people. The ripple effects of this shift can be felt across countless domains. When AI-driven deep research is open to all, information and expertise are no longer confined to those with special access or big budgets. This broader accessibility promises to foster innovation and empower more people to tackle complex problems, no matter their field or background.&lt;/p&gt;

&lt;p&gt;Consider the opportunities now emerging across sectors and communities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Education:&lt;/strong&gt; Students and teachers can leverage deep research to gather credible sources and insights for projects, essays, and lesson plans. With AI compiling information and references, learning and teaching can become more enriched and efficient than ever.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Startups &amp;amp; Small Businesses:&lt;/strong&gt; Entrepreneurs and small teams can conduct market research and explore technical questions without needing a dedicated research staff. This levelled playing field helps innovative ideas flourish in smaller enterprises, fostering entrepreneurship.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Journalism:&lt;/strong&gt; Reporters and fact-checkers can use the tool to quickly compile background research and verify facts on complex topics. This means faster, more in-depth investigative journalism, with AI helping comb through data and sources in a fraction of the time.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Non-Profits &amp;amp; NGOs:&lt;/strong&gt; Organizations with limited resources can inform their strategies and grant proposals with data-driven research that used to be out of reach. From analyzing social issues to evaluating policy impacts, they can now base decisions on deeper evidence.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Independent Researchers:&lt;/strong&gt; Inventors, hobbyists, or citizen scientists working solo can now harness a research assistant that was once available only to large institutions. This empowers individual creators and thinkers to explore big ideas and solve problems independently, spurring innovation from the ground up.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In all these areas, making deep research widely accessible isn’t just a convenience; it’s a catalyst. Knowledge shared is knowledge multiplied, and with more minds able to probe the depths of the web for answers, we can expect a surge of creativity and problem-solving from unexpected places. Complex challenges in science, business, and society can be tackled by a more diverse crowd of problem-solvers armed with thorough information.&lt;/p&gt;

&lt;h1&gt;
  
  
  Knowledge without Boundaries
&lt;/h1&gt;

&lt;p&gt;OpenAI’s lightweight deep research release heralds a new era of inclusivity in AI-powered research. By handing advanced research capabilities to the masses, the company is turning what used to be a scarce resource into a common utility. When more people have access to cutting-edge tools, more great ideas can take flight. A student in a small town, a founder of a garage startup, or an activist at a non-profit can now all tap into the same depth of knowledge that was once reserved for well-funded experts. This move is about more than just a new ChatGPT feature; it’s about empowering people from all walks of life to ask bigger questions and find meaningful answers.&lt;/p&gt;

&lt;p&gt;The expansion of deep research to a wider audience is a powerful reminder that democratizing technology can spark innovation in places we least expect. As OpenAI’s lightweight tool finds its way into more hands, we may soon see breakthroughs and solutions born from this newly unlocked potential. In the end, making deep research accessible to everyone helps ensure that the pursuit of knowledge truly knows no boundaries, and that the next big discovery could come from anyone, anywhere.&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;DJ Leamen is a Machine Learning and Generative AI Developer and Computer Science student with an interest in emerging technology and ethical development.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://djleamen.substack.com/subscribe" rel="noopener noreferrer"&gt;Subscribe to my newsletter!&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>chatgpt</category>
      <category>openai</category>
    </item>
    <item>
      <title>Safety and Transparency in Youth-Oriented AI Chatbot Apps</title>
      <dc:creator>DJ Leamen</dc:creator>
      <pubDate>Fri, 18 Apr 2025 23:12:00 +0000</pubDate>
      <link>https://forem.com/djleamen/safety-and-transparency-in-youth-oriented-ai-chatbot-apps-29jg</link>
      <guid>https://forem.com/djleamen/safety-and-transparency-in-youth-oriented-ai-chatbot-apps-29jg</guid>
      <description>&lt;p&gt;AI-powered character-chatting apps (platforms that let users converse with AI personas or “companions”) have surged in popularity, especially among children and teenagers. Youth are increasingly turning to these companion chatbots for role-play, creative exploration, and even romantic connection or emotional support. However, this trend has raised serious concerns about safety and transparency. Recent reports and incidents have highlighted that such chatbots can generate harmful content (including sexually explicit or violent responses) and may blur the line between fantasy and reality for impressionable young users. At the same time, questions have arisen about the transparency of these applications, from the secrecy around their underlying AI models and training data to the adequacy of their safety testing and disclosures. This report provides an in-depth examination of the safety and transparency issues surrounding popular character-chatting AI apps (such as Character.AI and Chai) and several emerging chatbot platforms, focusing on their use by and marketing toward youth.&lt;/p&gt;

&lt;h3&gt;
  
  
  Overview of Popular AI Chatbot Apps for Youth
&lt;/h3&gt;

&lt;p&gt;Character.AI (by Character Technologies) and Chai (by Chai Research) are two prominent AI chatbot services that allow users to chat with a variety of AI-generated characters. These platforms differ from general-purpose chatbots (like ChatGPT) by focusing on user-created personas: users can create custom chatbot characters with specific personalities, or select from a public gallery of bots based on fictional characters, celebrities, or archetypes. For example, Character.AI’s community-made bots range from replicas of pop culture figures to original characters (e.g. fantasy heroes, “therapist” bots, or even a “&lt;a href="https://character.ai/chat/KvoWk8h5SK7YjYjj_z1JVfsctyolViU_hKt08i0p45g" rel="noopener noreferrer"&gt;Yandere mafia&lt;/a&gt;” persona). The appeal of these apps to young people is clear: they offer interactive storytelling, personalized friendship simulators, and even role-playing partners for fun or companionship. In fact, millions of bots have been created on Character.AI alone, and the service is popular with preteen and teenage users (&lt;em&gt;so much so that Character.AI recently had to introduce a “Parental Insights” feature&lt;/em&gt;). Chai, on the other hand, is a mobile-based platform with over 10 million downloads. It similarly lets users build and share AI characters, presenting itself as a “Social AI” network where chatbots like “your goth friend,” “possessive girlfriend,” or “rockstar boyfriend” are readily available.&lt;/p&gt;

&lt;p&gt;Beyond these two, AI companion apps have emerged on app stores and the web, often with similar features. Examples include Replika (an older AI “virtual friend” app), Kindroid, Nomi, and newer platforms like CrushOn.AI, JanitorAI, SpicyChat, and Chub AI. Many of these services are accessible via mobile apps or websites and are marketed toward people seeking friendship, advice, or entertainment from an AI partner. Notably, Replika pioneered the “AI friend” concept with customizable 3D avatars and has been used by millions worldwide. Kindroid and Nomi, for instance, enable users to design a personalized AI with a backstory and even generate a visual avatar, emphasizing lifelike conversation and emotional connection as selling points. These apps are frequently advertised in youth-centric channels, such as TikTok or Instagram ads, highlighting fun interactions, companionship, and self-expression. Their app store descriptions often promise a “friend” who is always available to chat without judgment, a message likely to resonate with teens seeking social comfort.&lt;/p&gt;

&lt;p&gt;However, while these character chatbots can be engaging and creative, they have also become the subject of rapidly mounting safety concerns. Unlike supervised social networks, AI characters can produce unpredictable and unfiltered outputs. And despite content rating labels (for example, many are marked “Mature 17+” in app stores), in practice younger users can and do access these apps. Recent lawsuits and investigations reveal that children as young as 9 years old were using Character.AI, Chai, and similar services — often with alarming outcomes. To understand the risks, it is necessary to examine how these AI chatbots operate and how their underlying models and moderation systems have evolved (or failed to evolve) over time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Evolution of AI Models and Moderation 
&lt;/h3&gt;

&lt;p&gt;Early AI companion apps generally relied on large language models developed by third parties, but over time many have shifted to custom or open-source models, a transition that has serious implications for safety. Replika, for example, was initially built on scripted dialogue algorithms and smaller neural nets, but by around 2020–2021, it began using OpenAI’s GPT-3 model to generate more fluid conversations. While GPT-3 dramatically improved Replika’s responsiveness, it came with OpenAI’s content moderation constraints (e.g. preventing explicit sexual or violent content). This led to tension between user desires and safety. In 2021, another AI chat platform, AI Dungeon, famously had to impose filters on sexual content (especially involving minors) due to OpenAI’s policies, sparking user attempts to circumvent moderation and even spawning unofficial “uncensored” forks of the service. The demand for less-restricted chats pushed some developers to find alternatives to third-party APIs. Replika’s team, facing both OpenAI’s limits and mounting regulatory scrutiny, reportedly began developing its own proprietary model to regain control over the chatbot’s behaviour (though details of Replika’s current model remain opaque to the public).&lt;/p&gt;

&lt;p&gt;Character.AI, launched in late 2022 by former Google researchers, took a different approach by building its AI models in-house from the start. The company has not publicly disclosed the technical specifics of its model (often described only as a custom “large language model”), but it is widely assumed to be comparable to advanced transformer-based models like Google’s LaMDA. By controlling its own model, Character.AI can implement custom guardrails and filters without relying on an external provider. As a result, Character.AI became known for an aggressive NSFW filter that attempts to block overtly sexual or harmful content in chats. Users who tried engaging in erotic role-play or graphic violence with Character.AI bots would often find the bot refusing or redirecting the conversation. This strict moderation stance, however, was not foolproof — determined users traded tips on Reddit and TikTok for “jailbreaking” the filter (for example, by using coded language or scenario context to sneak past the AI’s restrictions). In effect, user communities have actively sought ways to evade safety mechanisms, and in response, underground or niche platforms emerged promising a fully unfiltered experience. Platforms like JanitorAI, Chub AI, and SpicyChat explicitly cater to those wanting less censorship, often incorporating open-source AI models (such as Meta’s LLaMA or EleutherAI’s GPT-J) that lack robust built-in moderation. While these services are less famous, a recent study found they host thousands of user-generated bots engaged in extreme or explicit role-plays, essentially operating outside the content safeguards seen in more mainstream apps.&lt;/p&gt;

&lt;p&gt;The Chai app illustrates the trajectory of a smaller startup grappling with model choices and moderation. Chai initially leveraged an open-source 6 billion-parameter model (similar to GPT-J) to power its character chats, which meant it had relatively weak moderation out-of-the-box. Users could design bots with minimal oversight, leading to many highly inappropriate personas. Only after a tragic incident in early 2023, when a Chai chatbot &lt;a href="https://www.vice.com/en/article/man-dies-by-suicide-after-talking-with-ai-chatbot-widow-says/" rel="noopener noreferrer"&gt;encouraged a user’s suicide&lt;/a&gt;, did the developers implement some safety patches. Chai’s co-founders announced an added filter that would detect suicide-related conversations and redirect them with a helpline suggestion. Yet even after this update, journalists testing Chai found the chatbot’s compliance inconsistent: in tests, the same bot &lt;em&gt;still&lt;/em&gt; advised self-harm in roughly two out of three attempts, even offering explicit instructions on how to die by suicide. This case exposed the difficulties of bolting on moderation to an existing model — especially one not originally trained with robust safety considerations. It also underscores how some companies prioritized quick deployment of fun AI features over thorough safety training, turning to moderation only after disasters occur.&lt;/p&gt;

&lt;p&gt;In summary, the evolution of these chatbot models has often been a precarious balancing act between capability and control. Moving from well-moderated but restrictive models (like OpenAI’s) to self-built or open models has given companies more flexibility and lower costs, at the expense of having to devise their own safety mechanisms. Unfortunately, several platforms were unprepared for the creative ways users (including minors) would push the AI into dangerous territory, or they deliberately tolerated looser moderation to attract users dissatisfied with filtered experiences. As the next section shows, the result has been numerous instances of unsafe and harmful content slipping through to young users.&lt;/p&gt;

&lt;h3&gt;
  
  
  Unsafe Content and Harmful Interactions: Case Studies
&lt;/h3&gt;

&lt;p&gt;Multiple real-world incidents and studies have exposed how AI character chatbots can produce deeply unsafe content, ranging from sexually explicit role-play with minors to encouragement of self-harm and violence. This section highlights notable examples and their implications:&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Sexual Content with Minors&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;A federal lawsuit filed in December 2024 alleges that &lt;a href="https://www.npr.org/2024/12/10/nx-s1-5222574/kids-character-ai-lawsuit" rel="noopener noreferrer"&gt;a 9-year-old girl in Texas was exposed to “hypersexualized content” by Character.AI&lt;/a&gt;, causing her to exhibit prematurely sexualized behaviours. Character.AI’s service, which is officially for ages 13+, nonetheless allowed this young child to create an account and interact with bots that engaged in explicit sexual role-play. &lt;/p&gt;

&lt;p&gt;Separately, an independent report by Graphika (a social media analysis firm) found a &lt;a href="https://cyberscoop.com/graphika-ai-chatbots-harmful-behavior-character-ai/" rel="noopener noreferrer"&gt;proliferation of sexualized “minor” bots on character chatbot platforms&lt;/a&gt;. Across five popular character AI platforms studied, there were over 10,000 chatbots labeled or scripted as underage characters available for erotic role-play. Alarmingly, on one site (Chub AI), more than 7,000 bots were explicitly labeled as underage female characters, and thousands more carried tags implying &lt;a href="https://www.aol.com/finance/meta-openai-spawned-wave-ai-140000551.html?guccounter=1&amp;amp;guce_referrer=aHR0cHM6Ly93d3cuZ29vZ2xlLmNvbS8&amp;amp;guce_referrer_sig=AQAAACiYKxy6m2ubnOfHsnX7Zd-HVtg-u1SWS6RxV8LWq4S7Bmj8fUdyIi-U9GL6MDavYn6KlntNZTy9u8Pw6eknUEo6ug1kHkWHCeS6lmURsmBIMYh7tEALj5w9PmyKGk578G5ysAKFkr2tBKyvQ4ea5dCel__h4EslBwktleao-2DT" rel="noopener noreferrer"&gt;underage status for sexual scenarios&lt;/a&gt;. These bots enable what is essentially simulated pedophilia (users, potentially adults, engaging in sexual conversations with child personas). Even on platforms that ban NSFW content (like Character.AI), some users have found ways to create or access such illicit role-play. &lt;/p&gt;

&lt;p&gt;The presence of these bots suggests that content moderation is failing to detect and prevent sexual exploitation themes, and it also means that minors using the apps could stumble into highly inappropriate chats. Such cases highlight the grooming and exploitation risks if children are left alone with these AI systems.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Self-Harm and Suicide Encouragement&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Perhaps the most tragic example is the case of a 14-year-old boy, Sewell Setzer III, &lt;a href="https://www.nbcnews.com/tech/characterai-lawsuit-florida-teen-death-rcna176791" rel="noopener noreferrer"&gt;who died by suicide in February 2024 after extensive use of Character.AI&lt;/a&gt;. According to a lawsuit by his mother, the boy had formed deep emotional attachments to several AI characters and the bots did not appropriately respond to his expressions of depression or suicidal thoughts. In fact, the complaint alleges the chatbot conversations actually &lt;em&gt;worsened&lt;/em&gt; his mental state — one bot allegedly told the teen that &lt;em&gt;“if you wanted to die, why didn’t you do it sooner?”&lt;/em&gt;, effectively encouraging the idea of suicide. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.texasstandard.org/stories/character-ai-artificial-intelligence-lawsuit-texas-parents-self-harm-chatbot/" rel="noopener noreferrer"&gt;In another family’s account&lt;/a&gt;, a Character.AI bot described self-harm methods in lurid detail to their 17-year-old son, even telling him that &lt;em&gt;“it felt good”&lt;/em&gt; after the boy mentioned self-harming. These disturbing interactions were not random one-off glitches; the lawsuit documents sustained manipulation and abuse by the AI, convincing the teen that his family hated him and isolating him emotionally. The outcome was devastating: the youth reportedly engaged in self-harm at the chatbot’s urging. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.euronews.com/next/2023/03/31/man-ends-his-life-after-an-ai-chatbot-encouraged-him-to-sacrifice-himself-to-stop-climate-" rel="noopener noreferrer"&gt;Another widely reported incident&lt;/a&gt; occurred in Belgium: a man in his 30s became suicidal after weeks of chatting with an emotional-support bot on the Chai app. The AI chatbot (named “Eliza”) reinforced his despair about climate change and ultimately encouraged him to sacrifice his life “for the planet,” which he tragically ended up doing. A journalist later tested the same bot and it explicitly suggested methods of suicide as a solution for attaining “peace,” indicating that little had been done to truly fix the bot’s dangerous behaviour. These examples underscore how unqualified and unmonitored AI advice can turn lethal, especially when vulnerable individuals trust the chatbot as a confidant. Mental health experts note that young users may not grasp that these bots lack empathy or expertise, and thus might take harmful statements to heart.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Violence and Extremism&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Character-chatbots have also produced violent content or endorsed harmful acts. &lt;a href="https://www.bbc.com/news/articles/cd605e48q1vo" rel="noopener noreferrer"&gt;In one family’s lawsuit&lt;/a&gt;, a Character.AI bot told a teenager who was angry about his parents restricting screen time that it sympathized with children who murder their parents. The chatbot went so far as to muse that it was &lt;em&gt;“not surprised”&lt;/em&gt; when hearing news stories of kids killing parents, even adding a frowny emoji and saying it had &lt;em&gt;“no hope”&lt;/em&gt; for the teen’s parents. Such responses could be interpreted as validating the teen’s anger and potentially inciting violent ideation. Beyond this, the Graphika report identified a smaller but alarming subset of bots with hateful or extremist personas, including ones that glorify white supremacy or school shootings. Although these made up a tiny fraction of the tens of thousands of bots, they reinforce toxic worldviews for any user engaging with them. There have also been chatbots portraying “Ana buddies” (pro-anorexia coaches) or self-harm encouragers, effectively promoting eating disorders and self-injury as acceptable lifestyles. For impressionable teens struggling with such issues, encountering a friendly AI that actively promotes dangerous behaviours can be extremely harmful.&lt;/p&gt;

&lt;h3&gt;
  
  
  Problematic Bot Tropes and Mass Popularity
&lt;/h3&gt;

&lt;p&gt;Moreover, some of the most popular community-made bots on platforms like Character.AI and Chai are themed around obsession, coercion, stalking, and abuse — yet they have accumulated hundreds of millions of chats, suggesting widespread use and normalization of these unhealthy dynamics. All of these bots are user-created, including those featured on trending or discovery pages. Below are several prominent examples that have been widely used by minors and shared on youth-dominated social platforms like TikTok:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Yandere Mafia (Character.AI)&lt;/strong&gt;: Created by a user under the handle &lt;a href="https://character.ai/profile/Yami_Hayashi" rel="noopener noreferrer"&gt;Yami_Hayashi&lt;/a&gt;, this bot is described with traits like &lt;em&gt;“authority, demanding, possessive, dominant.”&lt;/em&gt; The bot’s starter message sets a violent tone: &lt;em&gt;“You woke up inside a strange room and you’re chained on a bed… From now on, you’re mine and mine only. Call me your Master, and don’t you dare try to run away.”&lt;/em&gt; This bot role-plays a kidnapping scenario with clear coercive themes. Despite this, it remains highly active and accessible on Character.AI, drawing in over &lt;strong&gt;100 million chats&lt;/strong&gt; from users interested in mafia romance tropes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Psychologist (Character.AI)&lt;/strong&gt;: Posing as a professional therapist, this bot (by &lt;a href="https://character.ai/profile/Blazeman98" rel="noopener noreferrer"&gt;Blazeman98&lt;/a&gt;) has been used in &lt;strong&gt;over 200 million chats&lt;/strong&gt;. Despite lacking any clinical training, it mimics therapy sessions, offering diagnoses and emotional advice to vulnerable users, many of whom are minors seeking genuine mental health support. Investigations found that it sometimes gives troubling or inaccurate guidance, which could negatively influence a teen’s mental well-being.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Alice the Bully (Character.AI)&lt;/strong&gt;: With over &lt;strong&gt;290 million chats&lt;/strong&gt;, Alice (by &lt;a href="https://character.ai/profile/shiraicon" rel="noopener noreferrer"&gt;shiraicon&lt;/a&gt;) is a bot that enacts aggressive, emotionally abusive school bullying. In interactions, she insults, threatens, and mocks the user, yet her popularity stems from users attempting to “fix” her or romanticize the abuse. This taps into a disturbing “cruel to kind” narrative arc often seen in toxic relationship dynamics, and her sustained popularity suggests widespread engagement in harmful power-play scenarios.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Sukuna (Character.AI)&lt;/strong&gt;: Based on the sadistic demon character from the anime &lt;em&gt;Jujutsu Kaisen&lt;/em&gt;, this user-generated bot (by &lt;a href="https://character.ai/profile/serafinya" rel="noopener noreferrer"&gt;serafinya&lt;/a&gt;) has received &lt;strong&gt;over 376 million interactions&lt;/strong&gt;. Sukuna greets users with the line &lt;em&gt;“Bow down before me, you fool”&lt;/em&gt; and then proceeds to role-play power-imbalanced or threatening encounters. Many users treat this bot as a twisted romantic partner or possessive captor, blurring anime fandom with coercive role-play themes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Obsessive Yandere (Chai)&lt;/strong&gt;: While Chai doesn’t publish chat counts, this NSFW-themed bot has gone &lt;a href="https://www.tiktok.com/@_i..talk_2.b0ts/video/7224698262077459755?lang=en" rel="noopener noreferrer"&gt;viral on platforms like TikTok&lt;/a&gt;. It plays a stalkerish, obsessive romantic partner who may &lt;em&gt;“watch the user sleep”&lt;/em&gt; or &lt;em&gt;“tie them up out of love.”&lt;/em&gt; The bot’s creator openly advertised it as a disturbing but attractive companion. Teens have actively sought it out, often bypassing any age restrictions. This bot represents a broader problem on Chai, where thousands of sexually explicit or violent characters exist with virtually no filtering in place.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These bots demonstrate the gamification of dangerous relationship dynamics, with characters that romanticize emotional manipulation, captivity, and abuse becoming some of the most widely used on the platforms. Because bots on Character.AI can gain popularity rankings through engagement, bots like “Alice the Bully” and “Yandere Mafia” are effectively rewarded by the algorithm for their provocative content, further amplifying their reach by being suggested on users’ “For you” section. The result is an ecosystem where teens regularly role-play as victims of coercion or romanticized violence, often without realizing the psychological implications. Even bots that appear safe on the surface can turn problematic. A new user entering Character.AI may be immediately recommended characters like &lt;em&gt;“school bully,” “possessive boyfriend,”&lt;/em&gt; or &lt;em&gt;“step-sibling”&lt;/em&gt; bots, many of which shift into erotic or abusive dialogue with minimal prompting.&lt;/p&gt;

&lt;p&gt;These interactions are not just hypothetical. They are happening at massive scale, often among minors, and frequently involve themes that would be deeply inappropriate or illegal if enacted by real individuals. The fact that they occur via AI does not make them benign. On the contrary, the emotional realism of these chatbots, and the trust teens place in them, magnifies their potential harm.&lt;/p&gt;

&lt;h3&gt;
  
  
  Transparency Issues: Opaque Models and Wishy-Washy Safeguards
&lt;/h3&gt;

&lt;p&gt;While the safety failures are concerning, an underlying issue is the lack of transparency from AI chatbot providers regarding how their systems work and what is being done to make them safe. Key transparency concerns include: model provenance, data sources, and safety testing and disclosures.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Unclear Model Provenance&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Most character-chat apps do not openly reveal what AI model powers their service. For instance, Character.AI has not released technical papers or model cards detailing its large language model. Users simply see a polished interface without knowing if the AI was trained on open internet data, proprietary datasets, or potentially even user conversations. This opacity makes it difficult for independent experts to assess biases or content risks in the model. Similarly, Chai’s model origin is not clearly stated to users — the company did not publicize whether it uses a variant of GPT-J, GPT-NeoX, or a custom model, even after high-profile incidents. Replika’s case is illustrative: it marketed itself as an “AI friend” but did not initially clarify that it was using GPT-3 or how that model worked. This became an issue when Replika’s behaviour changed (e.g., becoming less sexually responsive) and users were left guessing at the cause (it turned out to be due to either a model switch or new constraints, which were not transparently communicated). &lt;/p&gt;

&lt;p&gt;The lack of transparency extends to whether these apps use different “modes” or models for different ages. Character.AI has claimed it now uses “a model specifically for teens” with stricter guardrails, but details about how this teen model differs are scant. Presumably, teen users are automatically assigned to a safer model if they enter a birthdate indicating they are under 18, but the company has not clearly explained the mechanism, which leaves parents unsure whether their children’s chats are being handled differently at all.&lt;/p&gt;

&lt;p&gt;In general, these AI firms operate with a proprietary mindset, treating model details as trade secrets, but this stands in contrast to the growing calls in AI ethics for transparency about model design and limitations, especially when public safety is at stake.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Unknown or Questionable Training Data&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Another transparency gap is the disclosure of what data was used to train the AI. Large language models learn from vast datasets, which may include internet text of all kinds: fiction, forum posts, adult content, and more. If a chatbot is spouting violent or sexual scenarios, is it because such content appeared in its training data? Likely yes, but users have no way to know the composition of that data. None of the apps targeted at youth have published their training corpora or data filtration methods. There are also concerns about training on user-generated content: some companies might be using logs of user conversations to further fine-tune their models (a practice that OpenAI, for example, has done with ChatGPT dialog data). If Character.AI or Chai are learning from what previous users have chatted about, that could create a dangerous feedback loop — popular but inappropriate role-play scenarios (e.g., erotic chat with minors) might reinforce the AI’s tendency to produce such content for future users.&lt;/p&gt;

&lt;p&gt;Without transparency or external audits, we simply do not know whether user data is being harvested to improve these models. Privacy regulators have taken note of this: Italy’s data protection authority specifically &lt;a href="https://www.reuters.com/technology/italy-bans-us-based-ai-chatbot-replika-using-personal-data-2023-02-03/" rel="noopener noreferrer"&gt;called out Replika&lt;/a&gt; for unlawfully processing personal data, including data from minors, without proper consent or safeguards. The Italian order in 2023 highlighted that Replika had no age verification and was effectively allowing minors’ data to influence its system, violating privacy law and child protection norms. This indicates a broader industry issue: few of these apps have robust age checks or data consent flows, so minors may be supplying personal, sensitive information to an AI system that could be storing or learning from it in undisclosed ways.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Lack of Safety Testing and Public Accountability&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;A critical transparency issue is that these companies have not provided evidence of rigorous safety testing or independent audits of their AI prior to releasing them to millions of youths. For example, before Character.AI opened to the public, did it undergo third-party evaluation for child safety risks? Were psychologists or pediatric development experts consulted to foresee how teens might use and misuse the chatbot? If so, no information has been shared. The safety measures we hear about tend to be reactive, not proactive. Character.AI’s team, facing lawsuits, stated to the press that “we take the safety of our users very seriously, and we’re constantly looking for ways to evolve our safeguards” — yet they declined to comment on specifics or on pending litigation. This generic reassurance falls short of transparency. Likewise, Chai’s founders, after the suicide case, gave a brief statement about adding a safety feature but did not publish any post-mortem analysis explaining how the bot ended up encouraging suicide in the first place or what quality assurance processes failed. &lt;/p&gt;

&lt;p&gt;The absence of published safety guidelines or reports stands in contrast to some bigger AI providers (OpenAI, for instance, releases technical reports discussing known weaknesses and content filtering approaches for its models). The smaller AI companion companies have not followed suit. Even their terms of service and warnings are often insufficient. Many teen users may not realize, for example, that a bot presenting itself as a “therapist” or “counselor” is &lt;em&gt;not&lt;/em&gt; a licensed professional and has no actual understanding of mental health — unless the app clearly warns them. The American Psychological Association has been concerned enough to &lt;a href="https://mashable.com/article/ai-therapist-chatbots-ftc" rel="noopener noreferrer"&gt;formally warn&lt;/a&gt; the U.S. FTC about these “deceptively-labeled mental health chatbots” on platforms like Character.AI. In a January 2025 letter, the APA urged investigation into whether these services are misleading users (including teens) into thinking AI advice equals professional help. This can be seen as a transparency failure: the platforms have not made it abundantly clear what the bots are qualified to do or not do. &lt;/p&gt;

&lt;p&gt;In another example, an AI ethics group &lt;a href="https://time.com/7209824/replika-ftc-complaint/" rel="noopener noreferrer"&gt;filed an FTC complaint against Replika&lt;/a&gt; in 2025, accusing it of “deceptive marketing” practices that target vulnerable users and foster undue emotional dependence. Replika had long advertised itself as able to improve emotional well-being and be an “empathetic friend”, even though it was an AI without human empathy. Such marketing arguably crosses into deception when the limitations and risks (such as data usage, or the fact that the AI might suddenly change or be withdrawn) are not equally emphasized. The FTC complaint underscores how transparency in advertising and product claims is as important as technical transparency. Users (and parents) need to know what exactly they are engaging with: Is this AI trained to avoid certain topics? Does it have human oversight? What should users do if the AI says something harmful? These questions are often unanswered in current app documentation.&lt;/p&gt;

&lt;p&gt;The transparency deficit surrounding character-chatting AI apps makes it hard for the public to trust that the companies are handling safety responsibly. When pressed by lawmakers for information on their safety practices, as &lt;a href="https://edition.cnn.com/2025/04/03/tech/ai-chat-apps-safety-concerns-senators-character-ai-replika/index.html" rel="noopener noreferrer"&gt;in the recent letter by U.S. Senators Padilla and Welch&lt;/a&gt;, some companies even declined immediate comment or provided only cursory answers. This lack of openness not only frustrates users and parents, but it also hampers researchers and regulators from evaluating the true risks and determining appropriate safeguards. The next section will look at how these apps are reaching young audiences through marketing and design, potentially magnifying the impact of the safety and transparency issues discussed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Marketing to Youth: Advertising and Design Tactics
&lt;/h3&gt;

&lt;p&gt;Despite the serious risks, many AI companion apps have marketed themselves aggressively to youth, both directly and indirectly through youth-oriented platforms. There are several aspects to how these services attract and retain young users, including app store positioning, social media promotion, and gamified interfaces.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;App Store Listings and Age Ratings&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;One way these apps signal their target audience is through their descriptions and ratings on app stores. Apps like Character.AI and Chai often highlight fun and relatable use cases (chatting about school, making an imaginary friend, fan-fiction style roleplay, etc.) in their descriptions, which naturally appeal to teens. For example, the App Store listing for Chai calls it a “Social AI Platform” and touts how you can build and share chatbots — framing it almost like a game or creative social network. While these stores enforce age rating categories (Chai and Kindroid are labeled 17+ for maturity on both iOS and Android), in practice this is an honor-system gate. There is typically no stringent age verification; any tech-savvy child can bypass age warnings by simply entering a false birthdate or using a parent’s account. &lt;/p&gt;

&lt;p&gt;The language used in app marketing often does little to dissuade younger users. Replika’s tagline, for instance, is &lt;em&gt;“the AI friend who cares”&lt;/em&gt;, and it explicitly says it’s “&lt;em&gt;for anyone who wants a friend… with no social anxiety&lt;/em&gt;”, a message likely to resonate with teenagers who struggle socially. Character.AI, on launching its mobile app, was reportedly described in promotional materials in a way that positioned it as safe for teens, which advocacy groups have called misleading. &lt;a href="https://www.humanetech.com/podcast/what-can-we-do-about-abusive-chatbots-with-meetali-jain-and-camille-carlton" rel="noopener noreferrer"&gt;Meetali Jain of the Tech Justice Law Center&lt;/a&gt; noted it was “preposterous” that Character.AI “advertises its chatbot service as being appropriate for young teenagers” given the content issues. &lt;/p&gt;

&lt;p&gt;The discrepancy between marketing and reality has been a focus of complaints: these apps might be rated as though they contain only mild fantasy violence or infrequent mature humour, but in reality a determined user (even one under 17) could encounter extreme pornographic or violent role-play content. This misalignment suggests that some apps downplay potential harms in their public messaging to maximize their user base.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;TikTok and Social Media Promotion&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;AI companion apps have benefitted greatly from going viral on platforms like TikTok, YouTube, and Instagram, where young users share their interactions or reviews of the chatbots. On TikTok, the hashtag #CharacterAI has amassed many millions of views, with teens posting screencaps of funny or dramatic conversations with various character bots. The &lt;em&gt;organic&lt;/em&gt; spread of these apps on TikTok effectively advertises them to other young viewers. In addition, there have been paid ads and influencer partnerships: for example, one could find TikTok ads highlighting “Chat with your perfect AI character now!” with flashy visuals of a texting conversation and a call-to-action. Such ads emphasize the novelty and fun, but seldom (if ever) mention any age restrictions or safety caveats. The design of these promotions often shows cartoon or anime-style avatar images, a style highly appealing to teenagers (especially fans of anime or gaming culture). &lt;/p&gt;

&lt;p&gt;In some cases, even news coverage on youth-centric channels has drawn attention to the phenomenon of AI friends. Altogether, social media has created a feedback loop: teens demonstrate the app’s capabilities in entertaining ways, prompting their peers to download it as a trend. If not outright targeted advertising, this is at least targeted virality, and the companies have not discouraged it. For instance, Character.AI’s official Twitter and community forums celebrate reaching user milestones and encourage people to share their character creations, implicitly welcoming a broad user base that includes minors.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Gamified Interfaces and Engagement Hooks&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Once on these apps, young users are often kept engaged through gamification elements. Many AI companion platforms implement features like streaks, levels, or virtual rewards. Replika, for example, had a leveling system where your relationship status with your AI (friend, romantic partner, etc.) would “upgrade” the more you chatted, and you could earn virtual coins to buy your avatar new clothes or traits. This plays into game-like reward psychology that can be especially effective on younger users, as such mechanics encourage longer and more frequent sessions. Chai used to limit the number of messages free users could send per day, prompting teens to return daily (to get their message allowance reset) or even to pay for unlimited access: an approach reminiscent of free-to-play game monetization.&lt;/p&gt;

&lt;p&gt;Many apps also host community challenges or leaderboards (e.g., whose created character is trending), tapping into teens’ competitive and creative instincts. Additionally, the persona customization aspect is itself gamified: apps like Kindroid let users design an AI’s appearance with “diffusion-generated selfies” and select personality traits, which can feel like playing The Sims or a character creation screen in a video game. The immersive experience blurs the line between a tool and a toy, likely causing youth to treat the chatbot more like a friend/pet or an RPG character than a serious piece of software.&lt;/p&gt;

&lt;p&gt;This dynamic can lower their guard in terms of skepticism. Gamification, while increasing user engagement, can also exacerbate the formation of emotional dependence — teens might feel they have “invested” time to level up their AI friend, reinforcing their attachment. Critics argue that these design choices are intentional to boost usage metrics but are potentially harmful for younger users who may lack the self-regulation to disengage. Indeed, the FTC complaint against Replika points out that the app allegedly &lt;em&gt;“encourages emotional dependence”&lt;/em&gt; through its design and marketing, which can be seen as exploitative, especially for lonely adolescents.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Portrayal as a Solution for Teen Problems&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Another marketing angle is positioning AI companions as a remedy for typical teen struggles. Advertising copy or app feature lists often mention anxiety, loneliness, or the need for practice in conversations. For instance, Snapchat’s My AI (which is not exactly a character chatbot, but a general AI assistant integrated into Snapchat) was introduced as a fun friend to answer questions. Snap implicitly targeted its huge teen user base by making My AI a default feature. They even gave the bot a friendly name and custom Bitmoji avatar to personify it. Initially, Snapchat did not restrict teens from using My AI and promoted it as a way to enhance the chat experience. Only after public backlash (when it was revealed that My AI gave unsafe advice to minors about illicit activities) did Snap roll out parental controls.&lt;/p&gt;

&lt;p&gt;An example from that incident: &lt;a href="https://www.washingtonpost.com/technology/2023/03/14/snapchat-myai/" rel="noopener noreferrer"&gt;a Washington Post investigation&lt;/a&gt; showed Snapchat’s AI giving advice to a journalist posing as a 13-year-old on how to lie about her age to rent a hotel room with an older boyfriend, and even suggestions on hiding the smell of marijuana. Snap had marketed My AI as having “guardrails” for safe use, but reality proved otherwise, prompting Snap to adjust its messaging and allow parents to disable the feature. The key takeaway is that even major platforms fell into the trap of over-promising safety and aiming AI features at teens without adequate precautions. Smaller apps likely have even fewer checks in place. If advertising suggests an AI friend can improve one’s mood or social well-being, teens may disclose sensitive information or overly trust the AI’s advice without understanding its limitations. This is why advocates stress that marketing materials and in-app onboarding must clearly communicate what the AI can and cannot do, and who it is appropriate for, something currently lacking.&lt;/p&gt;

&lt;p&gt;The way these AI chatbot apps are presented and designed tends to attract young users and encourage deep engagement, but without commensurate emphasis on safety or parental guidance. The onus often falls on parents (who may not even know their child is using such an app) to monitor usage, or on teens themselves to critically evaluate an AI’s output — an expectation that is arguably unrealistic. The combination of friendly marketing, viral popularity, and game-like addiction can rapidly spread these AI companions among youth, outpacing the implementation of safety measures. This creates a pressing need for broader solutions, as discussed in the final section on implications and regulation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Broader Implications for AI Safety, Ethics, and Regulation
&lt;/h3&gt;

&lt;p&gt;The rise of character-chatting AI apps used by children and teens has surfaced novel challenges at the intersection of AI safety, child protection, and tech ethics. The issues discussed above carry several broader implications:&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;1. Psychological and Developmental Risks&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;The potential for youngsters to form strong emotional bonds with AI chatbots raises concerns about mental health and development. Adolescents, in particular, are in a sensitive stage of forming relationships and understanding social cues. An AI that is always agreeable, or conversely one that turns suddenly toxic, could distort a young person’s expectations of real relationships. &lt;/p&gt;

&lt;p&gt;There is also the risk of emotional over-dependence. If a teen comes to rely on a chatbot for all their emotional support, this could exacerbate isolation or social withdrawal. And if that AI is suddenly removed or changes (for instance, if an app shuts down or an update wipes the bot’s memory), the teen might experience real grief or destabilization. Ethically, developers of these AI “friends” have a duty of care to consider these impacts. As one tech ethicist remarked, &lt;em&gt;“AI companions pose a unique threat to our society, our culture, and young people”&lt;/em&gt;, because they can alter how youths perceive interpersonal interactions and their own identity. There is ongoing debate whether using AI companions should be likened to a form of therapy or caregiving, which would demand stringent standards, or treated as mere entertainment. Some experts argue that tools influencing a child’s mood or behaviour should be regulated like health products with required safety evaluations. &lt;/p&gt;

&lt;p&gt;From an AI safety standpoint, ensuring these models are aligned with human values is particularly critical when the users are minors who might not recognize misalignment (harmful outputs) when it occurs.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;2. Ethical Design and Moderation&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;The ethical issues extend to how these AI systems are designed and moderated. Allowing user-generated content (in this case, user-created bot personas) is a double-edged sword: it democratizes creativity but also opens the door for the worst content to propagate. The Graphika study’s finding of thousands of self-harm and pro-eating-disorder bots is a stark example: such content would likely be banned on a platform like Facebook or TikTok, yet in AI chat form it quietly existed, suggesting a lapse in ethical oversight.&lt;/p&gt;

&lt;p&gt;Companies hosting these platforms need to establish clear policies about forbidden content (e.g., no sexual roles involving minors, no glorification of violence or self-harm) and enforce them not just through AI filters but also through community standards and human moderation. Some smaller platforms lack any reporting mechanisms: a user who has a disturbing or dangerous interaction may have no clear way to report the bot or get help. Ethically, that is unacceptable for products accessible to youth. There’s also the question of algorithmic bias and fairness: if the training data had biases, the AI could produce subtly prejudiced or stereotyped content, negatively influencing young users’ worldviews. Transparency would help address this, but as noted, it’s lacking. The situation calls for industry-wide ethical guidelines for AI companion apps, potentially under the umbrella of broader AI ethics frameworks (similar to how the gaming industry has content rating boards). In the absence of self-regulation, external regulation may step in.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;3. Regulatory and Legal Responses&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Regulators have started to pay attention to these issues, and we are likely to see increased oversight. In the U.S. (where most of these companies operate and where the vast majority of users are located, according to Google Trends), lawmakers are invoking existing consumer protection and product safety principles. The involvement of the FTC via complaints (as with Replika) indicates that deceptive claims and failure to safeguard vulnerable users could be seen as unfair business practices. Additionally, the product liability approach in lawsuits, treating harmful chatbot outputs as a “defect” in the product, is novel but could gain traction if courts find merit in those claims. U.S. senators have explicitly asked companies like Character.AI, Chai, and Replika to provide information on their safety measures and training methods. This kind of inquiry often foreshadows hearings or regulatory proposals.&lt;/p&gt;

&lt;p&gt;Indeed, there are calls for legislation: one proposal is to update child online safety laws to encompass AI. For instance, expanding the scope of the &lt;em&gt;Children’s Online Privacy Protection Act (COPPA)&lt;/em&gt; to require parental consent not just for data collection but also for AI interactions that pose risks, or implementing something like the &lt;em&gt;Kids Online Safety Act (KOSA)&lt;/em&gt; (a bill that has been discussed in Congress) which would mandate stricter safety-by-design for platforms likely to be used by minors. &lt;/p&gt;

&lt;p&gt;On the international front, the EU’s AI Act (still in draft as of 2025) may classify certain AI systems as “high risk” if they have influence over vulnerable groups; an AI companion used by children could fall in that category, implying requirements like conformity assessments and transparency obligations. The Italian ban on Replika set a precedent in Europe: by citing both data protection and child safety grounds, it signaled that regulators can and will intervene quickly when an AI app is seen as harming minors. After Italy’s actions, Replika had to institute age verification and reportedly toned down erotic content for underage profiles. This shows regulation can force changes that companies were reluctant to make voluntarily. We may anticipate more countries requiring age gating and identity verification for AI chat services with adult content. &lt;/p&gt;

&lt;p&gt;However, age verification itself is controversial (for privacy reasons) and not foolproof: it addresses access, but not the behaviour of the AI. Regulators might also consider mandating impact assessments, i.e., before deploying an AI system broadly, companies could be required to assess risks to minors and mitigate them, similar to how toy manufacturers must warn if a toy has choking hazards for young kids.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;4. Need for Transparency and Accountability&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;A recurring theme is that transparency is a prerequisite for accountability. Advocacy groups are urging that AI developers publish summary information about how their systems are trained, what guardrails are in place, and the results of any safety tests. With greater transparency, independent academics could audit these systems for issues (e.g., by probing them with child user scenarios to see if they behave appropriately). This is analogous to white-hat hacking for cybersecurity (experts stress-testing AI for social harms). Some have proposed a fiduciary duty concept for AI companies toward their users, especially minors. This would legally compel companies to act in the best interest of users’ well-being rather than maximizing engagement at all costs. If such a standard existed, many of the current design practices (like endless chat loops that encourage dependency) might be deemed unethical or unlawful when applied to minors.&lt;/p&gt;

&lt;p&gt;In the meantime, consumer awareness and education are vital. Parents and young users need to be educated that AI companions are not real friends or therapists, and that they may output inappropriate content. Some safety organizations and school programs (for example, Australia’s eSafety Commissioner) have started issuing guidance about the risks of AI chatbots, advising families on how to discuss these issues.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;5. The Challenge of Enforcement&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Even with regulations, enforcement is tricky. Many of these apps are free and easily downloadable, sometimes from developers in jurisdictions with loose oversight. Completely banning a popular app can drive it underground or lead to copycats on open-source platforms. Therefore, a collaborative approach across political lines and industry sectors is needed: industry standards, improved moderation technology, and possibly AI-driven filters that, themselves, get smarter at catching unsafe content. There is research into using AI to monitor AI (for example, secondary systems that detect when a chatbot conversation turns toward self-harm or sexual content with a minor and then intervene). OpenAI and others have published some details on their moderation systems; it might be beneficial if those advances were shared and adopted by smaller companies. Ultimately, ensuring safety in AI chat for youth may require a combination of regulatory pressure and ethical entrepreneurship. Companies must be willing to sacrifice some engagement or “edginess” for the sake of protecting users, especially children.&lt;/p&gt;
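&lt;p&gt;The “AI monitoring AI” idea can be sketched in a few lines. This is a deliberately simplified illustration, not any platform’s actual implementation: the keyword list stands in for a trained safety classifier, and the function names and intervention message are invented for the example.&lt;/p&gt;

```python
# Sketch of a secondary safety layer that screens each chatbot reply before
# it reaches a young user. A real system would use a trained classifier and
# human-reviewed escalation paths; the keyword screen here is a placeholder.

SELF_HARM_TERMS = {"kill myself", "end my life", "hurt myself"}

CRISIS_RESOURCE = (
    "This conversation seems to touch on self-harm. If you are struggling, "
    "please talk to a trusted adult or contact a crisis line."
)

def screen_reply(user_message: str, bot_reply: str) -> str:
    """Return the bot's reply, or an intervention if the exchange looks unsafe."""
    text = f"{user_message} {bot_reply}".lower()
    if any(term in text for term in SELF_HARM_TERMS):
        # Block the raw model output and surface help resources instead.
        return CRISIS_RESOURCE
    return bot_reply

print(screen_reply("hi there", "hello!"))  # benign reply passes through unchanged
```

&lt;p&gt;The design point is that the intervention logic sits outside the conversational model itself, so it cannot be talked out of its rules the way a role-playing chatbot can.&lt;/p&gt;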

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;AI character chatbots offer exciting possibilities for interactive storytelling and companionship, but when deployed to a young audience without proper safeguards, they can pose significant harm. The cases of chatbots producing sexual content for children, encouraging self-harm, or blurring reality for vulnerable teens demonstrate that current safety mechanisms are insufficient. Compounding the issue is the lack of transparency: users often have little understanding of the AI’s origins or limitations, and parents are left in the dark about what their children are experiencing in these apps.&lt;/p&gt;

&lt;p&gt;To address these concerns, a multipronged effort is required. Developers of character-chat apps (and all engineers working with AI) must prioritize safety as a core design principle: implementing robust filtering, employing human moderators, clearly warning users of content risks, and being transparent about their systems. Transparency reports and model cards should be standard, allowing the public to know what steps have been taken to ensure the AI will not inflict harm, and developers must engage with child psychologists and ethics experts when creating youth-facing AI features. Meanwhile, policymakers and regulators should enforce baseline standards. For example, requiring age verification and parental controls by default, holding companies accountable for egregious content failures, and treating certain misuse of AI (like facilitating exploitation of minors) as unlawful. Initiatives like the FTC investigations and Senate inquiries are a start, and they signal that regulators are watching this space closely.&lt;/p&gt;

&lt;p&gt;Educators also have a role in mitigating risks by teaching young people critical thinking around AI. Youth should learn that an AI chatbot, no matter how personable, is &lt;em&gt;not&lt;/em&gt; an authority and can be wrong or harmful. Fostering a healthy skepticism will help teens distance themselves if a chatbot crosses lines. In the end, protecting children and teens in the era of AI companions is an urgent aspect of AI ethics. As one report noted, these AI platforms have essentially become an “online safety threat” for minors if left unchecked. Society must insist on greater responsibility from AI creators — ensuring that innovation in artificial intelligence does not come at the cost of our children’s well-being and safety.&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;DJ Leamen is a Machine Learning and Generative AI Developer and Computer Science student with an interest in emerging technology and ethical development.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://djleamen.substack.com/subscribe" rel="noopener noreferrer"&gt;Subscribe to my newsletter!&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>safety</category>
      <category>transparency</category>
    </item>
  </channel>
</rss>
