<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Muhammed Yasin Yılmaz</title>
    <description>The latest articles on Forem by Muhammed Yasin Yılmaz (@myylogic).</description>
    <link>https://forem.com/myylogic</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3827734%2F07ef89d2-c1bd-472f-9499-616bf42d7c1b.jpeg</url>
      <title>Forem: Muhammed Yasin Yılmaz</title>
      <link>https://forem.com/myylogic</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/myylogic"/>
    <language>en</language>
    <item>
      <title>Building a Full Stack AI Engine From Scratch: The Architecture Behind Cevahir AI</title>
      <dc:creator>Muhammed Yasin Yılmaz</dc:creator>
      <pubDate>Mon, 11 May 2026 14:28:58 +0000</pubDate>
      <link>https://forem.com/myylogic/building-a-full-stack-ai-engine-from-scratch-the-architecture-behind-cevahir-ai-285f</link>
      <guid>https://forem.com/myylogic/building-a-full-stack-ai-engine-from-scratch-the-architecture-behind-cevahir-ai-285f</guid>
      <description>&lt;h1&gt;
  
  
  Building a Full Stack AI Engine From Scratch: The Architecture Behind Cevahir AI
&lt;/h1&gt;

&lt;p&gt;For the last 16 months, I’ve been building an open-source AI infrastructure project called Cevahir AI.&lt;/p&gt;

&lt;p&gt;The original goal wasn’t simply creating another chatbot or wrapping existing APIs with a new interface. I wanted to explore something much deeper:&lt;/p&gt;

&lt;p&gt;What would it look like to build a modular AI engine architecture from the tokenizer layer all the way to reasoning orchestration?&lt;/p&gt;

&lt;p&gt;Most AI projects today focus on a single layer of the stack:&lt;br&gt;
inference APIs,&lt;br&gt;
RAG pipelines,&lt;br&gt;
agent wrappers,&lt;br&gt;
fine-tuning systems,&lt;br&gt;
or prompt engineering workflows.&lt;/p&gt;

&lt;p&gt;Very few projects attempt to unify tokenizer training, neural architectures, training orchestration, model lifecycle management, reasoning systems, and local inference pipelines under a single engineering structure.&lt;/p&gt;

&lt;p&gt;Cevahir AI was created to explore exactly that problem.&lt;/p&gt;

&lt;p&gt;The project is fully open source and designed as a modular AI infrastructure system capable of running locally and offline. Instead of focusing only on model outputs, the architecture focuses on the entire lifecycle of AI systems:&lt;br&gt;
how they tokenize,&lt;br&gt;
how they train,&lt;br&gt;
how they reason,&lt;br&gt;
how they orchestrate decisions,&lt;br&gt;
and how they evolve over time.&lt;/p&gt;

&lt;p&gt;One of the most important engineering decisions behind the project was separating responsibilities aggressively across the system.&lt;/p&gt;

&lt;p&gt;The architecture is divided into multiple independent layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tokenizer Management&lt;/li&gt;
&lt;li&gt;Data Loader Management&lt;/li&gt;
&lt;li&gt;Neural Network&lt;/li&gt;
&lt;li&gt;Model Management&lt;/li&gt;
&lt;li&gt;Training System&lt;/li&gt;
&lt;li&gt;Training Management&lt;/li&gt;
&lt;li&gt;Cognitive Management&lt;/li&gt;
&lt;li&gt;Unified Cevahir Core&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each module owns a specific responsibility while remaining connected through a shared orchestration layer.&lt;/p&gt;

&lt;p&gt;The upper-level Cevahir module acts as the production-facing API layer responsible for inference, generation, routing, memory management, and cognitive orchestration.&lt;/p&gt;

&lt;p&gt;This separation allows training systems and inference systems to evolve independently without turning the infrastructure into a monolithic codebase.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I Focused on the Tokenizer Layer
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3sk0ee3d5cmeiybzxxqv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3sk0ee3d5cmeiybzxxqv.png" alt=" " width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One of the areas I spent the most time on was tokenizer infrastructure.&lt;/p&gt;

&lt;p&gt;Turkish is a morphologically rich and agglutinative language. Traditional English-centric tokenization assumptions create serious fragmentation problems when applied directly to Turkish.&lt;/p&gt;

&lt;p&gt;Instead of treating tokenization as a simple preprocessing step, I approached it as a language-aware infrastructure problem.&lt;/p&gt;

&lt;p&gt;The tokenizer system extends traditional Byte Pair Encoding with Turkish-oriented preprocessing layers including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Turkish lowercase normalization&lt;/li&gt;
&lt;li&gt;Unicode NFC normalization&lt;/li&gt;
&lt;li&gt;Morphological preprocessing&lt;/li&gt;
&lt;li&gt;Syllable-aware fallback mechanisms&lt;/li&gt;
&lt;li&gt;Root-suffix awareness&lt;/li&gt;
&lt;li&gt;OOV recovery systems&lt;/li&gt;
&lt;li&gt;Deterministic merge selection&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal wasn’t only compression efficiency.&lt;/p&gt;

&lt;p&gt;The real objective was reducing fragmentation while preserving semantic continuity across Turkish word structures.&lt;/p&gt;
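&lt;p&gt;To make the first two preprocessing layers concrete: Python's default &lt;code&gt;str.lower()&lt;/code&gt; mishandles Turkish casing because of the dotted/dotless i distinction ("I" must become "ı", and "İ" must become "i"). A minimal sketch of Turkish-aware lowercasing plus NFC normalization (function names are illustrative, not the project's actual API):&lt;/p&gt;

```python
import unicodedata

def turkish_lower(text: str) -> str:
    """Lowercase with Turkish casing rules: 'I' -> dotless 'ı', dotted 'İ' -> 'i'."""
    # Handle the two problem characters before the generic lowercase pass.
    text = text.replace("I", "\u0131").replace("\u0130", "i")
    return text.lower()

def normalize(text: str) -> str:
    """Unicode NFC normalization followed by Turkish-aware lowercasing."""
    return turkish_lower(unicodedata.normalize("NFC", text))

print(normalize("İstanbul"))  # istanbul
print(normalize("ILIK"))      # ılık
```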

&lt;h2&gt;
  
  
  Neural Architecture and Inference Design
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6lg1hp2k9n818g2u5dcb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6lg1hp2k9n818g2u5dcb.png" alt=" " width="800" height="640"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The neural core of Cevahir AI is based on a decoder-only Transformer architecture.&lt;/p&gt;

&lt;p&gt;The infrastructure currently supports modern LLM techniques such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RMSNorm&lt;/li&gt;
&lt;li&gt;RoPE and YaRN scaling&lt;/li&gt;
&lt;li&gt;SwiGLU&lt;/li&gt;
&lt;li&gt;KV-Cache&lt;/li&gt;
&lt;li&gt;Multi-Head Attention&lt;/li&gt;
&lt;li&gt;Grouped Query Attention (GQA)&lt;/li&gt;
&lt;li&gt;Flash Attention&lt;/li&gt;
&lt;li&gt;Sliding Window Attention&lt;/li&gt;
&lt;li&gt;QK-Norm&lt;/li&gt;
&lt;li&gt;Optional Mixture of Experts (MoE)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The design philosophy here is balancing inference efficiency, scalability, VRAM optimization, and training stability without tightly coupling the system to a single architectural direction.&lt;/p&gt;

&lt;p&gt;Rather than building a fixed model, the idea was creating an infrastructure capable of evolving over time.&lt;/p&gt;
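&lt;p&gt;As one small illustration of the components above, RMSNorm rescales activations by their root mean square instead of centering and standardizing them as LayerNorm does. A minimal NumPy sketch of the general technique (not Cevahir AI's actual implementation):&lt;/p&gt;

```python
import numpy as np

def rms_norm(x: np.ndarray, gamma: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """RMSNorm: divide by the root mean square of the last axis, then scale by gamma."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * gamma

d = 8
x = np.random.randn(2, d)
y = rms_norm(x, gamma=np.ones(d))
# Each normalized row now has a root mean square of approximately 1.
print(np.sqrt(np.mean(y * y, axis=-1)))
```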

&lt;h2&gt;
  
  
  Cognitive Orchestration
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fulna5vj97r8mn9vbsekh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fulna5vj97r8mn9vbsekh.png" alt=" " width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One of the most experimental parts of the project is the Cognitive Management layer.&lt;/p&gt;

&lt;p&gt;I became increasingly interested in a question:&lt;/p&gt;

&lt;p&gt;What happens after text generation?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdq6zi151u16vy7ap6fv5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdq6zi151u16vy7ap6fv5.png" alt=" " width="800" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Most systems stop once the model produces a response.&lt;br&gt;
I wanted to explore architectures where inference itself could become more reflective and adaptive.&lt;/p&gt;

&lt;p&gt;The cognitive orchestration layer combines concepts inspired by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Chain of Thought&lt;/li&gt;
&lt;li&gt;Tree of Thoughts&lt;/li&gt;
&lt;li&gt;Self Consistency&lt;/li&gt;
&lt;li&gt;ReAct&lt;/li&gt;
&lt;li&gt;Self Refine&lt;/li&gt;
&lt;li&gt;Constitutional AI&lt;/li&gt;
&lt;li&gt;Retrieval-Augmented Memory&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system can route reasoning strategies dynamically, apply refinement loops, integrate memory-aware reasoning, and evaluate outputs before finalizing responses.&lt;/p&gt;

&lt;p&gt;The long-term philosophy is simple:&lt;/p&gt;

&lt;p&gt;Inference should not only generate.&lt;br&gt;
Inference should also think.&lt;/p&gt;

&lt;h2&gt;
  
  
  Long-Term Vision
&lt;/h2&gt;

&lt;p&gt;Cevahir AI is currently focused primarily on text infrastructure.&lt;/p&gt;

&lt;p&gt;However, the architecture was intentionally designed to remain extensible toward:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;vision tokenizer systems&lt;/li&gt;
&lt;li&gt;audio tokenizer infrastructures&lt;/li&gt;
&lt;li&gt;multimodal reasoning&lt;/li&gt;
&lt;li&gt;real-time sensor processing&lt;/li&gt;
&lt;li&gt;embodied AI systems&lt;/li&gt;
&lt;li&gt;real-time inference pipelines interacting with physical systems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A large part of the inspiration behind this direction comes from embodied AI research such as PaLM-E, RT-2, and SayCan.&lt;/p&gt;

&lt;p&gt;The long-term objective is not merely generating text outputs.&lt;/p&gt;

&lt;p&gt;The goal is building modular AI infrastructure capable of perceiving, interpreting, reasoning about, and eventually interacting with the real world.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Cevahir AI is not a finished product.&lt;/p&gt;

&lt;p&gt;It’s an ongoing exploration of what a modular full-stack AI engine architecture could look like when tokenizer systems, neural architectures, reasoning layers, training orchestration, and inference pipelines are treated as parts of the same ecosystem instead of isolated tools.&lt;/p&gt;

&lt;p&gt;The project is open source and still evolving rapidly.&lt;/p&gt;

&lt;p&gt;GitHub:&lt;br&gt;
&lt;a href="https://github.com/myylogic/cevahir-ai" rel="noopener noreferrer"&gt;myylogic/cevahir-ai&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>opensource</category>
      <category>programming</category>
    </item>
    <item>
      <title>meet cevahir ai</title>
      <dc:creator>Muhammed Yasin Yılmaz</dc:creator>
      <pubDate>Sat, 21 Mar 2026 19:21:43 +0000</pubDate>
      <link>https://forem.com/myylogic/meet-cevahir-ai-3d5m</link>
      <guid>https://forem.com/myylogic/meet-cevahir-ai-3d5m</guid>
<description>&lt;p&gt;&lt;a href="https://dev.to/myylogic/building-an-open-source-ai-engine-for-training-language-models-cevahir-ai-5fe1"&gt;Building an Open-Source AI Engine for Training Language Models — Cevahir AI&lt;/a&gt; (3 min read)&lt;/p&gt;
</description>
      <category>ai</category>
      <category>opensource</category>
      <category>machinelearning</category>
      <category>phyton</category>
    </item>
    <item>
      <title>Building an Open-Source AI Engine for Training Language Models — Cevahir AI</title>
      <dc:creator>Muhammed Yasin Yılmaz</dc:creator>
      <pubDate>Mon, 16 Mar 2026 16:59:58 +0000</pubDate>
      <link>https://forem.com/myylogic/building-an-open-source-ai-engine-for-training-language-models-cevahir-ai-5fe1</link>
      <guid>https://forem.com/myylogic/building-an-open-source-ai-engine-for-training-language-models-cevahir-ai-5fe1</guid>
      <description>&lt;p&gt;For the past several months, I have been building an open-source AI engine called Cevahir AI.&lt;/p&gt;

&lt;p&gt;The goal of the project is to create a modular infrastructure for training language models from scratch. Instead of focusing only on a single model architecture, the project aims to provide a full AI production pipeline including tokenizer training, vocabulary management, neural network architecture and training orchestration.&lt;/p&gt;

&lt;p&gt;Cevahir AI is designed as an end-to-end AI development system, allowing developers and researchers to experiment with language model training pipelines in a transparent and modular way.&lt;/p&gt;

&lt;p&gt;The project is open-source and available on GitHub.&lt;/p&gt;

&lt;p&gt;⸻&lt;/p&gt;

&lt;h2&gt;Project Vision&lt;/h2&gt;

&lt;p&gt;Today most modern AI systems are developed inside large organizations with complex internal infrastructures. Independent developers rarely have access to the full engineering pipeline behind language model training.&lt;/p&gt;

&lt;p&gt;The motivation behind Cevahir AI was to build a system where the entire pipeline is visible, understandable and modifiable.&lt;/p&gt;

&lt;p&gt;Instead of providing a single monolithic implementation, the project focuses on creating an AI engine architecture that can be extended and experimented with.&lt;/p&gt;

&lt;p&gt;The system aims to make it easier to explore questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How tokenization pipelines affect model behavior&lt;/li&gt;
&lt;li&gt;How vocabulary structures evolve during training&lt;/li&gt;
&lt;li&gt;How neural network modules interact inside a language model system&lt;/li&gt;
&lt;li&gt;How training pipelines can be orchestrated in modular ways&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;⸻&lt;/p&gt;

&lt;h2&gt;Architecture Overview&lt;/h2&gt;

&lt;p&gt;Cevahir AI is structured as a modular AI engine composed of multiple system layers.&lt;/p&gt;

&lt;p&gt;The architecture currently includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tokenizer management system&lt;/li&gt;
&lt;li&gt;Vocabulary building system&lt;/li&gt;
&lt;li&gt;Neural network architecture modules&lt;/li&gt;
&lt;li&gt;Data loader and dataset pipeline&lt;/li&gt;
&lt;li&gt;Training orchestration system&lt;/li&gt;
&lt;li&gt;Model management and persistence&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system currently contains 650+ modules organized under 12 core architecture layers.&lt;/p&gt;

&lt;p&gt;This modular design allows different parts of the AI pipeline to evolve independently while still operating within a unified system.&lt;/p&gt;

&lt;p&gt;⸻&lt;/p&gt;

&lt;h2&gt;Tokenization System&lt;/h2&gt;

&lt;p&gt;One of the core components of the project is the tokenizer infrastructure.&lt;/p&gt;

&lt;p&gt;The tokenizer system includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Custom BPE tokenization&lt;/li&gt;
&lt;li&gt;Turkish-optimized text preprocessing&lt;/li&gt;
&lt;li&gt;Vocabulary generation&lt;/li&gt;
&lt;li&gt;Token position and frequency tracking&lt;/li&gt;
&lt;li&gt;Dataset preparation pipeline&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Unlike many simplified implementations, the tokenizer layer is designed as a production-style modular system that can be reused across multiple model experiments.&lt;/p&gt;
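&lt;p&gt;The core of any BPE trainer is a loop that counts adjacent symbol pairs across the corpus and merges the most frequent pair into a new token. A minimal, illustrative sketch of that single step (not the project's production code):&lt;/p&gt;

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across a word-frequency corpus, return the top pair."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)

def merge_pair(pair, words):
    """Apply one BPE merge: replace every occurrence of `pair` with its joined symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Toy Turkish corpus: word (pre-split into characters) -> frequency.
corpus = {tuple("evler"): 4, tuple("evde"): 3, tuple("ev"): 5}
pair = most_frequent_pair(corpus)
print(pair)  # ('e', 'v') -- the most frequent adjacent pair
corpus = merge_pair(pair, corpus)
```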

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4tpx11ghdbjg2hy0vxq5.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4tpx11ghdbjg2hy0vxq5.jpeg" alt=" " width="800" height="120"&gt;&lt;/a&gt;&lt;br&gt;
⸻&lt;/p&gt;

&lt;h2&gt;Neural Network Architecture&lt;/h2&gt;

&lt;p&gt;The neural network layer of Cevahir AI is built as a modular system rather than a single rigid architecture.&lt;/p&gt;

&lt;p&gt;The design allows different neural components to be composed and tested in different configurations.&lt;/p&gt;

&lt;p&gt;The architecture supports experimentation with components such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Transformer-style attention layers&lt;/li&gt;
&lt;li&gt;Modular neural blocks&lt;/li&gt;
&lt;li&gt;Dynamic memory layers&lt;/li&gt;
&lt;li&gt;Cognitive strategy layers&lt;/li&gt;
&lt;li&gt;Context processing pipelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This approach allows researchers and developers to explore new architectural ideas without rewriting the entire system.&lt;/p&gt;

&lt;p&gt;⸻&lt;/p&gt;

&lt;h2&gt;Training Pipeline&lt;/h2&gt;

&lt;p&gt;Training orchestration is handled through a dedicated training system that coordinates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;dataset loading&lt;/li&gt;
&lt;li&gt;tokenization&lt;/li&gt;
&lt;li&gt;vocabulary updates&lt;/li&gt;
&lt;li&gt;neural network training&lt;/li&gt;
&lt;li&gt;model checkpointing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal of this layer is to simulate a real AI training pipeline rather than a simplified research script.&lt;/p&gt;

&lt;p&gt;This makes the project useful for developers who want to study how full AI systems are engineered.&lt;/p&gt;
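&lt;p&gt;The coordination described above can be sketched as a small orchestration loop. The stubs below stand in for the real subsystems; the structure, not the names, is the point (none of this is Cevahir AI's actual API):&lt;/p&gt;

```python
def train(dataset, tokenizer, model, epochs=2, checkpoint_every=1, log=print):
    """Minimal orchestration loop: tokenize -> training step -> periodic checkpoint."""
    history = []
    for epoch in range(epochs):
        for text in dataset:
            tokens = tokenizer(text)
            history.append(model.step(tokens))
        if (epoch + 1) % checkpoint_every == 0:
            model.save(f"checkpoint-epoch{epoch + 1}.pt")
            log(f"epoch {epoch + 1}: saved checkpoint")
    return history

# Stubs standing in for the real tokenizer and neural network subsystems.
class StubModel:
    def __init__(self):
        self.saved = []
    def step(self, tokens):
        return 1.0 / (len(self.saved) + len(tokens))  # pretend loss shrinks over time
    def save(self, path):
        self.saved.append(path)

model = StubModel()
losses = train(["merhaba dünya", "ev evler evde"], str.split, model)
print(model.saved)  # one checkpoint path per epoch
```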

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi1dv31bztgtfypzgr092.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi1dv31bztgtfypzgr092.jpeg" alt=" " width="800" height="165"&gt;&lt;/a&gt;&lt;br&gt;
⸻&lt;/p&gt;

&lt;h2&gt;Why Open Source?&lt;/h2&gt;

&lt;p&gt;One of the main goals of Cevahir AI is transparency.&lt;/p&gt;

&lt;p&gt;Artificial intelligence development is increasingly becoming centralized in large organizations. By releasing the entire AI engine infrastructure as open source, the project aims to provide developers with the ability to study and experiment with AI systems more freely.&lt;/p&gt;

&lt;p&gt;Open source allows the community to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;inspect the full architecture&lt;/li&gt;
&lt;li&gt;suggest improvements&lt;/li&gt;
&lt;li&gt;experiment with new modules&lt;/li&gt;
&lt;li&gt;build alternative model architectures on top of the system&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;⸻&lt;/p&gt;

&lt;h2&gt;GitHub Repository&lt;/h2&gt;

&lt;p&gt;The full project is available here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/myylogic/cevahir-ai" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you find the project interesting, consider leaving a star on GitHub.&lt;br&gt;
Developer feedback and architectural suggestions are always welcome.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>machinelearning</category>
      <category>phyton</category>
    </item>
  </channel>
</rss>
