<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Mhamad El Itawi</title>
    <description>The latest articles on Forem by Mhamad El Itawi (@mhamadelitawi).</description>
    <link>https://forem.com/mhamadelitawi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F700126%2Fd465da8d-c397-48cc-856c-21791843e74e.jpg</url>
      <title>Forem: Mhamad El Itawi</title>
      <link>https://forem.com/mhamadelitawi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/mhamadelitawi"/>
    <language>en</language>
    <item>
      <title>🚩 Red flags series #4: Pull request monsters</title>
      <dc:creator>Mhamad El Itawi</dc:creator>
      <pubDate>Wed, 10 Dec 2025 18:00:00 +0000</pubDate>
      <link>https://forem.com/mhamadelitawi/red-flags-series-4-pull-request-monsters-1o0j</link>
      <guid>https://forem.com/mhamadelitawi/red-flags-series-4-pull-request-monsters-1o0j</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;📌 This post is one chapter in my Red Flags series. I’m exploring the mistakes, bad practices, and subtle issues we often overlook in day-to-day development. Stay tuned for upcoming posts!&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;When one pull request tries to do the work of an entire sprint&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A pull request monster is born when a simple change starts collecting friends. A feature here, a refactor there, a quick rename, a small cleanup, and suddenly the PR grows into something huge that nobody feels ready to review. It rarely happens on purpose. Someone wants to finish a task, adds a small fix, improves something that has bothered them for months, and does a bit of restructuring “while I am here.” Before long, the PR touches dozens of files and feels massive, unfocused and unpredictable.&lt;/p&gt;

&lt;p&gt;Here is where the real pain begins. A monster PR is exhausting to read. Reviewers cannot get through it in one sitting. Important details hide inside unrelated changes. Comments scatter in every direction. The author struggles to explain everything. The reviewer struggles to understand anything. In that confusion, bugs slip through quietly. Testing becomes harder because the PR is no longer one change. It is several changes tied together with no clean separation. When something breaks after merging, identifying the root cause becomes slow and frustrating. And if the team needs a partial rollback, it turns into a nightmare because all the changes are bundled together. You cannot revert one part without dragging the rest of the mess with it.&lt;/p&gt;

&lt;p&gt;PR monsters can be prevented. The idea is simple and powerful. Split work into smaller, focused pull requests. Keep unrelated edits out. Ship pieces early. Communicate clearly what each PR is meant to do. Let every PR solve one problem and solve it well. Smaller PRs improve review quality, reduce risk and help teams move faster with less stress.&lt;/p&gt;

&lt;p&gt;If your pull request requires snacks, breaks and extra courage to review, it might be time to shrink it. Your teammates and your future self will be grateful.&lt;/p&gt;

&lt;p&gt;Follow me on &lt;a href="https://www.linkedin.com/in/mhamadelitawi/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; and &lt;a href="https://dev.to/mhamadelitawi"&gt;dev.to&lt;/a&gt; for more practical engineering insights.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>cleancode</category>
      <category>coding</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>🚩 Red flags series #3: If-else endless tower</title>
      <dc:creator>Mhamad El Itawi</dc:creator>
      <pubDate>Mon, 08 Dec 2025 18:00:00 +0000</pubDate>
      <link>https://forem.com/mhamadelitawi/red-flags-series-3-if-else-endless-tower-33ch</link>
      <guid>https://forem.com/mhamadelitawi/red-flags-series-3-if-else-endless-tower-33ch</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;📌 This post is one chapter in my Red Flags series. I’m exploring the mistakes, bad practices, and subtle issues we often overlook in day-to-day development. Stay tuned for upcoming posts!&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;When your logic starts stacking floors&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Some code doesn’t just grow; it builds upward. A tiny “if” becomes two… then five… then fourteen, until you’re staring at an if-else tower tall enough to cast a shadow over your whole feature.&lt;/p&gt;

&lt;p&gt;The worst part? These towers rise quietly. One quick fix here, one edge case there, and soon the real intent is buried under layers of branching. Reading it feels like climbing stairs, and testing it feels even worse: every new branch triggers a different path, mocks pile up, and the slightest change can make the whole structure wobble.&lt;/p&gt;

&lt;p&gt;New developers don’t approach this structure with confidence, they approach it with caution, like tourists inspecting a suspicious old building.&lt;/p&gt;

&lt;p&gt;Luckily, towers can come down.&lt;/p&gt;

&lt;p&gt;A great first step is replacing the branching chain with a map or dictionary that acts like a clean switch: each key has its behavior, easy to see and easier to extend. When behaviors differ in meaningful ways, polymorphism is even better. Let each variation live in its own class instead of competing inside one giant conditional.&lt;/p&gt;
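&lt;p&gt;Here is a minimal sketch of that idea; the shipping methods and rates are invented for illustration:&lt;/p&gt;

```javascript
// Sketch: a lookup map replaces an if/else chain over "method".
const SHIPPING_RATES = {
  standard: function (weight) { return weight * 1.0; },
  express: function (weight) { return weight * 2.5; },
  overnight: function (weight) { return weight * 4.0; }
};

function shippingCost(method, weight) {
  const rate = SHIPPING_RATES[method];
  if (rate === undefined) {
    throw new Error("Unknown shipping method: " + method);
  }
  return rate(weight);
}
```

&lt;p&gt;Adding a new behavior becomes one new entry in the map instead of another floor on the tower.&lt;/p&gt;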

&lt;p&gt;And for simple cases, the if/return strategy works wonders. Handle special conditions early, return immediately, and keep the logic flat instead of nesting it like Russian dolls.&lt;/p&gt;
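&lt;p&gt;A quick sketch of the if/return style, with made-up discount rules and user fields:&lt;/p&gt;

```javascript
// Sketch: guard clauses keep the logic flat instead of nested.
function discountFor(user) {
  if (!user) return 0;          // guard: no user at all
  if (!user.active) return 0;   // guard: inactive accounts get nothing
  if (user.vip) return 20;      // special case handled early and visibly
  if (user.loyal) return 10;
  return 5;                     // the default case stays flat, not buried
}
```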

&lt;p&gt;Once the logic is split, the code becomes readable, testable, and far easier to change. Adding new behavior becomes an extension, not a renovation project. Your teammates stop tiptoeing around the file, and your future self stops sighing every time they open it.&lt;/p&gt;

&lt;p&gt;If your conditionals require scrolling, courage, and maybe hydration, it’s probably time to bring the tower down. Your team will thank you.&lt;/p&gt;

&lt;p&gt;Follow me on &lt;a href="https://www.linkedin.com/in/mhamadelitawi/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; and &lt;a href="https://dev.to/mhamadelitawi"&gt;dev.to&lt;/a&gt; for more practical engineering insights.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>cleancode</category>
      <category>coding</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>🚩 Red flags series #2: God functions</title>
      <dc:creator>Mhamad El Itawi</dc:creator>
      <pubDate>Sun, 07 Dec 2025 18:00:00 +0000</pubDate>
      <link>https://forem.com/mhamadelitawi/red-flags-series-2-god-functions-5d5b</link>
      <guid>https://forem.com/mhamadelitawi/red-flags-series-2-god-functions-5d5b</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;📌 This post is one chapter in my Red Flags series. I’m exploring the mistakes, bad practices, and subtle issues we often overlook in day-to-day development. Stay tuned for upcoming posts!&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;When one function tries to run the universe&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you have worked in any codebase long enough, you have probably met a "god function". It is that one function that seems to know everything, do everything, control everything and occasionally summon thunder if you are not careful.&lt;/p&gt;

&lt;p&gt;A god function usually starts with good intentions. Someone just wants to ship a feature quickly. Then a bit of validation gets added. Then a database call. Then some business logic. Then a couple of notifications. Before you know it, the function has grown into a mythical creature that nobody wants to touch during a code review.&lt;/p&gt;

&lt;p&gt;Here is where the real pain begins. A god function mixes responsibilities that should never live together. Reading it feels like scrolling through the entire application in one file. Testing it becomes a nightmare because you need to mock or spin up half the system just to validate a tiny behavior. A simple unit test suddenly requires databases, external services, side effects and heroic patience. And once tests become difficult, people stop writing them which leads directly to more bugs and more fear around changes.&lt;/p&gt;

&lt;p&gt;Changing one line becomes risky because everything is tightly coupled. New developers stare at it the way tourists stare at ancient monuments: with curiosity, fear and confusion.&lt;/p&gt;

&lt;p&gt;The good news is that god functions are not eternal. The solution is straightforward, even if it takes a bit of discipline. Break them into smaller focused functions. Group related behaviors together. Move business logic away from infrastructure and side effects. Create clear boundaries and let each function do one job well.&lt;/p&gt;
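&lt;p&gt;As a rough sketch, here is what that separation can look like for an invented signup flow (the step names and dependencies are illustrative, not a prescription):&lt;/p&gt;

```javascript
// Sketch: one god function split into focused, independently testable steps.
function validateSignup(input) {
  if (!input.email || !input.email.includes("@")) {
    throw new Error("invalid email");
  }
  return { email: input.email.trim().toLowerCase() };
}

function saveUser(db, user) {
  db.users.push(user); // stand-in for a real repository or ORM call
  return user;
}

function notifyUser(send, user) {
  send("Welcome, " + user.email); // side effect kept at the edge
}

// The former god function shrinks to a thin orchestrator:
function registerUser(deps, input) {
  const user = validateSignup(input);
  saveUser(deps.db, user);
  notifyUser(deps.send, user);
  return user;
}
```

&lt;p&gt;Each piece can now be read, tested, and changed on its own, and the orchestrator stays small enough to review at a glance.&lt;/p&gt;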

&lt;p&gt;Once the responsibilities are separated, the code becomes readable, testable and much easier to reason about. Small functions reduce cognitive load, make refactoring safer and give your team confidence to make changes without fear.&lt;/p&gt;

&lt;p&gt;If your function requires scroll bars and multiple coffee breaks to understand it, it might be time to break the divine cycle. Your future self and your teammates will thank you.&lt;/p&gt;

&lt;p&gt;Follow me on &lt;a href="https://www.linkedin.com/in/mhamadelitawi/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; and &lt;a href="https://dev.to/mhamadelitawi"&gt;dev.to&lt;/a&gt; for more practical engineering insights.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>cleancode</category>
      <category>coding</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>🚩 Red flags series #1: Hard-coded credentials and configuration</title>
      <dc:creator>Mhamad El Itawi</dc:creator>
      <pubDate>Sat, 06 Dec 2025 19:43:04 +0000</pubDate>
      <link>https://forem.com/mhamadelitawi/red-flags-series-1-hard-coded-credentials-and-configuration-3p6h</link>
      <guid>https://forem.com/mhamadelitawi/red-flags-series-1-hard-coded-credentials-and-configuration-3p6h</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;📌 This post is one chapter in my Red Flags series. I’m exploring the mistakes, bad practices, and subtle issues we often overlook in day-to-day development. Stay tuned for upcoming posts!&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;When your code treats secrets like regular variables.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Hard-coding secrets and configuration values means putting sensitive data like API keys, tokens, or database URLs directly into your source code. It feels like a harmless shortcut in the moment, but once these values enter your repository, they become long-term technical debt waiting to resurface at the worst possible time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const DB_USER = "admin";
const DB_PASSWORD = "supersecret123";
const DB_HOST = "prod-db.example.com";

const connectionString = `postgres://${DB_USER}:${DB_PASSWORD}@${DB_HOST}/app`;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The biggest issue is that you eventually forget where they are. That temporary token you dropped into a file during crunch time is now spread across branches, backups, and even your Git history. Once a secret enters version control, it becomes essentially immortal. Hard-coding also breaks environment isolation. Staging and production start behaving like they share the same apartment, and one careless commit can accidentally send staging traffic straight into prod.&lt;/p&gt;

&lt;p&gt;Simple changes also become more painful than they should be. Updating a config value suddenly requires a commit, a PR, a review, and a deployment. And because these values live inside the code, more people need repo access, violating the principle of least privilege and increasing exposure. Compliance frameworks like HIPAA absolutely love that… just kidding, they hate it with a passion.&lt;/p&gt;

&lt;p&gt;The fix is straightforward: get secrets out of your code and into proper storage. Use environment variables, external config files, or runtime injection. Adopt a vault like HashiCorp Vault or cloud-native solutions such as AWS Secrets Manager, GCP Secret Manager, or Azure Key Vault. In containerized environments, inject secrets through volumes or orchestrator tools so they never touch the image. This keeps access limited, rotation easy, and onboarding safer: new developers don’t need sensitive values just to run the app.&lt;/p&gt;
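&lt;p&gt;A minimal sketch of the environment-variable approach (the variable names mirror the earlier snippet and are otherwise arbitrary):&lt;/p&gt;

```javascript
// Sketch: configuration read from the environment, validated at startup.
function loadDbConfig(env) {
  const required = ["DB_USER", "DB_PASSWORD", "DB_HOST"];
  const missing = required.filter(function (name) { return !env[name]; });
  if (missing.length !== 0) {
    throw new Error("Missing required env vars: " + missing.join(", "));
  }
  return { user: env.DB_USER, password: env.DB_PASSWORD, host: env.DB_HOST };
}

// In a real app you would call loadDbConfig(process.env) once at startup,
// so a missing secret fails fast instead of surfacing mid-request.
```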

&lt;p&gt;Hard-coding feels fast, but it always comes back to haunt you. Put your secrets and configuration where they belong, and your systems and your audits will be much happier.&lt;/p&gt;

&lt;p&gt;Follow me on &lt;a href="https://www.linkedin.com/in/mhamadelitawi/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; and &lt;a href="https://dev.to/mhamadelitawi"&gt;dev.to&lt;/a&gt; for more practical engineering insights.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>cleancode</category>
      <category>coding</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>How ChatGPT Was Made: Behind the Scenes of a Large Language Model</title>
      <dc:creator>Mhamad El Itawi</dc:creator>
      <pubDate>Sun, 26 Oct 2025 18:49:36 +0000</pubDate>
      <link>https://forem.com/mhamadelitawi/how-chatgpt-was-made-behind-the-scenes-of-a-large-language-model-38gd</link>
      <guid>https://forem.com/mhamadelitawi/how-chatgpt-was-made-behind-the-scenes-of-a-large-language-model-38gd</guid>
      <description>&lt;p&gt;Over the past few years, ChatGPT has become one of the most widely used AI tools in the world, powering everything from casual conversations to technical coding help, tutoring, writing, customer service, and more. But behind its friendly interface lies a staggering amount of complexity, engineering, and innovation.&lt;/p&gt;

&lt;p&gt;This article pulls back the curtain on how ChatGPT was built: not just as a product, but as a large language model (LLM) trained on massive datasets, guided by human feedback, and optimized for usefulness and safety. We’ll explore the key stages in its development, from gathering data and training the base model to aligning it with human values and deploying it at scale.&lt;/p&gt;

&lt;p&gt;Along the way, we’ll also highlight deeper insights from AI researcher Andrej Karpathy, who offers a more technical and nuanced view into how models like ChatGPT actually work under the hood.&lt;/p&gt;

&lt;p&gt;Whether you’re a developer, a curious tech enthusiast, or someone building with AI, this guide will help you understand the full pipeline and the challenges behind the magic of ChatGPT.&lt;/p&gt;

&lt;h1&gt;
  
  
  🧠 What Is a Large Language Model?
&lt;/h1&gt;

&lt;p&gt;A Large Language Model (LLM) is a type of artificial intelligence model trained to understand and generate human language. But unlike traditional rule-based programs, LLMs don’t follow hand-coded instructions; they learn language patterns by analyzing massive amounts of text.&lt;/p&gt;

&lt;p&gt;At its core, an LLM is a predictive engine. It doesn’t “know” things in the way humans do; instead, it learns to guess the next word (or token) in a sequence based on everything that came before it. It might sound simple, but when scaled up with billions of examples and parameters, this ability leads to incredibly powerful behavior: writing essays, translating languages, answering questions, generating code, and even solving math problems.&lt;/p&gt;

&lt;h2&gt;
  
  
  🤖 How It Works
&lt;/h2&gt;

&lt;p&gt;Most LLMs today, including ChatGPT, are based on the Transformer architecture, introduced in the paper “Attention Is All You Need”. This architecture enables the model to process entire sequences of text in parallel and “pay attention” to different parts of the input, capturing long-range dependencies and meaning.&lt;/p&gt;

&lt;p&gt;Here’s a simplified flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Text is broken into tokens (sub-word units).&lt;/li&gt;
&lt;li&gt;Each token is embedded into a high-dimensional vector.&lt;/li&gt;
&lt;li&gt;These vectors pass through multiple layers of self-attention and feed-forward networks.&lt;/li&gt;
&lt;li&gt;The final layer produces a probability distribution for the next token.&lt;/li&gt;
&lt;li&gt;This process repeats until the output is complete.&lt;/li&gt;
&lt;/ol&gt;
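&lt;p&gt;The loop in step 5 can be sketched with a stub in place of the real network; the stub “model” and greedy argmax decoding are simplifications, not how production systems decode:&lt;/p&gt;

```javascript
// Toy version of the generation loop; "model" is a stub returning a
// probability distribution where a real Transformer forward pass would run.
function argmax(probs) {
  return probs.indexOf(Math.max.apply(null, probs));
}

function generate(model, prompt, steps) {
  const tokens = prompt.slice();
  let i = 0;
  while (i !== steps) {
    const probs = model(tokens);  // step 4: distribution over the next token
    tokens.push(argmax(probs));   // greedy choice; real systems often sample
    i += 1;
  }
  return tokens;
}
```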

&lt;h2&gt;
  
  
  📏 What Makes It Large?
&lt;/h2&gt;

&lt;p&gt;The “large” in LLM refers to several factors:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model size: billions (or even trillions) of parameters (GPT-3 has 175B, GPT-4 likely more).&lt;/li&gt;
&lt;li&gt;Training data: hundreds of billions of tokens from books, web pages, articles, and code.&lt;/li&gt;
&lt;li&gt;Compute resources: large clusters of GPUs or TPUs are required for training.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  🧬 Emergent Capabilities
&lt;/h2&gt;

&lt;p&gt;Interestingly, scaling up these models doesn’t just make them “better”; it gives them new abilities that weren’t explicitly programmed in. This includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-turn dialogue&lt;/li&gt;
&lt;li&gt;Complex reasoning&lt;/li&gt;
&lt;li&gt;Programming skills&lt;/li&gt;
&lt;li&gt;Adapting tone and writing style&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These emergent behaviors are a big reason why LLMs like ChatGPT feel so surprisingly capable.&lt;/p&gt;

&lt;h2&gt;
  
  
  📚 From the Internet to Tokens: Data Collection and Curation
&lt;/h2&gt;

&lt;p&gt;Before a large language model like ChatGPT can learn anything, it needs data, and a lot of it. This is the raw material from which the model learns the structure, vocabulary, logic, and quirks of human language.&lt;/p&gt;

&lt;p&gt;But this isn’t just a copy-paste job from the internet. Collecting and curating the training data is one of the most critical and complex steps in building a reliable and safe LLM.&lt;/p&gt;

&lt;h3&gt;
  
  
  🌐 Where the Data Comes From
&lt;/h3&gt;

&lt;p&gt;The training data for models like GPT-3 and GPT-4 spans a wide range of sources, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Web pages (via Common Crawl)&lt;/li&gt;
&lt;li&gt;Wikipedia&lt;/li&gt;
&lt;li&gt;Books (public domain and licensed)&lt;/li&gt;
&lt;li&gt;News articles and blogs&lt;/li&gt;
&lt;li&gt;Forums and Q&amp;amp;A sites&lt;/li&gt;
&lt;li&gt;Code repositories (like GitHub)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This mix gives the model a broad understanding of different writing styles, topics, and domains, from casual internet slang to formal academic writing and technical documentation.&lt;/p&gt;

&lt;h2&gt;
  
  
  🧹 Cleaning the Data
&lt;/h2&gt;

&lt;p&gt;Raw web data is messy. It contains spam, duplicated pages, broken formatting, offensive content, and sometimes personal information. That’s why heavy filtering and cleaning steps are required:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deduplication: Removing repeated documents and boilerplate (e.g., cookie banners, templates).&lt;/li&gt;
&lt;li&gt;Language filtering: Keeping only high-quality examples in supported languages.&lt;/li&gt;
&lt;li&gt;Toxic content filtering: Removing hate speech, profanity, and harmful ideologies.&lt;/li&gt;
&lt;li&gt;PII removal: Scrubbing names, phone numbers, emails, and any personally identifiable information.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;OpenAI and others often use custom heuristics and large-scale classifiers to automate this process at scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  ⚖️ Quality vs Quantity
&lt;/h2&gt;

&lt;p&gt;There’s always a trade-off between data volume and data quality. More data generally leads to better generalization, but if the dataset includes too much low-quality or biased content, the model can learn the wrong things.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Too much code from a single language might skew the model’s programming skills.&lt;/li&gt;
&lt;li&gt;Too many English examples can make multilingual support weaker.&lt;/li&gt;
&lt;li&gt;Toxic forums can teach the model harmful patterns.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s why modern LLM training pipelines invest heavily in data curation, not just collection. The goal is to give the model the richest and most diverse understanding of language possible, without exposing it to misinformation or harmful patterns.&lt;/p&gt;

&lt;h2&gt;
  
  
  🔐 Closed vs Open Datasets
&lt;/h2&gt;

&lt;p&gt;It’s worth noting: many companies don’t release their exact training datasets, especially for models like GPT-4. This is partly due to licensing, partly due to safety concerns, and partly to maintain competitive advantage. However, open-source projects like The Pile, RedPajama, and FineWeb attempt to replicate similar datasets for public research.&lt;/p&gt;

&lt;h1&gt;
  
  
  How the Model Reads: Tokenization
&lt;/h1&gt;

&lt;p&gt;To process human language, a large language model first needs to convert it into a format it can understand, and that format is tokens. Tokenization is the first transformation that happens to any input text, and it plays a crucial role in how the model sees and reasons about language.&lt;/p&gt;

&lt;h2&gt;
  
  
  🧩 What Is a Token?
&lt;/h2&gt;

&lt;p&gt;A token is not always a full word. It can be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A word: "apple" → 1 token&lt;/li&gt;
&lt;li&gt;A subword: "unbelievable" → "un", "believ", "able"&lt;/li&gt;
&lt;li&gt;A symbol or piece of punctuation: ".", ",", "#" → 1 token each&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Tokenization allows the model to efficiently handle any kind of text, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Misspellings&lt;/li&gt;
&lt;li&gt;Slang or compound words&lt;/li&gt;
&lt;li&gt;Multiple languages&lt;/li&gt;
&lt;li&gt;Programming code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This subword-based system is more flexible than working with whole words and helps the model deal with rare or unfamiliar inputs.&lt;/p&gt;

&lt;h2&gt;
  
  
  ⚙️ Byte Pair Encoding (BPE) and Variants
&lt;/h2&gt;

&lt;p&gt;Most modern LLMs use Byte Pair Encoding (BPE) or its variants for tokenization. Here's how it works in simple terms:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Start with individual characters or bytes.&lt;/li&gt;
&lt;li&gt;Repeatedly merge the most frequent adjacent pairs.&lt;/li&gt;
&lt;li&gt;Build up a vocabulary of common subword units.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The result is a fixed-size vocabulary, typically around 50,000 to 100,000 tokens, that can efficiently encode most languages and formats.&lt;/p&gt;
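&lt;p&gt;One merge step of that procedure can be sketched in a few lines; this toy version operates on character tokens and skips everything a production tokenizer needs:&lt;/p&gt;

```javascript
// Toy BPE step: count adjacent pairs, then merge the most frequent one.
function mostFrequentPair(tokens) {
  const counts = {};
  tokens.slice(0, -1).forEach(function (tok, i) {
    const key = tok + " " + tokens[i + 1];
    counts[key] = (counts[key] || 0) + 1;
  });
  const max = Math.max.apply(null, Object.keys(counts).map(function (k) { return counts[k]; }));
  const best = Object.keys(counts).find(function (k) { return counts[k] === max; });
  return best.split(" ");
}

function mergePair(tokens, first, second) {
  const out = [];
  let i = 0;
  while (i !== tokens.length) {
    if (tokens[i] === first) {
      if (tokens[i + 1] === second) {
        out.push(first + second); // the pair becomes one new vocabulary entry
        i += 2;
        continue;
      }
    }
    out.push(tokens[i]);
    i += 1;
  }
  return out;
}
```

&lt;p&gt;Repeating this until the vocabulary reaches its target size is, in essence, how the subword vocabulary is built.&lt;/p&gt;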

&lt;p&gt;Other models may use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Unigram Language Model tokenization&lt;/li&gt;
&lt;li&gt;SentencePiece&lt;/li&gt;
&lt;li&gt;tiktoken (used in OpenAI’s models, optimized for speed and consistency)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  🧮 Why Tokenization Matters
&lt;/h2&gt;

&lt;p&gt;Tokenization shapes how the model “sees” everything:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It defines the input length limit (e.g., 8k, 32k, or even 128k tokens).&lt;/li&gt;
&lt;li&gt;It affects memory efficiency and compute cost.&lt;/li&gt;
&lt;li&gt;It impacts the granularity of language understanding.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The sentence “ChatGPT is amazing!” might be 4 tokens.&lt;/li&gt;
&lt;li&gt;But a long piece of HTML or code might break into hundreds or thousands of tokens.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Token limits are important in production, especially when prompting or working with APIs, since they directly affect what the model can “read” or “remember” at once.&lt;/p&gt;

&lt;h2&gt;
  
  
  💡 Fun Fact
&lt;/h2&gt;

&lt;p&gt;Even small changes in punctuation or casing can produce completely different tokenizations. For instance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"ChatGPT" might be 1 token.&lt;/li&gt;
&lt;li&gt;"Chat GPT" could be 2 tokens.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's why prompt formatting, especially in few-shot or chain-of-thought examples, needs to be done carefully to stay within token budgets and avoid unexpected behavior.&lt;/p&gt;

&lt;h1&gt;
  
  
  ⚙️ Pretraining: Learning to Predict
&lt;/h1&gt;

&lt;p&gt;Once the data is cleaned and tokenized, the real magic begins: pretraining. This is where the model starts learning language by trying to predict the next token in a sequence over and over again, across hundreds of billions of examples.&lt;/p&gt;

&lt;p&gt;It sounds simple: given a prompt like “The cat sat on the”, the model’s job is to predict the next most likely token (in this case, probably “mat”). But scaled up with enough data, compute, and smart architecture, this simple task turns into something powerful: it becomes the foundation for reasoning, creativity, conversation, and more.&lt;/p&gt;

&lt;h2&gt;
  
  
  🧠 The Objective: Next-Token Prediction
&lt;/h2&gt;

&lt;p&gt;The core pretraining objective is often referred to as causal language modeling:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Input: a sequence of tokens&lt;/li&gt;
&lt;li&gt;Task: predict the next token in that sequence&lt;/li&gt;
&lt;li&gt;Loss function: how wrong the model's prediction was, averaged over billions of tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This process doesn't require labeled data; it's self-supervised. The structure is already embedded in the data itself, making it extremely scalable.&lt;/p&gt;
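&lt;p&gt;In code, the loss at a single position is just the negative log probability the model assigns to the true next token (a simplified sketch that ignores batching and averaging over the corpus):&lt;/p&gt;

```javascript
// Cross-entropy for one position of next-token prediction.
function softmax(logits) {
  const m = Math.max.apply(null, logits);  // subtract max for numerical stability
  const exps = logits.map(function (x) { return Math.exp(x - m); });
  const sum = exps.reduce(function (a, b) { return a + b; }, 0);
  return exps.map(function (e) { return e / sum; });
}

function nextTokenLoss(logits, targetIndex) {
  // Low when the model puts high probability on the true next token.
  return -Math.log(softmax(logits)[targetIndex]);
}
```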

&lt;p&gt;Over time, the model starts to capture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Grammar and syntax&lt;/li&gt;
&lt;li&gt;Factual knowledge&lt;/li&gt;
&lt;li&gt;Common reasoning patterns&lt;/li&gt;
&lt;li&gt;Relationships between words, entities, and concepts&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  🏗️ Architecture: The Transformer
&lt;/h2&gt;

&lt;p&gt;Pretraining is powered by the Transformer, a deep neural network architecture built with layers of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Self-attention mechanisms that let the model “look” at different parts of the sequence&lt;/li&gt;
&lt;li&gt;Feed-forward networks to process and transform the inputs&lt;/li&gt;
&lt;li&gt;Positional encodings so the model understands word order&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model typically has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dozens of layers (96 in GPT-3)&lt;/li&gt;
&lt;li&gt;Billions (or trillions) of parameters&lt;/li&gt;
&lt;li&gt;Huge memory and compute requirements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each layer refines the model’s internal representation of the input sequence, building up from low-level syntax to high-level meaning.&lt;/p&gt;

&lt;h2&gt;
  
  
  🖥️ Training at Scale
&lt;/h2&gt;

&lt;p&gt;This phase is computationally expensive:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Training models like GPT-3 or GPT-4 requires thousands of GPUs/TPUs running in parallel&lt;/li&gt;
&lt;li&gt;Training can take weeks to months&lt;/li&gt;
&lt;li&gt;Cost estimates run into the millions of dollars&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Training also involves smart engineering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gradient checkpointing to reduce memory usage&lt;/li&gt;
&lt;li&gt;Mixed-precision arithmetic to improve performance&lt;/li&gt;
&lt;li&gt;Model parallelism to spread layers across multiple devices&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every token, every word, every punctuation mark is used to make the model just a little bit better at predicting what comes next.&lt;/p&gt;

&lt;h2&gt;
  
  
  📏 Context Window: Memory Matters
&lt;/h2&gt;

&lt;p&gt;One of the most important constraints in pretraining is the context window: how many tokens the model can see at once:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GPT-3: ~2,048 tokens&lt;/li&gt;
&lt;li&gt;GPT-4-turbo: up to 128,000 tokens (in some versions)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A longer context means the model can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Remember more information&lt;/li&gt;
&lt;li&gt;Handle longer prompts or conversations&lt;/li&gt;
&lt;li&gt;Perform deeper reasoning over extended input&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Context length is often the bottleneck when using LLMs in real applications, especially for long documents, multi-turn chats, or summarization tasks.&lt;/p&gt;

&lt;h1&gt;
  
  
  🛠️ Fine-Tuning: Teaching the Model to Be Helpful
&lt;/h1&gt;

&lt;p&gt;After pretraining, the model has a general understanding of language: it can write, complete sentences, and mimic patterns found in its training data. But it’s still just a raw model. It hasn’t been shaped into a helpful assistant, and it doesn’t know how to follow instructions, stay on topic, or avoid saying the wrong thing.&lt;/p&gt;

&lt;p&gt;This is where fine-tuning comes in. It’s the phase where the model is taught to behave more like a tool: to answer questions, follow instructions, and interact conversationally.&lt;/p&gt;

&lt;h2&gt;
  
  
  🎓 Supervised Fine-Tuning (SFT)
&lt;/h2&gt;

&lt;p&gt;The most common approach is Supervised Fine-Tuning. This involves training the model further using a curated dataset of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prompts: e.g., "Explain how gravity works"&lt;/li&gt;
&lt;li&gt;Ideal responses: written by humans or rated as high quality&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;During this phase, the model learns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How to format answers clearly&lt;/li&gt;
&lt;li&gt;How to politely refuse inappropriate questions&lt;/li&gt;
&lt;li&gt;How to stick to factual or helpful content&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pretrained model: might respond to "How do I tie a tie?" with inconsistent, partially correct answers.&lt;/li&gt;
&lt;li&gt;Fine-tuned model: gives a step-by-step, well-structured explanation, potentially with different methods or options.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Fine-tuning teaches form, structure, and tone. It’s like a finishing school for language models.&lt;/p&gt;

&lt;h2&gt;
  
  
  💬 Enter ChatML: Structuring Conversations
&lt;/h2&gt;

&lt;p&gt;To support multi-turn dialogue (like you see in ChatGPT), the input format is structured using ChatML, a convention for marking roles in a conversation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"system"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"You are a helpful assistant."&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"What's the capital of France?"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The capital of France is Paris."&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This format:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Makes it easier for the model to understand context&lt;/li&gt;
&lt;li&gt;Helps the system manage multiple participants (user, assistant, system instructions)&lt;/li&gt;
&lt;li&gt;Enables customization (e.g., setting tone, style, or behavior)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s not part of the architecture; it’s a prompt design pattern. Still, it makes conversational LLMs like ChatGPT behave much more predictably.&lt;/p&gt;
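
&lt;p&gt;Under the hood, a message list like the one above is typically flattened into a single string with special role-marker tokens before tokenization. A minimal Python sketch, using plain-text placeholders in place of the model’s actual special tokens:&lt;/p&gt;

```python
# Plain-text stand-ins for the model's reserved special tokens,
# used here only for illustration.
IM_START = "[im_start]"
IM_END = "[im_end]"

def to_chatml(messages):
    """Flatten a list of {role, content} dicts into one
    ChatML-style string the model consumes as a single stream."""
    parts = []
    for m in messages:
        parts.append(f"{IM_START}{m['role']}\n{m['content']}{IM_END}")
    # Leave the assistant turn open so the model completes it.
    parts.append(f"{IM_START}assistant\n")
    return "\n".join(parts)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the capital of France?"},
]
print(to_chatml(messages))
```

&lt;p&gt;Leaving the final assistant turn open is what prompts the model to generate its reply.&lt;/p&gt;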

&lt;h2&gt;
  
  
  ⚠️ Limitations After Fine-Tuning
&lt;/h2&gt;

&lt;p&gt;Even after fine-tuning, the model may still:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Be overconfident in its responses&lt;/li&gt;
&lt;li&gt;Hallucinate incorrect facts&lt;/li&gt;
&lt;li&gt;Miss subtle cues in multi-turn dialogue&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s why fine-tuning is often followed by Reinforcement Learning from Human Feedback (RLHF), which further aligns the model with human judgment and expectations.&lt;/p&gt;

&lt;h1&gt;
  
  
  🧭 Aligning the Model with Human Values: Reinforcement Learning from Human Feedback (RLHF)
&lt;/h1&gt;

&lt;p&gt;After fine-tuning, the model is much more helpful but it’s still not quite ready for the real world. It might give plausible-sounding but incorrect answers, miss social cues, or produce outputs that feel robotic or awkward. To fix this, we need a way to train it not just to be correct, but to be useful, safe, and aligned with human preferences.&lt;/p&gt;

&lt;p&gt;That’s exactly what Reinforcement Learning from Human Feedback (RLHF) does.&lt;/p&gt;

&lt;h2&gt;
  
  
  👥 Step 1: Collecting Human Preferences
&lt;/h2&gt;

&lt;p&gt;The process begins with a simple idea:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ask the model to generate multiple possible responses to a given prompt.&lt;/li&gt;
&lt;li&gt;Have human labelers rank these responses from best to worst.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, given the prompt "How do I explain quantum computing to a 10-year-old?", the model might produce 4–5 different answers. Humans rank them based on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clarity and helpfulness&lt;/li&gt;
&lt;li&gt;Factual accuracy&lt;/li&gt;
&lt;li&gt;Friendliness and tone&lt;/li&gt;
&lt;li&gt;Avoidance of jargon or confusing analogies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These ranked examples become the training data for a reward model.&lt;/p&gt;
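
&lt;p&gt;In practice, a ranking over several responses is usually expanded into pairwise comparisons before training. A small sketch (the field names are illustrative, not a specific dataset format):&lt;/p&gt;

```python
def ranking_to_pairs(prompt, ranked_responses):
    """Turn a human ranking (best first) into (chosen, rejected)
    pairs, the form reward models are commonly trained on."""
    pairs = []
    for i, chosen in enumerate(ranked_responses):
        for rejected in ranked_responses[i + 1:]:
            pairs.append(
                {"prompt": prompt, "chosen": chosen, "rejected": rejected}
            )
    return pairs

pairs = ranking_to_pairs(
    "Explain quantum computing to a 10-year-old.",
    ["clear analogy", "okay but jargon-heavy", "confusing"],
)
print(len(pairs))  # 3 ranked responses yield 3 ordered pairs
```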

&lt;h2&gt;
  
  
  🧠 Step 2: Training the Reward Model
&lt;/h2&gt;

&lt;p&gt;The reward model is trained to predict what humans would prefer. It learns to assign a score to any output based on how well it matches human rankings.&lt;/p&gt;

&lt;p&gt;This turns subjective feedback into something that can be optimized. Now the model can be trained not just to predict the next token, but to maximize its reward score, generating responses that people actually like.&lt;/p&gt;
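
&lt;p&gt;The usual training objective for such a reward model is a pairwise (Bradley–Terry style) loss: it is penalized whenever it scores the human-rejected answer close to, or above, the chosen one. A minimal sketch:&lt;/p&gt;

```python
import math

def pairwise_loss(reward_chosen, reward_rejected):
    """Bradley-Terry style loss: -log(sigmoid(r_chosen - r_rejected)).
    The loss shrinks as the reward model scores the human-preferred
    answer increasingly higher than the rejected one."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Agreement with the human ranking gives a small loss;
# disagreement (rejected answer scored higher) gives a large one.
print(round(pairwise_loss(2.0, 0.5), 4))
print(round(pairwise_loss(0.5, 2.0), 4))
```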

&lt;h2&gt;
  
  
  🔁 Step 3: Reinforcement Learning (PPO)
&lt;/h2&gt;

&lt;p&gt;With the reward model in place, the base language model undergoes a final training loop using reinforcement learning, typically with an algorithm called Proximal Policy Optimization (PPO).&lt;/p&gt;

&lt;p&gt;In this phase:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The model generates outputs.&lt;/li&gt;
&lt;li&gt;The reward model scores them.&lt;/li&gt;
&lt;li&gt;The model updates itself to increase the likelihood of producing higher-scoring answers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This feedback loop aligns the model's behavior with human values and expectations, rather than just surface-level correctness.&lt;/p&gt;
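
&lt;p&gt;One detail worth sketching: the quantity being maximized is typically not the raw reward but the reward minus a KL penalty that keeps the tuned model close to the original. A toy illustration (the coefficient beta below is a hypothetical value):&lt;/p&gt;

```python
def shaped_reward(reward, logp_policy, logp_reference, beta=0.1):
    """The per-response quantity RLHF optimizes: the reward model's
    score minus a KL penalty keeping the tuned policy close to the
    reference model. PPO adds its clipped-ratio update on top."""
    kl = logp_policy - logp_reference
    return reward - beta * kl

# A response the reward model loves, but that drifts far from the
# reference model, gets part of its reward taken back:
print(round(shaped_reward(reward=3.0, logp_policy=-1.0, logp_reference=-4.0), 2))  # 2.7
```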

&lt;h2&gt;
  
  
  🚨 Guardrails and Reward Caps
&lt;/h2&gt;

&lt;p&gt;One of the risks with RL is that the model can learn to "game" the reward function, producing verbose, generic, or overly cautious answers just to play it safe. To prevent this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Engineers add reward clipping and caps to keep behavior within bounds.&lt;/li&gt;
&lt;li&gt;They train the model to refuse harmful or inappropriate requests.&lt;/li&gt;
&lt;li&gt;They fine-tune for humility, encouraging responses like “I’m not sure” when appropriate.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;RLHF isn’t perfect, but it adds a critical layer of judgment that helps the model behave more like a helpful assistant and less like an auto-completion engine.&lt;/p&gt;

&lt;h1&gt;
  
  
  🧪 Evaluation and Safety Testing: Stress-Testing the Model Before Deployment
&lt;/h1&gt;

&lt;p&gt;After pretraining, fine-tuning, and RLHF, the model is starting to look like the assistant you know as ChatGPT. But before it’s exposed to millions of users, it needs to go through rigorous evaluation and safety testing.&lt;/p&gt;

&lt;p&gt;Why? Because even a highly trained model can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hallucinate incorrect information&lt;/li&gt;
&lt;li&gt;Respond in biased or harmful ways&lt;/li&gt;
&lt;li&gt;Get confused by ambiguous prompts&lt;/li&gt;
&lt;li&gt;Fail in unexpected edge cases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Evaluation helps identify and reduce those failure modes before they become real-world problems.&lt;/p&gt;

&lt;h2&gt;
  
  
  🧼 Automated Evaluations
&lt;/h2&gt;

&lt;p&gt;Some tests can be done programmatically at scale:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Toxicity classifiers: Check whether responses contain offensive or harmful language.&lt;/li&gt;
&lt;li&gt;Bias benchmarks: Evaluate whether the model produces unequal results across gender, race, religion, etc.&lt;/li&gt;
&lt;li&gt;Hallucination detection: Use fact-checking models or rule-based systems to spot invented or misleading claims.&lt;/li&gt;
&lt;li&gt;TruthfulQA &amp;amp; HellaSwag: Benchmark tasks designed to test factual accuracy and reasoning.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These automated evaluations help track performance across iterations and flag problematic behaviors.&lt;/p&gt;
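
&lt;p&gt;Conceptually, such an evaluation harness is just a loop over prompts and checks. A toy sketch, with stub functions standing in for a real model and real classifiers:&lt;/p&gt;

```python
def run_eval(model, prompts, checks):
    """Tiny evaluation harness: run every prompt through the model,
    apply each named check to the output, and report failure rates."""
    failures = {name: 0 for name in checks}
    for prompt in prompts:
        output = model(prompt)
        for name, check in checks.items():
            if not check(output):
                failures[name] += 1
    total = len(prompts)
    return {name: count / total for name, count in failures.items()}

# Stub model and checks, purely for illustration:
fake_model = lambda p: "Paris is the capital of France."
report = run_eval(
    fake_model,
    ["capital of France?", "capital of France in one word?"],
    {
        "non_empty": lambda out: len(out.strip()) != 0,
        "mentions_paris": lambda out: "Paris" in out,
    },
)
print(report)  # {'non_empty': 0.0, 'mentions_paris': 0.0}
```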

&lt;h2&gt;
  
  
  🔍 Red Teaming
&lt;/h2&gt;

&lt;p&gt;In addition to benchmarks, companies like OpenAI employ red teams: internal or external experts whose job is to "break" the model by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prompting it to produce dangerous instructions (e.g., building weapons)&lt;/li&gt;
&lt;li&gt;Manipulating it into revealing sensitive content&lt;/li&gt;
&lt;li&gt;Crafting adversarial prompts that confuse or mislead the model&lt;/li&gt;
&lt;li&gt;Testing edge cases (e.g., medical, legal, or financial queries)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Red teaming is a kind of ethical hacking for LLMs, helping anticipate how malicious actors might misuse the model.&lt;/p&gt;

&lt;h2&gt;
  
  
  🤖 Calibrating Uncertainty
&lt;/h2&gt;

&lt;p&gt;One of the most important alignment goals is teaching the model when not to answer. This includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Saying “I’m not sure” when the model lacks enough information&lt;/li&gt;
&lt;li&gt;Avoiding speculative or made-up responses&lt;/li&gt;
&lt;li&gt;Refusing to respond to unethical, illegal, or unsafe prompts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This behavior is usually trained during both fine-tuning and RLHF stages and verified during evaluation.&lt;/p&gt;

&lt;h2&gt;
  
  
  🔄 Continuous Monitoring
&lt;/h2&gt;

&lt;p&gt;Even after deployment, evaluation doesn’t stop:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Production logs are monitored (often anonymously and with strict privacy safeguards)&lt;/li&gt;
&lt;li&gt;User feedback helps identify weak points&lt;/li&gt;
&lt;li&gt;New tests and benchmarks are added over time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every new deployment is followed by iterations of evaluation → training → re-evaluation a continuous loop to improve quality and safety.&lt;/p&gt;

&lt;h1&gt;
  
  
  🧠 Reasoning and External Tools: Extending the Model’s Capabilities
&lt;/h1&gt;

&lt;p&gt;ChatGPT isn’t just about completing text; it’s about understanding context, reasoning through problems, and even using external tools to get things right. As powerful as language models are, they still have limits. Reasoning and tool use are two of the main ways we push past those limits.&lt;/p&gt;

&lt;h2&gt;
  
  
  🔄 Chain-of-Thought Reasoning
&lt;/h2&gt;

&lt;p&gt;One breakthrough in prompting large language models is chain-of-thought reasoning: encouraging the model to "think out loud" by explaining its steps before giving an answer.&lt;/p&gt;

&lt;p&gt;Compare these two responses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Without reasoning: “The answer is 42.”&lt;/li&gt;
&lt;li&gt;With reasoning: “First, we multiply 6 by 7 to get 42. Therefore, the answer is 42.”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By breaking the problem into steps, the model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Produces more accurate answers&lt;/li&gt;
&lt;li&gt;Is easier to debug when it fails&lt;/li&gt;
&lt;li&gt;Demonstrates its internal logic more transparently&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is especially useful in math, programming, logical puzzles, and multi-step instructions.&lt;/p&gt;
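
&lt;p&gt;In practice, chain-of-thought is often elicited with nothing more than a wrapper around the question. A minimal sketch of such a prompt template (the wording is a common prompting pattern, not an official API):&lt;/p&gt;

```python
def with_cot(question):
    """Wrap a question in a chain-of-thought style instruction so the
    model spells out intermediate steps before answering."""
    return (
        f"{question}\n"
        "Think through the problem step by step, "
        "then state the final answer on its own line."
    )

print(with_cot("What is 6 times 7?"))
```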

&lt;h2&gt;
  
  
  🛠️ Using Tools: Beyond the Model’s Memory
&lt;/h2&gt;

&lt;p&gt;Even a powerful LLM can’t store all of human knowledge, and it has no access to real-time data unless you give it some way to retrieve it. That’s where tool use comes in.&lt;/p&gt;

&lt;p&gt;Models can be trained or prompted to call external tools like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Web search APIs (to fetch current information)&lt;/li&gt;
&lt;li&gt;Calculators (for precise math)&lt;/li&gt;
&lt;li&gt;Code interpreters (for executing Python or JavaScript)&lt;/li&gt;
&lt;li&gt;Retrieval-Augmented Generation (RAG) systems (to access custom documents or databases)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In OpenAI’s ecosystem, this shows up as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Function calling&lt;/li&gt;
&lt;li&gt;Browse with Bing&lt;/li&gt;
&lt;li&gt;Code Interpreter / Python tool&lt;/li&gt;
&lt;li&gt;File upload + document Q&amp;amp;A&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These tools are either fine-tuned into the model or integrated via structured prompting. The model decides when and how to use them, similar to a human reaching for a calculator or browser when memory isn’t enough.&lt;/p&gt;
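
&lt;p&gt;Function calling, for example, boils down to the model emitting a structured request that application code dispatches to a registered tool. A minimal sketch (the tool registry and reply format here are illustrative):&lt;/p&gt;

```python
import json

# Hypothetical tool registry: the model is prompted to reply with a
# structured JSON call naming one of these tools instead of
# answering directly.
def calculator(args):
    return str(args["a"] * args["b"])

TOOLS = {"calculator": calculator}

def dispatch(model_reply):
    """If the model's reply is a tool call, run the named tool and
    return its result so it can be fed back into the conversation."""
    call = json.loads(model_reply)
    tool = TOOLS[call["name"]]
    return tool(call["arguments"])

# The model, asked "what is 6 times 7?", emits a structured call:
reply = '{"name": "calculator", "arguments": {"a": 6, "b": 7}}'
print(dispatch(reply))  # 42
```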

&lt;h2&gt;
  
  
  🧠 Tool Use Is Part of Alignment
&lt;/h2&gt;

&lt;p&gt;Why train models to use tools? Because it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reduces hallucination&lt;/li&gt;
&lt;li&gt;Increases factual accuracy&lt;/li&gt;
&lt;li&gt;Expands capabilities without retraining the base model&lt;/li&gt;
&lt;li&gt;Keeps the model aligned with its limitations (e.g., “I don’t know, but I can look it up”)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a major step toward agent-like behavior: LLMs that not only generate language but act purposefully in the world through APIs, databases, and UIs.&lt;/p&gt;

&lt;h1&gt;
  
  
  ✨ Emergent Abilities and Surprising Behavior
&lt;/h1&gt;

&lt;p&gt;One of the most fascinating and still mysterious aspects of large language models is the appearance of emergent abilities. These are capabilities that weren’t explicitly programmed or taught, but seem to arise naturally when models reach a certain scale.&lt;/p&gt;

&lt;p&gt;In other words: the bigger the model, the more surprising things it can do.&lt;/p&gt;

&lt;h2&gt;
  
  
  🌱 What Are Emergent Abilities?
&lt;/h2&gt;

&lt;p&gt;Emergent abilities are behaviors that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Don’t appear in smaller models&lt;/li&gt;
&lt;li&gt;Suddenly do appear at a certain size or level of training&lt;/li&gt;
&lt;li&gt;Often exceed expectations of what the architecture should be capable of&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Some examples include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-step reasoning: solving logic puzzles, math word problems&lt;/li&gt;
&lt;li&gt;Translation: understanding and generating text across multiple languages&lt;/li&gt;
&lt;li&gt;Zero-shot generalization: completing tasks it was never trained on explicitly&lt;/li&gt;
&lt;li&gt;Code generation: writing functional code in multiple languages&lt;/li&gt;
&lt;li&gt;Style imitation: writing like Shakespeare, Tolkien, or a YouTube influencer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These skills emerge not because someone told the model how to do them, but because the model was exposed to enough examples during training to infer the patterns.&lt;/p&gt;

&lt;h2&gt;
  
  
  🧪 The Role of Scale
&lt;/h2&gt;

&lt;p&gt;Emergent behavior tends to show up when models hit key thresholds:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model size (parameters)&lt;/li&gt;
&lt;li&gt;Training data volume (tokens)&lt;/li&gt;
&lt;li&gt;Training compute (epochs × hardware)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This has led to a guiding principle in AI research:&lt;br&gt;
“More data, bigger models, better results, but sometimes with surprising leaps in capability.”&lt;/p&gt;

&lt;p&gt;A smaller model might understand English syntax, but only the larger one can write legal contracts or solve geometry problems, even if both were trained on similar data.&lt;/p&gt;
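
&lt;p&gt;For a sense of the data side of these thresholds, DeepMind’s Chinchilla work suggested a widely cited rule of thumb: roughly 20 training tokens per parameter for compute-optimal training. A back-of-the-envelope sketch:&lt;/p&gt;

```python
def chinchilla_tokens(n_params, tokens_per_param=20):
    """Rough compute-optimal training-token budget per the
    Chinchilla heuristic (about 20 tokens per parameter)."""
    return n_params * tokens_per_param

for params in (7e9, 70e9):
    print(f"{params:.0e} params: about {chinchilla_tokens(params):.1e} tokens")
```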

&lt;h2&gt;
  
  
  💡 Why This Matters
&lt;/h2&gt;

&lt;p&gt;Emergence changes how we think about model design:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It means capabilities aren’t linear: doubling model size doesn’t just make it better, it can make it different.&lt;/li&gt;
&lt;li&gt;It forces caution: a more capable model might also be more likely to generate complex, unexpected, or risky outputs.&lt;/li&gt;
&lt;li&gt;It fuels innovation: researchers keep pushing boundaries to discover what else these models might learn.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We’ve now reached the point where large language models can reason, write, plan, and problem-solve in ways that often surprise even their creators.&lt;/p&gt;

&lt;h1&gt;
  
  
  🚀 Deploying ChatGPT in the Real World
&lt;/h1&gt;

&lt;p&gt;After training, fine-tuning, alignment, and safety testing, the model is finally ready to meet the real world. But deploying something like ChatGPT isn’t as simple as dropping a model into a server. It requires robust infrastructure, thoughtful design, and constant monitoring to ensure it’s safe, scalable, and responsive to millions of users.&lt;/p&gt;

&lt;h2&gt;
  
  
  🧑‍💻 How People Use ChatGPT
&lt;/h2&gt;

&lt;p&gt;ChatGPT is accessible through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The chat.openai.com web interface&lt;/li&gt;
&lt;li&gt;OpenAI’s API (used by developers in apps, bots, plugins, etc.)&lt;/li&gt;
&lt;li&gt;Microsoft products (e.g., Copilot in Word, Excel, GitHub)&lt;/li&gt;
&lt;li&gt;Mobile apps on iOS and Android&lt;/li&gt;
&lt;li&gt;Custom integrations and enterprise solutions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each interface wraps the same core model but with added layers of formatting, safety, and interaction design.&lt;/p&gt;

&lt;h2&gt;
  
  
  ☁️ Infrastructure at Scale
&lt;/h2&gt;

&lt;p&gt;Under the hood, large models like GPT-4 require:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Massive GPU/TPU clusters for inference (not just training)&lt;/li&gt;
&lt;li&gt;Autoscaling infrastructure to handle spikes in demand&lt;/li&gt;
&lt;li&gt;Low-latency API orchestration to keep response times acceptable&lt;/li&gt;
&lt;li&gt;Distributed caching, prompt streaming, and optimization tricks to minimize cost and delay&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Deploying the model isn’t just about the AI; it’s a full-stack engineering challenge involving systems, networking, DevOps, and platform reliability.&lt;/p&gt;

&lt;h2&gt;
  
  
  🛡️ Safety Layers in Production
&lt;/h2&gt;

&lt;p&gt;Even with all the training and alignment, safety can’t be fully guaranteed, especially when new prompts and use cases appear daily. That’s why deployment includes multiple runtime safety systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prompt moderation: filters or rejects dangerous, unethical, or harmful input&lt;/li&gt;
&lt;li&gt;Output filtering: catches and blocks inappropriate or unsafe responses before they’re shown to the user&lt;/li&gt;
&lt;li&gt;Rate limiting &amp;amp; abuse detection: protects against spam, denial-of-service attempts, and automated misuse&lt;/li&gt;
&lt;li&gt;Policy constraints: restrict usage in sensitive areas (e.g., medical, legal, or political domains)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These layers ensure the model behaves more responsibly, even if the input tries to trick it.&lt;/p&gt;
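
&lt;p&gt;The layering itself can be sketched in a few lines: screen the input, run the model, screen the output. The keyword blocklist below is a toy stand-in for the trained moderation classifiers real systems use:&lt;/p&gt;

```python
def moderate(text, blocklist):
    """Toy stand-in for a moderation classifier; production systems
    use trained classifiers, not keyword lists."""
    lowered = text.lower()
    return all(term not in lowered for term in blocklist)

def safe_chat(user_input, model, blocklist=("build a weapon",)):
    """Layered runtime safety: screen the prompt, run the model,
    then screen the output before the user ever sees it."""
    if not moderate(user_input, blocklist):
        return "This request can't be helped with."
    output = model(user_input)
    if not moderate(output, blocklist):
        return "Response withheld by the output filter."
    return output

print(safe_chat("How do I build a weapon?", lambda p: "..."))
print(safe_chat("How do I bake bread?", lambda p: "Mix flour, water, yeast."))
```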

&lt;h2&gt;
  
  
  🔁 Real-Time Feedback and Iteration
&lt;/h2&gt;

&lt;p&gt;The work doesn’t stop after deployment. OpenAI (and similar orgs) continually improve the model through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User feedback (thumbs up/down, comments, flags)&lt;/li&gt;
&lt;li&gt;Anonymized usage logs to detect edge cases or regressions&lt;/li&gt;
&lt;li&gt;Ongoing updates to prompts, training methods, moderation, and tooling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is part of a continuous deployment cycle:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Train → Align → Test → Deploy → Monitor → Improve → Repeat&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  ⚠️ The Challenges of Building Large Language Models
&lt;/h1&gt;

&lt;p&gt;As impressive as large language models are, building them isn’t without major challenges, some technical, some ethical, and many still unsolved. These systems aren’t magic. They’re incredibly complex, expensive, and risky when handled carelessly.&lt;/p&gt;

&lt;p&gt;Let’s look at the core challenges behind building, deploying, and maintaining models like ChatGPT.&lt;/p&gt;

&lt;h2&gt;
  
  
  💣 Hallucination
&lt;/h2&gt;

&lt;p&gt;LLMs don’t “know” things; they generate plausible-sounding text based on patterns in data. That means they sometimes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Make things up confidently&lt;/li&gt;
&lt;li&gt;Fabricate sources or citations&lt;/li&gt;
&lt;li&gt;Mislead users without realizing it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This issue is known as hallucination, and it’s one of the biggest obstacles to using LLMs in high-stakes domains like medicine, law, or science.&lt;/p&gt;

&lt;p&gt;Efforts to mitigate this include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Encouraging humility: responses like “I don’t know” or “I’m not sure”&lt;/li&gt;
&lt;li&gt;Tool integration: using search, code, or databases to ground responses&lt;/li&gt;
&lt;li&gt;Better prompting and alignment&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  🎭 Bias and Fairness
&lt;/h2&gt;

&lt;p&gt;LLMs learn from internet-scale data, and that data reflects all the biases of society:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gender and racial stereotypes&lt;/li&gt;
&lt;li&gt;Cultural assumptions&lt;/li&gt;
&lt;li&gt;Political or religious slants&lt;/li&gt;
&lt;li&gt;Offensive or exclusionary language&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even after heavy filtering, some of these patterns persist in model behavior.&lt;/p&gt;

&lt;p&gt;To address this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Researchers run bias benchmarks and test prompts&lt;/li&gt;
&lt;li&gt;Models are aligned to avoid repeating harmful stereotypes&lt;/li&gt;
&lt;li&gt;Teams aim to balance representation without over-correction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Still, bias mitigation remains an open area of research both technically and philosophically.&lt;/p&gt;

&lt;h2&gt;
  
  
  💰 Compute and Cost
&lt;/h2&gt;

&lt;p&gt;Training models like GPT-4 takes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Weeks or months on massive distributed GPU clusters&lt;/li&gt;
&lt;li&gt;Tens of thousands of A100 or H100 chips&lt;/li&gt;
&lt;li&gt;Millions of dollars in cloud costs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And that’s just training. Running the model for millions of users daily, inference at scale, is also expensive.&lt;/p&gt;

&lt;p&gt;This leads to ongoing debates about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Accessibility (why many models aren’t open-source)&lt;/li&gt;
&lt;li&gt;Sustainability (energy use and environmental impact)&lt;/li&gt;
&lt;li&gt;Monopolization (only a few companies can afford to train models of this size)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  🔍 Interpretability
&lt;/h2&gt;

&lt;p&gt;LLMs are often described as “black boxes”: they produce high-quality outputs, but we don’t always know why.&lt;/p&gt;

&lt;p&gt;This raises difficult questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What caused the model to choose one answer over another?&lt;/li&gt;
&lt;li&gt;How can we debug bad outputs?&lt;/li&gt;
&lt;li&gt;Can we trust a system we don’t fully understand?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Researchers are exploring:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Attention visualization&lt;/li&gt;
&lt;li&gt;Neuron analysis&lt;/li&gt;
&lt;li&gt;Feature tracing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But a clear path to full interpretability is still far away.&lt;/p&gt;

&lt;h2&gt;
  
  
  🔐 Privacy and Data Leaks
&lt;/h2&gt;

&lt;p&gt;Because training data comes from public web sources, there’s a risk that models might:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Memorize personal or sensitive data&lt;/li&gt;
&lt;li&gt;Reveal email addresses, phone numbers, or passwords&lt;/li&gt;
&lt;li&gt;Reproduce private information unintentionally&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To combat this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sensitive data is filtered or masked during training&lt;/li&gt;
&lt;li&gt;Models are fine-tuned to avoid revealing such content&lt;/li&gt;
&lt;li&gt;Post-training audits look for memorized patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Still, the risk is not zero, especially in larger models trained on unfiltered or user-contributed datasets.&lt;/p&gt;

&lt;h1&gt;
  
  
  🔮 The Future of Language Models
&lt;/h1&gt;

&lt;p&gt;We’re still in the early days of what large language models can do. ChatGPT, as powerful as it is, is just one step in a fast-moving landscape. Researchers, developers, and companies are rapidly exploring the next frontier of capabilities, architectures, and applications.&lt;/p&gt;

&lt;p&gt;Here’s where things are headed.&lt;/p&gt;

&lt;h2&gt;
  
  
  🖼️ Multimodal Models
&lt;/h2&gt;

&lt;p&gt;Future models won’t just understand text, they’ll process images, audio, video, and more, all in one unified interface. This is already underway:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenAI’s GPT-4 can handle image inputs (e.g., charts, screenshots, handwritten notes).&lt;/li&gt;
&lt;li&gt;Google’s Gemini and open models like LLaVA (built on Meta’s LLaMA) push toward full multimodality.&lt;/li&gt;
&lt;li&gt;Audio and speech-based models (like Whisper and Voice Chat) are being layered into conversational systems.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This unlocks use cases like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Visual Q&amp;amp;A (e.g., “What’s wrong with this graph?”)&lt;/li&gt;
&lt;li&gt;Document understanding from scanned PDFs&lt;/li&gt;
&lt;li&gt;Seamless voice-to-text-to-action assistants&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  🧠 Personalization and Memory
&lt;/h2&gt;

&lt;p&gt;Right now, most LLMs are stateless: they don’t “remember” past interactions unless context is explicitly included in the prompt. But that’s changing.&lt;/p&gt;

&lt;p&gt;New models will support:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User-specific memory: remembering preferences, goals, and past conversations&lt;/li&gt;
&lt;li&gt;Contextual learning: adapting behavior based on previous interactions&lt;/li&gt;
&lt;li&gt;Agent-like workflows: acting over time on your behalf, not just replying to one-off prompts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;OpenAI is already rolling out memory features to select users, and more personalized assistants are on the way.&lt;/p&gt;
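
&lt;p&gt;A simple way to picture memory on top of a stateless model: store short facts between sessions and prepend them to each new prompt. A toy sketch (real implementations add retrieval, expiry, and user controls):&lt;/p&gt;

```python
class MemoryStore:
    """Sketch of user-specific memory: persist short facts and
    prepend them to each new conversation's context."""

    def __init__(self):
        self.facts = []

    def remember(self, fact):
        if fact not in self.facts:
            self.facts.append(fact)

    def build_context(self, user_message):
        header = "Known about this user: " + "; ".join(self.facts)
        return f"{header}\n\nUser: {user_message}"

memory = MemoryStore()
memory.remember("prefers concise answers")
memory.remember("works mostly in Python")
print(memory.build_context("Explain decorators."))
```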

&lt;h2&gt;
  
  
  🤖 Autonomous Agents and Tool Use
&lt;/h2&gt;

&lt;p&gt;LLMs are becoming more than passive responders; they’re turning into agents that can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Make decisions&lt;/li&gt;
&lt;li&gt;Use external tools and APIs&lt;/li&gt;
&lt;li&gt;Perform multi-step reasoning or planning&lt;/li&gt;
&lt;li&gt;Navigate web pages, apps, or even operating systems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This powers tools like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AutoGPT, LangChain, and other agent frameworks&lt;/li&gt;
&lt;li&gt;AI copilots that write code, manage tasks, or control environments&lt;/li&gt;
&lt;li&gt;Voice-based assistants that act in real-time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These systems blur the line between model and app; they’re intelligent systems built around LLMs.&lt;/p&gt;
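
&lt;p&gt;At their core, most of these agent frameworks share the same skeleton: a loop in which the model either calls a tool or declares it is done. A toy sketch, with a scripted stand-in for the model:&lt;/p&gt;

```python
def agent_loop(goal, model, tools, max_steps=5):
    """Minimal agent skeleton: each turn, the model either names a
    tool to call or declares it is finished. A real agent would parse
    an LLM's structured output here instead of a scripted stub."""
    observations = []
    for _ in range(max_steps):
        action = model(goal, observations)
        if action["type"] == "finish":
            return action["answer"]
        result = tools[action["tool"]](action["input"])
        observations.append(result)
    return "Step budget exhausted."

# Scripted stub: look something up once, then answer from the observation.
def scripted_model(goal, observations):
    if not observations:
        return {"type": "tool", "tool": "search", "input": goal}
    return {"type": "finish", "answer": f"Based on: {observations[-1]}"}

tools = {"search": lambda q: "Paris is the capital of France."}
print(agent_loop("capital of France?", scripted_model, tools))
```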

&lt;h2&gt;
  
  
  🧩 Open Source vs Proprietary Models
&lt;/h2&gt;

&lt;p&gt;The ecosystem is dividing into two parallel tracks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Closed-source giants like GPT-4, Claude, and Gemini: highly capable, tightly controlled&lt;/li&gt;
&lt;li&gt;Open-source challengers like LLaMA, Mistral, Falcon, and Mixtral: lightweight, community-driven, rapidly improving&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Open-source models are becoming more powerful and more accessible, and are likely to dominate edge, private, and embedded AI use cases in the near future.&lt;/p&gt;

&lt;h2&gt;
  
  
  ⚖️ Ethics, Regulation, and AI Governance
&lt;/h2&gt;

&lt;p&gt;As LLMs become more influential, so does the need to ensure they’re:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Aligned with human values&lt;/li&gt;
&lt;li&gt;Safe for all users&lt;/li&gt;
&lt;li&gt;Respectful of privacy, consent, and legal frameworks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Governments and organizations are actively working on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI safety standards&lt;/li&gt;
&lt;li&gt;Model transparency requirements&lt;/li&gt;
&lt;li&gt;Audits, red-teaming, and public accountability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The future of LLMs isn’t just a technical challenge; it’s a social and ethical one, too.&lt;/p&gt;

&lt;h1&gt;
  
  
  ✅ Conclusion: From Prediction to Purpose
&lt;/h1&gt;

&lt;p&gt;The story of how ChatGPT was made is more than just a technical achievement; it’s a glimpse into the future of how we build intelligent systems.&lt;/p&gt;

&lt;p&gt;Instead of crafting detailed logic by hand, we now train models on human language itself, allowing them to learn from billions of examples how we speak, reason, and interact. Every stage, from collecting noisy internet data to aligning responses through human feedback, reflects a new kind of engineering: one that’s less about rules and more about shaping behavior through data, scale, and iteration.&lt;/p&gt;

&lt;p&gt;As these systems grow more capable, so do the challenges:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How do we ensure truthfulness and safety?&lt;/li&gt;
&lt;li&gt;How do we reduce bias while preserving diversity of thought?&lt;/li&gt;
&lt;li&gt;How do we make models not just powerful but trustworthy and transparent?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And yet, the opportunities are equally massive. We're building tools that can teach, assist, write, reason, and even collaborate. Tools that don't just answer, they understand. That don't just compute, they communicate.&lt;/p&gt;

&lt;p&gt;LLMs like ChatGPT are still evolving. But what’s clear is that we’re witnessing the birth of a new paradigm in computing, one where language itself becomes the interface to intelligence.&lt;/p&gt;

&lt;p&gt;The next chapter? That’s up to all of us to write.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>llm</category>
      <category>chatgpt</category>
    </item>
    <item>
      <title>How to Scale Your AWS Architecture: From EC2 to Multi-Region Deployment</title>
      <dc:creator>Mhamad El Itawi</dc:creator>
      <pubDate>Thu, 31 Jul 2025 07:52:25 +0000</pubDate>
      <link>https://forem.com/aws-builders/how-to-scale-your-aws-architecture-from-ec2-to-multi-region-deployment-5ag7</link>
      <guid>https://forem.com/aws-builders/how-to-scale-your-aws-architecture-from-ec2-to-multi-region-deployment-5ag7</guid>
      <description>&lt;p&gt;As systems evolve, so does their architecture. What begins as a single EC2 instance can mature into a globally resilient infrastructure. This post walks through the natural progression of architectural decisions teams make as their application, traffic, and reliability needs grow.&lt;/p&gt;

&lt;p&gt;Here’s a high-level journey from the first EC2 instance to a multi-region setup, all through the AWS lens.&lt;/p&gt;

&lt;h1&gt;
  
  
  1. The Humble Beginning: One EC2
&lt;/h1&gt;

&lt;p&gt;Every cloud architecture begins with simplicity. The most natural first step is launching a single EC2 instance: a virtual machine that hosts both the application and the database. This setup closely mirrors a local development environment, where everything runs on the same machine. It’s quick to set up, low in cost, and easy to understand, which makes it ideal for prototypes or early-stage products.&lt;/p&gt;

&lt;p&gt;However, this convenience comes with limitations. The application has a single point of failure, no scalability, and limited security controls. It’s a great starting point, but clearly not a structure that can handle growth, traffic spikes, or production-grade reliability.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4k1blmj04eafxe2zlr1h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4k1blmj04eafxe2zlr1h.png" alt="Starting by an EC2" width="800" height="350"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  2. Making It Reachable: DNS
&lt;/h1&gt;

&lt;p&gt;Once the EC2 instance is up and running, the next logical step is making it easily accessible. By default, access is through a public IP address: a temporary, hard-to-remember string of numbers. This isn’t sustainable for users or developers.&lt;/p&gt;

&lt;p&gt;Introducing DNS, typically through a service like Route 53, allows us to map a friendly domain name to the instance’s IP. This not only improves usability and branding, but also decouples the user-facing entry point from the underlying infrastructure. It becomes possible to change the underlying instance or later add layers like load balancers, without altering the public-facing domain. This step transforms a one-off server into a recognizable and maintainable endpoint.&lt;/p&gt;
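
&lt;p&gt;Concretely, the mapping is just a DNS record. A Route 53 change batch for an A record might look like this (the domain and IP below are placeholders):&lt;/p&gt;

```json
{
  "Comment": "Point the site at the EC2 instance's public IP",
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "app.example.com",
        "Type": "A",
        "TTL": 300,
        "ResourceRecords": [{ "Value": "203.0.113.10" }]
      }
    }
  ]
}
```

&lt;p&gt;Applied with the AWS CLI’s &lt;code&gt;route53 change-resource-record-sets&lt;/code&gt; command, this points the domain at the instance without users ever seeing the raw IP.&lt;/p&gt;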

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjboripkqt1mccexqvn6g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjboripkqt1mccexqvn6g.png" alt="Adding Route53" width="800" height="335"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  3. Separating Concerns: Another EC2 for the Database
&lt;/h1&gt;

&lt;p&gt;As the application grows and usage increases, the limitations of running everything on a single instance become more apparent. Resource contention starts to affect performance, database queries slow down the app, and app crashes can impact data availability.&lt;/p&gt;

&lt;p&gt;The natural next move is to separate responsibilities by provisioning a second EC2 instance dedicated to the database. This separation improves stability and resource management. The application server can now be optimized for web traffic, while the database server is tuned for storage and queries.&lt;/p&gt;

&lt;p&gt;Although still manually managed, this step introduces a foundational architectural principle: separation of concerns, which enables more flexibility and paves the way for future scaling and specialization.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7806khcvpwyavkq1pyl9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7806khcvpwyavkq1pyl9.png" alt="Adding an Ec2 for database" width="800" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  4. Managed is Better: Migrating to RDS
&lt;/h1&gt;

&lt;p&gt;Maintaining a database on an EC2 instance introduces several operational challenges: backups must be scripted, monitoring is manual, failover is complex, and scaling often involves downtime. As the architecture matures, offloading this responsibility becomes a priority.&lt;/p&gt;

&lt;p&gt;This is where Amazon RDS (Relational Database Service) comes into play. It offers a fully managed solution for databases like MySQL, PostgreSQL, or Aurora. With built-in backups, patching, high availability (via Multi-AZ), and metrics out of the box, RDS removes much of the operational burden.&lt;/p&gt;

&lt;p&gt;By migrating to RDS, the architecture takes a significant leap toward resilience and maintainability. It frees developers from low-level infrastructure tasks and ensures the data layer is better prepared for growth and reliability expectations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flnn9ho9pxakjm4rik2ex.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flnn9ho9pxakjm4rik2ex.png" alt="Migrating to RDS" width="800" height="334"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  5. Scaling Up: Vertical Scaling
&lt;/h1&gt;

&lt;p&gt;As traffic and demand on the application increase, the initial EC2 instance may begin to show signs of strain. Response times grow, CPU usage spikes, and memory may become a bottleneck. The quickest and most straightforward response is vertical scaling: upgrading the EC2 instance to a more powerful type with more CPU, memory, or network capacity.&lt;/p&gt;

&lt;p&gt;This approach offers immediate performance gains without needing to change the application or infrastructure logic. It’s often the first form of scaling attempted because it's simple to implement and requires minimal architectural changes.&lt;/p&gt;

&lt;p&gt;However, vertical scaling has limits. There’s a ceiling to how large a single instance can grow, and it still represents a single point of failure.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxnjd54rgi107apkxterx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxnjd54rgi107apkxterx.png" alt="Vertical Scaling" width="800" height="333"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  6. Scaling Out: Add Another EC2
&lt;/h1&gt;

&lt;p&gt;Once vertical scaling reaches its practical limits, the natural progression is to scale out by adding a second EC2 instance. This approach distributes traffic and workload across multiple servers, improving redundancy and overall performance.&lt;/p&gt;

&lt;p&gt;In this early stage of horizontal scaling, the setup often lacks a load balancer. Traffic may be routed manually, round-robin DNS may be used, or each server may handle specific tasks. While this offers basic fault tolerance and more compute power, it introduces complexity. Updates must be synchronized, configuration drift becomes a risk, and there’s no automatic traffic distribution.&lt;/p&gt;

&lt;p&gt;Still, this step marks an important shift. The architecture begins transitioning from a single-server mindset to a more distributed system, a necessary foundation for scalability and availability in future stages.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9dbkkcn4i08jswxtdpoy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9dbkkcn4i08jswxtdpoy.png" alt="Horizonal Scaling" width="800" height="334"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  7. Balancing Traffic: Load Balancer
&lt;/h1&gt;

&lt;p&gt;As more EC2 instances are added to support growing demand, managing how traffic reaches them becomes increasingly important. Manually directing traffic or relying on DNS tricks is brittle and hard to maintain. This is where a load balancer becomes essential.&lt;/p&gt;

&lt;p&gt;By introducing an Elastic Load Balancer (ELB), traffic is automatically distributed across healthy instances based on rules, health checks, and load. It abstracts the complexity of managing individual endpoints and provides a single entry point for clients.&lt;/p&gt;

&lt;p&gt;This step not only improves reliability and performance but also enables better deployment strategies like blue/green releases and zero-downtime rollouts. It marks a critical shift toward high availability, setting the stage for auto scaling, failover, and more sophisticated routing strategies in future phases.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ixhdpatov7ws3ct4dji.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ixhdpatov7ws3ct4dji.png" alt="Adding load Balancer" width="800" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  8. Secure the Traffic: ACM Certificates
&lt;/h1&gt;

&lt;p&gt;With a load balancer in place and the application now more publicly accessible, securing traffic becomes a top priority. Encrypted communication over HTTPS is essential not only for protecting user data but also for meeting compliance standards and improving trust.&lt;/p&gt;

&lt;p&gt;To achieve this, the architecture integrates SSL/TLS certificates using AWS Certificate Manager (ACM). These certificates can be easily provisioned and attached to the load balancer, enabling secure HTTPS connections without the need to manage keys or renewal cycles manually.&lt;/p&gt;

&lt;p&gt;Adding HTTPS at this stage ensures that all communication between clients and the application is encrypted. It also unlocks compatibility with modern browsers, APIs, and security-conscious platforms, reinforcing the application’s readiness for production-scale usage.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyp3emm3171ns2zuh0bz7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyp3emm3171ns2zuh0bz7.png" alt="Adding HTTPS" width="800" height="368"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  9. Strengthening Security: Private Subnets
&lt;/h1&gt;

&lt;p&gt;As the architecture becomes more public-facing and complex, protecting internal resources becomes critical. At this stage, the focus shifts to network-level security by restructuring the VPC and moving key components (such as EC2 instances and the RDS database) into private subnets.&lt;/p&gt;

&lt;p&gt;A private subnet ensures that these resources are no longer directly accessible from the internet. Only the load balancer, which remains in a public subnet, handles inbound traffic and forwards it internally. This significantly reduces the attack surface and aligns with best practices for cloud security.&lt;/p&gt;

&lt;p&gt;This move introduces the concept of a layered defense where not everything needs to be exposed, and access is granted only where absolutely necessary. It also sets up the foundation for introducing NAT gateways and more controlled outbound access in the next stage.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb7wc1pckgog8bo2w6oke.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb7wc1pckgog8bo2w6oke.png" alt="Moving to private subnets" width="800" height="370"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  10. Handling Traffic Spikes: Auto Scaling
&lt;/h1&gt;

&lt;p&gt;With multiple EC2 instances running behind a load balancer, the architecture is better equipped for availability, but still static in capacity. During traffic spikes, fixed instance counts may fall short, and during low-traffic periods, resources may sit idle.&lt;/p&gt;

&lt;p&gt;To address this, Auto Scaling Groups (ASG) are introduced. Auto scaling enables the system to dynamically adjust the number of EC2 instances based on defined metrics such as CPU usage, request volume, or custom CloudWatch alarms.&lt;/p&gt;

&lt;p&gt;This shift brings both cost efficiency and resilience. When traffic increases, new instances are automatically launched; when traffic drops, unused instances are terminated. Auto scaling also provides a safety net by replacing unhealthy instances automatically, reducing operational overhead and improving uptime.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftl2wfi6awihfpzbrmq0p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftl2wfi6awihfpzbrmq0p.png" alt="Adding auto scale" width="800" height="367"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  11. Outbound Access: NAT Gateway
&lt;/h1&gt;

&lt;p&gt;After moving compute resources into private subnets, a new challenge appears: these instances no longer have internet access. While this improves security, it also blocks necessary outbound communication like pulling OS updates, downloading packages, or calling external APIs.&lt;/p&gt;

&lt;p&gt;To solve this, a NAT Gateway is introduced. It acts as a secure bridge, allowing instances in private subnets to initiate outbound connections to the internet, while still remaining unreachable from the outside world.&lt;/p&gt;

&lt;p&gt;This step is a key piece of controlled connectivity. It balances security with operational needs, enabling critical outbound traffic without compromising the privacy and isolation of the internal network.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq2ajlm100h1hrhz45odf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq2ajlm100h1hrhz45odf.png" alt="Adding Nat gateway" width="800" height="396"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  12. Persistent Storage: Multiple EBS Volumes
&lt;/h1&gt;

&lt;p&gt;As the application’s workload diversifies, so do its storage needs. Beyond the root volume of each EC2 instance, additional storage is often required for handling logs, file uploads, temporary data, or application-specific data partitions.&lt;/p&gt;

&lt;p&gt;To support this, the architecture begins attaching multiple EBS (Elastic Block Store) volumes to individual EC2 instances. EBS provides high-performance, persistent block storage that survives reboots and can be snapshotted for backups or replication.&lt;/p&gt;

&lt;p&gt;This step improves data organization, performance tuning, and flexibility, allowing storage to scale independently of compute. However, it introduces management overhead and remains tied to specific Availability Zones and individual instances, which sets the stage for shared storage solutions in the next phase.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fei3t65ne0jtnc8h3ofc0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fei3t65ne0jtnc8h3ofc0.png" alt="Adding EBS" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  13. Shared Storage: Moving to EFS
&lt;/h1&gt;

&lt;p&gt;Managing separate EBS volumes across multiple EC2 instances can become cumbersome, especially in horizontally scaled environments. When multiple instances need to access the same files, whether shared media, configurations, or synchronized processing, a new solution is needed.&lt;/p&gt;

&lt;p&gt;This is where Amazon EFS (Elastic File System) comes in. EFS provides a shared, scalable, and fully managed NFS file system that can be mounted simultaneously by multiple EC2 instances, regardless of their Availability Zone.&lt;/p&gt;

&lt;p&gt;By adopting EFS, the architecture gains shared storage with high availability and automatic scaling, removing the need to replicate files manually or rely on external sync processes. This simplifies development, reduces storage duplication, and prepares the system for workloads that require centralized, concurrent file access.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl6waiactaeqy60kynlo2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl6waiactaeqy60kynlo2.png" alt="Moving to EFS" width="800" height="399"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  14. Speeding Up Reads: Redis for Caching
&lt;/h1&gt;

&lt;p&gt;As traffic increases and the application becomes more data-intensive, repeated database queries can become a bottleneck, slowing down response times and increasing load on the RDS instance.&lt;/p&gt;

&lt;p&gt;To solve this, the architecture introduces a caching layer using Amazon ElastiCache with Redis. Redis is an in-memory key-value store that allows the application to quickly retrieve frequently accessed data such as session information, product listings, or user preferences without hitting the database every time.&lt;/p&gt;

&lt;p&gt;This step greatly enhances performance and scalability, reduces database pressure, and improves overall responsiveness. It also introduces a new layer in the system design: separating fast, ephemeral data from slower, persistent storage.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgegz3zwc70jmq3egkydz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgegz3zwc70jmq3egkydz.png" alt="Adding Redis" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  15. Secure Access: Bastion Host
&lt;/h1&gt;

&lt;p&gt;As more resources are moved into private subnets for security, direct SSH access to EC2 instances is no longer possible from the outside world. While this is ideal from a security standpoint, administrators still need a secure way to access these instances for debugging, deployment, or maintenance.&lt;/p&gt;

&lt;p&gt;To enable this, a Bastion Host (also known as a jump box) is introduced. This is a single, tightly controlled EC2 instance placed in a public subnet, with strict access rules and hardened security settings. It acts as a gateway, allowing SSH access to private instances using internal networking.&lt;/p&gt;

&lt;p&gt;The Bastion Host reinforces least privilege access principles. Instead of opening up multiple EC2 instances to the internet, only one is exposed and access is logged, audited, and minimized. It becomes the controlled entry point into the private network layer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6p2dxetrfgg35ks3crho.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6p2dxetrfgg35ks3crho.png" alt="Adding bastion host" width="800" height="394"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  16. Edge Caching: CloudFront CDN
&lt;/h1&gt;

&lt;p&gt;As the application gains a broader user base, especially across different geographic regions, latency becomes a noticeable concern. Serving static assets like images, stylesheets, scripts, or even cached HTML directly from EC2 or S3 can create slow load times for distant users.&lt;/p&gt;

&lt;p&gt;To address this, Amazon CloudFront, AWS’s content delivery network (CDN), is introduced. CloudFront caches content at edge locations around the world, delivering assets from the closest point to the end user.&lt;/p&gt;

&lt;p&gt;This dramatically improves performance, reduces bandwidth consumption, and lowers load on origin servers. It also enhances security, with built-in DDoS protection and support for signed URLs or geo-restriction. With CloudFront in place, the architecture becomes more globally responsive and efficient, a major milestone in user experience optimization.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fej4593roilfm993knarq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fej4593roilfm993knarq.png" alt="Adding cloudfront" width="800" height="377"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  17. Object Storage: S3 Buckets
&lt;/h1&gt;

&lt;p&gt;As the system matures, storing static assets, like images, backups, logs, or user-generated content, directly on EC2 instances becomes inefficient and hard to manage. It increases storage pressure on compute resources and makes scaling more complicated.&lt;/p&gt;

&lt;p&gt;At this point, the architecture integrates Amazon S3 (Simple Storage Service), a highly durable, scalable object storage service. S3 is designed for storing virtually unlimited data with built-in redundancy, lifecycle policies, versioning, and fine-grained access controls.&lt;/p&gt;

&lt;p&gt;By offloading static files to S3, the application achieves better separation of concerns. EC2 instances focus solely on compute, while S3 becomes the system’s source of truth for file storage. When paired with CloudFront, S3 enables fast, global delivery of assets with low cost and minimal operational overhead.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7wwuc7xma5ukhvt28tor.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7wwuc7xma5ukhvt28tor.png" alt="Adding S3" width="800" height="379"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  18. Event-Driven: Lambda Triggered by S3
&lt;/h1&gt;

&lt;p&gt;With S3 now acting as the central hub for object storage, new opportunities arise to automate and streamline workflows. Instead of polling for changes or running scheduled jobs, the architecture can react automatically to events.&lt;/p&gt;

&lt;p&gt;This is where AWS Lambda comes into play. By configuring S3 to trigger Lambda functions on specific events, such as a new file upload or deletion, the system becomes event-driven. These functions can perform tasks like resizing images, generating thumbnails, scanning files, or indexing metadata, all without provisioning or managing servers.&lt;/p&gt;

&lt;p&gt;This step adds serverless automation to the architecture, reducing operational overhead while enabling real-time responsiveness. It also introduces loosely coupled components, a powerful pattern for building scalable and maintainable systems.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fppiohiahenkny15a8u87.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fppiohiahenkny15a8u87.png" alt="Adding AWS Lambda" width="800" height="379"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  19. High Availability: Multi-AZ
&lt;/h1&gt;

&lt;p&gt;As reliability expectations rise, the architecture must be prepared to withstand failures at the infrastructure level, including entire data centers. AWS regions are made up of multiple Availability Zones (AZs), which are isolated locations with independent power, networking, and connectivity.&lt;/p&gt;

&lt;p&gt;To achieve high availability, the system is restructured to span multiple AZs. EC2 instances within the Auto Scaling Group are distributed across AZs, and RDS is configured for Multi-AZ deployment, which enables synchronous replication to a standby in a different zone.&lt;/p&gt;

&lt;p&gt;This design ensures that if one AZ goes down, the application and database remain operational through the others. It’s a critical move from single-point resilience to regional fault tolerance, minimizing downtime and improving overall system reliability.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpd2wv6n6bjc57l8dm1cw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpd2wv6n6bjc57l8dm1cw.png" alt="Multi AZ" width="800" height="502"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  20. Going Global: Multi-Region Deployment
&lt;/h1&gt;

&lt;p&gt;After achieving high availability within a region, the next step toward true resilience and scalability is multi-region deployment. This involves replicating critical parts of the infrastructure (application servers, databases, storage, and routing) across multiple AWS regions.&lt;/p&gt;

&lt;p&gt;Multi-region architecture improves disaster recovery, ensures low latency for global users, and provides regional failover in case of large-scale outages. DNS-level routing with Amazon Route 53 enables traffic to be directed based on latency, geography, or health checks, ensuring users always reach the closest and healthiest region.&lt;/p&gt;

&lt;p&gt;Implementing this step involves significant planning: handling data replication (often via cross-region S3 replication or multi-region databases), syncing infrastructure, and designing for eventual consistency. But it unlocks a level of resilience, performance, and global reach that’s essential for truly mission-critical or worldwide applications.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fty4xw6qbi6xdmkmgpcya.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fty4xw6qbi6xdmkmgpcya.png" alt="Multi Region" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Final Thoughts: From Simple to Scalable
&lt;/h1&gt;

&lt;p&gt;This journey illustrates how an AWS architecture evolves naturally: not through sudden redesigns, but through incremental, purposeful steps. Starting with a single EC2 instance, each stage solves a specific challenge, whether performance, availability, security, or scale.&lt;/p&gt;

&lt;p&gt;At every point, decisions are driven by real needs: adding a database instance, introducing DNS, securing traffic, automating scaling, or enabling global reach. What begins as a basic setup grows into a robust, distributed, and highly available system capable of serving users around the world.&lt;/p&gt;

&lt;p&gt;Not every application needs to reach the final stage right away. But understanding this progression helps teams plan ahead, avoid rework, and build systems that grow with their users and their business.&lt;/p&gt;

&lt;p&gt;Whether you're just starting or optimizing an existing deployment, AWS provides the flexibility to scale at your own pace, one architectural decision at a time.&lt;/p&gt;

&lt;p&gt;Start small. Grow smart.&lt;/p&gt;

&lt;p&gt;👉 If you found this helpful or want to discuss cloud architecture further, feel free to connect with me on &lt;a href="https://www.linkedin.com/in/mhamadelitawi" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>cloud</category>
      <category>webdev</category>
      <category>beginners</category>
    </item>
    <item>
      <title>CI/CD for Beginners: Deploy a Static HTML Site to AWS EC2 with GitHub Actions</title>
      <dc:creator>Mhamad El Itawi</dc:creator>
      <pubDate>Thu, 17 Jul 2025 22:00:27 +0000</pubDate>
      <link>https://forem.com/aws-builders/cicd-for-beginners-deploy-a-static-html-site-to-aws-ec2-with-github-actions-hfp</link>
      <guid>https://forem.com/aws-builders/cicd-for-beginners-deploy-a-static-html-site-to-aws-ec2-with-github-actions-hfp</guid>
      <description>&lt;p&gt;CI/CD (Continuous Integration and Continuous Deployment) is one of the most valuable skills a developer can learn today. In this guide, you'll learn how to set up a CI/CD pipeline that automatically deploys your HTML website to an AWS EC2 instance using GitHub Actions.&lt;/p&gt;

&lt;p&gt;By the end, every time you push code to GitHub, your website will update automatically!&lt;/p&gt;

&lt;h2&gt;
  
  
  🧰 Prerequisites
&lt;/h2&gt;

&lt;p&gt;Make sure you have the following ready:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS Account&lt;/li&gt;
&lt;li&gt;A GitHub account&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  🛠️ Step 1: Create and Prepare Your EC2 Instance
&lt;/h2&gt;

&lt;h3&gt;
  
  
  🔹 1.1 Launch the EC2 Instance
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Head over to the EC2 Console.&lt;/li&gt;
&lt;li&gt;Click "Launch Instance".&lt;/li&gt;
&lt;li&gt;Fill in the form:

&lt;ul&gt;
&lt;li&gt;Name: &lt;code&gt;MyStaticSite&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;AMI: Amazon Linux 2&lt;/li&gt;
&lt;li&gt;Instance Type: &lt;code&gt;t2.micro&lt;/code&gt; (Eligible for Free Tier)&lt;/li&gt;
&lt;li&gt;Key Pair: Create or use existing (download the &lt;code&gt;.pem&lt;/code&gt; file)&lt;/li&gt;
&lt;li&gt;Security Group: Allow:

&lt;ul&gt;
&lt;li&gt;SSH (port 22)&lt;/li&gt;
&lt;li&gt;HTTP (port 80)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;/li&gt;

&lt;li&gt;Click "Launch".&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  🔹 1.2 Connect to Your Instance
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt; Move the PEM file to a secure location (optional)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mv&lt;/span&gt; ~/Downloads/your-key.pem ~/.ssh/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Set correct permissions on the key file
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;chmod &lt;/span&gt;400 ~/.ssh/your-key.pem
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;SSH into your EC2 instance
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ssh &lt;span class="nt"&gt;-i&lt;/span&gt; ~/.ssh/your-key.pem ec2-user@your-ec2-public-ip
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you're using Ubuntu, replace &lt;code&gt;ec2-user&lt;/code&gt; with &lt;code&gt;ubuntu&lt;/code&gt;.&lt;br&gt;
🔑 Replace:&lt;br&gt;
&lt;code&gt;your-key.pem&lt;/code&gt; → with your actual file name&lt;br&gt;
&lt;code&gt;your-ec2-public-ip&lt;/code&gt; → with your EC2 instance’s public IP or DNS (You can find them when you click on your instance in AWS console)&lt;/p&gt;
&lt;h3&gt;
  
  
  🔹 1.3 Install a Web Server and Grant User Privileges
&lt;/h3&gt;

&lt;p&gt;Amazon Linux 2:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;yum update &lt;span class="nt"&gt;-y&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;yum &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; httpd
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl start httpd
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl &lt;span class="nb"&gt;enable &lt;/span&gt;httpd
&lt;span class="nb"&gt;sudo chown &lt;/span&gt;ec2-user /var/www/html
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ubuntu:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt update
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; apache2
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl start apache2
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl &lt;span class="nb"&gt;enable &lt;/span&gt;apache2
&lt;span class="nb"&gt;sudo chown &lt;/span&gt;ubuntu /var/www/html
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Test it: open your EC2 public IP in your browser — you should see the Apache test page.&lt;/p&gt;

&lt;h2&gt;
  
  
  📁 Step 2: Push Your HTML Code to GitHub
&lt;/h2&gt;

&lt;p&gt;Here's a simple structure for your project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.
├── index.html
└── .github
    └── workflows
        └── deploy.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example &lt;code&gt;index.html&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;!DOCTYPE html&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;html&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;head&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;title&amp;gt;&lt;/span&gt;Hello CI/CD&lt;span class="nt"&gt;&amp;lt;/title&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/head&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;body&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;h1&amp;gt;&lt;/span&gt;This was deployed using GitHub Actions!&lt;span class="nt"&gt;&amp;lt;/h1&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/body&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/html&amp;gt;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Commit and push this code to your GitHub repository.&lt;/p&gt;

&lt;h2&gt;
  
  
  ⚙️ Step 3: Create GitHub Actions Workflow
&lt;/h2&gt;

&lt;p&gt;In your GitHub repo, create the file: &lt;code&gt;.github/workflows/deploy.yml&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deploy to EC2&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;main&lt;/span&gt; &lt;span class="c1"&gt;# or 'master', based on your repo’s default branch&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;

    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Checkout Code&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v3&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Copy files via SCP&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;appleboy/scp-action@v0.1.7&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.EC2_HOST }}&lt;/span&gt;
          &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.EC2_USER }}&lt;/span&gt;
          &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.EC2_KEY }}&lt;/span&gt;
          &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;index.html"&lt;/span&gt;
          &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/var/www/html"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This GitHub Action will trigger on every push to the &lt;code&gt;main&lt;/code&gt; (or &lt;code&gt;master&lt;/code&gt;, depending on your repo) branch and securely copy your &lt;code&gt;index.html&lt;/code&gt; file to your EC2 instance using SSH.&lt;/p&gt;

&lt;h2&gt;
  
  
  🔐 Step 4: Add Secrets to GitHub
&lt;/h2&gt;

&lt;p&gt;Go to your GitHub repository → Settings → Secrets and variables → Actions → New repository secret&lt;/p&gt;

&lt;p&gt;Add the following secrets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;EC2_HOST&lt;/code&gt; → your EC2 public IP&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;EC2_USER&lt;/code&gt; → usually &lt;code&gt;ec2-user&lt;/code&gt; (Amazon Linux) or &lt;code&gt;ubuntu&lt;/code&gt; (Ubuntu)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;EC2_KEY&lt;/code&gt; → paste the contents of your private key &lt;code&gt;.pem&lt;/code&gt; file&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ Never commit &lt;code&gt;.pem&lt;/code&gt; files or expose them in public repos. GitHub Secrets are encrypted at rest, but for production setups prefer IAM roles or short-lived session credentials over long-lived SSH keys.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  🚀 Step 5: Push Your Code and Deploy
&lt;/h2&gt;

&lt;p&gt;Now commit and push to GitHub:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git add &lt;span class="nb"&gt;.&lt;/span&gt;
git commit &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"Initial CI/CD setup"&lt;/span&gt;
git push origin main &lt;span class="c"&gt;# or 'master' if your repo uses that&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Head over to the Actions tab in your GitHub repo — you'll see the deploy job running.&lt;/p&gt;

&lt;p&gt;When it's finished, go to &lt;code&gt;http://&amp;lt;your-ec2-public-ip&amp;gt;&lt;/code&gt;. Your site should be live with the new HTML!&lt;/p&gt;

&lt;h2&gt;
  
  
  🎉 You're Done!
&lt;/h2&gt;

&lt;p&gt;Congrats! You’ve set up your first CI/CD pipeline using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub Actions for automation&lt;/li&gt;
&lt;li&gt;EC2 as your deployment target&lt;/li&gt;
&lt;li&gt;SCP/SSH for secure transfer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now, anytime you update your HTML and push to &lt;code&gt;main&lt;/code&gt; (or &lt;code&gt;master&lt;/code&gt;), your changes go live automatically.&lt;/p&gt;

&lt;h2&gt;
  
  
  🔍 What Happens Behind the Scenes
&lt;/h2&gt;

&lt;p&gt;Let’s break down what’s really going on when you push your code and GitHub Actions deploys it to EC2:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;GitHub Detects a Push to main

&lt;ul&gt;
&lt;li&gt;GitHub watches for changes to your repository.&lt;/li&gt;
&lt;li&gt;When you push to the main branch, it checks .github/workflows/deploy.yml.&lt;/li&gt;
&lt;li&gt;This file tells GitHub: “Hey, run this workflow when there’s a push.”&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;GitHub Actions Spins Up a Runner

&lt;ul&gt;
&lt;li&gt;GitHub spins up a temporary virtual machine (Ubuntu in our case).&lt;/li&gt;
&lt;li&gt;This is where your workflow steps run.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Your Code Is Cloned

&lt;ul&gt;
&lt;li&gt;The actions/checkout step grabs your repo’s latest code.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Secure Copy (SCP) Sends Files to EC2

&lt;ul&gt;
&lt;li&gt;The appleboy/scp-action uses scp under the hood.&lt;/li&gt;
&lt;li&gt;It connects to your EC2 instance using your SSH key (EC2_KEY) and username.&lt;/li&gt;
&lt;li&gt;Then it uploads your index.html to the EC2 server — typically in /var/www/html.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;EC2 Serves the Updated HTML

&lt;ul&gt;
&lt;li&gt;Your Apache (or Nginx) web server is already running on EC2.&lt;/li&gt;
&lt;li&gt;It serves whatever is in the /var/www/html directory, reading files fresh on each request.&lt;/li&gt;
&lt;li&gt;As soon as the new index.html lands there, visitors get the new version; no restart is needed.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  🔹 Local equivalent:
&lt;/h3&gt;

&lt;p&gt;Let’s look at what you’d do manually without CI/CD, so you can better appreciate what GitHub Actions automates.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# GitHub Actions: checkout step&lt;/span&gt;
git clone https://github.com/your-username/your-repo.git
&lt;span class="nb"&gt;cd &lt;/span&gt;your-repo

&lt;span class="c"&gt;# GitHub Actions: SCP upload step&lt;/span&gt;
scp &lt;span class="nt"&gt;-i&lt;/span&gt; your-key.pem index.html ec2-user@your-ec2-ip:/var/www/html
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  🔹 Summary of the Flow
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GitHub Repo  ---push---&amp;gt;  GitHub Actions
     ↓                         ↓
 deploy.yml              Ubuntu Runner
                               ↓
                       SSH into EC2 via key
                               ↓
                   Upload index.html via SCP
                               ↓
               Apache serves your site update
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  🧠 Bonus Ideas
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Add support for multiple environments (e.g., staging, production)&lt;/li&gt;
&lt;li&gt;Use Nginx instead of Apache for better performance&lt;/li&gt;
&lt;li&gt;Install SSL with Let’s Encrypt (Certbot)&lt;/li&gt;
&lt;li&gt;Automate EC2 provisioning with Terraform or CloudFormation&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  🙌 Final Thoughts
&lt;/h2&gt;

&lt;p&gt;CI/CD doesn't have to be complicated. Starting with small projects like this builds the foundation for automating deployments in real-world applications.&lt;/p&gt;

&lt;p&gt;🌐 Got stuck? Drop a comment below and I’ll help you out — or connect with me on &lt;a href="https://www.linkedin.com/in/mhamadelitawi" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;!&lt;/p&gt;

</description>
      <category>aws</category>
      <category>cicd</category>
      <category>webdev</category>
      <category>devops</category>
    </item>
    <item>
      <title>Working With Legacy Code: From Survival to Mastery</title>
      <dc:creator>Mhamad El Itawi</dc:creator>
      <pubDate>Tue, 08 Jul 2025 16:32:35 +0000</pubDate>
      <link>https://forem.com/mhamadelitawi/working-with-legacy-code-from-survival-to-mastery-5gae</link>
      <guid>https://forem.com/mhamadelitawi/working-with-legacy-code-from-survival-to-mastery-5gae</guid>
<description>&lt;p&gt;You open a new project, and there it is: thousands of lines of mystery code, no tests, strange names, and a README last touched in 2016. Congratulations: you’ve met legacy code.&lt;/p&gt;

&lt;p&gt;But here’s the thing: &lt;strong&gt;legacy code isn’t just a mess to clean up; it’s a story to uncover&lt;/strong&gt;. And more importantly, it’s the foundation of systems that power our world: banks, airlines, hospitals, governments.&lt;/p&gt;

&lt;p&gt;This guide will help you work with that kind of code. You’ll learn how to understand it, improve it safely, and feel more confident doing it. Because working with legacy systems isn’t just fixing things, it’s making a real difference.&lt;/p&gt;

&lt;p&gt;Let’s get into it.&lt;/p&gt;

&lt;h2&gt;
  
  
  1- What Is Legacy Code, Really?
&lt;/h2&gt;

&lt;p&gt;When people hear “legacy code,” they often think of old code, something written in an outdated language or built many years ago. But that’s only part of the picture. Legacy code isn’t just about age. It’s more about how the code feels to work with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It’s hard to understand.&lt;/li&gt;
&lt;li&gt;It doesn’t have tests.&lt;/li&gt;
&lt;li&gt;You’re afraid to change it because you might break something.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One definition that really sticks comes from Michael Feathers in his classic book &lt;em&gt;Working Effectively with Legacy Code&lt;/em&gt;: "Legacy code is code without tests." It might sound extreme, but it highlights the heart of the issue: if you don’t have tests, you can’t trust changes. And if you can’t trust changes, you can’t move fast or fix problems safely.&lt;/p&gt;

&lt;p&gt;Legacy code can be a five-year-old project built on outdated tools, a system the current team doesn’t fully understand, or even a rushed new feature, written last week, with no tests or documentation. It isn’t about the past; it’s about risk and your ability to safely work with and improve the code in front of you.&lt;/p&gt;

&lt;h2&gt;
  
  
  2- Why Legacy Code Exists
&lt;/h2&gt;

&lt;p&gt;It’s easy to look at legacy code and wonder, “Why would anyone write it this way?” But most legacy code didn’t start out bad. In fact, at the time it was written, it probably made perfect sense.&lt;/p&gt;

&lt;p&gt;Legacy code exists because things change: technology, teams, priorities, and deadlines. Here are some common reasons it happens:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Business pressure:&lt;/strong&gt; Teams are often told to “just ship it,” even if that means skipping tests or long-term thinking.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Changing requirements:&lt;/strong&gt; What worked for version 1 may no longer fit version 10.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tech moves fast:&lt;/strong&gt; Libraries, frameworks, and best practices evolve quickly. Code can feel outdated in just a few years.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;People leave:&lt;/strong&gt; When the original developers move on, their context often goes with them.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Blaming past developers doesn’t help. Most of the time, they were doing their best under pressure, just like we are today. The real value is in understanding why the code is the way it is so you can make it better, not just newer.&lt;/p&gt;

&lt;h2&gt;
  
  
  3- Common Challenges Developers Face
&lt;/h2&gt;

&lt;p&gt;Working with legacy code can feel frustrating, confusing, and sometimes even risky. If you’ve ever opened a file and thought, “I have no idea what this does, but I hope I don’t break it,” you’re not alone.&lt;/p&gt;

&lt;p&gt;Here are some of the most common challenges developers run into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No tests:&lt;/strong&gt; You can’t tell if your changes will break something. There’s no safety net.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fragile code:&lt;/strong&gt; Small changes in one place cause unexpected bugs in another.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Outdated tools and frameworks:&lt;/strong&gt; Old dependencies can block upgrades, slow you down, or even stop things from running on modern systems.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Missing or outdated documentation:&lt;/strong&gt; It’s hard to understand how the system works or why certain decisions were made.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tight coupling and tangled logic:&lt;/strong&gt; Code is often written without clear boundaries, making it hard to isolate and improve.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fear of touching “scary” parts:&lt;/strong&gt; Some areas are so fragile or complex that no one wants to be responsible for changing them.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of this can slow teams down, create stress, and make even small changes feel like high-stakes moves.&lt;/p&gt;

&lt;p&gt;But here’s the good news: these challenges are common and they’re solvable. The rest of this guide will show you how to handle them with confidence, care, and a bit of strategy.&lt;/p&gt;

&lt;h2&gt;
  
  
  4- Mindset Shift: How to Approach Legacy Code
&lt;/h2&gt;

&lt;p&gt;Before you write a single line of code, the most important thing you can change is your mindset.&lt;/p&gt;

&lt;p&gt;It’s easy to see legacy code as a burden or even as someone else’s mistake. But that mindset leads to frustration, blame, and burnout. Instead, try to see legacy code as an opportunity:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An opportunity to learn how the system works.&lt;/li&gt;
&lt;li&gt;An opportunity to improve something that matters to real users.&lt;/li&gt;
&lt;li&gt;An opportunity to leave things better than you found them.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The truth is, almost every developer, at some point, has written code that became legacy. Deadlines, business pressure, missing knowledge: these things happen to all of us.&lt;/p&gt;

&lt;p&gt;By approaching legacy code with curiosity instead of resentment, you’ll not only enjoy the work more, but you’ll also become a stronger, more thoughtful developer.&lt;/p&gt;

&lt;h2&gt;
  
  
  5- First Steps When You Inherit a Legacy Codebase
&lt;/h2&gt;

&lt;p&gt;The first time you inherit a legacy codebase, it can feel overwhelming, like being dropped in the middle of a maze without a map. But you don’t need to figure it all out at once.&lt;/p&gt;

&lt;p&gt;Here are simple first steps to help you get your footing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Get the code running:&lt;/strong&gt; Before changing anything, make sure you can build, run, and deploy the project safely (even in a local or test environment).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Look for a README or setup guide:&lt;/strong&gt; Even outdated notes can give you clues about how things are supposed to work.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Check for existing tests:&lt;/strong&gt; Any kind of test (unit, integration, or even manual scripts) can help you understand the system’s behavior.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trace the main flows:&lt;/strong&gt; Identify the key features or user actions and follow them through the code. This helps you spot where things really happen.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Find the seams:&lt;/strong&gt; Seams are places in the code where you can safely make changes or add tests without affecting everything else.&lt;/li&gt;
&lt;/ul&gt;
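&lt;p&gt;To make “find the seams” concrete, here is a minimal Java sketch of an object seam in the sense Michael Feathers describes. The class and method names (&lt;code&gt;InvoiceMailer&lt;/code&gt;, &lt;code&gt;deliver&lt;/code&gt;) are illustrative assumptions, not from any real codebase:&lt;/p&gt;

```java
// Illustrative sketch of an "object seam": the legacy class reaches its
// dependency through a method a test can override, so behavior can be
// swapped without touching any caller.
class InvoiceMailer {
    String sendReminder(String customer) {
        String body = "Reminder for " + customer;
        deliver(body);            // the seam: tests override this call
        return body;
    }

    // In production this would talk to a real mail server; the method
    // boundary is what makes the class testable.
    void deliver(String body) {
        System.out.println("SMTP send: " + body);
    }
}

// A test double created at the seam: no mail server needed.
class RecordingMailer extends InvoiceMailer {
    String lastDelivered = "";

    void deliver(String body) {
        lastDelivered = body;     // capture instead of sending
    }
}

public class SeamDemo {
    public static void main(String[] args) {
        RecordingMailer mailer = new RecordingMailer();
        mailer.sendReminder("Acme");
        System.out.println("captured: " + mailer.lastDelivered);
    }
}
```

&lt;p&gt;Once you can slip a test double in at a seam like this, you can start adding tests around code that was never designed for testing.&lt;/p&gt;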

&lt;p&gt;You don’t need to understand every line or every module on day one. Start small. Focus on what you need to change or fix and build your understanding step by step.&lt;/p&gt;

&lt;h2&gt;
  
  
  6- Strategies for Working With and Improving Legacy Code
&lt;/h2&gt;

&lt;p&gt;Once you’ve found your footing, the next step is figuring out how to actually work with legacy code safely and effectively. The key is to make small, steady improvements without breaking what already works.&lt;/p&gt;

&lt;p&gt;Here are some proven strategies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Add Characterization Tests:&lt;/strong&gt; Before changing code, write simple tests to capture what it currently does. This way, you’ll know if you accidentally break something.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Apply the Boy Scout Rule:&lt;/strong&gt; “Leave the code a little cleaner than you found it.” Even small improvements, like renaming variables or deleting dead code, add up over time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Refactor in Small Steps:&lt;/strong&gt; Don’t try to fix everything at once. Tackle one function, one module, or one bug at a time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automate Where You Can:&lt;/strong&gt; Automated tests, linters, and static analysis tools give you confidence to make changes safely.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of it like cleaning an old house: you don’t have to rebuild it from scratch to make it better. Fix what you touch. Improve what you need. Over time, the system becomes safer and easier to work with.&lt;/p&gt;

&lt;p&gt;It’s also important to communicate these improvements to non-technical stakeholders. Refactoring, adding tests, and paying down technical debt often don’t produce flashy new features, but they directly improve the team’s ability to move faster, reduce bugs, and avoid costly outages. When talking to business partners, frame these efforts in terms of reduced risk, improved delivery speed, and long-term cost savings, not just “cleaner code”.&lt;/p&gt;

&lt;h2&gt;
  
  
  7- Refactoring Patterns and Approaches
&lt;/h2&gt;

&lt;p&gt;Refactoring legacy code isn’t about jumping in blindly. There are tried-and-true patterns that help you improve old systems safely, step by step. Each pattern gives you a clear approach to modernizing without breaking things or falling into the “big rewrite” trap.&lt;/p&gt;

&lt;p&gt;Here are some of the most effective approaches you can use:&lt;/p&gt;

&lt;h3&gt;
  
  
  🌱 Strangler Fig Pattern
&lt;/h3&gt;

&lt;p&gt;This approach lets you gradually replace parts of a legacy system by building new components alongside the old ones. Over time, the new code takes over, and the old code is “strangled” and removed. Use it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When a full rewrite is too risky or expensive.&lt;/li&gt;
&lt;li&gt;When you need to modernize parts of the system while keeping it running.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;br&gt;
You want to replace an old checkout system. Instead of rewriting everything, you build a new checkout module and route some traffic to it. Little by little, you shift more users to the new code until the old system can be retired.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffc6gpss41wi3gq6gza41.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffc6gpss41wi3gq6gza41.png" alt="Phases of the Strangler Fig Migration" width="800" height="764"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To visualize how the Strangler Fig Pattern works in practice, here’s a breakdown of its stages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Stage 0 - Legacy System Only:&lt;/strong&gt;
At the beginning, your entire application is the legacy system. All traffic, business logic, and features live inside it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stage 1 - Introduce Strangler Facade:&lt;/strong&gt;
A Strangler Facade is added as a unified entry point (often an API gateway, proxy, or routing layer) that sits in front of the legacy system. It prepares the ground for gradual change without touching the legacy code immediately.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stage 2 - Introduce New Component(s):&lt;/strong&gt;
New features or modules are developed separately and integrated through the Strangler Facade. Some traffic is now routed to the new components while the rest still flows through the legacy system.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stage 3 - Expand New System, Reduce Legacy:&lt;/strong&gt;
More functionality is moved to the new system over time. The legacy system starts to shrink as new code handles more requests and business logic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stage 4 - Retire Legacy System:&lt;/strong&gt;
Once enough functionality has been migrated, the legacy system can be safely decommissioned. The new system now handles all operations but still passes through the facade.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stage 5 - Retire Strangler Facade:&lt;/strong&gt;
With the legacy system gone, the facade is no longer necessary. It can be removed, leaving a clean, fully modernized system.&lt;/li&gt;
&lt;/ul&gt;
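&lt;p&gt;Here is a minimal Java sketch of the routing idea behind those stages. The names (&lt;code&gt;CheckoutHandler&lt;/code&gt;, &lt;code&gt;StranglerFacade&lt;/code&gt;) and the simple flag-based routing are illustrative assumptions; in practice the facade is usually an API gateway or proxy:&lt;/p&gt;

```java
// Illustrative sketch: a strangler facade that routes between the legacy
// and the new checkout. Class names and flag-based routing are assumptions.
interface CheckoutHandler {
    String checkout(String orderId);
}

class LegacyCheckout implements CheckoutHandler {
    public String checkout(String orderId) {
        return "legacy:" + orderId; // stands in for the old code path
    }
}

class NewCheckout implements CheckoutHandler {
    public String checkout(String orderId) {
        return "new:" + orderId;    // stands in for the modernized component
    }
}

// Callers only ever talk to the facade, so traffic can shift between the
// two sides without any caller changing (stages 1 through 4 above).
class StranglerFacade implements CheckoutHandler {
    private final CheckoutHandler legacy = new LegacyCheckout();
    private final CheckoutHandler modern = new NewCheckout();
    private boolean routeToNew = false; // flip as the new component proves itself

    public String checkout(String orderId) {
        CheckoutHandler target = routeToNew ? modern : legacy;
        return target.checkout(orderId);
    }

    void shiftTrafficToNew() {
        routeToNew = true;
    }
}

public class StranglerFigDemo {
    public static void main(String[] args) {
        StranglerFacade facade = new StranglerFacade();
        System.out.println(facade.checkout("A1")); // legacy:A1
        facade.shiftTrafficToNew();
        System.out.println(facade.checkout("A1")); // new:A1
    }
}
```

&lt;p&gt;Once all traffic reaches the new component, the legacy class, and eventually the facade itself, can be deleted (stages 4 and 5).&lt;/p&gt;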
&lt;h3&gt;
  
  
  🔀 Branch by Abstraction
&lt;/h3&gt;

&lt;p&gt;This pattern lets you change behavior behind an interface or abstraction without breaking the existing code. Both the old and new versions can coexist side by side until the new code is stable, tested, and ready for full rollout.&lt;/p&gt;

&lt;p&gt;It’s especially useful when you need to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Change deep business logic without disrupting current features.&lt;/li&gt;
&lt;li&gt;Control rollout in small, reversible steps rather than risky big-bang deployments.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F95jj9e5f0ih57h2p6gp7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F95jj9e5f0ih57h2p6gp7.png" alt="Coexisting Legacy and New Code Using an Abstraction Layer" width="800" height="318"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this approach, the Consumer interacts only with an Abstraction Layer, a stable interface that hides the underlying implementation. Behind that abstraction, both the Legacy Component and the New Component are available. The abstraction decides which version to use, making it possible to switch between them easily or even run them in parallel.&lt;/p&gt;

&lt;p&gt;For example, imagine you need to replace the shipping cost calculation in an e-commerce system. You create an interface—say, ShippingCalculator—and plug in both the old and new implementations. The consumer code doesn’t change. Once the new version is proven, you simply switch over and remove the old code safely.&lt;/p&gt;

&lt;p&gt;This method allows you to write, test, and deploy the new logic without impacting users, and it gives you the flexibility to roll back quickly if needed.&lt;/p&gt;
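&lt;p&gt;A minimal Java sketch of that flow, assuming a hypothetical &lt;code&gt;ShippingCalculator&lt;/code&gt; interface and made-up pricing formulas for the two implementations:&lt;/p&gt;

```java
// Illustrative sketch of branch by abstraction for the shipping example.
// The formulas and the flag wiring are assumptions, not a real pricing model.
interface ShippingCalculator {
    double cost(double weightKg);
}

class LegacyShippingCalculator implements ShippingCalculator {
    public double cost(double weightKg) {
        return 5.0 + weightKg * 2.0; // old flat-rate formula
    }
}

class NewShippingCalculator implements ShippingCalculator {
    public double cost(double weightKg) {
        return 4.0 + weightKg * 1.5; // new formula being rolled out
    }
}

// The consumer depends only on the abstraction; a flag picks the
// implementation, so rollout (and rollback) is a one-line change.
class Checkout {
    private final ShippingCalculator calculator;

    Checkout(boolean useNewShipping) {
        this.calculator = useNewShipping ? new NewShippingCalculator()
                                         : new LegacyShippingCalculator();
    }

    double total(double itemsTotal, double weightKg) {
        return itemsTotal + calculator.cost(weightKg);
    }
}

public class BranchByAbstractionDemo {
    public static void main(String[] args) {
        Checkout old = new Checkout(false);
        Checkout modern = new Checkout(true);
        System.out.println(old.total(100.0, 2.0));    // 109.0
        System.out.println(modern.total(100.0, 2.0)); // 107.0
    }
}
```

&lt;p&gt;When the new implementation has proven itself, the legacy class and the flag are removed, and the abstraction simply stays.&lt;/p&gt;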
&lt;h3&gt;
  
  
  🧩 Modularization
&lt;/h3&gt;

&lt;p&gt;Large, messy systems often have everything bundled together. Modularization means splitting the code into smaller, focused modules or services that are easier to understand, test, and maintain. Use it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When a single file, class, or service is doing too many things.&lt;/li&gt;
&lt;li&gt;When you want to isolate and improve one part of the system at a time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;br&gt;
A legacy app has all the user management, payment processing, and notifications tangled together in one place. You start by moving the payment logic into its own module, with clear inputs and outputs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft8h2ul2plq7yt086a2u8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft8h2ul2plq7yt086a2u8.png" alt="Modularization" width="800" height="254"&gt;&lt;/a&gt;&lt;/p&gt;
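&lt;p&gt;Sketched in Java, the extracted payment module might expose a single narrow entry point. Everything here (&lt;code&gt;PaymentModule&lt;/code&gt;, the validation rules) is an illustrative assumption:&lt;/p&gt;

```java
// Illustrative sketch: payment logic pulled out of a tangled app class into
// a module with one narrow entry point. Names and rules are assumptions.
class PaymentResult {
    final boolean approved;
    final String reference;

    PaymentResult(boolean approved, String reference) {
        this.approved = approved;
        this.reference = reference;
    }
}

// The module's public surface: clear input (amount, card token), clear output.
// User management and notifications no longer reach inside this code.
class PaymentModule {
    PaymentResult charge(double amount, String cardToken) {
        boolean validToken = !cardToken.isEmpty();
        boolean validAmount = Math.signum(amount) == 1.0; // amount must be positive
        boolean ok = validToken ? validAmount : false;
        String reference = ok ? "PAY-" + cardToken : "DECLINED";
        return new PaymentResult(ok, reference);
    }
}

public class ModularizationDemo {
    public static void main(String[] args) {
        PaymentModule payments = new PaymentModule(); // the rest of the app sees only this
        PaymentResult result = payments.charge(49.99, "tok_123");
        System.out.println(result.approved + " " + result.reference);
    }
}
```

&lt;p&gt;With the boundary in place, the payment internals can be tested and refactored on their own, without touching user management or notifications.&lt;/p&gt;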
&lt;h3&gt;
  
  
  🔗 Service Extraction
&lt;/h3&gt;

&lt;p&gt;When a part of the code handles a distinct responsibility, you can extract it into a standalone service or microservice. This reduces complexity and allows teams to scale or evolve parts of the system independently. Use it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When a piece of functionality has grown too large or different from the core app.&lt;/li&gt;
&lt;li&gt;When you need to scale or deploy part of the system separately.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;br&gt;
The legacy system handles payments, users, and reporting. You extract the payment feature into its own microservice with clean boundaries, separate deployments, and its own tests.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0bvfawl4s37e0emtzbil.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0bvfawl4s37e0emtzbil.png" alt="Monolithic Vs Microservice Architecture" width="800" height="368"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  🪤 Anti-Corruption Layer (Gateway)
&lt;/h3&gt;

&lt;p&gt;An Anti-Corruption Layer (ACL) is a way to protect your new code from the mess of old legacy systems. It works like a translator or a safety barrier between your clean, modern code and the outdated systems you still depend on.&lt;/p&gt;

&lt;p&gt;Instead of letting your new code call the legacy code directly, which could pull in confusing designs, bad habits, or technical debt, you build an abstraction layer. This layer handles all communication and makes sure the old system’s weirdness doesn’t leak into your new system.&lt;/p&gt;

&lt;p&gt;Use It:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When you’re building new features that still depend on old, messy systems you can’t replace right away.&lt;/li&gt;
&lt;li&gt;When you want to keep your new code clean, simple, and free from legacy mistakes.&lt;/li&gt;
&lt;li&gt;When you need to control how two very different systems talk to each other.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0mx1y0h8pmsyg6yuqbzv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0mx1y0h8pmsyg6yuqbzv.png" alt="Isolating Legacy Systems with an Anti-Corruption Layer" width="800" height="248"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
Let’s say you’re building a new analytics dashboard, but the data lives in an old system that’s hard to work with. Instead of letting your dashboard talk to the legacy system directly, you create an Anti-Corruption Layer—a new service that handles everything in between. This layer talks to the old system, cleans up the data, and hands it to your new code in a safe, modern format.&lt;/p&gt;
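&lt;p&gt;A tiny Java sketch of that translation step. The legacy field names and formats (&lt;code&gt;CUST_NM&lt;/code&gt;, a &lt;code&gt;"Y"/"N"&lt;/code&gt; flag, revenue stored as cents in a string) are invented to stand in for typical legacy quirks:&lt;/p&gt;

```java
// Illustrative sketch of an anti-corruption layer: the new code never sees
// the legacy record format. All names and formats here are assumptions.
class LegacyRecord {
    // Typical legacy quirks: cryptic fields, strings for everything.
    String CUST_NM;
    String ACT_FLG;   // "Y" / "N"
    String REV_CENTS; // revenue stored as a string of cents
}

// The clean model the new analytics code works with.
class CustomerStats {
    final String name;
    final boolean active;
    final double revenue;

    CustomerStats(String name, boolean active, double revenue) {
        this.name = name;
        this.active = active;
        this.revenue = revenue;
    }
}

// The ACL: the only place that knows how to translate legacy weirdness.
class LegacyCustomerGateway {
    CustomerStats translate(LegacyRecord raw) {
        boolean active = "Y".equals(raw.ACT_FLG);
        double revenue = Double.parseDouble(raw.REV_CENTS) / 100.0;
        return new CustomerStats(raw.CUST_NM.trim(), active, revenue);
    }
}

public class AntiCorruptionLayerDemo {
    public static void main(String[] args) {
        LegacyRecord raw = new LegacyRecord();
        raw.CUST_NM = "  Acme Corp ";
        raw.ACT_FLG = "Y";
        raw.REV_CENTS = "125050";
        CustomerStats clean = new LegacyCustomerGateway().translate(raw);
        System.out.println(clean.name + " active=" + clean.active
                + " revenue=" + clean.revenue);
    }
}
```

&lt;p&gt;If the legacy format ever changes, only the gateway needs updating; the dashboard code keeps its clean model.&lt;/p&gt;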
&lt;h3&gt;
  
  
  🔄 Event-Driven Refactoring
&lt;/h3&gt;

&lt;p&gt;Instead of calling other parts of the system directly, you can use events to decouple components. This reduces dependencies and makes future changes safer. Use it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When multiple parts of the system need to react to the same action.&lt;/li&gt;
&lt;li&gt;When you need to scale or change behavior without rewriting everything.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu5g08mgc1avlahy742zf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu5g08mgc1avlahy742zf.png" alt="Event Driven Architecture" width="800" height="245"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
Right now, after an order is placed, the system directly calls the payment, inventory, and email services. You change this to publish an OrderPlaced event. Other parts of the system subscribe to that event and act independently.&lt;/p&gt;
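&lt;p&gt;Here is a minimal in-process Java sketch of that change. The service names and the simple event bus are illustrative assumptions; a real system would typically publish through a broker such as Kafka or SQS:&lt;/p&gt;

```java
// Illustrative sketch: publishing an OrderPlaced event instead of calling
// payment, inventory, and email directly. All names are assumptions.
class OrderPlaced {
    final String orderId;
    OrderPlaced(String orderId) { this.orderId = orderId; }
}

interface OrderSubscriber {
    String handle(OrderPlaced event);
}

class PaymentService implements OrderSubscriber {
    public String handle(OrderPlaced e) { return "charged " + e.orderId; }
}

class InventoryService implements OrderSubscriber {
    public String handle(OrderPlaced e) { return "reserved stock for " + e.orderId; }
}

class EmailService implements OrderSubscriber {
    public String handle(OrderPlaced e) { return "emailed receipt for " + e.orderId; }
}

// A minimal in-process event bus; the publisher knows nothing about who
// reacts to the event or how.
class EventBus {
    private final OrderSubscriber[] subscribers;

    EventBus(OrderSubscriber[] subscribers) { this.subscribers = subscribers; }

    void publish(OrderPlaced event) {
        for (OrderSubscriber s : subscribers) {
            System.out.println(s.handle(event));
        }
    }
}

public class EventDrivenDemo {
    public static void main(String[] args) {
        EventBus bus = new EventBus(new OrderSubscriber[] {
            new PaymentService(), new InventoryService(), new EmailService()
        });
        bus.publish(new OrderPlaced("order-42")); // order flow is now decoupled
    }
}
```

&lt;p&gt;Adding a new reaction, say a fraud check, now means adding a subscriber, not editing the order-placement code.&lt;/p&gt;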
&lt;h2&gt;
  
  
  8- Tools That Help Tame Legacy Code
&lt;/h2&gt;

&lt;p&gt;Legacy code can feel overwhelming, but the right tools can make a big difference. They help you understand, clean, and protect your codebase as you work. These tools won’t fix legacy problems by themselves, but they give you visibility, safety, and automation, the essentials for working safely and confidently in any legacy system.&lt;/p&gt;

&lt;p&gt;Here are some useful categories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Linters:&lt;/strong&gt; Tools like ESLint, RuboCop, or Pylint help enforce coding standards and catch obvious issues early.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code Coverage Tools:&lt;/strong&gt; Tools like Istanbul (JavaScript), JaCoCo (Java), or Coverage.py (Python) show you which parts of the code are tested and which aren’t.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Static Analyzers:&lt;/strong&gt; Tools like SonarQube, CodeClimate, or DeepSource help identify hidden bugs, security issues, and code smells.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dependency Checkers:&lt;/strong&gt; Tools like Dependabot or npm audit help you spot outdated or vulnerable libraries.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These tools don’t replace good thinking, but they give you visibility and safety, two things every legacy project needs.&lt;/p&gt;
&lt;h2&gt;
  
  
  9- Testing Legacy Code
&lt;/h2&gt;

&lt;p&gt;One of the hardest parts of working with legacy code is the constant fear of breaking something. The best way to fight that fear is to add tests before you make any big changes.&lt;/p&gt;

&lt;p&gt;The problem is, legacy code often isn’t written in a way that makes testing easy. It might be tangled, tightly coupled, or full of side effects. But with the right techniques, you can still add valuable tests that give you confidence to move forward.&lt;/p&gt;

&lt;p&gt;Here’s how to do it step by step:&lt;/p&gt;
&lt;h3&gt;
  
  
  🎯Characterization Tests
&lt;/h3&gt;

&lt;p&gt;Characterization tests capture what the code currently does, even if that behavior is strange or incorrect. They give you a safety net when refactoring because you’ll know if behavior accidentally changes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Test&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;testCharacterizeCalculateDiscount&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;100.0&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;discount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.15&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

    &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DiscountCalculator&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;calculate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;discount&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// We capture the current behavior even if it's wrong.&lt;/span&gt;
    &lt;span class="n"&gt;assertEquals&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;85.0&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This lets you safely refactor the method later, knowing you won’t accidentally change its current behavior.&lt;/p&gt;

&lt;h3&gt;
  
  
  📸Golden Master Testing
&lt;/h3&gt;

&lt;p&gt;Run the system, capture the current output, and use that as the baseline for future comparisons. It’s perfect for legacy code that’s too complex or messy to unit test easily.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Test&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;testInvoiceGenerationGoldenMaster&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="kd"&gt;throws&lt;/span&gt; &lt;span class="nc"&gt;IOException&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;invoice&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;InvoiceService&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;generateInvoice&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sampleOrder&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

    &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;expected&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Files&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;readString&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"src/test/resources/golden_master_invoice.txt"&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;

    &lt;span class="n"&gt;assertEquals&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;trim&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;invoice&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;trim&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here you’re comparing the full output (such as a report or document) to a saved, known-good version.&lt;/p&gt;

&lt;h3&gt;
  
  
  ✂️Test Seams
&lt;/h3&gt;

&lt;p&gt;A seam is a place where you can alter behavior for testing, such as injecting dependencies instead of hardcoding them. It lets you add tests without rewriting the entire system.&lt;/p&gt;

&lt;p&gt;Before (no seam):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;PaymentProcessor&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;processPayment&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Order&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;PaymentService&lt;/span&gt; &lt;span class="n"&gt;paymentService&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;PaymentService&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
        &lt;span class="n"&gt;paymentService&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;charge&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After (with seam):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;PaymentProcessor&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;PaymentService&lt;/span&gt; &lt;span class="n"&gt;paymentService&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;PaymentProcessor&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;PaymentService&lt;/span&gt; &lt;span class="n"&gt;paymentService&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;paymentService&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;paymentService&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;processPayment&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Order&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;paymentService&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;charge&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now in your test:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Test&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;testProcessPaymentWithFakeService&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;PaymentService&lt;/span&gt; &lt;span class="n"&gt;fakeService&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Mockito&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;mock&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;PaymentService&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="nc"&gt;PaymentProcessor&lt;/span&gt; &lt;span class="n"&gt;processor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;PaymentProcessor&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fakeService&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

    &lt;span class="n"&gt;processor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;processPayment&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sampleOrder&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

    &lt;span class="n"&gt;verify&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fakeService&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;charge&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sampleOrder&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By injecting the dependency, you can now test safely without hitting real services.&lt;/p&gt;

&lt;h3&gt;
  
  
  📝Approval Testing
&lt;/h3&gt;

&lt;p&gt;A technique where you compare outputs to an “approved” version and check for unintended changes. It’s useful for complex outputs like HTML, reports, or documents, where exact values can vary.&lt;/p&gt;

&lt;p&gt;Java Example (Using ApprovalTests library):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Test&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;testGenerateInvoiceHtml&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;InvoiceService&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;generateInvoiceHtml&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sampleOrder&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

    &lt;span class="nc"&gt;Approvals&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;verify&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tools like &lt;a href="https://github.com/approvals/ApprovalTests.Java" rel="noopener noreferrer"&gt;ApprovalTests&lt;/a&gt; for Java handle storing the approved result and highlighting any differences automatically.&lt;/p&gt;

&lt;h2&gt;
  
  
  10- When to Refactor vs. Rewrite
&lt;/h2&gt;

&lt;p&gt;At some point, every team working with legacy code faces the same question:&lt;br&gt;
Should we fix what we have, or should we start over and build something new?&lt;/p&gt;

&lt;p&gt;Starting fresh can sound exciting: new tools, cleaner code, no old mess. But in reality, the choice between refactoring and rewriting is rarely simple. It’s even harder to convince the business that either option is worth the time and cost, especially when the current system is running, making money, and keeping customers happy. The usual thinking is:&lt;br&gt;
“If it works, why change it?”&lt;/p&gt;

&lt;p&gt;That’s why it’s so important to know when small, steady refactoring is the right move and when a full rewrite is truly the better path. A good rule of thumb is to always ask: &lt;strong&gt;“What business problem are we solving by rewriting this?”&lt;/strong&gt;&lt;br&gt;
If you can’t answer that clearly, you’re probably better off refactoring. You also need to explain these choices in ways that non-technical people can understand.&lt;/p&gt;

&lt;h3&gt;
  
  
  🔧 When to Refactor
&lt;/h3&gt;

&lt;p&gt;Refactoring means improving the existing codebase in small, controlled steps. You keep the core system running while making it easier to work with over time. Refactoring makes sense when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The system still works but is messy, hard to change, or fragile.&lt;/li&gt;
&lt;li&gt;The business relies on it and cannot afford downtime or risky overhauls.&lt;/li&gt;
&lt;li&gt;You need to fix bugs, add features, or improve performance gradually.&lt;/li&gt;
&lt;li&gt;You want to lower risk while keeping the value the system already provides.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;br&gt;
A busy e-commerce platform built 7 years ago still processes orders daily. Instead of rewriting, you add tests, break apart monolithic classes, and improve specific pain points while keeping the system live.&lt;/p&gt;

&lt;p&gt;In short, refactoring is often the safest path, especially when the software is actively used and profitable.&lt;/p&gt;

&lt;h3&gt;
  
  
  🔄 When to Rewrite
&lt;/h3&gt;

&lt;p&gt;Rewriting means starting fresh: building a new system from the ground up. A rewrite may be justified when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The technology is outdated and actively blocks progress (e.g., can’t scale, can’t hire developers, can’t integrate with modern tools).&lt;/li&gt;
&lt;li&gt;The architecture no longer fits the business model or growth plans.&lt;/li&gt;
&lt;li&gt;The cost of maintaining or adding features has become higher than starting over.&lt;/li&gt;
&lt;li&gt;The team has the time, resources, and business support to do it right.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;br&gt;
A financial institution is running mission-critical software on COBOL with no developers left to maintain it. After careful planning, they invest in rewriting the system using modern languages and cloud infrastructure.&lt;/p&gt;

&lt;p&gt;In short, rewrites can succeed, but only with strong business buy-in, careful planning, and acceptance of short-term disruption.&lt;/p&gt;

&lt;h3&gt;
  
  
  ⚖️ The Tough Truth: It’s Hard to Convince the Business
&lt;/h3&gt;

&lt;p&gt;Most businesses don’t easily invest in technical refactoring or rewrites, especially when the current system works, generates revenue, and hasn’t caused visible pain.&lt;/p&gt;

&lt;p&gt;As engineers, we see technical debt as the friction that slows us down, increases risks, and raises maintenance costs. The business, however, sees working software and may not immediately feel the pain. That’s why it’s crucial to frame technical decisions in business terms: How will refactoring help us ship faster? How will it reduce incidents, improve customer satisfaction, or lower costs over time? Reframe “tech debt” as a business enabler, not just an engineering concern. In practice, that means you need to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tie technical work to business outcomes: speed, stability, new features, reduced costs.&lt;/li&gt;
&lt;li&gt;Show small wins from refactoring before asking for major investments.&lt;/li&gt;
&lt;li&gt;Be ready to explain the risks of inaction (security issues, scaling problems, talent gaps).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  11- Security, Compliance &amp;amp; Risks in Legacy Code
&lt;/h2&gt;

&lt;p&gt;Legacy code isn’t just about messy methods or outdated frameworks; it can also hide serious security risks and compliance issues that put the entire business at risk.&lt;/p&gt;

&lt;p&gt;Many older systems were built in a time when security wasn’t as critical, regulations were different, and common best practices simply didn’t exist. That’s why legacy systems often carry hidden dangers that are easy to overlook until something goes wrong.&lt;/p&gt;

&lt;h3&gt;
  
  
  🚨 Common Risks You’ll Find in Legacy Code
&lt;/h3&gt;

&lt;p&gt;Even well-running legacy systems can hide risks that expose the business to security threats and compliance issues.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hardcoded passwords or API keys left inside the code.&lt;/li&gt;
&lt;li&gt;Lack of encryption for sensitive data (user info, payments, medical records).&lt;/li&gt;
&lt;li&gt;Missing input validation, which opens the door to SQL injection or cross-site scripting (XSS).&lt;/li&gt;
&lt;li&gt;Outdated libraries with known security flaws.&lt;/li&gt;
&lt;li&gt;No audit logs or traceability, making it hard to detect or investigate problems.&lt;/li&gt;
&lt;li&gt;Non-compliance with current security laws or industry standards.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  🛠️ How to Handle Security and Compliance in Legacy Systems
&lt;/h3&gt;

&lt;p&gt;Security in legacy code can’t always be fixed overnight. But with a clear plan, you can reduce risks step by step while keeping the system running.&lt;/p&gt;

&lt;p&gt;✅ &lt;strong&gt;Assess and Prioritize:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use security scans and static analysis tools (e.g., OWASP ZAP, Snyk, SonarQube).&lt;/li&gt;
&lt;li&gt;Check for vulnerable dependencies and outdated libraries.&lt;/li&gt;
&lt;li&gt;Make a list of regulatory requirements that apply to your system.&lt;/li&gt;
&lt;li&gt;Focus first on fixing issues that carry the highest risk to the business.&lt;/li&gt;
&lt;li&gt;Even small wins like removing hardcoded secrets add up.&lt;/li&gt;
&lt;/ul&gt;
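&lt;p&gt;For example, removing a hardcoded secret can be exactly that kind of small, self-contained win. A minimal sketch (the &lt;code&gt;PAYMENT_API_KEY&lt;/code&gt; name and the configuration shape are illustrative):&lt;/p&gt;

```java
import java.util.Map;

public class ApiClientConfig {
    // Before (a common legacy pattern), the key was hardcoded in source:
    //   private static final String API_KEY = "sk-live-123456";
    // After: look it up from configuration and fail fast when it's missing.

    public static String apiKey(Map<String, String> env) {
        String key = env.get("PAYMENT_API_KEY");
        if (key == null || key.isEmpty()) {
            throw new IllegalStateException("PAYMENT_API_KEY is not set");
        }
        return key;
    }

    public static void main(String[] args) {
        // In production you would pass System.getenv() instead of a literal map.
        System.out.println(apiKey(Map.of("PAYMENT_API_KEY", "demo-key")));
    }
}
```

&lt;p&gt;Passing the configuration in (rather than reading &lt;code&gt;System.getenv()&lt;/code&gt; directly) also creates a test seam, tying this back to the testing techniques above.&lt;/p&gt;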

&lt;p&gt;✅ &lt;strong&gt;Isolate and Contain:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Identify risky parts of the system (handling payments, personal data, etc.).&lt;/li&gt;
&lt;li&gt;Add boundaries using APIs, service layers, or access controls to contain risks.&lt;/li&gt;
&lt;li&gt;Apply patterns like the Strangler Fig to move sensitive features into safer, modern code over time.&lt;/li&gt;
&lt;/ul&gt;
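&lt;p&gt;The Strangler Fig idea can be sketched as a facade that routes each call to either the legacy or the modern implementation, so sensitive features migrate one at a time (all class names here are illustrative):&lt;/p&gt;

```java
// Strangler Fig sketch: callers depend on the facade, not on either
// implementation, so the legacy path can be retired gradually.
interface PaymentGateway {
    String charge(String orderId);
}

class LegacyPaymentGateway implements PaymentGateway {
    public String charge(String orderId) { return "legacy:" + orderId; }
}

class ModernPaymentGateway implements PaymentGateway {
    public String charge(String orderId) { return "modern:" + orderId; }
}

public class PaymentFacade {
    private final PaymentGateway legacy = new LegacyPaymentGateway();
    private final PaymentGateway modern = new ModernPaymentGateway();
    private final boolean useModern; // e.g. a feature flag or config value

    public PaymentFacade(boolean useModern) { this.useModern = useModern; }

    public String charge(String orderId) {
        // Route per call; flip the flag once the new path is proven safe.
        return (useModern ? modern : legacy).charge(orderId);
    }

    public static void main(String[] args) {
        System.out.println(new PaymentFacade(false).charge("A-1"));
        System.out.println(new PaymentFacade(true).charge("A-1"));
    }
}
```

&lt;p&gt;Once every call goes through the modern path, the legacy implementation can be deleted without touching the callers.&lt;/p&gt;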

&lt;p&gt;✅ &lt;strong&gt;Fix and Improve Incrementally:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fix small security gaps as you touch the code (the Boy Scout Rule).&lt;/li&gt;
&lt;li&gt;Add security checks into your CI/CD pipeline to prevent new problems.&lt;/li&gt;
&lt;li&gt;Introduce basic logging and monitoring to spot issues early.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;✅ &lt;strong&gt;Communicate with the Business:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Explain security risks in plain language, not just “technical debt” but real risks: data loss, legal penalties, customer trust.&lt;/li&gt;
&lt;li&gt;Show how small improvements protect revenue and reputation without needing a costly rebuild.&lt;/li&gt;
&lt;li&gt;Document known risks if you can’t fix everything right away.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  12- Code Archaeology: Digging Into the Past
&lt;/h2&gt;

&lt;p&gt;Working with legacy code often feels less like modern software development and more like archaeology, digging through layers of decisions, patches, and quick fixes made over the years.&lt;/p&gt;

&lt;p&gt;To improve legacy code safely, you first need to understand its history: why it was written the way it was, what decisions shaped it, and which parts are still important today.&lt;/p&gt;

&lt;p&gt;This process is sometimes called code archaeology, and it’s a powerful skill that can save you from costly mistakes.&lt;/p&gt;

&lt;h3&gt;
  
  
  🏺 How to Do Code Archaeology
&lt;/h3&gt;

&lt;p&gt;Here’s how you can dig into the past and make sense of legacy code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Use Version Control as Your Map:&lt;/strong&gt; Tools like &lt;code&gt;git log&lt;/code&gt;, &lt;code&gt;git blame&lt;/code&gt;, and pull request history can tell you who changed what, when, and why. Look for patterns: recurring bug fixes, old TODOs, unexplained changes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Check the Changelogs and Documentation (if any):&lt;/strong&gt; Sometimes old release notes, wikis, or even comments in the code hold valuable clues. Business context hidden in these notes can explain weird workarounds and strange decisions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Talk to the “Historians”:&lt;/strong&gt; If possible, find teammates (past or present) who worked on the system. Even short conversations can reveal decisions that aren’t written down anywhere.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Identify Code Tombs and Dead Ends:&lt;/strong&gt; Look for code that no longer serves any real purpose, leftovers from old features, forgotten experiments, half-removed functionality. Remove them carefully or quarantine them to reduce confusion.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  13- Legacy Systems Across Contexts
&lt;/h2&gt;

&lt;p&gt;Legacy code looks different depending on the industry, technology, and business environment. Each type of system comes with its own challenges and the way you approach it should match its context.&lt;br&gt;
Here’s a quick comparison to help you understand the differences:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Context&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Common Traits&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Challenges&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Suggested Approach&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;🔌 &lt;strong&gt;Embedded Systems&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Low-level languages (C/C++), runs on hardware, long lifespan&lt;/td&gt;
&lt;td&gt;Hard to update, expensive to test, safety-critical&lt;/td&gt;
&lt;td&gt;Use emulators, introduce test seams, prioritize stability&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🏦 &lt;strong&gt;Finance, Insurance, Government&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;COBOL, mainframes, large relational databases, strict regulations&lt;/td&gt;
&lt;td&gt;Downtime not acceptable, skills shortage, compliance-heavy&lt;/td&gt;
&lt;td&gt;Gradual modernization, API layers, risk-managed refactoring&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🌐 &lt;strong&gt;Web &amp;amp; Enterprise Applications&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Older stacks (.NET, PHP, Java, Rails), monoliths, inconsistent code quality&lt;/td&gt;
&lt;td&gt;Technical debt, slow delivery, fragile deployments&lt;/td&gt;
&lt;td&gt;Modularization, microservices, automated testing, CI/CD&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🏥 &lt;strong&gt;Healthcare, Aerospace, Critical Systems&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;High safety, strict regulations, tightly integrated software &amp;amp; hardware&lt;/td&gt;
&lt;td&gt;Change is slow, failures can have severe consequences&lt;/td&gt;
&lt;td&gt;Focus on risk management, strong testing, traceability&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  14- Team &amp;amp; Organizational Culture Around Legacy Code
&lt;/h2&gt;

&lt;p&gt;Legacy code isn’t just a technical problem; it’s often a team and culture problem too.&lt;/p&gt;

&lt;p&gt;Without the right mindset, teams fall into habits like blaming past developers, avoiding the code, or accepting that “it’s just how things are.” This leads to frustration, fear, and a system that only gets worse over time.&lt;/p&gt;

&lt;h3&gt;
  
  
  🚫 Unhealthy Legacy Code Cultures:
&lt;/h3&gt;

&lt;p&gt;Some team habits can make legacy code even harder to deal with. Here are the warning signs to watch for.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;That’s not my problem:&lt;/strong&gt; no one takes ownership.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fear of touching the code:&lt;/strong&gt; change only happens when something breaks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No time for refactoring:&lt;/strong&gt; teams are stuck in “just ship it” mode.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Blaming the past:&lt;/strong&gt; dwelling on old mistakes instead of improving the present.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  🧑‍🤝‍🧑 Healthy Legacy Code Cultures:
&lt;/h3&gt;

&lt;p&gt;A healthy culture is just as important as good code when working with legacy systems. Here are some key traits of teams that thrive in legacy environments:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Shared Ownership:&lt;/strong&gt; Everyone feels responsible for the health of the codebase, not just one person or team.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Psychological Safety:&lt;/strong&gt; People feel safe to suggest improvements without fear of blame or judgment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Small Wins Matter:&lt;/strong&gt; Teams celebrate gradual improvements; even deleting one dead function is a win.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Continuous Learning:&lt;/strong&gt; The team values learning from the past, not shaming it. They take time to reflect on why legacy decisions were made and how to improve them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Technical Debt as a Business Concern:&lt;/strong&gt; Leadership understands that technical debt affects speed, stability, and innovation. It's not “just a developer problem.”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A key part of building this culture is teaching teams to regularly share progress on legacy improvements, even when the work is invisible. This could mean showing increased test coverage, fewer production incidents, or faster release cycles. When technical teams learn to translate their wins into business outcomes, leadership is far more likely to support ongoing improvement efforts. Look for opportunities to share these small wins during retrospectives, sprint demos, and planning meetings. It helps make invisible progress visible.&lt;/p&gt;

&lt;h3&gt;
  
  
  ✅ How to Build a Better Culture Around Legacy Code:
&lt;/h3&gt;

&lt;p&gt;A healthy approach to legacy code starts with the right team mindset. Here’s how to build it.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Encourage small, constant improvements:&lt;/strong&gt; Apply the Boy Scout Rule: leave things better than you found them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lead by example:&lt;/strong&gt; Senior engineers and tech leads should model healthy behaviors: writing tests, refactoring safely, and sharing knowledge.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Talk openly about technical debt:&lt;/strong&gt; Make it part of sprint planning, retrospectives, and roadmaps.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pair programming and code reviews:&lt;/strong&gt; Use them not just to check code but to transfer legacy knowledge.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Celebrate wins:&lt;/strong&gt; Removing 100 lines of useless code is just as valuable as shipping a shiny new feature.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  15- Future-Proofing: Writing Today’s Code to Avoid Tomorrow’s Legacy
&lt;/h2&gt;

&lt;p&gt;Every piece of code we write today is a potential legacy system of tomorrow. The truth is, no matter how modern your tools or frameworks are, code can become legacy faster than you think, especially if it’s hard to understand, test, or change.&lt;/p&gt;

&lt;p&gt;The good news? You can’t predict the future, but you can make choices today that make life easier for the next team, including your future self.&lt;/p&gt;

&lt;h3&gt;
  
  
  🏗️ Practical Tips for Future-Proofing Your Code:
&lt;/h3&gt;

&lt;p&gt;To keep your code clean, adaptable, and easy to maintain over time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Write Tests Early and Often:&lt;/strong&gt; Code without tests becomes fragile faster than you expect.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep It Simple:&lt;/strong&gt; Avoid over-engineering. Clear, readable code lasts longer than “clever” solutions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Document Decisions, Not Just Code:&lt;/strong&gt; Explain why things are done a certain way, not just how.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Design for Change:&lt;/strong&gt; Small, loosely coupled components are easier to upgrade, replace, or retire.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Name Things Clearly:&lt;/strong&gt; Future developers (even you) should understand what something does without guessing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Leave Good Traces:&lt;/strong&gt; Update READMEs, commit messages, and diagrams when you change something meaningful.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Finally, remember: the goal isn’t perfection. It’s writing code that is easy to read, safe to change, and kind to those who come after you. That’s how you avoid creating the legacy code nightmares of the future. And don’t forget to communicate these choices: leaving good documentation and sharing your reasoning helps future teams (and stakeholders) understand why things were done a certain way.&lt;/p&gt;

&lt;h2&gt;
  
  
  16- Conclusion: Legacy Code as an Inheritance
&lt;/h2&gt;

&lt;p&gt;Working with legacy code isn’t a punishment; it’s a fundamental part of being a professional software engineer. Beneath every tangled method and outdated framework lies something important: the story of how systems came to be, how businesses grew, and how real-world needs shaped the software we inherit today.&lt;/p&gt;

&lt;p&gt;It’s easy to look at legacy code with frustration or even dread. But the truth is, legacy systems are the backbone of industries that impact millions of lives. Banks, airlines, hospitals, governments: they all run on code that someone, somewhere, once wrote under pressure, with the best intentions and the constraints of their time.&lt;/p&gt;

&lt;p&gt;The best developers aren’t just those who build new, shiny things. They’re the ones who can safely, thoughtfully, and patiently improve what already exists. They know that real impact often comes not from starting over, but from making steady, careful changes that honor the systems people rely on every day.&lt;/p&gt;

&lt;p&gt;If there’s one thing I hope you take away, it’s this: legacy code is not your enemy; it’s your inheritance. And the skills you build working with it are the same skills that will make you a more resilient, thoughtful, and valuable engineer for years to come.&lt;/p&gt;

&lt;p&gt;If you’d like to explore these ideas further, here are some excellent books that have stood the test of time, just like the systems we work on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Working Effectively with Legacy Code - Michael Feathers&lt;/li&gt;
&lt;li&gt;Refactoring: Improving the Design of Existing Code - Martin Fowler&lt;/li&gt;
&lt;li&gt;Clean Code - Robert C. Martin&lt;/li&gt;
&lt;li&gt;Building Evolutionary Architectures - Neal Ford, Rebecca Parsons, Patrick Kua&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the end, the legacy code you improve today might just be the foundation that supports someone else’s future success tomorrow.&lt;/p&gt;

&lt;p&gt;🌐 For more tech insights, you can find me on &lt;a href="https://www.linkedin.com/in/mhamadelitawi" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>refactoring</category>
      <category>softwareengineering</category>
      <category>webdev</category>
    </item>
    <item>
      <title>The Leaderboard Illusion: Is Your Ai Model Smart or Just Well-Studied?</title>
      <dc:creator>Mhamad El Itawi</dc:creator>
      <pubDate>Sat, 28 Jun 2025 13:43:09 +0000</pubDate>
      <link>https://forem.com/mhamadelitawi/the-leaderboard-illusion-is-your-model-smart-or-just-well-studied-1ndo</link>
      <guid>https://forem.com/mhamadelitawi/the-leaderboard-illusion-is-your-model-smart-or-just-well-studied-1ndo</guid>
      <description>&lt;p&gt;Leaderboards are everywhere in AI these days. They help us compare models, track progress, and decide which ones are worth our time and resources. But sometimes, a model's top score might raise an eyebrow—almost like it knew the answers ahead of time.&lt;/p&gt;

&lt;p&gt;It’s easy to assume the highest-ranked models are the smartest or most capable. But in reality, there’s a subtle issue that can throw these rankings off. And while it might sound like cutting corners, it’s not always that simple—or even wrong.&lt;/p&gt;

&lt;p&gt;In this article, we’ll take a closer look at how this issue impacts model evaluations, why it’s more common than you might think, and how, when handled carefully, it can actually make models more useful in practice.&lt;/p&gt;

&lt;h2&gt;
  
  
  🧠 What Is Data Contamination?
&lt;/h2&gt;

&lt;p&gt;Data contamination in AI refers to situations where information that shouldn't be present during model training accidentally influences the learning process, leading to misleadingly good performance and poor generalization.&lt;/p&gt;

&lt;p&gt;In this article, we focus specifically on one type of data contamination: when a model is trained on the same data that’s later used to evaluate it. &lt;/p&gt;

&lt;p&gt;Think of training an AI model like preparing a student for an exam.&lt;/p&gt;

&lt;p&gt;Now imagine if that student had access to the exact exam questions during their study sessions. On test day, they ace the exam—not because they deeply understand the material, but because they memorized the answers.&lt;/p&gt;

&lt;p&gt;This is what data contamination means in AI:&lt;br&gt;
A model is evaluated on the same data it saw during training, so the high score might just reflect memorization, not true skill or reasoning.&lt;/p&gt;

&lt;h2&gt;
  
  
  📉 Why Is It a Problem?
&lt;/h2&gt;

&lt;p&gt;If a model scores 95% on a contaminated benchmark, it doesn't necessarily mean it will perform that well on real-world tasks. The model might only be good at repeating what it has seen, not generalizing to new, unseen problems.&lt;/p&gt;

&lt;p&gt;That’s like hiring someone based on their exam score, only to find out they can't solve any new problems—just the ones from past papers.&lt;/p&gt;

&lt;h2&gt;
  
  
  🤔 So Why Do Model Providers Still Train on Benchmarks?
&lt;/h2&gt;

&lt;p&gt;Great question! Here's why it's not always wrong—and can even be strategic and beneficial:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Improving Real Performance: Some benchmarks are built from high-quality, real-world problems. Including them in training can genuinely help the model become more useful in actual applications. It’s like giving a student the best practice problems—not to cheat, but to prepare them better.&lt;/li&gt;
&lt;li&gt;The Model Will Face Similar Tasks: If users will likely ask questions similar to benchmark data, it makes sense to prepare the model with those examples, ensuring better user experience.&lt;/li&gt;
&lt;li&gt;Everyone Does It (Inadvertently): Most modern models are trained on huge datasets scraped from the internet. If benchmark data was online (papers, datasets, blog posts), it may get included accidentally. This isn’t malicious—it’s just hard to control.&lt;/li&gt;
&lt;li&gt;Strategic Final Training: Many developers do intentional “final tuning” on benchmarks right before release. It's a bit like a student cramming before an exam—not ideal for evaluation, but great for last-mile polish before the model is put into the real world.&lt;/li&gt;
&lt;li&gt;Users Don’t Evaluate Models—They Use Them: Ultimately, users care about how well a model works, not whether it was trained “purely.” If training on benchmarks makes the model more helpful, safer, or smarter, that’s a net positive for most practical use cases.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  📃Data Contamination Detection Techniques
&lt;/h2&gt;

&lt;p&gt;Proving that a model has seen test data during training isn’t always straightforward—especially when the overlap isn’t exact. Researchers have developed a variety of techniques to detect possible contamination, ranging from direct data matching to more nuanced behavioral analysis. Among these, N-gram Overlap and Perplexity Analysis stand out as two of the most insightful and accessible methods. They help reveal whether a model’s performance is based on true generalization—or subtle memorization of familiar patterns. Let’s take a closer look at how these techniques work, along with examples to make them easier to understand.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. N-gram Overlap
&lt;/h3&gt;

&lt;p&gt;An n-gram is a short sequence of n consecutive words. For instance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The sentence: "Artificial intelligence is transforming industries"&lt;/li&gt;
&lt;li&gt;2-grams: “Artificial intelligence,” “intelligence is,” “is transforming,” “transforming industries”&lt;/li&gt;
&lt;li&gt;3-grams: “Artificial intelligence is,” “intelligence is transforming,” “is transforming industries”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To check for contamination, researchers compare the n-grams in benchmark datasets (used for evaluation) against those in the model’s training data. If many of the same word sequences appear, even without matching the full sentence, that suggests the model may have learned to recognize and rely on familiar phrasing — rather than understanding the meaning from scratch.&lt;/p&gt;
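&lt;p&gt;As a rough illustration, here is a minimal sketch of an n-gram overlap check in JavaScript. The function names and sample texts are illustrative; real contamination studies run this comparison at corpus scale.&lt;/p&gt;

```javascript
// Build the list of n-grams for a text (lowercased, whitespace-tokenized).
function ngrams(text, n) {
  const words = text.toLowerCase().split(/\s+/).filter(Boolean);
  // slice(n - 1) yields one entry per valid n-gram start position.
  return words.slice(n - 1).map((_, i) => words.slice(i, i + n).join(' '));
}

// Fraction of the benchmark's n-grams that also appear in the training text.
function ngramOverlap(benchmarkText, trainingText, n) {
  const bench = new Set(ngrams(benchmarkText, n));
  const train = new Set(ngrams(trainingText, n));
  const shared = [...bench].filter(gram => train.has(gram));
  return bench.size === 0 ? 0 : shared.length / bench.size;
}

const benchmark = 'Artificial intelligence is transforming industries';
const training = 'Today artificial intelligence is transforming industries worldwide';
console.log(ngramOverlap(benchmark, training, 2)); // → 1 (every benchmark 2-gram appears in training)
```

&lt;p&gt;A high overlap ratio does not prove contamination on its own, but it flags benchmark items worth investigating.&lt;/p&gt;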

&lt;h3&gt;
  
  
  2. Perplexity Analysis
&lt;/h3&gt;

&lt;p&gt;Perplexity is a measure of how surprised a language model is when it sees a piece of text. More technically, it reflects how confidently the model can predict each next word in a sentence.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Low perplexity = the model finds the text very predictable → likely it's seen it (or something very similar) before&lt;/li&gt;
&lt;li&gt;High perplexity = the model is uncertain → it’s encountering unfamiliar or novel phrasing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Suppose a model reads: “Photosynthesis is the process by which plants convert light into energy.”&lt;br&gt;
If the model assigns very low perplexity to this sentence, that likely means it has seen this exact phrasing, or very close variations, during training.&lt;/p&gt;

&lt;p&gt;Now, if this sentence comes from a test benchmark, that low perplexity could be a signal of contamination. The model didn’t have to reason its way to the answer; it simply recognized a familiar sentence.&lt;/p&gt;
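&lt;p&gt;The computation itself is simple: perplexity is the exponential of the average negative log-probability the model assigns to each token. A small sketch, with made-up probability values:&lt;/p&gt;

```javascript
// Perplexity from the per-token probabilities a model assigns to a sentence.
function perplexity(tokenProbs) {
  const avgLogProb =
    tokenProbs.reduce((sum, p) => sum + Math.log(p), 0) / tokenProbs.length;
  return Math.exp(-avgLogProb);
}

// A memorized sentence: every token predicted with high confidence.
const familiar = [0.9, 0.95, 0.9, 0.92];
// A novel sentence: the model is far less certain about each token.
const novel = [0.2, 0.1, 0.3, 0.15];

console.log(perplexity(familiar)); // ≈ 1.09, suspiciously predictable
console.log(perplexity(novel));    // ≈ 5.77, unfamiliar text
```

&lt;p&gt;Real evaluations compute this over many benchmark items and compare against text the model provably never saw.&lt;/p&gt;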

&lt;h2&gt;
  
  
  ⚖️ The Balanced Takeaway
&lt;/h2&gt;

&lt;p&gt;Training on exam questions (i.e., benchmarks) can be misleading if used to brag about scores—but perfectly valid if the goal is to make the model better for actual tasks.&lt;/p&gt;

&lt;p&gt;So it’s not inherently wrong—what matters is transparency. If a model is trained on test data, developers should disclose it so evaluations can be interpreted honestly.&lt;/p&gt;

&lt;p&gt;🌐 For more tech insights, you can find me on &lt;a href="https://www.linkedin.com/in/mhamadelitawi" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>programming</category>
    </item>
    <item>
      <title>How to Choose the Right AI Model for Your Use Case (Without Going Crazy)</title>
      <dc:creator>Mhamad El Itawi</dc:creator>
      <pubDate>Tue, 17 Jun 2025 17:10:50 +0000</pubDate>
      <link>https://forem.com/mhamadelitawi/how-to-choose-the-right-ai-model-for-your-use-case-without-going-crazy-1ko0</link>
      <guid>https://forem.com/mhamadelitawi/how-to-choose-the-right-ai-model-for-your-use-case-without-going-crazy-1ko0</guid>
      <description>&lt;p&gt;You're building with AI — maybe a chatbot, an agent, a writing assistant, or something more experimental. The code is coming together, the idea is taking shape… and then the real question hits:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Which model should I actually use?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Suddenly, you’re lost in a jungle of names: GPT-4, Grok, Mistral, Claude, LLaMA, Gemma... Some are open source. Some are locked behind APIs. Some are fast, others smart, and all of them are marketed like they’re magic.&lt;/p&gt;

&lt;p&gt;And every source seems to offer conflicting advice. The truth is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;It’s not about picking the best model in the world — it’s about picking the best model for your job.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This post is a practical, developer-focused approach to making smart model choices — without the confusion, wasted resources, or marketing noise. It’s inspired by Chip Huyen’s book, AI Engineering: &lt;em&gt;Building Applications with Foundation Models&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  🎯Start With What You Need
&lt;/h2&gt;

&lt;p&gt;Before diving into model comparisons, define what success looks like for your application. Not hype-worthy demos. What matters is what works for your users — and your goals.&lt;/p&gt;

&lt;p&gt;Ask yourself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What kind of results do I need? (Accuracy, creativity, safety, etc.)&lt;/li&gt;
&lt;li&gt;What are my non-negotiables? (Privacy, low latency, low cost?)&lt;/li&gt;
&lt;li&gt;What kind of hardware or budget do I have?&lt;/li&gt;
&lt;li&gt;Do I want to use an API or run the model myself?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This might seem obvious, but skipping this step is why so many teams waste time testing the wrong models.&lt;/p&gt;

&lt;h2&gt;
  
  
  🧠 Model Selection Is Not One-and-Done
&lt;/h2&gt;

&lt;p&gt;Picking a model isn’t a one-time thing. You’ll probably test and switch models multiple times as your app grows.&lt;/p&gt;

&lt;p&gt;For example, you might start testing with a big, capable model to see if your idea even works, then try smaller, cheaper models to save cost. Maybe later, you want to finetune a model for better results.&lt;/p&gt;

&lt;p&gt;You’ll keep coming back to this decision—so don’t stress about getting it “perfect” the first time.&lt;/p&gt;

&lt;p&gt;Here’s the core process most teams follow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Find the best achievable performance&lt;/li&gt;
&lt;li&gt;Map models along cost–performance trade-offs&lt;/li&gt;
&lt;li&gt;Choose the best model for your needs and budget&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  💡 Hard vs. Soft Requirements
&lt;/h2&gt;

&lt;p&gt;Think of model features in two buckets:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Hard stuff (can’t change easily)&lt;/th&gt;
&lt;th&gt;Soft stuff (can improve or tweak)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Model license, training data, size&lt;/td&gt;
&lt;td&gt;Accuracy, speed, safety&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API vs. self-hosted&lt;/td&gt;
&lt;td&gt;Factual quality, response tone&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Where data is processed (local or cloud)&lt;/td&gt;
&lt;td&gt;Toxicity, helpfulness&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Example:&lt;br&gt;
Latency is a soft issue if you host the model and can optimize it. But it’s hard if the model is on someone else’s API and you have no control.&lt;/p&gt;

&lt;h2&gt;
  
  
  Build or Buy? Use APIs or Run Your Own Model?
&lt;/h2&gt;

&lt;p&gt;Here’s the classic question:&lt;br&gt;
Should I use a commercial model through an API, or host an open-source model myself?&lt;/p&gt;

&lt;p&gt;There’s no one right answer—it depends on what matters most to you.&lt;/p&gt;

&lt;p&gt;✅ Using Commercial APIs (like OpenAI, Anthropic, etc.)&lt;/p&gt;

&lt;p&gt;Pros:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Easy to get started&lt;/li&gt;
&lt;li&gt;No server headaches&lt;/li&gt;
&lt;li&gt;Great performance, usually&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You don’t control the model&lt;/li&gt;
&lt;li&gt;Can’t tweak everything&lt;/li&gt;
&lt;li&gt;Expensive at scale&lt;/li&gt;
&lt;li&gt;Privacy/legal concerns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;✅ Hosting Open-Source Models&lt;/p&gt;

&lt;p&gt;Pros:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full control&lt;/li&gt;
&lt;li&gt;Better privacy (data stays with you)&lt;/li&gt;
&lt;li&gt;You can finetune or modify as needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Harder to set up&lt;/li&gt;
&lt;li&gt;You need infra, GPUs, and time&lt;/li&gt;
&lt;li&gt;May not match top commercial models in raw power&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🧠 Ask Yourself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How sensitive is your data?&lt;/li&gt;
&lt;li&gt;Do you need full control or flexibility?&lt;/li&gt;
&lt;li&gt;What’s your team’s technical skill level?&lt;/li&gt;
&lt;li&gt;How fast do you need to scale?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Licensing: The Fine Print That Can Mess You Up
&lt;/h2&gt;

&lt;p&gt;Not all “open-source” models are created equal. Some only share their weights (how the model behaves), but not the training data (what it learned from).&lt;/p&gt;

&lt;p&gt;Before using a model, ask:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can I use this model for commercial stuff?&lt;/li&gt;
&lt;li&gt;Can I use its output to train other models?&lt;/li&gt;
&lt;li&gt;Are there limits on user count or distribution?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Read the license (or ask your lawyer). Some models seem open, but have tricky clauses. Better safe than sorry.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benchmarks and Leaderboards: Helpful Guides, Not Final Answers
&lt;/h2&gt;

&lt;p&gt;You’ll see lots of leaderboards and benchmarks (like MMLU, TruthfulQA, GSM8K). These test models on different tasks—math, reasoning, trivia, etc.&lt;/p&gt;

&lt;p&gt;These are useful for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Spotting obviously bad models&lt;/li&gt;
&lt;li&gt;Tracking model progress over time&lt;/li&gt;
&lt;li&gt;Getting a rough sense of model strengths&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But here’s the thing:&lt;br&gt;
Leaderboards are helpful to narrow down options, not to pick your final model.&lt;/p&gt;

&lt;h3&gt;
  
  
  Problems with Benchmarks:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Data contamination, models might memorize test data (&lt;a href="https://dev.to/mhamadelitawi/the-leaderboard-illusion-is-your-model-smart-or-just-well-studied-1ndo"&gt;link&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Benchmarks don’t cover all use cases&lt;/li&gt;
&lt;li&gt;A high score doesn’t mean the model will work well for you&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Imagine you’re building a chatbot. A model that does well on math quizzes might still give awful answers to your customers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Create Your Own Evaluation Tests
&lt;/h3&gt;

&lt;p&gt;Once you’ve picked a few promising models, the best thing to do is run your own tests, using your own data.&lt;/p&gt;

&lt;p&gt;Steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pick real tasks your model needs to handle.&lt;/li&gt;
&lt;li&gt;Write test prompts (e.g., customer questions, documents to summarize).&lt;/li&gt;
&lt;li&gt;Define what good looks like (Accuracy? Speed? Tone?)&lt;/li&gt;
&lt;li&gt;Compare models side-by-side.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Don’t rely only on numbers—look at outputs with your own eyes. Real-world behavior matters more than benchmark charts.&lt;/p&gt;
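&lt;p&gt;The steps above can be sketched as a tiny evaluation harness. The callModel function is a placeholder you would wire to whatever API or local runtime you actually use; the pass criterion here is deliberately crude.&lt;/p&gt;

```javascript
// Run every test prompt against every candidate model and collect results.
async function evaluate(models, testCases, callModel) {
  const results = [];
  for (const model of models) {
    for (const testCase of testCases) {
      const output = await callModel(model, testCase.prompt);
      results.push({
        model,
        prompt: testCase.prompt,
        output,
        // Crude automatic check; always review outputs by eye as well.
        passed: testCase.mustContain.every(term =>
          output.toLowerCase().includes(term)
        ),
      });
    }
  }
  return results;
}

// Example test case drawn from a real task the product must handle.
const testCases = [
  { prompt: 'How do I reset my password?', mustContain: ['reset', 'password'] },
];
```

&lt;p&gt;Even a harness this small makes side-by-side comparison repeatable, which is the point: you can rerun it whenever a new model candidate appears.&lt;/p&gt;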

&lt;h3&gt;
  
  
  Beware of Hidden Costs and Tradeoffs
&lt;/h3&gt;

&lt;p&gt;Let’s break it down:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Commercial APIs&lt;/th&gt;
&lt;th&gt;Open Source (Self-hosted)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;🔐 Data privacy&lt;/td&gt;
&lt;td&gt;Risky (your data leaves your system)&lt;/td&gt;
&lt;td&gt;Safe (you control everything)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;💪 Performance&lt;/td&gt;
&lt;td&gt;Top models available&lt;/td&gt;
&lt;td&gt;Slightly behind but improving&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;💻 Setup effort&lt;/td&gt;
&lt;td&gt;Very low&lt;/td&gt;
&lt;td&gt;Medium to high&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;💸 Cost&lt;/td&gt;
&lt;td&gt;Pay per use (can get expensive fast)&lt;/td&gt;
&lt;td&gt;Higher setup, lower variable cost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🎯 Customization&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;Full control&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🧠 Transparency&lt;/td&gt;
&lt;td&gt;Black box&lt;/td&gt;
&lt;td&gt;You can inspect everything&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🛰️ On device deployment&lt;/td&gt;
&lt;td&gt;Nope&lt;/td&gt;
&lt;td&gt;Possible (if small enough)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Choose based on what’s most important to you. Some teams start with APIs, then switch to self-hosting later.&lt;/p&gt;

&lt;h2&gt;
  
  
  Watch Out for Model Changes
&lt;/h2&gt;

&lt;p&gt;When you use commercial APIs, the model can change without warning.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
OpenAI might update GPT-4, and suddenly your prompt stops working the same way. It’s happened before. If stability matters to you, this can be a problem.&lt;/p&gt;

&lt;p&gt;With open-source models, you can “freeze” the version and always get the same result.&lt;/p&gt;

&lt;h3&gt;
  
  
  Final Thoughts: The Best Model Is the One That Works for You
&lt;/h3&gt;

&lt;p&gt;Model selection is not a one-time decision—it’s a continuous process of experimentation, evaluation, and iteration. While leaderboards, benchmarks, and market buzz can guide you, the right model is the one that delivers value for your use case under your constraints.&lt;/p&gt;

&lt;p&gt;Remember:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pick models based on your actual needs.&lt;/li&gt;
&lt;li&gt;Run your own evaluations.&lt;/li&gt;
&lt;li&gt;Be ready to switch when things change.&lt;/li&gt;
&lt;li&gt;Keep privacy, cost, and control in mind.&lt;/li&gt;
&lt;/ul&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

[0. Start]
  ↓
[1. Filter models by hard requirements]
  ↓
[2. Compare public benchmark data]
  ↓
[3. Run your own evaluation tests]
  ↓
[4. Monitor in production &amp;amp; iterate]
  ↓
[5.Retry if needed]

 🌐 For more tech insights, you can find me on [LinkedIn](https://www.linkedin.com/in/mhamadelitawi).
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>chatgpt</category>
    </item>
    <item>
      <title>Payment Gateway Integrations tips</title>
      <dc:creator>Mhamad El Itawi</dc:creator>
      <pubDate>Wed, 02 Apr 2025 14:34:16 +0000</pubDate>
      <link>https://forem.com/mhamadelitawi/payment-gateway-integrations-tips-45no</link>
      <guid>https://forem.com/mhamadelitawi/payment-gateway-integrations-tips-45no</guid>
      <description>&lt;p&gt;Over the past two years, I’ve worked extensively on integrating various payment gateways from different countries. Each integration came with its own challenges, and I’ve gathered key lessons that might help others working on similar projects.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Documentation Can Be Wrong or Outdated
&lt;/h3&gt;

&lt;p&gt;Many payment gateways provide documentation, but I’ve learned that it’s not always accurate or up to date. Always verify the information through testing and, when in doubt, reach out to support.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Draw the Flow Before Coding
&lt;/h3&gt;

&lt;p&gt;Jumping straight into coding can lead to unnecessary back-and-forth. Instead, take the time to design the integration flow first. This helps in understanding edge cases and ensures a smoother development process.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Check the Sandbox Account Before Starting
&lt;/h3&gt;

&lt;p&gt;Not all sandbox environments are ready to use immediately. Some require approval or additional configurations. Make sure to check before you begin coding to avoid delays.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Be Cautious with Google Translate
&lt;/h3&gt;

&lt;p&gt;When working with non-English documentation, using a translation tool like Google Translate is helpful. However, be careful—it might change variable names or other critical values, leading to unexpected errors.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Don’t Limit Order Creation to Payment Success
&lt;/h3&gt;

&lt;p&gt;Ensure orders are created even if payment is pending or fails. This allows for proper tracking, customer notifications, and potential retry attempts.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Pay Attention to 3D Secure
&lt;/h3&gt;

&lt;p&gt;Some gateways have 3D Secure enabled by default, while others require explicit configuration. Ensure you understand how it works for each provider to avoid unexpected declines or authentication issues.&lt;/p&gt;

&lt;h3&gt;
  
  
  7. Log All Requests and Responses
&lt;/h3&gt;

&lt;p&gt;Always log all interactions with the payment gateway, including both successful and failed requests. These logs are invaluable when troubleshooting errors or disputes.&lt;/p&gt;

&lt;h3&gt;
  
  
  8. Handle Async Integrations with Care
&lt;/h3&gt;

&lt;p&gt;For integrations that rely on asynchronous processes like webhooks, be aware that failures might not always be communicated properly. Assume that some notifications may be lost and design retry mechanisms where possible.&lt;/p&gt;

&lt;h3&gt;
  
  
  9. Log All Webhooks
&lt;/h3&gt;

&lt;p&gt;Since webhooks are a crucial part of many payment integrations, logging every incoming webhook helps in debugging missing or incorrect transactions. This ensures that no critical information is lost.&lt;/p&gt;

&lt;h3&gt;
  
  
  10. Understand Local Regulations
&lt;/h3&gt;

&lt;p&gt;Some countries have specific regulations around payment processing, such as PCI DSS compliance, data localization laws, or strong customer authentication (SCA). Be aware of these when integrating with gateways in different regions.&lt;/p&gt;

&lt;h3&gt;
  
  
  11. Test with Multiple Payment Methods
&lt;/h3&gt;

&lt;p&gt;Many gateways support different payment methods (credit/debit cards, wallets, bank transfers, etc.). Ensure you test all applicable methods, as some might have different behaviors or require extra steps.&lt;/p&gt;

&lt;h3&gt;
  
  
  12. Validate Currency and Conversion Handling
&lt;/h3&gt;

&lt;p&gt;If the gateway supports multiple currencies, check how conversions and fees are handled. Some gateways automatically convert amounts, while others require explicit handling.&lt;/p&gt;

&lt;h3&gt;
  
  
  13. Retry Logic for Transient Failures
&lt;/h3&gt;

&lt;p&gt;Network failures, temporary gateway issues, or timeouts can occur. Implement a robust retry mechanism with exponential backoff for recoverable errors.&lt;/p&gt;
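&lt;p&gt;A minimal sketch of this pattern in JavaScript. Which errors count as recoverable depends on your gateway’s error codes; the checks below are illustrative.&lt;/p&gt;

```javascript
// Retry a payment operation with exponential backoff on transient failures.
async function withRetry(operation, maxAttempts = 4, baseDelayMs = 500) {
  let attempt = 0;
  while (true) {
    try {
      return await operation();
    } catch (err) {
      attempt += 1;
      if (attempt >= maxAttempts || !isRecoverable(err)) throw err;
      // Delays double each attempt: 500 ms, 1 s, 2 s, ...
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise(resolve => setTimeout(resolve, delayMs));
    }
  }
}

// Treat timeouts and 5xx responses as transient; everything else is final.
function isRecoverable(err) {
  return err.code === 'ETIMEDOUT' || err.statusCode >= 500;
}
```

&lt;p&gt;Adding a small random jitter to each delay also helps avoid synchronized retry storms when many requests fail at once.&lt;/p&gt;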

&lt;h3&gt;
  
  
  14. Monitor for Fraud Prevention Mechanisms
&lt;/h3&gt;

&lt;p&gt;Some payment gateways have built-in fraud detection, while others require external tools. Be sure you understand how risk scoring, velocity checks, and blacklisting work.&lt;/p&gt;

&lt;h3&gt;
  
  
  15. Ensure Idempotency for API Requests
&lt;/h3&gt;

&lt;p&gt;For operations like capturing a payment or issuing a refund, ensure requests are idempotent (i.e., repeated requests do not result in duplicate transactions). Some gateways provide idempotency keys to help with this.&lt;/p&gt;

&lt;h3&gt;
  
  
  16. Handle Webhook Security Properly
&lt;/h3&gt;

&lt;p&gt;Always verify the authenticity of webhooks using signature verification or other security measures. This prevents attackers from sending fake webhook requests.&lt;/p&gt;

&lt;h3&gt;
  
  
  17. Build a Robust Error Handling Strategy
&lt;/h3&gt;

&lt;p&gt;Clearly define how to handle different error codes, declined transactions, and exceptions. Consider providing detailed failure messages for users to improve their experience.&lt;/p&gt;

&lt;h3&gt;
  
  
  18. Monitor and Set Up Alerts for Payment Failures
&lt;/h3&gt;

&lt;p&gt;Use logging and monitoring tools to track payment failures in real-time. Setting up alerts for unusual failure rates can help detect potential issues early.&lt;/p&gt;

&lt;h3&gt;
  
  
  19. Plan for Scaling and High Volume Transactions
&lt;/h3&gt;

&lt;p&gt;If you're dealing with high transaction volumes, ensure that your integration can handle concurrency, rate limits, and load spikes efficiently.&lt;/p&gt;

&lt;h3&gt;
  
  
  20. Read User and Developer Community Feedback
&lt;/h3&gt;

&lt;p&gt;Sometimes, real-world issues are discussed in developer forums, GitHub issues, or Stack Overflow. Checking these sources can help you avoid common pitfalls.&lt;/p&gt;

&lt;h3&gt;
  
  
  21. Consider User Experience (UX) in Payment Flow
&lt;/h3&gt;

&lt;p&gt;A smooth and intuitive checkout process can reduce cart abandonment. Ensure the user experience is seamless across different devices and browsers.&lt;/p&gt;

&lt;h3&gt;
  
  
  22. Be Aware of Gateway-Specific Rate Limits
&lt;/h3&gt;

&lt;p&gt;Many payment gateways impose rate limits on API requests. Exceeding these limits can result in blocked transactions. Implement rate-limiting strategies or queue requests efficiently.&lt;/p&gt;

&lt;h3&gt;
  
  
  23. Plan for Subscription and Recurring Payments
&lt;/h3&gt;

&lt;p&gt;If your platform supports subscriptions, ensure proper handling of recurring transactions, failed payments, and customer notifications for renewals or card expirations.&lt;/p&gt;

&lt;h3&gt;
  
  
  24. Keep an Eye on Settlement and Reconciliation
&lt;/h3&gt;

&lt;p&gt;Some payment gateways process transactions instantly, while others take days to settle. Implement automated reconciliation to match incoming payments with expected transactions.&lt;/p&gt;

&lt;h3&gt;
  
  
  25. Support Multiple Gateways for Redundancy
&lt;/h3&gt;

&lt;p&gt;Relying on a single payment provider can be risky. Consider integrating multiple gateways to ensure uptime and avoid disruptions if one provider experiences downtime.&lt;/p&gt;
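&lt;p&gt;A minimal failover sketch: try providers in order of preference and surface the last error only if all of them fail. The charge functions are placeholders for your real gateway integrations.&lt;/p&gt;

```javascript
// Try each gateway in order until one succeeds.
async function chargeWithFailover(gateways, payment) {
  let lastError;
  for (const charge of gateways) {
    try {
      return await charge(payment);
    } catch (err) {
      lastError = err; // remember the failure, then try the next provider
    }
  }
  throw lastError;
}
```

&lt;p&gt;In production you would also record which provider handled each payment, since refunds must go back through the same gateway.&lt;/p&gt;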

&lt;h3&gt;
  
  
  26. Maintain a Documented Flow
&lt;/h3&gt;

&lt;p&gt;Keeping a detailed document about the integration flow helps in debugging, future maintenance, and onboarding new team members. A well-documented process can save hours of troubleshooting.&lt;/p&gt;

&lt;p&gt;By following these practices, I’ve built more reliable payment integrations. If you’re working on a similar project, I hope these insights help!&lt;/p&gt;

&lt;p&gt;🌐 For more tech insights, you can find me on &lt;a href="https://www.linkedin.com/in/mhamadelitawi" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>fintech</category>
      <category>productivity</category>
      <category>api</category>
    </item>
    <item>
      <title>Shutdown EC2 servers when unused</title>
      <dc:creator>Mhamad El Itawi</dc:creator>
      <pubDate>Sat, 08 Feb 2025 13:05:17 +0000</pubDate>
      <link>https://forem.com/mhamadelitawi/shutdown-ec2-servers-when-unused-koa</link>
      <guid>https://forem.com/mhamadelitawi/shutdown-ec2-servers-when-unused-koa</guid>
      <description>&lt;p&gt;In the company where I was working, the process of development passes by 4 different environments:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Local Environment: it’s where the developers usually develop their applications&lt;/li&gt;
&lt;li&gt;Dev Environment: an environment for upcoming or under development features. where all features are deployed and tested together&lt;/li&gt;
&lt;li&gt;Test Environment: an environment used by non developers, mainly sales teams and POs&lt;/li&gt;
&lt;li&gt;Production Environment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This process required deploying the solution on multiple EC2 instances, which unfortunately increased development costs, especially since we were using powerful instance types.&lt;/p&gt;

&lt;p&gt;In order to reduce costs, the idea was simple: &lt;strong&gt;shut down the server when unused&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The first idea was to make servers available only within a specific timeframe, for example between 9:00 AM and 6:00 PM. But I still believed there was room for optimization. Why do the servers need to be available if the tester is on vacation? What if the person has other tasks to handle?&lt;/p&gt;

&lt;h2&gt;
  
  
  Proposed Architecture
&lt;/h2&gt;

&lt;p&gt;My proposal was to have two simple APIs: one to shut the server down and one to start it up. Both are triggered from a web interface. The shutdown API is also triggered by a cron job, in case an employee forgets to stop the server at the end of the day. Finally, the server is configured with a startup script that runs on restart with root privileges.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkgvs10swqc63x5y207x6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkgvs10swqc63x5y207x6.png" alt="The proposed architecture" width="606" height="288"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation
&lt;/h2&gt;

&lt;p&gt;The implementation involves the following steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Creating an IAM policy and role&lt;/li&gt;
&lt;li&gt;Creating and configuring the Lambda functions&lt;/li&gt;
&lt;li&gt;Configuring EventBridge&lt;/li&gt;
&lt;li&gt;Creating the Lambda client&lt;/li&gt;
&lt;li&gt;Configuring CloudWatch logs&lt;/li&gt;
&lt;li&gt;Configuring the startup script&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Creating IAM policy and Role
&lt;/h3&gt;

&lt;p&gt;From the IAM console, create a policy with the following JSON. Replace accountID and instanceID with your own values.&lt;/p&gt;

&lt;p&gt;The following policy grants permission to start and stop the specified instance, along with the ability to write logs to CloudWatch.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "ec2:StartInstances",
                "ec2:StopInstances"
            ],
            "Resource": "arn:aws:ec2:*:accountID:instance/i-instanceID"
        }
    ]
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After that, go to Roles, select AWS service as the trusted entity and Lambda as the use case, attach the policy you created, and create the role.&lt;/p&gt;

&lt;h3&gt;
  
  
  Create Lambda functions
&lt;/h3&gt;

&lt;p&gt;As mentioned before, two Lambda functions will be created: one for shutdown and one for startup.&lt;/p&gt;

&lt;p&gt;Let’s start with the shutdown function.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create a Lambda function&lt;/li&gt;
&lt;li&gt;Choose Author from scratch&lt;/li&gt;
&lt;li&gt;Set the runtime to Node.js (the following code was tested on 12.x)&lt;/li&gt;
&lt;li&gt;Assign the role you created previously&lt;/li&gt;
&lt;li&gt;From the advanced options, enable a function URL. This feature lets you invoke the function over HTTPS without an API Gateway&lt;/li&gt;
&lt;li&gt;Set the Auth type to NONE (acceptable here, since this is an internal app managing non-critical servers)&lt;/li&gt;
&lt;li&gt;Enable CORS&lt;/li&gt;
&lt;li&gt;Set the following as the Lambda code and deploy it:&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const AWS = require('aws-sdk');
exports.handler = (event, context, callback) =&amp;gt; {
    var instanceId = null;
    var instanceRegion = null;
    if(event.instanceRegion != null &amp;amp;&amp;amp; event.instanceId != null){
        instanceId = event.instanceId;
        instanceRegion = event.instanceRegion;
    } else {
        var obj = JSON.parse(event.body);
        instanceId = obj.instanceId;
        instanceRegion = obj.instanceRegion;
    }
    const ec2 = new AWS.EC2({ region: instanceRegion });
    ec2.stopInstances({ InstanceIds: [instanceId] }).promise()
        .then(() =&amp;gt; callback(null, `Successfully stopped ${instanceId}`))
        .catch(err =&amp;gt; callback(err));
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For the startup Lambda, follow the same steps but use the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const AWS = require('aws-sdk');
exports.handler = (event, context, callback) =&amp;gt; {
    var instanceId = null;
    var instanceRegion = null;
    if(event.instanceRegion != null &amp;amp;&amp;amp; event.instanceId != null) {
        instanceId = event.instanceId;
        instanceRegion = event.instanceRegion;
    } else {
        var obj = JSON.parse(event.body);
        instanceId = obj.instanceId;
        instanceRegion = obj.instanceRegion;
    }
    const ec2 = new AWS.EC2({ region: instanceRegion });
    ec2.startInstances({ InstanceIds: [instanceId] }).promise()
        .then(() =&amp;gt; callback(null, `Successfully started ${instanceId}`))
        .catch(err =&amp;gt; callback(err));
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To test it, use the following request body (replace instanceRegion and instanceId with your own values):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "instanceRegion": "instanceRegion",
    "instanceId":   "i-instanceID"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Creating the Lambda client
&lt;/h3&gt;

&lt;p&gt;To invoke the Lambda you need a client. There are many ways to build one, but to keep things simple we will use plain HTML/JS that sends a POST request to the function URL. Feel free to use the boilerplate code from my GitHub. The client can be hosted on S3.&lt;/p&gt;
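A minimal client sketch, assuming a function URL with auth type NONE: it POSTs the instance details in the format the Lambda's event.body branch expects. The URL below is a placeholder; substitute your own function URL.

```javascript
// Placeholder: replace with your actual Lambda function URL
const FUNCTION_URL = 'https://your-function-url.lambda-url.eu-west-1.on.aws/';

function buildPayload(instanceId, instanceRegion) {
  // Shape must match what the Lambda parses out of event.body
  return JSON.stringify({ instanceId, instanceRegion });
}

async function stopInstance(instanceId, instanceRegion) {
  // fetch is available in browsers and in Node.js 18+
  const res = await fetch(FUNCTION_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: buildPayload(instanceId, instanceRegion)
  });
  return res.text();
}

// Example call (instance ID and region are hypothetical):
// stopInstance('i-0123456789abcdef0', 'eu-west-1').then(console.log);
```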

&lt;h3&gt;
  
  
  Configuring EventBridge
&lt;/h3&gt;

&lt;p&gt;This step is a plan B in case an employee forgets to shut down the server after testing.&lt;/p&gt;

&lt;p&gt;Create a new rule and configure it as follows:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frepw4vky7r3qcsyqk337.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frepw4vky7r3qcsyqk337.jpg" alt="Event bridge configuration" width="768" height="658"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For the cron pattern:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9maghxaqz8iqieg9zyma.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9maghxaqz8iqieg9zyma.png" alt="Cron rule" width="768" height="397"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Configuring CloudWatch logs
&lt;/h3&gt;

&lt;p&gt;When the Lambda functions are invoked, they generate logs. Make sure to set the retention period to a value that suits your business. To do so: CloudWatch &amp;gt; Logs &amp;gt; your log group &amp;gt; Retention&lt;/p&gt;
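Retention can also be set programmatically. The sketch below shows the parameters that CloudWatchLogs.putRetentionPolicy expects; the log group name follows the default /aws/lambda/FUNCTION_NAME convention, and the function name and 14-day retention are hypothetical choices.

```javascript
// Parameters for CloudWatchLogs.putRetentionPolicy ("ec2-shutdown" and the
// 14-day retention are hypothetical; pick values that suit your business)
const retentionParams = {
  logGroupName: '/aws/lambda/ec2-shutdown',
  retentionInDays: 14
};

// With the AWS SDK ("aws-sdk") and valid credentials:
// const logs = new AWS.CloudWatchLogs({ region: 'eu-west-1' });
// await logs.putRetentionPolicy(retentionParams).promise();

console.log(retentionParams.logGroupName);
```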

&lt;h3&gt;
  
  
  Creating the startup script
&lt;/h3&gt;

&lt;p&gt;You may need to run scripts on startup to boot some services. &lt;/p&gt;

&lt;p&gt;AWS provides a feature called EC2 user data. By default, this script runs with root privileges when the machine is first created, but it can be configured to run on every restart. If you want to do that, refer to the &lt;a href="https://repost.aws/knowledge-center/execute-user-data-ec2" rel="noopener noreferrer"&gt;AWS article&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;Personally, I used systemd on the Ubuntu server and created a service.&lt;/p&gt;

&lt;p&gt;First, create the service file in &lt;code&gt;/etc/systemd/system&lt;/code&gt; following the template below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo nano /etc/systemd/system/servicefile.service

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And use the following template:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Unit]
Description = ~Name of the service~

[Service]
WorkingDirectory= ~directory of working file~
ExecStart= ~directory~/filename.sh

[Install]
WantedBy=multi-user.target
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
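As a concrete (hypothetical) example, a unit that runs a /home/ubuntu/startup.sh script on boot could look like this; Type=oneshot suits a script that runs to completion rather than a daemon:

```plaintext
[Unit]
Description=Boot-time startup script
# Wait for basic networking if the script needs it
After=network.target

[Service]
Type=oneshot
WorkingDirectory=/home/ubuntu
ExecStart=/home/ubuntu/startup.sh

[Install]
WantedBy=multi-user.target
```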



&lt;p&gt;Reload systemd so it picks up the new unit file, then start the service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;systemctl start servicefile.service

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To enable it on startup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;systemctl enable servicefile.service

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Cost Assessment
&lt;/h2&gt;

&lt;p&gt;The beauty here is that all the components used are serverless. The cost of this integration won’t exceed $1, a very small price to pay compared to the bill for unused servers.&lt;/p&gt;

&lt;p&gt;🌐 For more tech insights, you can find me on &lt;a href="https://www.linkedin.com/in/mhamadelitawi" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>webdev</category>
      <category>cloud</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
