<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Imthadh Ahamed</title>
    <description>The latest articles on Forem by Imthadh Ahamed (@imthadh_ahamed).</description>
    <link>https://forem.com/imthadh_ahamed</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2742351%2F2c8598ae-cd41-42df-bd8e-dd293337f504.jpg</url>
      <title>Forem: Imthadh Ahamed</title>
      <link>https://forem.com/imthadh_ahamed</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/imthadh_ahamed"/>
    <language>en</language>
    <item>
      <title>From Code Push to Docker Hub: CI/CD with GitHub Actions🚀</title>
      <dc:creator>Imthadh Ahamed</dc:creator>
      <pubDate>Thu, 09 Oct 2025 11:45:51 +0000</pubDate>
      <link>https://forem.com/imthadh_ahamed/from-code-push-to-docker-hub-cicd-with-github-actions-171a</link>
      <guid>https://forem.com/imthadh_ahamed/from-code-push-to-docker-hub-cicd-with-github-actions-171a</guid>
      <description>&lt;p&gt;Imagine you’re running a bakery. Every morning, you bring fresh ingredients (your new code). But instead of kneading, baking, packaging, and delivering each loaf by hand, you want a conveyor belt that handles everything the moment the ingredients arrive.&lt;/p&gt;

&lt;p&gt;That conveyor belt is your &lt;strong&gt;CI/CD pipeline&lt;/strong&gt;.&lt;br&gt;
Each time you git push, it builds, tests, packages, and ships your product automatically.&lt;/p&gt;

&lt;p&gt;In this article, we’ll create that conveyor belt for your Node.js app using &lt;strong&gt;GitHub Actions and Docker&lt;/strong&gt;.&lt;/p&gt;


&lt;h2&gt;
  
  
  Why Automate Docker Builds?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Why It Matters&lt;/strong&gt;&lt;br&gt;
Manual builds are slow, repetitive, and error-prone. Automation brings consistency and peace of mind.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Explain Like a Friend&lt;/strong&gt;&lt;br&gt;
Think of manually assembling furniture for every order — you’ll eventually forget screws or swap panels. A factory line ensures the same result every single time. CI/CD is that factory line for software.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real-World Example&lt;/strong&gt;&lt;br&gt;
A small dev team pushes hotfixes daily. Without automation, they rebuild locally, tag manually, and sometimes push to the wrong repo. With GitHub Actions, every push triggers a clean, predictable build → tag → push pipeline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Action Step&lt;/strong&gt;&lt;br&gt;
Write down every manual command you currently run to deploy your Docker image. That’s your “to-automate” checklist.&lt;/p&gt;
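&lt;p&gt;For a typical Node.js app, that checklist often looks something like the following sketch (the image name &lt;code&gt;yourname/app&lt;/code&gt; and the version tag are placeholders):&lt;/p&gt;

```shell
# Manual deployment steps -- each of these becomes a pipeline step later
docker build -t yourname/app:latest .                # build the image
docker tag yourname/app:latest yourname/app:v1.2.3   # tag a release
docker login                                         # authenticate interactively
docker push yourname/app:latest                      # upload to Docker Hub
docker push yourname/app:v1.2.3
```

&lt;p&gt;Every one of those commands is something the workflow below the fold can run for you.&lt;/p&gt;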


&lt;h2&gt;
  
  
  Prerequisites: Accounts, Secrets &amp;amp; Permissions
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Why It Matters&lt;/strong&gt;&lt;br&gt;
Your automation can’t log in or deploy if it doesn’t have the right keys.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Explanation&lt;/strong&gt;&lt;br&gt;
Think of GitHub and Docker Hub as two warehouses. Your CI/CD robot needs secure keys to both: one to collect the code, the other to store the built product (your image).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real-World Example&lt;/strong&gt;&lt;br&gt;
In the demo, the author logs into Docker Hub using encrypted secrets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;DOCKER_HUB_USERNAME&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;DOCKER_HUB_TOKEN&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These secrets are stored safely in GitHub → Settings → Secrets → Actions — never in your source code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Action Steps&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create or log into your Docker Hub account.&lt;/li&gt;
&lt;li&gt;Generate a personal access token (instead of using your password).&lt;/li&gt;
&lt;li&gt;In GitHub, go to &lt;code&gt;Settings → Secrets → Actions&lt;/code&gt;, and add:
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;DOCKER_HUB_USERNAME&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;DOCKER_HUB_TOKEN&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
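&lt;p&gt;If you prefer the terminal, the same secrets can be added with the GitHub CLI (a sketch; it assumes &lt;code&gt;gh&lt;/code&gt; is installed and authenticated against your repo):&lt;/p&gt;

```shell
# Store Docker Hub credentials as encrypted Actions secrets
gh secret set DOCKER_HUB_USERNAME --body "your-dockerhub-username"
gh secret set DOCKER_HUB_TOKEN --body "your-dockerhub-access-token"
```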


&lt;h2&gt;
  
  
  Writing the Workflow (main.yaml) — Step by Step
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Why It Matters&lt;/strong&gt;&lt;br&gt;
The workflow file is your recipe — the step-by-step plan your robot will follow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Explanation&lt;/strong&gt;&lt;br&gt;
In &lt;code&gt;.github/workflows/main.yaml&lt;/code&gt;, you define:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When to run: (trigger event, like push)&lt;/li&gt;
&lt;li&gt;What to run: (the jobs and steps)&lt;/li&gt;
&lt;li&gt;How to run: (what tools or actions to use)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key Parts&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;on&lt;/code&gt;: push: “Start the oven when new ingredients arrive.”&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;jobs&lt;/code&gt;: “Each job is a workstation — baking, packaging, shipping.”&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;actions/checkout@v3&lt;/code&gt;: “Fetch ingredients from the storage (repo).”&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;docker/login-action@v2&lt;/code&gt;: “Unlock the warehouse with your keycard (secret).”&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;docker/build-push-action@v3&lt;/code&gt;: “Bake and ship the final product.”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example Workflow&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;name: Build and Push Docker Image

on:
  push:
    branches:
      - main

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      - uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKER_HUB_USERNAME }}
          password: ${{ secrets.DOCKER_HUB_TOKEN }}

      - uses: docker/build-push-action@v3
        with:
          context: .
          push: true
          tags: yourname/app:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;⚠️ Common Pitfalls&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wrong Dockerfile path → build fails.&lt;/li&gt;
&lt;li&gt;Missing or misnamed secrets → the login step fails.&lt;/li&gt;
&lt;li&gt;Forgot &lt;code&gt;push: true&lt;/code&gt; → image never uploads.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;✅ Action Step&lt;/strong&gt;&lt;br&gt;
Commit your file to &lt;code&gt;.github/workflows/main.yaml&lt;/code&gt; and push it. Your pipeline is ready.&lt;/p&gt;




&lt;h2&gt;
  
  
  Build &amp;amp; Push: Watching It Run
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Why It Matters&lt;/strong&gt;&lt;br&gt;
Watching the pipeline run end to end proves that your automation actually works.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Explanation&lt;/strong&gt;&lt;br&gt;
Push your code, and watch:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;GitHub detects the push.&lt;/li&gt;
&lt;li&gt;Actions triggers the job.&lt;/li&gt;
&lt;li&gt;The workflow checks out code, logs in to Docker Hub, builds, and pushes the image.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Real-World Example&lt;/strong&gt;&lt;br&gt;
In the demo, after pushing a small code change, the author refreshed Docker Hub and saw the new image tagged &lt;code&gt;latest&lt;/code&gt;. No manual typing, no Docker commands.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Action Steps&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Edit a small file (e.g., README).&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;git push&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Open GitHub → Actions tab → watch logs.&lt;/li&gt;
&lt;li&gt;Open Docker Hub → confirm the new tag.&lt;/li&gt;
&lt;/ol&gt;
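&lt;p&gt;You can also verify the published image from any machine with Docker installed (again, &lt;code&gt;yourname/app&lt;/code&gt; is a placeholder for your own repository):&lt;/p&gt;

```shell
docker pull yourname/app:latest                    # fetch the image the pipeline just pushed
docker run --rm -p 3000:3000 yourname/app:latest   # run it locally to confirm it starts
```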

&lt;p&gt;&lt;strong&gt;Counterpoints &amp;amp; Limitations&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Private repos or registries: Require additional permissions or tokens.&lt;/li&gt;
&lt;li&gt;Multi-service setups: Need multiple jobs or build matrix strategies.&lt;/li&gt;
&lt;li&gt;Large images: Consider caching layers for faster builds.&lt;/li&gt;
&lt;li&gt;Security: Always scan base images and protect secrets carefully.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Mini Case Study: “Solo Developer to One-Click Deployment”
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt;&lt;br&gt;
X, a solo Node.js developer, manually built and pushed Docker images every update — often forgetting tags or pushing the wrong version.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After GitHub Actions:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Added main.yaml&lt;/li&gt;
&lt;li&gt;Configured Docker Hub secrets&lt;/li&gt;
&lt;li&gt;Now, every git push builds + pushes automatically&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Deployment time dropped from 10 minutes to under one, and the tagging mistakes disappeared.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion — From Kitchen Chaos to Conveyor Belt
&lt;/h2&gt;

&lt;p&gt;We began with a bakery. You don’t want to bake, pack, and ship every loaf by hand. You want a conveyor belt that works reliably.&lt;/p&gt;

&lt;p&gt;GitHub Actions and Docker give you that belt. As soon as you push code, your automation handles the rest — repeatable, consistent, hands-off.&lt;/p&gt;

&lt;p&gt;Take this workflow file, plug in your secrets, push a change, and watch your pipeline come alive.&lt;/p&gt;

</description>
      <category>cicd</category>
      <category>devops</category>
      <category>githubactions</category>
      <category>node</category>
    </item>
    <item>
      <title>Docker Doesn’t Bite: A Beginner’s Guide</title>
      <dc:creator>Imthadh Ahamed</dc:creator>
      <pubDate>Tue, 23 Sep 2025 09:00:41 +0000</pubDate>
      <link>https://forem.com/imthadh_ahamed/docker-doesnt-bite-a-beginners-guide-39no</link>
      <guid>https://forem.com/imthadh_ahamed/docker-doesnt-bite-a-beginners-guide-39no</guid>
      <description>&lt;p&gt;When I first heard about Docker, I imagined it was only for “&lt;strong&gt;serious DevOps people&lt;/strong&gt;” running massive cloud systems. The truth? Docker is simply a smarter way to run apps without the headaches of manual setup. Instead of installing Node.js, PostgreSQL, or frameworks directly on your laptop, Docker lets you bundle everything your app needs into neat little containers like bento boxes for software. These containers run the same anywhere: your laptop, your friend’s machine, or a cloud server.&lt;/p&gt;

&lt;p&gt;In this post, I’ll share how I used Docker to spin up a PENN (PostgreSQL, Express/Node, Next.js) project in minutes, no messy installs required. We’ll walk through containerization basics, Docker Compose for multi-service apps, and even pushing images to Docker Hub so teammates can run the project instantly. If Docker has ever felt intimidating, think of this as a friendly guide — it’s easier than you think.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Docker?
&lt;/h2&gt;

&lt;p&gt;Docker is a tool that lets you package an application with everything it needs (code, libraries, and settings) into a container. A container is like a &lt;strong&gt;bento box&lt;/strong&gt; for software: it neatly holds your app and its ingredients so it can run the same way anywhere.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Docker Image&lt;/strong&gt; = the recipe (instructions + ingredients).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docker Container&lt;/strong&gt; = the meal (a running instance of the recipe).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Isolation&lt;/strong&gt; = each bento box keeps its food separate, so one app doesn’t interfere with another.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Portability&lt;/strong&gt; = once packed, you can run the container on any computer with Docker installed — your laptop, a server, or the cloud.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In short, Docker makes apps &lt;strong&gt;consistent, portable, and easy to run&lt;/strong&gt;. No more “works on my machine” drama — if it works in your container, it’ll work everywhere.🐳&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Use It?
&lt;/h2&gt;

&lt;p&gt;Docker solves the “&lt;strong&gt;it works on my machine&lt;/strong&gt;” problem by bundling your app with all its dependencies. No more manual installs, version mismatches, or dependency hell. Everyone runs the same container, so the app behaves consistently everywhere. And if something breaks? Just restart or replace the container — your laptop stays clean.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvdnlnlgjcue0r7rizr2z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvdnlnlgjcue0r7rizr2z.png" alt=" " width="720" height="1080"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  When One Container Isn’t Enough: Enter Docker Compose
&lt;/h2&gt;

&lt;p&gt;Running a single app in a container is nice, but real projects often need multiple pieces — a database, an API, a frontend. That’s where &lt;strong&gt;Docker Compose&lt;/strong&gt; comes in. Think of it as the &lt;strong&gt;orchestra conductor&lt;/strong&gt; for containers: instead of starting each one manually, you describe everything in a &lt;code&gt;docker-compose.yml&lt;/code&gt; file, then run &lt;code&gt;docker compose up&lt;/code&gt;. Compose pulls images, builds your code, wires containers together on a private network, and manages startup order.&lt;/p&gt;

&lt;p&gt;For example, in a PENN stack (PostgreSQL, Express/Node, Next.js), your Compose file might define:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;db&lt;/strong&gt; → runs Postgres from an official image, with a persistent volume for data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;backend&lt;/strong&gt; → builds from a Node image and connects to the database service.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;frontend&lt;/strong&gt; → builds the Next.js app, mapped to port 3000.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With a single command, you have a database, server, and UI all talking to each other. Need Redis later? Just add a service — no local installs required.&lt;/p&gt;

&lt;p&gt;The best part? &lt;strong&gt;Consistency and speed&lt;/strong&gt;. A teammate can clone your repo, run &lt;code&gt;docker compose up&lt;/code&gt;, and be productive in minutes. No dependency hell, no OS-specific setup, just a clean, reproducible dev environment.🎉&lt;/p&gt;

&lt;p&gt;To illustrate, here’s a snippet of what a Compose file might look like for our &lt;strong&gt;PENN stack&lt;/strong&gt; (PostgreSQL, Express, Node, Next.js) example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;services:
  db:
    image: postgres:15
    environment:
      - POSTGRES_USER=myuser
      - POSTGRES_PASSWORD=mypassword
      - POSTGRES_DB=mydb
    volumes:
      - postgres-data:/var/lib/postgresql/data
    # Required: "condition: service_healthy" below only works if the
    # db service actually defines a healthcheck
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U myuser -d mydb"]
      interval: 5s
      timeout: 5s
      retries: 5

  backend:
    build: ./backend
    ports:
      - "8080:8080"
    environment:
      - DATABASE_URL=postgres://myuser:mypassword@db:5432/mydb
    depends_on:
      db:
        condition: service_healthy

  frontend:
    build: ./frontend
    ports:
      - "3000:3000"
    environment:
      - API_URL=http://localhost:8080
    depends_on:
      backend:
        condition: service_started

volumes:
  postgres-data:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
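&lt;p&gt;With that file in place, the whole stack is managed with a handful of commands (the service names match the Compose file above):&lt;/p&gt;

```shell
docker compose up --build -d    # build images and start db, backend, and frontend
docker compose ps               # check that all three services are running
docker compose logs -f backend  # follow the backend logs
docker compose down             # stop and remove the containers
```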



&lt;h2&gt;
  
  
  Docker as a Friendly Tool in Your Toolbox
&lt;/h2&gt;

&lt;p&gt;Docker might look magical at first, but it’s very practical magic. By wrapping your app and its dependencies into containers, you avoid the repetitive installs and endless “it works on my laptop” debates. Think of Docker as a kitchen assistant that preps everything ahead of time, or a shipping manager that delivers your package sealed and intact.&lt;/p&gt;

&lt;p&gt;The key benefits are simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Isolation&lt;/strong&gt; → run multiple apps with conflicting setups without clashing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consistency&lt;/strong&gt; → your app runs the same everywhere, from dev to production.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simplicity&lt;/strong&gt; → with Docker Compose, even multi-service projects spin up with a single command.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For beginners, the trick is not to overthink it. Start small: containerize a simple app, then expand. Use Docker Hub to pull ready-made images or share your own. With just &lt;code&gt;docker compose up&lt;/code&gt;, you can run a database, backend, and frontend in minutes—no manual installs, no dependency chaos.&lt;/p&gt;

&lt;p&gt;Once you try it, Docker feels less like a scary sea monster and more like the friendly whale in its logo. It helps you ship software reliably and with less stress. So go ahead — dip your toes in. You’ll wonder how you ever coded without it.🐳&lt;/p&gt;

</description>
      <category>docker</category>
      <category>cicd</category>
      <category>containers</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>Breaking Down Text for Better AI Processing: Why Chunk Size and Overlap Matter</title>
      <dc:creator>Imthadh Ahamed</dc:creator>
      <pubDate>Mon, 22 Sep 2025 08:13:37 +0000</pubDate>
      <link>https://forem.com/imthadh_ahamed/breaking-down-text-for-better-ai-processing-why-chunk-size-and-overlap-matter-530m</link>
      <guid>https://forem.com/imthadh_ahamed/breaking-down-text-for-better-ai-processing-why-chunk-size-and-overlap-matter-530m</guid>
      <description>&lt;p&gt;Before an AI model like GPT or Gemini can provide smart answers, summarize documents, or generate insights, the input text needs to be prepared carefully. This process, known as text preprocessing, makes sure the AI can process and comprehend the data you supply.&lt;/p&gt;

&lt;p&gt;Token limits are a major preprocessing challenge. Even sophisticated models can only handle a limited number of tokens in a single request, so they will struggle if you give them a whole book or research paper. At this point, chunking (splitting) — the process of dividing lengthy texts into manageable pieces — becomes crucial.&lt;/p&gt;

&lt;p&gt;Frameworks like LangChain include tools such as &lt;a href="https://python.langchain.com/api_reference/text_splitters/character/langchain_text_splitters.character.RecursiveCharacterTextSplitter.html" rel="noopener noreferrer"&gt;RecursiveCharacterTextSplitter&lt;/a&gt; that are built for exactly this purpose. They minimize the loss of meaning while helping divide lengthy texts into digestible sections.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw4e0hf7cxgzjk5l3rziz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw4e0hf7cxgzjk5l3rziz.png" alt=" " width="720" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Split Text in NLP?
&lt;/h2&gt;

&lt;p&gt;Language models are powerful, but they don’t have infinite memory. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GPT-4 Turbo supports up to &lt;strong&gt;128,000 tokens&lt;/strong&gt; (roughly 300 pages of text).&lt;/li&gt;
&lt;li&gt;Other models may handle much less, sometimes only &lt;strong&gt;4,000–8,000 tokens&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When your text exceeds these limits, you need to split it. But if you split text carelessly, you risk:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Context loss&lt;/strong&gt;: Important details cut in half between chunks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic breaks&lt;/strong&gt;: Sentences or paragraphs split in the middle, making chunks harder to understand.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal of splitting is to preserve semantic coherence — keeping ideas whole and understandable — while staying within token limits.&lt;/p&gt;

&lt;p&gt;📱&lt;strong&gt;Example: Sending Long Messages on WhatsApp&lt;/strong&gt;&lt;br&gt;
Imagine you want to send your friend a long story over WhatsApp, but WhatsApp only lets you send 500 characters per message.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you paste the whole story in one go, it won’t send (like hitting the AI’s token limit).&lt;/li&gt;
&lt;li&gt;So, you split the story into smaller messages (like chunks).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now two problems can happen if you split carelessly:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context loss&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You cut right in the middle of a sentence.&lt;/li&gt;
&lt;li&gt;Message 1: &lt;em&gt;“The hero opened the doo — ”&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Message 2: “ — &lt;em&gt;r and saw a dragon.&lt;/em&gt;” → The flow feels broken.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Semantic breaks&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You accidentally separate related parts.&lt;/li&gt;
&lt;li&gt;Message 1 ends with: “&lt;em&gt;The hero raised his sword.&lt;/em&gt;”&lt;/li&gt;
&lt;li&gt;Message 2 starts with something completely new: “&lt;em&gt;Meanwhile, in another city…&lt;/em&gt;” → Your friend might get confused because the action was cut too sharply.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;✅ The solution is to split at natural points (like at the end of a sentence or paragraph) and sometimes repeat a little overlap.&lt;br&gt;
For example, you might copy the last few words of one message into the next:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Message 1: “&lt;em&gt;…he opened the door and saw a dragon.&lt;/em&gt;”&lt;/li&gt;
&lt;li&gt;Message 2: “&lt;em&gt;He saw a dragon breathing fire across the room…&lt;/em&gt;”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That way, your friend remembers the scene, and the story feels smooth — just like preserving semantic coherence when chunking text for AI.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is RecursiveCharacterTextSplitter?
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://python.langchain.com/api_reference/text_splitters/character/langchain_text_splitters.character.RecursiveCharacterTextSplitter.html" rel="noopener noreferrer"&gt;RecursiveCharacterTextSplitter&lt;/a&gt; is a utility in LangChain (and similar NLP libraries) that intelligently splits large text into smaller parts.&lt;/p&gt;

&lt;p&gt;Here’s how it works:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It tries to split text by larger natural boundaries first (paragraphs, sentences).&lt;/li&gt;
&lt;li&gt;If a chunk is still too big, it splits further down to smaller boundaries (words, then characters).&lt;/li&gt;
&lt;li&gt;This recursive process ensures the chunks are manageable but still meaningful.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In short, it’s like cutting a long story into chapters, then into scenes, and only as a last resort into lines — making sure each piece still makes sense.&lt;/p&gt;

&lt;h2&gt;
  
  
  Code Breakdown
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(  
    chunk_size=1000,  
    chunk_overlap=200  
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What does this mean?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;chunk_size=1000&lt;/code&gt; → Each chunk will be about &lt;strong&gt;1,000 characters long&lt;/strong&gt;. Think of it as setting the length of each episode in your story.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;chunk_overlap=200&lt;/code&gt; → The end of one chunk overlaps with the next by &lt;strong&gt;200 characters&lt;/strong&gt;. This ensures continuity.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Real-World Example: Watching a TV Series
&lt;/h2&gt;

&lt;p&gt;Imagine your PDF is a long TV series. If you cut it into episodes without overlap, a scene might end halfway through Episode 1 and continue in Episode 2. Confusing, right?&lt;/p&gt;

&lt;p&gt;With overlap, &lt;strong&gt;the last 5 minutes of Episode 1 are replayed at the start of Episode 2&lt;/strong&gt;.&lt;br&gt;
👉 This way, you don’t forget what happened, and the story flows smoothly.&lt;/p&gt;

&lt;p&gt;That’s exactly what &lt;code&gt;chunk_overlap=200&lt;/code&gt; does: it repeats part of the previous text to keep context intact.&lt;/p&gt;
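&lt;p&gt;To make the idea concrete, here’s a minimal pure-Python sketch of fixed-size chunking with overlap. It is not LangChain’s actual algorithm (which prefers natural boundaries like paragraphs and sentences first), but it shows exactly what &lt;code&gt;chunk_size&lt;/code&gt; and &lt;code&gt;chunk_overlap&lt;/code&gt; control:&lt;/p&gt;

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    # Each new chunk starts (chunk_size - chunk_overlap) characters after the
    # previous one, so consecutive chunks share chunk_overlap characters.
    assert chunk_overlap >= 0 and chunk_size > chunk_overlap
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # the last chunk reached the end
            break
    return chunks

story = "The hero opened the door and saw a dragon breathing fire across the room."
for piece in chunk_text(story, chunk_size=30, chunk_overlap=10):
    print(repr(piece))
```

&lt;p&gt;Because each chunk repeats the tail of the previous one, context carries across every cut; that is exactly the role the 200-character overlap plays in the LangChain example.&lt;/p&gt;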

&lt;h2&gt;
  
  
  Practical Use Cases
&lt;/h2&gt;

&lt;p&gt;Chunking text isn’t just academic — it’s used in real projects every day:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Document Q&amp;amp;A systems&lt;/strong&gt;: Splitting a PDF or manual into chunks so an LLM can answer specific questions accurately.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Text summarization&lt;/strong&gt;: Breaking down large reports into smaller parts before generating summaries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector databases (FAISS, Pinecone, Chroma)&lt;/strong&gt;: Storing chunk embeddings for efficient semantic search and retrieval.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Training data preparation&lt;/strong&gt;: Splitting text before feeding it into custom NLP/LLM training pipelines.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Pros and Cons
&lt;/h2&gt;

&lt;p&gt;✅ &lt;strong&gt;Advantages&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Maintains context continuity with overlaps.&lt;/li&gt;
&lt;li&gt;Preserves semantic meaning by splitting at logical points.&lt;/li&gt;
&lt;li&gt;Works well with vector databases and LLMs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;⚠️ &lt;strong&gt;Drawbacks&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Increased processing time: More chunks = more computations.&lt;/li&gt;
&lt;li&gt;Higher token cost: Overlaps mean some text is repeated, slightly increasing usage costs.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Thoughtful text splitting is a cornerstone of effective AI text processing. By carefully choosing chunk size and overlap, you make sure your AI has just enough information in each piece without losing the bigger picture.&lt;/p&gt;

&lt;p&gt;While &lt;a href="https://dev.tourl"&gt;RecursiveCharacterTextSplitter&lt;/a&gt; is a go-to tool, alternatives exist: sentence-based splitters, semantic chunkers, or token-level splitters. The key is to balance chunk length and context preservation based on your use case.&lt;/p&gt;

&lt;p&gt;If you’re building anything from a chatbot to a summarizer, applying these chunking strategies will dramatically improve your results.&lt;/p&gt;

&lt;h2&gt;
  
  
  Further Reading / Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://python.langchain.com/docs/modules/data_connection/document_transformers/" rel="noopener noreferrer"&gt;LangChain Text Splitters Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/docs/transformers/main/en/tokenizer_summary" rel="noopener noreferrer"&gt;HuggingFace Tokenization Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://platform.openai.com/docs/advanced-usage#managing-tokens" rel="noopener noreferrer"&gt;OpenAI Token Limits&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>rag</category>
      <category>chunk</category>
      <category>llm</category>
      <category>nlp</category>
    </item>
    <item>
      <title>Getting Started with Google Gemini Embeddings in Python: A Hands-On Guide</title>
      <dc:creator>Imthadh Ahamed</dc:creator>
      <pubDate>Mon, 22 Sep 2025 06:36:16 +0000</pubDate>
      <link>https://forem.com/imthadh_ahamed/getting-started-with-google-gemini-embeddings-in-python-a-hands-on-guide-bfi</link>
      <guid>https://forem.com/imthadh_ahamed/getting-started-with-google-gemini-embeddings-in-python-a-hands-on-guide-bfi</guid>
      <description>&lt;p&gt;Artificial Intelligence is evolving rapidly, and one of the most exciting areas is retrieval-augmented generation (RAG) and semantic search. At the heart of these systems lies a powerful concept: embeddings.&lt;/p&gt;

&lt;p&gt;In this article, I'll walk you through the process of generating embeddings using Google's Gemini API in Python. Don't worry if you're a beginner, we'll break it down step by step, with code you can run today.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Are Embeddings (with a real-world twist)?
&lt;/h2&gt;

&lt;p&gt;Imagine walking into a huge supermarket. Instead of wandering, you notice how items are grouped.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Fruits are in one section 🍎🍌🍇&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Vegetables in another 🥦🥕&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Bakery items together 🥖🍩&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even though &lt;em&gt;apple&lt;/em&gt; and &lt;em&gt;banana&lt;/em&gt; are different, they're close together in the fruit section because they &lt;em&gt;mean similar things&lt;/em&gt; (both are fruits).&lt;br&gt;
That's exactly what embeddings do for text. They put similar concepts near each other in a mathematical supermarket.&lt;br&gt;
So if you ask the system about &lt;em&gt;solar energy&lt;/em&gt;, it doesn't just look for the exact word "solar." It also finds &lt;em&gt;photovoltaics, sunlight power&lt;/em&gt;, or &lt;em&gt;renewable energy&lt;/em&gt;, because those live in the same "aisle."&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;Think of embeddings as a way to organize knowledge like a supermarket organizes groceries.&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  What Are Embeddings (Technical View)?
&lt;/h2&gt;

&lt;p&gt;An embedding is a numerical representation of data (text, images, audio, etc.) in a continuous vector space. In NLP (natural language processing), embeddings are used to represent words, sentences, or documents as high-dimensional vectors of real numbers.&lt;/p&gt;
&lt;h3&gt;
  
  
  Key Properties
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Dimensionality&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each embedding is a vector of fixed length (768 dimensions for embedding-001 in Gemini).&lt;/li&gt;
&lt;li&gt;Example:
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[0.12, -0.34, 0.89, ...]  # length = 768
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Semantic Proximity&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The geometry of the vector space encodes meaning.&lt;/li&gt;
&lt;li&gt;Texts with similar semantic meaning will have embeddings that are closer together (low cosine distance / high cosine similarity).&lt;/li&gt;
&lt;li&gt;Example:
&lt;ul&gt;
&lt;li&gt;"Solar energy" and "photovoltaics" → embeddings close in space&lt;/li&gt;
&lt;li&gt;"Solar energy" and "chocolate cake" → embeddings far apart&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Training Basis&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Embeddings are learned by large language models (LLMs) trained on massive corpora.&lt;/li&gt;
&lt;li&gt;The model optimizes representations such that semantically related text produces vectors with high similarity.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Mathematical Use&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can compute distances between embeddings using:
&lt;ul&gt;
&lt;li&gt;Cosine similarity (most common)&lt;/li&gt;
&lt;li&gt;Euclidean distance&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
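&lt;p&gt;As a quick illustration of the cosine-similarity idea, here’s a self-contained sketch with tiny made-up 3-dimensional vectors (real Gemini embeddings have 768 dimensions, but the math is identical):&lt;/p&gt;

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # cos(theta) = dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up 3-dimensional vectors, standing in for 768-dimensional embeddings
solar = [0.9, 0.1, 0.0]
photovoltaics = [0.85, 0.15, 0.05]
chocolate_cake = [0.0, 0.2, 0.95]

print(cosine_similarity(solar, photovoltaics))  # close to 1.0 (similar meaning)
print(cosine_similarity(solar, chocolate_cake))  # close to 0.0 (unrelated)
```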
&lt;h2&gt;
  
  
  Setting Up the Environment
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install google-generativeai python-dotenv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;google-generativeai → The engine (Gemini API)&lt;/li&gt;
&lt;li&gt;python-dotenv → The cashier who checks your membership card (API key)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Inside your .env file, add:&lt;br&gt;
&lt;code&gt;GEMINI_API_KEY=your_api_key_here&lt;/code&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Writing the Python Code
&lt;/h3&gt;

&lt;p&gt;Here's our hands-on demo:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os
import google.generativeai as genai
from dotenv import load_dotenv

# Load the API key from .env, dropping any stale value already in the
# environment so the .env file always wins
os.environ.pop("GEMINI_API_KEY", None)
load_dotenv(override=True)
GEMINI_API_KEY = os.getenv("GEMINI_API_KEY")

if not GEMINI_API_KEY:
    raise ValueError("GEMINI_API_KEY is not set in the environment variables.")
print("GEMINI_API_KEY is set.")

# Configure Gemini
genai.configure(api_key=GEMINI_API_KEY)

# Text chunks = our knowledge items
chunks = [
    "Chunk 1: Renewable energy comes from resources that are naturally replenished (sunlight, wind, rain, tides, waves, geothermal).",
    "Chunk 2: Solar energy is abundant and captured using photovoltaic panels. Wind energy uses turbines to generate electricity.",
    "Chunk 3: Geothermal energy comes from heat inside the Earth. Biomass energy is derived from organic materials."
]

# Generate embeddings
for i, chunk in enumerate(chunks):
    response = genai.embed_content(
        model="models/embedding-001",
        content=chunk,
        task_type="retrieval_document"
    )
    embedding = response["embedding"]
    print(f"\nEmbedding for Chunk {i + 1}:\n{embedding[:10]}...")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Breaking It Down
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;API Key Handling&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We use a &lt;code&gt;.env&lt;/code&gt; file for security; never hardcode keys.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Chunks of Text&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each chunk represents a small passage of knowledge. In real-world projects, chunks might be paragraphs from PDFs, product descriptions, or support docs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;embed_content&lt;/code&gt; Call&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Gemini model (&lt;code&gt;embedding-001&lt;/code&gt;) converts text into a 768-dimensional vector. We only print the first 10 numbers to keep output readable.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Result&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each chunk now has a unique vector representation. These embeddings can be stored in a vector database (like Pinecone, Weaviate, or ChromaDB) for semantic search or powering RAG pipelines.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Real-World Applications
&lt;/h2&gt;

&lt;p&gt;So why does this matter? Here are some examples where embeddings shine:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Search Engines → Find relevant docs by meaning, not just keywords.&lt;/li&gt;
&lt;li&gt;Chatbots &amp;amp; RAG Systems → Retrieve context-aware answers.&lt;/li&gt;
&lt;li&gt;Recommendation Engines → Suggest similar products or articles.&lt;/li&gt;
&lt;li&gt;Clustering &amp;amp; Topic Modeling → Group similar content automatically.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Imagine building a renewable energy Q&amp;amp;A bot: the chunks above could serve as knowledge, and embeddings would help the bot fetch the right passage when a user asks, "How does geothermal energy work?"&lt;/p&gt;
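&lt;p&gt;The retrieval step of such a bot can be sketched in a few lines. The vectors below are made-up stand-ins for real &lt;code&gt;embed_content&lt;/code&gt; output (a real system would also embed the question, with &lt;code&gt;task_type="retrieval_query"&lt;/code&gt;); they are chosen only to illustrate the nearest-chunk lookup:&lt;/p&gt;

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Hypothetical toy vectors standing in for genai.embed_content output
chunk_vectors = {
    "Chunk 1 (renewable energy overview)": [0.5, 0.5, 0.1],
    "Chunk 2 (solar and wind)": [0.9, 0.1, 0.1],
    "Chunk 3 (geothermal and biomass)": [0.1, 0.1, 0.9],
}
# Pretend this is the embedding of "How does geothermal energy work?"
query_vector = [0.15, 0.1, 0.85]

# Retrieval = pick the chunk whose embedding is most similar to the query
best_chunk = max(chunk_vectors, key=lambda name: cosine(query_vector, chunk_vectors[name]))
print(best_chunk)  # Chunk 3 (geothermal and biomass)
```

&lt;p&gt;A production pipeline delegates exactly this nearest-neighbor search to a vector database instead of a Python loop.&lt;/p&gt;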

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Embeddings are like the hidden language that bridges human words and machine understanding. With Google's Gemini API, creating them is no longer rocket science - it's just a few lines of Python.&lt;/p&gt;

&lt;p&gt;If you're planning to build your own &lt;strong&gt;AI-powered search, chatbot, or recommendation system&lt;/strong&gt;, embeddings will be at the core of it. This hands-on example is your first step toward building those advanced systems.&lt;/p&gt;

</description>
      <category>gemini</category>
      <category>nlp</category>
      <category>rag</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
