Forem: Damilare Ogundele

Building a Multi-Application Kubernetes Marketplace: TCP/UDP App Onboarding at Scale

Damilare Ogundele — Wed, 25 Jun 2025 22:56:30 +0000

Introduction

Recently, I tackled a comprehensive marketplace app onboarding project that involved deploying various TCP/UDP and HTTP applications on Kubernetes. This post shares the technical journey, challenges faced, and solutions implemented while onboarding applications like MySQL, MongoDB, RabbitMQ, and others to our marketplace platform.

The Challenge

The goal was to create a standardized onboarding process for diverse applications with different networking requirements:

TCP/UDP applications requiring Network Load Balancers
HTTP applications needing ingress controllers and SSL termination
Database applications requiring persistent storage and proper health checks
Message brokers with multiple port configurations

Architecture Overview

Our solution leverages several key AWS and Kubernetes components:

Network Load Balancer Configuration

services:
  - name: tcp
    type: LoadBalancer
    protocol: TCP
    port: 3306
    targetPort: 3306
    lb:
      scheme: internet-facing
      type: nlb-ip
      target_type: ip
      healthcheck_protocol: TCP
      healthcheck_interval: 10
      healthcheck_timeout: 6

Helm Chart Structure

We standardized our deployments using a consistent Helm chart pattern with these key sections:

Global configuration for app identification and DNS
Deployment specs with resource limits and security contexts
Service definitions for both internal and external access
Storage management with persistent volumes
Secrets handling for sensitive configuration

Case Study: RabbitMQ Deployment

RabbitMQ presented an interesting challenge with its dual-port requirement (AMQP protocol on 5672 and Management UI on 15672). Here's how we handled it:

containers:
  - name: rabbitmq
    image: rabbitmq:4-management
    ports:
      - 5672  # AMQP
      - 15672 # Management UI
    livenessProbe:
      httpGet:
        path: /
        port: 15672
      initialDelaySeconds: 60
      periodSeconds: 30
    readinessProbe:
      httpGet:
        path: /
        port: 15672
      initialDelaySeconds: 30
      periodSeconds: 10

services:
  - name: amqp
    type: ClusterIP
    protocol: TCP
    port: 5672
    targetPort: 5672
  - name: management
    type: ClusterIP
    protocol: TCP
    port: 15672
    targetPort: 15672
    ingress:
      cert_issuer: "letsencrypt"
      class: nginx

Database Deployment Patterns

For database applications like MySQL, we focused on:

Persistent Storage

volumes:
  - name: mysql-data-volume
    storageClassName: ebs-sc
    storage: 8Gi
    accessModes:
      - ReadWriteOnce

Health Checks

readinessProbe:
  tcpSocket:
    port: 3306
  initialDelaySeconds: 30
  periodSeconds: 10
livenessProbe:
  tcpSocket:
    port: 3306
  initialDelaySeconds: 60
  periodSeconds: 30

Key Learnings

1. Standardization is Critical

Creating a consistent Helm chart structure across all applications significantly reduced deployment complexity and improved maintainability.

2. Health Check Strategy

Different applications require different health check approaches:

HTTP applications: Use HTTP GET requests to health endpoints
Databases: Use TCP socket checks on primary ports
Message brokers: Check management interfaces when available

3. Security Context Management

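Proper user and group ID management is essential for persistent storage. The container has to run as a user that can read and write the mounted volume, and fsGroup tells Kubernetes which group should own the volume's files: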

securityContext:
  runAsUser: 999
  fsGroup: 999

4. Resource Management

Setting appropriate resource limits prevents applications from consuming excessive cluster resources:

resources:
  requests:
    cpu: 200m
    memory: 512Mi
  limits:
    cpu: 1000m
    memory: 1Gi

Automation and Testing

We implemented automated testing procedures to ensure each application deployment meets our marketplace standards:

Connectivity tests for TCP/UDP services
SSL certificate validation for HTTPS services
Persistent storage verification for stateful applications
Health check validation for all deployments
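
As a rough illustration of the connectivity tests listed above, the following sketch checks whether a TCP port accepts connections. It is a minimal example, not our actual test suite, and the function name tcp_port_open is hypothetical. It covers TCP only; UDP is connectionless and needs an application-level probe instead.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures
        return False
```

A check against a MySQL endpoint would then be a call such as tcp_port_open(host, 3306), where host is the DNS name of the load balancer.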

Results and Impact

The standardized onboarding process has enabled us to:

Reduce deployment time by 70%
Maintain consistent security policies across all applications
Simplify troubleshooting with standardized logging and monitoring
Scale our marketplace offerings efficiently

Future Enhancements

Looking ahead, we're planning to:

Implement GitOps workflows for automated deployments
Add application-specific monitoring dashboards
Develop self-service onboarding tools for developers
Expand support for more complex multi-tier applications

Conclusion

Building a multi-application Kubernetes marketplace requires careful planning, standardization, and attention to the unique requirements of each application type. By leveraging Helm charts, AWS Load Balancers, and Kubernetes best practices, we've created a robust platform that can scale with our growing marketplace needs.

The key takeaway is that while each application has unique requirements, a well-designed template system can accommodate this diversity while maintaining operational consistency.

Building a High-Performance Web Scraper with Python

Damilare Ogundele — Mon, 17 Mar 2025 08:44:17 +0000

Introduction

This article explores the architecture and implementation of a high-performance web scraper built to extract product data from e-commerce platforms. The scraper uses multiple Python libraries and techniques to efficiently process thousands of products while maintaining resilience against common scraping challenges.

Technical Architecture

The scraper is built on a fully asynchronous foundation using Python's asyncio ecosystem, with these key components:

Network Layer: aiohttp for async HTTP requests with connection pooling
DOM Processing: BeautifulSoup4 for HTML parsing
Dynamic Content: Playwright for JavaScript-rendered content extraction
Data Processing: pandas for data manipulation and export

Implementation Highlights

Concurrency Management

The scraper implements a worker pool pattern with configurable concurrency limits:

import os
import aiohttp

# Concurrency settings (the fallback values are illustrative defaults)
self.max_workers = int(os.getenv('MAX_WORKERS', '10'))
self.max_connections = int(os.getenv('MAX_CONNECTIONS', '100'))

# TCP connection pooling
connector = aiohttp.TCPConnector(
    limit=self.max_connections,
    resolver=resolver  # Custom DNS resolver (see DNS Resilience below)
)

Capping the number of simultaneous connections keeps the scraper from overwhelming the target server while still keeping throughput high.
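
One common way to enforce a worker limit in asyncio is a semaphore. The sketch below is a minimal illustration of that idea under my own naming (run_with_limit is hypothetical) and is not necessarily how the scraper's worker pool is written.

```python
import asyncio


async def run_with_limit(coros, max_workers: int):
    """Run coroutines concurrently with at most max_workers in flight."""
    semaphore = asyncio.Semaphore(max_workers)

    async def guarded(coro):
        # Each task waits here until one of the max_workers slots is free
        async with semaphore:
            return await coro

    # gather preserves the input order of the results
    return await asyncio.gather(*(guarded(c) for c in coros))
```

All tasks are scheduled up front, but only max_workers of them can be past the semaphore at any moment, so the limit holds no matter how many URLs are queued.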

Resilient Network Requests

The network layer retries failed requests with exponential backoff and handles rate-limit (429) responses:

async def fetch_url(self, session, url):
    retries = 0
    while retries < self.max_retries:
        # Default wait before the next attempt: exponential backoff
        wait_time = self.retry_backoff ** (retries + 2)
        try:
            headers = {
                'User-Agent': self.user_agent.random,
                # Additional headers omitted for brevity
            }

            async with session.get(url, headers=headers, timeout=self.request_timeout) as response:
                if response.status == 200:
                    return await response.read()
                if response.status == 429:
                    # Rate limited: honor Retry-After when it is given in seconds
                    retry_after = response.headers.get('Retry-After', '')
                    if retry_after.isdigit():
                        wait_time = int(retry_after)

        except (asyncio.TimeoutError, aiohttp.ClientError) as e:
            logger.warning(f"Network error: {e}")

        # Back off once per failed attempt, whatever the cause
        retries += 1
        await asyncio.sleep(wait_time)

    # Every attempt failed
    return None
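
To see what the backoff schedule looks like in isolation, here is a small sketch of the delay calculation as a pure function. The name backoff_delay, the base of 2, and the 60-second cap are assumptions chosen for illustration; the scraper's own values come from its configuration.

```python
from typing import Optional


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 60.0,
                  retry_after: Optional[str] = None) -> float:
    """Seconds to wait before retrying after failed attempt number `attempt` (0-based)."""
    if retry_after is not None and retry_after.strip().isdigit():
        # The server sent Retry-After as a number of seconds: use it as-is
        return float(retry_after)
    # Otherwise grow the delay exponentially, but never beyond the cap
    return min(cap, base ** (attempt + 1))
```

With these defaults the delays grow as 2, 4, 8, 16 seconds and so on until they reach the cap. A Retry-After header in HTTP-date form is not parsed here and falls back to the exponential delay.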

Hybrid Content Extraction

The scraper employs a two-phase extraction approach:

Static HTML Parsing: Uses BeautifulSoup to extract readily available content
Dynamic Content Extraction: Uses Playwright to handle JavaScript-rendered elements

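import asyncio
from functools import partial

async def fetch_product(self, session, url, page):
    # Fetch the raw HTML first; fetch_url returns None if every attempt fails
    content = await self.fetch_url(session, url)
    if content is None:
        return None

    # Static content extraction: parse in a worker thread so the event loop stays free
    loop = asyncio.get_running_loop()
    product = await loop.run_in_executor(
        None,  # use the loop's default thread pool
        partial(self.scrape_product_html, content, url)
    )

    # Dynamic content extraction
    image_url, description = await self.scrape_dynamic_content_playwright(page, url)
    product.image_url = image_url
    product.description = description
    return product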

This approach optimizes for both speed and completeness.

DNS Resilience

The scraper resolves hostnames through public DNS servers when the aiodns library is installed, and falls back to aiohttp's default resolver when it is not:

try:
    import aiodns
    resolver = aiohttp.AsyncResolver(nameservers=["8.8.8.8", "1.1.1.1"])
except ImportError:
    logger.warning("aiodns library not found. Falling back to default resolver.")
    resolver = None

Data Processing Pipeline

The scraper implements a thread-safe queue for handling scraped data:

import csv
import queue

import pandas as pd

# Thread-safe queue for results
self.results_queue = queue.Queue()

# Data processing
def save_results_from_queue(self):
    products = []
    while not self.results_queue.empty():
        try:
            products.append(self.results_queue.get_nowait())
        except queue.Empty:
            break

    if products:
        df = pd.DataFrame(products)
        # Save to CSV with proper encoding and escaping
        df.to_csv(
            self.output_file,  # assumed attribute holding the CSV path
            index=False,
            encoding='utf-8-sig',
            escapechar='\\',
            quoting=csv.QUOTE_ALL
        )

Performance Optimizations

Several techniques are employed to maximize throughput:

Batch Processing: Products are processed in configurable batches
Random Delays: Randomized delays between requests make the traffic pattern less uniform, which lowers the chance of being rate limited or blocked
Connection Pooling: TCP connection reuse reduces overhead
ThreadPoolExecutor: HTML parsing is offloaded to worker threads so it does not block the event loop
Sampling: For large datasets, statistical sampling is used to estimate total counts
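
The batching and random-delay ideas from the list above can be sketched in a few lines. The helper names batches and jittered_delay, and the 1 to 3 second range, are illustrative assumptions rather than the scraper's actual settings.

```python
import random
from typing import Iterator, List, Sequence


def batches(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def jittered_delay(min_s: float = 1.0, max_s: float = 3.0) -> float:
    """Pick a random pause (in seconds) to insert between requests."""
    return random.uniform(min_s, max_s)
```

A scraping loop would iterate over batches(urls, batch_size) and sleep for jittered_delay() seconds between batches or requests.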

Error Handling and Reliability

The scraper implements comprehensive error handling:

try:
    # Scraping logic
except Exception as e:
    logger.error(f"Error in scrape_all_products: {e}")
    # Save any results in queue before exiting
    self.save_results_from_queue()
    raise

This ensures that if scraping fails with an exception, the results collected so far are written out before the error propagates.

Conclusion

The architecture outlined here demonstrates how to build a high-performance web scraper that balances speed, reliability, and target server courtesy. By leveraging asynchronous programming, connection pooling, and hybrid content extraction techniques, the scraper can efficiently process thousands of products while maintaining resilience against common scraping challenges.

Key takeaways:

Asynchronous programming is essential for high-performance web scraping
Hybrid static/dynamic extraction maximizes data completeness
Proper error handling and resilience mechanisms are crucial for production use
Configurable parameters allow for fine-tuning based on target site characteristics

Certificate Unlocked! 🎓

Damilare Ogundele — Tue, 03 Sep 2024 07:41:25 +0000

I’m thrilled to announce that I’ve officially received my certificate for completing the HNG11 internship!

Being one of the 500 finalists chosen from over 20,000 applicants was a journey of growth, learning, and resilience. Just a few hours ago, I reflected on this incredible experience, the projects I took on, and the amazing people I had the privilege to work with. (You can check out that post here).

This certificate is more than just a piece of paper—it’s a testament to the hard work, sleepless nights, countless lines of code, debugging sessions, and the relentless pursuit of excellence. From developing and deploying the Remote Bingo game app to mastering tools like Helm, Ansible, Docker, Kubernetes, and GitHub Actions, this journey has been truly transformative.

I’m immensely grateful to the mentors, teammates, and everyone who supported me along the way. This is just the beginning, and I can’t wait to continue pushing boundaries and achieving more.

To everyone out there, keep striving, keep learning, and never forget the power of perseverance.

Connect with me on LinkedIn and also GitHub

Reflecting on My Journey at HNG11 Internship!

Damilare Ogundele — Mon, 02 Sep 2024 12:02:50 +0000

After a thrilling and intense few months, I’ve just wrapped up the HNG11 internship, and I’m incredibly proud to share that I was among the 500 finalists selected from over 20,000 applicants! 🎉

Starting on June 27th, thousands of interns across various tracks, including Project Managers, Developers, Designers, Video Marketers, Data Analysts, and DevOps specialists, embarked on this journey. Being part of the DevOps track, I faced numerous challenging tasks that tested my abilities and expanded my skill set. I worked with tools like Helm, Ansible, Docker, Kubernetes, Bash, Python, GitHub Actions, and more.

As the Chief Mentor, Mark Essien, put it:

“All of you are special. 20k wanted to be finalists, 500 made it. You have proven something to yourself. You are special, never forget that. Where everyone gave up, you kept going. You have that strength, you have that willpower, you have that intelligence.”

One of the highlights was collaborating with a fantastic team to develop a game app, Remote Bingo, and deploy it from dev to staging and finally to production. This project, along with many others, taught me invaluable lessons about teamwork, resilience, and the power of persistence.

A big shoutout to all my team members and mentors who made this experience so rewarding. I’m grateful for your support, guidance, and the memories we created together.

On to the next challenge!💯

Connect with me on LinkedIn and also GitHub

Scheduled Test Workflow Documentation

Damilare Ogundele — Sat, 24 Aug 2024 21:51:24 +0000

Overview

This documentation provides a detailed explanation of the setup for a cron job scheduled test in a GitHub Actions workflow. The purpose of this setup is to automate the execution of test scripts against a Postman collection every 15 minutes, ensuring continuous testing and monitoring of the boilerplate repository's API endpoints.

GitHub Actions Workflow Configuration

name: Scheduled Test

on:
  schedule:
    - cron: '*/15 * * * *'

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v3

      - name: Run test script
        env:
          POSTMAN_API_KEY: ${{ secrets.POSTMAN_API_KEY }}
          API_URL: ${{ secrets.API_URL }}
        run: |
          cd qa
          chmod +x test.sh
          ./test.sh

Key Components:

Schedule Trigger (on: schedule): The workflow is triggered every 15 minutes as specified by the cron expression '*/15 * * * *'.
Environment Variables:
- POSTMAN_API_KEY: The API key for accessing the Postman collection (stored securely in GitHub Secrets).
- API_URL: The base URL of the API under test.
Steps:
- Checkout Repository: Uses the actions/checkout@v3 action to check out the repository code.
- Run Test Script: Executes the test.sh script located in the qa directory, which handles the installation of necessary dependencies and triggers the test execution.
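
To make the cron expression concrete: '*/15' in the minute field matches minutes 0, 15, 30, and 45 of every hour. The sketch below computes the next matching time for a given moment. The function name next_quarter_hour is hypothetical. GitHub evaluates schedules in UTC, and the actual start of a scheduled run can be delayed when the service is under load, so treat the result as the earliest possible start.

```python
from datetime import datetime, timedelta, timezone


def next_quarter_hour(now: datetime) -> datetime:
    """Next time whose minute is a multiple of 15, strictly after `now`."""
    # Round down to the current quarter hour, then step forward one slot
    floored = now.replace(minute=now.minute - now.minute % 15,
                          second=0, microsecond=0)
    return floored + timedelta(minutes=15)
```

For example, a commit pushed at 21:51 UTC would see its first scheduled run no earlier than 22:00 UTC.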

Test Script (test.sh)

npm install newman
npm install axios
npm install big-json
node ./index.js

Explanation:

Dependency Installation:
- newman: A command-line tool to run Postman collections.
- axios: A promise-based HTTP client for making API requests.
- big-json: A module to handle large JSON files.
Running index.js: Executes the main script that handles the Postman collection run, compression of results, and subsequent API requests.

Main Script (index.js) (Not included in this documentation)

The index.js script orchestrates the entire process, from executing the tests with Newman to compressing the results and sending them to the API.

Conclusion

This setup provides a robust mechanism for automated testing of the boilerplate repository. The workflow ensures that API endpoints are tested every 15 minutes, with detailed logs and results management, enhancing the overall quality and reliability of the project.

DevOps Task - Kubernetes Self-Hosted GitHub Runners

Damilare Ogundele — Tue, 20 Aug 2024 14:16:08 +0000

This documentation outlines the process of setting up self-hosted GitHub Actions runners on a Kubernetes cluster as part of the "Stage 7 DevOps Task." The goal was to establish scalable and manageable CI/CD pipelines using the GitHub Actions Runner Controller.

1. Environment Setup

This section details the setup of the Kubernetes environment required to host the GitHub Actions runners.

1.1. Kubernetes Cluster Setup

We utilized Minikube for its simplicity and ease of use in creating a local Kubernetes cluster.

curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
sudo install minikube-linux-amd64 /usr/local/bin/minikube
minikube start --driver=docker

Cluster verification was performed using kubectl cluster-info to ensure it was operational.

1.2. Helm Installation

Helm simplifies Kubernetes application management. Installation was achieved through the following commands:

curl https://baltocdn.com/helm/signing.asc | gpg --dearmor | sudo tee /usr/share/keyrings/helm.gpg > /dev/null
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/helm.gpg] https://baltocdn.com/helm/stable/debian/ all main" | sudo tee /etc/apt/sources.list.d/helm-stable-debian.list
sudo apt-get update
sudo apt-get install helm

1.3. Docker Installation

Docker is essential for containerized applications within the Kubernetes cluster.

sudo apt-get update
sudo apt-get -y install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
newgrp docker

1.4. kubectl Installation

kubectl is the command-line tool for managing Kubernetes clusters.

KUBECTL_VERSION=v1.29.0
wget https://storage.googleapis.com/kubernetes-release/release/${KUBECTL_VERSION}/bin/linux/amd64/kubectl
chmod +x kubectl
sudo mv kubectl /usr/local/bin/
kubectl version --client

2. GitHub Actions Runner Controller Setup

This section focuses on deploying the GitHub Actions Runner Controller, which manages the runners.

2.1. Namespace Creation

A dedicated namespace, github-actions-runner, was created to isolate the runner controller and its resources.

kubectl create namespace github-actions-runner

2.2. Runner Deployment

The YAML configuration below defines the RunnerDeployment, specifying the desired number of replicas and the runner image to use.

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: kahuna-runner
spec:
  replicas: 1
  template:
    spec:
      repository: Stage7-GitHub-Runner/hng_boilerplate_nestjs
      labels:
      - Kahuna

Deployment was achieved using kubectl apply -f runnerdeployment.yaml -n github-actions-runner (the manifest itself does not set a namespace), and the pod status was verified using kubectl get pods -n github-actions-runner.

2.3. Service Account and Permissions

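A service account (runner-sa) was created and bound to the cluster-admin ClusterRole so the runner had the permissions it needed to access Kubernetes resources. Because the binding is a RoleBinding in the github-actions-runner namespace, those permissions apply only inside that namespace, not cluster-wide.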

kubectl create serviceaccount runner-sa -n github-actions-runner
kubectl create rolebinding runner-rb --serviceaccount=github-actions-runner:runner-sa --clusterrole=cluster-admin -n github-actions-runner

3. Runner Configuration

This section details the configuration and deployment of the GitHub Actions runners.

3.1. Runner Registration Token

The runner registration token was retrieved from the GitHub repository. This token is used to authenticate the runners with GitHub.

3.2. Runner Spec Definition

The runner specifications were defined in the YAML file, including labels (Kahuna) and resource requests.

3.3. Runner Deployment

The Kubernetes manifests were applied using kubectl apply -f runnerdeployment.yaml to deploy the runners.

3.4. Verification

Runner logs were monitored using kubectl logs -f kahuna-runner-jkbv6-lpdqp to ensure successful registration and operation.

4. Testing and Validation

This section describes the testing and validation process for the self-hosted runners.

4.1. Workflow Creation

A sample GitHub Actions workflow was created to utilize the self-hosted runners. This workflow included steps for code checkout, dependency installation, linting, building, and testing.

name: Lint, Build and Test

on: workflow_dispatch

jobs:
  lint-build-and-test:
    runs-on: Kahuna
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Set up Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '20'

      - name: Install dependencies
        run: npm install --include=dev

      - name: Run lint
        run: npm run lint

      - name: Build project
        run: npm run build

      - name: Run tests
        run: npm run test

4.2. Workflow Execution

The workflow was executed to verify that the self-hosted runners were correctly processing jobs.

4.3. Monitoring

Runner performance was monitored, and any issues were debugged.

4.4. Scalability Testing

Scalability was tested by adjusting the number of workflows and observing how the runners scaled up or down.

5. Conclusion

Self-hosted GitHub Actions runners were successfully set up on a Kubernetes cluster, providing a scalable and manageable solution for CI/CD pipelines. The runners were thoroughly tested, validated, and documented, demonstrating proficiency with Kubernetes and CI/CD tools.

Linux User Creation Bash Script

Damilare Ogundele — Sun, 30 Jun 2024 22:59:25 +0000

Hello everyone, I am Kahuna, and I’m excited to share my latest technical article. As a DevOps engineer, I was asked to manage user accounts and groups. Today, I’ll walk you through a script I wrote to automate this process. The script reads a text file containing usernames and their respective groups, then creates the users and groups as specified.

Prerequisites

I made sure I had the necessary permissions to create users and groups and to write to the /var/log/ and /var/secure/ directories.

The Script

Here’s a breakdown of the create_users.sh script:

Log and Password Files:

The script uses /var/log/user_management.log for logging actions and /var/secure/user_passwords.csv to securely store generated passwords. The /var/secure/ directory is set with restrictive permissions to ensure password security.

Input Validation:

The script checks if an input file is provided and exits with usage instructions if not.

Logging Function:

A simple function logs messages with timestamps to the log file.

Password Generation:

A function generates random 12-character passwords using /dev/urandom.
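
The script itself is written in Bash and reads from /dev/urandom. For readers who prefer Python, a comparable sketch uses the standard secrets module, which draws from the operating system's cryptographic random source. The alphabet of letters and digits is my assumption here; the original script may use a different character set.

```python
import secrets
import string

# Assumed character set: ASCII letters and digits
ALPHABET = string.ascii_letters + string.digits


def generate_password(length: int = 12) -> str:
    """Return a random password built from ALPHABET using a CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

Using secrets rather than the random module matters because random is not designed for security-sensitive values such as passwords.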

Processing the Input File:

The script reads each line of the input file, extracts the username and groups, and processes them:

User Existence Check: If the user already exists, it logs the information and skips to the next line.
User Creation: It creates the user with the specified personal group and a home directory.
Additional Groups: If additional groups are specified, the script creates them if they don’t exist and adds the user to these groups.
Password Setting: It generates and sets a random password for the user and logs this action.

Running the Script

To run the script, I saved it as create_users.sh, made it executable, and passed the input file as an argument:

chmod +x create_users.sh
sudo ./create_users.sh employee_file

Input File

Here’s what the input file (employee_file) looks like:

Kahuna; Backend,DevOps,HR
Dami; DevOps,HR
Sola; Backend
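
Each line of the input file holds a username, a semicolon, and a comma-separated list of groups. The actual script does this parsing in Bash; the Python sketch below only illustrates the format, and the function name parse_line is hypothetical.

```python
from typing import List, Tuple


def parse_line(line: str) -> Tuple[str, List[str]]:
    """Split 'user; group1,group2' into (user, [group1, group2])."""
    user, _, groups = line.partition(";")
    # Strip stray whitespace and drop empty entries such as a trailing comma
    return user.strip(), [g.strip() for g in groups.split(",") if g.strip()]
```

Trimming whitespace around the username and each group name is what lets the file use the readable "Kahuna; Backend,DevOps,HR" layout without creating groups that start with a space.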

Conclusion

This script automates the process of creating and managing users and groups, ensuring consistency and security. I am currently on a DevOps journey with HNG Internship. To learn more, check HNG Internship and HNG Premium.

My Backend World: Tackling my first NestJS project

Damilare Ogundele — Sat, 29 Jun 2024 12:20:14 +0000

Hello everyone, I am Kahuna, and this is my first ever technical article. I am a graduate of Mechanical Engineering, but my enthusiasm for technology led me to dive into the world of backend development.

Recently, I had the opportunity to be part of a team for a project that involved using a lovely framework I had never used before: NestJS. This was challenging, but at the same time, I knew it would be an interesting experience. Stay with me as I highlight how I navigated through the project.

Solution

1. Understanding the framework

I learned that you don't jump into using a tool or framework without understanding it first. With this in mind, I delved into the NestJS documentation online and studied it to gain a solid understanding of what this framework does and how it works.

2. Trusting in mentors

Upon joining the team, I met some other developers who had more experience with the framework. I discussed my situation with them, explaining that this was my first project using NestJS. They provided guidance, and I was accountable to them. Whenever I got stuck with my code, they were there to help.

3. Consistency

As advised by a senior developer, consistency brings about mastery. I committed to writing code every day, which significantly improved my skills in NestJS and coding in general.

Conclusion

The above process has been incredibly beneficial and continues to aid me in my journey in backend development. Currently, I am embarking on a new journey in backend development with the HNG Internship. I highly recommend this platform to every newbie in tech. It offers the experience you need to grow and excel. To learn more, check out HNG Internship and HNG Premium.