<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Vincent Tommi</title>
    <description>The latest articles on Forem by Vincent Tommi (@vincenttommi).</description>
    <link>https://forem.com/vincenttommi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1057211%2Fb677487b-3f87-456c-b555-669a164fba50.jpeg</url>
      <title>Forem: Vincent Tommi</title>
      <link>https://forem.com/vincenttommi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/vincenttommi"/>
    <language>en</language>
    <item>
      <title>How to Reset a Django Admin Password Using the Django Shell</title>
      <dc:creator>Vincent Tommi</dc:creator>
      <pubDate>Wed, 10 Dec 2025 04:42:03 +0000</pubDate>
      <link>https://forem.com/vincenttommi/how-to-reset-a-django-admin-password-using-the-django-shell-c59</link>
      <guid>https://forem.com/vincenttommi/how-to-reset-a-django-admin-password-using-the-django-shell-c59</guid>
      <description>&lt;p&gt;Forgetting the password for your Django admin account can be frustrating, but resetting it is quick and straightforward using the Django shell. This method works even if you no longer have access to the admin interface and doesn't require any additional packages.&lt;/p&gt;

&lt;p&gt;Step-by-Step Guide&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open the Django shell&lt;br&gt;
In your project directory (where &lt;code&gt;manage.py&lt;/code&gt; is located), run the following command:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python manage.py shell
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will start an interactive Python shell with your Django project loaded.&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Reset the password&lt;br&gt;
In the shell, execute the following code:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from django.contrib.auth.models import User

# Replace 'admin' with the actual username of the admin account
user = User.objects.get(username='admin')

# Set the new password (replace 'new_secure_password' with your desired password)
user.set_password('new_secure_password')

# Save the changes
user.save()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Important notes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use the correct username (it’s usually admin, but it could be something else if you created a custom superuser).&lt;/li&gt;
&lt;li&gt;set_password() automatically hashes the password for you—never use user.password = 'plain_text' as that would store the password unhashed.&lt;/li&gt;
&lt;/ul&gt;
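
&lt;p&gt;To make the hashing note concrete, here is a minimal standard-library sketch of the general idea: a salted, iterated hash is stored in place of the plain text. The names &lt;code&gt;make_hash&lt;/code&gt; and &lt;code&gt;verify&lt;/code&gt; and the iteration count are assumptions for illustration. The output mirrors the &lt;code&gt;algorithm$iterations$salt$hash&lt;/code&gt; layout used by Django's default PBKDF2 hasher, but this is not Django's own implementation.&lt;/p&gt;

```python
import base64
import hashlib
import hmac
import secrets


def make_hash(password, salt=None, iterations=100_000):
    """Return an 'algorithm$iterations$salt$hash' string (illustrative only)."""
    salt = salt or secrets.token_hex(8)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt.encode("ascii"), iterations
    )
    encoded = base64.b64encode(digest).decode("ascii")
    return f"pbkdf2_sha256${iterations}${salt}${encoded}"


def verify(password, stored):
    """Re-hash the candidate with the stored salt and compare in constant time."""
    _algorithm, iterations, salt, _hash = stored.split("$")
    candidate = make_hash(password, salt=salt, iterations=int(iterations))
    return hmac.compare_digest(candidate, stored)
```

&lt;p&gt;The stored string never contains the password itself, which is why assigning plain text to &lt;code&gt;user.password&lt;/code&gt; would leave the account unable to log in.&lt;/p&gt;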

&lt;ol start="3"&gt;
&lt;li&gt;Exit the shell&lt;br&gt;
Once the password is updated, simply type:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;exit()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or press &lt;code&gt;Ctrl+D&lt;/code&gt; (on Unix-like systems) or &lt;code&gt;Ctrl+Z&lt;/code&gt; then Enter (on Windows).&lt;/p&gt;

&lt;p&gt;That’s it! You can now log in to the Django admin interface (&lt;code&gt;/admin/&lt;/code&gt;) using your username and the new password you just set.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bonus: Resetting the Password for Any User (Not Just Admin)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The same method works for any user in your database. Just change the username to the correct one:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;user = User.objects.get(username='john_doe')
user.set_password('another_secure_password')
user.save()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Troubleshooting Tips&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;“User matching query does not exist”&lt;br&gt;
Double-check the username (it is case-sensitive). You can list all users with:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User.objects.all()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="2"&gt;
&lt;li&gt;Using a custom User model&lt;br&gt;
If your project uses a custom user model (e.g., CustomUser), replace User with your model:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
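&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from myapp.models import CustomUser

# If your model logs in by email, look the user up by that field instead
user = CustomUser.objects.get(username='admin')
user.set_password('new_secure_password')
user.save()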
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Conclusion&lt;/p&gt;

&lt;p&gt;With just a few lines of code, you can regain access to your Django admin account in seconds. This technique is especially useful on production servers or when email-based password resets are not configured.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>beginners</category>
      <category>backend</category>
      <category>programming</category>
    </item>
    <item>
      <title>How to Work on a Team with Git &amp; GitHub Without Breaking Everything</title>
      <dc:creator>Vincent Tommi</dc:creator>
      <pubDate>Tue, 09 Dec 2025 18:44:57 +0000</pubDate>
      <link>https://forem.com/vincenttommi/how-to-work-on-a-team-with-git-github-without-breaking-everything-17fo</link>
      <guid>https://forem.com/vincenttommi/how-to-work-on-a-team-with-git-github-without-breaking-everything-17fo</guid>
      <description>&lt;p&gt;The Definitive Guide Every Engineering Team Should Adopt&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stop fighting over Git.&lt;/li&gt;
&lt;li&gt;Stop breaking main.&lt;/li&gt;
&lt;li&gt;Stop losing work.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This workflow is common on high-performing teams at startups and scale-ups: simple enough for juniors, disciplined enough for staff engineers.&lt;/p&gt;

&lt;p&gt;The Core Workflow (6 Commands to Rule Them All)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# 1. Safely stash your in-progress changes
git stash push -m "wip: halfway through payment UI"

# 2. Fetch and integrate the latest main
git pull origin main --rebase    # preferred over merge for clean history

# 3. Re-apply your work on top
git stash pop                    # surfaces conflicts with your local changes early

# 4. Stage changes
git add .

# 5. Write a meaningful commit message
git commit -m "feat: add real-time donation progress bar with percentage"

# 6. Push to remote
git push origin HEAD
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Daily &amp;amp; Hourly Discipline (Do This Religiously)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git stash
git pull origin main --rebase
git stash pop
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Do this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First thing in the morning&lt;/li&gt;
&lt;li&gt;Before starting any new task&lt;/li&gt;
&lt;li&gt;After a teammate announces a hotfix&lt;/li&gt;
&lt;li&gt;Before pushing any commit&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This single habit prevents most merge conflicts, because your branch never drifts far from main.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pro-Level One-Liner (Add to Your Shell)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
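&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# ~/.zshrc or ~/.bash_profile
# Single quotes defer $(date ...) until the alias runs; with double quotes
# the timestamp would be frozen at the moment the shell started.
# Note: in interactive shells this name shadows the system `sync` command.
alias sync='git stash push -m "autosave $(date +%H:%M)" &amp;amp;&amp;amp;
    git pull origin main --rebase &amp;amp;&amp;amp;
    git stash pop'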
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now just run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sync
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Recommended Branching Strategy (Safe + Fast)&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task Type&lt;/th&gt;
&lt;th&gt;Branch Name Example&lt;/th&gt;
&lt;th&gt;Workflow&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Hotfix / Urgent&lt;/td&gt;
&lt;td&gt;&lt;code&gt;hotfix/double-payment&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Branch from main → PR → expedited review → Merge&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Feature (&amp;gt;1 hour)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;feat/donation-progress-bar&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Branch → PR → Review → Merge&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bugfix&lt;/td&gt;
&lt;td&gt;&lt;code&gt;fix/invalid-goal-calculation&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Branch → PR&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Refactor / Chore&lt;/td&gt;
&lt;td&gt;&lt;code&gt;refactor/extract-payment-service&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Branch → PR&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Example: Start a proper feature branch
git checkout main                # branch from main, not from another feature branch
git pull origin main --rebase
git checkout -b feat/share-fundraiser-buttons
# ... work ...
git add .
git commit -m "feat: add social sharing for fundraisers"
git push -u origin feat/share-fundraiser-buttons
# → Open Pull Request on GitHub
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Conventional Commits (Your Team Will Thank You)&lt;/p&gt;

&lt;p&gt;Always use this format. It enables automated changelogs and keeps history readable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;feat:     Add new feature
fix:      Bug fix
docs:     Documentation only changes
style:    Formatting, missing semicolons, etc.
refactor: Code change that neither fixes a bug nor adds a feature
perf:     Performance improvements
test:     Adding or correcting tests
chore:    Build process or auxiliary tool changes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Examples:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git commit -m "feat: add fundraiser short link sharing"
git commit -m "fix: prevent negative donation amounts"
git commit -m "refactor: extract donation validation logic"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
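
&lt;p&gt;A format like this is easy to check automatically. The following is a minimal sketch, assuming the eight types listed above and an optional lowercase &lt;code&gt;(scope)&lt;/code&gt; as in the &lt;code&gt;type(scope): description&lt;/code&gt; shape from the cheat sheet. The names &lt;code&gt;COMMIT_TYPES&lt;/code&gt; and &lt;code&gt;is_conventional&lt;/code&gt; are made up for this example, and the pattern is deliberately stricter and simpler than the full Conventional Commits specification.&lt;/p&gt;

```python
import re

# Types taken from the list in this article.
COMMIT_TYPES = ("feat", "fix", "docs", "style", "refactor", "perf", "test", "chore")

# type, optional "(scope)", then ": " and a non-empty description.
COMMIT_PATTERN = re.compile(
    r"^(" + "|".join(COMMIT_TYPES) + r")(\([a-z0-9_-]+\))?: \S.*$"
)


def is_conventional(message):
    """Return True if the first line of a commit message follows the format."""
    first_line = message.split("\n", 1)[0]
    return COMMIT_PATTERN.match(first_line) is not None
```

&lt;p&gt;A check like this could run from a commit-msg hook or in CI, so malformed messages are caught before review rather than during it.&lt;/p&gt;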



&lt;p&gt;Conflict Resolution (When stash pop Fails)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git stash pop
# → Conflict in payments/views.py

# Fix the &amp;lt;&amp;lt;&amp;lt; === &amp;gt;&amp;gt;&amp;gt; markers manually
# Then mark the file as resolved (no commit is required at this point):
git add payments/views.py
# A pop that hits conflicts keeps the stash entry, so drop it once your changes are back:
git stash drop
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Golden Rules Every Team Member Must Follow&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Never commit directly to main (except hotfixes with approval).&lt;/li&gt;
&lt;li&gt;Always sync before starting work.&lt;/li&gt;
&lt;li&gt;Always write descriptive commit messages.&lt;/li&gt;
&lt;li&gt;Always push feature branches and open PRs.&lt;/li&gt;
&lt;li&gt;Never force push main (or any shared branch).&lt;/li&gt;
&lt;li&gt;Rebase locally, merge on GitHub via PR (keeps history clean).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Quick Reference Cheat Sheet (Pin This)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Stay in sync (run often)
sync                    # your alias
# or manually:
git stash &amp;amp;&amp;amp; git pull --rebase origin main &amp;amp;&amp;amp; git stash pop

# Ship completed work
git add .
git commit -m "type(scope): description"
git push

# Start new work safely
git checkout main
git pull --rebase origin main
git checkout -b feat/your-feature-name

# Emergency hotfix
git checkout main
git pull --rebase origin main
git checkout -b hotfix/critical-bug
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Final Words&lt;/p&gt;

&lt;p&gt;
I’ve been on teams that lost entire days to merge conflicts.&lt;br&gt;
I’ve been on teams that deployed 20 times per day with zero drama.&lt;br&gt;
The difference was always this: discipline around syncing and branching.&lt;br&gt;
Adopt this workflow today.&lt;br&gt;
Enforce it in code reviews.&lt;br&gt;
Put it in your onboarding docs.&lt;br&gt;
Your future self — and every teammate who’s ever screamed at Git — will thank you.&lt;br&gt;
Now go forth and collaborate like professionals.&lt;br&gt;
Saved you from at least 47 rage-quits in 2025.&lt;br&gt;
You’re welcome. &lt;/p&gt;

</description>
      <category>webdev</category>
      <category>backend</category>
      <category>ai</category>
      <category>writing</category>
    </item>
    <item>
      <title>How to Build a Powerful &amp; Beginner-Friendly Django Admin</title>
      <dc:creator>Vincent Tommi</dc:creator>
      <pubDate>Tue, 25 Nov 2025 16:05:33 +0000</pubDate>
      <link>https://forem.com/vincenttommi/how-to-build-a-powerful-beginner-friendly-django-admin-4k0b</link>
      <guid>https://forem.com/vincenttommi/how-to-build-a-powerful-beginner-friendly-django-admin-4k0b</guid>
      <description>&lt;p&gt;A Step-by-Step Tutorial Using a Real-World Fundraising Platform&lt;br&gt;
Perfect for intermediate Django developers who want to go from “it works” to “this admin is actually amazing”.&lt;br&gt;
We’ll use a real crowdfunding/startup fundraising app (with individuals, NGOs, and startups) to teach you every important Django admin feature — with copy-paste code and clear explanations.&lt;br&gt;
By the end of this tutorial, you’ll know how to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Show custom model properties in the list view&lt;/li&gt;
&lt;li&gt;Add filters, search, and bulk edits&lt;/li&gt;
&lt;li&gt;Use inlines to edit related models on the same page&lt;/li&gt;
&lt;li&gt;Make the change form beautiful with fieldsets and collapsible sections&lt;/li&gt;
&lt;li&gt;Add custom columns with links and formatted money&lt;/li&gt;
&lt;li&gt;Write safe, performant querysets&lt;/li&gt;
&lt;li&gt;Add helpful readonly fields&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s build it together!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: The Models&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# models.py (simplified)
from django.contrib.auth.models import User  # or your custom user model
from django.db import models
from django.db.models import Sum


class Fundraiser(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE, null=True)
    fundraising_category = models.ForeignKey(FundraisingCategory, ...)
    short_code = models.CharField(max_length=10, unique=True)
    is_approved = models.BooleanField(default=False)
    is_private = models.BooleanField(default=False)
    status = models.CharField(max_length=20, choices=STATUS_CHOICES, default='active')
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)  # referenced by readonly_fields in admin.py

    @property
    def raised_amount(self):
        return self.paystack_transactions.filter(status='success').aggregate(
            total=Sum('amount')
        )['total'] or 0

class FundraiserImage(models.Model):
    fundraiser = models.ForeignKey(Fundraiser, on_delete=models.CASCADE, related_name='images')
    image = models.ImageField(...)
    is_primary = models.BooleanField(default=False)

class IndividualDetail(models.Model):  # OneToOne with Fundraiser
    fundraiser = models.OneToOneField(Fundraiser, related_name='individual_detail', ...)
    fundraiser_title = models.CharField(...)
    fundraiser_goal = models.DecimalField(...)

# + OrganisationDetail and StartupDetail (also OneToOne)


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 2: The Most Important Admin, FundraiserAdmin&lt;/strong&gt;&lt;br&gt;
This will be your main dashboard.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# admin.py
from django.contrib import admin
from django.urls import reverse
from django.utils.html import format_html
from .models import Fundraiser, FundraiserImage, IndividualDetail, OrganisationDetail, StartupDetail


class FundraiserImageInline(admin.TabularInline):   # Step A: Inline images
    model = FundraiserImage
    extra = 1
    fields = ('name', 'image', 'is_primary', 'file_size')  # readonly fields must be listed here to be shown
    readonly_fields = ('file_size',)


@admin.register(Fundraiser)
class FundraiserAdmin(admin.ModelAdmin):
    # 1. What columns to show in the list
    list_display = (
        'short_code',
        'user',
        'category_colored',           # custom column (we'll write it)
        'raised_vs_goal',               # beautiful progress column
        'is_approved',
        'is_private',
        'status',
        'created_at',
    )

    # 2. Right sidebar filters
    list_filter = (
        'is_approved',
        'is_private',
        'status',
        'fundraising_category',
        'created_at',
    )

    # 3. Search box
    search_fields = ('short_code', 'user__email', 'user__username')

    # 4. Edit these fields directly in the list (tick the checkbox, then click Save)
    list_editable = ('is_approved', 'is_private', 'status')

    # 5. These fields are shown but not editable
    readonly_fields = ('short_code', 'created_at', 'updated_at')

    # 6. Inline images right on the same page
    inlines = [FundraiserImageInline]

    # 7. Default sorting
    ordering = ('-created_at',)

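    # 8. Performance: load related rows up front to cut N+1 queries.
    # Note: raised_amount runs a filtered aggregate, which bypasses the
    # prefetch cache, so it still costs one query per row.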
    def get_queryset(self, request):
        return super().get_queryset(request).select_related(
            'user', 'fundraising_category'
        ).prefetch_related('paystack_transactions')

    # 9. Custom column: show category with color
    def category_colored(self, obj):
        category = obj.fundraising_category
        color = {
            'Medical': 'crimson',
            'Education': 'royalblue',
            'Startup': 'green',
        }.get(category.name if category else '', 'gray')
        return format_html(
            '&amp;lt;span style="color: white; background:{}; padding: 2px 8px; border-radius: 4px;"&amp;gt;{}&amp;lt;/span&amp;gt;',
            color, category or "—"
        )
    category_colored.short_description = "Category"

    # 10. Custom column: $12,450 → $50,000 (25%)
    def raised_vs_goal(self, obj):
        goal = None
        if hasattr(obj, 'individual_detail'):
            goal = obj.individual_detail.fundraiser_goal
        elif hasattr(obj, 'organisation_details'):
            goal = obj.organisation_details.fundraiser_goal
        elif hasattr(obj, 'startup_detail'):
            goal = obj.startup_detail.fundraiser_goal

        if not goal:
            return "—"

        raised = obj.raised_amount
        percentage = raised / goal * 100 if goal &amp;gt; 0 else 0

        # format_html escapes its arguments to strings first, so numeric
        # format specs such as {:,.0f} must be applied before the call.
        return format_html(
            '&amp;lt;b&amp;gt;${}&amp;lt;/b&amp;gt; → ${} &amp;lt;small&amp;gt;({}%)&amp;lt;/small&amp;gt;',
            f'{raised:,.0f}', f'{goal:,.0f}', f'{percentage:.0f}'
        )
    raised_vs_goal.short_description = "Raised / Goal"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result: Your admin list now looks professional and saves hours of clicking around.&lt;/p&gt;
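
&lt;p&gt;The arithmetic and number formatting behind a column like &lt;code&gt;raised_vs_goal&lt;/code&gt; can be checked in plain Python, outside Django. This is a minimal sketch: the function name &lt;code&gt;format_progress&lt;/code&gt; and the "-" placeholder are assumptions for illustration, and it returns plain text where the admin version wraps the same numbers in HTML.&lt;/p&gt;

```python
from decimal import Decimal


def format_progress(raised, goal):
    """Return text such as '$12,450 → $50,000 (25%)', or '-' when no goal is set."""
    if not goal:
        return "-"
    percentage = raised / goal * 100
    # Format the numbers first, then place the finished strings in the template.
    return f"${raised:,.0f} → ${goal:,.0f} ({percentage:.0f}%)"


if __name__ == "__main__":
    print(format_progress(Decimal("12450"), Decimal("50000")))
```

&lt;p&gt;Keeping this logic in a small helper makes it straightforward to unit-test the percentages without loading the admin.&lt;/p&gt;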

&lt;p&gt;&lt;strong&gt;Step 3: Make Individual/Organisation/Startup Pages Beautiful&lt;br&gt;
Example: StartupDetailAdmin&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@admin.register(StartupDetail)
class StartupDetailAdmin(admin.ModelAdmin):
    list_display = ('startup_name', 'fundraiser_link', 'industry', 'stage', 'fundraiser_goal')
    list_filter = ('industry', 'stage', 'team_size')
    search_fields = ('startup_name', 'fundraiser__short_code')
    list_select_related = ('fundraiser',)  # fundraiser_link reads obj.fundraiser on every row

    # Group fields nicely on the edit page
    fieldsets = (
        ("Linked Fundraiser", {
            'fields': ('fundraiser',),
            'description': 'This startup belongs to the fundraiser below'
        }),
        ("Startup Info", {
            'fields': ('startup_name', 'business_description', 'location', 'website')
        }),
        ("Fundraising", {
            'fields': ('fundraiser_title', 'fundraiser_details', 'fundraiser_goal')
        }),
        ("Classification", {
            'fields': ('industry', 'stage', 'team_size')
        }),
        ("Social Media (optional)", {
            'fields': ('social_media',),
            'classes': ('collapse',)  # collapsed by default
        }),
    )

    readonly_fields = ('created_at', 'updated_at')

    # Nice clickable link back to the main fundraiser
    # ('yourapp' is a placeholder: use the app label of the app that defines Fundraiser)
    def fundraiser_link(self, obj):
        url = reverse('admin:yourapp_fundraiser_change', args=[obj.fundraiser.id])
        return format_html('&amp;lt;a href="{}"&amp;gt;{} → View Fundraiser&amp;lt;/a&amp;gt;', url, obj.fundraiser.short_code)
    fundraiser_link.short_description = "Fundraiser"

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Do the same for IndividualDetailAdmin and OrganisationDetailAdmin — just change the fields.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4: Bonus — Useful Tricks Every Django Developer Should Know&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# 1. Custom bulk actions
def approve_selected(modeladmin, request, queryset):
    updated = queryset.update(is_approved=True)
    modeladmin.message_user(request, f"{updated} fundraisers approved!")
approve_selected.short_description = "Approve selected fundraisers"

# Pass the function object itself: a string name only resolves for methods
# defined on the admin class (or for site-wide actions).
FundraiserAdmin.actions = [approve_selected]

# 2. Show image preview in list
# Define this as a method on the ModelAdmin of a model with an image field,
# then add 'admin_image_preview' to its list_display.
def admin_image_preview(self, obj):
    if obj.image:
        return format_html('&amp;lt;img src="{}" width="80" height="50" style="object-fit: cover;"/&amp;gt;', obj.image.url)
    return "(No image)"
admin_image_preview.short_description = "Preview"

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Final Result&lt;br&gt;
You now have an admin that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Non-technical staff love using&lt;/li&gt;
&lt;li&gt;Shows real-time money raised&lt;/li&gt;
&lt;li&gt;Lets you approve 50 campaigns in 10 seconds&lt;/li&gt;
&lt;li&gt;Handles three different content types without confusion&lt;/li&gt;
&lt;li&gt;Looks clean and professional&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Copy the full final admin.py below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Full final admin.py (ready to copy-paste)
from django.contrib import admin
from django.db.models import Sum
from django.urls import reverse
from django.utils.html import format_html
from .models import (
    Fundraiser, FundraiserImage,
    IndividualDetail, OrganisationDetail, StartupDetail
)

class FundraiserImageInline(admin.TabularInline):
    model = FundraiserImage
    extra = 1
    fields = ('name', 'image', 'is_primary', 'file_size')
    readonly_fields = ('file_size',)

@admin.register(Fundraiser)
class FundraiserAdmin(admin.ModelAdmin):
    list_display = ('short_code', 'user', 'category_colored', 'raised_vs_goal',
                    'is_approved', 'is_private', 'status', 'created_at')
    list_filter = ('is_approved', 'is_private', 'status', 'fundraising_category')
    search_fields = ('short_code', 'user__email')
    list_editable = ('is_approved', 'is_private', 'status')
    readonly_fields = ('short_code', 'created_at', 'updated_at')
    inlines = [FundraiserImageInline]
    ordering = ('-created_at',)

    def get_queryset(self, request):
        return super().get_queryset(request).select_related(
            'user', 'fundraising_category'
        ).prefetch_related('paystack_transactions')

    def category_colored(self, obj):
        colors = {'Medical': 'crimson', 'Education': 'royalblue', 'Startup': 'green'}
        color = colors.get(obj.fundraising_category.name if obj.fundraising_category else '', 'gray')
        return format_html(
            '&amp;lt;span style="color:white; background:{}; padding:3px 8px; border-radius:4px;"&amp;gt;{}&amp;lt;/span&amp;gt;',
            color, obj.fundraising_category or "—"
        )
    category_colored.short_description = "Category"

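    def raised_vs_goal(self, obj):
        goal = None
        if hasattr(obj, 'individual_detail'):
            goal = obj.individual_detail.fundraiser_goal
        elif hasattr(obj, 'organisation_details'):
            goal = obj.organisation_details.fundraiser_goal
        elif hasattr(obj, 'startup_detail'):
            goal = obj.startup_detail.fundraiser_goal
        if not goal:
            return "—"
        raised = obj.raised_amount
        percentage = raised / goal * 100 if goal &amp;gt; 0 else 0
        # format_html escapes its arguments to strings, so format the numbers first
        return format_html(
            '&amp;lt;b&amp;gt;${}&amp;lt;/b&amp;gt; → ${} &amp;lt;small&amp;gt;({}%)&amp;lt;/small&amp;gt;',
            f'{raised:,.0f}', f'{goal:,.0f}', f'{percentage:.0f}'
        )
    raised_vs_goal.short_description = "Progress"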

    def approve_selected(self, request, queryset):
        updated = queryset.update(is_approved=True)
        self.message_user(request, f"{updated} fundraisers approved.")
    approve_selected.short_description = "Approve selected"
    actions = ['approve_selected']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>webdev</category>
      <category>programming</category>
      <category>python</category>
      <category>performance</category>
    </item>
    <item>
      <title>Long Polling vs WebSockets — How to Achieve Real-Time Communication day 55 of system design</title>
      <dc:creator>Vincent Tommi</dc:creator>
      <pubDate>Mon, 13 Oct 2025 01:56:47 +0000</pubDate>
      <link>https://forem.com/vincenttommi/long-polling-vs-websockets-how-to-achieve-real-time-communication-day-55-of-system-design-1nhl</link>
      <guid>https://forem.com/vincenttommi/long-polling-vs-websockets-how-to-achieve-real-time-communication-day-55-of-system-design-1nhl</guid>
      <description>&lt;p&gt;Learn the key differences between Long Polling and WebSockets, how they work, and when to use each for real-time applications — with Python examples&lt;/p&gt;

&lt;p&gt;Whether you’re playing an online game or chatting with a friend, updates appear in &lt;em&gt;real time&lt;/em&gt; without you ever hitting “refresh.”&lt;/p&gt;

&lt;p&gt;Behind these seamless experiences lies a crucial engineering decision: &lt;strong&gt;how to push real-time updates from servers to clients&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The traditional HTTP model was built around request–response:  &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Client asks, server answers.”  &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But in real-time systems, the &lt;strong&gt;server needs to talk first — and more often&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This is where &lt;strong&gt;Long Polling&lt;/strong&gt; and &lt;strong&gt;WebSockets&lt;/strong&gt; come in — two popular methods to achieve real-time communication on the web.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧠 1. Why Traditional HTTP Isn’t Enough
&lt;/h2&gt;

&lt;p&gt;HTTP follows a &lt;strong&gt;client-driven request–response&lt;/strong&gt; model:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The client (browser/app) sends a request to the server.
&lt;/li&gt;
&lt;li&gt;The server processes the request and responds.
&lt;/li&gt;
&lt;li&gt;The connection closes.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This works fine for static or on-demand content, but for live data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;❌ The server can’t &lt;em&gt;push&lt;/em&gt; updates to the client.&lt;/li&gt;
&lt;li&gt;❌ HTTP is &lt;em&gt;stateless&lt;/em&gt; and each exchange ends with the response, so the server has no open channel for sending data the client did not ask for.&lt;/li&gt;
&lt;li&gt;❌ You’d need constant polling to get new data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To build &lt;strong&gt;truly real-time experiences&lt;/strong&gt; — like live chat, multiplayer games, or financial tickers — we need a way for the server to instantly notify clients of updates.&lt;/p&gt;




&lt;h2&gt;
  
  
  ⏳ 2. Long Polling
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Long Polling&lt;/strong&gt; is a clever hack that &lt;em&gt;simulates&lt;/em&gt; real-time communication over standard HTTP.&lt;/p&gt;

&lt;p&gt;Instead of sending requests every second (like regular polling), the client &lt;strong&gt;sends a request and waits&lt;/strong&gt; — keeping the connection open until the server has something new to send.&lt;/p&gt;

&lt;h3&gt;
  
  
  ⚙️ How It Works
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Client sends a request and waits for new data.
&lt;/li&gt;
&lt;li&gt;The server &lt;strong&gt;holds the connection open&lt;/strong&gt; until it has data or a timeout occurs.
&lt;/li&gt;
&lt;li&gt;If new data arrives → server responds immediately.
&lt;/li&gt;
&lt;li&gt;If timeout occurs → server sends a minimal response.
&lt;/li&gt;
&lt;li&gt;The client &lt;strong&gt;immediately reopens&lt;/strong&gt; a new connection.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This creates a near-continuous loop that feels real-time.&lt;/p&gt;
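
&lt;p&gt;The server side of the steps above can be sketched with nothing but the standard library. This is an illustration under stated assumptions, not a production handler: &lt;code&gt;handle_poll&lt;/code&gt; and &lt;code&gt;publish_later&lt;/code&gt; are hypothetical names, a &lt;code&gt;queue.Queue&lt;/code&gt; stands in for whatever produces updates, and returning status 204 on timeout is just one reasonable choice of "minimal response".&lt;/p&gt;

```python
import queue
import threading


def handle_poll(updates, hold_seconds=30.0):
    """Hold one long-poll request open until data arrives or the hold expires."""
    try:
        # Block here, the way a server keeps the HTTP connection open.
        data = updates.get(timeout=hold_seconds)
    except queue.Empty:
        # Timeout: send a minimal response so the client reconnects.
        return 204, None
    # New data arrived while waiting: respond immediately.
    return 200, data


def publish_later(updates, item, delay_seconds):
    """Simulate an event arriving while a request is being held."""
    timer = threading.Timer(delay_seconds, updates.put, args=(item,))
    timer.daemon = True
    timer.start()
    return timer
```

&lt;p&gt;Each held request ties up a worker for the full hold time, which is the server overhead that makes long polling costly with many clients.&lt;/p&gt;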

&lt;h3&gt;
  
  
  ✅ Pros
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Simple to implement (standard HTTP).
&lt;/li&gt;
&lt;li&gt;Works everywhere — across proxies, firewalls, and browsers.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  ❌ Cons
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Slight latency after each update (client must reconnect).
&lt;/li&gt;
&lt;li&gt;Server overhead (many open “hanging” connections).
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  💡 Use Cases
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Simple chat apps or comment feeds.
&lt;/li&gt;
&lt;li&gt;Notification systems (e.g., “new email” alerts).
&lt;/li&gt;
&lt;li&gt;Legacy systems that can’t use WebSockets.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  💻 Example (Python)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import requests
import time

def long_poll():
    while True:
        try:
            # The server holds this request open until it has data or times out.
            response = requests.get("http://localhost:5000/updates", timeout=60)
            if response.status_code == 200 and response.text.strip():
                print("New data:", response.json())
            else:
                print("No new data, reconnecting...")
        except requests.exceptions.Timeout:
            print("Timeout reached, reconnecting...")
        except Exception as e:
            # Back off briefly so a dead server is not hammered in a tight loop.
            print("Error:", e)
            time.sleep(5)
        # The while loop immediately issues the next long-poll request.

if __name__ == "__main__":
    long_poll()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  3. WebSockets
&lt;/h2&gt;

&lt;p&gt;WebSockets provide a persistent, full-duplex connection between the client and server — meaning both can send messages to each other at any time.&lt;/p&gt;

&lt;p&gt;This removes the overhead of repeatedly opening and closing HTTP connections.&lt;/p&gt;

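&lt;h3&gt;
  How It Works
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Handshake:&lt;/strong&gt; The client sends an HTTP request with an &lt;code&gt;Upgrade: websocket&lt;/code&gt; header.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Connection upgrade:&lt;/strong&gt; The server switches from HTTP to the WebSocket protocol (&lt;code&gt;ws://&lt;/code&gt; or &lt;code&gt;wss://&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Persistent channel:&lt;/strong&gt; Both client and server can now exchange messages freely until the connection closes.&lt;/li&gt;
&lt;/ol&gt;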
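
&lt;p&gt;As a rough illustration of the handshake step: under RFC 6455 the client sends a random &lt;code&gt;Sec-WebSocket-Key&lt;/code&gt; header, and the server proves it understood the upgrade by hashing that key with a fixed GUID and returning the result as &lt;code&gt;Sec-WebSocket-Accept&lt;/code&gt;. The function name &lt;code&gt;accept_key&lt;/code&gt; is made up for this sketch; libraries such as &lt;code&gt;websockets&lt;/code&gt; do this for you.&lt;/p&gt;

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(sec_websocket_key):
    """Compute the Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Sample key taken from the RFC 6455 handshake example.
    print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))
```

&lt;p&gt;Once the client has checked this value, both sides stop speaking HTTP on the connection and switch to WebSocket frames.&lt;/p&gt;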

&lt;h3&gt;
  ✅ Pros
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Extremely low latency.&lt;/li&gt;
&lt;li&gt;Less network overhead (single persistent connection).&lt;/li&gt;
&lt;li&gt;Scales well for frequent or high-volume updates.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  ❌ Cons
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Slightly more complex setup (client + server must support it).&lt;/li&gt;
&lt;li&gt;Some firewalls/proxies may block WebSocket traffic.&lt;/li&gt;
&lt;li&gt;Managing reconnections adds implementation complexity.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  💡 Use Cases
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Real-time chat and collaboration tools (Slack, Google Docs).&lt;/li&gt;
&lt;li&gt;Multiplayer online games.&lt;/li&gt;
&lt;li&gt;Live dashboards (sports, finance, IoT).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  💻 Example (Python)
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import asyncio
import json

import websockets

async def connect():
    uri = "ws://localhost:6789"
    # Reconnect in a loop instead of recursing, so the call stack stays flat.
    while True:
        try:
            async with websockets.connect(uri) as websocket:
                await websocket.send(json.dumps({"message": "Hello Server!"}))
                print("Connected to server and sent greeting.")

                # Iteration ends when the server closes the connection.
                async for message in websocket:
                    data = json.loads(message)
                    print("Received:", data)
        except (websockets.ConnectionClosed, OSError):
            print("Connection closed. Reconnecting...")
        await asyncio.sleep(2)

if __name__ == "__main__":
    asyncio.run(connect())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  4. Choosing the Right Approach
&lt;/h2&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Factor&lt;/th&gt;
&lt;th&gt;Long Polling&lt;/th&gt;
&lt;th&gt;WebSockets&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Implementation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Simple (HTTP-based)&lt;/td&gt;
&lt;td&gt;Requires setup&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Performance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Higher latency&lt;/td&gt;
&lt;td&gt;Very low latency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scalability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Limited for many clients&lt;/td&gt;
&lt;td&gt;Scales efficiently&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Compatibility&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Works everywhere&lt;/td&gt;
&lt;td&gt;May need proxy support&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Use Case&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Notifications, light updates&lt;/td&gt;
&lt;td&gt;Real-time apps, games&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;🧩 5. Alternatives Worth Considering&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Server-Sent Events (SSE)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;One-way communication: server → client.&lt;/p&gt;

&lt;p&gt;Lightweight and simple for push notifications or news feeds.&lt;/p&gt;
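
&lt;p&gt;As a rough sketch of how simple SSE is on the wire: the server keeps one HTTP response open with the text/event-stream content type and writes plain-text events to it. The helper below is a hypothetical name used for illustration, and it only formats a single event.&lt;/p&gt;

```python
def format_sse(data, event=None):
    """Encode one message in the text/event-stream wire format."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # Each line of the payload gets its own "data:" field.
    for line in data.splitlines() or [""]:
        lines.append(f"data: {line}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


if __name__ == "__main__":
    print(format_sse("New article published", event="news"), end="")
```

&lt;p&gt;In the browser, the built-in EventSource API reads this stream and reconnects automatically if the connection drops.&lt;/p&gt;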

&lt;ol start="2"&gt;
&lt;li&gt;MQTT&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Publish–subscribe protocol used in IoT.&lt;/p&gt;

&lt;p&gt;Designed for lightweight, device-to-server messaging.&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;Socket.io&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Abstraction layer over WebSockets (and Long Polling fallback).&lt;/p&gt;

&lt;p&gt;Handles reconnections, fallbacks, and cross-browser quirks automatically.&lt;/p&gt;

&lt;p&gt;Final Thoughts&lt;/p&gt;

&lt;p&gt;While both Long Polling and WebSockets achieve “real-time” communication, the right choice depends on your project’s needs:&lt;/p&gt;

&lt;p&gt;Choose Long Polling when simplicity and broad compatibility matter.&lt;/p&gt;

&lt;p&gt;Choose WebSockets when performance, scalability, and bidirectional communication are key.&lt;/p&gt;

&lt;p&gt;Either way, both are essential tools in building modern, dynamic, and interactive web experiences. &lt;/p&gt;

</description>
      <category>programming</category>
      <category>systemdesign</category>
      <category>webdev</category>
      <category>python</category>
    </item>
    <item>
      <title>Concurrency vs Parallelism: Understanding the Difference with Examples day 54 of system design</title>
      <dc:creator>Vincent Tommi</dc:creator>
      <pubDate>Fri, 19 Sep 2025 08:47:42 +0000</pubDate>
      <link>https://forem.com/vincenttommi/concurrency-vs-parallelism-understanding-the-difference-with-examples-day-54-of-system-design-27bh</link>
      <guid>https://forem.com/vincenttommi/concurrency-vs-parallelism-understanding-the-difference-with-examples-day-54-of-system-design-27bh</guid>
      <description>&lt;p&gt;Concurrency and parallelism are two of the most misunderstood concepts in system design.&lt;/p&gt;

&lt;p&gt;While they might sound similar, they refer to fundamentally different approaches to handling tasks.&lt;/p&gt;

&lt;p&gt;Simply put:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Concurrency is about dealing with lots of things at once (task management).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Parallelism is about doing lots of things at once (task execution).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this article, we’ll break down the differences, explore how they work, and walk through real-world applications with examples and code.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What is Concurrency?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Concurrency means an application is making progress on more than one task over the same period of time, even if only one of them is running at any given instant.&lt;/p&gt;

&lt;p&gt;Even though a single CPU core can only execute one task at a time, it achieves concurrency by rapidly switching between tasks (context switching).&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Playing music while writing code.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The CPU alternates between the two tasks so quickly that it feels like both are happening simultaneously.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But remember: this is not parallelism. This is concurrency.&lt;/p&gt;

&lt;p&gt;Real-World Examples&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Web Browsers: Rendering pages, fetching resources, responding to clicks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Web Servers: Handling multiple requests at the same time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Chat Apps: Sending/receiving messages, updating the UI.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Video Games: Rendering, physics, input handling, background music.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Code Example: Concurrency in Python (asyncio)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import asyncio

async def task(name):
    for i in range(1, 4):
        print(f"{name} - Step {i}")
        await asyncio.sleep(0.5)  # simulate I/O work

async def main():
    await asyncio.gather(
        task("Task A"),
        task("Task B"),
        task("Task C"),
    )

asyncio.run(main())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output (interleaved execution):&lt;/p&gt;

&lt;p&gt;Task A - Step 1&lt;br&gt;
Task B - Step 1&lt;br&gt;
Task C - Step 1&lt;br&gt;
Task A - Step 2&lt;br&gt;
Task B - Step 2&lt;br&gt;
Task C - Step 2&lt;br&gt;
...&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;What is Parallelism?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Parallelism means multiple tasks are executed at the exact same time.&lt;/p&gt;

&lt;p&gt;This requires multiple CPU cores or processors. Each task (or subtask) gets its own execution unit.&lt;/p&gt;

&lt;p&gt;Real-World Examples&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Machine Learning Training: Distribute dataset batches across GPUs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Video Rendering: Multiple frames processed simultaneously.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Web Crawlers: Fetch URLs in parallel.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Big Data: Distribute jobs across a cluster.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Scientific Simulations: Weather modeling, physics simulations.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Code Example: Parallelism in Python (multiprocessing)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from multiprocessing import Pool
import time

def work(n):
    print(f"Processing {n}")
    time.sleep(1)  # stand-in for real work (sleep itself uses almost no CPU)
    return n * n

if __name__ == "__main__":
    numbers = [1, 2, 3, 4]

    with Pool(processes=4) as pool:  # 4 worker processes, one per task
        results = pool.map(work, numbers)

    print("Results:", results)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output (the four calls run in parallel, so the order of the "Processing" lines may vary):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Processing 1
Processing 2
Processing 3
Processing 4
Results: [1, 4, 9, 16]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, each task runs in its own worker process, which the operating system can schedule on separate CPU cores at the same time.&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;Concurrency vs Parallelism: Putting It All Together&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Concurrent, Not Parallel: Single-core CPU rapidly switching tasks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Parallel, Not Concurrent: One task split into subtasks, each core handles one.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Neither: Sequential execution, one task at a time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Both: Multi-core CPU handling multiple concurrent tasks, each split into parallel subtasks.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
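
&lt;p&gt;The "Neither" and "Concurrent, Not Parallel" cases in the list above can be compared on a single thread. This is a small sketch with made-up delays standing in for I/O waits, not a benchmark: the sequential version waits for each task in turn, while the concurrent version lets the waits overlap.&lt;/p&gt;

```python
import asyncio
import time


async def fetch(delay):
    await asyncio.sleep(delay)  # stand-in for waiting on I/O
    return delay


async def run_sequentially(delays):
    start = time.perf_counter()
    for delay in delays:
        await fetch(delay)  # one task at a time
    return time.perf_counter() - start


async def run_concurrently(delays):
    start = time.perf_counter()
    await asyncio.gather(*(fetch(d) for d in delays))  # waits overlap
    return time.perf_counter() - start


if __name__ == "__main__":
    delays = [0.2, 0.2, 0.2]
    print(f"sequential: {asyncio.run(run_sequentially(delays)):.2f}s")
    print(f"concurrent: {asyncio.run(run_concurrently(delays)):.2f}s")
```

&lt;p&gt;Both versions run on one thread, so neither is parallel. The concurrent one finishes sooner only because the tasks spend their time waiting rather than computing.&lt;/p&gt;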

&lt;p&gt;Final Thoughts&lt;/p&gt;

&lt;p&gt;Concurrency = task management (making progress on many things).&lt;/p&gt;

&lt;p&gt;Parallelism = task execution (doing many things simultaneously).&lt;/p&gt;

&lt;p&gt;Most modern systems use both together for efficiency.&lt;/p&gt;

&lt;p&gt;Understanding these concepts helps you design scalable, efficient software — whether you're writing backend servers, training ML models, or building real-time apps.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>webdev</category>
      <category>python</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Vertical vs Horizontal Scaling: Choosing the Right Strategy for Your Application day 53 of system design</title>
      <dc:creator>Vincent Tommi</dc:creator>
      <pubDate>Thu, 18 Sep 2025 09:55:49 +0000</pubDate>
      <link>https://forem.com/vincenttommi/vertical-vs-horizontal-scaling-choosing-the-right-strategy-for-your-application-378a</link>
      <guid>https://forem.com/vincenttommi/vertical-vs-horizontal-scaling-choosing-the-right-strategy-for-your-application-378a</guid>
      <description>&lt;p&gt;As your application grows, it requires more resources to handle the increasing demand. To meet this challenge, two common strategies emerge: &lt;strong&gt;vertical scaling&lt;/strong&gt; (scaling up) and &lt;strong&gt;horizontal scaling&lt;/strong&gt; (scaling out).&lt;/p&gt;

&lt;p&gt;In this article, we’ll explore the pros and cons of both approaches and help you understand when to use one over the other.&lt;/p&gt;




&lt;h2&gt;
  
  
  Vertical Scaling (Scaling Up)
&lt;/h2&gt;

&lt;p&gt;Vertical scaling involves upgrading the resources of a single machine within your system. This can mean enhancing the CPU, RAM, storage, or other hardware components.&lt;/p&gt;

&lt;p&gt;Examples include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Upgrading CPU&lt;/strong&gt;: Replacing your server’s processor with a more powerful one.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Increasing RAM&lt;/strong&gt;: Adding more memory to process larger datasets efficiently.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enhancing Storage&lt;/strong&gt;: Using faster SSDs or increasing total storage capacity.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  ✅ Pros
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Simplicity&lt;/strong&gt;: Easy to implement with minimal architectural changes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Low latency&lt;/strong&gt;: No inter-server communication needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reduced software costs&lt;/strong&gt;: Often cheaper initially compared to scaling out.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No major code changes&lt;/strong&gt;: Works well without modifying your application significantly.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  ❌ Cons
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Limited scalability&lt;/strong&gt;: There’s only so much you can upgrade a single machine.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single point of failure&lt;/strong&gt;: If that server fails, the entire system can go down.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Downtime&lt;/strong&gt;: Hardware upgrades often require taking the system offline.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High costs in the long run&lt;/strong&gt;: High-end servers become expensive quickly.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Horizontal Scaling (Scaling Out)
&lt;/h2&gt;

&lt;p&gt;Horizontal scaling involves adding more servers or nodes to the system and distributing the workload across them, often with a load balancer.&lt;/p&gt;
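
&lt;p&gt;As a minimal illustration of distributing the workload, the sketch below implements round-robin selection, one of the simplest load-balancing strategies. The class and the node names are hypothetical; a real load balancer would also track node health and current load.&lt;/p&gt;

```python
import itertools


class RoundRobinBalancer:
    """Hand out servers in rotation so each gets an equal share of requests."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = itertools.cycle(list(servers))

    def next_server(self):
        return next(self._servers)


if __name__ == "__main__":
    # Hypothetical node names, for illustration only.
    balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
    for request_id in range(5):
        print(f"request {request_id} goes to {balancer.next_server()}")
```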

&lt;h3&gt;
  
  
  ✅ Pros
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Near-limitless scalability&lt;/strong&gt;: Add as many nodes as needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Improved fault tolerance&lt;/strong&gt;: Failure of one node doesn’t crash the whole system.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost-effective hardware&lt;/strong&gt;: Uses multiple commodity servers instead of one expensive machine.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  ❌ Cons
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Complexity&lt;/strong&gt;: Requires careful handling of data consistency, load balancing, and networking.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Increased latency&lt;/strong&gt;: Communication between nodes introduces overhead.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Higher initial setup costs&lt;/strong&gt;: Infrastructure is more complex to maintain.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Application compatibility&lt;/strong&gt;: Some apps need code adjustments to run on distributed systems.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  When to Choose Vertical vs Horizontal Scaling
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Choose &lt;strong&gt;Vertical Scaling&lt;/strong&gt; when:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Your app has &lt;strong&gt;limited scalability needs&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;You’re working with &lt;strong&gt;legacy applications&lt;/strong&gt; that are hard to distribute.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Low latency&lt;/strong&gt; is critical.&lt;/li&gt;
&lt;li&gt;You’re on a &lt;strong&gt;cost-sensitive project&lt;/strong&gt; with minimal infrastructure budget.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Choose &lt;strong&gt;Horizontal Scaling&lt;/strong&gt; when:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;You anticipate &lt;strong&gt;rapid growth&lt;/strong&gt; in traffic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High availability&lt;/strong&gt; is required.&lt;/li&gt;
&lt;li&gt;Your app can be &lt;strong&gt;easily distributed&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;You’re using a &lt;strong&gt;microservices architecture&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Cost-effectiveness with &lt;strong&gt;commodity hardware&lt;/strong&gt; is a priority.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Combining Vertical and Horizontal Scaling
&lt;/h2&gt;

&lt;p&gt;In many cases, the best solution is a &lt;strong&gt;hybrid approach&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Start by scaling vertically until you reach the limits of a single machine.&lt;/li&gt;
&lt;li&gt;Transition to horizontal scaling as demand grows further.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Vertically scaled clusters&lt;/strong&gt;: Each node is powerful, but the cluster scales horizontally.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database sharding&lt;/strong&gt;: Data is spread across multiple servers (horizontal), with each server scaled vertically for performance.&lt;/li&gt;
&lt;/ul&gt;
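
&lt;p&gt;To show how sharding decides where a record lives, here is a small sketch of hash-based shard selection. The function name and the user IDs are assumptions made for illustration; production systems often use consistent hashing or a lookup table instead.&lt;/p&gt;

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard index in the range 0..shard_count-1."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer, then wrap into the shard range.
    return int.from_bytes(digest[:8], "big") % shard_count


if __name__ == "__main__":
    for user_id in ["user-1", "user-2", "user-3"]:
        print(user_id, "lives on shard", shard_for(user_id, 4))
```

&lt;p&gt;One trade-off of the plain modulo approach is that changing the number of shards remaps most keys, which is why it suits clusters whose size changes rarely.&lt;/p&gt;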




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;The choice between vertical and horizontal scaling depends on your &lt;strong&gt;application’s needs, growth expectations, budget, and uptime requirements&lt;/strong&gt;. Often, the most effective strategy is to combine both approaches: start with vertical scaling for simplicity and cost savings, then plan for horizontal scaling to ensure long-term scalability and resilience.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>webdev</category>
      <category>softwaredevelopment</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Mastering Microservices: Lessons from Netflix’s Journey on AWS</title>
      <dc:creator>Vincent Tommi</dc:creator>
      <pubDate>Wed, 17 Sep 2025 06:22:08 +0000</pubDate>
      <link>https://forem.com/vincenttommi/mastering-microservices-lessons-from-netflixs-journey-on-aws-3b7n</link>
      <guid>https://forem.com/vincenttommi/mastering-microservices-lessons-from-netflixs-journey-on-aws-3b7n</guid>
      <description>&lt;p&gt;Netflix, a global streaming giant, runs its infrastructure on AWS, transitioning from a monolithic architecture to a microservices architecture to address scalability and reliability challenges. This article explores why Netflix adopted microservices, the benefits and challenges of this approach, and practical solutions drawn from their experience. We'll also cover best practices to help you navigate the complexities of microservices architecture.&lt;/p&gt;

&lt;p&gt;Why Netflix Moved to Microservices&lt;/p&gt;

&lt;p&gt;Netflix initially relied on a monolithic architecture, but as their platform grew, they faced significant challenges:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Debugging Difficulties: Frequent changes to a single codebase made it hard to pinpoint bugs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Vertical Scaling Limits: Scaling the monolith vertically (adding more resources to a single server) became inefficient.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Single Points of Failure: The monolith introduced risks where a single failure could bring down the entire system.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By adopting microservices, Netflix achieved greater scalability, flexibility, and resilience, but this transition introduced new challenges that required innovative solutions.&lt;/p&gt;

&lt;p&gt;Benefits of Microservices&lt;/p&gt;

&lt;p&gt;Microservices offer several advantages over monolithic architectures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Independent Scaling: Each service can scale independently based on demand.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Faster Development: Teams can work on different services simultaneously, speeding up deployment.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Improved Fault Isolation: A failure in one service doesn’t necessarily affect others.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, these benefits come with trade-offs, particularly in three key areas: dependency, scale, and variance.&lt;/p&gt;

&lt;p&gt;Challenges and Solutions in Microservices Architecture&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Dependency&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Dependencies between microservices can lead to cascading failures and increased complexity. Here are four scenarios where dependency issues arise, along with Netflix’s solutions:&lt;/p&gt;

&lt;p&gt;i) Inter-Service Requests&lt;/p&gt;

&lt;p&gt;When one service (e.g., Service A) depends on another (e.g., Service B) to fulfill a client request, a failure in Service B can cause a cascading failure.&lt;/p&gt;

&lt;p&gt;Solutions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Circuit Breaker Pattern: Prevents operations likely to fail by halting requests to a failing service.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Fault Injection Testing: Simulates failures to verify circuit breaker functionality.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Fallback to Static Page: Ensures the system remains responsive by serving a static page during failures.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;ii) Client Libraries&lt;/p&gt;

&lt;p&gt;An API gateway centralizes business logic for various clients but can introduce issues like high heap consumption, logical defects, or transitive dependencies.&lt;/p&gt;

&lt;p&gt;Solution: Keep the API gateway simple to prevent it from becoming a new monolith.&lt;/p&gt;

&lt;p&gt;iii) Persistence&lt;/p&gt;

&lt;p&gt;Choosing a storage layer involves trade-offs between availability and consistency, as dictated by the CAP theorem.&lt;/p&gt;

&lt;p&gt;Solution: Analyze data access patterns and select the appropriate storage system (e.g., SQL for consistency, NoSQL for availability).&lt;/p&gt;

&lt;p&gt;Exponential Backoff: Avoids overwhelming services by spacing out retry attempts, preventing the "thundering herd" problem.&lt;/p&gt;
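
&lt;p&gt;A minimal sketch of exponential backoff is shown below. The function name, default values, and the added random jitter are assumptions for illustration rather than Netflix's implementation. The idea is that each failed attempt doubles the maximum wait before the next one.&lt;/p&gt;

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.1,
                       max_delay=5.0, sleep=time.sleep):
    """Call operation(); on failure wait up to base_delay * 2**attempt and retry."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(random.uniform(0, delay))
```

&lt;p&gt;Passing the sleep function as a parameter is a design choice that makes the helper easy to test without real waiting.&lt;/p&gt;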

&lt;p&gt;Challenges:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Degraded Availability: Downtime in individual services compounds overall system downtime.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Increased Test Scope: The number of test permutations grows with more services.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;iv) Infrastructure&lt;/p&gt;

&lt;p&gt;An entire data center failure can disrupt services.&lt;/p&gt;

&lt;p&gt;Solution: Replicate infrastructure across multiple data centers for redundancy.&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Scale&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Scalability is the ability to handle increased workloads while maintaining performance. Netflix addresses scalability in three dimensions: stateless services, stateful services, and hybrid services.&lt;/p&gt;

&lt;p&gt;i) Stateless Services&lt;/p&gt;

&lt;p&gt;Stateless services have no instance affinity (no sticky sessions) and can handle failures without significant impact.&lt;/p&gt;

&lt;p&gt;Solutions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Replication: Deploy multiple instances for high availability.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Autoscaling: Automatically adjust resources based on demand to handle traffic spikes, node failures, or performance bugs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Testing: Use chaos engineering to simulate disruptions and verify autoscaling reliability.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;ol start="3"&gt;
&lt;li&gt;Variance&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Variance refers to the diversity in software architecture, which increases system complexity.&lt;/p&gt;

&lt;p&gt;i) Operational Drift&lt;/p&gt;

&lt;p&gt;Operational drift occurs unintentionally over time due to new features, leading to issues like increased alert thresholds, timeouts, or degraded throughput.&lt;/p&gt;

&lt;p&gt;Solutions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Continuous Learning and Automation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Review incident resolutions to prevent recurrence.&lt;/li&gt;
&lt;li&gt;Analyze incidents for patterns and derive best practices.&lt;/li&gt;
&lt;li&gt;Automate best practices and promote their adoption.&lt;/li&gt;
&lt;/ul&gt;

&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;ii) Polyglot Architecture&lt;/p&gt;

&lt;p&gt;Using different programming languages for microservices (polyglot) introduces complexity, including tooling challenges, operational overhead, and duplicated business logic.&lt;/p&gt;

&lt;p&gt;Solutions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Raise awareness of technology costs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Limit centralized support to critical services.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Prioritize reusable solutions with proven technologies.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Benefit: Polyglot architecture encourages API gateway decomposition, reducing central bottlenecks.&lt;/p&gt;

&lt;p&gt;Netflix’s Microservices Best Practices&lt;/p&gt;

&lt;p&gt;Netflix’s experience offers a checklist of best practices for microservices architecture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Automate Tasks: Reduce manual overhead.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Set Up Alerts: Monitor system health proactively.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Autoscale: Handle dynamic loads efficiently.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Chaos Engineering: Test resilience through controlled disruptions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Consistent Naming Conventions: Simplify service management.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Health Check Services: Monitor service availability.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Blue-Green Deployment: Enable quick rollbacks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Configure Timeouts, Retries, and Fallbacks: Ensure system responsiveness.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
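
&lt;p&gt;The last item in the checklist can be sketched in a few lines. This example assumes an asyncio-based service and uses hypothetical function names: a call that exceeds its timeout is abandoned and a fallback value is returned instead, so the caller stays responsive.&lt;/p&gt;

```python
import asyncio


async def call_with_timeout(make_call, timeout, fallback):
    """Await make_call(); return fallback if it takes longer than timeout."""
    try:
        return await asyncio.wait_for(make_call(), timeout=timeout)
    except asyncio.TimeoutError:
        return fallback


async def slow_recommendations():
    await asyncio.sleep(1)  # hypothetical slow downstream service
    return ["personalised", "titles"]


if __name__ == "__main__":
    result = asyncio.run(
        call_with_timeout(slow_recommendations, timeout=0.1,
                          fallback=["popular", "titles"])
    )
    print(result)
```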

&lt;p&gt;Conclusion&lt;/p&gt;

&lt;p&gt;Change is inevitable in microservices, and failures often accompany changes. Netflix’s approach emphasizes moving quickly while minimizing breaking changes. Restructuring teams to align with the microservices architecture also enhances efficiency.&lt;/p&gt;

&lt;p&gt;By addressing dependency, scale, and variance challenges with proven solutions like circuit breakers, autoscaling, and automation, Netflix has built a robust, scalable system. These lessons can guide any organization transitioning to or optimizing a microservices architecture.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>distributedsystems</category>
      <category>softwaredevelopment</category>
      <category>development</category>
    </item>
    <item>
      <title>Understanding Checksums: Your Data's Digital Fingerprint day 52 of system design</title>
      <dc:creator>Vincent Tommi</dc:creator>
      <pubDate>Wed, 17 Sep 2025 06:02:07 +0000</pubDate>
      <link>https://forem.com/vincenttommi/understanding-checksums-your-datas-digital-fingerprint-day-52-of-system-design-4ij2</link>
      <guid>https://forem.com/vincenttommi/understanding-checksums-your-datas-digital-fingerprint-day-52-of-system-design-4ij2</guid>
      <description>&lt;p&gt;Imagine you're sending an important letter to a friend through the mail. Before sealing the envelope, you take a photo of the letter. When your friend receives it, they take another photo and send it back to you. If the two photos match, you know the letter arrived untampered and intact. If they don't, something went wrong during transit—perhaps the letter was altered or damaged.&lt;/p&gt;

&lt;p&gt;In the digital world, checksums serve a similar purpose. Just as photos verify the integrity of a physical letter, checksums answer the question: Has this data been altered unintentionally or maliciously since it was created, stored, or transmitted? In this article, we'll dive into what checksums are, how they work, their types, and their real-world applications.&lt;/p&gt;

&lt;p&gt;What is a Checksum?&lt;/p&gt;

&lt;p&gt;A checksum is a compact digital fingerprint generated from a piece of data before it's transmitted or stored. When the data reaches its destination, the fingerprint is recalculated and compared to the original. If they match, the data is very likely intact. If not, it’s a sign of corruption or tampering.&lt;/p&gt;

&lt;p&gt;Checksums are created by applying a mathematical operation to the data, such as summing all its bytes or using a cryptographic hash function. This process produces a compact value that represents the data’s integrity.&lt;/p&gt;
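
&lt;p&gt;The "summing all its bytes" approach can be sketched as follows. Keeping only the lowest 8 bits of the sum is an assumption made here to keep the value compact; real protocols differ in the details.&lt;/p&gt;

```python
def additive_checksum(data):
    """Sum every byte and keep the lowest 8 bits of the total."""
    return sum(data) % 256


if __name__ == "__main__":
    original = b"hello"
    corrupted = b"hellp"  # last byte changed in transit
    print(additive_checksum(original), additive_checksum(corrupted))
```

&lt;p&gt;A plain sum is cheap but weak: swapping two bytes leaves the total unchanged, so the reordering goes undetected. That weakness is one reason stronger algorithms exist.&lt;/p&gt;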

&lt;p&gt;How Does a Checksum Work?&lt;/p&gt;

&lt;p&gt;The process of using a checksum for error detection is simple yet powerful:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Calculation: Before sending or storing data, an algorithm processes the data to generate a checksum value.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Transmission/Storage: The checksum is attached to the data and sent over a network or saved in storage.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Verification: Upon receipt or retrieval, the same algorithm recalculates the checksum from the received data and compares it to the original checksum.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Error Detection: If the checksums match, the data is intact. If they differ, the data has been altered or corrupted during transmission or storage.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
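
&lt;p&gt;The four steps above can be sketched with SHA-256 from Python's standard library. The payload and helper names are made up for illustration.&lt;/p&gt;

```python
import hashlib
import hmac


def sha256_hex(data):
    """Calculation step: derive the checksum from the data."""
    return hashlib.sha256(data).hexdigest()


def is_intact(received, expected_checksum):
    """Verification step: recalculate and compare with the original checksum."""
    # compare_digest compares the two strings in constant time.
    return hmac.compare_digest(sha256_hex(received), expected_checksum)


if __name__ == "__main__":
    payload = b"transfer 100 to account 42"
    checksum = sha256_hex(payload)  # sent or stored alongside the data
    print(is_intact(payload, checksum))  # unchanged copy
    print(is_intact(b"transfer 900 to account 42", checksum))  # altered copy
```

&lt;p&gt;Note that a checksum stored next to the data mainly guards against accidental corruption. Someone who can change the data can usually replace the checksum too, which is why tamper detection typically relies on a keyed hash (HMAC) or a digital signature.&lt;/p&gt;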

&lt;p&gt;Types of Checksums&lt;/p&gt;

&lt;p&gt;There are several types of checksums, each suited for different use cases. Here are the most common ones:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Parity Bit: A single bit added to a group of bits so that the total number of 1s is either even (even parity) or odd (odd parity). It’s simple but limited: it detects any odd number of flipped bits, including single-bit errors, but misses errors in which an even number of bits are flipped.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cyclic Redundancy Check (CRC): CRC treats the data as a long binary polynomial and divides it by a predetermined generator polynomial using modulo-2 arithmetic. The remainder becomes the checksum. CRCs are especially good at detecting burst errors caused by noise in transmission channels.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cryptographic Hash Functions: These one-way functions generate a fixed-size hash value from the data. Popular examples include MD5, SHA-1, and SHA-256. They’re widely used for verifying data integrity and authenticity, though MD5 and SHA-1 are no longer collision-resistant and should be avoided where security matters.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
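
&lt;p&gt;The first two types are easy to try out in Python. The parity helper below is written for illustration, while the CRC comes from the standard library's zlib.crc32, which implements the common CRC-32 variant.&lt;/p&gt;

```python
import zlib


def even_parity_bit(data):
    """Return the bit that makes the total number of 1 bits even."""
    ones = sum(bin(byte).count("1") for byte in data)
    return ones % 2


if __name__ == "__main__":
    message = b"123456789"
    print("parity bit:", even_parity_bit(message))
    print("CRC-32:", hex(zlib.crc32(message)))
```

&lt;p&gt;Flipping two bits leaves the parity bit unchanged, so parity misses that error, while CRC-32 is designed to catch such small multi-bit errors.&lt;/p&gt;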

&lt;p&gt;Why Checksums Matter&lt;/p&gt;

&lt;p&gt;Checksums are a critical line of defense in the digital world, safeguarding data against errors and corruption. From ensuring the integrity of a downloaded file to verifying the accuracy of a network transmission, checksums work behind the scenes to maintain trust in our digital systems.&lt;/p&gt;

&lt;p&gt;By acting as a digital fingerprint, checksums provide a simple yet effective way to detect issues, giving us confidence in the accuracy and reliability of our data.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>webdev</category>
      <category>systems</category>
      <category>softwaredevelopment</category>
    </item>
    <item>
      <title>Understanding the Circuit Breaker Pattern in Distributed Systems day 52 of system design</title>
      <dc:creator>Vincent Tommi</dc:creator>
      <pubDate>Tue, 16 Sep 2025 09:52:36 +0000</pubDate>
      <link>https://forem.com/vincenttommi/understanding-the-circuit-breaker-pattern-in-distributed-systems-day-52-of-system-design-im3</link>
      <guid>https://forem.com/vincenttommi/understanding-the-circuit-breaker-pattern-in-distributed-systems-day-52-of-system-design-im3</guid>
      <description>&lt;p&gt;In a distributed system, you never know how or when things might go wrong. Network glitches, component failures, or even a rogue router can wreak havoc. As a software engineer, it’s your job to keep these systems resilient and alive. Enter the Circuit Breaker Pattern—a design pattern that helps prevent cascading failures and keeps your services running smoothly. &lt;/p&gt;

&lt;p&gt;In this article, we’ll dive into what the Circuit Breaker Pattern is, why it’s critical for microservices, and how it works with a practical use case. Let’s get started! &lt;/p&gt;

&lt;p&gt;What is a Circuit Breaker? &lt;/p&gt;

&lt;p&gt;If your house runs on electricity, you’re probably familiar with a circuit breaker. It’s an electrical switch that automatically cuts off power to protect your circuits from damage due to overloads (like a lightning strike) or short circuits. Its job? Stop the current flow when something goes wrong to protect your appliances.&lt;/p&gt;

&lt;p&gt;The Circuit Breaker Pattern in software engineering works in a similar way. It’s designed to halt request-and-response processes when a service fails, preventing your system from spiraling into chaos. Let’s explore how.&lt;/p&gt;

&lt;p&gt;What is the Circuit Breaker Pattern? &lt;/p&gt;

&lt;p&gt;The Circuit Breaker Pattern stops a service call when it detects that the service is failing, much like its electrical namesake. Here’s how it works in a nutshell:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;A consumer sends requests to multiple services, but one service is down due to technical issues.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Without a circuit breaker, the consumer keeps sending requests to the failed service, wasting resources and degrading performance.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Circuit Breaker Pattern introduces a proxy that acts as a barrier between the consumer and the service.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When failures exceed a threshold, the circuit breaker trips, blocking further requests for a set time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;During this timeout, requests to the failed service are rejected immediately.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;After the timeout, the circuit breaker allows a few test requests. If they succeed, it resumes normal operation; if they fail, the timeout restarts.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This pattern prevents resource exhaustion and ensures a better user experience by failing fast. &lt;/p&gt;
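
&lt;p&gt;The behaviour described above can be sketched as a small class. This is a simplified illustration under stated assumptions, not a production library: the names and default values are hypothetical, and after the timeout it lets one trial request through at a time rather than a batch of test requests.&lt;/p&gt;

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the service."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, operation):
        if self._opened_at is not None:
            waited = self._clock() - self._opened_at
            if self.reset_timeout > waited:
                raise CircuitOpenError("failing fast: service is marked as down")
            # Timeout elapsed: fall through and allow a trial request.
        try:
            result = operation()
        except Exception:
            self._failures += 1
            tripped = self._failures >= self.failure_threshold
            if self._opened_at is not None or tripped:
                self._opened_at = self._clock()  # trip, or restart the timeout
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

&lt;p&gt;A caller would wrap each remote call, for example breaker.call(fetch_leave_info) with a hypothetical fetch_leave_info function, and treat CircuitOpenError as a signal to return a cached or default response.&lt;/p&gt;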

&lt;p&gt;The Main Use Case: Employee Management System &lt;/p&gt;

&lt;p&gt;To illustrate the Circuit Breaker Pattern, let’s use a microservices-based employee management system for a fictional company, Mercantile Finance. This system includes four services:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Service 1: Fetches personal information.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Service 2: Retrieves leave information.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Service 3: Provides employee performance data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Service 4: Handles allocation information.&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;These services are called using an aggregator pattern, where a proxy coordinates requests to multiple backend services. If one service fails, the entire system could suffer—unless we use a circuit breaker.&lt;/p&gt;

&lt;p&gt;Why Availability Matters in Microservices ⏰&lt;/p&gt;

&lt;p&gt;Availability is critical in microservices because downtime can add up quickly. Let’s say Mercantile Finance promises 99.999% uptime (a.k.a. "five nines"). Here’s how that translates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Calculation:&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;24 hours/day × 365 days/year = 8,760 hours/year.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;8,760 hours × 60 = 525,600 minutes/year.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;99.999% uptime allows 0.001% downtime.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;525,600 × 0.001% = 5.256 minutes of downtime per year.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a monolithic system, about 5.26 minutes of downtime per year is manageable. But in a microservices architecture with, say, 100 services that each carry the same budget, the downtimes can add up: if no two outages overlap, that’s 100 × 5.256 = 525.6 minutes, or roughly 8.76 hours of downtime per year. 😱 This is why protecting services with patterns like the Circuit Breaker is essential.&lt;/p&gt;
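&lt;p&gt;The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name downtime_minutes_per_year is made up for this example, and the 100-service figure assumes the worst case where no two outages overlap.&lt;/p&gt;

```python
MINUTES_PER_YEAR = 24 * 365 * 60  # 525,600 minutes in a non-leap year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Yearly downtime budget, in minutes, for a given availability percentage."""
    return MINUTES_PER_YEAR * (100.0 - availability_percent) / 100.0


# Budget for a single service at "five nines".
per_service = downtime_minutes_per_year(99.999)

# Worst case for 100 services: every outage happens at a different time.
worst_case_hours = per_service * 100 / 60

print(round(per_service, 3), "minutes per service")
print(round(worst_case_hours, 2), "hours across 100 services")
```

&lt;p&gt;Under these assumptions the budget works out to about 5.256 minutes for one service and about 8.76 hours for 100 services.&lt;/p&gt;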

&lt;p&gt;What Causes Services to Break? &lt;/p&gt;

&lt;p&gt;Let’s explore two common failure scenarios in microservices and how they can cripple your system, using diagrams for clarity.&lt;/p&gt;

&lt;p&gt;Use Case 1: Thread Starvation&lt;/p&gt;

&lt;p&gt;Imagine a web server handling requests for five services. When a request arrives, the server allocates a thread to call the service. If one service is slow or fails, threads wait, tying up resources. For a high-demand service, more threads are allocated, leading to a queue of blocked requests.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fct63290twm930zmz1nnz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fct63290twm930zmz1nnz.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;br&gt;
Diagram showing threads waiting for a slow service, causing a queue buildup.&lt;/p&gt;

&lt;p&gt;If most threads are occupied by the failing service, incoming requests queue up, overwhelming the system. Even if the service recovers, the queued requests flood it, potentially causing another failure.&lt;/p&gt;

&lt;p&gt;Use Case 2: Cascading Failures&lt;/p&gt;

&lt;p&gt;Consider a chain of services: A → B → C → D. If Service D fails to respond, the failure propagates up the chain, causing a cascading failure.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frdzq5nx649qtbtum2o5s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frdzq5nx649qtbtum2o5s.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Diagram showing Service D’s failure causing Services C, B, and A to wait, leading to a cascading failure.&lt;/p&gt;

&lt;p&gt;These scenarios highlight why we need a mechanism to detect and isolate failures quickly.&lt;/p&gt;

&lt;p&gt;How the Circuit Breaker Pattern Saves the Day &lt;/p&gt;

&lt;p&gt;The Circuit Breaker Pattern wraps service calls in a circuit breaker object that monitors for failures. It has three states:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Closed: Normal operation; requests pass through to the service.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Open: Too many failures detected; requests are blocked and return errors immediately.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Half-Open: After a timeout, a few test requests are allowed. If they succeed, the circuit returns to Closed; if they fail, it goes back to Open and the timeout restarts.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg14rz87rsqn90pa0a3r0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg14rz87rsqn90pa0a3r0.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;br&gt;
Diagram showing the Circuit Breaker’s state transitions (Closed, Open, Half-Open).&lt;/p&gt;
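&lt;p&gt;The three states can be sketched in Python as a small wrapper around a service call. This is a minimal illustration, not a production implementation: the class name CircuitBreaker, the failure_threshold and reset_timeout parameters, and the injectable clock are all assumptions made for this example.&lt;/p&gt;

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without contacting the service."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # failures before tripping
        self.reset_timeout = reset_timeout          # seconds to stay open
        self.clock = clock                          # injectable for testing
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            if self.clock() - self.opened_at >= self.reset_timeout:
                self.state = "half_open"  # timeout elapsed: allow a trial request
            else:
                raise CircuitOpenError("failing fast: circuit is open")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        self.failures += 1
        # A failed trial request, or too many failures, (re)opens the circuit.
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()

    def _on_success(self):
        # Any success resets the breaker to normal operation.
        self.failures = 0
        self.state = "closed"
```

&lt;p&gt;A caller would wrap each remote call, for example breaker.call(fetch_personal_info, employee_id), where fetch_personal_info is a hypothetical function standing in for the real service client.&lt;/p&gt;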

&lt;p&gt;In our employee management system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Suppose Service A (the personal information service, listed as Service 1 above) should respond within 200ms.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;0–100ms: Normal operation.&lt;/li&gt;
&lt;li&gt;100–200ms: Risky, but acceptable.&lt;/li&gt;
&lt;li&gt; &amp;gt;200ms: Failure; the circuit breaker trips.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;If 75% of requests exceed 150ms, the circuit breaker detects a slow service.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;If requests exceed 200ms, the proxy marks Service A as unresponsive and trips the circuit to Open.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Requests to Service A fail immediately with an error, preventing resource exhaustion.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;In the background, the circuit breaker sends periodic ping requests to check if Service A recovers.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;If response times return to normal, the circuit moves to Half-Open, allowing limited test requests. If successful, it resets to Closed.&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;
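&lt;p&gt;The latency thresholds in this example can be expressed as a small sliding-window monitor. This is a sketch under assumptions: the LatencyMonitor class, its window size, and the "ok" / "slow" / "trip" return values are invented for illustration; only the 150ms, 200ms, and 75% figures come from the text.&lt;/p&gt;

```python
from collections import deque


class LatencyMonitor:
    def __init__(self, window=20, slow_ms=150.0, fail_ms=200.0, slow_ratio=0.75):
        self.samples = deque(maxlen=window)  # most recent response times
        self.slow_ms = slow_ms
        self.fail_ms = fail_ms
        self.slow_ratio = slow_ratio

    def record(self, latency_ms: float) -> str:
        """Record one response time and classify the service's health."""
        self.samples.append(latency_ms)
        if latency_ms > self.fail_ms:
            return "trip"  # over the hard limit: open the circuit
        slow = sum(1 for sample in self.samples if sample > self.slow_ms)
        window_full = len(self.samples) == self.samples.maxlen
        if window_full and slow / len(self.samples) >= self.slow_ratio:
            return "slow"  # most recent calls are slow: service is degrading
        return "ok"
```

&lt;p&gt;The proxy would call record() after every response and hand a "trip" result to the circuit breaker.&lt;/p&gt;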

&lt;p&gt;Why Not Just Call the Service Directly? &lt;/p&gt;

&lt;p&gt;You might wonder, "Why not let requests hit the failing service and time out naturally?" Here’s why:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;If each request waits for a 30-second timeout, all incoming requests queue up, consuming resources.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Circuit Breaker Pattern avoids this by failing fast when a service exceeds its failure threshold, returning an error to the consumer immediately.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;This prevents queues from forming and ensures the system remains responsive.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When Service A recovers, the circuit breaker closes again and lets traffic through, serving new requests without processing a backlog. This approach sacrifices a few requests to save the entire system from crashing.&lt;/p&gt;

&lt;p&gt;Why Failing Fast is Better for Users&lt;/p&gt;

&lt;p&gt;From a user’s perspective, waiting ages for a response is frustrating. The Circuit Breaker Pattern prioritizes a quick response—even if it’s an error—over keeping users hanging. By isolating failures, it prevents cascading issues and ensures the system recovers quickly.&lt;/p&gt;

&lt;p&gt;Wrapping Up &lt;/p&gt;

&lt;p&gt;The Circuit Breaker Pattern is a lifesaver in distributed systems, especially for microservices architectures. By monitoring service health, failing fast, and preventing resource exhaustion, it keeps your system resilient and your users happy.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>webdev</category>
      <category>productivity</category>
      <category>python</category>
    </item>
    <item>
      <title>DNS System Design: The Backbone of the Internet day 51</title>
      <dc:creator>Vincent Tommi</dc:creator>
      <pubDate>Mon, 15 Sep 2025 07:29:06 +0000</pubDate>
      <link>https://forem.com/vincenttommi/dns-system-design-the-backbone-of-the-internet-2gie</link>
      <guid>https://forem.com/vincenttommi/dns-system-design-the-backbone-of-the-internet-2gie</guid>
      <description>&lt;p&gt;The Domain Name System (DNS) is one of the most critical components of internet infrastructure. It serves as a hierarchical and distributed naming system that translates human-readable domain names into machine-readable IP addresses. Without DNS, we’d all be typing long, hard-to-remember IPs instead of simple domain names like example.com.&lt;/p&gt;

&lt;p&gt;But DNS isn’t just a convenience—it’s also a scalable, fault-tolerant, and decentralized system that enables the internet to function reliably at a global scale.&lt;/p&gt;

&lt;p&gt;How DNS Works&lt;br&gt;
When you type a URL into your browser, your device needs to resolve the domain name into an IP address. This resolution process involves multiple layers of DNS servers:&lt;/p&gt;

&lt;p&gt;DNS Resolver – Usually provided by your ISP or third-party services like Cloudflare 1.1.1.1 or Google 8.8.8.8.&lt;/p&gt;

&lt;p&gt;Root Name Servers – The starting point of the DNS hierarchy, directing queries to the correct Top-Level Domain (TLD) servers.&lt;/p&gt;

&lt;p&gt;TLD Name Servers – Responsible for domains like .com, .org, .net, etc.&lt;/p&gt;

&lt;p&gt;Authoritative Name Servers – The final authority that holds the actual IP address mapping for the requested domain.&lt;/p&gt;

&lt;p&gt;Example flow:&lt;/p&gt;

&lt;p&gt;Browser asks resolver for example.com.&lt;/p&gt;

&lt;p&gt;If not cached, the resolver queries root servers.&lt;/p&gt;

&lt;p&gt;Root servers point to .com TLD servers.&lt;/p&gt;

&lt;p&gt;TLD servers point to the authoritative server for example.com.&lt;/p&gt;

&lt;p&gt;The authoritative server provides the definitive IP, which is then cached for future use.&lt;/p&gt;

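&lt;p&gt;In this multi-step process, the resolver walks the hierarchy on the client’s behalf (which is why it is called a recursive resolver). Combined with caching, this keeps lookups fast, reliable, and decentralized.&lt;/p&gt;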
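&lt;p&gt;The lookup flow can be imitated with plain dictionaries. This is a toy model only: the ROOT, TLD, and AUTHORITATIVE tables, the server labels, and the 192.0.2.10 address (from a range reserved for documentation) are all made up, and no real DNS traffic is involved.&lt;/p&gt;

```python
# Hypothetical stand-ins for the three layers of name servers.
ROOT = {"com": "tld-com"}
TLD = {"tld-com": {"example.com": "ns-example"}}
AUTHORITATIVE = {"ns-example": {"example.com": "192.0.2.10"}}


class Resolver:
    def __init__(self):
        self.cache = {}            # name -> IP address
        self.upstream_queries = 0  # how many servers we had to ask

    def resolve(self, name: str) -> str:
        if name in self.cache:
            return self.cache[name]  # answered locally, no upstream traffic

        tld = name.rsplit(".", 1)[-1]
        tld_server = ROOT[tld]                 # root points to the TLD server
        self.upstream_queries += 1
        auth_server = TLD[tld_server][name]    # TLD points to the authoritative server
        self.upstream_queries += 1
        ip = AUTHORITATIVE[auth_server][name]  # authoritative server gives the answer
        self.upstream_queries += 1

        self.cache[name] = ip
        return ip


resolver = Resolver()
first = resolver.resolve("example.com")
second = resolver.resolve("example.com")  # served from the cache
```

&lt;p&gt;The first lookup touches all three layers; the second is answered from the resolver’s cache without any upstream queries.&lt;/p&gt;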

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fslfryutbpc9sl7hjfop5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fslfryutbpc9sl7hjfop5.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;DNS Hierarchy &amp;amp; Distribution&lt;/p&gt;

&lt;p&gt;The DNS hierarchy relies on a distributed architecture:&lt;/p&gt;

&lt;p&gt;Root Servers – 13 logical root servers exist (named A through M), operated by 12 different organizations. Thanks to anycast routing, each one is backed by many physical instances deployed worldwide to ensure speed and fault tolerance.&lt;/p&gt;

&lt;p&gt;TLD Servers – Handle top-level domains like .com, .org, .io.&lt;/p&gt;

&lt;p&gt;Authoritative Servers – Store and serve the actual domain records.&lt;/p&gt;

&lt;p&gt;This distribution makes DNS highly available. Even if one server fails, others can seamlessly handle queries.&lt;/p&gt;

&lt;p&gt;Advanced DNS Functionalities in System Design&lt;/p&gt;

&lt;p&gt;DNS isn’t just about mapping names to IPs. It supports several advanced system design functionalities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Load Balancing – A single domain can map to multiple IP addresses, distributing traffic across servers for better performance.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Failover &amp;amp; Redundancy – If a primary server is down, DNS can reroute traffic to backup resources.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Caching – Responses are cached at multiple levels (browser, OS, resolver), reducing latency and network load.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Security with DNSSEC – Helps prevent spoofing and cache poisoning by letting resolvers validate DNS responses with cryptographic signatures. It authenticates records but does not encrypt them.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
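&lt;p&gt;Caching with a time-to-live (TTL) is the mechanism behind several of these points, so here is a small sketch of it. The TtlCache class and its injectable clock are assumptions for illustration; real resolvers are considerably more involved.&lt;/p&gt;

```python
import time


class TtlCache:
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.entries = {}  # name -> (ip, expires_at)

    def put(self, name: str, ip: str, ttl_seconds: float) -> None:
        """Store an answer together with the moment it stops being valid."""
        self.entries[name] = (ip, self.clock() + ttl_seconds)

    def get(self, name: str):
        """Return the cached IP, or None if it is missing or expired."""
        entry = self.entries.get(name)
        if entry is None:
            return None
        ip, expires_at = entry
        if self.clock() >= expires_at:
            del self.entries[name]  # stale record: force a fresh lookup
            return None
        return ip
```

&lt;p&gt;A shorter TTL means stale answers disappear sooner, at the cost of more lookups reaching the upstream servers.&lt;/p&gt;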

&lt;p&gt;Best Practices for DNS in System Design&lt;/p&gt;

&lt;p&gt;When designing scalable systems, DNS management is a key consideration. Some best practices include:&lt;/p&gt;

&lt;p&gt;Adjusting TTL before updates – Lower TTLs before planned changes to ensure faster propagation.&lt;/p&gt;

&lt;p&gt;Graceful transitions – Keep old servers online temporarily to handle stale records still cached by resolvers.&lt;/p&gt;

&lt;p&gt;Scalability mindset – DNS already handles ~70 billion queries daily and is designed to scale horizontally.&lt;/p&gt;

&lt;p&gt;Hierarchical naming – Use structured naming for better administration and efficient performance.&lt;/p&gt;

&lt;p&gt;Conclusion&lt;/p&gt;

&lt;p&gt;DNS may seem invisible to most users, but it’s the backbone of the internet. From resolving billions of daily queries to enabling load balancing, failover, and security, DNS is one of the most important distributed systems ever designed.&lt;/p&gt;

&lt;p&gt;For system designers, understanding and leveraging DNS is essential for building resilient, scalable, and secure architectures.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>beginners</category>
      <category>productivity</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>Service Discovery: The Backbone of Modern Distributed Systems day 50 of system design</title>
      <dc:creator>Vincent Tommi</dc:creator>
      <pubDate>Sat, 13 Sep 2025 11:31:15 +0000</pubDate>
      <link>https://forem.com/vincenttommi/service-discovery-the-backbone-of-modern-distributed-systems-ld7</link>
      <guid>https://forem.com/vincenttommi/service-discovery-the-backbone-of-modern-distributed-systems-ld7</guid>
      <description>&lt;p&gt;Service Discovery: The Backbone of Modern Distributed Systems&lt;/p&gt;

&lt;p&gt;Back when applications ran on a single server, life was simple. Today’s modern applications are far more complex, consisting of dozens or even hundreds of services, each with multiple instances that scale up and down dynamically. This complexity makes it challenging for services to efficiently find and communicate with each other across networks. That’s where Service Discovery comes into play.&lt;/p&gt;

&lt;p&gt;In this article, we’ll explore what service discovery is, why it’s critical, how it works, the different types (client-side and server-side discovery), and best practices for implementing it effectively.&lt;/p&gt;

&lt;p&gt;What is Service Discovery?&lt;/p&gt;

&lt;p&gt;Service discovery is a mechanism that enables services in a distributed system to dynamically find and communicate with each other. It abstracts the complexity of service locations, allowing services to interact without needing to know each other’s exact network addresses.&lt;/p&gt;

&lt;p&gt;At its core, service discovery relies on a service registry, a centralized database that acts as a single source of truth for all services. This registry stores essential information about each service, enabling seamless querying and communication.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ls1if7eki21tu656jmk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ls1if7eki21tu656jmk.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A service registry stores details of all services, acting as a central hub for discovery.&lt;/p&gt;

&lt;p&gt;What Does a Service Registry Store?&lt;/p&gt;

&lt;p&gt;A typical service registry record includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Basic Details: Service name, IP address, port, and status.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Metadata: Version, environment, region, tags, etc.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Health Information: Health status and last health check.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Load Balancing Info: Weights and priorities.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Secure Communication: Protocols and certificates.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
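&lt;p&gt;As a rough sketch, such a record could be modelled as a Python dataclass. The field names below are illustrative assumptions and do not follow the schema of any particular registry product.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class ServiceRecord:
    # Basic details
    name: str
    host: str
    port: int
    status: str = "UP"
    # Metadata such as version, environment, region, and tags
    metadata: dict = field(default_factory=dict)
    # Health information (timestamp of the last successful check)
    last_health_check: float = 0.0
    # Load balancing hints
    weight: int = 1
    priority: int = 0
    # Secure communication
    protocol: str = "https"

    @property
    def address(self) -> str:
        """Network location in host:port form."""
        return f"{self.host}:{self.port}"


record = ServiceRecord(
    name="payment-service",
    host="10.0.0.5",
    port=8080,
    metadata={"version": "v1", "region": "eu-west"},
)
```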

&lt;p&gt;This abstraction is vital in dynamic environments where services are frequently added, removed, or scaled.&lt;/p&gt;

&lt;p&gt;Why is Service Discovery Important?&lt;/p&gt;

&lt;p&gt;Imagine a massive system like Netflix, with hundreds of microservices working together. Hardcoding service locations isn’t feasible—when a service moves or scales, it could break the entire system. Service discovery addresses this by enabling dynamic and reliable service location and communication.&lt;/p&gt;

&lt;p&gt;Key Benefits of Service Discovery&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Reduced Manual Configuration: Services automatically discover and connect, eliminating the need for hardcoding network locations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Improved Scalability: Service discovery adapts to changing environments as services scale up or down.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Fault Tolerance: Integrated health checks allow systems to reroute traffic away from failing instances.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Simplified Management: A central registry simplifies monitoring, management, and troubleshooting.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Service Registration Options&lt;/p&gt;

&lt;p&gt;Service registration is the process by which a service announces its availability to the service registry, making it discoverable. The method of registration depends on the architecture, tools, and deployment environment. Here are the most common approaches:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F37zmju5jvlru8jb5k2o5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F37zmju5jvlru8jb5k2o5.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Caption: Different approaches to service registration, from manual to orchestrator-based.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Manual Registration&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In manual registration, developers or operators manually add service details to the registry. While simple, this approach is impractical for dynamic systems where services frequently scale or move.&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Self-Registration&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In self-registration, services register themselves with the registry upon startup. The service includes logic to send its network details (e.g., IP address and port) to the registry via API calls (e.g., HTTP or gRPC). Services may also send periodic heartbeat signals to confirm their health and availability.&lt;/p&gt;
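&lt;p&gt;Self-registration with heartbeats can be sketched with an in-memory registry. Everything here is an assumption made for illustration: a real registry would sit behind an HTTP or gRPC API, and the Registry class, its heartbeat_ttl parameter, and the method names are invented.&lt;/p&gt;

```python
import time


class Registry:
    def __init__(self, heartbeat_ttl=15.0, clock=time.monotonic):
        self.heartbeat_ttl = heartbeat_ttl  # seconds before an instance is considered gone
        self.clock = clock
        self.instances = {}  # (service, address) -> time of last heartbeat

    def register(self, service: str, address: str) -> None:
        """Called by a service instance on startup."""
        self.instances[(service, address)] = self.clock()

    def heartbeat(self, service: str, address: str) -> None:
        """Called periodically by a live instance to refresh its entry."""
        self.instances[(service, address)] = self.clock()

    def healthy_instances(self, service: str) -> list:
        """Addresses whose last heartbeat is recent enough."""
        now = self.clock()
        return sorted(
            address
            for (svc, address), last_seen in self.instances.items()
            if svc == service and self.heartbeat_ttl >= now - last_seen
        )
```

&lt;p&gt;An instance that stops sending heartbeats simply drops out of healthy_instances() once its entry is older than the TTL.&lt;/p&gt;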

&lt;ol start="3"&gt;
&lt;li&gt;Third-Party Registration (Sidecar Pattern)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In third-party registration, an external agent or "sidecar" process handles registration. The sidecar runs alongside the service (e.g., in the same container) and registers the service’s details with the registry on its behalf.&lt;/p&gt;

&lt;ol start="4"&gt;
&lt;li&gt;Automatic Registration by Orchestrators&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In orchestrated environments like Kubernetes, service registration is automatic. The orchestrator manages the service lifecycle, assigning IP addresses and ports and updating the registry as services start, stop, or scale. For example, Kubernetes uses its built-in DNS for service discovery.&lt;/p&gt;

&lt;ol start="5"&gt;
&lt;li&gt;Configuration Management Systems&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Tools like Chef, Puppet, or Ansible can manage service lifecycles and update the registry when services are added or removed.&lt;/p&gt;

&lt;p&gt;Types of Service Discovery&lt;/p&gt;

&lt;p&gt;Service discovery can be broadly categorized into two models: client-side discovery and server-side discovery.&lt;/p&gt;

&lt;p&gt;Client-Side Discovery&lt;/p&gt;

&lt;p&gt;In client-side discovery, the client (e.g., a microservice or API gateway) is responsible for querying the service registry and routing requests to the appropriate service instance.&lt;/p&gt;

&lt;p&gt;How It Works&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Service Registration: Services (e.g., UserService, PaymentService) register their network details (IP address, port) and metadata with the service registry.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Client Queries the Registry: The client queries the registry to retrieve a list of available instances for a target service.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Client Routes the Request: The client selects an instance (e.g., using a load balancing algorithm) and connects directly to it.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbph6pa3ywqq61phvsdhs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbph6pa3ywqq61phvsdhs.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Example Workflow&lt;/p&gt;

&lt;p&gt;Consider a food delivery app:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The PaymentService has three instances running on different servers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The OrderService queries the registry for PaymentService instances.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The registry returns a list of instances (e.g., IP1:Port1, IP2:Port2, IP3:Port3).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The OrderService selects an instance (e.g., IP1:Port1) and sends the payment request.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
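&lt;p&gt;This workflow can be sketched as follows. The registry is a plain dictionary, the addresses are placeholders, and the DiscoveryClient class with its round-robin choice is an assumption for illustration rather than the API of any real discovery library.&lt;/p&gt;

```python
# Hypothetical registry contents: service name -> list of instance addresses.
REGISTRY = {
    "PaymentService": ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"],
}


class DiscoveryClient:
    """Client-side discovery: the caller looks up instances and picks one itself."""

    def __init__(self, registry: dict):
        self.registry = registry
        self._counters = {}  # service name -> number of picks so far

    def choose(self, service: str) -> str:
        instances = self.registry.get(service, [])
        if not instances:
            raise LookupError(f"no instances registered for {service}")
        # Round-robin: rotate through the instances in order.
        index = self._counters.get(service, 0)
        self._counters[service] = index + 1
        return instances[index % len(instances)]


order_service_client = DiscoveryClient(REGISTRY)
picks = [order_service_client.choose("PaymentService") for _ in range(4)]
```

&lt;p&gt;Here the OrderService would send each payment request straight to the address returned by choose(); the fourth pick wraps around to the first instance.&lt;/p&gt;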

&lt;p&gt;Advantages&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Simple to implement and understand.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Reduces load on central infrastructure.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Disadvantages&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Clients must implement discovery logic.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Changes in the registry protocol require client updates.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example Tool: Netflix’s Eureka is a popular choice for client-side discovery.&lt;/p&gt;

&lt;p&gt;Server-Side Discovery&lt;/p&gt;

&lt;p&gt;In server-side discovery, the client delegates discovery and routing to a centralized server, such as a load balancer or API gateway. The client doesn’t interact with the registry or handle load balancing.&lt;/p&gt;

&lt;p&gt;How It Works&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Service Registration: Services register with the service registry, as in client-side discovery.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Client Sends Request: The client sends a request to a load balancer or API gateway, specifying the target service (e.g., payment-service).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Server Queries the Registry: The load balancer queries the registry to retrieve available service instances.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Routing: The load balancer selects an instance (based on load, proximity, or health) and routes the request.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Response: The service processes the request and responds via the load balancer.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftt45n3rdehmm84nscnqp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftt45n3rdehmm84nscnqp.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Caption: In server-side discovery, a load balancer handles registry queries and request routing.&lt;/p&gt;

&lt;p&gt;Example Workflow&lt;/p&gt;

&lt;p&gt;For an e-commerce platform:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The PaymentService registers two instances: IP1:8080 and IP2:8081.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The OrderService sends a request to the load balancer, specifying PaymentService.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The load balancer queries the registry, selects an instance (e.g., IP1:8080), and routes the request.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The PaymentService processes the request and responds via the load balancer.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
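&lt;p&gt;For contrast with client-side discovery, here is a sketch where the load balancer owns the lookup. The LoadBalancer class, the backends dictionary of callables (standing in for network calls), and the addresses are all assumptions for illustration.&lt;/p&gt;

```python
class LoadBalancer:
    """Server-side discovery: clients name a service; the balancer finds an instance."""

    def __init__(self, registry: dict, backends: dict):
        self.registry = registry  # service name -> list of instance addresses
        self.backends = backends  # address -> callable standing in for a network call
        self._next = {}           # service name -> round-robin counter

    def handle(self, service: str, request: str) -> str:
        instances = self.registry.get(service, [])
        if not instances:
            raise LookupError(f"no instances registered for {service}")
        index = self._next.get(service, 0)
        self._next[service] = index + 1
        address = instances[index % len(instances)]
        # Forward the request and relay the response back to the client.
        return self.backends[address](request)


registry = {"PaymentService": ["10.0.0.1:8080", "10.0.0.2:8081"]}
backends = {
    "10.0.0.1:8080": lambda request: f"instance 1 handled {request}",
    "10.0.0.2:8081": lambda request: f"instance 2 handled {request}",
}
balancer = LoadBalancer(registry, backends)
first_reply = balancer.handle("PaymentService", "charge order 42")
second_reply = balancer.handle("PaymentService", "charge order 43")
```

&lt;p&gt;The client only ever talks to balancer.handle(); it never sees the registry or the instance addresses.&lt;/p&gt;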

&lt;p&gt;Advantages&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Centralizes discovery logic, reducing client complexity.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Easier to manage and update discovery protocols.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Disadvantages&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Introduces an additional network hop.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The load balancer can become a single point of failure.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example Tool: AWS Elastic Load Balancer (ELB) integrates with AWS’s service registry for server-side discovery.&lt;/p&gt;

&lt;p&gt;Best Practices for Implementing Service Discovery&lt;/p&gt;

&lt;p&gt;To ensure a robust service discovery system, follow these best practices:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Choose the Right Model: Use client-side discovery for custom load balancing or server-side discovery for centralized routing.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ensure High Availability: Deploy multiple registry instances and test failover scenarios to prevent downtime.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Automate Registration: Use self-registration, sidecars, or orchestration tools for dynamic environments. Ensure stale services are deregistered.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use Health Checks: Monitor service health and automatically remove failing instances.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Follow Naming Conventions: Use clear, unique service names with versioning (e.g., payment-service-v1) to avoid conflicts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Caching: Implement caching to reduce registry load and improve performance.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Scalability: Ensure the discovery system can handle service growth.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Conclusion&lt;/p&gt;

&lt;p&gt;Service discovery may not be the flashiest part of a distributed system, but it’s a critical component. Think of it as the address book for your microservices architecture. Without it, scaling and maintaining distributed systems would be chaotic. By enabling seamless communication and coordination, service discovery ensures that complex applications run reliably and efficiently.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>webdev</category>
      <category>systemdesign</category>
      <category>ai</category>
    </item>
    <item>
      <title>What Is the Gossip Protocol? day 49 of system design</title>
      <dc:creator>Vincent Tommi</dc:creator>
      <pubDate>Wed, 10 Sep 2025 06:51:13 +0000</pubDate>
      <link>https://forem.com/vincenttommi/what-is-the-gossip-protocol-3g12</link>
      <guid>https://forem.com/vincenttommi/what-is-the-gossip-protocol-3g12</guid>
      <description>&lt;p&gt;In distributed systems, two common challenges arise:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Maintaining system state (e.g., knowing whether nodes are alive)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Enabling communication between nodes&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are two broad approaches to solving these problems:&lt;/p&gt;

&lt;p&gt;1. Centralized State Management – e.g., Apache ZooKeeper. Provides strong consistency but suffers from scalability bottlenecks and single points of failure.&lt;/p&gt;

&lt;p&gt;2. Peer-to-Peer State Management – highly available, eventually consistent, and scalable. This is where gossip protocols shine.&lt;/p&gt;

&lt;p&gt;Gossip Protocol Basics&lt;/p&gt;

&lt;p&gt;The gossip protocol (a.k.a. epidemic protocol) spreads information in a distributed system the same way rumors spread among people.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Each node periodically shares information with a random subset of peers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Over time, messages reach all nodes with high probability.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Works best for large, fault-tolerant, decentralized systems.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Common uses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Cluster membership management&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Failure detection&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Consensus and metadata exchange&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Application-level data piggybacking&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Broadcast Protocols Compared&lt;/p&gt;

&lt;p&gt;1. Point-to-Point Broadcast – Reliable with retries and deduplication, but fails if sender and receiver crash simultaneously.&lt;/p&gt;

&lt;p&gt;2. Eager Reliable Broadcast – Nodes re-broadcast messages to all others, improving fault tolerance but causing O(n²) message overhead.&lt;/p&gt;

&lt;p&gt;3. Gossip Protocol – Decentralized, efficient, and resilient. Messages eventually reach the entire system.&lt;/p&gt;

&lt;p&gt;Types of Gossip Protocols&lt;/p&gt;

&lt;p&gt;1. Anti-Entropy – Synchronizes replicas by comparing and patching differences (may use checksums or Merkle trees to save bandwidth).&lt;/p&gt;

&lt;p&gt;2. Rumor-Mongering – Spreads only the latest updates quickly; messages are retired after a few rounds.&lt;/p&gt;

&lt;p&gt;3. Aggregation – Computes system-wide values (e.g., averages, sums) by exchanging partial results.&lt;/p&gt;

&lt;p&gt;Gossip Communication Strategies&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Push – A node sends updates to random peers (best for few updates).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Pull – A node requests updates from peers (best when many updates exist).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Push-Pull – Combines both, achieving faster convergence.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Performance Characteristics&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Fanout = number of peers contacted per round.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cycle = number of rounds to spread a message across the cluster.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example: ~15 gossip rounds spread a message to 25,000 nodes.&lt;/p&gt;
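&lt;p&gt;The reason so few rounds are needed is that the number of informed nodes grows roughly exponentially per round. A small push-gossip simulation illustrates this. It is a simplified sketch: the gossip_rounds function, the seed, and the choice to let a node occasionally pick itself or an already-informed peer are assumptions made for this example, so the exact round count will differ from any real system.&lt;/p&gt;

```python
import random


def gossip_rounds(num_nodes: int, fanout: int, seed: int = 0, max_rounds: int = 100):
    """Simulate push gossip; return (rounds used, nodes reached)."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    informed = {0}             # node 0 starts with the message
    rounds = 0
    while num_nodes > len(informed) and max_rounds > rounds:
        newly_informed = set()
        for _ in informed:
            # Each informed node pushes the message to `fanout` random peers.
            newly_informed.update(rng.sample(range(num_nodes), fanout))
        informed |= newly_informed
        rounds += 1
    return rounds, len(informed)


rounds, reached = gossip_rounds(num_nodes=1000, fanout=3, seed=1)
print(f"reached {reached} nodes in {rounds} rounds")
```

&lt;p&gt;Raising the fanout lowers the number of rounds but increases traffic, which is the trade-off the metrics below describe.&lt;/p&gt;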

&lt;p&gt;Performance metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Residue – nodes that didn’t receive the message&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Traffic – number of exchanged messages&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Convergence – how fast all nodes get the update&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Time Average &amp;amp; Time Last – average and worst-case delivery times&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Properties of Gossip Protocols&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Random peer selection&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Local knowledge only&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Periodic pairwise communication&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Bounded message sizes&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Same protocol across nodes&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Resilient to unreliable networks&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Decentralized and symmetric&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;How Gossip Works (Algorithm Overview)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Each node keeps a membership list with metadata.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Periodically, a node gossips with a random peer.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Nodes merge metadata, keeping the highest version numbers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A heartbeat counter detects node liveness.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Additional implementation details include seed nodes, version numbers, generation clocks, and digest messages for synchronization.&lt;/p&gt;
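
&lt;p&gt;The merge step can be sketched as follows. This is a simplified illustration, not the implementation of any specific system: each node's membership list is reduced to a dictionary of heartbeat counters, and every exchange is push-pull with one random peer. The names merge_into and gossip_round, and the three node names, are hypothetical.&lt;/p&gt;

```python
import random


def merge_into(local, remote):
    """Merge a peer's membership view into ours, keeping the highest heartbeat."""
    for node, heartbeat in remote.items():
        local[node] = max(local.get(node, 0), heartbeat)


def gossip_round(views, rng):
    """One round: each node bumps its own heartbeat and gossips with one peer."""
    names = sorted(views)
    for name in names:
        views[name][name] += 1  # heartbeat counter signals liveness
        peer = rng.choice([other for other in names if other != name])
        merge_into(views[peer], views[name])  # push our view to the peer
        merge_into(views[name], views[peer])  # pull the peer's view back


# Three hypothetical nodes that start out knowing only themselves.
# Real entries would carry more metadata (address, generation, status).
views = {name: {name: 0} for name in ("a", "b", "c")}
rng = random.Random(7)
for _ in range(5):
    gossip_round(views, rng)
```

&lt;p&gt;After a few rounds every node's view lists all three members. A node whose heartbeat stops increasing in its peers' views for long enough can then be suspected of having failed.&lt;/p&gt;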

&lt;p&gt;Real-World Use Cases&lt;/p&gt;

&lt;p&gt;Gossip protocols are widely used in modern distributed systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Databases: Cassandra, CockroachDB, Riak, Redis Cluster, Dynamo&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Service discovery: Consul&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Blockchains: Hyperledger Fabric, Bitcoin&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cloud storage: Amazon S3&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Other uses: failure detection, leader election, load tracking&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Advantages&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Scalable – convergence in logarithmic time&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Fault tolerant – resilient to crashes, partitions, and message loss&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Robust – node failures don’t disrupt the system&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Convergent consistency – state spreads quickly&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Decentralized – no single point of failure&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Simple – easy to implement with little code&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Bounded load – predictable and low overhead&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Disadvantages&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Eventually consistent – updates spread probabilistically&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Partition unawareness – subclusters gossip independently during network splits&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Bandwidth usage – possible duplicate retransmissions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Latency – tied to gossip intervals&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Hard to debug – non-determinism complicates testing&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Scalability limits – membership tracking can be costly&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Vulnerable to malicious nodes – unless messages are verified&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Summary&lt;/p&gt;

&lt;p&gt;The gossip protocol is a lightweight, resilient, and scalable communication technique inspired by how rumors spread.&lt;/p&gt;

&lt;p&gt;It has become a core building block of large-scale distributed systems like Amazon Dynamo, Cassandra, and Bitcoin, supporting failure detection, replication, metadata exchange, and the propagation of data that consensus depends on.&lt;/p&gt;

&lt;p&gt;Simply put: Gossiping in distributed systems is a boon, while gossiping in real life might be a curse.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>webdev</category>
      <category>beginners</category>
      <category>softwaredevelopment</category>
    </item>
  </channel>
</rss>
