<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Faisal Dilawar</title>
    <description>The latest articles on Forem by Faisal Dilawar (@mfdilawar).</description>
    <link>https://forem.com/mfdilawar</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2913958%2F5afc6bda-60b2-4182-a749-6ea5118310cf.jpg</url>
      <title>Forem: Faisal Dilawar</title>
      <link>https://forem.com/mfdilawar</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/mfdilawar"/>
    <language>en</language>
    <item>
      <title>Data Security Fundamentals: A Developer's Guide from Principles to Production</title>
      <dc:creator>Faisal Dilawar</dc:creator>
      <pubDate>Thu, 09 Apr 2026 03:19:24 +0000</pubDate>
      <link>https://forem.com/mfdilawar/data-security-fundamentals-a-developers-guide-from-principles-to-production-363e</link>
      <guid>https://forem.com/mfdilawar/data-security-fundamentals-a-developers-guide-from-principles-to-production-363e</guid>
      <description>&lt;h2&gt;
  
  
  The Grim Reality
&lt;/h2&gt;

&lt;p&gt;Let's start with the uncomfortable truth: data breaches aren't theoretical risks that happen to "other people or companies". They're devastating realities that have destroyed businesses, money, and user trust. Here are four cautionary tales every developer should know.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Sony Pictures (2014)&lt;/strong&gt;: The Plain Text Disaster&lt;br&gt;
&lt;strong&gt;Sony Pictures&lt;/strong&gt; stored passwords and private encryption keys in plain text files and spreadsheets. Yup! When attackers gained access, they didn't need to crack anything, just open a CSV file.&lt;br&gt;
&lt;strong&gt;The damage:&lt;/strong&gt; Massive data exposure, embarrassing internal emails leaked publicly, and a security reputation that took years to rebuild. Costs were estimated at over $100 million in remediation, legal fees, and lost business.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Heartbleed (2014)&lt;/strong&gt;: The Tiny Bug with Massive Impact&lt;br&gt;
A minor coding error in the OpenSSL encryption library—&lt;strong&gt;just a missing bounds check&lt;/strong&gt;—allowed attackers to read server memory. This meant they could extract encryption keys, passwords, and sensitive data from millions of servers worldwide.&lt;br&gt;
&lt;strong&gt;The damage:&lt;/strong&gt; Affected approximately 17% of all secure web servers globally (around 500,000 servers). The bug had existed for two years before discovery, meaning countless credentials and keys were potentially compromised. Companies spent millions patching systems, rotating certificates, and forcing password resets. The reputational damage to OpenSSL and affected organizations was immeasurable.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Code Spaces (2014)&lt;/strong&gt;: The Single Point of Failure&lt;br&gt;
&lt;strong&gt;Code Spaces&lt;/strong&gt;, a source code hosting company, stored everything—including their encryption keys—with a single cloud service provider. When an attacker gained access to their AWS console, they had complete control. The attacker deleted backups, destroyed data, and held the company hostage.&lt;br&gt;
&lt;strong&gt;The damage:&lt;/strong&gt; Code Spaces shut down permanently. The company couldn't recover. Their customers lost access to their repositories. Years of business building, gone in hours. This wasn't just a security failure; it was a business extinction event.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Equifax (2017)&lt;/strong&gt;: The Unpatched Vulnerability&lt;br&gt;
&lt;strong&gt;Equifax&lt;/strong&gt; failed to encrypt personal information for 147 million people and left a known vulnerability in the Apache Struts web framework unpatched for months after the fix was available. Attackers exploited this gap and walked away with Social Security numbers, birth dates, addresses, and driver's license numbers.&lt;br&gt;
&lt;strong&gt;The damage:&lt;/strong&gt; The breach cost Equifax over $1.4 billion in remediation and settlements. Their CEO resigned. The company's stock plummeted. But the real victims were the 147 million people whose personal information—data that can't be changed like a password—was permanently compromised. Identity theft risks that will follow them for life.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why This Matters to You
&lt;/h2&gt;

&lt;p&gt;If you're reading this thinking "but it hasn't happened to me", you're missing the point. These were major corporations with security budgets and dedicated InfoSec teams. They failed because somewhere in the chain, developers made architectural decisions that created vulnerabilities.&lt;/p&gt;

&lt;p&gt;Here's the uncomfortable truth: &lt;strong&gt;Security isn't just for the InfoSec team.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;As developers, we handle the actual data path—the flow, storage, and transformation of sensitive information. We build the doors. Every API endpoint, database connection, and file system interaction is a door we create. We're responsible for securing them properly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Defense in Depth Starts Here
&lt;/h3&gt;

&lt;p&gt;Layered security begins with our code. Network controls and firewalls are important, but they're not enough if our implementation is weak. If an attacker bypasses authentication and reaches your database, what's protecting the data? If someone gains access to your server, are your encryption keys sitting in environment variables, easily readable?&lt;/p&gt;

&lt;p&gt;The breaches above happened because someone, somewhere, made a decision:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Let's just put the keys in a spreadsheet for now"&lt;/li&gt;
&lt;li&gt;"We'll patch that vulnerability next sprint"&lt;/li&gt;
&lt;li&gt;"One cloud provider is fine, they're the best"&lt;/li&gt;
&lt;li&gt;"Encryption is too complex, we'll add it later"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those decisions had consequences. Your decisions will too.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding the Basics: Key Terms
&lt;/h2&gt;

&lt;p&gt;Before we dive into security strategies, let's establish a common vocabulary. These terms get thrown around interchangeably, but the distinctions matter.&lt;/p&gt;

&lt;h3&gt;
  
  
  Encryption vs. Encoding
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fggh6wmtp4rn1mbik1m85.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fggh6wmtp4rn1mbik1m85.png" alt="Encryption vs Encoding" width="800" height="436"&gt;&lt;/a&gt;&lt;em&gt;Figure: Encryption vs Encoding&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Encryption&lt;/strong&gt; is hiding data to prevent unauthorized access. It's like placing your data behind a strong lock that requires a specific key to open.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Encoding&lt;/strong&gt; is converting data from one format to another for system compatibility. It's transformation, not protection: anyone can decode it without a key. Base64 is a common example.&lt;/p&gt;
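&lt;p&gt;To make the distinction concrete, here is a minimal Python sketch showing that Base64 output can be reversed by anyone, with no key involved. The sample value is made up for illustration.&lt;/p&gt;

```python
import base64

# Hypothetical sensitive value, used only for this demonstration.
secret = "card=4111-1111-1111-1111"

# Encoding is a reversible format change that needs no key.
encoded = base64.b64encode(secret.encode("utf-8"))

# Anyone who sees the encoded bytes can reverse them with a standard library call.
decoded = base64.b64decode(encoded).decode("utf-8")

print(encoded)
print(decoded == secret)
```

&lt;p&gt;If a value is only Base64-encoded in a database or a log file, treat it as stored in plain text.&lt;/p&gt;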

&lt;h3&gt;
  
  
  Encryption at Rest vs. In Transit
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;In Transit:&lt;/strong&gt; Data moving over networks between systems. This is protected by TLS/SSL protocols during transmission—your HTTPS connections, API calls between services, database connections over the network.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;At Rest:&lt;/strong&gt; Data sitting on disk, in databases, or backup storage. This requires encryption as the final defense line—your database tables, log files, backups, cached data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why both matter:&lt;/strong&gt; TLS protects data while it's moving, but once it reaches the server and gets written to disk, that protection ends. If an attacker bypasses authentication and gains access to your database files or backups, only encryption at rest stands between them and readable data.&lt;br&gt;
Network controls like firewalls aren't enough. If an attacker gets through, encryption is the last line of defense for your users' data.&lt;/p&gt;
&lt;h2&gt;
  
  
  The 5 Levels of Encryption Security Maturity
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo26kzp23jms7rf45zioe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo26kzp23jms7rf45zioe.png" alt="Security Maturity Levels Pyramid" width="800" height="436"&gt;&lt;/a&gt;&lt;em&gt;Figure: Security Maturity Levels Pyramid&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Not all data requires the same level of protection, and not all organizations have the same operational capacity. Security is a spectrum, and understanding where you fall—and where you &lt;em&gt;should&lt;/em&gt; fall—is critical.&lt;/p&gt;

&lt;p&gt;Here's a broad classification of data security levels, progressing from highly insecure to advanced security postures:&lt;/p&gt;
&lt;h3&gt;
  
  
  Level 1: Hardcoded Keys
&lt;/h3&gt;

&lt;p&gt;Keys embedded directly in source code. Highly insecure—anyone with code access has the keys.&lt;br&gt;
&lt;strong&gt;When this might be acceptable:&lt;/strong&gt; Temporary files, non-sensitive development data, throwaway prototypes that will never see production. Even then, it's risky.&lt;/p&gt;
&lt;h3&gt;
  
  
  Level 2: Environment Variables
&lt;/h3&gt;

&lt;p&gt;Keys stored on-host in environment variables. Better than hardcoding, but still accessible to anyone with server access.&lt;br&gt;
&lt;strong&gt;When this might be acceptable:&lt;/strong&gt; Internal tools with limited access, development environments, low-sensitivity data where the risk of exposure is minimal and the impact is contained.&lt;/p&gt;
&lt;h3&gt;
  
  
  Level 3: Secrets Management
&lt;/h3&gt;

&lt;p&gt;Centralized systems like HashiCorp Vault or AWS Secrets Manager. Keys stored separately, access controlled, audit trails maintained.&lt;br&gt;
&lt;strong&gt;When this is necessary:&lt;/strong&gt; Any user PII (personally identifiable information), business-critical data, anything subject to regulatory compliance (GDPR, HIPAA, PCI-DSS). &lt;strong&gt;This is the minimum acceptable baseline for sensitive data.&lt;/strong&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Level 4: Envelope Encryption
&lt;/h3&gt;

&lt;p&gt;Data encrypted with data keys (DEKs), which are themselves encrypted by master keys (KEKs). Limits blast radius of key compromise.&lt;br&gt;
&lt;strong&gt;When this is necessary:&lt;/strong&gt; Financial services, healthcare records, highly regulated industries, any scenario where a single key compromise could expose massive amounts of sensitive data. Banking and fintech typically operate here.&lt;/p&gt;
&lt;h3&gt;
  
  
  Level 5: Zero-Trust Dynamic Keys
&lt;/h3&gt;

&lt;p&gt;Keys rotated automatically, short-lived credentials, assume breach mindset. Most secure but operationally complex.&lt;br&gt;
&lt;strong&gt;When this is necessary:&lt;/strong&gt; Government systems, defense contractors, cryptocurrency platforms, any system where the data is so sensitive that you must assume attackers are already inside your perimeter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The key insight:&lt;/strong&gt; Moving up this ladder increases security but also increases operational complexity and cost. The goal is to match your security level to your actual risk profile: don't over-engineer for trivial data, and don't under-secure critical information.&lt;/p&gt;
&lt;h2&gt;
  
  
  Choosing Your Approach: It's Not One-Size-Fits-All
&lt;/h2&gt;

&lt;p&gt;The answer to "which security level should I use?" is always: &lt;strong&gt;"It depends."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Security requirements vary based on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data sensitivity:&lt;/strong&gt; Is this public information, internal data, or deeply personal user data?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Regulatory compliance:&lt;/strong&gt; Are you subject to GDPR, HIPAA, PCI-DSS, or other regulations?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Threat model:&lt;/strong&gt; Who are your adversaries? Random hackers, organized crime, nation-states?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operational constraints:&lt;/strong&gt; What's your team's capacity? What's your budget? What's your scale?&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Key Compromise: When, Not If
&lt;/h3&gt;

&lt;p&gt;Here's the hard truth about key compromise: it's not theoretical, it's a reality. It doesn't just happen to "others". Being prepared isn't optional. IT IS MANDATORY.&lt;br&gt;
Your security analysis and setup must account for both the likelihood and the impact of compromise. Design systems that minimize damage even when keys are exposed.&lt;/p&gt;
&lt;h3&gt;
  
  
  Beyond the Single Strong Wall
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fufo08t38ctnnh0ygckiu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fufo08t38ctnnh0ygckiu.png" alt="Multi-layered Security" width="800" height="436"&gt;&lt;/a&gt;&lt;em&gt;Figure: Castle Defense (Multi-layered Security)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Effective security isn't a single strong wall. It should be a multi-layered mechanism requiring deep architectural thinking and continuous vigilance.&lt;/p&gt;

&lt;p&gt;Think of medieval castle defenses: they didn't just build one massive wall and call it secure. They built multiple walls, each protecting the next. They added moats, drawbridges, gates, towers, and inner keeps. Breaching one layer didn't compromise the whole castle. More importantly, they had a plan for what to do when a breach happened.&lt;/p&gt;

&lt;p&gt;Modern security demands the same intricate design. Each layer protects the next, and breaching one doesn't compromise the whole system. This is defense in depth:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Network firewalls (outer wall)&lt;/li&gt;
&lt;li&gt;Authentication and authorization (the gate)&lt;/li&gt;
&lt;li&gt;Application-level security (inner walls)&lt;/li&gt;
&lt;li&gt;Encryption at rest (the keep where the treasure is stored)&lt;/li&gt;
&lt;li&gt;Key management (the vault within the keep)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If an attacker gets through your firewall, your authentication should stop them. If they bypass authentication, encryption should protect the data. If they somehow get a key, envelope encryption limits what that key can decrypt.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Sample Challenge: Building a Secure Messaging Platform
&lt;/h2&gt;

&lt;p&gt;Now let's move from theory to practice. We're going to walk through a real-world scenario, making decisions and observing their effects.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Problem Statement
&lt;/h3&gt;

&lt;p&gt;You're building a secure messaging platform. Your requirements are:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;End-to-End Privacy:&lt;/strong&gt; Protect both message text and file attachments from unauthorized access at rest and in transit. Users trust you with deeply personal conversations—any leak is a total breach of that trust.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost-Effective Storage:&lt;/strong&gt; Leverage AWS S3 for scalable, economical object storage while maintaining security.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;High Sensitivity:&lt;/strong&gt; Messages are deeply personal. Unlike a data breach of email addresses (bad but recoverable), a breach of private messages can affect personal lives: medical discussions, confidential business negotiations, relationship conversations.&lt;/p&gt;

&lt;p&gt;How do you architect this system?&lt;/p&gt;
&lt;h2&gt;
  
  
  Understanding the Players: Advanced Key Management
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fop6zm46hmbihuubyjyxl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fop6zm46hmbihuubyjyxl.png" alt="Key Management Players" width="800" height="436"&gt;&lt;/a&gt;&lt;em&gt;Figure: Key Management Players&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Before we solve this problem, we need to understand a few things that make secure encryption at scale possible.&lt;/p&gt;
&lt;h3&gt;
  
  
  Key Vault
&lt;/h3&gt;

&lt;p&gt;A centralized key management service (AWS KMS, HashiCorp Vault) that stores and protects your most sensitive cryptographic keys. These systems are typically backed by Hardware Security Modules (HSMs): specialized, tamper-resistant hardware designed specifically for cryptographic operations.&lt;/p&gt;
&lt;h3&gt;
  
  
  KEK (Key Encryption Key / Master Key)
&lt;/h3&gt;

&lt;p&gt;The Key Encryption Key never leaves the vault. This is your most powerful credential—it encrypts other keys. No application code or user ever reads it. It lives in the HSM, protected by hardware-level security.&lt;/p&gt;
&lt;h3&gt;
  
  
  DEK (Data Encryption Key / Worker Key)
&lt;/h3&gt;

&lt;p&gt;The Data Encryption Key is for single-purpose use. These are short-lived keys that do the actual work of encrypting your application data, then get discarded. Your application uses these, not the master key.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Core Principle: Envelope Encryption
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo36owhlapkhg2bxhevof.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo36owhlapkhg2bxhevof.png" alt="Envelope Encryption Flow" width="800" height="436"&gt;&lt;/a&gt;&lt;em&gt;Figure: Envelope Encryption Flow&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Envelope encryption ensures your master key (KEK) never touches application servers, dramatically reducing attack surface. Here's how it works:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Your application requests a DEK from the vault&lt;/li&gt;
&lt;li&gt;The vault generates a random DEK and encrypts it with the KEK&lt;/li&gt;
&lt;li&gt;The vault returns both the plaintext DEK and the encrypted DEK to your application&lt;/li&gt;
&lt;li&gt;Your application uses the plaintext DEK to encrypt data&lt;/li&gt;
&lt;li&gt;Your application stores the encrypted data alongside the encrypted DEK&lt;/li&gt;
&lt;li&gt;Your application immediately wipes the plaintext DEK from memory&lt;/li&gt;
&lt;li&gt;When you need to decrypt, you send the encrypted DEK back to the vault&lt;/li&gt;
&lt;li&gt;The vault decrypts it with the KEK and returns the plaintext DEK&lt;/li&gt;
&lt;li&gt;You decrypt your data and immediately wipe the DEK again&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt; If an attacker compromises your application server, they never obtain the KEK itself. They can only unwrap DEKs for as long as they hold that server's vault access, and every such request can be logged, rate-limited, and revoked. Once you cut off that access, any encrypted DEKs they copied are useless, so the rest of your stored data remains protected.&lt;/p&gt;
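&lt;p&gt;The nine steps above can be sketched in a few lines of Python. This is an illustration under loud assumptions, not a production design: &lt;code&gt;ToyVault&lt;/code&gt; is a hypothetical in-process stand-in for a real key service, and &lt;code&gt;_keystream_xor&lt;/code&gt; is a toy cipher built from the standard library that stands in for a real authenticated cipher such as AES-GCM.&lt;/p&gt;

```python
import hashlib
import hmac
import secrets


def _keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (HMAC-SHA256 in counter mode). Stand-in for AES-GCM only."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hmac.new(key, nonce + counter, hashlib.sha256).digest()
        out.extend(a ^ b for a, b in zip(data[offset:offset + 32], block))
    return bytes(out)


class ToyVault:
    """Hypothetical vault: the KEK is created here and never returned to callers."""

    def __init__(self):
        self._kek = secrets.token_bytes(32)

    def generate_data_key(self):
        # Steps 2-3: create a random DEK and return it in plaintext and wrapped form.
        dek = secrets.token_bytes(32)
        nonce = secrets.token_bytes(16)
        return dek, nonce + _keystream_xor(self._kek, nonce, dek)

    def decrypt_data_key(self, wrapped: bytes) -> bytes:
        # Step 8: unwrap an encrypted DEK with the KEK.
        return _keystream_xor(self._kek, wrapped[:16], wrapped[16:])


def encrypt_message(vault: ToyVault, plaintext: bytes) -> dict:
    dek, wrapped_dek = vault.generate_data_key()          # step 1
    nonce = secrets.token_bytes(16)
    ciphertext = _keystream_xor(dek, nonce, plaintext)    # step 4
    del dek  # step 6: drops the reference only; Python cannot guarantee erasure
    return {"nonce": nonce, "ciphertext": ciphertext, "wrapped_dek": wrapped_dek}  # step 5


def decrypt_message(vault: ToyVault, record: dict) -> bytes:
    dek = vault.decrypt_data_key(record["wrapped_dek"])   # step 7
    plaintext = _keystream_xor(dek, record["nonce"], record["ciphertext"])  # step 9
    del dek
    return plaintext


vault = ToyVault()
message = b"meet me at the usual place at 6pm"
record = encrypt_message(vault, message)
restored = decrypt_message(vault, record)
```

&lt;p&gt;Notice that the stored record contains only the ciphertext and the wrapped DEK. Nothing in it is useful without a call back to the vault.&lt;/p&gt;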
&lt;h2&gt;
  
  
  The Data Encryption Lifecycle
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv0yuvwkis75a0kso0fnp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv0yuvwkis75a0kso0fnp.png" alt="Encryption Flow" width="800" height="436"&gt;&lt;/a&gt;&lt;em&gt;Figure: Data Encryption Lifecycle (Encryption Flow)&lt;/em&gt;&lt;br&gt;
&lt;em&gt;Request DEK → Vault generates &amp;amp; encrypts DEK with KEK → Returns plaintext + encrypted DEK → Encrypt data with plaintext DEK → Store encrypted data + encrypted DEK → Wipe plaintext DEK from memory&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqqt2ctk8vkfzqu72urut.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqqt2ctk8vkfzqu72urut.png" alt="Decryption Flow" width="800" height="436"&gt;&lt;/a&gt;&lt;em&gt;Figure: Data Decryption Lifecycle (Decryption Flow)&lt;/em&gt;&lt;br&gt;
&lt;em&gt;Retrieve encrypted data + encrypted DEK → Send encrypted DEK to vault → Vault decrypts with KEK → Returns plaintext DEK → Decrypt data → Wipe plaintext DEK from memory&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Developer responsibility:&lt;/strong&gt; The "wipe" step is critical. You must ensure plaintext keys don't linger in memory, logs, or error messages. A key accidentally logged during an error is a key that's compromised. Memory dumps during crashes can expose keys. Proper key hygiene is non-negotiable.&lt;/p&gt;
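&lt;p&gt;One way to reduce accidental leaks is to wrap key material in a type that refuses to print itself. The sketch below is a hypothetical helper, not a library API. Its &lt;code&gt;wipe&lt;/code&gt; method is best effort only, because a garbage-collected runtime like Python may keep copies of the bytes that the code cannot reach.&lt;/p&gt;

```python
class SecretKey:
    """Hypothetical wrapper that keeps key bytes out of logs and error messages."""

    def __init__(self, material: bytes):
        # A bytearray is mutable, so it can be overwritten later.
        self._material = bytearray(material)

    def __repr__(self) -> str:
        # Logging or formatting the object never reveals the key.
        return "SecretKey(REDACTED)"

    __str__ = __repr__

    def use(self) -> bytes:
        # Returns a copy for the cipher call; that copy is outside our control.
        return bytes(self._material)

    def wipe(self) -> None:
        # Best effort: overwrite this buffer with zeros.
        for i in range(len(self._material)):
            self._material[i] = 0


key = SecretKey(b"\x13\x37" * 16)
log_line = f"encrypting with {key}"
key.wipe()
```

&lt;p&gt;Languages with manual memory control can make stronger guarantees here. In any language, the cheaper win is usually making sure keys never reach log statements or exception messages in the first place.&lt;/p&gt;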
&lt;h2&gt;
  
  
  Key Rotation: The Mandatory Refresh Cycle
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft2hdi2dhedeeivs7ufst.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft2hdi2dhedeeivs7ufst.png" alt="Key Rotation Comparison" width="800" height="436"&gt;&lt;/a&gt;&lt;em&gt;Figure: Key Rotation Comparison&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Keys have lifespans. The longer a key exists, the more opportunities an attacker has to compromise it. Rotation limits credential lifespan—if a key is compromised today, rotation ensures it becomes useless tomorrow.&lt;/p&gt;
&lt;h3&gt;
  
  
  KEK Rotation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Handled by:&lt;/strong&gt; Vault infrastructure&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Frequency:&lt;/strong&gt; Annually or on compromise&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Impact:&lt;/strong&gt; Transparent to applications—the vault handles re-encryption of all DEKs internally.&lt;/p&gt;
&lt;h3&gt;
  
  
  DEK Rotation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Handled by:&lt;/strong&gt; Application code&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Frequency:&lt;/strong&gt; 30-90 days recommended&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Impact:&lt;/strong&gt; Requires re-encrypting data with new keys, tracking old keys for decryption&lt;/p&gt;

&lt;p&gt;DEK rotation is more complex. You need to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Generate new DEKs&lt;/li&gt;
&lt;li&gt;Re-encrypt data with the new DEKs&lt;/li&gt;
&lt;li&gt;Keep old DEKs available for decrypting data that hasn't been re-encrypted yet&lt;/li&gt;
&lt;li&gt;Track which DEK encrypted which data&lt;/li&gt;
&lt;li&gt;Eventually phase out old DEKs once all data is re-encrypted.&lt;/li&gt;
&lt;/ol&gt;
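&lt;p&gt;The five steps above can be sketched as a single rotation pass. Everything here is an assumption made for illustration: &lt;code&gt;xor_cipher&lt;/code&gt; is a placeholder for a real cipher, the keyring is a plain dictionary, and in a real system the keyring would hold vault-wrapped DEKs rather than plaintext ones.&lt;/p&gt;

```python
import secrets


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Placeholder cipher for illustration only; swap in a real AEAD such as AES-GCM."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def rotate_deks(records: list, keyring: dict) -> str:
    """Re-encrypt every record under a fresh DEK, then retire keys nothing references."""
    new_id = "dek-" + secrets.token_hex(4)
    keyring[new_id] = secrets.token_bytes(32)              # 1. generate a new DEK
    for record in records:
        old_key = keyring[record["dek_id"]]                # 3. old DEK still available
        plaintext = xor_cipher(old_key, record["ciphertext"])
        record["ciphertext"] = xor_cipher(keyring[new_id], plaintext)  # 2. re-encrypt
        record["dek_id"] = new_id                          # 4. track which DEK was used
    in_use = {record["dek_id"] for record in records}
    in_use.add(new_id)
    for key_id in list(keyring):
        if key_id not in in_use:
            del keyring[key_id]                            # 5. phase out unused DEKs
    return new_id


keyring = {"dek-old": secrets.token_bytes(32)}
records = [
    {"dek_id": "dek-old", "ciphertext": xor_cipher(keyring["dek-old"], b"first message")},
    {"dek_id": "dek-old", "ciphertext": xor_cipher(keyring["dek-old"], b"second message")},
]
new_id = rotate_deks(records, keyring)
```

&lt;p&gt;A production version would process records in batches and tolerate interruption, which is exactly why step 3 (keeping old DEKs around) matters.&lt;/p&gt;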
&lt;h2&gt;
  
  
  Situation 1: Low Scale Foundation (~1,000 messages/day)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fep0r6xa3d0vlpjq1xr12.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fep0r6xa3d0vlpjq1xr12.png" alt="Situation 1 Architecture (Low Scale)" width="800" height="436"&gt;&lt;/a&gt;&lt;em&gt;Figure: Situation 1 Architecture (Low Scale)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You're just launching. You have about 1,000 messages per day. How do you architect encryption?&lt;/p&gt;
&lt;h3&gt;
  
  
  The Problem
&lt;/h3&gt;

&lt;p&gt;Minimize blast radius—each compromised key should expose minimal data. If an attacker gets one key, you want them to decrypt as few messages as possible.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Solution
&lt;/h3&gt;

&lt;p&gt;Generate a unique DEK per message. Store the encrypted DEK in S3 metadata alongside the encrypted content.&lt;/p&gt;

&lt;p&gt;Here's the flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User sends a message&lt;/li&gt;
&lt;li&gt;Your application requests a DEK from the vault&lt;/li&gt;
&lt;li&gt;Encrypt the message with the DEK&lt;/li&gt;
&lt;li&gt;Encrypt the DEK with the KEK (vault does this)&lt;/li&gt;
&lt;li&gt;Store the encrypted message in S3&lt;/li&gt;
&lt;li&gt;Store the encrypted DEK in the S3 object's metadata&lt;/li&gt;
&lt;li&gt;Wipe the plaintext DEK from memory&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Why this works:&lt;/strong&gt; If a single DEK is compromised, only one message is exposed. The blast radius is minimal.&lt;/p&gt;
&lt;h3&gt;
  
  
  The New Problem
&lt;/h3&gt;

&lt;p&gt;This works beautifully... until it doesn't.&lt;/p&gt;

&lt;p&gt;Your app goes viral. Suddenly you're at 10,000 messages per day. Then 100,000. Each message requires a vault API call to generate a DEK. Vault services charge per API call.&lt;/p&gt;

&lt;p&gt;At 1,000 messages daily, the cost is negligible: a few tens of dollars a month at most. But at 100,000 messages per day, you're making 3 million vault API calls per month. At an illustrative rate of $0.001 per call (actual pricing varies by provider), your security bill is now about $3,000/month and climbing. And you're hitting API rate limits that throttle your application's performance.&lt;/p&gt;
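&lt;p&gt;The arithmetic is easy to reproduce. The sketch below assumes one vault call per message, a 30-day month, and the rate of $0.001 per call implied by the figure of $3,000 for 3 million calls. That rate is an illustration, not any provider's actual price.&lt;/p&gt;

```python
ASSUMED_PRICE_PER_CALL = 0.001  # illustrative USD rate, not a real provider's price
DAYS_PER_MONTH = 30


def monthly_vault_cost(messages_per_day: int, price_per_call: float = ASSUMED_PRICE_PER_CALL) -> float:
    """One vault call per message, so cost scales linearly with traffic."""
    calls_per_month = messages_per_day * DAYS_PER_MONTH
    return calls_per_month * price_per_call


for volume in (1_000, 10_000, 100_000):
    print(volume, round(monthly_vault_cost(volume), 2))
```

&lt;p&gt;Plug in your own provider's price and traffic numbers before drawing conclusions. The point is the linear growth, not the specific dollar amounts.&lt;/p&gt;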

&lt;p&gt;Your security architecture that was perfect at low scale is now a liability.&lt;/p&gt;
&lt;h2&gt;
  
  
  Situation 2: Scaling the Wall (1,000 requests/second)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fptps8ah3qs6c4x09z79x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fptps8ah3qs6c4x09z79x.png" alt="Situation 2 Architecture (Scaling)" width="800" height="436"&gt;&lt;/a&gt;&lt;em&gt;Figure: Situation 2 Architecture (Scaling)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You're successful. You're now handling 1,000 requests per second. That's 86.4 million messages per day.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Problem
&lt;/h3&gt;

&lt;p&gt;1,000 req/sec creates massive vault bills and API rate limits that throttle performance. The per-message DEK approach is financially and operationally unsustainable.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Solution: The Pragmatism Pivot
&lt;/h3&gt;

&lt;p&gt;Cache a single DEK for 1-hour windows. All messages sent within that hour share one key—dramatically reducing vault calls.&lt;/p&gt;

&lt;p&gt;Instead of 86.4 million vault calls per day, you make 24. At the illustrative $0.001 per call implied by the earlier figures, that is roughly $86,000 per day in vault calls reduced to a few cents. Throttling disappears.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This is the "juice vs. squeeze" decision in action.&lt;/strong&gt; You're trading perfect security (one key per message) for operational feasibility (one key per hour).&lt;/p&gt;
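&lt;p&gt;A time-windowed cache like this can be sketched in a few lines. The class and parameter names below are hypothetical, and &lt;code&gt;fetch_dek&lt;/code&gt; stands in for whatever call your vault client exposes. The clock is injectable so the behavior can be checked without waiting an hour.&lt;/p&gt;

```python
import secrets
import time


class HourlyDekCache:
    """Reuses one DEK per time window; fetch_dek stands in for a vault call."""

    def __init__(self, fetch_dek, window_seconds: int = 3600, clock=time.time):
        self._fetch_dek = fetch_dek
        self._window_seconds = window_seconds
        self._clock = clock
        self._window = None
        self._dek = None
        self.vault_calls = 0

    def current_dek(self):
        window = int(self._clock() // self._window_seconds)
        if window != self._window:
            # First use, or the window rolled over: ask the vault for a new DEK.
            self._dek = self._fetch_dek()
            self._window = window
            self.vault_calls += 1
        return self._dek


# Simulate two hours of traffic with a fake clock, one message per minute.
now = [0.0]
cache = HourlyDekCache(lambda: secrets.token_bytes(32), clock=lambda: now[0])
seen_keys = set()
for second in range(0, 7200, 60):
    now[0] = float(second)
    seen_keys.add(cache.current_dek())
print(cache.vault_calls)
```

&lt;p&gt;The cost of this design is visible in the code: the plaintext DEK now lives in process memory for the whole window instead of being wiped after each message.&lt;/p&gt;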
&lt;h3&gt;
  
  
  The New Problem: Blast Radius
&lt;/h3&gt;

&lt;p&gt;Your blast radius just exploded. If a single hourly key is compromised, an attacker can decrypt every message sent during that hour.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Before:&lt;/strong&gt; 1 compromised key = 1 message exposed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Now:&lt;/strong&gt; 1 compromised key = 3.6 million messages exposed&lt;/li&gt;
&lt;/ul&gt;
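&lt;p&gt;The trade-off can be put in numbers. This small sketch, using a hypothetical helper name, shows how the window length moves exposure per key and vault calls per day in opposite directions at 1,000 requests per second.&lt;/p&gt;

```python
SECONDS_PER_DAY = 86_400


def key_window_tradeoff(requests_per_second: int, window_seconds: int) -> dict:
    """Worst-case messages exposed per leaked DEK versus vault calls needed per day."""
    return {
        "messages_per_key": requests_per_second * window_seconds,
        "vault_calls_per_day": SECONDS_PER_DAY // window_seconds,
    }


for label, window in (("1 hour", 3600), ("10 minutes", 600), ("1 minute", 60)):
    print(label, key_window_tradeoff(1_000, window))
```

&lt;p&gt;Shorter windows shrink the blast radius roughly in proportion while vault traffic stays tiny compared with one call per message, so the one-hour window is a starting point rather than a fixed rule.&lt;/p&gt;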

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2fgu98bm4yc18nbjkq9u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2fgu98bm4yc18nbjkq9u.png" alt="Blast Radius Comparison" width="800" height="436"&gt;&lt;/a&gt;&lt;em&gt;Figure: Blast Radius Comparison&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Is this acceptable? It depends on your operational capacity and the kind of data you are working with.&lt;/p&gt;
&lt;h3&gt;
  
  
  Rotation Cost Analysis
&lt;/h3&gt;

&lt;p&gt;Key rotation becomes complex. If you need to rotate a compromised hourly key, you must:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Identify every message encrypted with that key&lt;/li&gt;
&lt;li&gt;Re-encrypt 3.6 million messages&lt;/li&gt;
&lt;li&gt;Do this without taking your service offline&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Without proper indexing, identifying which S3 objects used which key becomes a nightmare, because S3 offers no efficient way to search objects by their metadata.&lt;/p&gt;

&lt;p&gt;This is where architectural decisions start cascading into other systems.&lt;/p&gt;
&lt;h2&gt;
  
  
  Situation 3: The Searchability Trap (Massive Scale)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F31u6dhda73ypiv26wa5h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F31u6dhda73ypiv26wa5h.png" alt="Situation 3 Architecture (Massive Scale with Mapping)" width="800" height="436"&gt;&lt;/a&gt;&lt;em&gt;Figure: Situation 3 Architecture (Massive Scale with Mapping)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You're now at massive scale. Millions of users, billions of messages. One day, you detect suspicious activity. A DEK might be compromised.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Problem: Incident Response Paralysis
&lt;/h3&gt;

&lt;p&gt;A DEK is compromised, but S3 metadata isn't searchable at scale. How do you quickly identify which files need re-encryption?&lt;/p&gt;

&lt;p&gt;You can't iterate through billions of S3 objects checking metadata. That would take days or weeks, so you can't rotate the key promptly either. During that time, the compromised data remains vulnerable.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Solution: Mapping Infrastructure
&lt;/h3&gt;

&lt;p&gt;Build a database table linking S3 object paths to their DEK identifiers, enabling rapid queries during security incidents.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;message_encryption_map
- message_id (primary key)
- s3_object_path
- dek_id
- encrypted_at (timestamp)
- key_rotation_status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now when a DEK is compromised, you can query: "Give me all messages encrypted with DEK-12345" and get instant results. You can prioritize re-encryption, track progress, and complete the rotation in hours instead of weeks.&lt;/p&gt;
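
&lt;p&gt;Here is a minimal sketch of that lookup in Python, using the standard-library &lt;code&gt;sqlite3&lt;/code&gt; module as a stand-in for whichever database you end up picking. Only the column names come from the table above; the index name, the sample paths, the &lt;code&gt;pending&lt;/code&gt; status value and the two helper functions are assumptions made for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import sqlite3

# Hypothetical schema mirroring the message_encryption_map table above.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE message_encryption_map (
        message_id TEXT PRIMARY KEY,
        s3_object_path TEXT NOT NULL,
        dek_id TEXT NOT NULL,
        encrypted_at TEXT NOT NULL,
        key_rotation_status TEXT NOT NULL DEFAULT 'current'
    )
    """
)
# The index on dek_id turns "which objects used this key?" into a lookup
# instead of a full table scan.
conn.execute("CREATE INDEX idx_map_dek ON message_encryption_map (dek_id)")

rows = [
    ("msg-1", "messages/2026/01/msg-1.bin", "DEK-12345", "2026-01-05T10:00:00Z"),
    ("msg-2", "messages/2026/01/msg-2.bin", "DEK-12345", "2026-01-06T11:30:00Z"),
    ("msg-3", "messages/2026/02/msg-3.bin", "DEK-67890", "2026-02-01T09:15:00Z"),
]
conn.executemany(
    "INSERT INTO message_encryption_map "
    "(message_id, s3_object_path, dek_id, encrypted_at) VALUES (?, ?, ?, ?)",
    rows,
)


def objects_for_dek(connection, dek_id):
    """Return the S3 paths of every object encrypted with the given DEK."""
    cursor = connection.execute(
        "SELECT s3_object_path FROM message_encryption_map "
        "WHERE dek_id = ? ORDER BY encrypted_at",
        (dek_id,),
    )
    return [path for (path,) in cursor]


def mark_pending(connection, dek_id):
    """Flag every mapping for a compromised DEK so workers can track progress."""
    connection.execute(
        "UPDATE message_encryption_map SET key_rotation_status = 'pending' "
        "WHERE dek_id = ?",
        (dek_id,),
    )


affected = objects_for_dek(conn, "DEK-12345")
mark_pending(conn, "DEK-12345")
print(affected)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The index on &lt;code&gt;dek_id&lt;/code&gt; is the part that matters: without it, the database has to scan every mapping, which is the same problem you had with S3 metadata.&lt;/p&gt;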

&lt;h3&gt;
  
  
  The New Problem: Database Selection
&lt;/h3&gt;

&lt;p&gt;Which database handles 1,000 writes/sec during rotation without incurring prohibitive I/O costs?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rotation cost:&lt;/strong&gt; High I/O expenses for scanning or bulk-updating mappings across millions of records. You're now spending significant engineering time and infrastructure cost just to maintain the &lt;em&gt;ability&lt;/em&gt; to rotate keys.&lt;/p&gt;
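
&lt;p&gt;To get a feel for the numbers, here is a back-of-the-envelope sketch. It assumes exactly one mapping update per re-encrypted record and a sustained 1,000 writes/sec. Both are simplifications: a real rotation also pays for reads, retries, and throttling. The function name and the one-billion record count are illustrative.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def rotation_duration_seconds(record_count, writes_per_second):
    """Time needed to update every mapping at a sustained write rate."""
    return record_count / writes_per_second


# Record counts are illustrative: the earlier 3.6 million messages, and one billion.
for label, records in (("3.6 million", 3_600_000), ("1 billion", 1_000_000_000)):
    seconds = rotation_duration_seconds(records, 1_000)
    print(f"{label} records: {seconds / 3600:.1f} hours ({seconds / 86400:.1f} days)")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Under those assumptions, 3.6 million records take about an hour, while a billion take roughly 11.6 days. That gap is why the write throughput of the mapping store becomes a first-class design concern.&lt;/p&gt;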

&lt;p&gt;Every DB comes with its own pros and cons. &lt;strong&gt;PostgreSQL:&lt;/strong&gt; Great for complex queries, but write-heavy workloads at this scale get expensive. &lt;strong&gt;DynamoDB:&lt;/strong&gt; Optimized for high-throughput writes, but limited query flexibility. &lt;strong&gt;Cassandra:&lt;/strong&gt; Excellent for write-heavy workloads and horizontal scaling, but operationally complex to manage.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Broader Implications: Advanced Data Management
&lt;/h3&gt;

&lt;p&gt;Notice how a security decision (key rotation requirements) has now forced you to make data architecture decisions. A few examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Database selection:&lt;/strong&gt; Evaluating PostgreSQL vs. DynamoDB vs. Aurora for different workloads&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Leveraging S3:&lt;/strong&gt; Exploring S3 tables for analytics, cold storage, and data lake integration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Archiving strategies:&lt;/strong&gt; Designing efficient methods for archiving data from PostgreSQL to S3 while maintaining integrity and accessibility&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid approaches:&lt;/strong&gt; Considering hybrid data storage solutions to balance performance, cost, and security&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data lifecycle management:&lt;/strong&gt; Implementing processes for cleaning up PostgreSQL records after corresponding object deletions to ensure consistency&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Object updates:&lt;/strong&gt; Addressing the complexities of updating encrypted objects and their associated key metadata&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Search limitations:&lt;/strong&gt; Strategies for restricted searchability on encrypted data without compromising end-to-end encryption principles&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Security isn't isolated from the rest of your architecture. Your encryption strategy ripples through your entire data management approach. This is why security decisions need to be made early and with full awareness of their downstream implications.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Nuclear Option: KEK Compromise
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvgb1zb9s5331o82kygsf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvgb1zb9s5331o82kygsf.png" alt="KEK Compromise Impact Visualization" width="800" height="436"&gt;&lt;/a&gt;&lt;em&gt;Figure: KEK Compromise Impact Visualization&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Let's talk about the worst-case scenario: your master key (KEK) gets compromised.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why This Matters
&lt;/h3&gt;

&lt;p&gt;Remember, the KEK encrypts all your DEKs. If an attacker gets the KEK, they can decrypt every DEK you've ever created. Every message, every file, every piece of encrypted data in your system is now exposed.&lt;/p&gt;

&lt;h3&gt;
  
  
  How This Could Happen
&lt;/h3&gt;

&lt;p&gt;KEKs are stored in hardened vaults with HSM backing, but compromise is still possible through an insider threat, a vault provider breach, a misconfiguration, or even a supply chain attack.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Recovery Process
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Detect the compromise:&lt;/strong&gt; Hopefully through monitoring and audit logs, not through data showing up on the dark web&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generate a new KEK:&lt;/strong&gt; The vault creates a fresh master key&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Re-encrypt every DEK:&lt;/strong&gt; Every single DEK in your system must be re-encrypted with the new KEK&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rotate all DEKs:&lt;/strong&gt; Since the old KEK was compromised, you can't trust any DEK it encrypted&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Re-encrypt all data:&lt;/strong&gt; Every message, every file, everything must be re-encrypted with new DEKs&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  The Cost
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Computational resources:&lt;/strong&gt; Re-encrypting billions of objects requires massive compute. You're spinning up hundreds of workers and running them for days, weeks, or even months.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Storage I/O:&lt;/strong&gt; Reading and writing billions of objects generates enormous I/O costs. S3 charges for requests, and you're making billions of them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Engineering time:&lt;/strong&gt; Your entire team drops everything to manage this crisis. Weeks or months of productivity lost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Downtime:&lt;/strong&gt; Depending on your architecture, you might need to take services offline or operate in degraded mode during re-encryption.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Business impact:&lt;/strong&gt; Users can't access messages during re-encryption. Customer support is overwhelmed. Trust is shattered.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Total cost:&lt;/strong&gt; Depending on your scale, the direct cost (compute, storage, engineering time) could run into the millions, in addition to lost business and reputational damage.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Permanent Damage
&lt;/h3&gt;

&lt;p&gt;Even after spending all this money and effort, the data that was accessed during the compromise is gone. If an attacker extracted messages before you detected the breach, those messages are compromised forever. No amount of money or engineering effort can undo that.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why We Pay for Hardened Vaults
&lt;/h3&gt;

&lt;p&gt;This catastrophic scenario explains why enterprise-grade vaults with HSM backing command premium pricing. The cost of the vault is insurance against the cost of KEK compromise.&lt;/p&gt;

&lt;p&gt;A vault bill in the thousands seems expensive until you compare it to the millions in recovery costs plus permanent reputational damage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Strategic Considerations: Your Security Cheat Sheet
&lt;/h2&gt;

&lt;p&gt;After walking through the messaging platform evolution, here are the key principles to guide your security decisions:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Prepare for Eventualities
&lt;/h3&gt;

&lt;p&gt;What happens if a key is compromised? What if data is exposed? Do you need recovery capabilities? Plan for worst-case scenarios.&lt;/p&gt;

&lt;p&gt;Don't just have a theoretical incident response plan. Actually test it. Can you execute a key rotation under pressure? Do you have the infrastructure to re-encrypt data quickly? Have you practiced the runbook?&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Define Blast Radius
&lt;/h3&gt;

&lt;p&gt;How much damage is acceptable during a breach? Limit the scope of potential compromise.&lt;/p&gt;

&lt;p&gt;Design your system so that the attacker needs to work for every piece of data.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Runbooks Are Vital
&lt;/h3&gt;

&lt;p&gt;Avoid "headless chicken" mode during incidents. Document response procedures, rotation steps, and recovery processes.&lt;/p&gt;

&lt;p&gt;Your runbook should include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How to detect a compromise&lt;/li&gt;
&lt;li&gt;Who to notify and in what order&lt;/li&gt;
&lt;li&gt;Step-by-step rotation procedures&lt;/li&gt;
&lt;li&gt;Scripts and tools for bulk operations&lt;/li&gt;
&lt;li&gt;Communication templates for users&lt;/li&gt;
&lt;li&gt;Post-incident review process&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Test your runbook regularly. &lt;strong&gt;A runbook that's never been executed is just wishful thinking.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Think Like a Thief
&lt;/h3&gt;

&lt;p&gt;Adopt an attacker's perspective. How would you break into your own system? Where are the weak points?&lt;/p&gt;

&lt;p&gt;Conduct threat modeling exercises:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What's the most valuable data in your system?&lt;/li&gt;
&lt;li&gt;What's the easiest way to access it?&lt;/li&gt;
&lt;li&gt;What would you do if you compromised a developer's laptop?&lt;/li&gt;
&lt;li&gt;What if you got access to the production database?&lt;/li&gt;
&lt;li&gt;What if you social-engineered your way into the vault?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Find your vulnerabilities before attackers do.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Pragmatism: Juice vs. Squeeze
&lt;/h3&gt;

&lt;p&gt;Don't over-engineer for non-sensitive data. Don't destroy SLAs with complexity. Don't build unfeasible solutions. Balance security with operational reality. Temp files don't need envelope encryption. Users' private messages do.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. The Security Baseline
&lt;/h3&gt;

&lt;p&gt;For any sensitive data, start at Level 3 minimum (Centralized Secrets Management). Anything lower requires documented justification.&lt;br&gt;
"It's too complex" isn't a justification. "We don't have time" isn't a justification. "It's too expensive" might be, but you need to quantify the cost of the security measure vs. the cost of a breach.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion: Security as an Ongoing Conversation
&lt;/h2&gt;

&lt;p&gt;You now have the framework to make informed security decisions. You understand the fundamentals, the maturity levels, the trade-offs, and the real-world implications of your choices.&lt;/p&gt;

&lt;p&gt;But here's the final truth: security is never "done."&lt;/p&gt;

&lt;p&gt;Security is an ongoing conversation between architecture and operational reality. The "perfect" system today might be your biggest vulnerability in two years.&lt;/p&gt;

&lt;p&gt;Your job as a developer isn't to achieve perfect security—it's to make informed trade-offs, build defense in depth, plan for compromise, and continuously adapt as your system evolves.&lt;/p&gt;

&lt;p&gt;You own the data path. You build the doors. Lock them well, but know that locks can be picked. Build multiple doors, multiple locks, and have a plan for when someone gets through.&lt;/p&gt;

&lt;p&gt;Make better decisions.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;About the Author: Faisal Dilawar is a Lead Technology Consultant at Technogise with experience building secure, scalable systems.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>security</category>
      <category>encryption</category>
      <category>architecture</category>
      <category>devops</category>
    </item>
    <item>
      <title>Investigating Performance Issues In A Library project</title>
      <dc:creator>Faisal Dilawar</dc:creator>
      <pubDate>Tue, 07 Apr 2026 10:06:47 +0000</pubDate>
      <link>https://forem.com/mfdilawar/investigating-performance-issues-in-a-library-project-27o2</link>
      <guid>https://forem.com/mfdilawar/investigating-performance-issues-in-a-library-project-27o2</guid>
      <description>&lt;p&gt;Part 2 of 2: This piece covers library projects. &lt;a href="https://dev.to/mfdilawar/-investigating-performance-issues-in-an-existing-system-101-7l6"&gt;Part 1&lt;/a&gt; covers deployed applications and services, which come with a different set of constraints.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Fundamental Difference
&lt;/h2&gt;

&lt;p&gt;In &lt;a href="https://dev.to/mfdilawar/-investigating-performance-issues-in-an-existing-system-101-7l6"&gt;Part 1&lt;/a&gt;, we talked about investigating performance in a deployed system, one where we control the runtime and the monitoring and can trace requests end to end.&lt;/p&gt;

&lt;p&gt;Libraries are different beasts altogether. We ship code. Someone else runs it.&lt;/p&gt;

&lt;p&gt;We don't control the thread pool size, the hardware, or how many times our function gets called. We don't have dashboards. Usually we don't have logs. And the person filing the bug report often says something along the lines of "your library is slow", with no reproducible scenario, no profiler output, and no context about how they're using it.&lt;/p&gt;

&lt;p&gt;This is the library performance problem. And it requires a different mindset.&lt;/p&gt;

&lt;h2&gt;
  
  
  Don't try to fix it....
&lt;/h2&gt;

&lt;p&gt;Before we go any further, I want to put this out there: if you don't own the library code, have no access to an SME, and on top of that have no access to production data such as logs and monitoring, then &lt;strong&gt;don't attempt to fix the performance issues.&lt;/strong&gt; You will most probably fail to find and fix the root cause.&lt;br&gt;
If you are under pressure and have to stop the bleeding without the tools above, this article won't help you. Say a prayer, start debugging blindly, and hope you find a band-aid that stops the immediate bleeding.&lt;br&gt;
In this article I will mention a few conditions where it's better to stop and ask for more details.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Most People Go Wrong
&lt;/h2&gt;

&lt;p&gt;Just like in Part 1, the instinct is to open the codebase and start looking for "obviously slow" things. Maybe there's an allocation in a hot loop. Maybe a regex is being compiled on every call. You find something, fix it, release a patch, and close the issue (you missed saying a prayer in this case).&lt;/p&gt;

&lt;p&gt;Two weeks later, the user says it's still slow.&lt;/p&gt;

&lt;p&gt;What happened? You probably optimized a piece of code that wasn't the bottleneck in their specific usage pattern. Your benchmark showed improvements. Their workload did not.&lt;/p&gt;

&lt;p&gt;The trap is the same as Part 1 — you acted on intuition instead of data. But in a library, the data is harder to get,&lt;br&gt;
which makes the trap easier to fall into.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Prejudice Problem (Library Edition)
&lt;/h2&gt;

&lt;p&gt;The same trap from Part 1 applies here, but with an extra layer: you're tempted to assume the problem is in the client's&lt;br&gt;
code, not yours.&lt;/p&gt;

&lt;p&gt;"They must be calling it wrong." "They're not reusing the object." "Their environment is misconfigured."&lt;/p&gt;

&lt;p&gt;Sometimes that's true. But &lt;strong&gt;start with the assumption that the problem is real and in your library&lt;/strong&gt;. Prove otherwise with&lt;br&gt;
data. &lt;/p&gt;

&lt;h2&gt;
  
  
  Before You Start: The 5 Things You Need (Library Edition)
&lt;/h2&gt;

&lt;p&gt;These are different from Part 1. Some overlap, but the constraints change what's actually achievable.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;A clear problem statement from the reporter&lt;br&gt;
"Your library is slow" is not actionable. You need to know: Which API? What input size? What does slow mean: latency, throughput, or memory? Push back until you have specifics. A good problem statement is the foundation of everything that follows.&lt;br&gt;
&lt;strong&gt;If a clear problem statement is not available, don't proceed.&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A reproducible scenario you control&lt;br&gt;
Unlike Part 1, you probably can't look into someone else's production environment. You need to build the scenario yourself: a benchmark or test that demonstrates the reported problem under controlled conditions. If you can't reproduce it, you can't fix it and you can't verify the fix. This is always better than asking users to test your changes and then waiting to find out whether they worked.&lt;br&gt;
&lt;strong&gt;It's not a blocker, but it is vital for having confidence in your fix without resorting to gut feeling.&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Understanding of your own library's design&lt;br&gt;
This sounds obvious. It isn't. Libraries accumulate complexity, and the person investigating may not be the original author. Know the hot paths: the APIs that get called most frequently, the ones that process large inputs, the ones that are called in loops. These are your candidates.&lt;br&gt;
&lt;strong&gt;Here an SME can be really helpful.&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Knowledge of common usage patterns&lt;br&gt;
You don't control how clients use your library, but you can study it. If possible, look at your documentation examples, your issue tracker, and your GitHub discussions. How do people actually call your APIs? What input sizes are typical? What do they call in loops? This shapes where you look.&lt;br&gt;
&lt;strong&gt;This usually reduces your debugging time.&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Defined performance targets&lt;br&gt;
Same as Part 1: "fast" is not a target. Define what acceptable looks like: throughput at a given input size, memory allocation per operation, latency at P99. Without this, you can't tell when you are done.&lt;br&gt;
&lt;strong&gt;This will be your goalpost.&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Once you have these, several other things become discoverable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Typical input characteristics&lt;/strong&gt;: size, shape, edge cases. A library that handles 1KB payloads efficiently may fall apart at 100MB.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Call frequency patterns&lt;/strong&gt;: is your API called once at startup, or thousands of times per second in a hot loop? A hot loop is a heavily executed block of code that repeats rapidly, where even tiny inefficiencies multiply into significant performance bottlenecks. The answer changes what matters. As in Part 1, we don't worry too much about a single call at startup.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runtime environment assumptions&lt;/strong&gt;: JVM version, GC settings, available memory. You can't control these, but you can document what you've tested against and what you assume. It also helps to document known issues with specific runtime environments.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The First Thing You Build: A Reproducible Benchmark
&lt;/h2&gt;

&lt;p&gt;Before touching any code, build a benchmark that demonstrates the problem, as discussed in the prerequisites above.&lt;/p&gt;

&lt;p&gt;This is your equivalent of the reproducible scenario from Part 1 — but in a library context, it's entirely your&lt;br&gt;
responsibility to construct. The reporter won't hand it to you.&lt;/p&gt;

&lt;p&gt;A good benchmark answers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which API are we measuring? (e.g. Parser.parse(input))&lt;/li&gt;
&lt;li&gt;With what input? (e.g. a 10MB JSON document)&lt;/li&gt;
&lt;li&gt;Under what call pattern? (e.g. called 1,000 times in a loop)&lt;/li&gt;
&lt;li&gt;What does passing look like? (e.g. throughput &amp;gt; 500 ops/sec)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use a proper benchmarking tool — JMH for Java, timeit/pytest-benchmark for Python.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hot Tip&lt;/strong&gt;: Warm up the runtime before measuring. JIT compilers, class loaders and caches all affect early measurements. You would be surprised how skewed your benchmark will be otherwise.&lt;/p&gt;
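
&lt;p&gt;As a rough sketch of what such a benchmark can look like in Python with the standard-library &lt;code&gt;timeit&lt;/code&gt; module: the &lt;code&gt;parse&lt;/code&gt; function here is just &lt;code&gt;json.loads&lt;/code&gt; standing in for your own API, and the input size, call counts and run counts are assumptions chosen to keep the run short. Warm-up matters most on JIT runtimes like the JVM; in Python it mainly lets caches and lazy initialization settle.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import timeit


def parse(document):
    """Stand-in for the API under test; swap in your own library call."""
    return json.loads(document)


# Assumed input: a JSON array of small objects, roughly 1 MB in total.
document = json.dumps(
    [{"id": i, "name": f"user-{i}", "active": True} for i in range(20_000)]
)

CALLS_PER_RUN = 20
WARMUP_RUNS = 2
MEASURED_RUNS = 5


def run_benchmark():
    timer = timeit.Timer(lambda: parse(document))
    # Warm-up runs: the timings are discarded on purpose.
    timer.repeat(repeat=WARMUP_RUNS, number=CALLS_PER_RUN)
    timings = timer.repeat(repeat=MEASURED_RUNS, number=CALLS_PER_RUN)
    best = min(timings)
    return {"ops_per_sec": CALLS_PER_RUN / best, "best_seconds": best, "runs": timings}


result = run_benchmark()
print(f"best of {MEASURED_RUNS} runs: {result['ops_per_sec']:.1f} ops/sec")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Reporting the best of several runs reduces noise from other processes on the machine. Compare the number against the pass criterion you defined, not against your intuition.&lt;/p&gt;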

&lt;h2&gt;
  
  
  The Investigation Process
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1 — Again Reproduce First, Theorize Later
&lt;/h3&gt;

&lt;p&gt;Run your benchmark. Confirm the problem exists under controlled conditions.&lt;/p&gt;

&lt;p&gt;If you can't reproduce it, you have three options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Go back to the reporter and get more detail about their environment and usage pattern&lt;/li&gt;
&lt;li&gt;Expand your benchmark to cover more scenarios until you find the one that triggers it&lt;/li&gt;
&lt;li&gt;Don't proceed with optimization.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Do not skip this step. Do not start reading code looking for problems until you have a benchmark that shows the problem.&lt;br&gt;
Otherwise you're optimizing in the dark.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2 — Profile, Don't Guess
&lt;/h3&gt;

&lt;p&gt;Once you can reproduce the problem, profile it. Don't read the code — profile it.&lt;/p&gt;

&lt;p&gt;Attach a profiler to your benchmark run and look at where time is actually spent. For example, use JFR (Java Flight Recorder) for Java/Kotlin, or py-spy or cProfile for Python.&lt;/p&gt;

&lt;p&gt;What you're looking for is a flame graph (a visual representation of the call stack where the width of each block shows how much CPU time a function and its children consumed) or a call tree that shows you which functions consume the most time. The thing you thought was slow may not be. The thing you never suspected could be.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqrjf2rupqskog9dqfrt0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqrjf2rupqskog9dqfrt0.png" alt="Flame Graph" width="800" height="436"&gt;&lt;/a&gt;&lt;em&gt;Figure 1: Flame Graph&lt;/em&gt;&lt;/p&gt;
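
&lt;p&gt;If you are in Python, a minimal way to get this kind of data is the standard-library &lt;code&gt;cProfile&lt;/code&gt; module. The sketch below profiles a stand-in workload (&lt;code&gt;json.loads&lt;/code&gt; called in a loop, an assumption for illustration) and prints the ten most expensive functions by cumulative time. The output is a table rather than a flame graph, but it answers the same question.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import cProfile
import io
import json
import pstats


def parse(document):
    """Stand-in for the library call being investigated."""
    return json.loads(document)


def workload():
    # Assumed call pattern: the same document parsed 50 times in a loop.
    document = json.dumps([{"id": i, "tags": ["a", "b", "c"]} for i in range(5_000)])
    for _ in range(50):
        parse(document)


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Sort by cumulative time so callers that dominate the run appear first.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(10)
print(report.getvalue())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Read the report top down: the first rows that belong to your library, rather than to the benchmark harness, are your candidates for the hot path.&lt;/p&gt;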

&lt;h3&gt;
  
  
  Step 3 — Identify the Hot Path in Your Library
&lt;/h3&gt;

&lt;p&gt;From the profiler output, identify which internal functions are on the critical path. These are the ones worth optimizing.&lt;/p&gt;

&lt;p&gt;Ask:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is the time in your code, or in a dependency you're calling?&lt;/li&gt;
&lt;li&gt;Is it CPU time (computation) or wall time (waiting on I/O, locks, or allocations)?&lt;/li&gt;
&lt;li&gt;Is it one slow call, or many fast calls that add up?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The last one is very common in libraries. A single call to your API might look fine. But if the client calls it 10,000 times per second, a 50-microsecond allocation cost per call adds up to 500ms of work every second, on top of the GC pressure it creates (the performance penalty caused by the garbage collector frequently pausing the application to clean up a high volume of rapidly created, short-lived objects).&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4 — Categorize the Bottleneck
&lt;/h3&gt;

&lt;p&gt;Same categories as Part 1, but with library-specific nuances:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CPU-bound&lt;/strong&gt;: Heavy computation per call. Common in parsing, serialization, cryptography, compression. Look for
algorithmic improvements — better data structures, avoiding redundant work, caching computed results.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Allocation / GC pressure&lt;/strong&gt;: Creating too many short-lived objects. This is the most common library performance
problem. The client pays the GC cost, not you. Look for object pooling, reusable buffers, or returning primitives instead
of boxed types.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;I/O-bound&lt;/strong&gt;: Less common in pure libraries, but relevant if your library wraps file, network, or database access. Look
at whether you're doing unnecessary I/O or whether async patterns would help.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Concurrency / thread safety overhead&lt;/strong&gt;: If your library uses locks to be thread-safe, those locks may be contention
points under concurrent load. Look at whether the locking granularity is appropriate, or whether lock-free structures are
viable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Initialization cost amortization&lt;/strong&gt; (paying a heavy one-time cost upfront, such as building a lookup table or parsing a configuration, so that all subsequent calls run much faster): Some libraries do expensive work at construction time (loading configs, compiling regexes, building lookup tables). If clients are constructing your objects in a loop instead of reusing them, the fix might be documentation rather than code, or making the expensive object clearly reusable.&lt;/li&gt;
&lt;/ul&gt;
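
&lt;p&gt;The last category is the easiest to show in code. In this sketch, &lt;code&gt;Tokenizer&lt;/code&gt; is a hypothetical library class whose constructor builds a lookup table; the construction counter exists only to make the difference between the two call patterns visible.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import string


class Tokenizer:
    """Hypothetical library object with an expensive constructor."""

    constructions = 0  # counts how often the expensive setup ran

    def __init__(self):
        Tokenizer.constructions += 1
        # Stand-in for costly setup: build a lookup table once.
        self._kind = {ch: "letter" for ch in string.ascii_letters}
        self._kind.update({ch: "digit" for ch in string.digits})

    def classify(self, text):
        return [self._kind.get(ch, "other") for ch in text]


inputs = ["abc", "a1!", "42"]

# Anti-pattern: the client rebuilds the lookup table on every call.
per_call = [Tokenizer().classify(text) for text in inputs]
built_per_call = Tokenizer.constructions

# Amortized: build once, reuse for every call.
Tokenizer.constructions = 0
shared = Tokenizer()
reused = [shared.classify(text) for text in inputs]
built_shared = Tokenizer.constructions

print(built_per_call, built_shared)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Both patterns return the same results, which is exactly why the slow one survives in client code. Only the number of constructions differs.&lt;/p&gt;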

&lt;h3&gt;
  
  
  Step 5 — Validate Before You Fix
&lt;/h3&gt;

&lt;p&gt;Same discipline as Part 1. Before writing a fix:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can your benchmark reproduce the problem consistently?&lt;/li&gt;
&lt;li&gt;Can you explain why this specific thing is causing the slowness?&lt;/li&gt;
&lt;li&gt;Does the profiler output support it?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If yes to all three — fix it. If not, keep profiling.&lt;/p&gt;

&lt;p&gt;One extra check for libraries: make sure the fix doesn't break correctness. Performance optimizations in libraries could&lt;br&gt;
involve caching, mutability, or reduced copying — all of which can introduce subtle bugs. Your fix needs to pass the full&lt;br&gt;
test suite, not just the benchmark.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 6 — Verify and Document
&lt;/h3&gt;

&lt;p&gt;Run your benchmark again after the fix. Measure the delta. Does it match your expectation?&lt;/p&gt;

&lt;p&gt;Then document it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What was the problem?&lt;/li&gt;
&lt;li&gt;What was the fix?&lt;/li&gt;
&lt;li&gt;What input sizes and call patterns does the improvement apply to?&lt;/li&gt;
&lt;li&gt;Are there any trade-offs? (e.g., higher memory usage for better throughput)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This matters because library users need to understand when they'll see the benefit. A fix that helps at 10MB inputs may&lt;br&gt;
not matter at 1KB inputs. Be honest and realistic about the scope.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Closer to Production Visibility (Optional, But Powerful)
&lt;/h2&gt;

&lt;p&gt;One of the hardest parts of library performance work is that you're investigating blind. The client has the production&lt;br&gt;
environment. You have a benchmark. There's a gap between those two things, and that gap is where a lot of investigations&lt;br&gt;
stall.&lt;/p&gt;

&lt;p&gt;There are a few ways to close it.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Build optional diagnostic logging into your library.&lt;/strong&gt; Most logging frameworks support named loggers at configurable levels. If your library uses one (like SLF4J in Java), clients can enable debug-level output from your library without changing your code. Use this. Log things that matter for performance: input sizes, time spent in expensive operations, cache hit/miss rates, retry counts. Keep it off by default, but make it easy to turn on. When a client reports a performance issue, your first ask can be: "Can you enable debug logging for our library and share the output?" That single step can replace hours of guessing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Expose timing hooks or callbacks.&lt;/strong&gt; Some libraries go further and expose explicit instrumentation hooks: callbacks or interfaces that clients can implement to receive timing data. This lets clients pipe your library's internal timings directly into their existing monitoring system, the same dashboards they use for everything else. You get visibility into their production environment without needing access to it. They get metrics without having to instrument your code themselves. Something like:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;library&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setMetricsListener&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;myMonitoringSystem&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;record&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;operationName&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;durationMs&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
&lt;span class="o"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
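
&lt;p&gt;On the library side, supporting a hook like this does not take much. Here is a minimal sketch in Python; the &lt;code&gt;Library&lt;/code&gt; class, the &lt;code&gt;MetricsEvent&lt;/code&gt; type and the method names are all hypothetical, and a real implementation would also need to decide how to handle a listener that raises.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import time
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricsEvent:
    operation_name: str
    duration_ms: float


class Library:
    """Hypothetical library exposing an optional timing hook."""

    def __init__(self):
        # Off by default: without a listener the only cost is one check per call.
        self._listener = None

    def set_metrics_listener(self, listener):
        self._listener = listener

    def _timed(self, operation_name, func, *args):
        if self._listener is None:
            return func(*args)
        start = time.perf_counter()
        try:
            return func(*args)
        finally:
            # Report the timing even if the operation raised.
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self._listener(MetricsEvent(operation_name, elapsed_ms))

    def parse(self, text):
        return self._timed("parse", lambda value: value.split(","), text)


# Client side: pipe events into whatever monitoring already exists.
recorded = []
library = Library()
library.set_metrics_listener(recorded.append)
fields = library.parse("a,b,c")
print(fields, [event.operation_name for event in recorded])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Timing inside &lt;code&gt;finally&lt;/code&gt; means failed operations are measured too, which is often where the interesting latency hides.&lt;/p&gt;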



&lt;h3&gt;
  
  
  Provide a built-in diagnostic mode (optional but useful).
&lt;/h3&gt;

&lt;p&gt;A step beyond logging: a mode that, when enabled, collects and reports a structured summary of what the library did —&lt;br&gt;
operations performed, time spent, allocations made, retries triggered. Think of it as a flight recorder. The client runs&lt;br&gt;
their workload with diagnostic mode on, exports the report, and sends it to you.&lt;/p&gt;

&lt;p&gt;This is more work to build, but &lt;strong&gt;for libraries where performance is a core concern, it's worth it&lt;/strong&gt;. It's the closest thing&lt;br&gt;
you'll get to having your own monitoring in someone else's production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The key principle&lt;/strong&gt;: you can't add monitoring to a client's production environment, but you can make your library observable&lt;br&gt;
enough that the client can do it for you. &lt;strong&gt;Design for observability from the start&lt;/strong&gt; — it's much harder to retrofit.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Unique Challenge: You Can't See Their Production
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvrcsz0qwwilb8q4zkgk6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvrcsz0qwwilb8q4zkgk6.png" alt="Production black box." width="800" height="436"&gt;&lt;/a&gt;&lt;em&gt;Figure 2: Production environment is a black box for library project&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The hardest part of library performance work is that you're always working with incomplete information. The reporter's&lt;br&gt;
production environment is a black box.&lt;/p&gt;

&lt;p&gt;A few things that help:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ask for a heap dump or profiler output from their side. Even a rough flame graph from their environment is worth more than your best guess.&lt;/li&gt;
&lt;li&gt;Provide a diagnostic mode or logging hooks. This is especially valuable for intermittent issues you can't reproduce.&lt;/li&gt;
&lt;li&gt;Test against a range of environments. Different JVM versions, GC algorithms, and OS schedulers behave differently. &lt;/li&gt;
&lt;li&gt;Be explicit about your performance contract. Document what you've benchmarked, under what conditions, and what the
expected characteristics are.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Library performance investigation is harder than service performance investigation because you don't own the runtime. But&lt;br&gt;
the discipline is the same: follow the data, not your gut.&lt;/p&gt;

&lt;p&gt;The process:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Get a clear problem statement — which API, what input, what "slow" means&lt;/li&gt;
&lt;li&gt;Build a reproducible benchmark before touching any code&lt;/li&gt;
&lt;li&gt;Profile the benchmark — don't read code looking for problems&lt;/li&gt;
&lt;li&gt;Identify the hot path from profiler output&lt;/li&gt;
&lt;li&gt;Categorize the bottleneck type&lt;/li&gt;
&lt;li&gt;Validate your hypothesis before fixing&lt;/li&gt;
&lt;li&gt;Verify the fix with the benchmark&lt;/li&gt;
&lt;li&gt;Document the improvement, its scope, and any trade-offs&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The mindset shift from &lt;a href="https://dev.to/mfdilawar/-investigating-performance-issues-in-an-existing-system-101-7l6"&gt;Part 1&lt;/a&gt;: you can't observe production, so your benchmark and profiler are your only sources of truth. Invest in making them accurate.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/mfdilawar/-investigating-performance-issues-in-an-existing-system-101-7l6"&gt;Part 1&lt;/a&gt; covers the same topic for deployed services — where you have monitoring, distributed tracing, and control over the runtime.&lt;/p&gt;

</description>
      <category>performance</category>
      <category>softwaredevelopment</category>
      <category>softwareengineering</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Investigating Performance Issues In An Existing System: 101</title>
      <dc:creator>Faisal Dilawar</dc:creator>
      <pubDate>Sat, 14 Mar 2026 21:10:50 +0000</pubDate>
      <link>https://forem.com/mfdilawar/-investigating-performance-issues-in-an-existing-system-101-7l6</link>
      <guid>https://forem.com/mfdilawar/-investigating-performance-issues-in-an-existing-system-101-7l6</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Part 1 of 2 — This piece covers deployed applications and services. Part 2 covers library projects, which come with a different set of constraints.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Where Most People Go Wrong
&lt;/h2&gt;

&lt;p&gt;You join a project and someone says &lt;em&gt;"We also have performance issues"&lt;/em&gt;. Or someone files a ticket: &lt;em&gt;"The system feels slow"&lt;/em&gt;. Where do you look first? Before anything is measured, someone suggests checking the database connection pool settings, tweaking thread counts, adjusting timeout configs, or optimizing queries. Sound familiar?&lt;/p&gt;

&lt;p&gt;Two weeks later, latency dropped by 5%. Everyone claps. But the system still feels slow.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh3674vzkf1707a918r2g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh3674vzkf1707a918r2g.png" alt="" width="800" height="436"&gt;&lt;/a&gt;&lt;em&gt;Figure 1: "If you torture the data long enough, it will confess to anything." — Ronald Coase&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;What happened? We walked in with a theory and found evidence to support it. The DB query &lt;em&gt;was&lt;/em&gt; slightly inefficient. We also increased thread counts just in case, and maybe added some resources. These fixes &lt;em&gt;did&lt;/em&gt; help a little. But the real bottleneck was a missing cache on the most-used workflow, which would have taken two days to fix.&lt;/p&gt;

&lt;p&gt;This is the trap. And it's remarkably easy to fall into — even for experienced engineers.&lt;/p&gt;

&lt;p&gt;The goal of this article is to give you a systematic approach so you're following data, not intuition. It's not a playbook, more of a starting point: almost every performance issue is unique. No system is perfect, so you can always find small issues, but fixing them may not yield the results you want.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw7creytp6qqxqk9jqpwy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw7creytp6qqxqk9jqpwy.png" alt="problem solving approach comparison" width="800" height="436"&gt;&lt;/a&gt;&lt;em&gt;Figure 2&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Before You Start: The 5 Things You Absolutely Need
&lt;/h2&gt;

&lt;p&gt;You cannot do a meaningful performance investigation without these. If any are missing, get them first — otherwise you're guessing in the dark.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Access to the codebase&lt;/strong&gt;&lt;br&gt;
You need to be able to trace execution paths, not just read dashboards. Dashboards tell you &lt;em&gt;that&lt;/em&gt; something is slow. The code tells you &lt;em&gt;why&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. A monitoring system&lt;/strong&gt;&lt;br&gt;
Even basic metrics (request latency, error rate, CPU usage) are non-negotiable for a deployed service. Without them, you're navigating blind. (For libraries, this is different; we cover that in Part 2.) &lt;br&gt;
If monitoring is not in place, as is the case in some systems, set it up first. You need concrete proof of what your changes achieved, including proof that you have made things worse. A monitoring system is the mirror that tells you the truth about your changes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Understanding of the codebase, or access to a subject matter expert (SME)&lt;/strong&gt;&lt;br&gt;
This is the one people underestimate most. &lt;strong&gt;You cannot optimize code or fix a system you don't understand&lt;/strong&gt;. If it's not your codebase, find the person who knows it and treat them as a key collaborator. &lt;br&gt;
&lt;strong&gt;Hot tip&lt;/strong&gt;: If your organization allows it, use AI agents to analyze the codebase and generate a design overview of each flow, even if you know the codebase or have an SME at hand. (&lt;em&gt;Use AI as a starting point, but trust your own tracing more. Also, make sure your organization is comfortable with an AI agent analyzing its codebase.&lt;/em&gt;)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fas2py2s9jt864txz5oen.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fas2py2s9jt864txz5oen.png" alt="You need to understand the system" width="800" height="436"&gt;&lt;/a&gt;&lt;em&gt;Figure 3: You can't fix a system you don't understand&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Knowledge of the most-used workflows&lt;/strong&gt;&lt;br&gt;
Not every feature gets equal traffic, and fixing a performance issue in a rarely used workflow may not be worthwhile right now. A problem in the login flow matters more than one in the settings page. Your monitoring system will usually tell you this directly: look at request frequency, not just latency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Defined performance targets&lt;/strong&gt;&lt;br&gt;
"Fast" is not a target. "P99 latency under 200ms for search requests under normal load" is a target. Without a specific number, you can't declare victory and you can't prioritize. &lt;br&gt;
If no target is defined, work with the team to agree on a number that is actually achievable: you can't run 10 DB queries and expect 10ms latency. This number will be your true north throughout the investigation.&lt;/p&gt;



&lt;p&gt;Think of these as your &lt;strong&gt;entry conditions&lt;/strong&gt;. Once you have them, several other things become &lt;em&gt;discoverable&lt;/em&gt; through investigation rather than needing to be handed to you upfront:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Infrastructure topology&lt;/strong&gt;: visible from deployment configs, the cloud console, or a conversation with DevOps. How many instances are deployed? What resources does each pod or DB have? One pod with 2GB RAM and 2 CPU cores will not perform the same as two pods with 1GB RAM and 1 CPU core each.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dependency performance map&lt;/strong&gt;: which DBs, caches, queues, and external APIs does this service call, and what are their typical latencies? You can usually work this out from code and configuration files, though existing documentation is even better.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data characteristics&lt;/strong&gt;: volume, growth rate, and shape of the data flowing through the system. Processing 100 KB messages is different from processing 10 GB messages, and a configuration that works for 10,000 requests per hour can be completely inadequate for 10 million requests per hour.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A reproducible test scenario&lt;/strong&gt;: covered in the next section.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffj2ybt1zvemo6k6sfzbk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffj2ybt1zvemo6k6sfzbk.png" alt="System Performance Framework" width="800" height="436"&gt;&lt;/a&gt;&lt;em&gt;Figure 4: System Performance Framework&lt;/em&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  The First Thing You Build: A Reproducible Scenario
&lt;/h2&gt;

&lt;p&gt;Before touching a single line of code or configuration, build a controlled test that demonstrates the performance problem.&lt;/p&gt;

&lt;p&gt;This sounds obvious. &lt;strong&gt;Most people skip it&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Here's why it matters:&lt;/strong&gt; without a reproducible scenario, you can't verify that anything you did actually helped. You might deploy a fix, check production metrics an hour later, and see latency improved. But was that your fix? Or lower traffic? Or a cache that warmed up? You don't know.&lt;/p&gt;

&lt;p&gt;The scenario is your measuring stick. It's the equivalent of a failing test in TDD — you're not done until it passes, and you can't call it passing if you can't run it.&lt;/p&gt;

&lt;p&gt;A good scenario answers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What operation are we measuring? (e.g., &lt;code&gt;GET /patients?name=smith&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Under what load? (e.g., 50 concurrent users)&lt;/li&gt;
&lt;li&gt;With what data? (e.g., 1 million patient records in the DB)&lt;/li&gt;
&lt;li&gt;What does "passing" look like? (e.g., P95 &amp;lt; 150ms)&lt;/li&gt;
&lt;/ul&gt;
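
&lt;p&gt;As a minimal sketch, the "passing" criterion can be written as an executable check. The class and method names below are hypothetical, the sample latencies are made up for illustration, and the percentile uses the nearest-rank method, which is only one of several common definitions. In practice the samples would come from your load-testing tool.&lt;/p&gt;

```java
import java.util.Arrays;

// Hypothetical helper: names and numbers are illustrative only.
final class ScenarioCheck {

    // Nearest-rank percentile: the smallest sample such that at least
    // `percentile` percent of all samples are at or below it.
    static long percentile(long[] latenciesMs, double percentile) {
        if (latenciesMs.length == 0) {
            throw new IllegalArgumentException("no samples recorded");
        }
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(percentile / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    // The scenario passes only when the measured P95 is under the target.
    static boolean passes(long[] latenciesMs, long p95TargetMs) {
        return p95TargetMs > percentile(latenciesMs, 95.0);
    }

    public static void main(String[] args) {
        // Made-up latencies (ms) from one run of the scenario.
        long[] samples = {120, 95, 110, 480, 101, 99, 130, 105, 98, 102};
        System.out.println("P95 = " + percentile(samples, 95.0) + " ms");
        System.out.println("Passes 150 ms target: " + passes(samples, 150));
    }
}
```

&lt;p&gt;With these ten samples, the single 480ms outlier is the P95, so the check fails against a 150ms target even though nine out of ten requests look healthy.&lt;/p&gt;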


&lt;h2&gt;
  
  
  The Investigation Process
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Step 1 — Measure First, Theorize Later
&lt;/h3&gt;

&lt;p&gt;Pull up your monitoring and answer these questions with data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which endpoints or operations are slow? If several operations miss their SLA, pick the one with the largest gap between the SLA and actual performance. (Look at latency percentiles, not averages.)&lt;/li&gt;
&lt;li&gt;Is it constant or spiky? Spiky latency usually points to GC pauses, lock contention, or cache misses. Constant latency usually points to an algorithmic or query problem. Knowing which one you have helps you focus on the real issue. (&lt;em&gt;Spiky latency can also be caused by network jitter or cold caches, but let's ignore that for now.&lt;/em&gt;)&lt;/li&gt;
&lt;li&gt;Is it correlated with load? If latency is fine at 10 req/s but degrades at 100 req/s, you most likely have a concurrency or resource saturation problem.&lt;/li&gt;
&lt;li&gt;When did it start? A sudden change usually means a deployment, a configuration change, or a data volume threshold being crossed. It could also be a change in a 3rd party service or an upgrade to a newer library version.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvkx2lbcy5g8m9lgxe8u5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvkx2lbcy5g8m9lgxe8u5.png" alt="Do not form a hypothesis yet. Just collect facts." width="800" height="436"&gt;&lt;/a&gt;&lt;em&gt;Figure 5: &lt;strong&gt;Do not form a hypothesis yet.&lt;/strong&gt; Just collect facts.&lt;/em&gt;&lt;/p&gt;


&lt;h3&gt;
  
  
  Step 2 — Identify the Hot Path
&lt;/h3&gt;

&lt;p&gt;Not everything in the system is equally important. Find the operations that are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Called frequently&lt;/li&gt;
&lt;li&gt;Slow (high latency)&lt;/li&gt;
&lt;li&gt;High impact to the user&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The intersection of those three is where you focus. A rarely called admin endpoint that takes 2 seconds is less important than a core API that takes 300ms and is called 500 times per second.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fubye9daafzoz3g5ts7ya.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fubye9daafzoz3g5ts7ya.png" alt="Identify hot path" width="800" height="436"&gt;&lt;/a&gt;&lt;em&gt;Figure 6: An admittedly messy AI-generated image. I am still learning how to write prompts that produce relevant images. (This caption was not generated by AI.)&lt;/em&gt;&lt;/p&gt;


&lt;h3&gt;
  
  
  Step 3 — Trace the Request End to End
&lt;/h3&gt;

&lt;p&gt;For the hot path you identified, trace a single request through every layer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Client → Load Balancer → App Server → [Business Logic] → Database/Cache/External API → Response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fso10heo75zaa41qzjlj3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fso10heo75zaa41qzjlj3.png" alt="Sample steps of a request" width="800" height="436"&gt;&lt;/a&gt;&lt;em&gt;Figure 7: Usual path of a single request&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;At each layer, ask: &lt;em&gt;how much time does this layer contribute? Is it acceptable?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Distributed tracing tools (Jaeger, Zipkin, Datadog APM) show you this as a flame graph or waterfall. If you don't have one, your existing logs may give you the same breakdown. If they don't, add timing logs at each layer boundary. And don't assume that your business logic is cheap and that only the DB or a 3rd party API can be slow.&lt;/p&gt;
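
&lt;p&gt;When no tracing tool is available, a rough per-layer breakdown can come from a few lines of timing code. The sketch below is a hypothetical helper, not part of any tracing library: it wraps each layer of one request, measures it with &lt;code&gt;System.nanoTime()&lt;/code&gt;, and builds a single log line. The layer names and the sleeps standing in for real work are assumptions.&lt;/p&gt;

```java
// Hypothetical helper for illustration; not part of any tracing library.
final class LayerTimer {
    private final StringBuilder report = new StringBuilder();
    private long totalNanos;

    // Runs one layer of the request and records how long it took,
    // even when the layer throws.
    void time(String layer, Runnable work) {
        long start = System.nanoTime();
        try {
            work.run();
        } finally {
            long elapsed = System.nanoTime() - start;
            totalNanos += elapsed;
            report.append(layer).append('=')
                  .append(elapsed / 1_000_000).append("ms ");
        }
    }

    // One line per request, in the form "layerA=..ms layerB=..ms total=..ms".
    String report() {
        return report + "total=" + totalNanos / 1_000_000 + "ms";
    }

    public static void main(String[] args) {
        LayerTimer timer = new LayerTimer();
        timer.time("businessLogic", () -> pause(5));  // stand-in for real work
        timer.time("database", () -> pause(40));      // stand-in for a query
        System.out.println(timer.report());
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

&lt;p&gt;Timing every layer, including the business logic, is the point: the breakdown should come from measurement rather than from an assumption about which layer is cheap.&lt;/p&gt;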

&lt;p&gt;What you're looking for is &lt;strong&gt;where time is actually spent&lt;/strong&gt;, not where you assume it's spent.&lt;/p&gt;

&lt;p&gt;A common finding: 80% of latency is in one DB query. Another common finding: 30% is in serialization you'd never have guessed. Another: a slow 3rd party API call sitting in the middle of what should be a fast operation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi9uftjykaibpboyg079j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi9uftjykaibpboyg079j.png" alt="Time breakdown across layers" width="800" height="436"&gt;&lt;/a&gt;&lt;em&gt;Figure 8: Time breakdown across layers&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Once your trace tells you &lt;strong&gt;WHICH&lt;/strong&gt; layer is slow, you need to look at the 'shape' of that slowness to categorize it.&lt;/p&gt;




&lt;h3&gt;
  
  
  Step 4 — Categorize the Bottleneck
&lt;/h3&gt;

&lt;p&gt;Once you've found where time is spent, categorize it. Each category needs a very different solution.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;CPU-bound&lt;/strong&gt;&lt;br&gt;
The service is doing heavy computation. &lt;br&gt;
&lt;strong&gt;Symptoms&lt;/strong&gt;: High CPU utilization that climbs roughly linearly with load.&lt;br&gt;
&lt;strong&gt;Example&lt;/strong&gt;: Running validation or transformation on every request without caching the result where possible.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;I/O-bound&lt;/strong&gt;&lt;br&gt;
Time is spent waiting on DB, network, or disk. &lt;br&gt;
&lt;strong&gt;Symptoms&lt;/strong&gt;: CPU is low but latency is high, thread pool exhaustion under load.&lt;br&gt;
&lt;strong&gt;Example&lt;/strong&gt;: An N+1 query — fetching a list of 100 items then making 100 individual DB calls for related data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Memory / GC pressure&lt;/strong&gt;&lt;br&gt;
Lots of object allocation causing garbage collection pauses. &lt;br&gt;
&lt;strong&gt;Symptoms&lt;/strong&gt;: Latency spikes rather than constant slowness, heap usage that grows and drops periodically.&lt;br&gt;
&lt;strong&gt;Example&lt;/strong&gt;: Creating large intermediate collections in a loop that runs thousands of times per request.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Concurrency / contention&lt;/strong&gt;&lt;br&gt;
Threads waiting on each other. &lt;br&gt;
&lt;strong&gt;Symptoms&lt;/strong&gt;: High thread count, low CPU, latency that gets much worse under concurrent load.&lt;br&gt;
&lt;strong&gt;Example&lt;/strong&gt;: A shared resource protected by a &lt;code&gt;synchronized&lt;/code&gt; block that every request needs to acquire.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data volume&lt;/strong&gt;&lt;br&gt;
Queries or algorithms that worked at 10k records fall apart at 10M. &lt;br&gt;
&lt;strong&gt;Symptoms&lt;/strong&gt;: Gradual degradation over time, correlated with data growth.&lt;br&gt;
&lt;strong&gt;Example&lt;/strong&gt;: A missing index, a full table scan, or an in-memory sort of a result set that used to be small.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;These are only the most common categories, not an exhaustive list.&lt;/strong&gt;&lt;/p&gt;
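
&lt;p&gt;To make the I/O-bound example concrete, here is a minimal sketch of the N+1 pattern next to a batched alternative. &lt;code&gt;FakeDb&lt;/code&gt; is a hypothetical stand-in that only counts round trips; it is not a real driver or ORM, and the method names are assumptions chosen for illustration.&lt;/p&gt;

```java
import java.util.Arrays;

final class NPlusOneDemo {

    // Stand-in for a database that only counts round trips.
    // Hypothetical: not a real driver or ORM.
    static final class FakeDb {
        int roundTrips;

        int[] findOrderIds() {
            roundTrips++;
            return new int[] {1, 2, 3, 4, 5};
        }

        String findCustomer(int orderId) {
            roundTrips++;
            return "customer-" + orderId;
        }

        // One query for all ids, like a WHERE ... IN (...) clause.
        String[] findCustomers(int[] orderIds) {
            roundTrips++;
            return Arrays.stream(orderIds)
                         .mapToObj(id -> "customer-" + id)
                         .toArray(String[]::new);
        }
    }

    // N+1: one query for the list, then one query per item.
    static int nPlusOne(FakeDb db) {
        for (int orderId : db.findOrderIds()) {
            db.findCustomer(orderId);
        }
        return db.roundTrips;
    }

    // Batched: one query for the list, one for all related rows.
    static int batched(FakeDb db) {
        db.findCustomers(db.findOrderIds());
        return db.roundTrips;
    }

    public static void main(String[] args) {
        System.out.println("N+1 round trips:     " + nPlusOne(new FakeDb()));
        System.out.println("Batched round trips: " + batched(new FakeDb()));
    }
}
```

&lt;p&gt;With 5 items the difference is 6 round trips against 2. With the 100 items from the example above it becomes 101 against 2, and each round trip pays the network latency to the database.&lt;/p&gt;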

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9huiqz6ekcpiypdo7bdg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9huiqz6ekcpiypdo7bdg.png" alt="Categorize the issue" width="800" height="436"&gt;&lt;/a&gt;&lt;em&gt;Figure 9: Categorize the issue&lt;/em&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Step 5 — Validate Before You Fix
&lt;/h3&gt;

&lt;p&gt;Before writing a single line of fix code, validate your hypothesis:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can you reproduce the slow behavior in your reproducible scenario?&lt;/li&gt;
&lt;li&gt;Can you explain &lt;em&gt;why&lt;/em&gt; this specific thing is causing the slowness?&lt;/li&gt;
&lt;li&gt;Does the data support it? (e.g., slow query logs, profiler output, GC logs)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you can answer yes to all three, you've found the root cause. Now fix it.&lt;/p&gt;

&lt;p&gt;If not, go back to Step 3 and keep tracing. At this stage you may find several issues rather than a single root cause. Use your judgement to pick your fights, and keep your primary focus on the issue that accounts for most of the latency.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjnc3gwjf0iubc8vpcyv0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjnc3gwjf0iubc8vpcyv0.png" alt="Validate before you fix" width="800" height="436"&gt;&lt;/a&gt;&lt;em&gt;Figure 10: Validate &lt;strong&gt;before&lt;/strong&gt; you fix&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Prejudice Problem
&lt;/h2&gt;

&lt;p&gt;There is one trap I see all the time, and I fell into it myself when I was less experienced.&lt;/p&gt;

&lt;p&gt;If you start a performance investigation already believing you know the answer (&lt;em&gt;"it's the DB"&lt;/em&gt;, &lt;em&gt;"it's the thread pool"&lt;/em&gt;, &lt;em&gt;"it's the network"&lt;/em&gt;, &lt;em&gt;"it's the 3rd party API"&lt;/em&gt;), &lt;strong&gt;you will almost always find evidence to support that belief&lt;/strong&gt;. No system is perfect. If you look hard enough at any layer, you'll find something to improve. And improving it will most likely help a little.&lt;/p&gt;

&lt;p&gt;But &lt;strong&gt;"a little"&lt;/strong&gt; is not the same as fixing the root cause. And chasing the wrong thing costs weeks of effort while users continue to experience slowness.&lt;/p&gt;

&lt;p&gt;The discipline is to stay in data-collection mode until the data points clearly at something. Your hypothesis should be the last thing that forms, not the first.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmfkiojq8552j30govyd4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmfkiojq8552j30govyd4.png" alt="Tackling low hanging fruits may not be the best" width="800" height="436"&gt;&lt;/a&gt;&lt;em&gt;Figure 11: Picking the low-hanging fruit may not be the best route to better performance&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  A Note on Performance Targets
&lt;/h2&gt;

&lt;p&gt;One thing that kills performance investigations: &lt;strong&gt;nobody defined what "good" looks like.&lt;/strong&gt; You fix something, latency improves, but no one can answer &lt;strong&gt;"Is this enough?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before you start, establish numbers. Some useful ones (&lt;strong&gt;Look these terms up if you are not sure&lt;/strong&gt;):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;P50 / P95 / P99 latency&lt;/strong&gt; — average hides outliers; percentiles don't&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Throughput at peak load&lt;/strong&gt; — requests per second the system must handle&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error rate under load&lt;/strong&gt; — a system that's fast but drops 2% of requests isn't performing well&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource utilization ceiling&lt;/strong&gt; — at what CPU/memory level does performance degrade?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These become your success criteria. The reproducible scenario you built before starting the investigation should test against them.&lt;/p&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Performance investigation done right is less glamorous than people expect. It's mostly measurement, tracing, and resisting the urge to jump to a solution.&lt;/p&gt;

&lt;p&gt;The process, stripped down:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Establish the 5 prerequisites before starting
2. Build a reproducible scenario first
3. Measure — let data tell you where time is spent
4. Identify the hot path
5. Trace end to end across layers
6. Categorize the bottleneck type
7. Validate your hypothesis before fixing
8. Verify the fix using your reproducible scenario
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The mindset that makes this work: &lt;strong&gt;follow the data, not your gut&lt;/strong&gt;. Your intuition about where the problem is might be right. But until the data confirms it, it's just a theory.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Part 2 covers the same topic for library projects — where you don't have a deployment, monitoring is your responsibility to build, and "production" is someone else's process.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>systemdesign</category>
      <category>performance</category>
      <category>latency</category>
      <category>programming</category>
    </item>
    <item>
      <title>Upgrading Java Libraries: A Developer’s Guide to Compatibility</title>
      <dc:creator>Faisal Dilawar</dc:creator>
      <pubDate>Mon, 09 Mar 2026 11:16:41 +0000</pubDate>
      <link>https://forem.com/mfdilawar/upgrading-java-libraries-a-developers-guide-to-compatibility-429i</link>
      <guid>https://forem.com/mfdilawar/upgrading-java-libraries-a-developers-guide-to-compatibility-429i</guid>
      <description>&lt;p&gt;In software engineering, we often treat "upgrading" as a purely positive step—new features, better performance, and patched vulnerabilities. However, when your project is used as a &lt;strong&gt;library&lt;/strong&gt; by other applications, an upgrade can be a minefield.&lt;/p&gt;

&lt;p&gt;Most experienced developers are wary of upgrading core or major libraries. And many library maintainers &lt;strong&gt;(bless them!!)&lt;/strong&gt; don't think enough about the developers who actually use their libraries.&lt;/p&gt;

&lt;p&gt;While you control your own codebase, you don't control the hundreds or thousands of downstream projects that depend on your API. Here we will talk about updating your Java (or any other language) library projects without breaking the world.&lt;/p&gt;

&lt;h3&gt;
  
  
  Upgrading an Application
&lt;/h3&gt;

&lt;p&gt;When you upgrade an internal service, you have complete visibility. If you rename a method, you can refactor every caller in one go. You have a single team to coordinate with and an immediate feedback loop.&lt;/p&gt;

&lt;h3&gt;
  
  
  Upgrading a Library
&lt;/h3&gt;

&lt;p&gt;Library maintainers face "unknown unknowns." Your code is used in ways you never envisioned by teams you've never met. Every change must be viewed through the lens of backward compatibility because you cannot coordinate with all consumers simultaneously.&lt;/p&gt;

&lt;h2&gt;
  
  
  Consumer Pain Points: What Breaking Changes Feel Like
&lt;/h2&gt;

&lt;p&gt;Breaking changes aren't just technical hurdles; they are business costs.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Compilation Failures:&lt;/strong&gt; When signatures change or classes disappear, consumer velocity grinds to a halt as developers hunt through changelogs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Documentation Gaps:&lt;/strong&gt; Missing Javadoc tags (@param, @return, @throws) turn an API into a guessing game.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Intuition Failures:&lt;/strong&gt; Method names that suggest one behavior but implement another erode trust in your library.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Internal Behavior Shifts:&lt;/strong&gt; The "Silent Killer." The code compiles, but logic breaks at runtime. These cause the most expensive production incidents.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Gold Standard: Maintaining Backward Compatibility
&lt;/h2&gt;

&lt;p&gt;Follow a simple mantra for as long as possible: &lt;strong&gt;"Don't break what's working."&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Add, Don't Remove
&lt;/h3&gt;

&lt;p&gt;Instead of modifying the signature of an existing method, introduce a new overload.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Calculator&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;calculate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;After (Safe Evolution):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Calculator&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Original method delegates to the new implementation with defaults&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;calculate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;calculate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Config&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;DEFAULT&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// New overload provides enhanced functionality&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;calculate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Config&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;multiplier&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Deprecate Gracefully, in Phases
&lt;/h3&gt;

&lt;p&gt;Mark obsolete APIs with the &lt;code&gt;@Deprecated&lt;/code&gt; annotation. Use the &lt;code&gt;since&lt;/code&gt; attribute and &lt;code&gt;forRemoval=true&lt;/code&gt; (both available from Java 9) to signal intent. Always link to the replacement in the Javadoc. Deprecate rather than remove when the older method still works but the newer one is the better choice in practically every case.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="cm"&gt;/**
 * @deprecated Use {@link #newMethod()} instead.
 * This method will be removed in version 3.0.
 */&lt;/span&gt;
&lt;span class="nd"&gt;@Deprecated&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;since&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"2.0"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;forRemoval&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;oldMethod&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Legacy implementation maintained for 2-3 major versions&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
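
&lt;p&gt;During the deprecation window it usually helps to have the old method delegate to its replacement, so there is only one implementation to maintain. A minimal sketch of that idea follows; the class name &lt;code&gt;ReportService&lt;/code&gt; and the call counter are hypothetical and exist only to make the example runnable.&lt;/p&gt;

```java
// Hypothetical example: ReportService and its counter are illustrative only.
class ReportService {
    private int newCalls = 0;

    /**
     * @deprecated Use {@link #newMethod()} instead.
     * This method will be removed in version 3.0.
     */
    @Deprecated(since = "2.0", forRemoval = true)
    public void oldMethod() {
        // Delegate so both entry points share one implementation until removal.
        newMethod();
    }

    /** Replacement API introduced in 2.0. */
    public void newMethod() {
        newCalls++;
    }

    public int newCallCount() {
        return newCalls;
    }

    public static void main(String[] args) {
        ReportService service = new ReportService();
        service.oldMethod(); // existing callers keep working during the deprecation window
        service.newMethod(); // migrated callers use the replacement directly
        System.out.println("newMethod invocations: " + service.newCallCount());
    }
}
```

&lt;p&gt;Because both paths end up in &lt;code&gt;newMethod()&lt;/code&gt;, a fix made there reaches callers that have not migrated yet.&lt;/p&gt;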



&lt;h3&gt;
  
  
  3. The "Silent Killer": Internal Behavior Changes
&lt;/h3&gt;

&lt;p&gt;The most problematic breaking changes are those that pass the compiler but fail at runtime. Consider changing what a method does when it has nothing to return: from returning &lt;code&gt;null&lt;/code&gt; to returning an &lt;code&gt;Optional&lt;/code&gt; or throwing a custom exception.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Scenario:&lt;/strong&gt;&lt;br&gt;
A library method used to return &lt;code&gt;null&lt;/code&gt; if a user wasn't found. Now, it returns &lt;code&gt;Optional&amp;lt;User&amp;gt;&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Consumer's Code:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;User&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;finder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;findUser&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"123"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt; 
    &lt;span class="c1"&gt;// This check is now PERMANENTLY true because Optional is an object!&lt;/span&gt;
    &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;process&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// NullPointerException when calling methods on empty Optional&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We saw this specific issue once when we were upgrading Java in our project and had to upgrade another library as part of that. Luckily, both the library and the app code were maintained by us.&lt;/p&gt;
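
&lt;p&gt;One way to avoid this kind of silent break is to leave the existing method's behavior alone and offer the new behavior under a new name. The sketch below assumes a hypothetical &lt;code&gt;UserFinder&lt;/code&gt; with a hard-coded lookup; &lt;code&gt;requireUser&lt;/code&gt; and &lt;code&gt;UserNotFoundException&lt;/code&gt; are made-up names, not part of any real library.&lt;/p&gt;

```java
// Hypothetical types for illustration: User, UserFinder, UserNotFoundException.
class User {
    final String id;

    User(String id) {
        this.id = id;
    }
}

class UserNotFoundException extends RuntimeException {
    UserNotFoundException(String id) {
        super("No user with id " + id);
    }
}

class UserFinder {
    /** Existing contract, unchanged: returns null when the user is unknown. */
    public User findUser(String id) {
        // Hard-coded lookup standing in for a real data source.
        return "123".equals(id) ? new User(id) : null;
    }

    /** New method: same lookup, but a miss throws instead of returning null. */
    public User requireUser(String id) {
        User user = findUser(id);
        if (user == null) {
            throw new UserNotFoundException(id);
        }
        return user;
    }

    public static void main(String[] args) {
        UserFinder finder = new UserFinder();
        System.out.println(finder.findUser("999") == null); // old callers still see null
        System.out.println(finder.requireUser("123").id);   // new callers opt in explicitly
    }
}
```

&lt;p&gt;Existing null checks in consumer code keep their meaning, and the stricter contract only applies to callers who choose the new method.&lt;/p&gt;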

&lt;h2&gt;
  
  
  Hard Breaks &amp;amp; Dependency Hell
&lt;/h2&gt;

&lt;p&gt;Sometimes a hard break is unavoidable due to security flaws or architectural debt. In these cases, adding details to the Javadoc is the most helpful thing you can do for end users:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Explain the Why:&lt;/strong&gt; Was it a security patch? A performance bottleneck?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Explain the How:&lt;/strong&gt; Provide clear migration scripts or "Search and Replace" instructions if possible.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  The Diamond Dependency Problem
&lt;/h3&gt;

&lt;p&gt;When your library and the app require different versions of the same third-party dependency, consumers often face &lt;code&gt;NoSuchMethodError&lt;/code&gt;. You can help them by documenting how to use Maven exclusions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;com.yourlibrary&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;awesome-lib&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;version&amp;gt;&lt;/span&gt;2.0.0&lt;span class="nt"&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;exclusions&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;exclusion&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.conflicting.lib&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;clash-artifact&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;/exclusion&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/exclusions&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If nothing else, your documentation and release notes will help them plan the migration better.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pre-Publish Checklist
&lt;/h2&gt;

&lt;p&gt;Before you publish your next release, ask yourself:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Did I test existing consumer code?&lt;/li&gt;
&lt;li&gt;Is my Javadoc complete (@param, @return, @throws)?&lt;/li&gt;
&lt;li&gt;Are deprecations clearly marked with a timeline?&lt;/li&gt;
&lt;li&gt;Did I explain the "Why" for any hard breaks in the release notes?&lt;/li&gt;
&lt;li&gt;Have I checked for behavioral contract shifts?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxe3rcn4fl16m7f1rt41y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxe3rcn4fl16m7f1rt41y.png" alt="Don't break whats working" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the current environment of AI dependency, agents also use your documentation as their guiding light when using a library whose codebase is not available to them.&lt;br&gt;
Library development is an exercise in responsibility. Every change affects a developer who trusted your API contract. Ship with their needs in mind.&lt;/p&gt;

</description>
      <category>dependency</category>
      <category>java</category>
      <category>coding</category>
    </item>
    <item>
      <title>Two Versions, One Project - A Guide to Java Dependency Shading</title>
      <dc:creator>Faisal Dilawar</dc:creator>
      <pubDate>Mon, 11 Aug 2025 04:02:06 +0000</pubDate>
      <link>https://forem.com/mfdilawar/two-versions-one-project-a-guide-to-java-dependency-shading-5bd9</link>
      <guid>https://forem.com/mfdilawar/two-versions-one-project-a-guide-to-java-dependency-shading-5bd9</guid>
      <description>&lt;p&gt;Here, we'll learn how to use two versions of the same library in a single JVM project.&lt;/p&gt;

&lt;h2&gt;
  
  
  Problem statement:
&lt;/h2&gt;

&lt;p&gt;In modern Java development we have the luxury of using build automation tools like Maven or Gradle to handle our dependencies once we tell them what we need. They are amazing at managing things when dealing with multiple libraries and transitive dependencies.&lt;br&gt;
But sometimes we can run into an issue where they may not be able to pick the correct dependency version.&lt;/p&gt;

&lt;p&gt;Consider the following example.&lt;br&gt;
A library has 2 versions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Version 1 (v1) is the older one and has 3 methods

&lt;ul&gt;
&lt;li&gt;methodA, methodB and methodC&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Version 2 (v2) is the newer one and has 3 methods

&lt;ul&gt;
&lt;li&gt;methodA, methodB (with slightly different logic but same signature) and methodD&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your application is set up with Maven and needs version 1, specifically &lt;strong&gt;methodC&lt;/strong&gt;.&lt;br&gt;
Your application also depends on a 3rd party library which needs version 2 for &lt;strong&gt;methodD&lt;/strong&gt;, or maybe even the newer &lt;strong&gt;methodB&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;What do you do in that case? Maven resolves the conflict with its "nearest wins" strategy, so only one version, either v1 or v2, ends up on the classpath. And you can't have two classes with the same package name and class name in your application. This is the classic diamond dependency problem.&lt;/p&gt;
&lt;h2&gt;
  
  
  Shading to the rescue
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;In a nutshell, shading packs the complete version 1 of the library (and its dependencies, if needed) into your JAR.&lt;/li&gt;
&lt;li&gt;It also renames the v1 package path, so com.organization.project.library becomes something like com.organization.project.&lt;strong&gt;shaded&lt;/strong&gt;.library.&lt;/li&gt;
&lt;li&gt;While packaging, Maven replaces all references to the non-shaded (v1) package name (imports and fully qualified names, as they appear in the compiled classes) with the shaded path.&lt;/li&gt;
&lt;li&gt;The 3rd party library keeps using the non-shaded (v2) path, as it has not been modified by Maven.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4td9kndc938xzo4u5wiw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4td9kndc938xzo4u5wiw.png" alt="Dependency shading" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;It's like my wife telling me not to buy any new bike and me trying to convince her it's not exactly a bike but a lawn mower (partially true story).&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;FYI: The shade plugin basically rewrites the bytecode.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  A real-world example
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Guava&lt;/strong&gt; is a high-quality utility library used in many projects. But it suffers from one major flaw: it often breaks backward compatibility. Let's say your application uses &lt;strong&gt;v32&lt;/strong&gt;, but you include a 3rd party library that relies on &lt;strong&gt;v23&lt;/strong&gt;. This can lead to &lt;strong&gt;java.lang.NoSuchMethodError&lt;/strong&gt;.&lt;br&gt;
To handle this, you bundle a copy of the Guava v23 classes inside your final JAR and relocate its packages from &lt;em&gt;com.google.common.*&lt;/em&gt; to a private path like &lt;em&gt;com.&lt;strong&gt;myapp.shaded&lt;/strong&gt;.guava.*&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Sample pom.xml for the example above (check the build/plugins part):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;project&lt;/span&gt; &lt;span class="na"&gt;xmlns=&lt;/span&gt;&lt;span class="s"&gt;"http://maven.apache.org/POM/4.0.0"&lt;/span&gt; &lt;span class="na"&gt;xmlns:xsi=&lt;/span&gt;&lt;span class="s"&gt;"http://www.w3.org/2001/XMLSchema-instance"&lt;/span&gt; &lt;span class="na"&gt;xsi:schemaLocation=&lt;/span&gt;&lt;span class="s"&gt;"http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;modelVersion&amp;gt;&lt;/span&gt;4.0.0&lt;span class="nt"&gt;&amp;lt;/modelVersion&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;com.myapp&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;shading-example&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.0-SNAPSHOT&lt;span class="nt"&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;packaging&amp;gt;&lt;/span&gt;jar&lt;span class="nt"&gt;&amp;lt;/packaging&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;properties&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;project.build.sourceEncoding&amp;gt;&lt;/span&gt;UTF-8&lt;span class="nt"&gt;&amp;lt;/project.build.sourceEncoding&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;maven.compiler.source&amp;gt;&lt;/span&gt;17&lt;span class="nt"&gt;&amp;lt;/maven.compiler.source&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;maven.compiler.target&amp;gt;&lt;/span&gt;17&lt;span class="nt"&gt;&amp;lt;/maven.compiler.target&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;guava.version&amp;gt;&lt;/span&gt;32.1.3-jre&lt;span class="nt"&gt;&amp;lt;/guava.version&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/properties&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;dependencies&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;com.google.guava&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;guava&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;version&amp;gt;&lt;/span&gt;${guava.version}&lt;span class="nt"&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;com.example&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;some-data-library&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.2.0&lt;span class="nt"&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;exclusions&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;exclusion&amp;gt;&lt;/span&gt;
                    &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;com.google.guava&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
                    &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;guava&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;/exclusion&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;/exclusions&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/dependencies&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;build&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;plugins&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;plugin&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.maven.plugins&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;maven-shade-plugin&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;version&amp;gt;&lt;/span&gt;3.5.1&lt;span class="nt"&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;executions&amp;gt;&lt;/span&gt;
                    &lt;span class="nt"&gt;&amp;lt;execution&amp;gt;&lt;/span&gt;
                        &lt;span class="nt"&gt;&amp;lt;phase&amp;gt;&lt;/span&gt;package&lt;span class="nt"&gt;&amp;lt;/phase&amp;gt;&lt;/span&gt;
                        &lt;span class="nt"&gt;&amp;lt;goals&amp;gt;&lt;/span&gt;
                            &lt;span class="nt"&gt;&amp;lt;goal&amp;gt;&lt;/span&gt;shade&lt;span class="nt"&gt;&amp;lt;/goal&amp;gt;&lt;/span&gt;
                        &lt;span class="nt"&gt;&amp;lt;/goals&amp;gt;&lt;/span&gt;
                        &lt;span class="nt"&gt;&amp;lt;configuration&amp;gt;&lt;/span&gt;
                            &lt;span class="nt"&gt;&amp;lt;artifactSet&amp;gt;&lt;/span&gt;
                                &lt;span class="nt"&gt;&amp;lt;includes&amp;gt;&lt;/span&gt;
                                    &lt;span class="nt"&gt;&amp;lt;include&amp;gt;&lt;/span&gt;com.example:some-data-library&lt;span class="nt"&gt;&amp;lt;/include&amp;gt;&lt;/span&gt;
                                &lt;span class="nt"&gt;&amp;lt;/includes&amp;gt;&lt;/span&gt;
                            &lt;span class="nt"&gt;&amp;lt;/artifactSet&amp;gt;&lt;/span&gt;
                            &lt;span class="nt"&gt;&amp;lt;relocations&amp;gt;&lt;/span&gt;
                                &lt;span class="nt"&gt;&amp;lt;relocation&amp;gt;&lt;/span&gt;
                                    &lt;span class="nt"&gt;&amp;lt;pattern&amp;gt;&lt;/span&gt;com.google.common&lt;span class="nt"&gt;&amp;lt;/pattern&amp;gt;&lt;/span&gt;
                                    &lt;span class="nt"&gt;&amp;lt;shadedPattern&amp;gt;&lt;/span&gt;com.myapp.shaded.guava&lt;span class="nt"&gt;&amp;lt;/shadedPattern&amp;gt;&lt;/span&gt;
                                &lt;span class="nt"&gt;&amp;lt;/relocation&amp;gt;&lt;/span&gt;
                            &lt;span class="nt"&gt;&amp;lt;/relocations&amp;gt;&lt;/span&gt;
                            &lt;span class="nt"&gt;&amp;lt;createDependencyReducedPom&amp;gt;&lt;/span&gt;false&lt;span class="nt"&gt;&amp;lt;/createDependencyReducedPom&amp;gt;&lt;/span&gt;
                        &lt;span class="nt"&gt;&amp;lt;/configuration&amp;gt;&lt;/span&gt;
                    &lt;span class="nt"&gt;&amp;lt;/execution&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;/executions&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;dependencies&amp;gt;&lt;/span&gt;
                    &lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
                        &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;com.google.guava&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
                        &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;guava&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
                        &lt;span class="nt"&gt;&amp;lt;version&amp;gt;&lt;/span&gt;23.0&lt;span class="nt"&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
                    &lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;/dependencies&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;/plugin&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;/plugins&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/build&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/project&amp;gt;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
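
&lt;p&gt;To make the &lt;code&gt;pattern&lt;/code&gt; / &lt;code&gt;shadedPattern&lt;/code&gt; pair in the pom a bit more concrete, here is a rough sketch of the name mapping it describes. This is only an illustration on plain strings: the real plugin rewrites class files, and the &lt;code&gt;RelocationSketch&lt;/code&gt; class and the &lt;code&gt;com.example.data.Reader&lt;/code&gt; name are made up for this example.&lt;/p&gt;

```java
// Simplified illustration only: the real plugin rewrites class files, not strings.
class RelocationSketch {
    /** Maps a class name to its relocated name if it falls under the given package pattern. */
    static String relocate(String className, String pattern, String shadedPattern) {
        boolean matches = className.equals(pattern) || className.startsWith(pattern + ".");
        return matches ? shadedPattern + className.substring(pattern.length()) : className;
    }

    public static void main(String[] args) {
        String pattern = "com.google.common";
        String shaded = "com.myapp.shaded.guava";

        // A Guava class moves under the shaded prefix.
        System.out.println(relocate("com.google.common.base.Strings", pattern, shaded));
        // prints com.myapp.shaded.guava.base.Strings

        // A class outside the pattern is left untouched.
        System.out.println(relocate("com.example.data.Reader", pattern, shaded));
        // prints com.example.data.Reader
    }
}
```

&lt;p&gt;The same mapping is applied to every reference inside the classes that end up in the shaded JAR, which is why the relocated copy and the regular Guava can coexist.&lt;/p&gt;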



&lt;p&gt;&lt;strong&gt;Jackson (com.fasterxml.jackson)&lt;/strong&gt; and &lt;strong&gt;Kryo (com.esotericsoftware.kryo)&lt;/strong&gt; are two more libraries where developers can face similar issues.&lt;/p&gt;

&lt;h2&gt;
  
  
  Shading is Simple. Right?
&lt;/h2&gt;

&lt;h2&gt;
  
  
  A more complex problem statement
&lt;/h2&gt;

&lt;p&gt;What if your application itself needs &lt;strong&gt;both versions&lt;/strong&gt; of the library simultaneously? I know it’s a very unusual scenario. And I &lt;strong&gt;hope&lt;/strong&gt; you don’t have to face something similar. But we faced this and lived to tell the tale (&lt;em&gt;you are reading it, right?&lt;/em&gt;).&lt;br&gt;
Here, the standard shading approach fails, because the shade plugin modifies &lt;strong&gt;all occurrences&lt;/strong&gt; of the original package, renaming them to the new shaded path. And we don’t want that. We want some of the references to use v1 and others to use v2. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I &lt;strong&gt;sincerely hope&lt;/strong&gt; you are not in a scenario where you have to support 3 different versions. If you are, do write an article about your own miserable coding life.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;In our case&lt;/strong&gt;, the environment itself where we had to deploy the application was providing us with a runtime dependency. The newer version of the library was essential for our application to function in the evolving runtime environment. At the same time, we needed the older version of that library to read existing, persisted data. So the older version was mandatory for us. And of course, no backward compatibility (you thought this would be easy?).&lt;/p&gt;
&lt;h2&gt;
  
  
  Solution
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;To solve this, we moved all the code that relied on the older library version into its own, separate library project.&lt;/li&gt;
&lt;li&gt;In the pom.xml of this library project, we shaded the old dependency much like in the Guava example, with one small difference that is explained after the library pom.&lt;/li&gt;
&lt;li&gt;We then generated two distinct artifacts from this project: the original, standard JAR and the new, shaded JAR.&lt;/li&gt;
&lt;li&gt;This new, shaded JAR (containing the old library) and the newer version of the library (v2) were both added as dependencies to our main application.&lt;/li&gt;
&lt;li&gt;The code in our main application that needed v1 was updated to import the new, relocated packages from our custom-shaded JAR, and any code that needed v2 was kept as is.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Library pom.xml:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;project&lt;/span&gt; &lt;span class="na"&gt;xmlns=&lt;/span&gt;&lt;span class="s"&gt;"http://maven.apache.org/POM/4.0.0"&lt;/span&gt;
         &lt;span class="na"&gt;xmlns:xsi=&lt;/span&gt;&lt;span class="s"&gt;"http://www.w3.org/2001/XMLSchema-instance"&lt;/span&gt;
         &lt;span class="na"&gt;xsi:schemaLocation=&lt;/span&gt;&lt;span class="s"&gt;"http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;modelVersion&amp;gt;&lt;/span&gt;4.0.0&lt;span class="nt"&gt;&amp;lt;/modelVersion&amp;gt;&lt;/span&gt;

    &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;com.myapp.wrappers&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;guava-v23-wrapper&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.0.0&lt;span class="nt"&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;packaging&amp;gt;&lt;/span&gt;jar&lt;span class="nt"&gt;&amp;lt;/packaging&amp;gt;&lt;/span&gt;

    &lt;span class="nt"&gt;&amp;lt;properties&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;project.build.sourceEncoding&amp;gt;&lt;/span&gt;UTF-8&lt;span class="nt"&gt;&amp;lt;/project.build.sourceEncoding&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;maven.compiler.source&amp;gt;&lt;/span&gt;17&lt;span class="nt"&gt;&amp;lt;/maven.compiler.source&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;maven.compiler.target&amp;gt;&lt;/span&gt;17&lt;span class="nt"&gt;&amp;lt;/maven.compiler.target&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/properties&amp;gt;&lt;/span&gt;

    &lt;span class="nt"&gt;&amp;lt;dependencies&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;com.google.guava&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;guava&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;version&amp;gt;&lt;/span&gt;23.0&lt;span class="nt"&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/dependencies&amp;gt;&lt;/span&gt;

    &lt;span class="nt"&gt;&amp;lt;build&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;plugins&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;plugin&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.maven.plugins&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;maven-shade-plugin&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;version&amp;gt;&lt;/span&gt;3.5.1&lt;span class="nt"&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;executions&amp;gt;&lt;/span&gt;
                    &lt;span class="nt"&gt;&amp;lt;execution&amp;gt;&lt;/span&gt;
                        &lt;span class="nt"&gt;&amp;lt;phase&amp;gt;&lt;/span&gt;package&lt;span class="nt"&gt;&amp;lt;/phase&amp;gt;&lt;/span&gt;
                        &lt;span class="nt"&gt;&amp;lt;goals&amp;gt;&lt;/span&gt;
                            &lt;span class="nt"&gt;&amp;lt;goal&amp;gt;&lt;/span&gt;shade&lt;span class="nt"&gt;&amp;lt;/goal&amp;gt;&lt;/span&gt;
                        &lt;span class="nt"&gt;&amp;lt;/goals&amp;gt;&lt;/span&gt;
                        &lt;span class="nt"&gt;&amp;lt;configuration&amp;gt;&lt;/span&gt;
                            &lt;span class="nt"&gt;&amp;lt;shadedClassifierName&amp;gt;&lt;/span&gt;shaded-guava-v23&lt;span class="nt"&gt;&amp;lt;/shadedClassifierName&amp;gt;&lt;/span&gt;

                            &lt;span class="nt"&gt;&amp;lt;relocations&amp;gt;&lt;/span&gt;
                                &lt;span class="nt"&gt;&amp;lt;relocation&amp;gt;&lt;/span&gt;
                                    &lt;span class="nt"&gt;&amp;lt;pattern&amp;gt;&lt;/span&gt;com.google.common&lt;span class="nt"&gt;&amp;lt;/pattern&amp;gt;&lt;/span&gt;
                                    &lt;span class="nt"&gt;&amp;lt;shadedPattern&amp;gt;&lt;/span&gt;com.myapp.shaded.guava.v23&lt;span class="nt"&gt;&amp;lt;/shadedPattern&amp;gt;&lt;/span&gt;
                                &lt;span class="nt"&gt;&amp;lt;/relocation&amp;gt;&lt;/span&gt;
                            &lt;span class="nt"&gt;&amp;lt;/relocations&amp;gt;&lt;/span&gt;
                        &lt;span class="nt"&gt;&amp;lt;/configuration&amp;gt;&lt;/span&gt;
                    &lt;span class="nt"&gt;&amp;lt;/execution&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;/executions&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;/plugin&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;/plugins&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/build&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/project&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Major change is introduction of tag shadedClassifierName. It tells the maven-shade-plugin not to replace the main artifact. Instead, it creates a new JAR and appends -shaded-guava-v23 to its name.&lt;br&gt;
In application pom.xml:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;com.google.guava&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;guava&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;version&amp;gt;&lt;/span&gt;${guava.latest.version}&lt;span class="nt"&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;com.myapp.wrappers&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;guava-v23-wrapper&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.0.0&lt;span class="nt"&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;classifier&amp;gt;&lt;/span&gt;shaded-guava-v23&lt;span class="nt"&gt;&amp;lt;/classifier&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In your application code, you can basically now do this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;com.google.common.base.Strings&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;GuavaVersionHandler&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;useNewGuava&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Strings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;padEnd&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="sc"&gt;'.'&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;boolean&lt;/span&gt; &lt;span class="nf"&gt;useOldGuava&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;com&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;myapp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;shaded&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;guava&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;v23&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;base&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Strings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;isNullOrEmpty&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;GuavaVersionHandler&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;GuavaVersionHandler&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

        &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;result1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;useNewGuava&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"modern"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"New Guava result: "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;result1&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

        &lt;span class="kt"&gt;boolean&lt;/span&gt; &lt;span class="n"&gt;result2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;useOldGuava&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Old Guava result: "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;result2&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Final thoughts
&lt;/h2&gt;

&lt;p&gt;So there you have it. Dependency shading is a powerful, if slightly deceptive, tool in the fight against dependency hell. Sometimes, you just need to put a cat costume on a library to keep a third-party dependency happy. Other times, you have to convince your own application that your new bicycle is actually a lawnmower to maintain backward compatibility.&lt;/p&gt;

&lt;p&gt;Shading shouldn't be your first resort, since it adds complexity and size to your project, but knowing how to shade a dependency effectively is the perfect escape hatch for otherwise impossible version conflicts. Use this power wisely, and happy (and less miserable) coding! And remember: if you dabble in shading, test the hell out of your application.&lt;/p&gt;

&lt;p&gt;I'd love to hear whether you have used other tools or strategies besides shading to resolve version conflicts in your projects.&lt;/p&gt;

</description>
      <category>java</category>
      <category>tutorial</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
