<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: mohideen sahib</title>
    <description>The latest articles on Forem by mohideen sahib (@mohideen_sahib_79f5f9e8de).</description>
    <link>https://forem.com/mohideen_sahib_79f5f9e8de</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3560933%2Fe82292d2-dd80-4d7b-a6e7-ae927416d9ab.jpg</url>
      <title>Forem: mohideen sahib</title>
      <link>https://forem.com/mohideen_sahib_79f5f9e8de</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/mohideen_sahib_79f5f9e8de"/>
    <language>en</language>
    <item>
      <title>PORT VS SOCKET</title>
      <dc:creator>mohideen sahib</dc:creator>
      <pubDate>Sun, 01 Mar 2026 01:33:55 +0000</pubDate>
      <link>https://forem.com/mohideen_sahib_79f5f9e8de/port-vs-socket-a8h</link>
      <guid>https://forem.com/mohideen_sahib_79f5f9e8de/port-vs-socket-a8h</guid>
      <description>&lt;p&gt;1️⃣ What Is a Port?&lt;/p&gt;

&lt;p&gt;A port is just a number (0–65535) that identifies a service on a machine.&lt;/p&gt;

&lt;p&gt;Think of it like:&lt;/p&gt;

&lt;p&gt;IP address → identifies the machine&lt;/p&gt;

&lt;p&gt;Port → identifies the application inside the machine&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;p&gt;22 → SSH&lt;/p&gt;

&lt;p&gt;80 → HTTP&lt;/p&gt;

&lt;p&gt;443 → HTTPS&lt;/p&gt;

&lt;p&gt;3306 → MySQL&lt;/p&gt;

&lt;p&gt;When you see:&lt;/p&gt;

&lt;p&gt;192.168.1.10:443&lt;/p&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;p&gt;Machine IP = 192.168.1.10&lt;/p&gt;

&lt;p&gt;Service = running on port 443&lt;/p&gt;

&lt;p&gt;👉 A port by itself does NOT mean a connection exists.&lt;br&gt;
It just means a process is listening.&lt;/p&gt;
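
&lt;p&gt;A minimal sketch of that idea (Python here; its &lt;code&gt;socket&lt;/code&gt; module wraps the same syscalls): bind to port 0 and the kernel reserves a free port. A process is now listening, yet no connection exists.&lt;/p&gt;

```python
import socket

# Ask the kernel for a TCP socket, then bind to port 0 so the
# kernel picks any free port for us.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(("127.0.0.1", 0))
s.listen()

host, port = s.getsockname()
print(f"listening on {host}:{port}")

# No traffic has flowed yet: the port is merely reserved by a listener.
s.close()
```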




&lt;p&gt;2️⃣ What Is a Socket?&lt;/p&gt;

&lt;p&gt;A socket is a full communication endpoint.&lt;/p&gt;

&lt;p&gt;It includes:&lt;/p&gt;

&lt;p&gt;IP address + Port + Protocol (TCP/UDP)&lt;/p&gt;

&lt;p&gt;But a real TCP connection is uniquely identified by:&lt;/p&gt;

&lt;p&gt;Source IP + Source Port + Destination IP + Destination Port + Protocol&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;Client: 10.0.0.5:51512&lt;br&gt;
Server: 192.168.1.10:443&lt;br&gt;
Protocol: TCP&lt;/p&gt;

&lt;p&gt;That 5-tuple defines one unique connection.&lt;/p&gt;
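
&lt;p&gt;You can observe that tuple from both ends of a loopback connection (a Python sketch; &lt;code&gt;getsockname()&lt;/code&gt; and &lt;code&gt;getpeername()&lt;/code&gt; expose each side’s IP:port pair, and the protocol here is TCP):&lt;/p&gt;

```python
import socket

# Server side: listen on an ephemeral loopback port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen()

# Client side: connect; the kernel assigns the ephemeral source port.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(server.getsockname())
conn, _ = server.accept()

c_src, c_dst = client.getsockname(), client.getpeername()
s_src, s_dst = conn.getsockname(), conn.getpeername()
print("client view:", c_src, "to", c_dst)
print("server view:", s_src, "to", s_dst)
# Same connection, mirrored: the client's source is the server's peer.

client.close(); conn.close(); server.close()
```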

&lt;p&gt;So:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Port&lt;/th&gt;
&lt;th&gt;Socket&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Just a number&lt;/td&gt;
&lt;td&gt;Full communication endpoint&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Identifies a service&lt;/td&gt;
&lt;td&gt;Identifies a connection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Exists without traffic&lt;/td&gt;
&lt;td&gt;Exists during communication&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;3️⃣ How Is a Socket Created?&lt;/p&gt;

&lt;p&gt;Sockets are created by the Operating System kernel, not directly by your application.&lt;/p&gt;

&lt;p&gt;Applications only request them via system calls.&lt;/p&gt;




&lt;p&gt;🔹 Client Side&lt;/p&gt;

&lt;p&gt;When your browser connects to HTTPS:&lt;/p&gt;

&lt;p&gt;Step 1 — socket()&lt;/p&gt;

&lt;p&gt;Application asks kernel to create a socket.&lt;/p&gt;

&lt;p&gt;Kernel:&lt;/p&gt;

&lt;p&gt;Allocates socket structure in memory&lt;/p&gt;

&lt;p&gt;Returns a file descriptor&lt;/p&gt;

&lt;p&gt;Step 2 — connect()&lt;/p&gt;

&lt;p&gt;Kernel:&lt;/p&gt;

&lt;p&gt;Assigns an ephemeral port (e.g., 51512)&lt;/p&gt;

&lt;p&gt;Initiates TCP 3-way handshake:&lt;/p&gt;

&lt;p&gt;SYN&lt;/p&gt;

&lt;p&gt;SYN-ACK&lt;/p&gt;

&lt;p&gt;ACK&lt;/p&gt;

&lt;p&gt;After handshake → connection becomes ESTABLISHED.&lt;/p&gt;




&lt;p&gt;🔹 Server Side&lt;/p&gt;

&lt;p&gt;When Nginx starts:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;socket()&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Creates listening socket.&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;bind()&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Reserves port (e.g., 443).&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;listen()&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Marks socket as listening.&lt;/p&gt;

&lt;ol start="4"&gt;
&lt;li&gt;accept()&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;When client connects:&lt;/p&gt;

&lt;p&gt;Kernel creates a new socket&lt;/p&gt;

&lt;p&gt;Listening socket stays open&lt;/p&gt;

&lt;p&gt;One new socket per client&lt;/p&gt;

&lt;p&gt;If 10,000 clients connect → 10,000 sockets.&lt;/p&gt;
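
&lt;p&gt;The accept loop above can be sketched on loopback (Python; each &lt;code&gt;accept()&lt;/code&gt; hands back a brand-new socket while the listening socket stays open):&lt;/p&gt;

```python
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen()
addr = server.getsockname()

clients, conns = [], []
for _ in range(3):
    c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    c.connect(addr)
    clients.append(c)
    conn, _ = server.accept()   # kernel creates a NEW socket per client
    conns.append(conn)

fds = [conn.fileno() for conn in conns]
print("listening fd:", server.fileno(), "per-client fds:", fds)
# 3 clients -> 3 extra sockets; 10,000 clients would mean 10,000.

for sock in clients:
    sock.close()
for sock in conns:
    sock.close()
server.close()
```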




&lt;p&gt;4️⃣ Who Manages the Socket?&lt;/p&gt;

&lt;p&gt;👉 The Linux kernel TCP/IP stack.&lt;/p&gt;

&lt;p&gt;It manages:&lt;/p&gt;

&lt;p&gt;TCP state (SYN_SENT, ESTABLISHED, TIME_WAIT)&lt;/p&gt;

&lt;p&gt;Send/receive buffers&lt;/p&gt;

&lt;p&gt;Sequence numbers&lt;/p&gt;

&lt;p&gt;Congestion control&lt;/p&gt;

&lt;p&gt;Memory allocation&lt;/p&gt;

&lt;p&gt;Applications only:&lt;/p&gt;

&lt;p&gt;Read&lt;/p&gt;

&lt;p&gt;Write&lt;/p&gt;

&lt;p&gt;Close&lt;/p&gt;

&lt;p&gt;Everything else = kernel responsibility.&lt;/p&gt;




&lt;p&gt;5️⃣ Why Does a Socket Use a File Descriptor?&lt;/p&gt;

&lt;p&gt;Now the interesting part.&lt;/p&gt;

&lt;p&gt;Sockets do not write to disk.&lt;/p&gt;

&lt;p&gt;So why do they use file descriptors (FDs)?&lt;/p&gt;

&lt;p&gt;Because in Unix/Linux:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Everything is a file.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This philosophy originated in early Unix at Bell Labs.&lt;/p&gt;

&lt;p&gt;Linux treats:&lt;/p&gt;

&lt;p&gt;Files&lt;/p&gt;

&lt;p&gt;Sockets&lt;/p&gt;

&lt;p&gt;Pipes&lt;/p&gt;

&lt;p&gt;Terminals&lt;/p&gt;

&lt;p&gt;Devices&lt;/p&gt;

&lt;p&gt;epoll&lt;/p&gt;

&lt;p&gt;eventfd&lt;/p&gt;

&lt;p&gt;All as file descriptors.&lt;/p&gt;




&lt;p&gt;6️⃣ What Is a File Descriptor Actually?&lt;/p&gt;

&lt;p&gt;A file descriptor is:&lt;/p&gt;

&lt;p&gt;Just an integer&lt;/p&gt;

&lt;p&gt;Index into a per-process table&lt;/p&gt;

&lt;p&gt;Points to a kernel object&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;0 → stdin&lt;br&gt;
1 → stdout&lt;br&gt;
2 → stderr&lt;br&gt;
3 → first opened socket/file&lt;/p&gt;

&lt;p&gt;When you call:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;int fd = socket(AF_INET, SOCK_STREAM, 0);&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Kernel:&lt;/p&gt;

&lt;p&gt;Creates socket object&lt;/p&gt;

&lt;p&gt;Stores it in process FD table&lt;/p&gt;

&lt;p&gt;Returns a small integer&lt;/p&gt;

&lt;p&gt;That integer is just a handle.&lt;/p&gt;

&lt;p&gt;It does NOT mean disk file.&lt;/p&gt;
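
&lt;p&gt;You can see the handle directly (a Python sketch; &lt;code&gt;fileno()&lt;/code&gt; returns the underlying descriptor):&lt;/p&gt;

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
fd = s.fileno()
print("socket file descriptor:", fd)

# 0, 1 and 2 are stdin/stdout/stderr, so a fresh process usually
# gets 3 here; either way, it is just an index into the FD table.
s.close()
```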




&lt;p&gt;7️⃣ Why Reuse File Descriptor Mechanism?&lt;/p&gt;

&lt;p&gt;Because it gives a unified API:&lt;/p&gt;

&lt;p&gt;Same syscalls work for:&lt;/p&gt;

&lt;p&gt;Files&lt;/p&gt;

&lt;p&gt;Sockets&lt;/p&gt;

&lt;p&gt;Pipes&lt;/p&gt;

&lt;p&gt;Like:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;read(fd)&lt;/code&gt;&lt;br&gt;
&lt;code&gt;write(fd)&lt;/code&gt;&lt;br&gt;
&lt;code&gt;close(fd)&lt;/code&gt;&lt;br&gt;
&lt;code&gt;poll()&lt;/code&gt; and &lt;code&gt;epoll&lt;/code&gt;, which monitor many FDs at once&lt;/p&gt;

&lt;p&gt;No special “network API” needed.&lt;/p&gt;

&lt;p&gt;That abstraction is extremely powerful.&lt;/p&gt;
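
&lt;p&gt;To make that concrete, the generic FD syscalls work unchanged on a connected pair of sockets (a Python sketch using &lt;code&gt;os.read&lt;/code&gt;/&lt;code&gt;os.write&lt;/code&gt;, which wrap the raw syscalls):&lt;/p&gt;

```python
import os
import socket

# A connected pair of UNIX-domain sockets (like a pipe, but two-way).
a, b = socket.socketpair()

# The same syscalls you would use on a regular file:
os.write(a.fileno(), b"hello")
data = os.read(b.fileno(), 5)
print(data)

a.close()
b.close()
```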




&lt;p&gt;8️⃣ Why This Matters in Real Systems&lt;/p&gt;

&lt;p&gt;In high-traffic systems:&lt;/p&gt;

&lt;p&gt;50,000 concurrent connections&lt;br&gt;
= 50,000 sockets&lt;br&gt;
= 50,000 file descriptors&lt;/p&gt;

&lt;p&gt;If you see:&lt;/p&gt;

&lt;p&gt;Too many open files&lt;/p&gt;

&lt;p&gt;It usually means:&lt;/p&gt;

&lt;p&gt;You exhausted file descriptors&lt;/p&gt;

&lt;p&gt;Not disk files&lt;/p&gt;

&lt;p&gt;Check:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ulimit -n&lt;/code&gt;&lt;/p&gt;
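
&lt;p&gt;The same limit is visible from inside a process (a Python sketch for Unix-like systems; &lt;code&gt;RLIMIT_NOFILE&lt;/code&gt; is what &lt;code&gt;ulimit -n&lt;/code&gt; reports):&lt;/p&gt;

```python
import resource

# Soft/hard cap on open file descriptors for this process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft FD limit:", soft, "hard FD limit:", hard)
# Every open socket counts against this limit, not just disk files.
```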

&lt;p&gt;Containers and Kubernetes pods share the node kernel, so:&lt;/p&gt;

&lt;p&gt;Node FD limits matter&lt;/p&gt;

&lt;p&gt;Socket exhaustion is real&lt;/p&gt;

&lt;p&gt;TIME_WAIT floods can kill throughput&lt;/p&gt;




&lt;p&gt;9️⃣ Visual Summary&lt;/p&gt;

&lt;p&gt;Port&lt;/p&gt;

&lt;p&gt;Just a service identifier&lt;/p&gt;

&lt;p&gt;No active communication&lt;/p&gt;

&lt;p&gt;Socket&lt;/p&gt;

&lt;p&gt;Kernel object&lt;/p&gt;

&lt;p&gt;Represents a live connection&lt;/p&gt;

&lt;p&gt;Contains TCP state + buffers&lt;/p&gt;

&lt;p&gt;File Descriptor&lt;/p&gt;

&lt;p&gt;Integer handle&lt;/p&gt;

&lt;p&gt;Points to kernel object&lt;/p&gt;

&lt;p&gt;Used for unified I/O abstraction&lt;/p&gt;




&lt;p&gt;10️⃣ Final Mental Model&lt;/p&gt;

&lt;p&gt;Think of it like this:&lt;/p&gt;

&lt;p&gt;IP = Building&lt;/p&gt;

&lt;p&gt;Port = Door&lt;/p&gt;

&lt;p&gt;Socket = Active phone call between two doors&lt;/p&gt;

&lt;p&gt;File Descriptor = The call reference number your OS uses internally&lt;/p&gt;

</description>
      <category>devops</category>
      <category>sre</category>
      <category>linux</category>
      <category>networking</category>
    </item>
    <item>
      <title>Inside the AWS US-East-1 Outage: Why DNS Failure Triggered a Global Cloud Crisis</title>
      <dc:creator>mohideen sahib</dc:creator>
      <pubDate>Tue, 21 Oct 2025 02:27:17 +0000</pubDate>
      <link>https://forem.com/mohideen_sahib_79f5f9e8de/inside-the-aws-us-east-1-outage-why-dns-failure-triggered-a-global-cloud-crisis-15k1</link>
      <guid>https://forem.com/mohideen_sahib_79f5f9e8de/inside-the-aws-us-east-1-outage-why-dns-failure-triggered-a-global-cloud-crisis-15k1</guid>
      <description>&lt;h1&gt;
  
  
  What Really Happened in the AWS US-East-1 Outage and Why It Was So Bad: An Initial Writeup Based on AWS Communications
&lt;/h1&gt;

&lt;p&gt;While many tech professionals have detailed AWS’s recent US-East-1 outage, my view is shaped by extensive experience managing DNS outages in on-premises environments. This writeup is an initial analysis based on AWS’s official statements and public information.&lt;/p&gt;




&lt;h3&gt;
  
  
  Why AWS Outage Became a Doomsday Event Unlike Typical On-Prem DNS Failures
&lt;/h3&gt;

&lt;p&gt;DNS outages are a fundamental failure point in any distributed system. No provider, including AWS, can fully eliminate DNS risk. Yet in on-prem environments, DNS disruptions—even with tight application dependencies—usually recover fast and stay localized, enabling quick service restoration.&lt;/p&gt;

&lt;p&gt;AWS operates at hyperscale—millions of interdependent APIs, services, and control planes deeply coupled and globally dispersed. DNS in AWS underpins &lt;strong&gt;service discovery, authentication, authorization, and control-plane orchestration&lt;/strong&gt;. The US-East-1 DNS failure that hit DynamoDB endpoints triggered cascading failures across IAM, Lambda, EC2, CloudWatch, and more. Retry storms and state synchronization extended outage timelines, transforming a typical DNS hiccup into a prolonged global incident.&lt;/p&gt;




&lt;h3&gt;
  
  
  Rough Dependency Mapping of Key Affected AWS Services and Their DNS Endpoint Dependencies
&lt;/h3&gt;

&lt;p&gt;This dependency mapping and analysis are personal assessments based on publicly available AWS documentation, outage reports, and professional experience. Due to AWS’s proprietary and complex architecture, some inferred details may not exactly represent internal implementations. This post aims to provide an informed approximation grounded in official public information and practical knowledge, not an authoritative AWS internal architecture description.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;DynamoDB (&lt;code&gt;dynamodb.us-east-1.amazonaws.com&lt;/code&gt;)&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Services that depend on DynamoDB:&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IAM:&lt;/strong&gt; Uses DynamoDB to store and retrieve authentication tokens, session state, and authorization policies. This enables IAM to validate credentials and enforce access control.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lambda:&lt;/strong&gt; Uses DynamoDB for state persistence and event metadata storage. Lambda functions may read/write data to DynamoDB tables as part of normal workflows.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CloudWatch:&lt;/strong&gt; Stores custom metrics and alarms related to resource usage and function executions in DynamoDB.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why the dependency matters:&lt;/strong&gt;
DynamoDB acts as a fast, globally distributed NoSQL store holding critical authorization, session, and configuration data. If unresolved or inaccessible, IAM cannot authenticate or authorize, leading to login and API failures.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;IAM (Identity and Access Management)&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Depends on:&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DynamoDB:&lt;/strong&gt; for policy storage, session tokens, and metadata.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;KMS (Key Management Service):&lt;/strong&gt; for cryptographic key operations to securely sign and validate tokens.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lambda:&lt;/strong&gt; for custom authorization flows and policy evaluations that can trigger functions dynamically.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Services that depend on IAM:&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;All AWS Services:&lt;/strong&gt; Every service requiring access control checks (EC2, Lambda, S3, etc.) queries IAM for validated credentials and permissions.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS Console &amp;amp; Support:&lt;/strong&gt; User portal and case-raising systems rely on IAM for authentication and enrollment.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why the dependency matters:&lt;/strong&gt;
IAM is the cornerstone for secure identity and access control. Any interruption cascades into login failures and administrative lockouts.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Lambda&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Depends on:&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;S3:&lt;/strong&gt; for fetching function code and layers during cold starts.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IAM:&lt;/strong&gt; for getting execution roles and permission tokens.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Event Sources:&lt;/strong&gt; like S3, EventBridge, or DynamoDB streams for triggering executions.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Services that depend on Lambda:&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Application Workflows and System Integrations:&lt;/strong&gt; Lambda enables event-driven architectures, allowing asynchronous processing in many AWS services.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why the dependency matters:&lt;/strong&gt;
Lambda’s dynamic, scalable compute depends on timely availability of code from S3, secure token access via IAM, and event triggers—all reliant on DNS-based resolution and availability.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;EC2 and VPC&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Depends on:&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IAM:&lt;/strong&gt; for instance credentials and access tokens.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metadata Service:&lt;/strong&gt; to fetch configuration and instance metadata at runtime.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AMI Catalogs (via S3/EC2 API Endpoints):&lt;/strong&gt; for retrieving machine images to launch new instances.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Services that depend on EC2:&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Customer Applications and Services:&lt;/strong&gt; rely on EC2 instances for compute, networking, and storage access.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why the dependency matters:&lt;/strong&gt;
EC2 provisioning and ongoing instance operations rely on credential validation and configuration data resolvable only through DNS-based AWS endpoints. Failures in these dependencies delay provisioning and impact workloads.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;CloudWatch&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Depends on:&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IAM:&lt;/strong&gt; for authenticating metric and log uploads.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DynamoDB or other data stores:&lt;/strong&gt; for storing monitoring data and alarm state.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Services that depend on CloudWatch:&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;All AWS Users and Services:&lt;/strong&gt; rely on CloudWatch for operational visibility and automated response triggers.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why the dependency matters:&lt;/strong&gt;
Loss of monitoring visibility impacts incident response and auto-remediation capabilities critical during outages.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Route 53&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Depends on:&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Internal Control Plane Services:&lt;/strong&gt; to verify DNS zones, health checks, and routing policies.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Services that depend on Route 53:&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;All AWS Services and Customer Applications:&lt;/strong&gt; depend on Route 53 for DNS resolution, failover routing, and global traffic management.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why the dependency matters:&lt;/strong&gt;
DNS is foundational for AWS internal and external communications. Route 53’s partial degradation affected failover and traffic routing during the outage.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;




&lt;h3&gt;
  
  
  What Customers Did During the Outage — Help or Hurt?
&lt;/h3&gt;

&lt;p&gt;Many customers sought to fail over to standby regions. However:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Human ability to &lt;strong&gt;log into IAM management consoles&lt;/strong&gt; and &lt;strong&gt;promote Disaster Recovery (DR) regions&lt;/strong&gt; was impaired because IAM’s global authentication backbone remained dependent on US-East-1 endpoints.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Hybrid on-prem + AWS DR setups faced manual complexity, needing reconfiguration of on-prem services to point to DR sites.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Traffic redirection often requires &lt;strong&gt;updating Route 53 DNS records&lt;/strong&gt; for warm/standby sites. While Route 53 health checks ordinarily enable hot-hot failover by routing traffic away from degraded sites, Route 53 itself experienced partial degradation, limiting automated failover efficacy.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Many customers reported backlogs and slow performance in US-East-1, driving them to failover attempts that risked data conflicts due to asynchronous replication, especially for global DynamoDB tables and IAM policies.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Did Login Failures Occur Across Regions? Disaster Recovery State?
&lt;/h3&gt;

&lt;p&gt;Yes. Because IAM and DynamoDB global tables anchor on US-East-1, login and authentication failures were seen in failover regions. Effective disaster recovery requires not only traffic failover but also resilient global state replication and authentication services. Without this, DR activation is hampered by login and token validation failures.&lt;/p&gt;




&lt;h3&gt;
  
  
  Official AWS Root Cause Summary (Public)
&lt;/h3&gt;

&lt;p&gt;Amazon confirmed the core issue was a DNS resolution failure for DynamoDB API endpoints in the US-East-1 region starting late October 19, 2025. Though DNS issues were mitigated early October 20, retry storms and internal networking load balancer faults prolonged service impact for hours, affecting thousands of customers and multiple AWS services, including Amazon’s own platforms.&lt;/p&gt;




&lt;h3&gt;
  
  
  Final Thoughts: DNS is an Unavoidable Fundamental Risk—not an AWS Fault
&lt;/h3&gt;

&lt;p&gt;DNS underpins all distributed services globally and cannot be engineered to be infallible. This outage highlights the need for system architects to anticipate DNS failures, build architectures with decoupled control planes, multi-region resilience, caching, and failover strategies focused on graceful degradation over catastrophic failure.&lt;/p&gt;




&lt;h1&gt;
  
  
  #AWSOutage #DNSFailure #IAM #DynamoDB #CloudResilience #MultiRegion #DisasterRecovery #DevOps #SRE #Infrastructure
&lt;/h1&gt;




</description>
      <category>sre</category>
      <category>devops</category>
      <category>aws</category>
      <category>linux</category>
    </item>
    <item>
      <title>[Boost]</title>
      <dc:creator>mohideen sahib</dc:creator>
      <pubDate>Fri, 17 Oct 2025 12:34:20 +0000</pubDate>
      <link>https://forem.com/mohideen_sahib_79f5f9e8de/-1me</link>
      <guid>https://forem.com/mohideen_sahib_79f5f9e8de/-1me</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/ari-ghosh" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F789392%2F1088a2f0-3d7a-4a0d-badc-c1ba8c9bc692.png" alt="ari-ghosh"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/ari-ghosh/db-performance-101-a-practical-deep-dive-into-backend-database-optimization-4cag" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;🗄️DB Performance 101: A Practical Deep Dive into Backend Database Optimization⚡&lt;/h2&gt;
      &lt;h3&gt;Arijit Ghosh ・ Oct 16&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#database&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#postgres&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#tutorial&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#sql&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>database</category>
      <category>postgres</category>
      <category>tutorial</category>
      <category>sql</category>
    </item>
    <item>
      <title>Why S3, NFS, and EFS Are Not Block Storage</title>
      <dc:creator>mohideen sahib</dc:creator>
      <pubDate>Fri, 17 Oct 2025 12:14:45 +0000</pubDate>
      <link>https://forem.com/mohideen_sahib_79f5f9e8de/why-s3-nfs-and-efs-are-not-block-storage-3512</link>
      <guid>https://forem.com/mohideen_sahib_79f5f9e8de/why-s3-nfs-and-efs-are-not-block-storage-3512</guid>
      <description>&lt;h1&gt;
  
  
  ☁️ Myth vs Fact: Why S3, NFS, and EFS Are &lt;em&gt;Not&lt;/em&gt; Block Storage
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;💭 The common doubt:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Everything — NFS, EFS, S3, or even EBS — ultimately saves data on some disk, right?&lt;br&gt;
Then why call some &lt;em&gt;object storage&lt;/em&gt;, some &lt;em&gt;file storage&lt;/em&gt;, and others &lt;em&gt;block storage&lt;/em&gt;?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let’s bust this myth once and for all 👇&lt;/p&gt;




&lt;h2&gt;
  
  
  🧱 1️⃣ Block Storage — The Raw Disk
&lt;/h2&gt;

&lt;p&gt;Block storage is the &lt;strong&gt;lowest layer&lt;/strong&gt;.&lt;br&gt;
You talk directly to the storage device — just like &lt;code&gt;/dev/sda&lt;/code&gt; on Linux.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No concept of files yet.&lt;/li&gt;
&lt;li&gt;You &lt;strong&gt;format it yourself&lt;/strong&gt; (&lt;code&gt;mkfs.ext4&lt;/code&gt;, &lt;code&gt;mkfs.xfs&lt;/code&gt;) to create a filesystem.&lt;/li&gt;
&lt;li&gt;Best suited for databases, VMs, and OS disks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🧩 &lt;strong&gt;Examples:&lt;/strong&gt;&lt;br&gt;
AWS EBS, iSCSI volumes, SAN disks.&lt;/p&gt;

&lt;p&gt;📦 &lt;strong&gt;Analogy:&lt;/strong&gt;&lt;br&gt;
You’re given a &lt;em&gt;bare hard disk&lt;/em&gt;.&lt;br&gt;
You decide how to format, partition, and use it.&lt;/p&gt;




&lt;h2&gt;
  
  
  📂 2️⃣ File Storage — The Shared Filesystem Layer
&lt;/h2&gt;

&lt;p&gt;File storage sits &lt;strong&gt;on top of block storage&lt;/strong&gt; and exposes a &lt;strong&gt;filesystem interface&lt;/strong&gt;.&lt;br&gt;
Here you work with &lt;strong&gt;files and folders&lt;/strong&gt;, not raw blocks.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;server side&lt;/strong&gt; (like NFS/EFS) has already formatted the underlying storage and manages the filesystem.&lt;/li&gt;
&lt;li&gt;You just &lt;strong&gt;mount it&lt;/strong&gt; on your client using &lt;code&gt;mount -t nfs ...&lt;/code&gt; or &lt;code&gt;mount -t efs ...&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Great for shared environments where multiple servers need file access.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🧩 &lt;strong&gt;Examples:&lt;/strong&gt;&lt;br&gt;
NFS, AWS EFS, SMB, CIFS.&lt;/p&gt;

&lt;p&gt;📦 &lt;strong&gt;Analogy:&lt;/strong&gt;&lt;br&gt;
Instead of giving you a disk, someone gives you a &lt;strong&gt;shared folder&lt;/strong&gt; that’s already organized and formatted.&lt;/p&gt;




&lt;h2&gt;
  
  
  🪣 3️⃣ Object Storage — The API Level
&lt;/h2&gt;

&lt;p&gt;This is the &lt;strong&gt;highest level&lt;/strong&gt; of abstraction.&lt;br&gt;
You don’t see files, folders, or disks — you deal with &lt;strong&gt;objects&lt;/strong&gt; (data + metadata).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Accessed via HTTP APIs (&lt;code&gt;PUT&lt;/code&gt;, &lt;code&gt;GET&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;No filesystem.&lt;/li&gt;
&lt;li&gt;Great for scalable, distributed systems.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🧩 &lt;strong&gt;Examples:&lt;/strong&gt;&lt;br&gt;
AWS S3, MinIO, Azure Blob, GCS.&lt;/p&gt;

&lt;p&gt;📦 &lt;strong&gt;Analogy:&lt;/strong&gt;&lt;br&gt;
You hand over a file to a receptionist (the API) who stores it in a massive warehouse.&lt;br&gt;
You never see where it goes — you just ask for it later using its unique ID.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔍 The Real Difference Is &lt;em&gt;How You Access Data&lt;/em&gt;
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Access Interface&lt;/th&gt;
&lt;th&gt;What You Manage&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Block&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;OS Disk (Raw Blocks)&lt;/td&gt;
&lt;td&gt;Sectors / Blocks&lt;/td&gt;
&lt;td&gt;EBS, iSCSI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;File&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Filesystem (Paths)&lt;/td&gt;
&lt;td&gt;Files &amp;amp; Folders&lt;/td&gt;
&lt;td&gt;NFS, EFS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Object&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;API Calls (HTTP)&lt;/td&gt;
&lt;td&gt;Objects &amp;amp; Metadata&lt;/td&gt;
&lt;td&gt;S3, MinIO&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
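
&lt;p&gt;The three interfaces in the table can be mimicked in a few lines (a toy Python sketch: a temp file stands in for a raw device, and a dict for an object-store API; these stand-ins are illustrative only, not real storage backends):&lt;/p&gt;

```python
import os
import tempfile

# Block-style: raw byte offsets, no notion of files.
# (A temp file stands in for a raw device like /dev/sda.)
dev = tempfile.NamedTemporaryFile(delete=False)
os.pwrite(dev.fileno(), b"DATA", 4096)       # write a "block" at offset 4096
block = os.pread(dev.fileno(), 4, 4096)

# File-style: paths inside a filesystem someone already formatted.
with open(dev.name + ".txt", "w") as f:
    f.write("hello file")
with open(dev.name + ".txt") as f:
    content = f.read()

# Object-style: opaque keys via an API, no paths or offsets.
# (A dict stands in for an S3-like PUT/GET API.)
store = {}
store["backups/db.tar.gz"] = b"object bytes"  # PUT
obj = store["backups/db.tar.gz"]              # GET

print(block, content, obj)
```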




&lt;h2&gt;
  
  
  ⚡ Myth vs Fact
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Myth&lt;/th&gt;
&lt;th&gt;Fact&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;“All storage is block storage since it ends up on disks.”&lt;/td&gt;
&lt;td&gt;Physically true, but the &lt;strong&gt;user interface and protocol&lt;/strong&gt; define the storage type.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;“EFS and S3 are both network storages, so they’re similar.”&lt;/td&gt;
&lt;td&gt;Nope! EFS is &lt;em&gt;file-level&lt;/em&gt; (POSIX filesystem), S3 is &lt;em&gt;object-level&lt;/em&gt; (HTTP-based).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;“NFS uses block storage, so it’s block-level.”&lt;/td&gt;
&lt;td&gt;It uses block storage &lt;em&gt;underneath&lt;/em&gt;, but it exposes a &lt;em&gt;file interface&lt;/em&gt;, not blocks.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;“For file storage, we always format the disk.”&lt;/td&gt;
&lt;td&gt;Only for local disks. For NFS/EFS, &lt;strong&gt;the server&lt;/strong&gt; has already done that formatting.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  🧠 TL;DR
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Storage type is not about &lt;em&gt;where&lt;/em&gt; data lives —&lt;br&gt;
it’s about &lt;em&gt;how&lt;/em&gt; you access and manage it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;🧱 &lt;strong&gt;Block&lt;/strong&gt; → raw disk control&lt;/li&gt;
&lt;li&gt;📂 &lt;strong&gt;File (NFS/EFS)&lt;/strong&gt; → filesystem view&lt;/li&gt;
&lt;li&gt;🪣 &lt;strong&gt;Object (S3)&lt;/strong&gt; → API-based storage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything ends up on physical disks,&lt;br&gt;
but what you touch — blocks, files, or objects — defines its nature.&lt;/p&gt;




&lt;h3&gt;
  
  
  💬 Bonus Thought
&lt;/h3&gt;

&lt;p&gt;Databases prefer &lt;strong&gt;block storage&lt;/strong&gt; because they want total control of how bytes hit the disk.&lt;br&gt;
But backups, images, and logs shine in &lt;strong&gt;object storage&lt;/strong&gt; — scalable, simple, and metadata-rich.&lt;/p&gt;




</description>
      <category>storage</category>
      <category>linux</category>
      <category>devops</category>
      <category>sre</category>
    </item>
    <item>
      <title>VMware Snapshots Explained: Internals, Pitfalls, and Deep Dive into Base + Delta Mechanics</title>
      <dc:creator>mohideen sahib</dc:creator>
      <pubDate>Wed, 15 Oct 2025 10:19:40 +0000</pubDate>
      <link>https://forem.com/mohideen_sahib_79f5f9e8de/vmware-snapshots-explained-internals-pitfalls-and-deep-dive-into-base-delta-mechanics-301k</link>
      <guid>https://forem.com/mohideen_sahib_79f5f9e8de/vmware-snapshots-explained-internals-pitfalls-and-deep-dive-into-base-delta-mechanics-301k</guid>
      <description>&lt;p&gt;🧠 VMware Snapshots — The Complete Deep Dive&lt;/p&gt;

&lt;p&gt;Snapshots are one of VMware’s most powerful yet misunderstood features.&lt;br&gt;
They let you capture a VM’s exact state (disk, memory, and config) and return to it later.&lt;br&gt;
But they also impact performance and datastore health if used carelessly.&lt;/p&gt;

&lt;p&gt;This post explains — in detail — how snapshots work, what happens during revert, OS impact, and cluster-level risks.&lt;/p&gt;




&lt;p&gt;⚙️ 1. What Is a VMware Snapshot?&lt;/p&gt;

&lt;p&gt;A snapshot preserves a VM’s disk, memory, and power state at a point in time.&lt;br&gt;
After it’s taken:&lt;/p&gt;

&lt;p&gt;The base disk becomes read-only.&lt;/p&gt;

&lt;p&gt;All new writes go to a delta disk.&lt;/p&gt;

&lt;p&gt;Optionally, memory and CPU state are saved too.&lt;/p&gt;




&lt;p&gt;🧩 2. Files Created During a Snapshot&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;File&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Base Disk (&lt;code&gt;.vmdk&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Original virtual disk; becomes read-only.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Delta Disk (&lt;code&gt;-delta.vmdk&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Stores changes made after the snapshot.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory File (&lt;code&gt;.vmem&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Captures RAM contents if “snapshot memory” is enabled.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Snapshot Metadata (&lt;code&gt;.vmsn&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Records configuration, disk, and memory references.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;🔍 3. How Snapshot Works&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Creation&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;VMware freezes disk I/O briefly.&lt;/p&gt;

&lt;p&gt;A new delta file (vmname-000001-delta.vmdk) is created.&lt;/p&gt;

&lt;p&gt;Writes now go to the delta file, keeping the base disk intact.&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Retention&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each snapshot adds another delta file, forming a chain.&lt;/p&gt;

&lt;p&gt;Reads span across all deltas and the base disk.&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;Deletion (Commit)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Changes in delta files are merged back into the base disk.&lt;/p&gt;

&lt;p&gt;Deletion can trigger heavy I/O depending on delta size.&lt;/p&gt;




&lt;p&gt;🔄 4. What Happens During Snapshot Revert&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Disk State&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;VMware reconstructs the snapshot point by combining the base and snapshot delta.&lt;/p&gt;

&lt;p&gt;The VM now reads from the reconstructed snapshot state.&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Memory State&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If memory was captured, .vmem restores RAM and CPU registers.&lt;/p&gt;

&lt;p&gt;Processes resume exactly as they were — no reboot.&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;OS Behavior&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The OS is not rebooted; if memory was restored, the guest’s uptime reverts to its value at snapshot time.&lt;/p&gt;

&lt;p&gt;Some sessions may drop briefly, but the VM remains reachable.&lt;/p&gt;




&lt;p&gt;⚠️ 5. Why OS Takes Time to Stabilize After Revert&lt;/p&gt;

&lt;p&gt;Even if the vSphere task shows “Revert completed”, the guest OS may need minutes to recover.&lt;br&gt;
That’s because:&lt;/p&gt;

&lt;p&gt;Disk caches, journaled filesystems (ext4/NTFS), and swap files revalidate.&lt;/p&gt;

&lt;p&gt;VMware triggers background I/O to reattach or consolidate delta data.&lt;/p&gt;

&lt;p&gt;This causes temporary CPU and I/O spikes until the OS stabilizes.&lt;/p&gt;




&lt;p&gt;🧠 6. Why Delta Files Are Needed&lt;/p&gt;

&lt;p&gt;Even when reverting to the base disk, VMware must read delta files because:&lt;/p&gt;

&lt;p&gt;They contain changed blocks since the snapshot.&lt;/p&gt;

&lt;p&gt;To restore the exact state, VMware applies those deltas backward.&lt;/p&gt;

&lt;p&gt;Hence, deltas remain essential even when reverting to “base.”&lt;/p&gt;




&lt;p&gt;📁 7. .vmsn and .vmem Explained&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;File&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;.vmsn&lt;/td&gt;
&lt;td&gt;Snapshot descriptor containing VM config, disk, and memory pointers.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;.vmem&lt;/td&gt;
&lt;td&gt;Memory dump used to resume the VM’s running state instantly.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;🧱 8. What You Can’t Do While Snapshots Exist&lt;/p&gt;

&lt;p&gt;Snapshots freeze certain VM operations. You can’t:&lt;/p&gt;

&lt;p&gt;Change hardware version.&lt;/p&gt;

&lt;p&gt;Expand disks or modify RDM mappings.&lt;/p&gt;

&lt;p&gt;Add or remove virtual disks.&lt;/p&gt;

&lt;p&gt;Convert the VM to a template (in some cases).&lt;/p&gt;

&lt;p&gt;These are blocked to maintain snapshot integrity.&lt;/p&gt;




&lt;p&gt;🧮 9. Uptime, Reachability &amp;amp; Performance&lt;/p&gt;

&lt;p&gt;Uptime Reset: If memory was saved, uptime reverts to snapshot time.&lt;/p&gt;

&lt;p&gt;Reachability: Minor drop during revert; VM becomes accessible soon after.&lt;/p&gt;

&lt;p&gt;Performance: Expect short-lived I/O spikes post-revert.&lt;/p&gt;

&lt;p&gt;Duration: Snapshot and revert times scale with VM disk size and snapshot depth.&lt;/p&gt;




&lt;p&gt;⚡ 10. Speeding Up Snapshots and Reverts&lt;/p&gt;

&lt;p&gt;By default, VMware snapshots all attached disks, slowing large VMs.&lt;/p&gt;

&lt;p&gt;✅ Pro Tip&lt;/p&gt;

&lt;p&gt;If a VM has large, static disks (e.g., archives or NFS mounts):&lt;/p&gt;

&lt;p&gt;Temporarily detach those before taking or reverting a snapshot.&lt;/p&gt;

&lt;p&gt;Only attached disks are processed, reducing time drastically.&lt;/p&gt;

&lt;p&gt;⚠️ Caution&lt;/p&gt;

&lt;p&gt;Never detach disks with OS mounts or active apps.&lt;/p&gt;

&lt;p&gt;Always reattach using the same SCSI IDs after the operation.&lt;/p&gt;




&lt;p&gt;🧩 11. Managing Snapshots in vSphere&lt;/p&gt;

&lt;p&gt;A. Take a Snapshot&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Right-click VM → Snapshots → Take Snapshot&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Name it (e.g., PrePatch_2025-10-15).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Check:&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;✅ Snapshot the VM’s memory&lt;/p&gt;

&lt;p&gt;✅ Quiesce guest file system&lt;/p&gt;

&lt;ol start="4"&gt;
&lt;li&gt;Click OK&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;B. View Snapshots&lt;/p&gt;

&lt;p&gt;Right-click VM → Snapshots → Manage Snapshots&lt;/p&gt;

&lt;p&gt;C. Revert&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Select snapshot → Revert to Snapshot&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Wait for the OS to settle before use.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;🧹 12. Best Practices&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Avoid keeping snapshots &amp;gt;72 hours.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Consolidate or delete snapshots regularly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Monitor datastore space — deltas grow fast.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Verify app health after revert.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Never use snapshots as backups.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;⚠️ 13. When Snapshots Grow Too Large — Cluster-Wide Impact&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What Happens When They Accumulate&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each snapshot creates a delta that grows with every write.&lt;/p&gt;

&lt;p&gt;VMware must traverse all deltas to read a block — adding latency.&lt;/p&gt;

&lt;p&gt;Long snapshot chains cause severe disk I/O degradation.&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;VM-Level Impact&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Slower I/O and degraded performance.&lt;/p&gt;

&lt;p&gt;Long consolidation (merge) times.&lt;/p&gt;

&lt;p&gt;Backup jobs slow down or fail.&lt;/p&gt;

&lt;p&gt;Datastore fill-up can pause or crash VMs.&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;Cluster-Level Impact&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Datastore Pressure: Deltas consume vast space.&lt;/p&gt;

&lt;p&gt;vMotion Failures: Large chains increase transfer time.&lt;/p&gt;

&lt;p&gt;I/O Spikes: Snapshot consolidations trigger datastore storms.&lt;/p&gt;

&lt;p&gt;vSAN Issues: More objects and resync operations, slowing cluster balance.&lt;/p&gt;

&lt;ol start="4"&gt;
&lt;li&gt;Prevention&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Automate snapshot cleanup with vCenter alarms or scripts.&lt;/p&gt;

&lt;p&gt;Monitor datastore usage.&lt;/p&gt;

&lt;p&gt;Keep chain depth ≤ 2–3.&lt;/p&gt;

&lt;p&gt;Schedule consolidations during off-hours.&lt;/p&gt;

&lt;p&gt;Use backup tools that auto-remove snapshots.&lt;/p&gt;
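&lt;p&gt;A minimal audit sketch for the prevention steps above, demonstrated on a temporary directory. On a real host you would point it at the VM folders under the datastore path; stale_deltas is an illustrative helper, not a VMware tool.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Audit sketch: list delta files older than a threshold. $ds simulates a
# datastore directory with one fresh and one long-lived delta file.
set -u

ds=$(mktemp -d)
touch "$ds/vm1-000001-delta.vmdk"                    # fresh delta
touch -d "10 days ago" "$ds/vm2-000001-delta.vmdk"   # long-lived delta

stale_deltas() {              # $1 = dir, $2 = age threshold in days
  find "$1" -name '*-delta.vmdk' -mtime "+$2"
}

stale_deltas "$ds" 3          # reports only the 10-day-old delta
```

&lt;p&gt;Wire the same check into a vCenter alarm or a cron job so long-lived deltas are surfaced before they become a datastore problem.&lt;/p&gt;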

&lt;ol start="5"&gt;
&lt;li&gt;In Short&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;Large snapshots are silent datastore killers.&lt;br&gt;
The more deltas you keep, the slower your VMs — and the riskier your cluster.&lt;br&gt;
Consolidate early, consolidate often.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;🧾 14. Summary&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Area&lt;/th&gt;
&lt;th&gt;Key Point&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Snapshot Role&lt;/td&gt;
&lt;td&gt;Point-in-time rollback for quick recovery or testing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Delta Files&lt;/td&gt;
&lt;td&gt;Hold all post-snapshot changes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Revert&lt;/td&gt;
&lt;td&gt;Restores disk/memory state without reboot&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OS Impact&lt;/td&gt;
&lt;td&gt;May pause briefly as background I/O completes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Performance Tip&lt;/td&gt;
&lt;td&gt;Detach static disks for faster snapshot ops&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cluster Risk&lt;/td&gt;
&lt;td&gt;Large deltas impact datastore and vMotion&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best Practice&lt;/td&gt;
&lt;td&gt;Keep snapshots short-lived and managed&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;✍️ In Short&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;VMware snapshots are like time machines — powerful but costly.&lt;br&gt;
Every revert, merge, and delta read adds I/O overhead.&lt;br&gt;
Use them wisely, monitor size, and let the OS stabilize before declaring success.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>linux</category>
      <category>vmware</category>
      <category>sre</category>
      <category>devops</category>
    </item>
    <item>
      <title>Crash Dumps in Linux Kernel &amp; Application Deep Dive</title>
      <dc:creator>mohideen sahib</dc:creator>
      <pubDate>Wed, 15 Oct 2025 03:41:08 +0000</pubDate>
      <link>https://forem.com/mohideen_sahib_79f5f9e8de/crash-dumps-in-linux-kernel-application-deep-dive-3ng0</link>
      <guid>https://forem.com/mohideen_sahib_79f5f9e8de/crash-dumps-in-linux-kernel-application-deep-dive-3ng0</guid>
      <description>&lt;p&gt;Crash Dumps in Linux: Kernel &amp;amp; Application Deep Dive&lt;/p&gt;

&lt;p&gt;Crash dumps are essential for diagnosing system-level and application-level failures. They capture memory and execution state at the time of a crash, helping engineers identify root causes and prevent recurrence.&lt;/p&gt;

&lt;p&gt;In Linux, there are two main types of crash dumps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Kernel Crash Dump (kdump) – triggered when the kernel itself crashes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Application Core Dump (coredump) – triggered when a process crashes.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;1️⃣ Kernel Crash Dump (kdump)&lt;/p&gt;

&lt;p&gt;When the Linux kernel crashes, it may leave the system unstable. kdump provides a safe way to capture a memory snapshot (vmcore) for post-mortem analysis.&lt;/p&gt;




&lt;p&gt;How kdump Works&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Crashkernel Reservation&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;At boot, a portion of RAM is reserved for the crash kernel via the GRUB kernel parameter:&lt;/p&gt;

&lt;p&gt;crashkernel=512M&lt;/p&gt;

&lt;p&gt;This memory is isolated from the main kernel, ensuring a stable environment to capture the dump.&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Kernel Panic Handling&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;When the main kernel encounters a panic or fatal exception, the panic handler executes.&lt;/p&gt;

&lt;p&gt;The panic handler invokes kexec, which jumps to the preloaded crash kernel in reserved memory.&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;Crash Kernel Boot&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The crash kernel boots without BIOS/UEFI initialization or full hardware reinitialization.&lt;/p&gt;

&lt;p&gt;Minimal drivers and services are loaded to safely capture memory.&lt;/p&gt;

&lt;ol start="4"&gt;
&lt;li&gt;Dump Collection&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The crash kernel reads memory from the crashed main kernel and saves it as vmcore.&lt;/p&gt;

&lt;p&gt;Storage options: local disk, NFS, or remote crash dump server.&lt;/p&gt;




&lt;p&gt;Crashkernel Size Recommendations&lt;/p&gt;

&lt;p&gt;Must be large enough to store the kernel memory, but not excessively reduce main system RAM.&lt;/p&gt;

&lt;p&gt;Typical sizing rules:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;RAM Size&lt;/th&gt;
&lt;th&gt;Crashkernel Size&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&amp;lt; 2 GB&lt;/td&gt;
&lt;td&gt;128–256 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2–8 GB&lt;/td&gt;
&lt;td&gt;256–512 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8–64 GB&lt;/td&gt;
&lt;td&gt;512–1024 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&amp;gt; 64 GB&lt;/td&gt;
&lt;td&gt;1–2 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Rationale: The dump size depends on used kernel memory + active processes. Too small → dump fails; too large → reduces usable RAM.&lt;/p&gt;
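&lt;p&gt;The sizing bands above can be codified as a small helper. This is a sketch of this article’s rules of thumb, not an official formula; crashkernel_hint is an illustrative name.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Map installed RAM (in GB) to the crashkernel sizing bands above.
# These bands are rules of thumb, not an official formula.
crashkernel_hint() {
  local ram_gb=$1
  if   [ "$ram_gb" -lt 2 ];  then echo "128M-256M"
  elif [ "$ram_gb" -le 8 ];  then echo "256M-512M"
  elif [ "$ram_gb" -le 64 ]; then echo "512M-1024M"
  else                            echo "1G-2G"
  fi
}

crashkernel_hint 4     # → 256M-512M
crashkernel_hint 128   # → 1G-2G
```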




&lt;p&gt;Configuring kdump&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install kdump tools:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;yum install kexec-tools   # RHEL/CentOS&lt;br&gt;
apt install kdump-tools  # Debian/Ubuntu&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Enable and start the service:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;systemctl enable kdump&lt;br&gt;
systemctl start kdump&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;Configure dump storage (/etc/kdump.conf):&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;# Local storage&lt;br&gt;
path /var/crash&lt;/p&gt;

&lt;p&gt;# Remote NFS&lt;br&gt;
net nfsserver:/kdump&lt;/p&gt;

&lt;ol start="4"&gt;
&lt;li&gt;Optional: Reduce dump size:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;core_collector makedumpfile -c --message-level 1&lt;/p&gt;




&lt;p&gt;Remote NFS Dumps &amp;amp; Cleanup&lt;/p&gt;

&lt;p&gt;Requirements:&lt;/p&gt;

&lt;p&gt;Network interface must be up in the crash kernel.&lt;/p&gt;

&lt;p&gt;NFS server must be reachable during crash kernel execution.&lt;/p&gt;

&lt;p&gt;Cleanup strategies:&lt;/p&gt;

&lt;p&gt;# Remove dumps older than 30 days&lt;br&gt;
find /mnt/kdump/ -type f -mtime +30 -exec rm -f {} \;&lt;/p&gt;

&lt;p&gt;# Limit total size&lt;br&gt;
du -sh /mnt/kdump/&lt;/p&gt;

&lt;p&gt;Automate via cron or systemd timers on the NFS server.&lt;/p&gt;
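&lt;p&gt;Put together, the cleanup can be a single cron-able script. A sketch of that, demonstrated on a temporary directory that stands in for the live /mnt/kdump export:&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Cron-able cleanup sketch, demonstrated on a temporary directory that
# stands in for /mnt/kdump on the NFS server.
set -u

dumpdir=$(mktemp -d)
touch -d "40 days ago" "$dumpdir/vmcore-old"   # past the 30-day retention
touch "$dumpdir/vmcore-new"                    # recent dump, must survive

# Remove dumps older than 30 days, then report remaining usage.
find "$dumpdir" -type f -mtime +30 -exec rm -f {} \;
ls "$dumpdir"        # only vmcore-new remains
du -sh "$dumpdir"
```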




&lt;p&gt;Testing &amp;amp; Analysis&lt;/p&gt;

&lt;p&gt;Manual trigger:&lt;/p&gt;

&lt;p&gt;echo c &amp;gt; /proc/sysrq-trigger&lt;/p&gt;

&lt;p&gt;Verify dump:&lt;/p&gt;

&lt;p&gt;ls -lh /var/crash&lt;/p&gt;

&lt;p&gt;Analyze using crash:&lt;/p&gt;

&lt;p&gt;crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux /var/crash/.../vmcore&lt;/p&gt;

&lt;p&gt;Important checks:&lt;/p&gt;

&lt;p&gt;Kernel panic messages&lt;/p&gt;

&lt;p&gt;Last running processes&lt;/p&gt;

&lt;p&gt;Memory corruption / Oops logs&lt;/p&gt;

&lt;p&gt;Device driver states&lt;/p&gt;




&lt;p&gt;2️⃣ Application Core Dump (coredump)&lt;/p&gt;

&lt;p&gt;When an application crashes, Linux can capture a memory snapshot for debugging.&lt;/p&gt;




&lt;p&gt;Triggering Core Dumps&lt;/p&gt;

&lt;p&gt;Automatic: Segmentation fault, abort, unhandled exception.&lt;/p&gt;

&lt;p&gt;Manual: Sending a signal:&lt;/p&gt;

&lt;p&gt;kill -ABRT &amp;lt;pid&amp;gt;&lt;br&gt;
kill -SIGSEGV &amp;lt;pid&amp;gt;&lt;/p&gt;

&lt;p&gt;The process may be temporarily unserviceable while writing the dump.&lt;/p&gt;
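&lt;p&gt;You can observe the fatal-signal behavior from any shell: a process killed by a signal exits with status 128 + the signal number, whether or not a core file was written.&lt;/p&gt;

```shell
# A process killed by a fatal signal exits with status 128 + signum:
# SIGSEGV (11) → 139, SIGABRT (6) → 134. Whether a core file is written
# depends on ulimit -c and the configured core handler.
bash -c 'kill -SEGV $$' || echo "SIGSEGV exit status: $?"   # prints 139
bash -c 'kill -ABRT $$' || echo "SIGABRT exit status: $?"   # prints 134
```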




&lt;p&gt;Systemd-Based Core Dumps&lt;/p&gt;

&lt;p&gt;Handled by systemd-coredump.&lt;/p&gt;

&lt;p&gt;Dependencies:&lt;/p&gt;

&lt;p&gt;systemd-coredump.service&lt;/p&gt;

&lt;p&gt;systemd-journald (logging)&lt;/p&gt;

&lt;p&gt;ulimit -c or LimitCORE in the unit file:&lt;/p&gt;

&lt;p&gt;[Service]&lt;br&gt;
LimitCORE=infinity&lt;/p&gt;

&lt;p&gt;Not all units generate dumps. Restrictive unit options may block core dumps:&lt;/p&gt;

&lt;p&gt;NoNewPrivileges=yes&lt;/p&gt;

&lt;p&gt;PrivateTmp=yes&lt;/p&gt;

&lt;p&gt;ProtectSystem=full/strict&lt;/p&gt;

&lt;p&gt;ProtectHome=yes&lt;/p&gt;

&lt;p&gt;ReadOnlyPaths / InaccessiblePaths&lt;/p&gt;

&lt;p&gt;LimitCORE=0&lt;/p&gt;




&lt;p&gt;Core Dump Configuration (/etc/systemd/coredump.conf)&lt;/p&gt;

&lt;p&gt;[Coredump]&lt;br&gt;
Storage=external       # Disk storage&lt;br&gt;
Compress=yes           # Compress dumps&lt;br&gt;
ProcessSizeMax=2G      # Max size per dump&lt;br&gt;
ExternalSizeMax=10G    # Max total storage for all dumps&lt;br&gt;
KeepFree=500M          # Minimum free disk space&lt;/p&gt;

&lt;p&gt;How cleanup happens:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;systemd-coredump calculates current dump storage usage.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If adding a new dump exceeds ExternalSizeMax or violates KeepFree, oldest dumps are deleted.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;New dump is written only after usage is within limits.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;No cron jobs required — cleanup is dynamic during dump creation.&lt;/p&gt;
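&lt;p&gt;The oldest-first eviction described above can be sketched like this. It is a simulation with tiny files and a tiny budget; the real systemd-coredump logic also honours KeepFree and ProcessSizeMax, and usage/evict_until_under are illustrative names.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Oldest-first eviction under a size budget, simulated with 1 KiB files.
set -u

store=$(mktemp -d)
for i in 1 2 3; do
  head -c 1024 /dev/zero > "$store/core.$i"
  touch -d "$i hours ago" "$store/core.$i"   # core.3 is the oldest
done

usage() {                     # total bytes of stored dumps
  find "$1" -type f -printf '%s\n' | awk '{s += $1} END {print s + 0}'
}

evict_until_under() {         # delete oldest dumps until usage <= budget
  local dir=$1 budget=$2 oldest
  while [ "$(usage "$dir")" -gt "$budget" ]; do
    oldest=$(ls -1tr "$dir" | head -n 1)
    rm -f "$dir/$oldest"
  done
}

evict_until_under "$store" 2500   # budget only holds two 1 KiB dumps
ls "$store"                       # core.3 (the oldest) was evicted
```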




&lt;p&gt;Enabling Core Dumps for Your Service&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Set LimitCORE:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;[Service]&lt;br&gt;
LimitCORE=infinity&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;&lt;p&gt;Ensure writable storage: /var/lib/systemd/coredump or external disk.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Avoid restrictive options like NoNewPrivileges or PrivateTmp.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For user services: Set ulimit -c unlimited in the shell or service environment.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;Reviewing Core Dumps&lt;/p&gt;

&lt;p&gt;List dumps:&lt;/p&gt;

&lt;p&gt;coredumpctl list&lt;/p&gt;

&lt;p&gt;Debug with GDB:&lt;/p&gt;

&lt;p&gt;coredumpctl gdb &amp;lt;pid&amp;gt;&lt;/p&gt;

&lt;p&gt;Key points:&lt;/p&gt;

&lt;p&gt;Stack traces&lt;/p&gt;

&lt;p&gt;Faulting instruction&lt;/p&gt;

&lt;p&gt;Thread states&lt;/p&gt;

&lt;p&gt;Memory allocations&lt;/p&gt;

&lt;p&gt;Linked libraries&lt;/p&gt;




&lt;p&gt;✅ Key Takeaways&lt;/p&gt;

&lt;p&gt;Kernel dump: For system crashes; uses crashkernel + kexec.&lt;/p&gt;

&lt;p&gt;Crashkernel sizing: Based on RAM usage; too small → dump fails.&lt;/p&gt;

&lt;p&gt;Remote storage: Requires cleanup and monitoring.&lt;/p&gt;

&lt;p&gt;Core dump: For processes; retention via ExternalSizeMax and KeepFree.&lt;/p&gt;

&lt;p&gt;Only units without restrictive options generate dumps.&lt;/p&gt;

&lt;p&gt;Core dumps can be triggered manually with signals; cleanup still applies.&lt;/p&gt;




</description>
      <category>linux</category>
      <category>devops</category>
      <category>sre</category>
      <category>kdump</category>
    </item>
    <item>
      <title>Mastering LVM: From Basics to Advanced Migration, Backup &amp; Recovery</title>
      <dc:creator>mohideen sahib</dc:creator>
      <pubDate>Tue, 14 Oct 2025 23:50:13 +0000</pubDate>
      <link>https://forem.com/mohideen_sahib_79f5f9e8de/mastering-lvm-from-basics-to-advanced-migration-backup-recovery-464c</link>
      <guid>https://forem.com/mohideen_sahib_79f5f9e8de/mastering-lvm-from-basics-to-advanced-migration-backup-recovery-464c</guid>
      <description>&lt;p&gt;Linux &lt;strong&gt;LVM (Logical Volume Manager)&lt;/strong&gt; transforms static partitions into a flexible, portable, and recoverable storage layer. Beyond simple resizing, LVM enables &lt;strong&gt;migrations, RAID mirroring, disaster recovery&lt;/strong&gt;, and &lt;strong&gt;SAN integrations (like NetApp)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This post takes you from &lt;strong&gt;fundamentals to deep operational concepts&lt;/strong&gt; — including &lt;code&gt;vgexport&lt;/code&gt;, &lt;code&gt;vgimport&lt;/code&gt;, &lt;code&gt;vgchange&lt;/code&gt;, &lt;code&gt;vgrename&lt;/code&gt;, &lt;strong&gt;metadata recovery&lt;/strong&gt;, &lt;strong&gt;RAID&lt;/strong&gt;, and &lt;strong&gt;safe PV resizing practices&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧩 1. LVM Building Blocks
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PV (Physical Volume)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Disk or partition initialized for LVM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;VG (Volume Group)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Pool combining PVs into one logical space&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LV (Logical Volume)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Virtual partition carved from VG&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pvcreate /dev/sdb
vgcreate vg_data /dev/sdb
lvcreate &lt;span class="nt"&gt;-n&lt;/span&gt; lv_app &lt;span class="nt"&gt;-L&lt;/span&gt; 50G vg_data
mkfs.ext4 /dev/vg_data/lv_app
mount /dev/vg_data/lv_app /mnt/app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  ⚙️ 2. Extending, Resizing &amp;amp; Removing Storage
&lt;/h2&gt;

&lt;h3&gt;
  
  
  🧱 Two Ways to Grow Storage — and Why One is Riskier
&lt;/h3&gt;

&lt;p&gt;You can expand LVM capacity by either &lt;strong&gt;resizing a disk (existing PV)&lt;/strong&gt; or &lt;strong&gt;adding a new disk (new PV)&lt;/strong&gt; to your VG.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Scenario 1: Extending an Existing PV (Riskier)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;If your underlying LUN or disk was expanded (say from 100 GB → 200 GB):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Rescan the device:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="nb"&gt;echo &lt;/span&gt;1 &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /sys/class/block/sdb/device/rescan
   fdisk &lt;span class="nt"&gt;-l&lt;/span&gt; /dev/sdb
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the PV you want to grow is a partition rather than a whole disk, resize the partition first with parted.&lt;/p&gt;

&lt;p&gt;Start parted and print the partition table:&lt;br&gt;
parted /dev/sdb&lt;br&gt;
(parted) print&lt;/p&gt;

&lt;p&gt;Resize the partition (replace 2 with your partition number, e.g. for /dev/sdb2):&lt;br&gt;
(parted) resizepart 2 100%&lt;br&gt;
100% extends the partition to the end of the disk. This is safe and does not delete or re-create the partition as long as the free space is adjacent to it.&lt;/p&gt;

&lt;p&gt;Exit parted:&lt;br&gt;
(parted) quit&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;
&lt;strong&gt;Resize the PV:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   pvresize /dev/sdb
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ol start="3"&gt;
&lt;li&gt;
&lt;strong&gt;Validate the VG:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   pvs
   vgdisplay vg_data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Now your VG reflects additional &lt;code&gt;Free PE&lt;/code&gt; space.&lt;/p&gt;

&lt;p&gt;⚠️ &lt;strong&gt;Risks:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rescan failures or cached geometry can corrupt LVM metadata.&lt;/li&gt;
&lt;li&gt;Multipath or clustered systems may see inconsistent disk layouts.&lt;/li&gt;
&lt;li&gt;If expansion fails mid-process, recovery can be tricky.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;✅ &lt;strong&gt;Before resizing&lt;/strong&gt;, take an LVM metadata backup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vgcfgbackup vg_data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If corruption happens:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vgcfgrestore vg_data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(Only restores structure, not the actual data.)&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Scenario 2: Adding a New Disk (Safer)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Instead of resizing an existing PV, add a new disk:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pvcreate /dev/sdc
vgextend vg_data /dev/sdc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then expand an LV:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;lvextend &lt;span class="nt"&gt;-L&lt;/span&gt; +50G /dev/vg_data/lv_app
resize2fs /dev/vg_data/lv_app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;✅ &lt;strong&gt;Recommended&lt;/strong&gt; for SAN/NetApp/Production systems&lt;br&gt;
✅ No dependency on device rescans or geometry changes&lt;br&gt;
✅ Easy rollback&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Risk&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;pvresize&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Reclaims resized disk space&lt;/td&gt;
&lt;td&gt;⚠️ High&lt;/td&gt;
&lt;td&gt;Virtual/Dev Environments&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;vgextend&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Adds new PV to VG&lt;/td&gt;
&lt;td&gt;✅ Low&lt;/td&gt;
&lt;td&gt;SAN/Physical Servers&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;🟢 pvmove — Safely Move Data Between Disks&lt;/p&gt;

&lt;p&gt;pvmove allows you to migrate data from one physical volume (PV) to another within the same volume group (VG). It’s essential when replacing disks or redistributing space.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;vgextend vg_data /dev/sdd1    # Add a new PV to VG&lt;br&gt;
pvmove /dev/sdb1 /dev/sdd1    # Move data off old PV&lt;/p&gt;

&lt;p&gt;⚠️ Failure Scenarios &amp;amp; Safety&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Not enough free space in VG&lt;br&gt;
pvmove requires free extents on another PV or newly added disk.&lt;br&gt;
If insufficient space: operation fails cleanly:&lt;br&gt;
No extents available for allocation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Interrupted move (system crash or power loss)&lt;br&gt;
Temporary metadata tracks progress.&lt;br&gt;
You can safely resume or abort:&lt;br&gt;
pvmove --continue /dev/sdb1&lt;br&gt;
pvmove --abort /dev/sdb1&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Incorrect target VG&lt;br&gt;
pvmove works within a single VG only.&lt;br&gt;
Moving across VGs requires vgextend + vgreduce, which is more complex.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Key Points:&lt;/p&gt;

&lt;p&gt;Non-destructive: Data is copied and verified before updating metadata.&lt;/p&gt;

&lt;p&gt;Requires enough free space in the target PV or VG.&lt;/p&gt;

&lt;p&gt;Can pause, resume, or abort using:&lt;/p&gt;

&lt;p&gt;pvmove --abort /dev/sdb1&lt;br&gt;
pvmove --continue /dev/sdb1&lt;/p&gt;

&lt;p&gt;After migration, old PVs can be safely removed with vgreduce.&lt;/p&gt;



&lt;p&gt;vgreduce&lt;/p&gt;

&lt;p&gt;Purpose: Remove a PV from a VG.&lt;/p&gt;

&lt;p&gt;Behavior:&lt;/p&gt;

&lt;p&gt;Non-destructive if the PV is empty (no logical volumes or extents allocated). It just updates the VG metadata to forget the PV.&lt;/p&gt;

&lt;p&gt;Destructive if the PV still contains data — LVM will refuse to remove it, but if you force it (with --force), you can destroy data.&lt;/p&gt;

&lt;p&gt;Example (safe usage):&lt;/p&gt;

&lt;p&gt;vgreduce vg_data /dev/sdb1&lt;/p&gt;

&lt;p&gt;Only works if /dev/sdb1 has no allocated extents (moved away via pvmove).&lt;/p&gt;

&lt;p&gt;Example (unsafe usage):&lt;/p&gt;

&lt;p&gt;vgreduce --force vg_data /dev/sdb1&lt;/p&gt;

&lt;p&gt;Forces removal even if data exists — can destroy all data on that PV.&lt;/p&gt;



&lt;p&gt;✅ Rule of Thumb:&lt;/p&gt;

&lt;p&gt;Always check PV usage first:&lt;/p&gt;

&lt;p&gt;pvs -o+pv_used&lt;br&gt;
lvs -a -o+devices&lt;/p&gt;

&lt;p&gt;Only remove PVs that are completely free.&lt;br&gt;
If data exists, first use pvmove to migrate it, then run vgreduce.&lt;/p&gt;



&lt;p&gt;🟢 lvresize — Expand or Reduce Logical Volumes&lt;/p&gt;

&lt;p&gt;lvresize changes the size of a logical volume (LV). It works both ways: increasing or decreasing the LV size.&lt;/p&gt;

&lt;p&gt;Increase LV Size Example:&lt;/p&gt;

&lt;p&gt;lvresize -L +20G /dev/vg_data/lv_home&lt;/p&gt;
&lt;p&gt;# Then resize the filesystem (XFS example)&lt;br&gt;
xfs_growfs /dev/vg_data/lv_home&lt;/p&gt;

&lt;p&gt;Reduce LV Size Example (Caution!):&lt;/p&gt;

&lt;p&gt;umount /dev/vg_data/lv_home           # Unmount (ext4 cannot shrink online)&lt;br&gt;
e2fsck -f /dev/vg_data/lv_home        # Check the filesystem&lt;br&gt;
resize2fs /dev/vg_data/lv_home 50G    # Shrink the filesystem first (ext4)&lt;br&gt;
lvresize -L 50G /dev/vg_data/lv_home  # Then resize the LV to 50G total&lt;/p&gt;

&lt;p&gt;Important Notes When Reducing:&lt;/p&gt;

&lt;p&gt;Always shrink the filesystem first; failing to do so will corrupt data.&lt;/p&gt;

&lt;p&gt;Ensure the data in the LV fits within the reduced size; check usage with df or lvs.&lt;/p&gt;

&lt;p&gt;Consider a backup before reducing — it’s destructive if done incorrectly.&lt;/p&gt;
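&lt;p&gt;Because the ordering is what protects your data, it helps to write the sequence down before running it. A dry-run sketch that only prints the safe ext4 shrink plan; the device path, size, and /mnt mount point are illustrative:&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Dry-run only: print the safe ext4 shrink sequence without touching any
# device. Run the printed commands manually, after a backup.
safe_shrink_plan() {
  local lv=$1 new_size=$2
  printf '%s\n' \
    "umount $lv" \
    "e2fsck -f $lv" \
    "resize2fs $lv $new_size" \
    "lvresize -L $new_size $lv" \
    "mount $lv /mnt/restored"
}

safe_shrink_plan /dev/vg_data/lv_home 50G
```

&lt;p&gt;The key invariant: the filesystem shrink (resize2fs) must come before the LV shrink (lvresize), never after.&lt;/p&gt;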


&lt;h2&gt;
  
  
  🚀 3. Advanced LVM Operations
&lt;/h2&gt;
&lt;h3&gt;
  
  
  🔹 &lt;code&gt;vgexport&lt;/code&gt; &amp;amp; &lt;code&gt;vgimport&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Used to &lt;strong&gt;migrate or clone&lt;/strong&gt; VGs between systems without copying data.&lt;/p&gt;
&lt;h4&gt;
  
  
  &lt;strong&gt;vgexport&lt;/strong&gt;
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vgexport vg_data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;Marks VG as “exported” (hidden from local LVM)&lt;/li&gt;
&lt;li&gt;Does &lt;strong&gt;not&lt;/strong&gt; delete data&lt;/li&gt;
&lt;li&gt;Safe before unmapping or SAN snapshot&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  &lt;strong&gt;vgimport&lt;/strong&gt;
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vgimport vg_data
vgchange &lt;span class="nt"&gt;-ay&lt;/span&gt; vg_data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;Reads PV headers&lt;/li&gt;
&lt;li&gt;Re-registers VG&lt;/li&gt;
&lt;li&gt;Clears export flag&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Check usage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pvs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PV         VG        Fmt  Attr PSize   PFree
/dev/sdb   vg_data   lvm2 x--  100.00g  0
/dev/sdc   vg_data   lvm2 a--  200.00g 50.00g
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(&lt;code&gt;x&lt;/code&gt; → exported, &lt;code&gt;a&lt;/code&gt; → active)&lt;/p&gt;




&lt;h3&gt;
  
  
  🔹 &lt;code&gt;vgrename&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Used to rename a VG (especially useful after importing a clone).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vgrename vg_data vg_data_clone
vgscan &lt;span class="nt"&gt;--cache&lt;/span&gt;
lvs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Avoids duplicate VG name conflicts during SAN clone imports.&lt;/p&gt;




&lt;h3&gt;
  
  
  🔹 &lt;code&gt;vgchange&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Activate or deactivate a VG:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vgchange &lt;span class="nt"&gt;-ay&lt;/span&gt; vg_data   &lt;span class="c"&gt;# Activate&lt;/span&gt;
vgchange &lt;span class="nt"&gt;-an&lt;/span&gt; vg_data   &lt;span class="c"&gt;# Deactivate&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Commonly used after imports or before unmounting for maintenance.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧱 4. LVM Metadata Backup &amp;amp; Restore
&lt;/h2&gt;

&lt;p&gt;LVM automatically stores metadata backups under &lt;code&gt;/etc/lvm/backup/&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Manual backup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vgcfgbackup vg_data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Restore only &lt;strong&gt;structure&lt;/strong&gt;, not data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vgcfgrestore vg_data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;🧩 &lt;strong&gt;Use Cases&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;VG corruption&lt;/li&gt;
&lt;li&gt;Accidental LV deletion&lt;/li&gt;
&lt;li&gt;Disk failure recovery&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Remember:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🧠 Metadata backups restore the &lt;em&gt;layout&lt;/em&gt;, not user data.&lt;br&gt;
You’ll still need file-level recovery for lost contents.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  💾 5. LVM RAID &amp;amp; Mirroring (Modern Way)
&lt;/h2&gt;

&lt;p&gt;LVM supports &lt;strong&gt;software RAID&lt;/strong&gt; natively with &lt;code&gt;--type raidX&lt;/code&gt;.&lt;br&gt;
Avoid old &lt;code&gt;-m&lt;/code&gt; mirror syntax except for legacy systems.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;RAID Type&lt;/th&gt;
&lt;th&gt;Command Example&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;RAID 0&lt;/td&gt;
&lt;td&gt;&lt;code&gt;lvcreate -L 200G --type raid0 -i2 -n lv_raid0 vg_data /dev/sdb /dev/sdc&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Striping only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RAID 1&lt;/td&gt;
&lt;td&gt;&lt;code&gt;lvcreate -L 100G --type raid1 -m1 -n lv_raid1 vg_data /dev/sdb /dev/sdc&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Mirroring&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RAID 5&lt;/td&gt;
&lt;td&gt;&lt;code&gt;lvcreate -L 500G --type raid5 -i3 -n lv_raid5 vg_data /dev/sdb /dev/sdc /dev/sdd&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Striping + parity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RAID 6&lt;/td&gt;
&lt;td&gt;&lt;code&gt;lvcreate -L 600G --type raid6 -i4 -n lv_raid6 vg_data /dev/sdb /dev/sdc /dev/sdd /dev/sde&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Double parity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RAID 10&lt;/td&gt;
&lt;td&gt;&lt;code&gt;lvcreate -L 400G --type raid10 -i2 -n lv_raid10 vg_data /dev/sdb /dev/sdc /dev/sdd /dev/sde&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Mirrored stripes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;View RAID info:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;lvs &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; +devices,raid_sync_action
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Convert an existing LV:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;lvconvert &lt;span class="nt"&gt;--type&lt;/span&gt; raid1 &lt;span class="nt"&gt;-m1&lt;/span&gt; vg_data/lv_app /dev/sdc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Repair or replace a failed disk:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pvmove /dev/sdb /dev/sdf
vgreduce vg_data /dev/sdb
lvconvert &lt;span class="nt"&gt;--repair&lt;/span&gt; vg_data/lv_data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;🧠 RAID Parity &amp;amp; Mirroring Notes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;--type raid1&lt;/code&gt; → Mirrors data across devices&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--type raid5/6&lt;/code&gt; → Striping with parity redundancy&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--type raid10&lt;/code&gt; → Striping across mirrored pairs (no parity)&lt;/li&gt;
&lt;li&gt;Modern kernels automatically resynchronize during rebuilds&lt;/li&gt;
&lt;li&gt;Prefer &lt;strong&gt;hardware RAID&lt;/strong&gt; (NetApp, etc.) in enterprise setups&lt;/li&gt;
&lt;/ul&gt;
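&lt;p&gt;During a rebuild or scrub, sync progress can be checked with &lt;code&gt;lvs&lt;/code&gt; (a sketch using the example &lt;code&gt;vg_data/lv_raid5&lt;/code&gt; names from the table above):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Show resync percentage and current sync action&lt;/span&gt;
lvs &lt;span class="nt"&gt;-o&lt;/span&gt; +sync_percent,raid_sync_action vg_data/lv_raid5

&lt;span class="c"&gt;# Start a scrub: "check" reports mismatches, "repair" fixes them&lt;/span&gt;
lvchange &lt;span class="nt"&gt;--syncaction&lt;/span&gt; check vg_data/lv_raid5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;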




&lt;h2&gt;
  
  
  🧭 6. Using LVM with NetApp Snapshots
&lt;/h2&gt;

&lt;p&gt;If your backend storage is NetApp and you present a &lt;strong&gt;snapshot clone as a new LUN&lt;/strong&gt;, you can:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Export the VG from the source:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   vgexport vg_data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="2"&gt;
&lt;li&gt;Map the snapshot clone to a new host.&lt;/li&gt;
&lt;li&gt;On the new host:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   vgimport vg_data
   vgrename vg_data vg_data_clone
   vgchange &lt;span class="nt"&gt;-ay&lt;/span&gt; vg_data_clone
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;✅ Safely mount and test the clone.&lt;br&gt;
No data copy. No downtime.&lt;/p&gt;
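&lt;p&gt;As a sketch, assuming the clone carries an LV named &lt;code&gt;lv_data&lt;/code&gt; (a hypothetical name), mounting it on the new host looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;mkdir &lt;span class="nt"&gt;-p&lt;/span&gt; /mnt/clone
&lt;span class="c"&gt;# XFS clones keep the original UUID; add -o nouuid if the source is mounted on the same host&lt;/span&gt;
mount /dev/vg_data_clone/lv_data /mnt/clone
df &lt;span class="nt"&gt;-h&lt;/span&gt; /mnt/clone
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;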


&lt;h2&gt;
  
  
  🧮 7. Migration Without Rsync
&lt;/h2&gt;

&lt;p&gt;When moving volumes between servers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vgexport vg_data
&lt;span class="c"&gt;# Move or reattach disks/SAN&lt;/span&gt;
vgimport vg_data
vgchange &lt;span class="nt"&gt;-ay&lt;/span&gt; vg_data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Optionally rename VG:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vgrename vg_data vg_clone
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;✅ &lt;strong&gt;No rsync&lt;/strong&gt; required — data stays on the same blocks.&lt;br&gt;
✅ Ideal for SAN or virtualized migrations.&lt;/p&gt;
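&lt;p&gt;On the destination server, a quick sanity check (a sketch) before bringing services up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pvscan &lt;span class="nt"&gt;--cache&lt;/span&gt;   &lt;span class="c"&gt;# rescan devices so LVM sees the reattached PVs&lt;/span&gt;
vgs vg_data        &lt;span class="c"&gt;# VG should appear and no longer be marked exported&lt;/span&gt;
lvs vg_data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;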




&lt;h2&gt;
  
  
  🧩 8. Quick Reference
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Extend VG&lt;/td&gt;
&lt;td&gt;&lt;code&gt;vgextend vg_data /dev/sdc&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Safely add new disk&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Extend PV&lt;/td&gt;
&lt;td&gt;&lt;code&gt;pvresize /dev/sdb&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Use resized disk space&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Backup Metadata&lt;/td&gt;
&lt;td&gt;&lt;code&gt;vgcfgbackup vg_data&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Save structure info&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Restore Metadata&lt;/td&gt;
&lt;td&gt;&lt;code&gt;vgcfgrestore vg_data&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Restore LVM layout&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Export/Import&lt;/td&gt;
&lt;td&gt;&lt;code&gt;vgexport / vgimport&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Migrate VG across systems&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rename VG&lt;/td&gt;
&lt;td&gt;&lt;code&gt;vgrename vg_data vg_clone&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Avoid name conflicts&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Create RAID&lt;/td&gt;
&lt;td&gt;`lvcreate --type raid1&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;10 ...`&lt;/td&gt;
&lt;td&gt;Software RAID&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Repair RAID&lt;/td&gt;
&lt;td&gt;&lt;code&gt;lvconvert --repair&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Fix degraded array&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
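&lt;p&gt;The metadata backup/restore pair from the table is worth a concrete sketch (the &lt;code&gt;/root&lt;/code&gt; path is an arbitrary choice; automatic archives live under &lt;code&gt;/etc/lvm/archive&lt;/code&gt; on most distributions):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vgcfgbackup &lt;span class="nt"&gt;-f&lt;/span&gt; /root/vg_data.meta vg_data   &lt;span class="c"&gt;# explicit backup file&lt;/span&gt;
vgcfgrestore &lt;span class="nt"&gt;--list&lt;/span&gt; vg_data                 &lt;span class="c"&gt;# list automatic archives&lt;/span&gt;
vgcfgrestore &lt;span class="nt"&gt;-f&lt;/span&gt; /root/vg_data.meta vg_data  &lt;span class="c"&gt;# restores layout only, not file contents&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;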




&lt;h2&gt;
  
  
  🧠 Final Thoughts
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Always &lt;strong&gt;prefer adding new PVs&lt;/strong&gt; over resizing existing ones.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pvresize&lt;/code&gt; is convenient but &lt;strong&gt;riskier&lt;/strong&gt; for production SANs.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;vgexport&lt;/code&gt; / &lt;code&gt;vgimport&lt;/code&gt; make migrations and SAN snapshot reuse &lt;strong&gt;instant&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LVM metadata backups restore structure only&lt;/strong&gt;, not content.&lt;/li&gt;
&lt;li&gt;Modern LVM RAID offers software redundancy, but hardware RAID or NetApp mirrors are better for critical workloads.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;LVM remains one of the most &lt;strong&gt;powerful abstractions in Linux storage&lt;/strong&gt;, bridging raw disks, SANs, and enterprise reliability into one logical framework.&lt;/p&gt;

</description>
      <category>linux</category>
      <category>lvm</category>
      <category>sre</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
