<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: urgensherpa</title>
    <description>The latest articles on Forem by urgensherpa (@sherpaurgen).</description>
    <link>https://forem.com/sherpaurgen</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F902823%2Fbaff3443-18d7-4466-8752-7fb3e2961fef.png</url>
      <title>Forem: urgensherpa</title>
      <link>https://forem.com/sherpaurgen</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/sherpaurgen"/>
    <language>en</language>
    <item>
      <title>slow boot disk initialization on linux</title>
      <dc:creator>urgensherpa</dc:creator>
      <pubDate>Tue, 05 Aug 2025 03:28:44 +0000</pubDate>
      <link>https://forem.com/sherpaurgen/slow-boot-disk-initialization-on-linux-53oc</link>
      <guid>https://forem.com/sherpaurgen/slow-boot-disk-initialization-on-linux-53oc</guid>
<description>&lt;p&gt;Scenario: the boot disk is a slow SAN multipath disk on which the root filesystem is installed; because of this, the boot process fails.&lt;/p&gt;

&lt;p&gt;The Boot Process and The "Wait"&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Kernel and &lt;code&gt;initramfs&lt;/code&gt; Start: After GRUB, the Linux kernel is loaded into memory along with the initramfs image. The kernel then unpacks the initramfs and executes the /init&lt;br&gt;
  script within it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Parsing Kernel Parameters: The /init script (the main script you see in the file list) begins by parsing the kernel command line (/proc/cmdline). It looks for several parameters,&lt;br&gt;
  but the most important ones for this scenario are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;root=...: This specifies the root device. For your SAN disk, this would likely be a UUID (e.g., root=UUID=...) or a device mapper path (e.g., root=/dev/mapper/mpath-a).&lt;/li&gt;
&lt;li&gt;rootdelay=...: This is a crucial parameter. The init script explicitly parses this value and stores it in a shell variable ROOTDELAY.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The Wait for the Root Device: The core logic for mounting a local filesystem is in /scripts/local. This script is called by the main /init script. The critical part of this&lt;br&gt;
  process is the call to the wait-for-root program.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The source code for this is in src/wait-for-root.c. It's a small C program that takes two arguments: the DEVICE path and a TIMEOUT.&lt;/li&gt;
&lt;li&gt;The init script will launch &lt;code&gt;wait-for-root $ROOT $ROOTDELAY&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The Race Condition: Your 50-Second Delay&lt;/p&gt;

&lt;p&gt;This is where your scenario becomes interesting. The wait-for-root program does not just sleep; it actively listens for udev events. udev is the kernel's device manager, and it creates&lt;br&gt;
   device nodes (like /dev/sda, /dev/disk/by-uuid/...) as hardware is detected.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Goal: wait-for-root is waiting for a udev event that announces the creation of the block device matching the root= parameter.&lt;/li&gt;
&lt;li&gt;The Timeout: The program sets an alarm() for the number of seconds specified by ROOTDELAY. The default value for ROOTDELAY if not specified on the kernel command line is 30
 seconds.
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyt80z3zelijut2wchlan.jpg" alt=" " width="800" height="912"&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here’s what happens:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Scenario A: &lt;code&gt;rootdelay&lt;/code&gt; is too short (e.g., default 30s)&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The init script calls wait-for-root /dev/disk/by-uuid/... 30.&lt;/li&gt;
&lt;li&gt;Your SAN multipath disk is still initializing and hasn't been detected by the kernel yet. No udev events for it have been sent.&lt;/li&gt;
&lt;li&gt;After 30 seconds, the alarm() in wait-for-root goes off. The program exits with an error code.&lt;/li&gt;
&lt;li&gt;The init script sees the failure and cannot mount the root filesystem.&lt;/li&gt;
&lt;li&gt;Outcome: The boot process halts and drops you into the initramfs emergency shell (usually busybox). You will see a message like "Gave up waiting for root file system device."&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Scenario B: &lt;code&gt;rootdelay&lt;/code&gt; is long enough (e.g., &lt;code&gt;rootdelay=60&lt;/code&gt;)&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The init script calls wait-for-root /dev/disk/by-uuid/... 60.&lt;/li&gt;
&lt;li&gt;The script waits.&lt;/li&gt;
&lt;li&gt;At around the 50-second mark, the multipath driver finishes its setup, and the kernel recognizes the final SAN device. udev creates the device node and its symlinks.&lt;/li&gt;
&lt;li&gt;wait-for-root receives the udev event, sees that it matches the device it's looking for, prints the filesystem type, and exits successfully.&lt;/li&gt;
&lt;li&gt;Outcome: The init script proceeds to mount the device, and the boot continues normally. You will experience a ~50 second pause during the early boot phase before the screen
  changes or Plymouth (the boot splash) appears.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;u&gt;Solution&lt;/u&gt;&lt;/p&gt;

&lt;p&gt;In this scenario, the boot process will fail if the default rootdelay is used. The system will drop to an initramfs prompt because it gives up waiting before the SAN disk is ready.&lt;/p&gt;

&lt;p&gt;To fix this, you must add or edit the rootdelay parameter on the kernel command line in your bootloader (GRUB).&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Edit /etc/default/grub.&lt;/li&gt;
&lt;li&gt;Find the GRUB_CMDLINE_LINUX_DEFAULT or GRUB_CMDLINE_LINUX line.&lt;/li&gt;
&lt;li&gt;Add rootdelay=60 (or a higher value like 90 for safety) to the string. For example:
  GRUB_CMDLINE_LINUX_DEFAULT="quiet splash rootdelay=60"&lt;/li&gt;
&lt;li&gt;Run sudo update-grub to apply the changes.&lt;/li&gt;
&lt;li&gt;Reboot.&lt;/li&gt;
&lt;/ol&gt;
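After rebooting, it is worth confirming that the kernel actually received the new parameter by checking /proc/cmdline. A minimal sketch (the command-line string below is illustrative; on a real system read it from /proc/cmdline):

```shell
# Illustrative kernel command line; on a real system use: cat /proc/cmdline
cmdline="BOOT_IMAGE=/boot/vmlinuz quiet splash rootdelay=60"

# Extract the rootdelay value the kernel was booted with
delay=$(printf '%s\n' "$cmdline" | grep -o 'rootdelay=[0-9]*' | cut -d= -f2)
echo "effective rootdelay: ${delay}s"
```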

&lt;p&gt;&lt;u&gt;Method 2:&lt;/u&gt;&lt;br&gt;
Create a script that manually scans for new LUNs or waits for a specific device file to appear before allowing the boot to continue.&lt;/p&gt;

&lt;p&gt;Example Scenario: A script that waits for your specific multipath device to appear.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Create a hook script file:&lt;br&gt;
  sudo touch /etc/initramfs-tools/hooks/wait_for_my_san&lt;br&gt;
  sudo chmod +x /etc/initramfs-tools/hooks/wait_for_my_san&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Edit the script:&lt;br&gt;
  The hook needs to copy a script into the initramfs that will run during boot.&lt;/p&gt;

&lt;p&gt;File: &lt;code&gt;/etc/initramfs-tools/hooks/wait_for_my_san&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;


&lt;/li&gt;

&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    1     #!/bin/sh
    2     PREREQ=""
    3     prereqs()
    4     {
    5         echo "$PREREQ"
    6     }
    7 
    8     case $1 in
    9     prereqs)
   10         prereqs
   11         exit 0
   12         ;;
   13     esac
   14 
   15     . /usr/share/initramfs-tools/hook-functions
   16     # Copy a script into the initramfs to be run at boot time
   17     copy_exec /usr/local/sbin/wait-for-san-device /scripts/local-premount/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Create the boot-time script:&lt;br&gt;
  sudo touch /usr/local/sbin/wait-for-san-device&lt;br&gt;
  sudo chmod +x /usr/local/sbin/wait-for-san-device&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Edit the boot-time script:&lt;br&gt;
  This script will contain the logic that polls for the device.&lt;/p&gt;

&lt;p&gt;File: &lt;code&gt;/usr/local/sbin/wait-for-san-device&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
    1     #!/bin/sh
    2     # This script runs in the initramfs just before the root device is mounted.
    3 
    4     # The expected multipath device for our root filesystem
    5     # Replace with your actual device name from /dev/mapper/
    6     ROOT_MPATH_DEVICE="/dev/mapper/mpath_root"
    7 
    8     echo "Waiting for SAN device ${ROOT_MPATH_DEVICE} to appear..."
    9 
   10     # Wait for up to 90 seconds
   11     for i in $(seq 1 90); do
   12         if [ -b "${ROOT_MPATH_DEVICE}" ]; then
   13             echo "Found ${ROOT_MPATH_DEVICE}."
   14             exit 0
   15         fi
   16         sleep 1
   17     done
   18 
   19     echo "Gave up waiting for ${ROOT_MPATH_DEVICE}."
   20     # Returning a non-zero exit code might drop you to a shell
   21     exit 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Rebuild the initramfs:
  &lt;code&gt;sudo update-initramfs -u&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;u&gt;Method 3&lt;/u&gt;&lt;/p&gt;

&lt;p&gt;Create an initramfs local-premount script,&lt;br&gt;
e.g. /etc/initramfs-tools/scripts/local-premount/scriptName&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#!/bin/sh

PREREQ=""

prereqs()
{
  echo "$PREREQ"
}

case $1 in
prereqs)

  prereqs

exit 0

;;

esac

echo "Sleeping 60 seconds for san to be initialized"

/usr/bin/sleep 60

exit 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now run &lt;code&gt;update-initramfs -u&lt;/code&gt; to rebuild the initramfs.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Direct Copy (&lt;code&gt;/etc/initramfs-tools/scripts/&lt;/code&gt;):

&lt;ul&gt;
&lt;li&gt;Use Case: Best for adding simple, self-contained shell scripts that do not require any extra binaries or libraries beyond what's already included in the initramfs by default (like sleep, echo, mount, etc.).&lt;/li&gt;
&lt;li&gt;Advantage: Very simple and quick. No need to write a separate hook file.&lt;/li&gt;
&lt;li&gt;Disadvantage: It's a "dumb" copy. If your script needs a specific binary (e.g., multipath, lsscsi), this method will not automatically find and add that binary or its library dependencies. The script would fail during boot.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;ul&gt;
&lt;li&gt;Hooks (&lt;code&gt;/etc/initramfs-tools/hooks/&lt;/code&gt;):

&lt;ul&gt;
&lt;li&gt;Use Case: The standard, most robust method. It's necessary whenever your script has dependencies.&lt;/li&gt;
&lt;li&gt;Advantage: Gives you full programmatic control. You can use helper functions like copy_exec to intelligently copy a binary and all of its required libraries into the initramfs. This is essential for any non-trivial task.&lt;/li&gt;
&lt;li&gt;Disadvantage: Requires creating a separate hook file, which is slightly more work.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

</description>
      <category>linux</category>
      <category>initrd</category>
    </item>
    <item>
      <title>Part 1 OpenTelemetry</title>
      <dc:creator>urgensherpa</dc:creator>
      <pubDate>Sun, 16 Feb 2025 15:55:35 +0000</pubDate>
      <link>https://forem.com/sherpaurgen/part-1-opentelemetry-53mb</link>
      <guid>https://forem.com/sherpaurgen/part-1-opentelemetry-53mb</guid>
<description>&lt;p&gt;OpenTelemetry: an observability framework and toolkit designed to create and manage telemetry data such as traces, metrics, and logs.&lt;/p&gt;

&lt;p&gt;I have intermediate programming experience in languages like Python and Go. I've worked on a couple of projects where the application generated logs to flat files or sent events to a remote syslog over TCP/UDP. Initially, I wondered why OpenTelemetry was necessary. However, it seems to offer much more than traditional logging for observability. One feature I find particularly helpful is its distributed tracing. In microservices, a single request often spans multiple services, and traditional logging alone struggles to track the flow of a request across these services—though it is possible with some customization.&lt;/p&gt;

&lt;p&gt;OpenTelemetry traces seem to provide a way to track the entire lifecycle of a request, including which services it touched, how long each step took, and where errors occurred (which can help find hotspots in the execution flow).&lt;/p&gt;

&lt;p&gt;With trace context propagation, OpenTelemetry automatically propagates trace context (e.g., trace IDs) across service boundaries, making it easy to correlate logs and traces.&lt;/p&gt;
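As an illustration, OpenTelemetry's default propagation format is W3C Trace Context, which carries the context in an HTTP header named traceparent laid out as version-traceid-spanid-flags (the header value below is the example from the W3C specification):

```shell
# A traceparent header value: version-traceid-spanid-flags
traceparent="00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"

# Split out the pieces a downstream service would correlate on
trace_id=$(printf '%s' "$traceparent" | cut -d- -f2)
span_id=$(printf '%s' "$traceparent" | cut -d- -f3)
echo "trace id: $trace_id, parent span: $span_id"
```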

&lt;p&gt;I haven't used OpenTelemetry yet, but I'm just getting started today and I'm excited to explore its features.&lt;/p&gt;

&lt;p&gt;This is especially useful for debugging performance issues or errors in distributed systems.&lt;/p&gt;

</description>
      <category>opentelemetry</category>
    </item>
    <item>
<title>Visualizing repository, service layers and interfaces</title>
      <dc:creator>urgensherpa</dc:creator>
      <pubDate>Thu, 02 May 2024 14:06:35 +0000</pubDate>
      <link>https://forem.com/sherpaurgen/visualizing-repository-service-layers-and-interfaces-1mh0</link>
      <guid>https://forem.com/sherpaurgen/visualizing-repository-service-layers-and-interfaces-1mh0</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F46lpc92gu6i2o96ycvi7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F46lpc92gu6i2o96ycvi7.png" alt="Image description" width="800" height="603"&gt;&lt;/a&gt;&lt;br&gt;
Image&lt;/p&gt;

</description>
    </item>
    <item>
      <title>go module</title>
      <dc:creator>urgensherpa</dc:creator>
      <pubDate>Sun, 04 Feb 2024 17:27:36 +0000</pubDate>
      <link>https://forem.com/sherpaurgen/go-module-58cf</link>
      <guid>https://forem.com/sherpaurgen/go-module-58cf</guid>
      <description>&lt;p&gt;Go programs are organized into packages. A package is a directory of Go code that's all compiled together. Functions, types, variables, and constants defined in one source file are visible to all other source files within the same package (directory).&lt;/p&gt;

&lt;p&gt;A repository contains one or more modules. A module is a collection of Go packages that are released together.&lt;/p&gt;

&lt;p&gt;A GO REPOSITORY TYPICALLY CONTAINS ONLY ONE MODULE, LOCATED AT THE ROOT OF THE REPOSITORY.&lt;br&gt;
A file named go.mod at the root of a project declares the module. It contains:&lt;/p&gt;

&lt;p&gt;The module path&lt;br&gt;
The version of the Go language your project requires&lt;br&gt;
Optionally, any external package dependencies your project has&lt;br&gt;
The module path is just the import path prefix for all packages within the module. Here's an example of a go.mod file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;module github.com/user123/exampleproject

go 1.20

require github.com/google/examplepackage v1.3.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each module's path not only serves as an import path prefix for the packages within but also indicates where the go command should look to download it. For example, to download the module golang.org/x/tools, the go command would consult the repository located at &lt;a href="https://golang.org/x/tools"&gt;https://golang.org/x/tools&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;An "import path" is a string used to import a package. A package's import path is its module path joined with its subdirectory within the module. For example, the module `github.com/google/go-cmp` contains a package in the directory cmp/. That package's import path is github.com/google/go-cmp/cmp. 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
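The quoted rule can be expressed mechanically (the module path and subdirectory are taken from the go-cmp example above):

```shell
# Import path = module path joined with the package's subdirectory in the module
module_path="github.com/google/go-cmp"
pkg_subdir="cmp"
import_path="${module_path}/${pkg_subdir}"
echo "import path: ${import_path}"
```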



&lt;p&gt;Packages in the standard library do not have a module path prefix.&lt;/p&gt;

&lt;p&gt;DO I NEED TO PUT MY PACKAGE ON GITHUB?&lt;br&gt;
You don't need to publish your code to a remote repository before you can build it. A module can be defined locally without belonging to a repository. However, it's a good habit to keep a copy of all your projects on a remote server, like GitHub.&lt;/p&gt;

&lt;p&gt;The $GOPATH environment variable will be set by default somewhere on your machine (typically in the home directory, ~/go). Since we will be working in the new "Go modules" setup, you don't need to worry about that. If you read something online about setting up your GOPATH, that documentation is probably out of date.&lt;/p&gt;

&lt;p&gt;These days you should avoid working in the $GOPATH/src directory. Again, that's the old way of doing things and can cause unexpected issues, so better to just avoid it.&lt;/p&gt;

&lt;p&gt;GET INTO YOUR WORKSPACE&lt;br&gt;
Navigate to a location on your machine where you want to store some code. For example, I store all my code in ~/workspace, then organize it into subfolders based on the remote location. For example,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~/workspace/github.com/Stebalien/go-address-validator = https://github.com/Stebalien/go-address-validator
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That said, you can put your code wherever you want.&lt;/p&gt;

&lt;p&gt;FIRST LOCAL PROGRAM&lt;br&gt;
Create a new directory and enter it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mkdir hellogo
cd hellogo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Inside the directory declare your module's name:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;go mod init {REMOTE}/{USERNAME}/hellogo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where &lt;strong&gt;{REMOTE}&lt;/strong&gt; is your preferred remote source provider (e.g., github.com) and &lt;strong&gt;{USERNAME}&lt;/strong&gt; is your Git username. If you don't use a remote provider yet, just use &lt;code&gt;example.com/username/hellogo&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Print your go.mod file:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;cat go.mod&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Why does Go include a remote URL in module paths?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;to simplify remote downloading of packages&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>Implementing CORS with go-chi</title>
      <dc:creator>urgensherpa</dc:creator>
      <pubDate>Wed, 10 Jan 2024 18:59:39 +0000</pubDate>
      <link>https://forem.com/sherpaurgen/implementing-cors-with-go-chi-1did</link>
      <guid>https://forem.com/sherpaurgen/implementing-cors-with-go-chi-1did</guid>
<description>&lt;p&gt;All web browsers implement a security model known as the Same-Origin Policy (SOP). It restricts domains from accessing and retrieving data from other domains’ resources; this helps protect users from malicious scripts that could access their sensitive data or perform unauthorized actions on their behalf. This led to the creation of Cross-Origin Resource Sharing (CORS), an HTTP-header-based mechanism that allows a server to indicate any origins (domain, scheme, or port) other than its own from which a browser should permit loading resources.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;corsHandler := cors.Handler(cors.Options{
        AllowedOrigins:   []string{"https://site1.com"},
        AllowedMethods:   []string{"GET", "POST", "PUT", "DELETE", "OPTIONS"},
        AllowedHeaders:   []string{"Accept", "Authorization", "Content-Type", "X-CSRF-Token", "Access-Control-Allow-Origin"},
        ExposedHeaders:   []string{"Link"},
        AllowCredentials: false,
        MaxAge:           300, // Maximum value not ignored by any of the major browsers
    })
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The AllowedOrigins field in the corsHandler configuration in main.go specifies which origins are allowed to access the server's resources. In this case, only requests from &lt;a href="https://site1.com"&gt;https://site1.com&lt;/a&gt; are allowed.&lt;/p&gt;

&lt;p&gt;This is the essence of Cross-Origin Resource Sharing (CORS): by default the browser's Same-Origin Policy prevents web pages from making requests to a different domain than the one the page came from, and CORS headers let the server declare which cross-origin requests it allows.&lt;/p&gt;

&lt;p&gt;If a request comes from an origin not listed in AllowedOrigins, the server will respond with a CORS error and the browser will block the request. This helps protect your server from potentially malicious requests from unknown origins.&lt;/p&gt;
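The allow-list check itself can be sketched as a simple membership test (a deliberate simplification: real CORS handling also involves preflight OPTIONS requests and header matching, and the request origin below is made up):

```shell
# Simplified sketch of the origin check a CORS middleware performs
allowed_origins="https://site1.com"
request_origin="https://evil.example"

# Match the request origin against the space-separated allow-list
case " $allowed_origins " in
  *" $request_origin "*) verdict="allowed" ;;
  *) verdict="blocked" ;;
esac
echo "origin $request_origin: $verdict"
```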

</description>
    </item>
    <item>
      <title>Detected that the sandbox image "registry.k8s.io/pause:3.8" of the container runtime is inconsistent with that used by kubeadm</title>
      <dc:creator>urgensherpa</dc:creator>
      <pubDate>Mon, 02 Oct 2023 04:48:09 +0000</pubDate>
      <link>https://forem.com/sherpaurgen/detected-that-the-sandbox-image-registryk8siopause38-of-the-container-runtime-is-inconsistent-with-that-used-by-kubeadm-1glc</link>
      <guid>https://forem.com/sherpaurgen/detected-that-the-sandbox-image-registryk8siopause38-of-the-container-runtime-is-inconsistent-with-that-used-by-kubeadm-1glc</guid>
<description>&lt;p&gt;Recently, while deploying a 3-node Kubernetes cluster (v1.28.2-1.1), I came across this issue in the&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubeadm init&lt;/code&gt; stage&lt;br&gt;
W1001 10:56:31.090889   28295 checks.go:835] detected that the sandbox image "registry.k8s.io/pause:3.8" of the container runtime is inconsistent with that used by kubeadm. It is recommended that using "registry.k8s.io/pause:3.9" as the CRI sandbox image.&lt;/p&gt;

&lt;p&gt;After digging further, I found that the sandbox image used by the containerd runtime is older than what kubelet expects. The active configuration can be inspected with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;containerd config dump
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Apart from this, I also saw a cgroup driver inconsistency (it should have been systemd).&lt;br&gt;
Here is what I have in &lt;code&gt;/etc/containerd/config.toml&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;version = 2

root = "/var/lib/containerd"
state = "/run/containerd"
oom_score = 0
imports = ["/etc/containerd/runtime_*.toml", "./debug.toml"]

[grpc]
  address = "/run/containerd/containerd.sock"
  uid = 0
  gid = 0

[debug]
  address = "/run/containerd/debug.sock"
  uid = 0
  gid = 0
  level = "info"

[metrics]
  address = ""
  grpc_histogram = false

[cgroup]
  path = ""

[plugins]
  [plugins."io.containerd.monitor.v1.cgroups"]
    no_prometheus = false
  [plugins."io.containerd.service.v1.diff-service"]
    default = ["walking"]
  [plugins."io.containerd.gc.v1.scheduler"]
    pause_threshold = 0.02
    deletion_threshold = 0
    mutation_threshold = 100
    schedule_delay = 0
    startup_delay = "100ms"
  [plugins."io.containerd.runtime.v2.task"]
    platforms = ["linux/amd64"]
    sched_core = true
  [plugins."io.containerd.service.v1.tasks-service"]
    blockio_config_file = ""
    rdt_config_file = ""
  [plugins."io.containerd.grpc.v1.cri"]
    sandbox_image = "registry.k8s.io/pause:3.9"
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
      runtime_type = "io.containerd.runc.v2"
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
        SystemdCgroup = true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With these settings in place, the issue in &lt;code&gt;kubeadm init&lt;/code&gt; should be resolved (more problems were encountered; details below).&lt;/p&gt;
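A quick sanity check is to grep for the two settings kubeadm cares about. The snippet variable below only simulates the file contents; on a real host you would run the commented grep against /etc/containerd/config.toml itself:

```shell
# On a real host:
#   grep -E 'sandbox_image|SystemdCgroup' /etc/containerd/config.toml
snippet='sandbox_image = "registry.k8s.io/pause:3.9"
SystemdCgroup = true'

# Count how many of the expected lines are present (should be 2)
matches=$(printf '%s\n' "$snippet" | grep -c -E 'pause:3.9|SystemdCgroup = true')
echo "settings found: $matches"
```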

&lt;p&gt;In Linux, a process namespace is a feature that provides isolation and separation of processes, particularly in the context of containers and virtualization. Namespaces allow different processes to have their own view of system resources, such as process IDs, network interfaces, file systems, and more. This isolation is essential for creating containers, virtual machines, and other forms of process separation.&lt;br&gt;
&lt;a href="https://www.redhat.com/sysadmin/7-linux-namespaces"&gt;redhat.com/&lt;/a&gt;&lt;br&gt;
the pause container image is used to initialize and maintain the container namespaces for the other containers within a pod. This "pause container" is a minimalistic, lightweight container that does nothing but sleep indefinitely. It acts as a template from which the user containers are spawned, and it holds the namespaces open even when the other containers crash or die.&lt;/p&gt;

&lt;p&gt;Pod Initialization: When a pod is created, Kubernetes initializes the pod's namespaces (e.g., PID, network, IPC) by creating the sandbox pause container within those namespaces. The pause container is typically a minimalistic container that sleeps indefinitely and does not perform any active work.&lt;/p&gt;

&lt;p&gt;Pod Containers: The other containers defined in the pod specification (e.g., application containers) are then created within the same namespaces as the pause container. These containers share the same namespaces, which allows them to interact with each other as if they were running on the same host.&lt;/p&gt;

&lt;p&gt;Init and Signal Handling: The pause container acts as an "init" process for the pod's namespaces. It handles signals sent to the pod, such as SIGTERM. When a signal is sent to the pod (e.g., when the pod is terminated), the pause container receives the signal and ensures that it is propagated to the other containers within the pod. This allows the other containers to perform graceful shutdown and cleanup.&lt;/p&gt;

&lt;p&gt;Cleanup and Termination: Once the other containers have completed their tasks or terminated, the pause container remains running. If the pod is deleted or terminated, the pause container is responsible for shutting down any remaining containers within the pod's namespaces.&lt;/p&gt;

&lt;p&gt;The flannel pods were still in CrashLoopBackOff state; checking the logs further, I saw&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;W1003 17:32:40.526877       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I1003 17:32:40.544578       1 kube.go:145] Waiting 10m0s for node controller to sync
I1003 17:32:40.544610       1 kube.go:490] Starting kube subnet manager
I1003 17:32:41.544855       1 kube.go:152] Node controller sync successful
I1003 17:32:41.544918       1 main.go:232] Created subnet manager: Kubernetes Subnet Manager - ubuntu1
I1003 17:32:41.544929       1 main.go:235] Installing signal handlers
I1003 17:32:41.545237       1 main.go:543] Found network config - Backend type: vxlan
I1003 17:32:41.545279       1 match.go:206] Determining IP address of default interface
I1003 17:32:41.545967       1 match.go:259] Using interface with name enp3s0 and address 192.168.1.111
I1003 17:32:41.545997       1 match.go:281] Defaulting external address to interface address (192.168.1.111)
I1003 17:32:41.546148       1 vxlan.go:141] VXLAN config: VNI=1 Port=0 GBP=false Learning=false DirectRouting=false
E1003 17:32:41.546522       1 main.go:335] Error registering network: failed to acquire lease: node "ubuntu1" pod cidr not assigned
W1003 17:32:41.546809       1 reflector.go:347] github.com/flannel-io/flannel/pkg/subnet/kube/kube.go:491: watch of *v1.Node ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding
I1003 17:32:41.546831       1 main.go:523] Stopping shutdownHandler...

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This issue was resolved by creating the file /run/flannel/subnet.env &lt;strong&gt;on all nodes&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat &amp;lt;&amp;lt;EOF | tee /run/flannel/subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.0/16
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the patch commands from the master node; they apply to the worker nodes as well:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl patch node masternode1 -p '{"spec":{"podCIDR":"10.244.0.0/16"}}'

kubectl patch node workernode1 -p '{"spec":{"podCIDR":"10.244.0.0/16"}}'

kubectl patch node workernode2 -p '{"spec":{"podCIDR":"10.244.0.0/16"}}'

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
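To verify a patch took effect, you can read back spec.podCIDR, e.g. with kubectl get node masternode1 -o jsonpath='{.spec.podCIDR}'. The extraction below simulates that against the same JSON used in the patch (no cluster required):

```shell
# Simulated node spec; on a cluster:
#   kubectl get node masternode1 -o jsonpath='{.spec.podCIDR}'
node_json='{"spec":{"podCIDR":"10.244.0.0/16"}}'

# Pull the podCIDR value out of the JSON
cidr=$(printf '%s' "$node_json" | grep -o '"podCIDR":"[^"]*"' | cut -d'"' -f4)
echo "podCIDR: $cidr"
```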



&lt;p&gt;&lt;strong&gt;On master node:&lt;/strong&gt;&lt;br&gt;
delete existing flannel if any &lt;code&gt;kubectl delete -f kube-flannel.yml&lt;/code&gt; and run&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl apply -f kube-flannel.yml --kubeconfig /etc/kubernetes/admin.conf 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the nodes are not patched, the error below is observed on the problematic flannel node in CrashLoopBackOff state:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;E1003 18:11:28.374694       1 main.go:335] Error registering network: failed to acquire lease: node "nodea" pod cidr not assigned
I1003 18:11:28.374936       1 main.go:523] Stopping shutdownHandler...
W1003 18:11:28.375084       1 reflector.go:347] github.com/flannel-io/flannel/pkg/subnet/kube/kube.go:491: watch of *v1.Node ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Reference:&lt;br&gt;
&lt;a href="https://stackoverflow.com/questions/50833616/kube-flannel-cant-get-cidr-although-podcidr-available-on-node/58618952#58618952"&gt;https://stackoverflow.com/questions/50833616/kube-flannel-cant-get-cidr-although-podcidr-available-on-node/58618952#58618952&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>kubeadm</category>
    </item>
    <item>
      <title>Observation of sequential and asynchronous execution</title>
      <dc:creator>urgensherpa</dc:creator>
      <pubDate>Fri, 07 Jul 2023 02:49:03 +0000</pubDate>
      <link>https://forem.com/sherpaurgen/observation-of-sequential-and-asynchronous-execution-3peo</link>
      <guid>https://forem.com/sherpaurgen/observation-of-sequential-and-asynchronous-execution-3peo</guid>
      <description>&lt;p&gt;Goal: process 100K sqlite files where each file is approx 300MB (decode blob field and delete matching rows)&lt;br&gt;
Machine specs: cores:64, mem:100G&lt;br&gt;
The machine already has other critical services running, hence &lt;code&gt;max_workers=15&lt;/code&gt; is set. If it is not throttled, the memory usage goes through the roof: it is approximately &lt;code&gt;max_workers X size of a file opened&lt;/code&gt;. By default, &lt;code&gt;max_workers = number of cores X 5&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Concurrent/Async:&lt;br&gt;
&lt;strong&gt;&lt;em&gt;asyn_process.py&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import sqlite3
import json
import concurrent.futures
import logging
import time

logger = logging.getLogger('my_logger')
logger.setLevel(logging.DEBUG)
file_handler = logging.FileHandler('/tmp/mylog.log')
formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s', datefmt='%Y-%m-%d %H:%M:%S')
file_handler.setFormatter(formatter)
logger.addHandler(file_handler)

def rm_eventx_from_db(sqlitefilename,logger):
    try:
        conn = sqlite3.connect(sqlitefilename)
        cursor = conn.cursor()

        cursor.execute('SELECT ID,LOG FROM OLD_LOGS')
        idlist=[]

        for row in cursor.fetchall():
            colid = row[0]
            msg = row[1]
            m = msg.decode('utf-8')
            msgjson = json.loads(m)
            # print(msgjson['_normalized_fields']['event_id'])
            if msgjson['_normalized_fields']['event_id'] == 12345:
                idlist.append(colid)
        for delete_id in idlist:
            cursor.execute('DELETE FROM OLD_LOGS WHERE ID = ?', (delete_id,))

        conn.commit()

        cursor.close()
        conn.close()
        logger.warning(f"processing done for {sqlitefilename}")
    except Exception as e:
        logger.warning(f"rm_eventx_from_db err: {sqlitefilename} "+str(e))

def vaccumdb(sqlitefilename):
    try:
        conn = sqlite3.connect(sqlitefilename)
        cursor = conn.cursor()
        cursor.execute('VACUUM')  # VACUUM runs outside any transaction; no commit needed
        cursor.close()
        conn.close()
    except Exception as e:
        logger.warning(f"vaccum_db err: {sqlitefilename} "+str(e))    

def main():
    start_time = time.perf_counter()
    futures=[]
    listfile = '/tmp/filelist.txt'
    base_path='/data/storage/archive/'

    with open(listfile, 'r') as file:
        with concurrent.futures.ThreadPoolExecutor(max_workers=15) as executor:
            for line in file:
                line = line.strip()
                file_path=base_path+str(line)
                print(file_path)
                futures.append(executor.submit(rm_eventx_from_db,file_path,logger))
        for future in concurrent.futures.as_completed(futures):
            logger.warning("futures msg : "+str(future.result()))      
    fut_vac=[]
    with open(listfile, 'r') as file:
        with concurrent.futures.ThreadPoolExecutor(max_workers=15) as executor:
            for line in file:
                line = line.strip()
                file_path=base_path+line
                fut_vac.append(executor.submit(vaccumdb,file_path))
    for future in concurrent.futures.as_completed(fut_vac):
        logger.warning("vaccum futures msg : "+str(future.result()))             

    end_time = time.perf_counter()
    execution_time = end_time - start_time
    print(f"Elapsed time: {execution_time:.6f} Seconds")

if __name__ == "__main__":
    main()

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here are some &lt;code&gt;top&lt;/code&gt; stats&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# top -H -p 1545043

top - 15:10:49 up 233 days, 23:17,  1 user,  load average: 9.39, 11.37, 12.03
Threads:  16 total,   2 running,  14 sleeping,   0 stopped,   0 zombie
%Cpu(s): 11.5 us, 11.4 sy,  0.4 ni, 74.9 id,  1.1 wa,  0.0 hi,  0.6 si,  0.0 st
MiB Mem : 100699.4 total,   3401.5 free,  83303.5 used,  13994.4 buff/cache
MiB Swap:   4096.0 total,     26.1 free,   4069.9 used.  16514.7 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                                     
1545055 root      20   0 5464740   4.3g  15252 S  26.3   4.4   1:59.90 python async_process1.py                                                                                                                             
1545059 root      20   0 5464740   4.3g  15252 R  25.0   4.4   1:54.33 python async_process1.py                                                                                                                             
1545061 root      20   0 5464740   4.3g  15252 S  24.7   4.4   1:54.30 python async_process1.py                                                                                                                             
1545062 root      20   0 5464740   4.3g  15252 S  24.3   4.4   1:53.59 python async_process1.py                                                                                                                             
1545067 root      20   0 5464740   4.3g  15252 S  24.3   4.4   1:53.75 python async_process1.py                                                                                                                             
1545057 root      20   0 5464740   4.3g  15252 S  24.0   4.4   1:53.75 python async_process1.py                                                                                                                             
1545058 root      20   0 5464740   4.3g  15252 R  23.7   4.4   1:53.95 python async_process1.py                                                                                                                             
1545066 root      20   0 5464740   4.3g  15252 S  23.7   4.4   1:54.01 python async_process1.py                                                                                                                             
1545063 root      20   0 5464740   4.3g  15252 S  23.3   4.4   1:54.32 python async_process1.py                                                                                                                             
1545064 root      20   0 5464740   4.3g  15252 S  23.3   4.4   1:54.03 python async_process1.py                                                                                                                             
1545065 root      20   0 5464740   4.3g  15252 S  23.3   4.4   1:53.85 python async_process1.py                                                                                                                             
1545068 root      20   0 5464740   4.3g  15252 S  23.3   4.4   1:53.48 python async_process1.py                                                                                                                             
1545069 root      20   0 5464740   4.3g  15252 S  23.3   4.4   1:54.11 python async_process1.py                                                                                                                             
1545056 root      20   0 5464740   4.3g  15252 S  23.0   4.4   1:53.73 python async_process1.py                                                                                                                             
1545054 root      20   0 5464740   4.3g  15252 S  22.7   4.4   1:59.47 python async_process1.py                                                                                                                             
1545043 root      20   0 5464740   4.3g  15252 S   0.0   4.4   0:01.89 python async_process1.py     
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The total memory consumed by the script is 4.3 GB (RES).&lt;/p&gt;

&lt;p&gt;Observing the log, the number of files processed per minute varies from 2 to 15.&lt;/p&gt;

&lt;p&gt;Below is the synchronous version&lt;br&gt;
sync_process2.py&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import sqlite3
import json
import logging
import time

logger = logging.getLogger('my_logger')
logger.setLevel(logging.DEBUG)
file_handler = logging.FileHandler('/tmp/mylog2.log')
formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s', datefmt='%Y-%m-%d %H:%M:%S')
file_handler.setFormatter(formatter)
logger.addHandler(file_handler)

def rm_eventx_from_db(sqlitefilename,logger):
    try:
        conn = sqlite3.connect(sqlitefilename)
        cursor = conn.cursor()

        cursor.execute('SELECT ID,LOG FROM OLD_LOGS')
        idlist=[]

        for row in cursor.fetchall():
            colid = row[0]
            msg = row[1]
            m = msg.decode('utf-8')
            msgjson = json.loads(m)
            # print(msgjson['_normalized_fields']['event_id'])
            if msgjson['_normalized_fields']['event_id'] == 36870:
                idlist.append(colid)
        for delete_id in idlist:
            cursor.execute('DELETE FROM OLD_LOGS WHERE ID = ?', (delete_id,))

        conn.commit()

        cursor.close()
        conn.close()
        logger.warning(f"processing done for {sqlitefilename}")
    except Exception as e:
        logger.warning(f"rm_eventx_from_db err: {sqlitefilename} "+str(e))

def vaccumdb(sqlitefilename):
    try:
        conn = sqlite3.connect(sqlitefilename)
        cursor = conn.cursor()
        cursor.execute('VACUUM')  # VACUUM runs outside any transaction; no commit needed
        cursor.close()
        conn.close()
    except Exception as e:
        logger.warning(f"vaccum_db err: {sqlitefilename} "+str(e))    

def main():
    start_time = time.perf_counter()
    futures=[]
    listfile = '/tmp/filelist2.txt'
    base_path='/data/archives/lake/'

    with open(listfile, 'r') as file:
            for line in file:
                line = line.strip()
                file_path=base_path+str(line)
                print(file_path)
                rm_eventx_from_db(file_path,logger)
                vaccumdb(file_path)         

    end_time = time.perf_counter()
    execution_time = end_time - start_time
    print(f"Elapsed time: {execution_time:.6f} Seconds")

if __name__ == "__main__":
    main()

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It is observed that, 99% of the time, three files are processed per minute.&lt;/p&gt;

&lt;p&gt;Below are the CPU + memory usage stats&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
top - 02:20:56 up 234 days, 10:27,  1 user,  load average: 95.08, 95.59, 95.43
Tasks: 1178 total,   2 running, 1176 sleeping,   0 stopped,   0 zombie
%Cpu(s): 10.8 us,  9.8 sy,  0.1 ni, 77.7 id,  1.3 wa,  0.0 hi,  0.4 si,  0.0 st
MiB Mem : 100699.4 total,    637.1 free,  80412.8 used,  19649.5 buff/cache
MiB Swap:   4096.0 total,     17.7 free,   4078.3 used.  19406.4 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                                     
1352886 root      20   0 5223396   4.1g  18236 S 339.0   4.1 284:48.95 python /script/asyn_process.py                                                                                                               
2542922 root      20   0  311076 295640   5452 R  99.7   0.3  27:14.71 python /script/sync_process.py  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Concurrent (thread pool) execution processes files at a noticeably faster rate than synchronous execution. However, the choice between the two approaches involves tradeoffs that depend on available resources, time constraints, and existing processes.&lt;/p&gt;

&lt;p&gt;Considering that the per-file work (blob decoding and JSON parsing) is CPU-intensive, threads contend on the GIL, and Python may not be the most suitable tool for such tasks.&lt;/p&gt;
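As a sketch of that point: for CPU-bound work, a `ProcessPoolExecutor` sidesteps the GIL at the cost of one interpreter (and its memory) per worker. The `cpu_bound` function below is a hypothetical stand-in for the decode-and-filter work, not the script above:

```python
import concurrent.futures

def cpu_bound(n):
    # hypothetical stand-in for the blob-decode / JSON-parse work per file
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    # processes, not threads: each worker runs on its own core, no GIL contention
    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as ex:
        results = list(ex.map(cpu_bound, [1000, 2000]))
    print(results)
```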

</description>
      <category>python</category>
    </item>
    <item>
      <title>Flamegraphs part 1</title>
      <dc:creator>urgensherpa</dc:creator>
      <pubDate>Thu, 06 Oct 2022 05:58:44 +0000</pubDate>
      <link>https://forem.com/sherpaurgen/flamegraphs-part-1-2ncl</link>
      <guid>https://forem.com/sherpaurgen/flamegraphs-part-1-2ncl</guid>
      <description>&lt;p&gt;As a developer/sysadmin you can create &lt;a href="https://github.com/brendangregg/FlameGraph" rel="noopener noreferrer"&gt;flamegraphs&lt;/a&gt; to to create visualizations of system performance data recorded with the perf tool. This perf output shows a stack trace followed by a count, for a total of #N number of samples&lt;/p&gt;

&lt;p&gt;Clone the FlameGraph scripts:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;cd /home/ubuntu&lt;/code&gt;&lt;br&gt;
&lt;code&gt;git clone https://github.com/brendangregg/FlameGraph&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Sampling a Go program which downloads a Linux Mint ISO&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package main
import (
    "fmt"
    "net/http"
    "io"
    "os"
)

func check(e error) {
    if e != nil {
        panic(e)
    }
}

func main() {
    d1 := []byte("hello world\n")
    for i := 0; i &amp;lt; 10000; i++ {
        resp, err := http.Get("https://mirrors.layeronline.com/linuxmint/stable/21/linuxmint-21-cinnamon-64bit.iso")
        check(err)
        //body, err := io.ReadAll(resp.Body)
        fmt.Println(resp.StatusCode)
        file, err := os.Create("/tmp/hello.iso")
        check(err)
        size, err := io.Copy(file, resp.Body)
        check(err)
        // close per iteration; defer here would pile up all 10000 handles until main returns
        resp.Body.Close()
        file.Close()
        fmt.Printf("downloaded %s with size %d\n", file.Name(), size)
        err = os.WriteFile("/tmp/check.txt", d1, 0644)
        check(err)
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Compile/build main.go and run it: &lt;code&gt;./main&lt;/code&gt;&lt;br&gt;
Get the process ID of main, e.g. &lt;code&gt;ps aux | grep main&lt;/code&gt; &lt;/p&gt;

&lt;p&gt;&lt;code&gt;perf record -a -F 99 -g -p 1464  -- sleep 20&lt;/code&gt;&lt;br&gt;
Running above command creates a &lt;em&gt;perf.data&lt;/em&gt; file &lt;/p&gt;

&lt;p&gt;&lt;code&gt;perf script &amp;gt; perf.out&lt;/code&gt;  //by default it reads perf.data from the current working directory and redirects stdout to perf.out (an ASCII file)&lt;br&gt;
This command reads the input file and displays the trace recorded.&lt;/p&gt;

&lt;p&gt;Creating flame graph:-&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;./FlameGraph/stackcollapse-perf.pl perf.out  | ./FlameGraph/flamegraph.pl &amp;gt; flame1006.svg
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Download and view flame1006.svg in browser&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcvglp5cjab43bfs6vtuw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcvglp5cjab43bfs6vtuw.png" alt="flamegraph for pid 1464 " width="800" height="733"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;Here we can observe that the &lt;code&gt;io.copyBuffer&lt;/code&gt; function uses most of the CPU time (&lt;a href="https://cs.opensource.google/go/go/+/refs/tags/go1.19.2:src/io/io.go;l=386" rel="noopener noreferrer"&gt;io.copy source&lt;/a&gt;).&lt;br&gt;
Looking further, &lt;code&gt;net.(*netFD).Read&lt;/code&gt; is the function call using the most CPU time; it underlies &lt;code&gt;func (*IPConn) Read&lt;/code&gt;.&lt;br&gt;
&lt;a href="https://pkg.go.dev/net#Conn" rel="noopener noreferrer"&gt;Conn&lt;/a&gt; is a generic stream-oriented network connection, and this function reads data from the connection (see &lt;a href="https://go.dev/src/net/http/transfer.go" rel="noopener noreferrer"&gt;https://go.dev/src/net/http/transfer.go&lt;/a&gt;).&lt;br&gt;
We also see that ksys_read() is called. This function is responsible for retrieving the struct fd that corresponds with the file descriptor passed in by the user; the struct fd structure contains the struct file_operations structure within it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F382yexm9nbp4oigj6ezu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F382yexm9nbp4oigj6ezu.png" alt=" " width="800" height="465"&gt;&lt;/a&gt;&lt;br&gt;
 sock_read_iter is fired when receiving a message on a socket.&lt;br&gt;
By looking at the graph we can conclude that the CPU time of the main program is spent mostly on reading data from the connection.&lt;/p&gt;

</description>
      <category>perf</category>
    </item>
    <item>
      <title>Context</title>
      <dc:creator>urgensherpa</dc:creator>
      <pubDate>Sun, 07 Aug 2022 16:17:00 +0000</pubDate>
      <link>https://forem.com/sherpaurgen/context-41em</link>
      <guid>https://forem.com/sherpaurgen/context-41em</guid>
      <description>&lt;p&gt;Context allow you to send cancellation signal to asynchronous process/ go-routine.&lt;br&gt;
&lt;a href="https://go.dev/blog/context"&gt;context&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Methods provided by context interface:-&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Value(key)&lt;/strong&gt; This method returns the value associated with the specified key.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Done()&lt;/strong&gt; This method returns a channel that can be used to receive a cancellation notification.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deadline()&lt;/strong&gt; This method returns the Time that represents the deadline for the request and a bool value that will be false if no deadline has been specified.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Err()&lt;/strong&gt; This method returns an error that indicates why the Done channel received a signal. The context package defines two variables that can be used to compare the error: Canceled indicates that the request was canceled, and DeadlineExceeded indicates that the deadline passed.&lt;/p&gt;

&lt;p&gt;Functions &lt;a href="https://pkg.go.dev/context#pkg-functions"&gt;https://pkg.go.dev/context#pkg-functions&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Background()&lt;/li&gt;
&lt;li&gt;WithCancel(ctx)&lt;/li&gt;
&lt;li&gt;WithDeadline(ctx,time)&lt;/li&gt;
&lt;li&gt;WithTimeout(ctx,duration)&lt;/li&gt;
&lt;li&gt;WithValue(ctx, key,val)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package main

import (
    "sync"
    "time"
    "fmt"
    "context"
)

func processRequest(ctx context.Context, wg *sync.WaitGroup, count int) {
    total := 0
    for i := 0; i &amp;lt; count; i++ {
        select {
        case &amp;lt;-ctx.Done():
            fmt.Println("Stopping processing - request cancelled")
            goto end
        default:
            fmt.Printf("Processing request: %v \n", total)
            total++
            time.Sleep(time.Millisecond * 250)
        }
    }
    fmt.Printf("%v requests processed...\n", total)
end:
    wg.Done()
}
func main() {
    waitGroup := sync.WaitGroup{}
    waitGroup.Add(1)
    fmt.Println("Request dispatched...")
    ctx, cancel := context.WithCancel(context.Background())

    go processRequest(ctx, &amp;amp;waitGroup, 10)
    time.Sleep(time.Second)
    fmt.Println("Canceling request")
    cancel()
    waitGroup.Wait()

}

---

% go run contxt.go
Request dispatched...
Processing request: 0 
Processing request: 1 
Processing request: 2 
Processing request: 3 
Canceling request
Stopping processing - request cancelled
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>go</category>
    </item>
    <item>
      <title>Graceful shutdown</title>
      <dc:creator>urgensherpa</dc:creator>
      <pubDate>Sat, 06 Aug 2022 18:04:00 +0000</pubDate>
      <link>https://forem.com/sherpaurgen/graceful-shutdown-16h1</link>
      <guid>https://forem.com/sherpaurgen/graceful-shutdown-16h1</guid>
      <description>&lt;p&gt;Graceful shutdown of a program involves &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Sending SIGTERM (notifying the&lt;br&gt;
program that it is going to be killed); upon receiving this signal the program stops accepting further requests (web requests, database operations, etc.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Finalize ongoing/pending tasks(in-flight data)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Release resources (file locks, memory) and terminate(with exit status 0)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A process should be robust against sudden death (e.g. power failure, hardware failure). Using a robust message queue (&lt;strong&gt;beanstalkd&lt;/strong&gt;) is recommended for such scenarios.&lt;/p&gt;

&lt;p&gt;Queue Concept-&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the producer puts a job in the &lt;strong&gt;Queue&lt;/strong&gt; (e.g. JSON, XML)&lt;/li&gt;
&lt;li&gt;a consumer monitors that Queue and takes/reserves an available job&lt;/li&gt;
&lt;li&gt;the consumer deletes the job from the Queue when done&lt;/li&gt;
&lt;/ul&gt;
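The produce/reserve/delete cycle can be sketched with an in-memory queue (the job payload is hypothetical; in production a durable queue such as beanstalkd plays this role so jobs survive sudden process death):

```python
import json
import queue

jobq = queue.Queue()

# producer puts a job in the queue (e.g. a JSON payload)
jobq.put(json.dumps({"task": "resize", "id": 1}))

# consumer monitors the queue and reserves the next available job
job = jobq.get()
print("reserved:", json.loads(job)["task"])

# consumer deletes the job from the queue once processing is done
jobq.task_done()
```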

&lt;p&gt;Read &lt;a href="https://12factor.net/disposability"&gt;disposability&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Graceful shutdown example for golang&lt;br&gt;
&lt;a href="https://pkg.go.dev/os/signal#Notify"&gt;https://pkg.go.dev/os/signal#Notify&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package main

import (
    "context"
    "fmt"
    "log"
    "net/http"
    "os"
    "os/signal"
    "syscall"
    "time"
)

func main() {
    // Hello world, the web server
    helloHandler := func(w http.ResponseWriter, req *http.Request) {
        fmt.Fprintf(w, "You requested %s - %s", req.URL, req.Method)
        fmt.Println("Serving requests from hey...")
        time.Sleep(time.Second * 2)
        defer fmt.Println("remaining")  // acts as remaining requests 
    }
    api := &amp;amp;http.Server{
        Addr:           ":8080",
        Handler:        http.HandlerFunc(helloHandler),
        ReadTimeout:    10 * time.Second,
        WriteTimeout:   10 * time.Second,
        MaxHeaderBytes: 1 &amp;lt;&amp;lt; 20,
    }

    serverErrors := make(chan error, 1)
    shutdown := make(chan os.Signal, 1)
    go func() {
        log.Printf("main api listening on %s", api.Addr)
        serverErrors &amp;lt;- api.ListenAndServe()
    }()

    signal.Notify(shutdown, os.Interrupt, syscall.SIGTERM)
    //above is not blocking operation although waiting on shutdown channel
    select {
    case err := &amp;lt;-serverErrors:
        log.Fatalf("Error while listening and starting http server: %v", err)

    case &amp;lt;-shutdown:
        log.Println("main: Starting shutdown")
        const timeout = 5 * time.Second
        // Context - it is used for Cancellation and propagation, the context.Background() gives empty context
        ctx, cancel := context.WithTimeout(context.Background(), timeout)
        defer cancel()
        err := api.Shutdown(ctx)
        /*Shutdown gracefully shuts down the server without interrupting any active connections. Shutdown works by first closing all open listeners, then closing all idle connections, and then waiting indefinitely for connections to return to idle and then shut down.If the provided context expires before the shutdown is complete, Shutdown returns the context's error, otherwise it returns any error returned from closing the Server's underlying Listener(s).*/
        if err != nil {
            log.Printf("main: Graceful shutdown did not complete in %v: %v", timeout, err)
            err = api.Close()
            //Close() immediately closes all active net.Listeners and any connections in state StateNew, StateActive, or StateIdle. For a graceful shutdown, use Shutdown.
        }
        if err != nil {
            log.Fatalf("main: could not stop server gracefully Error: %v", err)
        }
    }
}


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ha_oYCPg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8ee8nvm2yl6c221kk6dg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ha_oYCPg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8ee8nvm2yl6c221kk6dg.png" alt="Image description" width="800" height="243"&gt;&lt;/a&gt;&lt;br&gt;
additional reading: &lt;a href="https://medium.com/geekculture/timeout-context-in-go-e88af0abd08d"&gt;link1&lt;/a&gt;(for context timeouts)&lt;/p&gt;

</description>
      <category>go</category>
      <category>learnin</category>
    </item>
  </channel>
</rss>
