<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Anand k</title>
    <description>The latest articles on Forem by Anand k (@alpha-anand).</description>
    <link>https://forem.com/alpha-anand</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3803647%2F75a0bd3d-78ce-4fef-905e-e47431952d49.jpg</url>
      <title>Forem: Anand k</title>
      <link>https://forem.com/alpha-anand</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/alpha-anand"/>
    <language>en</language>
    <item>
      <title>How I Deployed a Live Blockchain Node (ARC) on AWS EC2 - A Complete Step-by-Step Guide</title>
      <dc:creator>Anand k</dc:creator>
      <pubDate>Sun, 03 May 2026 04:09:31 +0000</pubDate>
      <link>https://forem.com/alpha-anand/how-i-deployed-a-live-blockchain-node-arcon-aws-ec2-a-complete-step-by-step-guide-4nk7</link>
      <guid>https://forem.com/alpha-anand/how-i-deployed-a-live-blockchain-node-arcon-aws-ec2-a-complete-step-by-step-guide-4nk7</guid>
      <description>&lt;h1&gt;
  
  
  &lt;strong&gt;Introduction&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;This article documents a complete, real-world deployment of an Arc blockchain node on AWS EC2. Unlike tutorials that show only the happy path, this guide captures every error encountered, explains why it happened, and shows exactly how it was fixed.&lt;br&gt;
By the end of this guide you will have a fully operational blockchain node with 5 validators, a block explorer, and a complete monitoring stack running on AWS.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Architecture Overview&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;The full stack consists of the following components running in Docker containers on a single EC2 instance:&lt;br&gt;
Arc Consensus Node (arc_consensus): 5 validator nodes + 1 full node&lt;br&gt;
Arc Execution Node (arc_execution): EVM-compatible execution layer&lt;br&gt;
Blockscout: blockchain explorer with PostgreSQL database&lt;br&gt;
Nginx: reverse proxy routing traffic to Blockscout&lt;br&gt;
Prometheus: metrics collection from all services&lt;br&gt;
Grafana: visualization and dashboards&lt;br&gt;
cAdvisor + Node Exporter: container and system metrics&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 1: Setting Up the AWS EC2 Instance
&lt;/h2&gt;

&lt;p&gt;1.1 Choosing the Right Instance Type&lt;/p&gt;

&lt;p&gt;Building and running a blockchain node is resource-intensive. The wrong instance size will cause build failures or poor performance. The recommended configuration is:&lt;br&gt;
Instance Type: t3.xlarge or better (Rust compilation needs 4+ vCPUs)&lt;br&gt;
vCPUs: 4 (parallel Docker builds)&lt;br&gt;
RAM: 16 GB (multiple containers + database)&lt;br&gt;
Storage (EBS): 100 GB SSD (gp3) (Docker images + chain data)&lt;br&gt;
OS: Ubuntu 22.04 LTS&lt;/p&gt;

&lt;p&gt;Important: Using a t3.medium (2 vCPU, 4 GB RAM) will cause the Rust compilation to run out of memory and fail after 30-60 minutes.&lt;/p&gt;

&lt;p&gt;1.2 Configuring Security Group Inbound Rules&lt;/p&gt;

&lt;p&gt;After launching the instance, configure the Security Group to allow external access to required ports:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvbau7eagbvfx5vy2w6p2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvbau7eagbvfx5vy2w6p2.png" alt=" " width="691" height="162"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Important: Opening only port 80 is not enough. Grafana (3000) and Prometheus (9090) need their own inbound rules.&lt;/p&gt;
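&lt;p&gt;If you prefer the AWS CLI over the console, the inbound rules can be opened with authorize-security-group-ingress. The sketch below only prints the commands so you can review them first; the Security Group ID is a placeholder, and you must substitute your own.&lt;/p&gt;

```shell
# Sketch: generate one authorize-security-group-ingress call per required port.
# SG_ID is a placeholder -- substitute your own Security Group ID.
SG_ID="sg-0123456789abcdef0"
for PORT in 22 80 3000 9090; do
  echo "aws ec2 authorize-security-group-ingress --group-id $SG_ID --protocol tcp --port $PORT --cidr 0.0.0.0/0"
done
```

&lt;p&gt;Pipe the output into a shell once you have checked the ports and CIDR range match your setup.&lt;/p&gt;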

&lt;h2&gt;
  
  
  Part 2: Installing Required Tools
&lt;/h2&gt;

&lt;p&gt;2.1 Connect to Your EC2 Instance&lt;/p&gt;

&lt;p&gt;ssh -i your-key.pem ubuntu@your-ec2-public-ip&lt;/p&gt;

&lt;p&gt;2.2 Clone the Arc Node Repository&lt;/p&gt;

&lt;p&gt;cd ~&lt;br&gt;
git clone &lt;a href="https://github.com/circlefin/arc-node" rel="noopener noreferrer"&gt;https://github.com/circlefin/arc-node&lt;/a&gt;&lt;br&gt;
cd arc-node&lt;br&gt;
git submodule update --init --recursive&lt;/p&gt;

&lt;p&gt;Important: The submodule step may take several minutes. Do not interrupt it.&lt;/p&gt;

&lt;p&gt;2.3 Install System Dependencies&lt;/p&gt;

&lt;p&gt;sudo apt-get update&lt;br&gt;
sudo apt-get install docker.io make nodejs npm libclang-dev -y&lt;br&gt;
sudo service docker start&lt;br&gt;
sudo usermod -aG docker $USER&lt;br&gt;
Note: After adding yourself to the docker group, fully close and reopen the terminal for the change to take effect.&lt;/p&gt;

&lt;p&gt;2.4 Install Node.js 22&lt;/p&gt;

&lt;p&gt;The system Node.js version is outdated. Version 22 is required:&lt;br&gt;
sudo npm install -g n&lt;br&gt;
sudo n 22&lt;br&gt;
hash -r&lt;/p&gt;

&lt;p&gt;2.5 Install Foundry&lt;/p&gt;

&lt;p&gt;curl -L &lt;a href="https://foundry.paradigm.xyz" rel="noopener noreferrer"&gt;https://foundry.paradigm.xyz&lt;/a&gt; | bash&lt;br&gt;
source ~/.bashrc&lt;br&gt;
foundryup -i v1.4.4&lt;/p&gt;

&lt;p&gt;Note: If foundryup is not found after source ~/.bashrc, fully close and reopen the terminal, cd back into arc-node, and run foundryup -i v1.4.4 again.&lt;/p&gt;

&lt;p&gt;2.6 Update Docker Compose&lt;/p&gt;

&lt;p&gt;The system Docker Compose version is incompatible with Arc node. Install v2.24.0 manually:&lt;br&gt;
sudo mkdir -p /usr/local/lib/docker/cli-plugins&lt;br&gt;
sudo curl -SL &lt;a href="https://github.com/docker/compose/releases/download/v2.24.0/docker-compose-linux-x86_64" rel="noopener noreferrer"&gt;https://github.com/docker/compose/releases/download/v2.24.0/docker-compose-linux-x86_64&lt;/a&gt; -o /usr/local/lib/docker/cli-plugins/docker-compose&lt;br&gt;
sudo chmod +x /usr/local/lib/docker/cli-plugins/docker-compose&lt;/p&gt;

&lt;p&gt;2.7 Install Rust&lt;/p&gt;

&lt;p&gt;curl --proto '=https' --tlsv1.2 -sSf &lt;a href="https://sh.rustup.rs" rel="noopener noreferrer"&gt;https://sh.rustup.rs&lt;/a&gt; | sh&lt;br&gt;
When prompted, type 1 and press Enter to proceed with the default installation.&lt;br&gt;
source $HOME/.cargo/env&lt;/p&gt;

&lt;p&gt;2.8 Install npm Dependencies&lt;/p&gt;

&lt;p&gt;cd ~/arc-node&lt;br&gt;
npm install&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 3: Starting the Node
&lt;/h2&gt;

&lt;p&gt;3.1 Run make testnet&lt;/p&gt;

&lt;p&gt;cd ~/arc-node&lt;br&gt;
make testnet&lt;br&gt;
On the first run, Arc compiles its Rust source code inside Docker. This takes 60 - 180 minutes. The system will be under heavy load. This is completely normal (do not interrupt the process).&lt;/p&gt;

&lt;p&gt;Note: If the build fails partway through, run make testnet again. Docker caches completed layers, so it will resume from where it left off.&lt;/p&gt;

&lt;p&gt;3.2 Verify the Node is Running&lt;/p&gt;

&lt;p&gt;docker ps&lt;br&gt;
You should see the following containers running:&lt;br&gt;
• validator1_cl, validator2_cl, validator3_cl, validator4_cl, validator5_cl&lt;br&gt;
• validator1_el, validator2_el, validator3_el, validator4_el, validator5_el&lt;br&gt;
• full1_cl, full1_el&lt;br&gt;
• blockscout-backend, blockscout-frontend, blockscout-proxy&lt;br&gt;
• blockscout-db&lt;/p&gt;

&lt;p&gt;3.3 Start the Monitoring Stack&lt;/p&gt;

&lt;p&gt;Grafana and Prometheus are in a separate compose file and must be started independently:&lt;br&gt;
docker compose -f /home/ubuntu/arc-node/.quake/monitoring/compose.yaml up -d&lt;/p&gt;

&lt;p&gt;Important: The monitoring stack is not included in make testnet.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 4: Configuration Changes Made
&lt;/h2&gt;

&lt;p&gt;4.1 blockscout.yaml: Frontend API Host&lt;/p&gt;

&lt;p&gt;File: arc-node/deployments/blockscout.yaml&lt;br&gt;
Before (broken on remote servers)&lt;br&gt;
NEXT_PUBLIC_API_HOST: localhost&lt;br&gt;
NEXT_PUBLIC_APP_HOST: localhost&lt;/p&gt;

&lt;p&gt;After (substitute your EC2 public IP)&lt;br&gt;
NEXT_PUBLIC_API_HOST: your-ec2-public-ip&lt;br&gt;
NEXT_PUBLIC_APP_HOST: your-ec2-public-ip&lt;/p&gt;
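&lt;p&gt;One way to apply this change is with sed, substituting the instance's public IP for localhost. This is a self-contained sketch: it uses a hard-coded placeholder IP and writes a copy of the two lines to /tmp. On a real instance you would edit arc-node/deployments/blockscout.yaml and use your actual public IP.&lt;/p&gt;

```shell
# Sketch: replace localhost with the public IP in the two frontend variables.
# PUBLIC_IP is a placeholder (a TEST-NET example address, not a real host).
PUBLIC_IP="203.0.113.10"
printf 'NEXT_PUBLIC_API_HOST: localhost\nNEXT_PUBLIC_APP_HOST: localhost\n' > /tmp/blockscout-hosts.yaml
sed -i "s/localhost/$PUBLIC_IP/" /tmp/blockscout-hosts.yaml
cat /tmp/blockscout-hosts.yaml
```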

&lt;p&gt;4.2 compose.yaml: Network Configuration&lt;/p&gt;

&lt;p&gt;File: arc-node/.quake/localdev/compose.yaml&lt;br&gt;
Before&lt;br&gt;
blockscout:&lt;br&gt;
  driver: bridge&lt;br&gt;
  internal: true   # blocks backend from reaching chain RPC&lt;/p&gt;

&lt;p&gt;After&lt;br&gt;
blockscout:&lt;br&gt;
  driver: bridge&lt;br&gt;
  internal: false&lt;/p&gt;

&lt;p&gt;4.3 monitoring/compose.yaml: Grafana User and Ports&lt;/p&gt;

&lt;p&gt;File: arc-node/.quake/monitoring/compose.yaml&lt;br&gt;
Before&lt;br&gt;
user: '501'&lt;br&gt;
ports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;127.0.0.1:3000:3000&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After&lt;br&gt;
user: '472'&lt;br&gt;
ports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;0.0.0.0:3000:3000&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;4.4 prometheus.yml: Correct Scrape Targets&lt;/p&gt;

&lt;p&gt;scrape_configs:&lt;br&gt;
  - job_name: 'validators'&lt;br&gt;
    static_configs:&lt;br&gt;
      - targets:&lt;br&gt;
          - 'host.docker.internal:9101'&lt;br&gt;
          - 'host.docker.internal:9201'&lt;br&gt;
          - 'host.docker.internal:9301'&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 5: Final Working State
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fos399uw0r0dlfkrsbq2u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fos399uw0r0dlfkrsbq2u.png" alt=" " width="681" height="272"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 6: Load Testing the Node
&lt;/h2&gt;

&lt;p&gt;With the node fully running, test it by sending real transactions:&lt;/p&gt;

&lt;p&gt;make testnet-load RATE=10 TIME=30&lt;/p&gt;

&lt;p&gt;This sends 10 transactions per second for 30 seconds, a total of 300 transactions across all 5 validators. The output confirms successful transaction delivery:&lt;br&gt;
30.067s: Total sent  303 txs (35752 bytes), 10.1 tx/s&lt;/p&gt;
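&lt;p&gt;As a quick sanity check, the reported rate matches the total sent divided by the elapsed time:&lt;/p&gt;

```shell
# 303 transactions over 30.067 seconds, rounded to one decimal place:
awk 'BEGIN { printf "%.1f tx/s\n", 303 / 30.067 }'
```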

&lt;p&gt;After running the load test, refresh the Blockscout explorer at http://your-ec2-public-ip/ to see the transactions appear in real time.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Docker
&lt;/h2&gt;

&lt;p&gt;1) Bind mount host paths must exist before docker compose up so Docker does not create them&lt;br&gt;
2) Container-to-container communication uses internal ports, not host-mapped ports&lt;br&gt;
3) internal: true on a network isolates ALL external access including inter-service calls&lt;br&gt;
4) Each service runs as a specific UID so always chown data directories to match&lt;/p&gt;
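&lt;p&gt;Lessons 1 and 4 can be sketched together: pre-create the bind-mount path and chown it to the container's UID before docker compose up. The path below is illustrative, and 472 is the Grafana UID used in section 4.3; chown needs root on a real host, so it is only printed here as the command you would run.&lt;/p&gt;

```shell
# Pre-create a bind-mount directory so Docker does not create it as root.
DATA_DIR=/tmp/grafana-data
mkdir -p "$DATA_DIR"
# On the real host, match the container UID (Grafana runs as 472):
echo "sudo chown -R 472:472 $DATA_DIR"
if [ -d "$DATA_DIR" ]; then echo "bind mount path ready"; fi
```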

&lt;h2&gt;
  
  
  Networking
&lt;/h2&gt;

&lt;p&gt;1) Frontend environment variables like NEXT_PUBLIC_API_HOST are resolved by the browser, not the server&lt;br&gt;
2) Always use the public IP for any variable that the browser reads&lt;br&gt;
3) Opening port 80 in a Security Group does NOT open 3000 or 9090 so each needs its own rule&lt;/p&gt;

&lt;h2&gt;
  
  
  Debugging
&lt;/h2&gt;

&lt;p&gt;1) Read docker logs carefully - every crash has an exact error message&lt;br&gt;
2) Port scan with curl to find actual metrics endpoints instead of guessing&lt;br&gt;
3) Use docker exec &amp;lt;container&amp;gt; ss -tlnp to see what a container is actually listening on&lt;br&gt;
4) A cascade failure (many errors at once) usually has one root cause so find the first error&lt;/p&gt;

</description>
      <category>devops</category>
      <category>blockchain</category>
      <category>node</category>
      <category>aws</category>
    </item>
    <item>
      <title>Kubernetes Troubleshooting Guide: Real-Time Scenarios &amp; Solutions</title>
      <dc:creator>Anand k</dc:creator>
      <pubDate>Tue, 24 Mar 2026 06:59:17 +0000</pubDate>
      <link>https://forem.com/alpha-anand/kubernetes-troubleshooting-guide-real-time-scenarios-solutions-lok</link>
      <guid>https://forem.com/alpha-anand/kubernetes-troubleshooting-guide-real-time-scenarios-solutions-lok</guid>
      <description>&lt;p&gt;Kubernetes is powerful, but with that power comes complexity. In real-world DevOps environments, issues like pod failures, scheduling problems, and resource mismanagement are common. Understanding how to troubleshoot these effectively is what separates a beginner from a skilled DevOps engineer.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;ImagePullBackOff Issue&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;One of the most common errors in Kubernetes is ImagePullBackOff, which occurs when a container image cannot be pulled.&lt;/p&gt;

&lt;p&gt;Causes:&lt;br&gt;
Invalid or non-existent image&lt;br&gt;
Private repository without authentication&lt;/p&gt;

&lt;p&gt;Solution:&lt;/p&gt;

&lt;p&gt;For private images, use ImagePullSecrets:&lt;/p&gt;

&lt;p&gt;kubectl create secret docker-registry demo \&lt;br&gt;
  --docker-server=your-registry-server \&lt;br&gt;
  --docker-username=your-name \&lt;br&gt;
  --docker-password=your-password \&lt;br&gt;
  --docker-email=your-email&lt;/p&gt;

&lt;p&gt;Then reference it in your deployment:&lt;br&gt;
spec:&lt;br&gt;
  imagePullSecrets:&lt;br&gt;
    - name: demo&lt;br&gt;
For AWS ECR:&lt;br&gt;
kubectl create secret docker-registry ecr-secret \&lt;br&gt;
  --docker-server=${AWS_ACCOUNT}.dkr.ecr.${AWS_REGION}.amazonaws.com \&lt;br&gt;
  --docker-username=AWS \&lt;br&gt;
  --docker-password=$(aws ecr get-login-password) \&lt;br&gt;
  --namespace=default&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;CrashLoopBackOff&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This error indicates that a container is repeatedly crashing and restarting.&lt;/p&gt;

&lt;p&gt;Common Reasons:&lt;br&gt;
Misconfigurations (env variables, volumes)&lt;br&gt;
Incorrect commands in Dockerfile&lt;br&gt;
Application bugs&lt;br&gt;
Liveness probe failures&lt;br&gt;
Insufficient CPU or memory&lt;/p&gt;

&lt;p&gt;How It Works:&lt;br&gt;
Kubernetes restarts the container with an exponentially increasing delay:&lt;/p&gt;

&lt;p&gt;First retry: after ~10 seconds&lt;br&gt;
Each subsequent retry doubles the delay (20s, 40s, 80s ...), capped at 5 minutes&lt;br&gt;
This is called the backoff strategy.&lt;/p&gt;
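&lt;p&gt;The doubling schedule can be sketched as a quick calculation:&lt;/p&gt;

```shell
# CrashLoopBackOff delay: starts at 10s, doubles each restart, capped at 300s.
d=10
for i in 1 2 3 4 5 6; do
  echo "${d}s"
  d=$((d * 2))
  if [ "$d" -gt 300 ]; then d=300; fi
done
```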

&lt;p&gt;Fix:&lt;br&gt;
Check logs: kubectl logs &amp;lt;pod-name&amp;gt;&lt;br&gt;
Describe pod: kubectl describe pod &amp;lt;pod-name&amp;gt;&lt;br&gt;
Validate configs and probes&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Liveness &amp;amp; Readiness Probes&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Kubernetes uses probes to monitor application health.&lt;br&gt;
Types:&lt;br&gt;
Liveness Probe → Restarts container if unhealthy&lt;br&gt;
Readiness Probe → Controls traffic routing&lt;/p&gt;

&lt;p&gt;Misconfigured probes can cause continuous restarts → CrashLoopBackOff.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Resource Management (Critical in Real-Time)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In shared clusters, improper resource usage can affect all applications.&lt;br&gt;
Problem:&lt;br&gt;
One application consumes excessive CPU/memory → others fail&lt;br&gt;
Solutions:&lt;br&gt;
1) Resource Quota (Namespace Level)&lt;br&gt;
Limits total resources a namespace can use&lt;br&gt;
2) Resource Limits (Pod Level)&lt;br&gt;
Restricts individual pod usage&lt;/p&gt;
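&lt;p&gt;The two levels can be sketched in YAML. These manifests are illustrative only; the names, namespace, and numbers are assumptions, not values from a real cluster:&lt;/p&gt;

```yaml
# Namespace-level quota: caps the total CPU/memory the namespace may use.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: demo
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
---
# Pod-level limits: restricts one container's usage.
apiVersion: v1
kind: Pod
metadata:
  name: demo-app
  namespace: demo
spec:
  containers:
    - name: app
      image: nginx:1.25
      resources:
        requests:
          cpu: 250m
          memory: 256Mi
        limits:
          cpu: "1"
          memory: 512Mi
```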

&lt;p&gt;Important Rule:&lt;br&gt;
Never blindly increase resources. Always identify the root cause and allocate the correct usage.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Pod Not Schedulable&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If a pod is stuck in Pending, it means the scheduler cannot place it on any node.&lt;/p&gt;

&lt;p&gt;Debug:&lt;br&gt;
kubectl describe pod &amp;lt;pod-name&amp;gt;&lt;br&gt;
Common Causes &amp;amp; Fixes:&lt;/p&gt;

&lt;p&gt;1) Node Selector: Forces pod to run on a specific node&lt;/p&gt;

&lt;p&gt;nodeSelector:&lt;br&gt;
  node-name: arm-worker&lt;/p&gt;

&lt;p&gt;If the label doesn’t match → the pod won’t schedule&lt;br&gt;
Fix:&lt;br&gt;
kubectl edit node &amp;lt;node-name&amp;gt; (add the expected label)&lt;/p&gt;

&lt;p&gt;2) Node Affinity: More flexible than nodeSelector:&lt;/p&gt;

&lt;p&gt;Required → Must match&lt;br&gt;
Preferred → Try to match, else fallback&lt;/p&gt;
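&lt;p&gt;A pod-spec fragment showing both forms might look like this. The label keys and values are illustrative assumptions, not from a real cluster:&lt;/p&gt;

```yaml
# "required" rules must match or the pod stays Pending;
# "preferred" rules are best-effort and fall back if unmatched.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/arch
              operator: In
              values:
                - arm64
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values:
                - us-east-1a
```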

&lt;p&gt;3) Taints: Prevents pods from scheduling on nodes.&lt;br&gt;
Types:&lt;br&gt;
NoSchedule&lt;br&gt;
NoExecute&lt;br&gt;
PreferNoSchedule&lt;/p&gt;

&lt;p&gt;kubectl taint nodes nodename key=value:NoSchedule&lt;/p&gt;

&lt;p&gt;4) Tolerations: Allows specific pods to run on tainted nodes.&lt;/p&gt;
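&lt;p&gt;A toleration matching the key=value:NoSchedule taint shown above would be added to the pod spec like this:&lt;/p&gt;

```yaml
# Pod-spec fragment: tolerates the key=value:NoSchedule taint from above.
tolerations:
  - key: "key"
    operator: "Equal"
    value: "value"
    effect: "NoSchedule"
```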

&lt;ol&gt;
&lt;li&gt;StatefulSet &amp;amp; Persistent Volume Issues&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Stateful applications depend on storage.&lt;/p&gt;

&lt;p&gt;Problem:&lt;br&gt;
Pods stuck in Pending due to missing Persistent Volume (PV)&lt;/p&gt;

&lt;p&gt;Root Cause:&lt;br&gt;
Incorrect StorageClass&lt;/p&gt;

&lt;p&gt;Example issue:&lt;br&gt;
storageClassName: ebs&lt;/p&gt;

&lt;p&gt;This works in AWS but fails in other environments.&lt;/p&gt;

&lt;p&gt;Solution&lt;br&gt;
storageClassName: standard&lt;br&gt;
Debug:&lt;br&gt;
kubectl get storageclass&lt;br&gt;
kubectl describe pod &amp;lt;pod-name&amp;gt;&lt;/p&gt;

&lt;p&gt;Note:&lt;/p&gt;

&lt;p&gt;Delete old PVC before reapplying:&lt;br&gt;
kubectl delete pvc &amp;lt;pvc-name&amp;gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;OOMKilled (Out Of Memory)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Occurs when a container exceeds memory limits.&lt;/p&gt;

&lt;p&gt;Causes:&lt;br&gt;
Low memory limits&lt;br&gt;
Memory leaks in application&lt;/p&gt;

&lt;p&gt;Debug:&lt;br&gt;
Check pod events&lt;br&gt;
For Java apps:&lt;br&gt;
Thread dump → kill -3 &amp;lt;pid&amp;gt; or jstack &amp;lt;pid&amp;gt;&lt;br&gt;
Heap dump → jmap -dump:format=b,file=heap.hprof &amp;lt;pid&amp;gt;&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;If app needs 2GB but limit is 200MB → crash is inevitable&lt;/p&gt;

&lt;p&gt;Kubernetes troubleshooting is not about memorizing commands, it’s about understanding system behavior.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>containers</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
