Forem: Sergio Andres Usma

Jetson Containers Quickstart on NVIDIA Jetson AGX Orin 64GB

Sergio Andres Usma — Sun, 05 Apr 2026 23:36:25 +0000

Abstract

This document describes how to run NVIDIA Jetson‑optimized AI containers from the dustynv/jetson-containers project on an NVIDIA Jetson AGX Orin 64GB Developer Kit with Ubuntu 22.04.5 LTS and JetPack 6.2.2 (L4T 36.5.0), focusing on LLMs, speech, vision, and development tools. It consolidates the original Jetson Containers Quickstart PDF into an operational tutorial with copy‑paste docker run commands and n8n integration pointers tailored to a system where n8n itself runs in Docker on port 5678. The tutorial targets engineers who want to run multiple local AI services on the same Jetson and orchestrate them via OpenAI‑compatible APIs without relying on external cloud providers.

1. Target Hardware and Software Environment

Your system matches the reference environment of the Jetson Containers Quickstart Guide: Jetson AGX Orin 64GB, Ubuntu 22.04.5 aarch64, JetPack 6.2.2 (L4T 36.5.0), CUDA 12.6, cuDNN 9.3.0, and TensorRT 10.3.0.30. This platform has 64 GB unified memory and is validated to run all 51 containers in the guide, including 70B‑parameter LLMs in GPU‑accelerated runtimes.

Before launching AI containers, ensure:

Docker is installed and configured with the NVIDIA runtime (JetPack 6.x already provides nvidia-container-runtime, so you mainly need to add Docker itself).
GPU works inside Docker:

docker run --runtime nvidia --rm \
  dustynv/cuda:12.8-samples-r36.4.0-cu128-24.04 \
  /usr/local/cuda/extras/demo_suite/deviceQuery

(Optional) Create directories for persistent data:

mkdir -p ~/.ollama \
         ~/.cache/huggingface \
         ~/sd-models \
         ~/comfyui-models \
         ~/comfyui-output \
         ~/ml-workspace \
         ~/notebooks \
         ~/aim-data \
         ~/ha-config

Use these directories as bind‑mounts so models and configuration survive container recreation.

2. LLM Inference Engines (OpenAI-Compatible)

2.1 Ollama — General Purpose LLM Runtime

Ollama is a user‑friendly way to run LLaMA, Mistral, Qwen, Gemma, Phi, and DeepSeek models with an OpenAI‑compatible REST API on your Jetson. The guide notes that AGX Orin 64GB can run 70B models comfortably using this runtime.

Start Ollama:

docker run --runtime nvidia -it -d \
  --name ollama \
  --network host \
  -v ~/.ollama:/root/.ollama \
  dustynv/ollama:r36.4.0

Pull a model and chat:

# Pull a model
curl http://localhost:11434/api/pull \
  -d '{"name": "llama3.2:3b"}'

# Chat completion (OpenAI-compatible)
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:3b",
    "messages": [{"role":"user","content":"Hello from n8n!"}]
  }'

n8n configuration:

Credential: OpenAI API credential.
API Key: any string (e.g. ollama).
Base URL: http://<jetson-ip>:11434/v1.
Model: llama3.2:3b or any model pulled into Ollama.

2.2 llama.cpp — GGUF, Quantized LLM Server

llama.cpp excels at running quantized GGUF models with low latency and memory usage. The quickstart provides an OpenAI‑compatible server configuration suitable for AGX Orin.

Start llama.cpp server:

docker run --runtime nvidia -it -d \
  --name llama-server \
  --network host \
  -v /models:/models \
  dustynv/llama_cpp:r36.4.0 \
  llama-server \
    --model /models/llama-3.1-8b-q4.gguf \
    --host 0.0.0.0 \
    --port 8080 \
    --n-gpu-layers 999 \
    --ctx-size 8192

The server exposes OpenAI‑style endpoints on http://<jetson-ip>:8080/v1.

n8n configuration:

Node: OpenAI Chat Model.
Base URL: http://<jetson-ip>:8080/v1.
Model: choose name according to your server configuration; llama.cpp will map GGUF to logical model IDs.

2.3 vLLM — High Throughput LLM Serving

vLLM uses PagedAttention to reach significantly higher throughput than naive Hugging Face inference, which is useful for multi‑user services.

Start vLLM:

docker run --runtime nvidia -it -d \
  --name vllm \
  --network host \
  --shm-size=8g \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  dustynv/vllm:r36.4.0 \
  python3 -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-3.2-3B-Instruct \
    --host 0.0.0.0 \
    --port 8000

Exposes OpenAI‑compatible endpoints at http://<jetson-ip>:8000/v1.

n8n configuration:

Node: OpenAI Chat Model.
Base URL: http://<jetson-ip>:8000/v1.
Enable streaming mode if you want streamed responses.

2.4 SGLang — Structured Output and JSON

SGLang is designed for structured outputs and JSON‑constrained decoding using RadixAttention.

Start SGLang:

docker run --runtime nvidia -it -d \
  --name sglang \
  --network host \
  --shm-size=8g \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  dustynv/sglang:r36.4.0 \
  python3 -m sglang.launch_server \
    --model-path meta-llama/Llama-3.2-3B-Instruct \
    --host 0.0.0.0 \
    --port 30000

n8n usage pattern:

Use HTTP Request node pointing to http://<jetson-ip>:30000/v1/chat/completions and include response_format: {"type":"json_object"} in the body when you need strict JSON.
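
Outside n8n, the same request can be tried from a terminal. The helper below is a rough sketch: the function names and the JETSON_HOST variable are illustrative, and the printf-based body builder assumes the prompt contains no double quotes or backslashes (a JSON tool would be safer for arbitrary text).

```shell
# Hypothetical helpers; names and JETSON_HOST are illustrative, not part of SGLang.

# Build a chat-completions body that asks for a JSON object response.
# Assumption: $2 (the prompt) contains no double quotes or backslashes.
json_chat_payload() {
  local model="$1" prompt="$2"
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}],"response_format":{"type":"json_object"}}' \
    "$model" "$prompt"
}

# POST the body to the SGLang port used in the docker run command above.
sglang_json_chat() {
  curl -sS "http://${JETSON_HOST:-localhost}:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d "$(json_chat_payload "$1" "$2")"
}

# Example (requires the sglang container to be running):
# sglang_json_chat meta-llama/Llama-3.2-3B-Instruct "Return a JSON object with keys city and country for Paris"
```

In n8n, the same body goes into the HTTP Request node's JSON body field.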

2.5 MLC and nanoLLM — Orin‑Optimized and Multimodal

MLC LLM compiles models targeting Jetson’s GPU architecture for fast token generation.

Start MLC LLM:

docker run --runtime nvidia -it -d \
  --name mlc \
  --network host \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  dustynv/mlc:r36.4.0

According to the quickstart, MLC frequently achieves the fastest token rates on AGX Orin among the tested engines.

nanoLLM provides higher‑level multimodal pipelines with vision‑language and voice capabilities.

Start nanoLLM with VILA:

docker run --runtime nvidia -it -d \
  --name nano-llm \
  --network host \
  --shm-size=8g \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  dustynv/nano_llm:r36.4.0 \
  python3 -m nano_llm.serve \
    --model Efficient-Large-Model/VILA1.5-3b \
    --host 0.0.0.0 \
    --port 9000

Multimodal example:

curl http://localhost:9000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "VILA1.5-3b",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": "http://example.com/img.jpg"}},
        {"type": "text", "text": "What is in this image?"}
      ]
    }]
  }'

n8n:

Node: OpenAI Chat Model.
Base URL: http://<jetson-ip>:9000/v1.
Use messages with image_url and text parts when building prompts.

3. Speech and Audio Containers

3.1 faster-whisper — STT Server

faster‑whisper is a fast speech‑to‑text server offering OpenAI‑compatible endpoints on Jetson.

Start faster‑whisper:

docker run --runtime nvidia -it -d \
  --name faster-whisper \
  --network host \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  dustynv/faster-whisper:r36.4.0 \
  python3 -m faster_whisper.server \
    --host 0.0.0.0 \
    --port 8000

Exposes /v1/audio/transcriptions; in n8n, call it from an HTTP Request node as shown below.

n8n pattern:

HTTP Request, method POST, URL http://<jetson-ip>:8000/v1/audio/transcriptions.
Body: form‑data with file (binary audio) and model (e.g. "whisper-1").

3.2 kokoro-tts — Lightweight Local TTS

kokoro‑tts offers an OpenAI‑compatible /v1/audio/speech endpoint with multiple voices.

Start kokoro‑tts:

docker run --runtime nvidia -it -d \
  --name kokoro-tts \
  --network host \
  dustynv/kokoro-tts:r36.4.0

Generate MP3:

curl http://localhost:8880/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kokoro",
    "input": "Hello from your Jetson!",
    "voice": "af_bella",
    "response_format": "mp3"
  }' \
  --output speech.mp3

n8n:

HTTP Request, Response Format = File, then return or store the binary audio.

3.3 speaches — Unified Speech In/Out

speaches exposes both STT and TTS endpoints compatible with OpenAI’s audio APIs.

Start speaches:

docker run --runtime nvidia -it -d \
  --name speaches \
  --network host \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  dustynv/speaches:r36.4.0

Ports and endpoints are listed in the API quick reference (port 8000, OpenAI‑compatible).

A complete on‑device voice pipeline can be built as: Webhook (audio) → faster‑whisper STT → LLM (Ollama or vLLM) → kokoro‑tts or speaches TTS → Webhook response.

4. Vision, Diffusion, and VLM Containers

4.1 Stable Diffusion WebUI — Text‑to‑Image UI + API

The Stable Diffusion WebUI container gives you a full browser interface and REST API for image generation.

Start Stable Diffusion WebUI:

docker run --runtime nvidia -it -d \
  --name sd-webui \
  --network host \
  -v ~/sd-models:/workspace/stable-diffusion-webui/models \
  dustynv/stable-diffusion-webui:r36.4.0 \
  python3 launch.py --api --listen --port 7860

Web UI: http://<jetson-ip>:7860.

API txt2img example:

curl http://localhost:7860/sdapi/v1/txt2img \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "mountain landscape",
    "steps": 20,
    "width": 512,
    "height": 512
  }'

n8n:

HTTP Request → parse JSON → Move Binary Data to convert base64 images[0] to binary → send to Telegram, save file, etc.

4.2 ComfyUI — Graph‑Based Diffusion Workflows

ComfyUI is a node‑based interface with an HTTP API.

Start ComfyUI:

docker run --runtime nvidia -it -d \
  --name comfyui \
  --network host \
  -v ~/comfyui-models:/root/ComfyUI/models \
  -v ~/comfyui-output:/root/ComfyUI/output \
  dustynv/comfyui:r36.4.0

API flow:

POST /prompt → get prompt_id.
GET /history/{prompt_id} repeatedly until outputs appear.
GET /view?filename={filename}&type=output to download the image.

Use a sequence of HTTP Request nodes in n8n to implement the polling and retrieval.
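
For testing the same polling flow from a terminal, a generic retry helper is enough. This is a sketch under assumptions: poll_until and comfy_has_outputs are hypothetical names, COMFY_PORT defaults to 8188 (ComfyUI's upstream default; the quickstart does not state the container's port), and the grep is a crude stand-in for real JSON parsing.

```shell
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: poll_until <max_attempts> <sleep_seconds> <command> [args...]
poll_until() {
  local max="$1" delay="$2" attempt=1
  shift 2
  while [ "$attempt" -le "$max" ]; do
    if "$@"; then
      return 0            # command succeeded, stop polling
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 1                # gave up after <max_attempts> tries
}

# Hypothetical check: succeeds once /history/<prompt_id> mentions "outputs".
comfy_has_outputs() {
  curl -sf "http://${JETSON_HOST:-localhost}:${COMFY_PORT:-8188}/history/$1" \
    | grep -q '"outputs"'
}

# Example (requires a running ComfyUI and a prompt_id returned by POST /prompt):
# poll_until 60 2 comfy_has_outputs "$prompt_id" && echo "outputs ready"
```

In n8n, the equivalent is a loop of HTTP Request, IF, and Wait nodes with the same stop condition.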

4.3 VILA and Related VLMs

The VILA container provides an efficient vision‑language model with an OpenAI‑compatible API.

Start VILA:

docker run --runtime nvidia -it -d \
  --name vila \
  --network host \
  --shm-size=8g \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  dustynv/vila:r36.4.0

According to the quick reference, VILA uses port 8000 and integrates via OpenAI Chat Model node.

In n8n, send messages that include an image_url object and text, similar to the nanoLLM example.

5. Development, Experiment Tracking, and Smart Home

5.1 L4T-ML, PyTorch, and JupyterLab

L4T‑ML is an all‑in‑one ML environment that bundles PyTorch, TensorFlow, scikit‑learn, and JupyterLab optimized for JetPack 6.x.

Start L4T‑ML JupyterLab:

docker run --runtime nvidia -it -d \
  --name l4t-ml \
  --network host \
  --shm-size=8g \
  -v ~/ml-workspace:/workspace \
  dustynv/l4t-ml:r36.4.0 \
  jupyter lab --ip=0.0.0.0 --allow-root --no-browser

Access via http://<jetson-ip>:8888 in your browser.

Alternatively, the standalone dustynv/jupyterlab:r36.4.0 container provides just JupyterLab:

docker run --runtime nvidia -it -d \
  --name jupyterlab \
  --network host \
  -v ~/notebooks:/notebooks \
  dustynv/jupyterlab:r36.4.0 \
  jupyter lab --ip=0.0.0.0 --allow-root --no-browser --NotebookApp.token=''

PyTorch‑focused images (dustynv/pytorch, dustynv/l4t-pytorch) can be run via jetson-containers run ... as described in the build docs and are fully compatible with JetPack 6.2.2.

5.2 AIM Experiment Tracker

AIM is a lightweight REST‑accessible experiment tracker container.

Start AIM:

docker run --runtime nvidia -it -d \
  --name aim \
  --network host \
  -v ~/aim-data:/aim/data \
  dustynv/aim:r36.4.0 \
  aim up --host 0.0.0.0 --port 43800

Web UI and API at http://<jetson-ip>:43800.
n8n can poll api/runs and api/metrics using HTTP Request nodes to monitor training.

5.3 Home Assistant Core on Jetson

Home Assistant Core can run as a container for local smart‑home control.

Start Home Assistant:

docker run -it -d \
  --name homeassistant \
  --network host \
  -v ~/ha-config:/config \
  dustynv/homeassistant-core:r36.4.0

Access UI at http://<jetson-ip>:8123 and create a Long‑Lived Access Token under your profile.

n8n integration:

HTTP Request node with URL like http://<jetson-ip>:8123/api/states or /api/services/....
Authentication: Bearer token using the Long‑Lived Access Token.
Build flows like "sensor state change → LLM decision → Home Assistant service call" as outlined in the quickstart.

6. n8n Integration Patterns and Networking Notes

The quickstart highlights that your n8n instance runs in Docker on port 5678 and must reach Jetson services via the Jetson's LAN IP, not localhost, because container networking isolates localhost inside the n8n container. For OpenAI‑compatible services, configure the OpenAI Chat Model node with the Base URL pointing to http://<jetson-ip>:<port>/v1, while for other services use HTTP Request nodes and explicit paths.

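OpenAI‑compatible containers and ports (from API quick reference). Several defaults collide (8000 and 8080); with --network host only one service can bind a given port, so give a different port to any that must run at the same time: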

Container	Port	Base URL example
ollama	11434	`http://<jetson-ip>:11434/v1`
llama_cpp	8080	`http://<jetson-ip>:8080/v1`
vLLM	8000	`http://<jetson-ip>:8000/v1`
sglang	30000	`http://<jetson-ip>:30000/v1`
mlc	8080	`http://<jetson-ip>:8080/v1`
nano_llm	9000	`http://<jetson-ip>:9000/v1`
speaches	8000	`http://<jetson-ip>:8000/v1`
faster-whisper	8000	`http://<jetson-ip>:8000/v1` or audio paths
kokoro-tts	8880	`http://<jetson-ip>:8880/v1`
VILA	8000	`http://<jetson-ip>:8000/v1`
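
A quick way to spot colliding defaults in a list like the table above is sketched below. The function name find_port_conflicts is illustrative; the input is simply one "name port" pair per line.

```shell
# Print every port that is claimed by more than one service.
# Input: lines of "<name> <port>" on stdin. Output: "<port>: <name> <name> ..."
find_port_conflicts() {
  awk '
    { names[$2] = (names[$2] ? names[$2] " " : "") $1; count[$2]++ }
    END { for (p in count) if (count[p] > 1) print p ": " names[p] }
  ' | sort -n
}

# Defaults copied from the table above.
find_port_conflicts <<'EOF'
ollama 11434
llama_cpp 8080
vLLM 8000
sglang 30000
mlc 8080
nano_llm 9000
speaches 8000
faster-whisper 8000
kokoro-tts 8880
VILA 8000
EOF
```

With the table's defaults, this reports 8000 as shared by vLLM, speaches, faster-whisper, and VILA, and 8080 as shared by llama_cpp and mlc.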

Example: full on‑device voice assistant pipeline in n8n:

Webhook (POST /voice-input) → receives audio
  ↓
HTTP Request → POST /v1/audio/transcriptions (faster-whisper or speaches)
  Body: form-data (file: binary audio, model: "whisper-1")
  ↓
OpenAI Chat Model → local LLM (Base URL = Ollama or vLLM)
  ↓
HTTP Request → POST /v1/audio/speech (kokoro-tts or speaches)
  Body: {"model":"kokoro","input":"{{$json.text}}","voice":"af_bella"}
  ↓
Webhook Response → returns audio binary

This pattern uses only local containers on Jetson and keeps all data on‑device.

7. Practical Recommendations and Next Steps

The quickstart confirms that all 51 dustynv/jetson-containers images tagged r36.4.0 are compatible with JetPack 6.x and have been tested on Jetson AGX Orin 64GB with CUDA 12.6. For production use on your board, the guide suggests mounting persistent caches, using --shm-size=8g for transformer‑based containers, benchmarking vLLM vs MLC vs llama.cpp on your target models, and eventually switching from --network host to explicit port mappings on isolated Docker networks.

[Beginner] Docker Tutorial for jetson-containers on Jetson AGX Orin

Sergio Andres Usma — Sun, 05 Apr 2026 23:20:36 +0000

Abstract

This tutorial explains how to use Docker with the jetson-containers project on an NVIDIA Jetson AGX Orin 64 GB running Ubuntu 22.04 and JetPack 6.2.2, focusing on beginner-friendly concepts and commands. It introduces basic container terminology, shows how to safely back up configuration files, and then walks through everyday Docker operations like starting, stopping, rebuilding, and re-running containers. The goal is to give new users a practical, copy‑pasteable reference they can keep open on the Jetson while working with jetson-containers.

1. Basic Concepts: Images, Containers, Volumes

For beginners, it helps to understand a few core terms before running commands.

Image: A read‑only template that contains an application and its OS-level dependencies (for example, a pre-built PyTorch + CUDA environment for Jetson).
Container: A running instance of an image with its own filesystem, processes, and network configuration; you can start, stop, and delete containers without touching the original image.
Dockerfile: A text file with instructions on how to build an image (which base image to use, what packages to install, what commands to run).
Volume: A directory from the host (your Jetson) that is mounted inside the container so changes persist on disk even if you delete the container.
Registry: A server that stores images (for example, Docker Hub or the GitHub Container Registry used by many Jetson projects).

Mental model: the image is like an ISO of a Linux distro, and the container is like the running system you boot from that ISO, with volumes acting as your persistent home folder.

2. Safety First: Backing Up Configuration Files

Before experimenting with Docker and jetson-containers, you should back up any configuration files or project directories you are going to modify.

2.1. Backing up directories on the Jetson

Pick a backup directory on your Jetson, for example ~/backups/jetson-containers.

mkdir -p ~/backups/jetson-containers

To back up a directory that will be mounted into a container (for example ~/projects/my-app):

# Backup with timestamp
cp -a ~/projects/my-app \
  ~/backups/jetson-containers/my-app_$(date +%Y%m%d-%H%M%S)

To back up a single file that might be edited (for example docker-compose.yml or a config file):

cp docker-compose.yml \
  docker-compose.yml.bak_$(date +%Y%m%d-%H%M%S)

If you are about to edit a file inside a project:

# From inside the project directory
cp config.yaml config.yaml.bak_$(date +%Y%m%d-%H%M%S)

Restoring is the reverse. Use cp -a with a trailing /. so that hidden files and subdirectories are copied too:

cp -a ~/backups/jetson-containers/my-app_20260405-120000/. \
   ~/projects/my-app/
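
To try the backup and restore round trip without touching real projects, the sketch below runs it in a throwaway directory. All paths and file names are illustrative.

```shell
# Round trip in a scratch directory; nothing under ~/projects is touched.
scratch="$(mktemp -d)"
mkdir -p "$scratch/my-app/src" "$scratch/backups"
echo "print('hi')" > "$scratch/my-app/src/main.py"
echo "SECRET=1"    > "$scratch/my-app/.env"      # hidden file

# Back up with a timestamp, following the pattern used in this section.
backup="$scratch/backups/my-app_$(date +%Y%m%d-%H%M%S)"
cp -a "$scratch/my-app" "$backup"

# Simulate a bad change, then restore.
rm -rf "$scratch/my-app"
mkdir -p "$scratch/my-app"
# "<backup>/." copies the directory contents, including dotfiles and subdirectories.
cp -a "$backup/." "$scratch/my-app/"

ls -A "$scratch/my-app"
```

Both .env and src should be listed after the restore; a plain glob such as * would have skipped the dotfile.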

3. Checking Docker on Jetson AGX Orin

You already have Docker 29.3.1 installed on your Jetson with arm64 support, which is what you need for jetson-containers.

Verify Docker is running:

docker version
docker info

If docker info shows errors about permissions, add your user to the docker group and re‑login:

sudo usermod -aG docker $USER
# Then log out and log back in, or reboot

To test with a simple container:

docker run --rm arm64v8/ubuntu:22.04 uname -a

This command pulls an arm64 Ubuntu image and prints the kernel info, confirming Docker is working.

4. Using jetson-containers: Typical Workflow

This section uses generic patterns you can adapt to the jetson-containers project (git clone, build, and run commands are similar across JetPack 6 projects).

4.1. Cloning the jetson-containers repository

From your home directory:

cd ~
git clone https://github.com/dusty-nv/jetson-containers.git
cd jetson-containers

Back up the repository before heavy changes:

cp -a ~/jetson-containers \
  ~/backups/jetson-containers/jetson-containers_$(date +%Y%m%d-%H%M%S)

4.2. Building a jetson-containers image

Within the jetson-containers repository, there are scripts or Dockerfiles to build images optimized for your JetPack version.

Example pattern:

# Example: build an image for a specific package or stack
./scripts/build.sh <image-name>
# or directly with Docker
docker build -t my-jetson-image -f Dockerfile .

Replace <image-name> with the target defined by jetson-containers (for example, a PyTorch or L4T base image name).

Key flags for docker build:

# -t sets the tag (name) for your image
# -f selects the Dockerfile; the trailing "." is the build context
docker build \
  -t my-jetson-image \
  -f Dockerfile .

5. Running, Stopping, and Inspecting Containers

This is the heart of day‑to‑day container usage.

5.1. Starting a new container with jetson-containers

A typical docker run command for Jetson should:

Use the correct image (from jetson-containers).
Pass through GPU access.
Mount your project directory as a volume.
Optionally set the container name.

Generic pattern:

docker run -it --rm \
  --gpus all \
  --network host \
  --ipc host \
  -v ~/projects/my-app:/workspace/my-app \
  --name my-jetson-container \
  my-jetson-image \
  /bin/bash

Explanation of key flags:

-it: Interactive terminal.
--rm: Delete the container when it exits (good for experiments).
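--gpus all: Give the container access to the Jetson GPU (on Jetson, --runtime nvidia is the commonly documented alternative if --gpus all is rejected by your Docker setup).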
--network host: Share the host network stack (useful for ROS, web services, etc).
--ipc host: Share IPC for better performance with some frameworks.
-v host:container: Mount a host directory into the container.
--name: Easy name for managing the container later.

5.2. Listing running and stopped containers

# Only running containers
docker ps

# All containers (running and stopped)
docker ps -a

5.3. Attaching and entering a running container

If a container is running in the background:

docker exec -it my-jetson-container /bin/bash

This opens a shell inside the running container.

5.4. Stopping and removing containers

To stop a running container:

docker stop my-jetson-container

To forcibly stop (if it hangs):

docker kill my-jetson-container

To remove a stopped container:

docker rm my-jetson-container

To remove all stopped containers:

docker container prune

6. Re-running and Rebuilding Images

When you change a Dockerfile or the jetson-containers configuration, you often need to rebuild images.

6.1. Re-running a container with the same configuration

If you used --name my-jetson-container, Docker keeps the container configuration until it is removed.

To start it again after it has been stopped:

docker start my-jetson-container
docker exec -it my-jetson-container /bin/bash

If you used --rm, the container is deleted on exit, so you must run docker run again (the image itself remains).
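
The two cases above (a named container that still exists versus one that is gone) can be folded into one helper. This is only a sketch: start_or_run is a hypothetical name, and sleep infinity is a placeholder command that keeps the container alive so you can docker exec into it.

```shell
# Start the named container if it still exists; otherwise create it.
# Usage: start_or_run <container-name> <image>
start_or_run() {
  local name="$1" image="$2"
  if docker container inspect "$name" >/dev/null 2>&1; then
    # Container exists (stopped or running): reuse its saved configuration.
    docker start "$name"
  else
    # No such container: create it in the background from the image.
    docker run -d --gpus all --network host --name "$name" "$image" sleep infinity
  fi
}

# Example:
# start_or_run my-jetson-container my-jetson-image
# docker exec -it my-jetson-container /bin/bash
```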

6.2. Forcing a rebuild of an image

When you modify a Dockerfile or build context and want Docker to ignore previous cache layers:

docker build --no-cache -t my-jetson-image -f Dockerfile .

If jetson-containers provides a build script, you can usually pass a similar --no-cache flag or use an environment variable, for example:

# Example pattern; adapt to the actual script interface
NO_CACHE=1 ./scripts/build.sh <image-name>

To remove an image and force a clean rebuild:

docker rmi my-jetson-image
docker build -t my-jetson-image -f Dockerfile .

List images to verify:

docker images

6.3. Updating images from a registry

If jetson-containers publishes pre-built images, you can pull the latest version:

docker pull <registry>/<namespace>/<image>:<tag>

After pulling, re-run your containers using the updated tag to test new versions safely (after backing up your mounted project directory).

7. Managing Data and Volumes Safely

To avoid losing important work when deleting containers, always use volumes (host directories mounted into containers).

7.1. Using host directories as volumes

Example:

mkdir -p ~/projects/my-app
docker run -it --rm \
  -v ~/projects/my-app:/workspace/my-app \
  my-jetson-image \
  /bin/bash

Anything you save to /workspace/my-app inside the container appears in ~/projects/my-app on the Jetson and persists when the container is removed.

7.2. Using named Docker volumes (optional)

For simple persistent storage managed by Docker:

# Create a volume
docker volume create my-jetson-volume

# Use it in a container
docker run -it --rm \
  -v my-jetson-volume:/data \
  my-jetson-image \
  /bin/bash

List volumes and remove unused ones:

docker volume ls
docker volume prune

8. Useful Command Reference Table

Below is a quick reference table you can keep near your terminal.

Task	Command (Jetson terminal)
Check Docker version	`docker version`
List running containers	`docker ps`
List all containers	`docker ps -a`
List images	`docker images`
Start new container	`docker run -it --rm --gpus all --network host --ipc host -v ~/proj:/workspace my-img`
Stop container	`docker stop <name-or-id>`
Force stop container	`docker kill <name-or-id>`
Remove stopped container	`docker rm <name-or-id>`
Remove image	`docker rmi <image>`
Exec into running container	`docker exec -it <name-or-id> /bin/bash`
Build image	`docker build -t my-img -f Dockerfile .`
Build image without cache	`docker build --no-cache -t my-img -f Dockerfile .`
Start stopped container	`docker start <name>`
Backup directory before changes	`cp -a ~/dir ~/backups/dir_$(date +%Y%m%d-%H%M%S)`
Backup single file before editing	`cp file.txt file.txt.bak_$(date +%Y%m%d-%H%M%S)`

Table 1 — Common Docker commands on Jetson

9. Conclusion

With these concepts and commands, you can confidently use Docker and jetson-containers on your Jetson AGX Orin without risking important project data, thanks to consistent use of backups and volumes. As you become more comfortable, you can refine the docker run patterns, create your own Dockerfiles, and integrate jetson-containers more deeply into your development workflow.

Fast Large-file and LLM Downloads with aria2 on NVIDIA Jetson AGX Orin

Sergio Andres Usma — Sun, 05 Apr 2026 22:32:55 +0000

Abstract

This tutorial documents the configuration and use of aria2 to download very large files and LLM weight archives on an NVIDIA Jetson AGX Orin Developer Kit 64 GB running Ubuntu 22.04.5 LTS aarch64 with JetPack 6.2.2. It focuses on high-concurrency HTTPS downloads from Hugging Face and similar model repositories, with commands tuned for edge hardware, multi-gigabyte single-file models in GGUF or safetensors format, and the object storage redirect behavior common to modern model hosting platforms.

The guide covers robust resume strategies using .aria2 control files and session files that allow downloads to survive reboots, intermittent connectivity, and signed URL expiration. Failure scenarios encountered in practice — including HTTP 403 rate limits near completion, hash-like output filenames resulting from redirect chains, and missing control metadata — are addressed with safe, prescriptive recovery steps that avoid data corruption or accidental full re-downloads.

The document targets advanced Linux and Jetson users who regularly fetch multi-GB model artefacts and want a repeatable, resilient pattern for aria2 on ARM64. Readers will finish with a working installation, a set of reusable power commands, a reliable batch-resume workflow, and the knowledge to debug common failure modes without discarding already-downloaded data.

1. Hardware and software environment

The environment documented throughout this tutorial is an NVIDIA Jetson AGX Orin Developer Kit with 64 GB unified memory, running a standard JetPack 6.2.2 software stack on Ubuntu 22.04.5 LTS aarch64. This configuration represents a current-generation edge AI development system with sufficient CPU cores, RAM, and storage throughput to benefit from aria2's parallel download capabilities. The commands and flag values in subsequent sections are validated against this environment.

Component	Version / Value
Hardware	NVIDIA Jetson AGX Orin Developer Kit 64 GB
OS	Ubuntu 22.04.5 LTS aarch64
Kernel	5.15.185-tegra
L4T	36.5.0
JetPack	6.2.2
CUDA	12.6
cuDNN	9.3
TensorRT	10.3

Table 1 — Jetson AGX Orin software stack

The tutorial assumes that network routing, DNS, and internet connectivity to Hugging Face are already functional on the device. No proxy or VPN configuration is assumed, although aria2 supports those if needed. Storage is assumed to be NVMe or SSD formatted as ext4, which affects the recommended file allocation strategy discussed in section 6.

2. Installing aria2

aria2 is available from the official Ubuntu 22.04 aarch64 package repositories and requires no external PPA or manual build on this platform.

sudo apt update
sudo apt install -y aria2
aria2c -v

aria2c -v prints version details and build flags, confirming that the binary is functional and correctly linked. If the command is not found, verify that /usr/bin is in PATH. On a standard Jetson Ubuntu installation, no adjustments are required.

Create dedicated directories for model files and their associated .aria2 control files before starting any downloads. Co-locating them in stable directories is essential for reliable resume behavior, as aria2 expects the data file and its sidecar to share the same directory and base name.

mkdir -p ~/models
mkdir -p ~/downloads

Moving or renaming a data file after a partial download breaks the association aria2 relies on to continue from the correct offset. Establish these directories once and use them consistently across all aria2 invocations.

3. Fast single-file downloads from Hugging Face

3.1 Core throughput flags: `-x`, `-s`, and `-k`

For a single large file, aria2 opens multiple HTTP connections and divides the target into segments that are fetched concurrently. Three flags control this behavior:

-x N / --max-connection-per-server=N: number of parallel HTTP connections opened to the server.
-s N / --split=N: number of segments the file is divided into for parallel download.
-k SIZE / --min-split-size=SIZE: minimum size per segment (e.g., 64M); prevents excessive small chunks for large files.

Put the URL after all options. A common mistake is writing -s immediately before the URL with no numeric value between them, which makes aria2 take the URL as the split count, so the download fails instead of starting.

Correct usage for a Hugging Face GGUF file:

aria2c -x 16 -s 16 \
  "https://huggingface.co/unsloth/gemma-4-31B-it-GGUF/resolve/main/gemma-4-31B-it-Q4_K_M.gguf?download=true"

If the server returns HTTP 429 responses or imposes rate limits, reduce concurrency:

aria2c -x 8 -s 8 \
  "https://huggingface.co/unsloth/gemma-4-31B-it-GGUF/resolve/main/gemma-4-31B-it-Q4_K_M.gguf?download=true"

Example command for downloading gemma-4-26B-A4B-it-UD-Q4_K_M.gguf:

aria2c \
  -c \
  -x8 -s8 \
  -k32M \
  --retry-wait=10 \
  --max-tries=0 \
  --file-allocation=none \
  --summary-interval=60 \
  -o gemma-4-26B-A4B-it-UD-Q4_K_M.gguf \
  'https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF/resolve/main/gemma-4-26B-A4B-it-UD-Q4_K_M.gguf?download=true'

3.2 Power command template for large model downloads

The following command is the recommended baseline for large model downloads. It combines high concurrency with resilient retry behavior, explicit resume support, and a stable output filename.

aria2c -c -x16 -s16 -k64M \
  --retry-wait=5 --max-tries=0 \
  --file-allocation=none \
  --summary-interval=60 \
  -d ~/models \
  -o gemma-4-31B-it-Q4_K_M.gguf \
  "https://huggingface.co/unsloth/gemma-4-31B-it-GGUF/resolve/main/gemma-4-31B-it-Q4_K_M.gguf?download=true"

Flag-by-flag explanation:

-c / --continue=true: continue a partially downloaded file; when the .aria2 control file is present, aria2 picks up from the segments recorded there.
-x16 / -s16: 16 parallel connections and 16 file segments for maximum throughput on a fast link.
-k64M: 64 MB minimum segment size; reduces the number of chunks for very large files.
--retry-wait=5: pause 5 seconds before each retry on transient errors.
--max-tries=0: retry indefinitely; aria2 will not give up until stopped manually.
--file-allocation=none: skip pre-allocation of the full file size, avoiding a blocking write at startup on Jetson NVMe storage.
-d ~/models: explicit target directory.
-o <filename>: explicit output filename, preventing query-string characters or hash-like object keys from appearing in the filename on disk.
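
One way to keep the flag set consistent across downloads is a small wrapper that assembles the arguments and prints the resulting command for review before anything is run. The sketch below assumes the flag values from the template above; aria2_model_cmd is a hypothetical name, and the simple single-quote wrapping assumes no argument contains a single quote.

```shell
# Print the aria2c command for one model file instead of running it.
# Usage: aria2_model_cmd <url> <output-filename> [connections]
aria2_model_cmd() {
  local url="$1" out="$2" conns="${3:-16}" arg
  # Rebuild the positional parameters as the final argument list; URL goes last.
  set -- -c "-x${conns}" "-s${conns}" -k64M \
    --retry-wait=5 --max-tries=0 \
    --file-allocation=none \
    --summary-interval=60 \
    -d "$HOME/models" \
    -o "$out" \
    "$url"
  printf 'aria2c'
  for arg in "$@"; do
    printf " '%s'" "$arg"   # quote each argument so "?" and "=" stay literal
  done
  printf '\n'
}

aria2_model_cmd \
  "https://huggingface.co/unsloth/gemma-4-31B-it-GGUF/resolve/main/gemma-4-31B-it-Q4_K_M.gguf?download=true" \
  gemma-4-31B-it-Q4_K_M.gguf 8
```

The printed line can be pasted into the terminal once it looks right; every option carries its value, so the URL cannot be mistaken for a flag argument.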

For throttled or unstable connections, use the conservative variant:

aria2c -c -x8 -s8 -k16M \
  --retry-wait=10 --max-tries=0 \
  --file-allocation=none \
  -d ~/models \
  "https://huggingface.co/.../model.gguf?download=true"

This reduces pressure on the server while still delivering substantially better throughput than a single connection.

4. Resume mechanics and .aria2 control files

4.1 How aria2 resume works

For every active download, aria2 creates a binary control file alongside the data file using the naming convention <target-filename>.aria2. Downloading model.gguf produces both model.gguf and model.gguf.aria2 in the output directory. The control file tracks segment byte offsets, checksums, and download state. Without it, aria2 cannot determine which byte ranges are valid and cannot safely resume.

If the same aria2 command is re-run from the same directory with the same output filename, aria2 detects the existing data file and its .aria2 sidecar and continues from the last recorded position. The -c flag makes this behavior explicit and causes aria2 to abort rather than silently overwrite when a safe resume is not possible.

# First run — interrupted partway through
aria2c -c -x16 -s16 -k64M -o model.gguf "https://huggingface.co/.../model.gguf"

# Second run — resumes from the interrupted position
aria2c -c -x16 -s16 -k64M -o model.gguf "https://huggingface.co/.../model.gguf"

To enforce strict resume-or-abort behavior rather than a silent restart:

aria2c --always-resume=true -c "https://example.com/bigfile.iso"

4.2 Missing .aria2 control file

If aria2 reports errorCode=13 or a message containing "file exists but .aria2 does not exist", the data file is present but the control file has been lost.

If the data file is complete and passes an integrity check (e.g., SHA-256 matches the repository-published hash), it can be kept and removed from any pending URL or session lists.
If the data file is incomplete and the .aria2 file is gone, aria2 has no record of which byte ranges were successfully written. The safest recovery is to delete both the partial data file and any remnant .aria2 file, then restart the download from scratch for that specific file.

Do not attempt to resume an incomplete file without its control file. The resulting output may silently contain duplicate or missing byte ranges.
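The decision logic in this section can be sketched as two small shell helpers. This is a minimal sketch, not part of aria2: the function names resume_state and verify_sha256 are hypothetical, and the expected digest must come from the hash published by the model repository.

```shell
# Hypothetical helpers; the names are illustrative and not part of aria2.

# Print the resume state of a download target ($1 = path to the data file):
#   absent     no data file; a fresh download will start
#   resumable  data file and .aria2 control file are both present
#   orphan     data file present, control file missing
#              (either a finished download or an unsafe partial)
resume_state() {
  if [ ! -e "$1" ]; then
    echo absent
  elif [ -e "$1.aria2" ]; then
    echo resumable
  else
    echo orphan
  fi
}

# Return 0 if the SHA-256 of file $1 equals the expected hex digest $2.
verify_sha256() {
  [ -f "$1" ] && [ -n "$2" ] || return 1
  [ "$(sha256sum "$1" | cut -d' ' -f1)" = "$2" ]
}
```

An orphan file that passes verify_sha256 can be kept. An orphan file that fails it is the incomplete case described above and should be deleted and downloaded again.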

4.3 Resuming with a new signed URL

Hugging Face and similar hosts issue time-limited signed URLs. If a partial download's original URL has expired, resume is still possible provided:

The output file name and directory path are unchanged.
The new URL resolves to the same file content.

aria2c -c --auto-file-renaming=false \
  -d ~/models -o model.gguf \
  "https://new-signed-url..."

The --auto-file-renaming=false flag prevents aria2 from creating a renamed copy (e.g., model.1.gguf, with the number inserted before the extension) when it detects an existing file. Instead, aria2 reuses the existing partial file and its .aria2 control file and continues from the recorded position.

5. Batch downloads and session files

5.1 URL list with session management

When downloading multiple files — such as a full safetensors shard set or several GGUF quantization variants — use a URL list file and a session file to enable batch resume. Create urls.txt with one URL per line:

https://huggingface.co/.../model-00001-of-00037.safetensors
https://huggingface.co/.../model-00002-of-00037.safetensors
https://huggingface.co/.../model-00003-of-00037.safetensors
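For large shard sets, typing every line by hand is error-prone. The sketch below generates the list instead. It assumes the shards follow the zero-padded NNNNN-of-NNNNN.safetensors naming shown above; the function name shard_urls and its three arguments (base URL, file stem, shard count) are hypothetical.

```shell
# Print one URL per shard in the form <base>/<stem>-NNNNN-of-NNNNN.safetensors.
# base, stem, and total are supplied by the caller; nothing is downloaded here.
shard_urls() {
  local base="$1" stem="$2" total="$3" i=1
  while [ "$i" -le "$total" ]; do
    printf '%s/%s-%05d-of-%05d.safetensors\n' "$base" "$stem" "$i" "$total"
    i=$((i + 1))
  done
}
```

Typical use is shard_urls "$BASE_URL" model 37 > urls.txt, where BASE_URL holds the repository's resolve URL without the filename.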

Start the batch with session tracking enabled:

aria2c -i urls.txt \
  --save-session=aria2-session.txt \
  --save-session-interval=60 \
  -c -x8 -s8 -k16M \
  -d ~/models

-i urls.txt: read download targets from the URL list file.
--save-session=aria2-session.txt: write all active and incomplete download state to the session file.
--save-session-interval=60: flush the session file to disk every 60 seconds, limiting lost progress on an abrupt stop.
-c: resume any partial files found in the target directory.

After a reboot or manual stop, resume the entire batch:

aria2c --input-file=aria2-session.txt \
  --save-session=aria2-session.txt \
  -c

--input-file reloads all entries from the session file. Completed downloads are automatically dropped from the next session write. Adding --force-save=true retains completed entries for audit purposes, but requires manual pruning of the session file as the download set grows.
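To see how much of a batch is still outstanding, the session file can be inspected directly. The sketch below relies on the aria2 input-file format, in which URI lines start in the first column, per-download option lines are indented, and lines starting with # are comments. The function name session_pending_count is hypothetical.

```shell
# Count download entries in an aria2 input/session file ($1).
# URI lines start in column 1; option lines are indented; '#' lines are comments.
session_pending_count() {
  grep -c -v -e '^[[:space:]]' -e '^#' -e '^$' "$1"
}
```

Running session_pending_count aria2-session.txt after aria2 has exited prints the number of downloads that the next resume run will pick up.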

5.2 Persistent single-session workflow

A single session file can serve as both input and output, providing a self-maintaining queue of unfinished downloads across multiple aria2 invocations.

touch aria2-session.txt
aria2c --input-file=aria2-session.txt \
  --save-session=aria2-session.txt \
  --save-session-interval=30 \
  -c -x8 -s8 -k16M \
  -d ~/downloads

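New URLs can be added to the queue by appending them to aria2-session.txt while aria2 is stopped; the session file uses the same format as an -i URL list, so the next resume run picks them up automatically. Avoid pointing a second, concurrent aria2c process at the same --save-session file, because each process rewrites the file with only its own unfinished tasks.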
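Because a session file is read back with --input-file, it accepts the same line format as a URL list: a URI line, optionally followed by indented option lines such as out=. The sketch below appends an entry under that assumption. The function name queue_url is hypothetical, and it should only be run while no aria2 process is writing to the same session file.

```shell
# Append a URL (and optional output name) to an aria2 input/session file
# unless that exact URL line is already present.
#   $1 = session file, $2 = URL, $3 = optional output filename
queue_url() {
  local session="$1" url="$2" out="${3:-}"
  if grep -qxF -- "$url" "$session" 2>/dev/null; then
    return 0                                  # already queued
  fi
  printf '%s\n' "$url" >> "$session"
  if [ -n "$out" ]; then
    printf ' out=%s\n' "$out" >> "$session"   # option lines must be indented
  fi
}
```

Setting out= per entry keeps the explicit-filename practice intact for batch downloads, where a single -o on the command line cannot name every file.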

6. Jetson-specific configuration and filesystem considerations

6.1 Recommended baseline flags for Jetson AGX Orin

The Jetson AGX Orin has fast CPU cores and ample RAM, but disk throughput and storage capacity may be shared across concurrent inference workloads. The following practices are tuned for this profile:

Use a dedicated directory per model project (~/models, ~/hf_cache) to keep .aria2 control files co-located with their data files and avoid cross-directory confusion.
Prefer --file-allocation=none for fast startup. Switch to falloc only when pre-allocation is explicitly needed for fragmentation control on large multi-GB artefacts.
Start with -x8 -s8 and increase to 16 if the connection and server support it without triggering rate limiting.
Always include -c for any file larger than a few hundred megabytes to guard against accidental restarts.

Add a shell alias to ~/.bashrc to standardize the baseline:

echo "alias aria2fast='aria2c -c -x8 -s8 -k16M --file-allocation=none --summary-interval=30'" >> ~/.bashrc
source ~/.bashrc

Usage with the alias:

aria2fast -d ~/models -o model.gguf "https://huggingface.co/.../model.gguf?download=true"

A Hugging Face-specific alias with longer retry delays handles the backend's rate limiting more gracefully:

echo "alias hfaria='aria2c -c -x8 -s8 -k32M --retry-wait=10 --max-tries=0 --file-allocation=none --summary-interval=60'" >> ~/.bashrc
source ~/.bashrc

6.2 File allocation on Jetson NVMe

On the Jetson NVMe (ext4 with extents), --file-allocation=falloc pre-allocates the full file using fallocate(2), which is fast and reduces fragmentation for multi-GB files. On slower or older filesystems, --file-allocation=none avoids a blocking write pass at startup.

aria2c -c -x8 -s8 -k32M \
  --file-allocation=falloc \
  -d ~/models \
  "https://huggingface.co/.../model.gguf?download=true"

Monitor available storage before long downloads with df -h. An out-of-space condition frequently manifests as repeated failures at the same completion percentage, which can be misread as a network or server error.
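The df check can be turned into a guard that runs before a long download starts. This is a minimal sketch: the function name has_free_space is hypothetical, and the required size must be supplied by the caller (for example, the file size shown on the model page).

```shell
# Return 0 if the filesystem holding directory $1 has at least $2 bytes available.
has_free_space() {
  local avail_kb need_kb
  # df -P gives stable POSIX columns; -k reports 1024-byte blocks; column 4 is "Available"
  avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
  need_kb=$(( ($2 + 1023) / 1024 ))
  [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$need_kb" ]
}
```

For example, has_free_space ~/models 20000000000 || echo "not enough space" refuses to proceed when less than roughly 20 GB is free.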

7. Filename issues from redirect chains

7.1 Why downloads receive hash-like filenames

When downloading from a Hugging Face resolve URL, the server redirects through an internal S3-style backend (cas-bridge.xethub.hf.co or similar object storage). The final HTTP response path contains an opaque SHA-like object key rather than the human-readable model filename. If no explicit output name is provided with -o, aria2 saves the file using that object key as the filename, producing output such as:

c56b8f0416a453a53aace7bef4a088a2c2db33c3b8a4eda949a380c214420b31

The fix is to always specify -o with the intended filename. This flag forces the output name regardless of redirects or Content-Disposition headers:

cd ~/models

aria2c -c -x8 -s8 -k32M \
  --file-allocation=none \
  -o gemma-4-31B-it-Q4_K_M.gguf \
  "https://huggingface.co/unsloth/gemma-4-31B-it-GGUF/resolve/main/gemma-4-31B-it-Q4_K_M.gguf?download=true"

7.2 Renaming an existing hash-named partial file

If a large partial download was already saved under a hash name, it can be renamed without discarding the downloaded data. Both the data file and its .aria2 sidecar must be renamed to matching names simultaneously:

cd ~/models

mv c56b8f0416a453a53aace7bef4a088a2c2db33c3b8a4eda949a380c214420b31 \
   gemma-4-31B-it-Q4_K_M.gguf

mv c56b8f0416a453a53aace7bef4a088a2c2db33c3b8a4eda949a380c214420b31.aria2 \
   gemma-4-31B-it-Q4_K_M.gguf.aria2

After renaming, re-run the standard aria2 command with -o gemma-4-31B-it-Q4_K_M.gguf from the same directory. aria2 will locate the renamed pair and resume only the missing segments.
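Since both files must move together, the two mv commands can be wrapped in a helper that refuses to act on an incomplete pair. This is a sketch under that assumption; the function name rename_partial is hypothetical.

```shell
# Rename a partial download and its .aria2 control file together.
# Refuses to run if either source is missing or either target already exists.
#   $1 = current (hash-like) name, $2 = intended filename
rename_partial() {
  local old="$1" new="$2"
  [ -f "$old" ] && [ -f "$old.aria2" ] || { echo "missing $old or $old.aria2" >&2; return 1; }
  [ ! -e "$new" ] && [ ! -e "$new.aria2" ] || { echo "target already exists: $new" >&2; return 1; }
  mv -- "$old" "$new" && mv -- "$old.aria2" "$new.aria2"
}
```

Run it from the download directory with aria2 stopped, then re-run the download command with the matching -o value.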

8. Failure modes and recovery

8.1 HTTP 403 errors during download (errorCode=22)

Error lines of the form errorCode=22 … status=403 indicate that individual HTTP segment requests were rejected by the backend. This occurs most commonly near the end of a long download from Hugging Face for two reasons:

Signed S3 URLs in the redirect chain expire mid-download on very large or slow-connection transfers.
High concurrency (-x16 -s16) triggers per-connection rate limiting on popular models.

When errorCode=22 appears but the download ultimately reports stat|OK, aria2 recovered and retried successfully. To reduce the frequency of these errors:

aria2c -c -x8 -s8 -k32M \
  --retry-wait=10 --max-tries=0 \
  --file-allocation=none \
  -o gemma-4-31B-it-Q4_K_M.gguf \
  "https://huggingface.co/unsloth/gemma-4-31B-it-GGUF/resolve/main/gemma-4-31B-it-Q4_K_M.gguf?download=true"

Lower -x/-s values reduce the number of simultaneous signed segment requests in flight. Combined with --retry-wait=10 and --max-tries=0, aria2 waits calmly between retries rather than hammering the backend.

8.2 Persistent failures at a fixed byte offset

When a download fails repeatedly at the same percentage or offset:

Reduce -x and -s to lower concurrent byte-range requests.
Increase --retry-wait (e.g., --retry-wait=30) and keep --max-tries=0 to allow extended retry cycles.
Check for local storage or filesystem errors with dmesg and journalctl -xe. Write errors on Jetson NVMe can present as download failures at the application layer.
If the .aria2 control file is corrupted and resume fails consistently, delete both the partial data file and the .aria2 file, then restart that specific download only — not the entire batch.

8.3 Quick command reference

Use case	Command
Fast single model download	`aria2c -c -x16 -s16 -k64M --file-allocation=none -d ~/models -o FILE "URL"`
Conservative single model	`aria2c -c -x8 -s8 -k16M --file-allocation=none -d ~/models -o FILE "URL"`
Resume single file	`aria2c -c -x8 -s8 -o FILE "URL"` (run from same dir, `.aria2` present)
Batch from URL list	`aria2c -i urls.txt --save-session=aria2-session.txt -c -x8 -s8 -d ~/models`
Resume batch via session	`aria2c --input-file=aria2-session.txt --save-session=aria2-session.txt -c`
New signed URL for partial	`aria2c -c --auto-file-renaming=false -d DIR -o FILE "NEW_URL"`
Retain completed in session	Add `--force-save=true`; prune session file manually as needed

Table 2 — aria2 command reference for Jetson AGX Orin

9. Practical outcomes

Established a correct and efficient aria2 command pattern for large Hugging Face model downloads on Jetson AGX Orin, including proper flag ordering for -x, -s, and -k, and the mandatory use of -o to avoid hash-like filenames from object storage redirects.
Documented resume mechanics using .aria2 control files, the -c flag, and --always-resume=true, with explicit guidance on what to do when the control file is missing or the data file has been renamed.
Provided a signed-URL resume pattern using --auto-file-renaming=false that handles expiring links without restarting partial downloads.
Defined batch download patterns using URL list files and persistent session files (--save-session, --input-file) for multi-shard model repositories such as safetensors split sets.
Captured Jetson-specific defaults covering file allocation modes (none vs. falloc), directory hygiene, concurrency tuning, disk space monitoring, and error recovery for long-running downloads.
Identified the root cause of HTTP 403 (errorCode=22) errors during large Hugging Face downloads as expiring signed URLs and per-connection rate limiting, with a prescriptive mitigation using reduced concurrency and extended retry delays.

10. Conclusions and recommendations

aria2 reliably saturates available bandwidth for large LLM downloads on Jetson AGX Orin and handles interruptions gracefully, provided three practices are consistently followed: always pass -c for large files; use a stable output directory so .aria2 control files remain co-located with their data files; and set --max-tries=0 so aria2 recovers from transient failures without manual intervention.

For daily workflows, standardize on one or two shell aliases (aria2fast, hfaria) and always invoke aria2 from the same directory paths. Always specify -o with an explicit filename when downloading from Hugging Face, as the platform's object storage backend assigns opaque hash-like keys that aria2 will use as the filename in the absence of an explicit override. This eliminates the most common source of filename confusion and simplifies subsequent resume operations.

When troubleshooting stubborn failures, apply interventions in order: reduce concurrency first, then increase retry delay, then inspect disk and filesystem health with dmesg and journalctl. Delete a partial file and its .aria2 sidecar only as a last resort, and only for the specific file that is failing — not for the entire batch. If a transient Hugging Face backend outage is causing persistent 403 errors near completion, the most effective response is to wait and retry rather than to restart a near-complete multi-gigabyte download from scratch.

Network Optimization Tutorial For NVIDIA Jetson AGX Orin 64 GB

Sergio Andres Usma — Sun, 05 Apr 2026 22:14:49 +0000

Abstract

This tutorial documents a systematic approach to network performance optimization on an NVIDIA Jetson AGX Orin Developer Kit 64 GB running Ubuntu 22.04.5 LTS (aarch64) with JetPack 6.2.2, CUDA 12.6, cuDNN 9.3.0, OpenCV 4.8.0, and TensorRT 10.3.0.30. The procedure covers kernel TCP buffer tuning, MTU adjustment on the eno1 wired interface, APT parallel download configuration, and aria2 multi-connection download tooling. All steps include pre-change backups and a dedicated revert procedure.

The guide is structured as a production-oriented, step-by-step procedure rather than a reference summary. It includes a consolidated interactive Bash script that auto-detects the primary wired interface (eno1 or eth0), backs up affected configuration files before any changes, and applies each optimization only with explicit operator consent. Troubleshooting guidance is included for the RTNETLINK answers: Device or resource busy error encountered during MTU changes on Jetson hardware.

System administrators and Edge AI developers with intermediate Linux experience will benefit from this document when preparing a Jetson AGX Orin for workloads that involve frequent large model downloads, frequent package updates, or sustained high-bandwidth data transfers. The tutorial assumes shell access with sudo privileges and familiarity with a terminal text editor such as nano.

1. Prerequisites and Environment

1.1 Hardware and Software Specifications

The following table describes the environment in which all commands were validated. Applying these optimizations on a different JetPack release or kernel version may require adjusting parameter values.

Component	Value
Hardware	NVIDIA Jetson AGX Orin Developer Kit 64 GB
OS	Ubuntu 22.04.5 LTS aarch64 (L4T 36.5.0)
JetPack	nvidia-jetpack 6.2.2+b24
CUDA	12.6.68
cuDNN	9.3.0
OpenCV	4.8.0
TensorRT	10.3.0.30
Kernel	5.15.185-tegra

Table 1 — Validated hardware and software environment

1.2 Required Permissions and Tools

All system configuration steps require sudo access. The following tools are used throughout the tutorial and are available by default on JetPack installations: nano, cp, sysctl, ip, apt, ping, and bash. The aria2c binary is installed in Section 6.

1.3 Primary Wired Interface Name

On the Jetson AGX Orin Developer Kit, the wired Ethernet interface is exposed as eno1 under predictable network interface naming (udev rules). Some custom images or older configurations may still use eth0. Where commands target a specific interface, both names are provided. The ip link command identifies the correct name on any given system.

2. Pre-Change Backup Procedure

Before modifying any system configuration file, create backups using the .backup-pre-netopt suffix. This convention makes backup files easy to identify and is expected by the revert commands in Section 8.

Run the following once before proceeding to any subsequent section:

# Back up the sysctl configuration
sudo cp /etc/sysctl.conf /etc/sysctl.conf.backup-pre-netopt

# Back up the APT parallel config only if it already exists
if [ -f /etc/apt/apt.conf.d/99parallel ]; then
  sudo cp /etc/apt/apt.conf.d/99parallel /etc/apt/apt.conf.d/99parallel.backup-pre-netopt
fi

Verify the backup was created:

ls -lh /etc/sysctl.conf.backup-pre-netopt

The backup captures the unmodified state of /etc/sysctl.conf. If the APT configuration file does not yet exist (first-time setup), no APT backup is created; the revert script handles this case by removing the file rather than restoring a backup.
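Running the cp commands a second time, after the optimizations have been applied, would overwrite the pristine backup with the modified file. One way to guard against that is a helper that copies only when no backup exists yet. This is a sketch: the function name backup_once and the optional SUDO variable are hypothetical conventions, not part of the original procedure.

```shell
# Copy $1 to $1.backup-pre-netopt unless that backup already exists.
# Set SUDO=sudo in the environment when the file lives under /etc.
backup_once() {
  local src="$1" dst="$1.backup-pre-netopt"
  [ -f "$src" ] || return 0            # nothing to back up
  if [ -e "$dst" ]; then
    echo "Keeping existing backup: $dst"
  else
    ${SUDO:-} cp -p -- "$src" "$dst" && echo "Backup created: $dst"
  fi
}
```

For example: SUDO=sudo backup_once /etc/sysctl.conf.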

3. Maximum Performance Mode

Dynamic CPU and GPU frequency scaling can reduce network throughput indirectly by limiting the processing available to TCP stack operations, protocol encryption, and receive-side data handling. NVIDIA provides two utilities to force the Jetson into its highest power and clock configuration:

sudo nvpmodel -m 0
sudo jetson_clocks

nvpmodel -m 0 selects power model 0, which is the maximum performance profile on Jetson AGX Orin.
jetson_clocks locks CPU, GPU, and memory frequencies to their maximum values and disables dynamic frequency scaling.

These commands take effect immediately but do not persist across reboots. If your workload restarts after system reboots, add both commands to a startup service or run them manually before beginning large download or inference sessions.
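One way to build the startup service mentioned above is a oneshot systemd unit. The sketch below only writes the unit text to a path of your choice. The function name, the unit layout, and the default binary paths (/usr/sbin/nvpmodel and /usr/bin/jetson_clocks) are assumptions; confirm the paths with command -v on the target system.

```shell
# Write a oneshot systemd unit that runs both commands at boot.
#   $1 = destination file
#   $2 = nvpmodel path (ASSUMED default), $3 = jetson_clocks path (ASSUMED default)
write_maxperf_unit() {
  local dest="$1"
  local nvpmodel_bin="${2:-/usr/sbin/nvpmodel}"
  local clocks_bin="${3:-/usr/bin/jetson_clocks}"
  cat > "$dest" <<EOF
[Unit]
Description=Set Jetson maximum performance mode

[Service]
Type=oneshot
ExecStart=$nvpmodel_bin -m 0
ExecStart=$clocks_bin

[Install]
WantedBy=multi-user.target
EOF
}
```

After reviewing the generated file, copy it to /etc/systemd/system/ under a name such as jetson-maxperf.service (a hypothetical name), then run sudo systemctl daemon-reload and sudo systemctl enable jetson-maxperf.service.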

4. Kernel Network Parameter Tuning

Kernel-level TCP parameters govern how much memory is allocated to socket buffers and how the TCP stack behaves under high-throughput conditions. The defaults in a stock Ubuntu image are conservative and were not tuned for sustained high-bandwidth transfers of the kind required when pulling large AI model checkpoints.

4.1 Edit sysctl Configuration

Open the sysctl configuration file:

sudo nano /etc/sysctl.conf

Append the following block at the end of the file:

# Disable IPv6 (optional — avoids latency in name resolution on IPv4-only networks)
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

# Increase TCP buffers for high-speed downloads
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_window_scaling = 1

Note: The IPv6 disable block is optional. Omit those three lines if the system connects to IPv6-only or dual-stack services. The interactive script in Section 7 prompts for this choice at runtime.

4.2 Apply the Configuration

Save the file and apply all settings immediately without rebooting:

sudo sysctl -p

The output lists each applied parameter and its new value. The settings increase the maximum socket receive and send buffer sizes to 16 MB, expand the default and maximum TCP window memory, disable the slow-start penalty after a connection has been idle, and confirm that TCP window scaling is active.
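The 16 MB ceiling is 16 × 1024 × 1024 = 16777216 bytes. A common way to judge whether it is large enough for a given path is the bandwidth-delay product: link rate multiplied by round-trip time. The sketch below does that arithmetic. The function name bdp_bytes is hypothetical, and the link speed and latency in the example are illustrative values, not measurements from this system.

```shell
# Bandwidth-delay product in bytes.
#   $1 = link rate in Mbit/s, $2 = round-trip time in milliseconds
bdp_bytes() {
  echo $(( $1 * 1000000 / 8 * $2 / 1000 ))
}
```

For a 1 Gbit/s link with a 100 ms round trip, bdp_bytes 1000 100 prints 12500000, which fits under the 16777216-byte maximum configured above.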

5. MTU Adjustment for Wired Interface

The Maximum Transmission Unit (MTU) controls the largest payload that can be sent in a single Ethernet frame without IP fragmentation. A mismatch between the Jetson's MTU and the network path MTU can cause silent retransmissions and degraded throughput. On some networks, setting MTU to 1450 bytes avoids fragmentation caused by VPN or tunnel encapsulation overhead.
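Before committing to a value, the path can be probed with ping and the don't-fragment flag. For IPv4, the ICMP payload that exactly fills a given MTU is the MTU minus 28 bytes (20-byte IP header without options plus 8-byte ICMP header). The helper name below is hypothetical.

```shell
# ICMP payload size that makes an IPv4 packet exactly $1 bytes long
# (20-byte IP header without options + 8-byte ICMP header).
icmp_payload_for_mtu() {
  echo $(( $1 - 28 ))
}
```

For example, ping -c 3 -M do -s "$(icmp_payload_for_mtu 1450)" HOST sends 1422-byte payloads with fragmentation prohibited, where HOST is a reachable address of your choice. If these pings succeed but the same test for 1500 fails, the path MTU is below 1500 and a lower interface MTU is a reasonable candidate.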

5.1 Identify the Active Wired Interface

List all network interfaces and their current MTU values:

ip link

A typical output on Jetson AGX Orin looks like:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 ...
3: wlP1p1s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 ...
5: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 4X:bX:4X:4X:6X:XX brd ff:ff:ff:ff:ff:ff
6: l4tbr0: <BROADCAST,MULTICAST> mtu 1500 ...
7: usb0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 ...
8: usb1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 ...
9: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 ...

The active wired interface is the one in UP state with a hardware Ethernet address. On Jetson AGX Orin this is typically eno1. Only change the MTU of the interface that carries the traffic being optimized.
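When several interfaces are up, the one that carries the default route is usually the one worth tuning. The sketch below extracts that interface name from routing output. The function name default_route_dev is hypothetical, and it reads from standard input so it can be checked without touching the live network configuration.

```shell
# Print the interface named after "dev" on the first default-route line read from stdin.
default_route_dev() {
  awk '$1 == "default" {
         for (i = 2; i < NF; i++)
           if ($i == "dev") { print $(i + 1); exit }
       }'
}
```

Use it as ip route show default | default_route_dev and compare the result with the interface identified from ip link.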

5.2 Change MTU at Runtime

For eno1:

sudo ip link set dev eno1 mtu 1450

For systems using eth0:

sudo ip link set dev eth0 mtu 1450

This change applies immediately but resets to 1500 on reboot. For a persistent configuration, use NetworkManager (Section 5.3, Option 3) or Netplan (Section 5.4).

5.3 Troubleshooting: "RTNETLINK answers: Device or resource busy"

This error indicates that a higher-level service holds the interface or that the interface is a member of a bridge. It is common on Jetson because eno1 can be associated with the l4tbr0 bridge used for USB networking.

Option 1 — Bring the interface down, change MTU, then bring it back up:

sudo ip link set dev eno1 down
sudo ip link set dev eno1 mtu 1450
sudo ip link set dev eno1 up

Replace eno1 with eth0 if that is the active interface. Connectivity is interrupted briefly while the interface is down.

Option 2 — Change MTU on the bridge instead of the physical interface:

If traffic passes through l4tbr0, apply the MTU to the bridge:

sudo ip link set dev l4tbr0 mtu 1450

Verify connectivity immediately after this change.

Option 3 — Use the network manager to apply the change:

For NetworkManager-managed connections:

nmcli connection show
nmcli connection modify "<connection-name>" 802-3-ethernet.mtu 1450
nmcli connection down "<connection-name>"
nmcli connection up "<connection-name>"

For Netplan-managed interfaces, edit the relevant YAML file (see Section 5.4) and apply.

If none of these options resolve the error without disrupting connectivity, leave the MTU at 1500 and focus on the kernel TCP tuning in Section 4 and the aria2 tooling in Section 6.

5.4 Persistent MTU via Netplan

To make the MTU change survive reboots, edit the Netplan configuration file for the interface. The file is typically located at /etc/netplan/01-netcfg.yaml or a similarly named file in /etc/netplan/:

sudo nano /etc/netplan/01-netcfg.yaml

Add or update the mtu key for the interface:

network:
  ethernets:
    eno1:
      mtu: 1450
      dhcp4: true
  version: 2

Apply the change:

sudo netplan apply

6. APT Download Optimization and aria2 Installation

6.1 APT Parallel Download Configuration

APT downloads package lists and archives sequentially by default, which underutilizes available bandwidth when fetching many packages. A drop-in configuration file in /etc/apt/apt.conf.d/ can improve this behavior without modifying the main APT configuration.

Create the configuration snippet:

sudo nano /etc/apt/apt.conf.d/99parallel

Add the following content:

Acquire::Languages "none";
Acquire::Queue-Mode "access";
Acquire::Retries "3";
Acquire::http::Pipeline-Depth "5";

Acquire::Languages "none" suppresses the download of translated package description files, which are rarely needed on a headless Edge AI system.
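Acquire::Queue-Mode "access" makes APT open one connection per URI type (for example, http) instead of the default "host" mode, which opens one connection per target host. This reduces connection overhead but also reduces parallelism across mirrors, so compare it against the default before keeping it.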
Acquire::Retries "3" retries failed downloads up to three times before failing.
Acquire::http::Pipeline-Depth "5" sends up to five HTTP requests in flight simultaneously on persistent connections, improving throughput on reliable links.

The changes take effect on the next sudo apt update or sudo apt upgrade invocation.

6.2 aria2 for Large File Downloads

For AI model checkpoints, dataset archives, or container images that exceed several gigabytes, aria2 opens multiple parallel connections to the same server and splits the file into segments. This approach can saturate available bandwidth more effectively than single-threaded tools such as wget or curl.

Install aria2:

sudo apt install aria2 -y

Download a large file using 16 parallel connections and 16 segments:

aria2c -x 16 -s 16 "URL_TO_LARGE_FILE"

-x 16 sets the maximum number of simultaneous connections per server.
-s 16 splits the download into 16 segments, each fetched by a separate connection.

Reduce both values on congested networks or when the target server enforces per-IP connection limits.

7. Consolidated Automation Script

The interactive script below consolidates all optimization steps into a single file. It detects the primary wired interface automatically, creates backups before any changes, and prompts for confirmation before each optimization section. IPv6 disabling and APT tuning are presented as optional to preserve compatibility with environments that depend on those behaviors.

Create the script file:

nano ~/jetson_network_opt.sh

Paste the following content:

#!/usr/bin/env bash
set -e

echo "=== Jetson Network Optimization Script ==="
echo "This script will:"
echo "  - Backup /etc/sysctl.conf and /etc/apt/apt.conf.d/99parallel (if present)"
echo "  - Optionally tune kernel TCP parameters"
echo "  - Optionally disable IPv6"
echo "  - Optionally adjust MTU for the primary wired interface"
echo "  - Optionally optimize APT downloads"
echo "  - Optionally install aria2"
echo

read -rp "Continue? [y/N]: " CONTINUE
if [[ ! "$CONTINUE" =~ ^[Yy]$ ]]; then
  echo "Aborting."
  exit 0
fi

echo
echo "== Detecting primary wired interface =="

PRIMARY_IF=""
if ip link show eno1 &>/dev/null; then
  PRIMARY_IF="eno1"
elif ip link show eth0 &>/dev/null; then
  PRIMARY_IF="eth0"
fi

if [[ -z "$PRIMARY_IF" ]]; then
  echo "Warning: neither eno1 nor eth0 detected. You may need to edit this script to use your interface name."
else
  echo "Primary wired interface detected: $PRIMARY_IF"
fi

echo
echo "== 1) Backing up configuration files =="

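# Never overwrite an existing backup, so re-running the script keeps the pristine copy
if [ -f /etc/sysctl.conf ] && [ ! -e /etc/sysctl.conf.backup-pre-netopt ]; then
  sudo cp /etc/sysctl.conf /etc/sysctl.conf.backup-pre-netopt
  echo "Backup: /etc/sysctl.conf.backup-pre-netopt created."
fi

if [ -f /etc/apt/apt.conf.d/99parallel ] && [ ! -e /etc/apt/apt.conf.d/99parallel.backup-pre-netopt ]; then
  sudo cp /etc/apt/apt.conf.d/99parallel /etc/apt/apt.conf.d/99parallel.backup-pre-netopt
  echo "Backup: /etc/apt/apt.conf.d/99parallel.backup-pre-netopt created."
fi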

echo
echo "== 2) Kernel TCP parameter tuning =="

read -rp "Apply TCP buffer and window tuning in /etc/sysctl.conf? [y/N]: " APPLY_TCP
if [[ "$APPLY_TCP" =~ ^[Yy]$ ]]; then
  sudo bash -c 'cat >> /etc/sysctl.conf <<EOF

# Jetson network optimization - TCP buffers
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_window_scaling = 1
EOF'
  sudo sysctl -p
  echo "TCP parameters applied."
else
  echo "Skipping TCP parameter tuning."
fi

echo
echo "== 3) IPv6 behavior =="

echo "If you rely on IPv6 (e.g. IPv6-only or dual-stack networks), DO NOT disable it."
read -rp "Disable IPv6 via /etc/sysctl.conf? [y/N]: " DISABLE_IPV6
if [[ "$DISABLE_IPV6" =~ ^[Yy]$ ]]; then
  sudo bash -c 'cat >> /etc/sysctl.conf <<EOF

# Jetson network optimization - disable IPv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
EOF'
  sudo sysctl -p
  echo "IPv6 disabled (sysctl)."
else
  echo "Keeping IPv6 enabled."
fi

echo
echo "== 4) MTU adjustment for primary wired interface =="

if [[ -z "$PRIMARY_IF" ]]; then
  echo "No primary interface detected (eno1/eth0). Skipping MTU change."
else
  ip link show "$PRIMARY_IF"
  read -rp "Set MTU 1450 on $PRIMARY_IF (runtime only, reset on reboot)? [y/N]: " SET_MTU
  if [[ "$SET_MTU" =~ ^[Yy]$ ]]; then
    if ! sudo ip link set dev "$PRIMARY_IF" mtu 1450; then
      echo "Failed to set MTU on $PRIMARY_IF (device or resource busy?)."
      echo "You may need to adjust MTU via NetworkManager, Netplan, or a bridge (e.g. l4tbr0)."
    else
      echo "MTU adjustment attempted on $PRIMARY_IF."
    fi
  else
    echo "Skipping MTU change."
  fi
fi

echo
echo "== 5) APT optimization =="

echo "This will create or overwrite /etc/apt/apt.conf.d/99parallel."
read -rp "Apply APT optimization? [y/N]: " APPLY_APT
if [[ "$APPLY_APT" =~ ^[Yy]$ ]]; then
  sudo bash -c 'cat > /etc/apt/apt.conf.d/99parallel <<EOF
Acquire::Languages "none";
Acquire::Queue-Mode "access";
Acquire::Retries "3";
Acquire::http::Pipeline-Depth "5";
EOF'
  echo "APT optimization applied."
else
  echo "Skipping APT optimization."
fi

echo
echo "== 6) aria2 installation =="

read -rp "Install aria2 for multi-connection downloads? [y/N]: " INSTALL_ARIA2
if [[ "$INSTALL_ARIA2" =~ ^[Yy]$ ]]; then
  sudo apt update
  sudo apt install -y aria2
  echo "aria2 installed."
else
  echo "Skipping aria2 installation."
fi

echo
echo "All selected steps completed."

Make the script executable and run it:

chmod +x ~/jetson_network_opt.sh
~/jetson_network_opt.sh

The script exits cleanly if the operator declines the initial confirmation; declining any later prompt skips only that step. Interface detection runs once at startup and the result is reused for MTU operations. If neither eno1 nor eth0 is present, the MTU section is skipped automatically with a diagnostic message.
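Because the script appends to /etc/sysctl.conf with >>, answering yes on a second run adds the same block again. A marker-based guard is one way to make the append idempotent. This is a sketch: the function name append_block_once is hypothetical, and it reuses the comment line that heads each block as the marker.

```shell
# Append the text on stdin to file $1 unless marker line $2 is already present.
append_block_once() {
  if grep -qxF -- "$2" "$1" 2>/dev/null; then
    echo "Already present: $2"
  else
    cat >> "$1"
  fi
}
```

For files under /etc, define and call the function inside a root shell (for example, within sudo bash -c '...'), since the redirection must run with root privileges.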

8. Reverting All Changes

If connectivity degrades or system behavior changes unexpectedly after applying these optimizations, revert to the pre-optimization state using the backups created in Section 2.

8.1 Restore sysctl and APT Configuration

Run the following to restore both configuration files:

# Restore sysctl configuration if backup exists
if [ -f /etc/sysctl.conf.backup-pre-netopt ]; then
  sudo cp /etc/sysctl.conf.backup-pre-netopt /etc/sysctl.conf
  echo "Restored /etc/sysctl.conf from backup."
  sudo sysctl -p
fi

# Restore APT optimization file if backup exists, otherwise remove it
if [ -f /etc/apt/apt.conf.d/99parallel.backup-pre-netopt ]; then
  sudo cp /etc/apt/apt.conf.d/99parallel.backup-pre-netopt /etc/apt/apt.conf.d/99parallel
  echo "Restored /etc/apt/apt.conf.d/99parallel from backup."
else
  if [ -f /etc/apt/apt.conf.d/99parallel ]; then
    sudo rm /etc/apt/apt.conf.d/99parallel
    echo "Removed /etc/apt/apt.conf.d/99parallel created by optimization."
  fi
fi

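The sysctl -p call within the restore block reloads /etc/sysctl.conf, but it only sets the keys listed in that file. Parameters that were added by the optimization block and are absent from the restored file keep their tuned values in the running kernel until they are set explicitly with sysctl -w or the system is rebooted.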

8.2 Revert MTU

The MTU change is runtime-only and resets automatically on the next reboot. To revert it immediately without rebooting:

sudo ip link set dev eno1 mtu 1500

Or for eth0:

sudo ip link set dev eth0 mtu 1500

8.3 Remove aria2

If aria2 was installed solely for this workflow and is no longer needed:

sudo apt remove -y aria2

After reverting, verify connectivity (for example, with ping against a known host) to confirm normal operation.

9. Practical Outcomes

Maximum performance mode: nvpmodel -m 0 and jetson_clocks eliminate CPU and GPU frequency throttling, ensuring consistent processing headroom during sustained network activity. Both commands must be re-run after reboot unless integrated into a startup service.
Improved TCP buffer sizing: Kernel parameters raise socket buffer limits to 16 MB and disable the slow-start penalty after idle periods, which can improve throughput on high-bandwidth links carrying large file transfers.
Optional IPv6 control: The IPv6 disable block is presented as an explicit choice rather than a default, preserving compatibility with dual-stack and IPv6-only environments. The interactive script enforces this distinction at runtime.
Correct interface targeting: MTU changes target eno1 by default on Jetson AGX Orin hardware, with automatic fallback to eth0 in the automation script. Three resolution paths for the RTNETLINK answers: Device or resource busy error are documented and tested.
APT efficiency: The 99parallel drop-in configuration reduces unnecessary package list downloads, enables HTTP pipelining, and adds retry resilience without modifying core APT behavior.
Multi-connection downloads: aria2c -x 16 -s 16 can saturate available bandwidth when downloading large AI model archives from servers that permit multiple concurrent connections. The tool installs and removes cleanly via APT.
Safe configuration management: Backups with the .backup-pre-netopt suffix and a dedicated revert procedure reduce the risk of persistent misconfiguration. Configuration files can be restored without rebooting, and the runtime MTU change can be reverted with ip link or by rebooting.

10. Conclusion

Applying the optimizations described in this tutorial to a Jetson AGX Orin 64 GB running JetPack 6.2.2 can improve network throughput for Edge AI development workflows, particularly those involving repeated large model downloads and frequent JetPack package updates. The combination of maximum performance mode, expanded TCP buffers, interface-aware MTU adjustment, APT pipeline tuning, and aria2 multi-connection downloads addresses the principal bottlenecks encountered on this platform without requiring kernel rebuilds or third-party drivers.

The backup and revert procedures in Sections 2 and 8, together with the interactive automation script in Section 7, reduce operational risk to a level appropriate for both development and production-adjacent systems. Each optimization is applied with explicit consent and can be reversed independently.

For production deployments, consider encoding these settings into a configuration management tool such as Ansible, committing the Netplan YAML changes for persistent MTU configuration, and coordinating with the local network team to confirm that a 1450-byte MTU is appropriate for the network path in use. Document any deviations from the values shown here alongside the JetPack version in use, as kernel and L4T updates may change default TCP stack behavior.

Creating a 50 GB Swap File on Jetson AGX Orin (Root on NVMe)

Sergio Andres Usma — Sun, 05 Apr 2026 17:59:02 +0000

Abstract

This document describes the process of creating, tuning, and managing a large swap file on an NVIDIA Jetson AGX Orin 64 GB running Ubuntu 22.04.5 LTS aarch64. The configuration is specifically optimized for running large language models (LLMs) alongside CUDA, cuDNN, and TensorRT by leveraging a fast NVMe SSD as the primary swap backing store.

The implementation was validated using a 50 GB swap file configuration alongside existing zram layers. The procedure successfully extended the usable memory capacity, allowing for the deployment of larger models without triggering immediate Out-Of-Memory (OOM) errors, provided the storage-to-RAM paging latency is acceptable.

This tutorial serves as a technical reference for advanced Jetson and Linux users. It provides a reproducible method for extending virtual memory on edge AI hardware to support demanding 34B–70B parameter models.

1. Hardware and Software Environment

The target environment is an NVIDIA Jetson AGX Orin Developer Kit equipped with 64 GB of unified memory. The system runs Ubuntu 22.04.5 LTS on an aarch64 kernel (5.15.185-tegra). The installation includes JetPack 6.2.2, providing the necessary software stack for AI inference, including CUDA 12.6, cuDNN 9.3.0, and TensorRT 10.3.0.

The primary storage for the swap file is the NVMe SSD, which serves as the root filesystem. This choice is critical for minimizing the performance penalty during memory paging operations.

Component	Detail
Hardware	NVIDIA Jetson AGX Orin Developer Kit 64 GB
OS	Ubuntu 22.04.5 LTS aarch64
Kernel	5.15.185-tegra
RAM	64 GB unified memory
JetPack	6.2.2+b24 (nvidia-jetpack)
CUDA	12.6 (nvcc 12.6.68)
cuDNN	9.3.0
TensorRT	10.3.0.30-1+cuda12.5

Table 1 — Jetson AGX Orin environment for swap configuration

2. Swap Location Strategy

Effective swap placement is determined by the throughput and endurance of the underlying storage media. On the Jetson AGX Orin, the system utilizes eMMC for the boot partition and an NVMe SSD for the primary root filesystem.

Storage	Approx Speed	Recommendation
NVMe SSD	~2000 MB/s	Best — primary location for swap
eMMC	~400 MB/s	Secondary fallback; higher wear risk
USB Drive	~100 MB/s	Not recommended due to high latency

Table 2 — Recommended swap backing storage on Jetson AGX Orin

For this configuration, the swap file is placed directly on the NVMe-backed root filesystem (/) at /swapfile. This ensures the highest possible I/O performance for paging operations.

3. Step-by-Step Swap File Creation

The following steps outline the allocation and initialization of a 50 GB swap file.

3.1 Check Devices and Free Space

Before allocation, verify the available space on the target partition. The lsblk command confirms the mount points, while df -h verifies the capacity.

# List block devices and mount points
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT,ROTA

# Check free space on the root filesystem
df -h /

The current configuration shows approximately 636 GB of available space on /dev/nvme0n1p1, which is more than sufficient for a 50 GB allocation.
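The free-space check can also be scripted so that allocation is refused when the root filesystem is too full. The sketch below is a minimal example; the helper name swap_fits and the 20 GiB safety reserve are assumptions made for illustration and are not part of the original procedure.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if <avail_kib> can hold a swap file of
# <swap_gib> GiB while still leaving <reserve_gib> GiB free afterwards.
swap_fits() {
  local avail_kib=$1 swap_gib=$2 reserve_gib=$3
  local needed_kib=$(( (swap_gib + reserve_gib) * 1024 * 1024 ))
  [ "$avail_kib" -ge "$needed_kib" ]
}

# Available space on / in KiB (POSIX df layout: 4th column of the data line).
avail_kib=$(df -Pk / | awk 'NR == 2 {print $4}')

if swap_fits "$avail_kib" 50 20; then
  echo "enough room for a 50 GiB swap file"
else
  echo "not enough free space on /"
fi
```

Keeping the threshold logic in a function that takes plain numbers makes it easy to reuse for a different swap size or reserve.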

3.2 Create the Swap File

The fallocate utility is used to pre-allocate the file space efficiently.

# Allocate 50 GB for the swap file on the root filesystem
sudo fallocate -l 50G /swapfile

3.3 Secure and Format the Swap File

Restrict the swap file to root-only access: pages written to swap can contain sensitive data from memory, and mkswap warns about insecure permissions otherwise.

# Restrict permissions to root read/write only
sudo chmod 600 /swapfile

# Format the file as swap space
sudo mkswap /swapfile

3.4 Enable the Swap File

Once formatted, the swap file must be activated in the running kernel.

# Enable the swap file
sudo swapon /swapfile

# Verify active swap devices
swapon --show

# Confirm memory and swap totals
free -h

4. Making Swap Persistent Across Reboots

To ensure the swap file is automatically re-enabled upon system restart, an entry must be added to the /etc/fstab configuration file.

# Append the swap file definition to /etc/fstab
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab

# Verify the entry exists
grep swap /etc/fstab
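Because tee -a appends unconditionally, re-running the step adds a duplicate line. A guarded append avoids that. The following is a minimal sketch that works on any file path; the function name ensure_swap_entry is hypothetical, and the demonstration deliberately targets a scratch file rather than /etc/fstab.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a swap entry to <fstab> only if <swapfile>
# is not already listed in the first column.
ensure_swap_entry() {
  local fstab=$1 swapfile=$2
  if ! awk -v f="$swapfile" '$1 == f {found = 1} END {exit !found}' "$fstab"; then
    printf '%s none swap sw 0 0\n' "$swapfile" >> "$fstab"
  fi
}

# Demonstration on a scratch file. On a real system the target would be
# /etc/fstab and the function would have to run as root.
scratch=$(mktemp)
ensure_swap_entry "$scratch" /swapfile
ensure_swap_entry "$scratch" /swapfile   # second call is a no-op
cat "$scratch"
rm -f "$scratch"
```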

5. Tuning Swappiness and zram for LLM Workloads

Optimal performance for LLM inference requires tuning the kernel to prioritize physical RAM and the compressed zram layer over the disk-backed swap file.

5.1 Adjust Swappiness and Cache Pressure

Lowering the swappiness value instructs the kernel to avoid swapping pages to the NVMe SSD unless absolutely necessary.

# Apply settings immediately
sudo sysctl vm.swappiness=10
sudo sysctl vm.vfs_cache_pressure=50

# Persist the settings across reboots
echo 'vm.swappiness=10' | sudo tee -a /etc/sysctl.conf
echo 'vm.vfs_cache_pressure=50' | sudo tee -a /etc/sysctl.conf

# Reload sysctl configuration
sudo sysctl -p

Swappiness	Behavior Description
0	Swap only when absolutely out of RAM
10	Recommended for LLM workloads
60	Typical Linux default
100	Very aggressive swapping

Table 3 — Swappiness values and behavior for Jetson LLM use
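An alternative to appending lines to /etc/sysctl.conf is a dedicated drop-in file, which can be rewritten on every run without accumulating duplicates. The sketch below assumes a hypothetical file name, 99-llm-swap.conf, and writes into a scratch directory so that it can be tried without root privileges.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write both settings into a drop-in file under <dir>.
# Overwriting the file on every run keeps it free of duplicate lines.
write_swap_sysctl() {
  local dir=$1
  printf '%s\n' 'vm.swappiness=10' 'vm.vfs_cache_pressure=50' \
    > "$dir/99-llm-swap.conf"
}

# Demonstration in a scratch directory. On the Jetson the directory would be
# /etc/sysctl.d (as root), followed by: sudo sysctl --system
scratch=$(mktemp -d)
write_swap_sysctl "$scratch"
cat "$scratch/99-llm-swap.conf"
rm -rf "$scratch"
```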

6. Relationship Between zram and /swapfile

The Jetson system uses tiered swap. The zram service shipped with JetPack (nvzramconfig) provides several compressed RAM-based swap devices (zram0 through zram11). The kernel uses memory in the following order:

Physical RAM (64 GB unified memory)
zram (Compressed swap in RAM, ~31 GB total)
NVMe Swap File (50 GB on /swapfile)

This tiered approach allows the kernel to handle small, compressible allocations within the highly efficient zram layer before resorting to the higher-latency NVMe disk-backed swap.
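The split between the two swap tiers can be read from /proc/swaps, where the kernel lists every active swap area with its size in KiB and its priority (higher-priority areas are filled first). The sketch below sums the sizes per tier; the function name swap_tiers is hypothetical, and it assumes that every zram-backed area appears under /dev/zram*.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum swap sizes per tier from a /proc/swaps-style file.
# Column 1 is the device or file name, column 3 is the size in KiB.
swap_tiers() {
  awk 'NR > 1 {
         tier = ($1 ~ /^\/dev\/zram/) ? "zram" : "disk"
         kib[tier] += $3
       }
       END {
         printf "zram %.1f GiB\n", kib["zram"] / 1048576
         printf "disk %.1f GiB\n", kib["disk"] / 1048576
       }' "$1"
}

if [ -r /proc/swaps ]; then
  swap_tiers /proc/swaps
fi
```

On the system described here, the two lines should come out close to the ~31 GB zram total and the 50 GB swap file listed above.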

7. Removing or Reconfiguring the Swap File

If disk space needs to be reclaimed, the swap file can be decommissioned following these steps:

# Disable the swap file usage
sudo swapoff /swapfile

# Remove the entry from /etc/fstab
sudo sed -i '/\/swapfile/d' /etc/fstab

# Delete the physical file
sudo rm /swapfile

# Optional: re-apply /etc/sysctl.conf (not required for removing the swap file itself)
sudo sysctl -p

8. Practical Outcomes

Increased Capacity: Successfully established a 50 GB swap area on NVMe, expanding the total virtual memory capacity.
Stability: Provided a critical safety margin for running 70B parameter models (e.g., Q4_K_M) that may exceed the 64 GB physical RAM limit during peak usage.
Optimized Hierarchy: Integrated the new disk-backed swap into the existing zram architecture without disrupting the compressed RAM layer.
Persistence: Achieved a fully automated configuration that survives system reboots via the /etc/fstab entry.

9. Conclusions

Configuring a large, NVMe-backed swap file is a highly effective strategy for maximizing the utility of the NVIDIA Jetson AGX Orin 64 GB for large-scale AI workloads. By following the documented procedure of using fallocate, setting strict chmod 600 permissions, and tuning swappiness to 10, users can achieve a stable environment capable of handling models that exceed physical memory boundaries.

While the performance penalty of disk-based swapping is unavoidable, the use of high-speed NVMe storage and a tiered zram approach minimizes the impact on inference latency, making it a viable solution for non-interactive or batch processing of 34B–70B parameter models.

Check NVIDIA Jetson AGX Orin Specifications

Sergio Andres Usma — Sun, 05 Apr 2026 16:41:10 +0000

Abstract

This document provides a systematic, reproducible method for new users to verify every hardware and software component on an NVIDIA Jetson AGX Orin 64 GB developer kit running Ubuntu 22.04.5 LTS. The approach walks through the exact commands needed to collect CPU, GPU, memory, kernel, JetPack, CUDA, cuDNN, TensorRT, and OpenCV data, then condenses the findings into a compact summary for quick sharing.

The verification process works on a clean Jetson image with no custom configuration. All commands are standard Ubuntu packages, so they can be run immediately after booting without installing additional tools. The script at the end automates the entire workflow for future use.

Reading the summary proves the system matches the advertised specifications and serves as a baseline for troubleshooting or compliance checks. Developers, reviewers, and CI pipelines can all reuse this tutorial to guarantee that a Jetson board meets its nominal performance envelope.

1. Hardware and Software Environment

1.1 Jetson board identification

Run:

cat /sys/firmware/devicetree/base/model
cat /sys/firmware/devicetree/base/compatible

You should see:

NVIDIA Jetson AGX Orin Developer Kit
nvidia,p3737-0000+p3701-0005
nvidia,p3701-0005
nvidia,tegra234

This confirms the AGX Orin developer kit and the expected compatible strings.

1.2 Operating system (Ubuntu 22.04.5 LTS)

Run:

lsb_release -a
cat /etc/os-release

Typical output:

Description:    Ubuntu 22.04.5 LTS
Release:        22.04
Codename:       jammy

and:

PRETTY_NAME="Ubuntu 22.04.5 LTS"
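VERSION="22.04.5 LTS (Jammy Jellyfish)"

These lines show you are on Ubuntu 22.04.5 LTS. /etc/os-release carries no architecture field; uname -m reports the architecture separately and should print aarch64 on this board.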

2. CPU details

Run:

lscpu
grep -E "model name|Processor|Features" /proc/cpuinfo
nproc

Key fields from lscpu:

Architecture:          aarch64
Model name:            Cortex-A78AE
CPU(s):                12
CPU max MHz:           2201.6001
CPU min MHz:           115.2000

/proc/cpuinfo reports a more generic model name for the same cores (ARMv8 Processor rev 1 (v8l)) and lists the supported feature flags.

This tells the user the board has 12 ARMv8 cores running up to ~2.2 GHz.

3. Memory

Run:

free -h
cat /proc/meminfo

free -h example:

Mem:   61Gi  used 5.8Gi  free 51Gi  buff/cache 3.6Gi  available 55Gi
Swap:  30Gi  used 0B     free 30Gi

/proc/meminfo provides the raw totals in kB (e.g., MemTotal: 64335836 kB).

Together they show about 5.8 GiB used out of about 61 GiB total.
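The relation between the raw MemTotal value and the rounded figure from free -h is a unit conversion: 1 GiB is 1048576 kB in the kernel's accounting. A minimal sketch of that arithmetic follows; the helper name kib_to_gib is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a /proc/meminfo value (in kB, i.e. KiB) to GiB.
kib_to_gib() {
  awk -v kib="$1" 'BEGIN {printf "%.1f\n", kib / 1048576}'
}

# MemTotal from the example above: 64335836 / 1048576 = 61.36, printed as 61.4
kib_to_gib 64335836

# The same conversion for the machine the script runs on.
if [ -r /proc/meminfo ]; then
  kib_to_gib "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
fi
```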

4. JetPack and Jetson Linux (L4T) versions

4.1 JetPack meta‑package

apt-cache show nvidia-jetpack

Relevant lines:

Source: nvidia-jetpack (6.2.2)
Version: 6.2.2+b24
Architecture: arm64
Maintainer: NVIDIA Corporation
Depends: nvidia-jetpack-runtime (= 6.2.2+b24), nvidia-jetpack-dev (= 6.2.2+b24)

4.2 L4T release (Jetson Linux R36.5)

cat /etc/nv_tegra_release

Output example:

# R36 (release), REVISION: 5.0, GCID: 43688277, BOARD: generic, EABI: aarch64, DATE: Fri Jan 16 03:50:45 UTC 2026
TARGET_USERSPACE_LIB_DIR=nvidia
TARGET_USERSPACE_LIB_DIR_PATH=usr/lib/aarch64-linux-gnu/nvidia

Also verify the core package:

dpkg-query --show nvidia-l4t-core

Expected output:

nvidia-l4t-core 36.5.0-20260115194252

These two commands confirm you are on JetPack 6.2.2 with the matching L4T release.
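For scripted checks, the release and revision fields of /etc/nv_tegra_release can be combined into a single version string that is directly comparable with the nvidia-l4t-core package version. The sketch below assumes the header line has the layout shown in the example output; the function name l4t_version is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn "# R36 (release), REVISION: 5.0, ..." into "36.5.0".
l4t_version() {
  sed -n 's/^# R\([0-9][0-9]*\) (release), REVISION: \([0-9.][0-9.]*\),.*/\1.\2/p' "$1" \
    | head -n 1
}

if [ -r /etc/nv_tegra_release ]; then
  echo "L4T $(l4t_version /etc/nv_tegra_release)"
else
  echo "not a Jetson: /etc/nv_tegra_release is missing"
fi
```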

5. CUDA, cuDNN, TensorRT, OpenCV

5.1 CUDA toolkit

nvcc --version

Example:

Cuda compilation tools, release 12.6, V12.6.68
Build cuda_12.6.r12.6/compiler.34714021_0

5.2 cuDNN

dpkg -l | grep libcudnn

Shows libcudnn9-cuda-12 9.3.0.75-1 (runtime) and corresponding dev packages.

5.3 TensorRT

dpkg -l | grep TensorRT

Key line:

tensorrt 10.3.0.30-1+cuda12.5 arm64 Meta package for TensorRT

5.4 OpenCV

python3 -c "import cv2; print(cv2.__version__)"

Output: 4.8.0.

All four pieces of software are installed and their versions match the target specification.

6. Practical Outcomes (what worked)

Hardware detection succeeded; model and compatible strings are correct.
OS verification produced the expected Ubuntu 22.04.5 LTS aarch64 string.
CPU info confirms 12‑core ARMv8 at ~2.2 GHz.
Memory shows about 61 GiB usable out of the advertised 64 GB, with no swap in use.
JetPack 6.2.2 and L4T R36.5 were identified automatically.
CUDA 12.6, cuDNN 9.3, TensorRT 10.3, and OpenCV 4.8 are present in the exact versions.

7. Conclusion (recommendations)

The Jetson AGX Orin 64 GB developer kit is fully configured as advertised. The verification steps can be automated via the script provided in Section 8, making it suitable for CI pipelines, regression testing, or compliance reporting.

Note: The host name (ubuntu) and host display (NVIDIA Jetson AGX Orin Develop) are placeholders; you can replace them with the actual values shown by hostname and lscpu.

8. Automated script – `jetson_sysinfo.sh`

#!/usr/bin/env bash

# Simple system summary for NVIDIA Jetson AGX Orin

# Hardware
HW_MODEL=$(tr -d '\0' </sys/firmware/devicetree/base/model 2>/dev/null)
HW_MODEL=${HW_MODEL:-"Unknown"}

# OS
OS_DESC=$(lsb_release -d 2>/dev/null | cut -f2-)
OS_DESC=${OS_DESC:-$(grep -E '^PRETTY_NAME=' /etc/os-release 2>/dev/null | cut -d= -f2 | tr -d '"')}
ARCH=$(uname -m)
HOST=$(hostname)
KERNEL=$(uname -r)

# CPU
CPU_MODEL=$(grep -m1 "model name" /proc/cpuinfo 2>/dev/null | cut -d: -f2- | xargs)
[ -z "$CPU_MODEL" ] && CPU_MODEL=$(lscpu | awk -F: '/Model name/ {print $2}' | xargs)
CPU_CORES=$(nproc)
CPU_MAX=$(lscpu | awk -F: '/CPU max MHz/ {gsub(/ /,"",$2); print $2}')
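CPU_MIN=$(lscpu | awk -F: '/CPU min MHz/ {gsub(/ /,"",$2); print $2}')

# Memory (total, used, and available as reported by free -h)
MEM_LINE=$(free -h | awk '/^Mem:/ {print $2 " total, " $3 " used, " $7 " available"}')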

# Jetson / JetPack / L4T
L4T_CORE=$(dpkg-query --show nvidia-l4t-core 2>/dev/null | awk '{print $2}')
JP_SRC=$(apt-cache show nvidia-jetpack 2>/dev/null | awk -F': ' '/^Source:/ {print $2; exit}')
JP_VER=$(apt-cache show nvidia-jetpack 2>/dev/null | awk -F': ' '/^Version:/ {print $2; exit}')
JP_ARCH=$(apt-cache show nvidia-jetpack 2>/dev/null | awk -F': ' '/^Architecture:/ {print $2; exit}')
JP_MAINT=$(apt-cache show nvidia-jetpack 2>/dev/null | awk -F': ' '/^Maintainer:/ {print $2; exit}')
JP_DEPS=$(apt-cache show nvidia-jetpack 2>/dev/null | awk -F': ' '/^Depends:/ {print $2; exit}')
NVREL=$(grep -m1 '^# R' /etc/nv_tegra_release 2>/dev/null | sed 's/^# //')

# CUDA
NVCC_VER=$(nvcc --version 2>/dev/null | tail -n1)

# cuDNN
CUDNN_LINE=$(dpkg -l | awk '/libcudnn[0-9]-cuda-12/ {print $2" " $3; exit}')
[ -z "$CUDNN_LINE" ] && CUDNN_LINE="not found"

# TensorRT
TRT_LINE=$(dpkg -l | awk '/^ii  tensorrt / {print $2" " $3; exit}')
[ -z "$TRT_LINE" ] && TRT_LINE="not found"

# OpenCV
OPENCV_VER=$(python3 -c "import cv2; print(cv2.__version__)" 2>/dev/null || echo "not found")

# Print summary
echo "Hardware: ${HW_MODEL} 64GB"
echo "OS: ${OS_DESC} ${ARCH}"
echo "Host: ${HOST}"
echo "Kernel: ${KERNEL}"
echo "CPU: ${CPU_MODEL} (${CPU_CORES}) @ ${CPU_MAX%.*}MHz"
echo "CPU max MHz: ${CPU_MAX}"
echo "CPU min MHz: ${CPU_MIN}"
echo "Memory: ${MEM_LINE}"
echo "nvidia-l4t-core: ${L4T_CORE}"
[ -n "$NVREL" ] && echo "L4T release: ${NVREL}"
echo "Package: nvidia-jetpack"
echo "Source: ${JP_SRC}"
echo "Version: ${JP_VER}"
echo "Architecture: ${JP_ARCH}"
echo "Maintainer: ${JP_MAINT}"
echo "Depends: ${JP_DEPS}"
echo "nvcc: NVIDIA (R) Cuda compiler driver"
echo "${NVCC_VER}"
echo "cuDNN: ${CUDNN_LINE}"
echo "OpenCV Version: ${OPENCV_VER}"
echo "TensorRT: ${TRT_LINE}"

How to use

nano jetson_sysinfo.sh          # paste the script
chmod +x jetson_sysinfo.sh      # make it executable
./jetson_sysinfo.sh             # run – prints a compact summary

The script prints a compact, multi-line summary of the items verified in Sections 1 through 5, which makes it easy to share as proof of configuration.

Enabling Maximum Performance Mode on NVIDIA Jetson AGX Orin 64 GB

Sergio Andres Usma — Sun, 05 Apr 2026 15:58:34 +0000

Abstract

This document explains how to configure an NVIDIA Jetson AGX Orin 64 GB Developer Kit running Ubuntu 22.04.5 LTS and JetPack 6.2.2 to operate in maximum performance mode for AI workloads, especially LLM inference. It describes how to select the MAXN power mode, lock system clocks at their highest frequencies, and verify that the configuration is correctly applied with built-in NVIDIA tools and simple benchmarks. The tutorial targets users who want reproducible, high-throughput inference on a Jetson AGX Orin while retaining awareness of thermal and power constraints.

It documents the practical impact of enabling MAXN and jetson_clocks, showing how GPU frequency and token generation throughput can increase up to roughly threefold compared to default settings. The guide also covers how to persist these settings using a systemd service so that the device consistently boots into a high-performance state suitable for heavy AI workloads. Where relevant, it notes expected frequency values and normal operating temperatures for the Jetson AGX Orin platform.

The purpose of this tutorial is to serve as a reusable reference for configuring maximum performance on Jetson-based AI systems, integrated into a larger workflow that includes swap configuration and tool installation for LLM workloads. Readers with basic Linux and Jetson familiarity can follow step-by-step commands to prepare the device, validate the configuration, and understand when to switch between performance and power-saving modes.

1. Hardware and Software Environment

Your system is an NVIDIA Jetson AGX Orin Developer Kit 64 GB running Ubuntu 22.04.5 LTS (aarch64) with JetPack 6.2.2, CUDA 12.6, cuDNN 9.3.0, OpenCV 4.8.0, and TensorRT 10.3.0.30 installed. The CPU is an ARMv8 12-core processor with a maximum clock around 2.2 GHz, and the board exposes NVIDIA’s nvpmodel and jetson_clocks tools for power and clock management.

According to NVIDIA’s specifications, the Jetson AGX Orin 64 GB configuration can achieve up to 275 TOPS when configured in MAXN mode with clocks locked to their maximum frequencies. This tutorial assumes shell access with sudo privileges and that NVIDIA JetPack components are correctly installed from the nvidia-jetpack meta-package.

2. Why Maximum Performance Matters for AI

By default, without MAXN mode and jetson_clocks, the Jetson AGX Orin keeps GPU frequencies around 600 MHz to maintain thermal and power safety margins. Under these conservative defaults, a 7B LLM typically reaches only about 8 tokens per second during inference, which limits interactivity and throughput.

When MAXN and jetson_clocks are enabled, the GPU can run at approximately 1300 MHz, and end-to-end LLM inference throughput can increase to roughly 18–25 tokens per second on the same 7B model. Relative to the baseline of about 8 tokens per second, that is a gain of roughly 2x to 3x, which makes interactive LLM usage and larger batch workloads more practical on the device.

3. Inspecting and Selecting Power Modes

Before changing anything, check the current power mode:

sudo nvpmodel -q

The command prints the active power mode, and the integer at the bottom of the output is the current mode ID (for example, 0 for MAXN). This lets you confirm whether the system already runs in MAXN or a more restrictive power profile.

To see all available power modes for the Jetson AGX Orin 64 GB under JetPack 6.2, run:

sudo nvpmodel -q --verbose | grep -A1 "MODE_NAME"

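On this platform, the mode table typically looks like the following. Mode IDs and frequency caps differ between Jetson modules and JetPack releases, so treat the values reported on your own device as authoritative: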

Mode ID	Name	TDP	CPU cores active	GPU max freq
0	MAXN	No limit (~60 W)	12	1300 MHz
1	MODE_50W	50 W	12	1100 MHz
2	MODE_30W	30 W	8	854 MHz
3	MODE_15W	15 W	4	612 MHz

Use mode ID 0 (MAXN) for the high-performance configuration described here.
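The mode table can also be read from the nvpmodel configuration file. The sketch below assumes that power modes are declared in /etc/nvpmodel.conf on lines of the form `< POWER_MODEL ID=0 NAME=MAXN >`; both that layout and the function name list_power_modes are assumptions, so check the file on your device before relying on the output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<id> <name>" for every power mode declared in
# an nvpmodel-style config file (assumed line layout: < POWER_MODEL ID=n NAME=x >).
list_power_modes() {
  sed -n 's/^< *POWER_MODEL  *ID=\([0-9][0-9]*\)  *NAME=\([^ >]*\).*/\1 \2/p' "$1"
}

conf=/etc/nvpmodel.conf
if [ -r "$conf" ]; then
  list_power_modes "$conf"
else
  echo "no $conf on this machine"
fi
```

Comparing this list with the table above is a quick way to confirm which ID belongs to which power budget before using nvpmodel -m.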

4. Enabling MAXN Mode and Locking Clocks

To switch the Jetson into MAXN mode, run:

sudo nvpmodel -m 0

nvpmodel stores the selected mode, so it persists across reboots until you select a different mode. After this step, the Jetson operates under the highest power budget supported by its cooling solution, which is ideal for compute-heavy AI tasks.

Next, lock all clocks (CPU, GPU, and memory bus) to their maximum frequencies:

sudo jetson_clocks

This command is temporary and resets after each reboot, so it must be re-applied or automated to persist. Once applied, the system stops using dynamic frequency scaling and instead pins frequencies to their highest supported values for maximum compute performance.

5. Verifying Power Mode and Clock Frequencies

To confirm that MAXN is active and clocks are locked, run:

# Confirm power mode
sudo nvpmodel -q

# Check GPU and CPU frequencies
sudo jetson_clocks --show

The jetson_clocks --show output should include lines similar to:

CPU Cluster Switching: Disabled
cpu0: Online=1 Governor=schedutil MinFreq=729600 MaxFreq=2201600 CurrentFreq=2201600 ...
GPU MinFreq=306000000 MaxFreq=1300500000 CurrentFreq=1300500000
EMC MinFreq=204000000 MaxFreq=3199000000 CurrentFreq=3199000000

For a correct configuration, CurrentFreq should match MaxFreq for CPU, GPU, and EMC entries, indicating that frequencies are pinned at their maximums. If you see lower current frequencies, reapply jetson_clocks or investigate thermal throttling conditions.
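The CurrentFreq versus MaxFreq comparison can be automated by filtering the jetson_clocks --show output. The following is a minimal sketch that assumes the KEY=value line layout shown above; the function name clocks_pinned is hypothetical. The demonstration feeds it the example lines from this section, and on the device the input would instead come from sudo jetson_clocks --show.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read jetson_clocks --show output on stdin and succeed
# only if every line carrying MaxFreq= and CurrentFreq= has equal values.
clocks_pinned() {
  awk '
    /MaxFreq=/ && /CurrentFreq=/ {
      max = ""; cur = ""
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^MaxFreq=/)     { max = substr($i, 9) }
        if ($i ~ /^CurrentFreq=/) { cur = substr($i, 13) }
      }
      seen++
      if (max != cur) { print "not pinned: " $0; bad++ }
    }
    END { exit (seen == 0 || bad > 0) }
  '
}

# Demonstration with the example lines from this section.
if clocks_pinned <<'EOF'
CPU Cluster Switching: Disabled
cpu0: Online=1 Governor=schedutil MinFreq=729600 MaxFreq=2201600 CurrentFreq=2201600 ...
GPU MinFreq=306000000 MaxFreq=1300500000 CurrentFreq=1300500000
EMC MinFreq=204000000 MaxFreq=3199000000 CurrentFreq=3199000000
EOF
then
  echo "all clocks pinned"
else
  echo "at least one clock is below its maximum"
fi
```

The function also fails when no frequency lines are found at all, so an empty or unexpected output is not mistaken for success.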

6. Making jetson_clocks Persistent with systemd

To ensure jetson_clocks runs automatically at boot, first try enabling the built-in service:

sudo systemctl enable jetson_clocks 2>/dev/null || echo "Service not found, creating..."

On some JetPack versions the jetson_clocks service may not exist, in which case you can create a custom systemd unit file. The following commands define such a service, reload systemd, and enable it:

sudo tee /etc/systemd/system/jetson_clocks.service > /dev/null << 'EOF'
[Unit]
Description=Lock Jetson clocks at maximum frequency
After=multi-user.target

[Service]
Type=oneshot
ExecStart=/usr/bin/jetson_clocks
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable jetson_clocks
sudo systemctl start jetson_clocks
sudo systemctl status jetson_clocks

Afterward, every boot should automatically apply jetson_clocks, and systemctl status jetson_clocks should report the service as active. This eliminates the need to manually run the command after each restart while keeping the configuration transparent and reversible via systemd.

7. Quick Performance Benchmark with LLM Inference

Once MAXN and jetson_clocks are active, you can validate real-world AI performance using an LLM benchmark. If you have an Ollama container running (as configured in a later phase of your workflow), execute:

# Benchmark: time to generate 100 tokens with a 3B model
docker exec ollama ollama run llama3.2 \
  --verbose \
  "Write a 100-word story about a robot" 2>&1 | grep -E "eval rate|tokens/s"

In MAXN mode on the Jetson AGX Orin, a llama3.2 3B Q4_K_M model is expected to reach around 25–40 tokens per second, significantly higher than default power modes. If observed throughput is substantially lower, recheck power mode, clock locking, and ensure the system is not thermally throttling or swapping heavily.
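To compare a run against the expected range, the generation rate can be extracted from the verbose statistics and checked against a floor. The sketch below assumes that the statistics contain a line starting with "eval rate:" followed by the number, next to a separate "prompt eval rate:" line. The function names are hypothetical, and the sample numbers in the demonstration are placeholders, not measurements.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the generation rate (tokens/s) from verbose
# statistics on stdin, ignoring the separate "prompt eval rate:" line.
generation_rate() {
  awk '/^[[:space:]]*eval rate:/ {print $3; exit}'
}

# Hypothetical helper: succeed if <rate> is at least <floor> (decimal compare).
meets_floor() {
  awk -v r="$1" -v min="$2" 'BEGIN {exit !(r + 0 >= min + 0)}'
}

# Placeholder input for demonstration only. On the device, pipe the output of
# the docker exec command above (with 2>&1) into generation_rate instead.
rate=$(generation_rate <<'EOF'
prompt eval rate:     200.00 tokens/s
eval rate:            30.00 tokens/s
EOF
)

if meets_floor "$rate" 25; then
  echo "generation rate $rate tokens/s is within the expected range"
else
  echo "generation rate $rate tokens/s is below the expected 25 tokens/s"
fi
```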

8. Monitoring Power, Temperature, and Thermal Safety

While models are running, monitor system health from a second terminal:

# Option A: tegrastats (every second)
tegrastats --interval 1000

# Option B: jtop (interactive dashboard)
jtop

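In tegrastats, the key fields are the GPU temperature (printed in the form gpu@<temp>C; aim to stay below about 85 °C under sustained load), the GPU power rail reported in milliwatts as current/average (for example POM_5V_GPU on older modules or VDD_GPU_SOC on Orin-class modules), and the board temperature (tboard@<temp>C). Exact field names vary between Jetson modules and L4T releases. MAXN mode is designed to work within the active cooling capabilities of the AGX Orin module, but blocked vents or poor airflow can still cause throttling.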

Typical thermal ranges under continuous AI workloads are:

Component	Normal	Throttle starts	Emergency
GPU	50–75 °C	~85 °C	~95 °C
CPU	45–70 °C	~85 °C	~95 °C
Board	40–60 °C	—	—

If tegrastats shows throttle=1, improve ventilation or reduce workload intensity until the system stabilizes.
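The thresholds from the table can also be applied to the kernel's thermal zones, which report temperatures in millidegrees Celsius under /sys/class/thermal. The following is a minimal sketch that reuses the approximate 85 °C and 95 °C limits from the table; the function name classify_temp is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a temperature given in millidegrees Celsius
# using the approximate throttle (85 C) and emergency (95 C) thresholds.
classify_temp() {
  local c=$(( $1 / 1000 ))
  if   [ "$c" -ge 95 ]; then echo emergency
  elif [ "$c" -ge 85 ]; then echo throttle
  else echo ok
  fi
}

# Walk all thermal zones that expose a readable, numeric temperature.
for zone in /sys/class/thermal/thermal_zone*; do
  [ -r "$zone/temp" ] || continue
  milli=$(cat "$zone/temp" 2>/dev/null) || continue
  case $milli in ''|*[!0-9-]*) continue ;; esac
  name=$(cat "$zone/type" 2>/dev/null)
  echo "${name:-unknown}: $(( milli / 1000 )) C ($(classify_temp "$milli"))"
done
```

Zones that are powered down can report large negative placeholder values; the sketch simply classifies those as ok.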

9. Choosing Performance vs Power-Saving Modes

Depending on your workload, you may want to switch between MAXN and more efficient modes. Common scenarios include:

Situation	Recommended mode	Command
LLM inference (7B–70B)	MAXN (0)	`sudo nvpmodel -m 0 && sudo jetson_clocks`
Vision / video processing	MAXN (0)	same as above
Compiling code (e.g., LLM runtimes)	MAXN (0)	same as above
Idle / light development	MODE_30W (2)	`sudo nvpmodel -m 2`
Background low-power tasks	MODE_15W (3)	`sudo nvpmodel -m 3`

Switching modes only changes the power envelope, while jetson_clocks controls frequency locking; together they give fine-grained control over performance versus efficiency. You can integrate these commands into your own scripts to toggle modes depending on job type or time of day.

10. Practical Outcomes

MAXN mode and jetson_clocks are enabled on the Jetson AGX Orin 64 GB, with GPU, CPU, and EMC frequencies pinned at their maximums for AI workloads.
A systemd service (built-in or custom) ensures jetson_clocks runs at boot so performance is consistent across reboots.
Simple LLM benchmarks confirm real-world throughput improvements (roughly 2x to 3x in tokens per second) compared to default power modes.
Continuous monitoring with tegrastats or jtop provides visibility into temperature, power draw, and potential thermal throttling.
Clear commands exist to switch between high-performance and power-saving modes depending on workload requirements.

11. Conclusion

Configuring the Jetson AGX Orin 64 GB into MAXN mode with locked clocks is a necessary step to realize the board’s full 275 TOPS potential for LLM inference and other GPU-intensive workloads. The combination of nvpmodel for power profiles and jetson_clocks for frequency locking provides deterministic performance while staying within the cooling design limits of the developer kit.

With the steps in this tutorial, you can reproducibly enable, verify, and persist maximum performance settings, then validate them using practical AI benchmarks and runtime telemetry tools. In a larger workflow, this configuration forms the foundation for subsequent tasks such as creating swap space for very large models and installing build tools for optimized inference frameworks.

Exploratory Installation of Unsloth on NVIDIA Jetson AGX Orin 64 GB

Sergio Andres Usma — Sun, 05 Apr 2026 14:52:21 +0000

Abstract

This report documents an exploratory attempt to install and run Unsloth (including Unsloth Studio) on an NVIDIA Jetson AGX Orin 64 GB using a Docker-based workflow with dustynv/l4t-ml:r36.4.0 as the base image.

The process successfully validated GPU-accelerated PyTorch and Unsloth’s core Python package on Jetson, but exposed substantial friction and incompatibilities in getting Unsloth Studio’s full stack (Studio backend, frontend, Triton/TorchInductor/TorchAo dependencies, and custom virtual environment) to run reliably on this ARM-based edge platform.

The goal of this write-up is to provide a precise technical account so that other practitioners (and the Unsloth team) can (a) reproduce or avoid the same pitfalls, and (b) better assess the current suitability of Unsloth Studio for Jetson-class devices.

1. Hardware and Software Environment

The experiments were conducted on the following platform:

Device: NVIDIA Jetson AGX Orin Developer Kit (64 GB)
OS: Ubuntu 22.04.5 LTS, aarch64
JetPack / L4T: JetPack 6.2.2, L4T 36.5.0
CUDA: 12.6 (nvcc 12.6.68)
cuDNN: 9.3.0
TensorRT: 10.3.0
Docker: Engine with NVIDIA Container Runtime enabled (--runtime=nvidia)
Base ML image: dustynv/l4t-ml:r36.4.0 (from Jetson Containers), which provides:
- PyTorch compiled for Jetson (aarch64) with CUDA and TensorRT integration
- JupyterLab and common ML tooling

Host-side persistent storage for this project was centralized under:

~/unsloth/
  build/      # Dockerfile and build context
  work/       # notebooks, datasets, outputs
  cache/      # general cache inside the container
  hf/         # Hugging Face cache
  jupyter/    # Jupyter config
  ssh/        # SSH keys/config (optional)

This layout was bind-mounted into the container to ensure persistence across container rebuilds.
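The directory layout above can be created in one step. The sketch below is a minimal example; the function name create_unsloth_layout is hypothetical, and the demonstration uses a scratch directory where the real setup would pass "$HOME/unsloth".

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the project layout under the given base directory.
create_unsloth_layout() {
  local base=$1
  mkdir -p "$base/build" "$base/work" "$base/cache" \
           "$base/hf" "$base/jupyter" "$base/ssh"
}

# Demonstration in a scratch directory; use "$HOME/unsloth" on the Jetson.
scratch=$(mktemp -d)
create_unsloth_layout "$scratch/unsloth"
ls "$scratch/unsloth"
rm -rf "$scratch"
```

Since mkdir -p leaves existing directories untouched, the function is safe to re-run before every container rebuild.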

2. Docker Image Construction

2.1 Base Dockerfile

The starting point was a custom image layered on top of dustynv/l4t-ml:r36.4.0:

FROM dustynv/l4t-ml:r36.4.0

ENV DEBIAN_FRONTEND=noninteractive \
    PIP_NO_CACHE_DIR=1 \
    PYTHONUNBUFFERED=1 \
    SHELL=/bin/bash \
    JUPYTER_PORT=8888 \
    STUDIO_PORT=8000 \
    WORKSPACE=/workspace \
    HF_HOME=/workspace/.cache/huggingface \
    TRANSFORMERS_CACHE=/workspace/.cache/huggingface \
    HUGGINGFACE_HUB_CACHE=/workspace/.cache/huggingface

USER root

RUN apt-get update && apt-get install -y --no-install-recommends \
    curl git wget ca-certificates build-essential pkg-config \
    python3-pip python3-dev python3-venv \
    openssh-server sudo nano htop tmux \
    libopenblas-dev libssl-dev libffi-dev \
    && rm -rf /var/lib/apt/lists/*

RUN mkdir -p /var/run/sshd /workspace/work /workspace/.cache/huggingface /root/.jupyter

RUN python3 -m pip install --upgrade pip setuptools wheel

# Remove Jetson-specific custom pip indexes to avoid transient outages
RUN python3 -m pip config unset global.index-url || true && \
    python3 -m pip config unset global.extra-index-url || true

# Generic Python dependencies via PyPI
RUN PIP_INDEX_URL=https://pypi.org/simple python3 -m pip install \
    fastapi "uvicorn[standard]" gradio \
    accelerate transformers peft trl datasets sentencepiece protobuf safetensors \
    huggingface_hub

# Install Unsloth (core + zoo) from GitHub/PyPI
RUN PIP_INDEX_URL=https://pypi.org/simple python3 -m pip install \
    "unsloth @ git+https://github.com/unslothai/unsloth.git" \
    "unsloth-zoo @ git+https://github.com/unslothai/unsloth-zoo.git" || true

# Optionally attempt bitsandbytes (may be fragile on Jetson)
RUN PIP_INDEX_URL=https://pypi.org/simple python3 -m pip install bitsandbytes || true

WORKDIR /workspace
EXPOSE 8000 8888 22

CMD ["/bin/bash"]

Key design choices:

Reuse the l4t-ml stack from Jetson Containers instead of installing PyTorch/TensorRT manually, since it is tuned for Jetson.
Explicitly unset custom Jetson pip indexes before installing Unsloth, to avoid failures due to unavailable Jetson-specific mirrors while installing generic packages (e.g. fastapi).
Install Unsloth via GitHub (or PyPI) rather than using the x86-oriented Docker image unsloth/unsloth.

The image was built with:

cd ~/unsloth/build
sudo docker build --no-cache -t local/unsloth-studio:jetson-l4tml-r36.4.0 .

3. Container Runtime and GPU Validation

A persistent container was created with host networking and bind mounts:

sudo docker run -d \
  --name unsloth-studio \
  --restart unless-stopped \
  --runtime nvidia \
  --network host \
  --shm-size=16g \
  -e HF_HOME=/workspace/.cache/huggingface \
  -e TRANSFORMERS_CACHE=/workspace/.cache/huggingface \
  -e HUGGINGFACE_HUB_CACHE=/workspace/.cache/huggingface \
  -v ~/unsloth/work:/workspace/work \
  -v ~/unsloth/cache:/workspace/.cache \
  -v ~/unsloth/hf:/root/.cache/huggingface \
  -v ~/unsloth/jupyter:/root/.jupyter \
  -v ~/unsloth/ssh:/root/.ssh \
  local/unsloth-studio:jetson-l4tml-r36.4.0 \
  tail -f /dev/null

Inside the container, GPU support was verified with:

python3 -c "import torch;
print(torch.__version__);
print(torch.cuda.is_available());
print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'no cuda')"

This confirmed:

torch version 2.6.0 (from the l4t-ml stack),
CUDA available,
device name reported as “Orin”.

Thus, the base ML environment inside the container was correctly accelerated on Jetson.

4. Installing Unsloth Core

Within the container:

python3 -c "import unsloth; print('unsloth ok')"
unsloth --help

The CLI output showed the main Unsloth commands:

train
inference
export
list-checkpoints
studio (subcommand group)

However, importing Unsloth printed a stack trace with warnings related to Triton, TorchInductor, and TorchAo:

ImportError: cannot import name 'AttrsDescriptor' from triton.compiler.compiler
Errors inside torch._inductor.runtime.hints and torchao.quantization

This indicates that parts of the current Unsloth stack assume a Triton/TorchInductor/TorchAo configuration aligned with x86_64 desktop/server builds of PyTorch, which is not trivially compatible with the Jetson-specific PyTorch build shipping in l4t-ml.

Despite these warnings, the CLI remained usable for basic commands, and GPU acceleration for standard PyTorch operations was intact.
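
To isolate this kind of import problem without going through Unsloth, each module from the stack trace can be probed directly. The sketch below assumes nothing about Unsloth internals; probe_symbol is an illustrative helper, and the only names taken from the source are the module and symbol mentioned in the error above.

```python
import importlib


def probe_symbol(module_name, attr_name):
    """Classify whether attr_name can be resolved from module_name."""
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:
        # Broad on purpose: a broken binary stack can raise more than
        # ImportError while a module is being initialised.
        return f"module unavailable: {exc}"
    if hasattr(module, attr_name):
        return "ok"
    return "symbol missing"


if __name__ == "__main__":
    # The symbol named in the ImportError above.
    print("triton.compiler.compiler.AttrsDescriptor:",
          probe_symbol("triton.compiler.compiler", "AttrsDescriptor"))
    # Plain presence checks for the other packages in the trace.
    for name in ("torch._inductor", "torchao"):
        print(name, "->", probe_symbol(name, "__name__"))
```

A "symbol missing" result for AttrsDescriptor would point to a version mismatch between the installed Triton and the code importing it, rather than to a missing package.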

5. Attempting to Enable Unsloth Studio

5.1 CLI-Level Status

The unsloth studio subcommand was present:

unsloth studio --help

showed options such as --host, --port, --frontend, and subcommands:

stop
update
reset-password

Attempting to start Studio directly:

unsloth studio --host 0.0.0.0 --port 8000

returned:

Studio not set up. Run install.sh first.

This implies that Studio expects an auxiliary installation step that sets up its environment (frontend, backend, and venv).

5.2 Running `unsloth studio setup`

Unsloth documentation describes a developer mode where Studio is installed via uv and a dedicated virtual environment.

Following this pattern, the command:

unsloth studio setup

produced:

Successful installation of nvm, Node LTS, and bun
Successful build of the frontend (“frontend built”)
But then:

python venv not found at /root/.unsloth/studio/unsloth_studio
Run install.sh first to create the environment:
  curl -fsSL https://unsloth.ai/install.sh | sh

Thus, the CLI expects a virtual environment under /root/.unsloth/studio/unsloth_studio that appears to be normally created by the official install.sh script.
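
Before retrying Studio, it can help to confirm whether that path holds anything resembling a virtual environment. The sketch below checks only the standard POSIX venv layout (a pyvenv.cfg file and bin/python). Which markers the Unsloth CLI actually inspects is not known from this experiment, so a positive result here does not guarantee that the CLI will accept the environment.

```python
from pathlib import Path


def looks_like_venv(path):
    """True if path has the standard POSIX venv layout."""
    root = Path(path).expanduser()
    has_cfg = (root / "pyvenv.cfg").is_file()
    # exists() follows symlinks, so a dangling bin/python counts as absent.
    has_python = (root / "bin" / "python").exists()
    return has_cfg and has_python


if __name__ == "__main__":
    # Path taken from the error message above.
    target = "/root/.unsloth/studio/unsloth_studio"
    print(target, "->", looks_like_venv(target))
```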

5.3 Manual Creation of the Studio Virtual Environment

Rather than relying on install.sh (which is tuned for other platforms and may interfere with the Jetson-specific PyTorch/Triton stack), a manual venv was created:

cd /root/.unsloth/studio
uv venv unsloth_studio --python 3.10
source /root/.unsloth/studio/unsloth_studio/bin/activate

uv pip install --index-url https://pypi.org/simple unsloth

This installed Unsloth (and a complete stack of dependencies) into the venv, including:

torch, torchao, triton
transformers, accelerate, peft, trl
bitsandbytes
unsloth, unsloth-zoo

Within the venv, unsloth studio -H 0.0.0.0 -p 8000 still failed due to missing backend dependencies (structlog), which were then installed.

However, repeated attempts to start Studio continued to reveal issues:

ModuleNotFoundError: No module named 'structlog' (due to pip confusion between global and venv environments)
Friction in adding pip to the venv (pip not present or not found via python -m pip)
A recurring tension between the uv-managed environment and the classical pip expectations coming from Studio’s backend modules.

Ultimately, even after installing the necessary Python packages, the CLI still treated Studio as “not set up” and insisted on running the global install.sh script.
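
The confusion between the global and venv environments can be made visible by asking the active interpreter where it lives and what it can import. The following sketch uses only the standard library; environment_report is an illustrative name, and structlog appears in the default package list only because it was the missing module in this experiment.

```python
import importlib.util
import shutil
import sys
from pathlib import Path


def environment_report(packages=("pip", "structlog")):
    """Describe the running interpreter and the pip found on PATH."""
    prefix = Path(sys.prefix).resolve()
    pip_on_path = shutil.which("pip")
    report = {
        "python": sys.executable,
        # Inside a venv, sys.prefix differs from sys.base_prefix.
        "in_venv": sys.prefix != sys.base_prefix,
        "pip_on_path": pip_on_path,
        "pip_on_path_in_prefix": False,
        "importable": {},
    }
    if pip_on_path:
        # A pip outside sys.prefix installs into a different environment.
        parents = Path(pip_on_path).resolve().parents
        report["pip_on_path_in_prefix"] = prefix in parents
    for name in packages:
        found = importlib.util.find_spec(name) is not None
        report["importable"][name] = found
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Run with the venv's own python, a result where in_venv is true but pip_on_path_in_prefix is false would be consistent with the situation described above, where installs landed in the global environment.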

6. Failure Modes and Root Causes

The main failure modes observed were:

Triton / TorchInductor / TorchAo incompatibilities
- Errors when importing Unsloth related to AttrsDescriptor in Triton and TorchInductor.
- These components are not officially supported or tuned for the Jetson-specific PyTorch build, causing runtime import and registration issues.
Studio’s tight coupling to its own venv and installer
- Studio expects a very particular environment layout under ~/.unsloth/studio/unsloth_studio created by install.sh.
- Deviating from the installer (e.g., manual or uv-only installation) leads to missing venv markers, which the CLI interprets as “Studio not set up.”
Tooling friction on Jetson (uv + venv + pip)
- The combination of uv-managed environments with a system Python and Docker base image that already has a global pip led to situations where:
  - The venv had no pip initially.
  - python -m ensurepip installed pip globally rather than into the venv.
  - The actual pip used to install backend dependencies was the global one, leaving the venv incomplete.
Mismatch with Jetson Containers philosophy
- Jetson Containers and l4t-ml are built around NVIDIA’s optimized PyTorch/TensorRT stacks, while Unsloth Studio’s modern pipeline assumes desktop/server-class Triton and TorchInductor configurations.
- This leads to a mismatch that is non-trivial to reconcile in a maintainable way.

7. Practical Outcomes

Despite the failure to get Unsloth Studio fully operational, the following outcomes were achieved:

A validated GPU-accelerated Unsloth core environment on Jetson:
- unsloth CLI installed and usable.
- PyTorch 2.6.0 with CUDA on Orin working correctly.
A reusable Docker-based ML devbox (local/unsloth-studio:jetson-l4tml-r36.4.0) with:
- A clear persistent directory layout (~/unsloth).
- Host networking and shared volumes suitable for integration with other Jetson Containers (e.g., llama.cpp, vLLM, NanoLLM, llama-factory).
Empirical evidence that, as of this experiment, Unsloth Studio is not yet a drop-in web UI solution for Jetson AGX Orin, due to:
- Triton/TorchInductor/TorchAo assumptions, and
- Strong coupling to the install.sh-managed environment.

8. Recommendations for Jetson Practitioners

For current Jetson AGX Orin users:

Use Unsloth core selectively
- Unsloth’s Python API and CLI can still be valuable for fine-tuning/export workflows that do not rely heavily on Triton/TorchInductor-specific optimizations.
- Prefer using the Jetson-optimized PyTorch from l4t-ml and be cautious with features that depend on TorchInductor/Triton.
Rely on Jetson Containers for serving and fine-tuning
- For serving and fine-tuning large models on Jetson, the containers in the Jetson Containers ecosystem (llama.cpp, vLLM, MLC, TensorRT-LLM, NanoLLM, llama-factory) are significantly more mature and better integrated with JetPack and L4T.
Treat Unsloth Studio on Jetson as experimental
- Until there is first-class ARM/Jetson support (or a documented variant of install.sh and Studio’s backend explicitly targeting Jetson), Studio should be considered an experimental integration on this hardware.

9. Suggestions for the Unsloth Team

Based on this experience, the following changes would materially improve the viability of Unsloth Studio on Jetson and similar edge platforms:

Documented “headless / no-Triton” mode
- A configuration profile that can disable or bypass TorchInductor/Triton/TorchAo, relying purely on standard PyTorch kernels when running on unsupported architectures such as Jetson.
Explicit ARM/Jetson support statement and checks
- Clear statements in the documentation regarding ARM/aarch64 support status, with runtime checks that either:
  - Enable a safe, reduced feature set, or
  - Fail fast with a clear, actionable message.
Studio installation mode for preexisting Python stacks
- A variant of install.sh or studio setup that:
  - Can attach to an existing PyTorch environment (e.g., Jetson’s l4t-ml), and
  - Creates only the additional Studio-specific venv/backend/frontend without attempting to reconfigure PyTorch or Triton.
Minimal dependency profile for Studio backend
- A smaller “core backend” dependency set for Studio that avoids complex quantization stacks and heavy compiler integrations when running in constrained or embedded environments.
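
As an illustration of the runtime check suggested above, the sketch below selects a feature profile from the CPU architecture and either degrades gracefully or fails fast. It is purely hypothetical: select_profile and the "full" and "reduced" profile names are not part of Unsloth, and a real check would likely also need to inspect the installed PyTorch and Triton builds.

```python
import platform

# Architecture strings reported for 64-bit ARM (Linux and macOS).
ARM_MACHINES = {"aarch64", "arm64"}


def select_profile(machine, allow_reduced=True):
    """Hypothetical policy: pick a feature profile for an architecture.

    Returns "full" on non-ARM machines and "reduced" on ARM when
    allow_reduced is true; otherwise fails fast with a clear message.
    """
    if machine.lower() not in ARM_MACHINES:
        return "full"
    if allow_reduced:
        return "reduced"
    raise RuntimeError(
        f"Architecture '{machine}' is not supported by the full feature "
        "set; rerun with the reduced profile enabled."
    )


if __name__ == "__main__":
    print(select_profile(platform.machine()))
```

The point of the design is that the decision is taken once, at startup, with an actionable message, instead of surfacing later as an import error deep in the stack.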

10. Conclusion

The experiment demonstrates that:

Installing Unsloth core on Jetson AGX Orin via a Dockerized l4t-ml base image is feasible, and the resulting environment is usable for GPU-accelerated LLM workflows.
However, enabling Unsloth Studio—the full web UI for training and serving—on Jetson currently encounters significant hurdles due to the interaction between Triton/TorchInductor, TorchAo, uv-managed venvs, and the assumptions baked into install.sh.

From a practical standpoint, Jetson users are better served today by combining Unsloth core (where useful) with the existing Jetson Containers ecosystem, while treating Unsloth Studio as an experimental component on this hardware.

From a community and engineering perspective, this experiment highlights concrete areas where incremental changes and documentation from the Unsloth team could unlock a powerful edge deployment story on Jetson-class devices.
