Introduction
YAML (YAML Ain’t Markup Language) is a human-readable data serialization language designed for configuration and data interchange. Unlike XML or JSON, YAML uses minimal syntax and indentation (spaces) to represent structure, making it easy for humans to read and write.
Developers encounter YAML all the time: it’s the backbone of many DevOps tools (Kubernetes manifests, Docker Compose files, Ansible playbooks, CI/CD pipelines, GitHub Actions workflows, etc.).
For instance, GitHub Actions workflows are defined in YAML files (in .github/workflows/), since “YAML is a markup language that’s commonly used for configuration files”. Because of its ubiquity in modern tooling and its focus on readability, understanding YAML is invaluable for developers.
Basic Syntax and Formatting
YAML’s syntax is defined by a few simple rules:
- Indentation: Use spaces (not tabs) to denote nesting. Two spaces per level is common. For example, the keys under person: are indented to show they belong to that mapping.
- Key–Value Pairs: Write mappings as key: value, with a space after the colon. Keys are usually alphanumeric (use quotes only if needed).
- Lists: Start list items with a hyphen (-). Each - begins a new element in the sequence.
- Comments: Precede comments with #. Anything on a line after # is ignored by YAML parsers.
For example:
# Basic YAML example
person:
  name: John Doe
  age: 30
  skills:
    - Python
    - YAML
Here person is a mapping containing keys name, age, and skills. The skills key maps to a list of two items (denoted by -).
Note there are no tabs, and indentation is consistent (2 spaces per level). Tabs used for indentation cause parse errors, and inconsistent indentation either causes a parse error or silently changes the structure.
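To make the effect of indentation concrete, here is a small sketch. It reuses the keys from the example above; the top-level keys nested and flat are illustrative names only. The same keys produce different structures depending purely on how far skills is indented:

```yaml
# skills is indented under person, so it belongs to the person mapping
nested:
  person:
    name: John Doe
    skills:
      - Python

# skills sits at the same level as person, so it is a sibling key instead
flat:
  person:
    name: John Doe
  skills:
    - Python
```

Both variants parse without error, which is why a misplaced indent can go unnoticed until a tool reads the wrong value.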
Core Data Types
YAML natively supports several core data types:
- Scalars: These include strings, integers, floats, booleans, and nulls. YAML usually auto-detects types:
  - Strings: Plain (unquoted) strings (e.g. title: Hello), or quoted with " or ' for special characters. Escape sequences (like \n) work in double-quoted strings.
  - Numbers: Integers (age: 42) and floats (pi: 3.14159) are written without quotes.
  - Booleans: Represented as true/false (lowercase).
  - Null: Use null or ~ to denote a null value.
- Sequences (Lists): Ordered collections denoted by - entries. E.g.:
fruits:
  - Apple
  - Banana
  - Cherry
This creates a list of three strings.
- Mappings (Hashes/Dictionaries): Unordered key-value pairs. E.g.:
database:
  host: db.example.com
  port: 5432
  enabled: true
This creates a map with three keys. (YAML maps are called “associative arrays” in some docs.)
Example combining types:
app:
  name: MyApp
  version: 1.0
  active: true
  description: "A sample YAML file"
  nullable_field: ~
  tags:
    - backend
    - production
  limits:
    cpu: 2
    memory: 512
In the above, active is a boolean, nullable_field is null (~), and tags is a list. Note that version: 1.0 is read as a float, not a string. YAML’s loose typing means you often don’t need quotes: name: Hello is fine. But be careful: unquoted values like yes, no, on, off are interpreted as booleans by YAML 1.1 parsers (PyYAML, for example), whereas the YAML 1.2 core schema treats them as plain strings. If needed, force a type with explicit tags (e.g. !!str before a value), though this is rare in typical configs.
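The typing rules above can be sketched as follows. All key names here are illustrative, and the comments describe what a YAML 1.1 parser would typically produce; a YAML 1.2 parser may differ on the unquoted lines:

```yaml
# Unquoted: a YAML 1.1 parser reads these as booleans
feature_enabled: yes
debug: off

# Quoted: always strings, regardless of parser version
answer: "yes"
power_state: 'off'

# Explicit tag: forces the string type without quotes
country: !!str no

# Quoting also keeps a version number from being read as a float
version: "1.0"
```

Quoting is usually the simpler choice; the !!str tag is mainly useful when a tool generates the file and cannot easily add quotes.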
Advanced Features
Beyond basic values, YAML provides powerful features:
- Anchors & Aliases: Use &anchorName to mark a node, and *anchorName to reference it elsewhere. This avoids repetition. For example:
defaults: &default_settings
  retries: 3
  timeout: 30
service1:
  <<: *default_settings
  host: example.com
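Here &default_settings anchors a mapping. service1 then uses <<: *default_settings to merge those key-values (retries, timeout) into service1. This keeps large YAML files DRY (Don’t Repeat Yourself). Note that the << merge key is a YAML 1.1 extension rather than part of the YAML 1.2 core specification, so parser support varies.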
- Multi-line strings (Block Scalars): Use | (literal) or > (folded) for multi-line text:
note: |
  This is a multiline
  string in literal style.
summary: >
  This is a folded
  style multiline string.
The | style preserves line breaks; the > style folds single line breaks into spaces. (In the example above, summary becomes the single line “This is a folded style multiline string.” followed by one trailing newline.)
- Complex/Nested Structures: YAML can express deeply nested data. For example, lists of maps:
services:
  - name: web
    replicas: 2
    ports:
      - containerPort: 80
  - name: db
    replicas: 1
    ports:
      - containerPort: 5432
Here services is a list of two mappings. Each mapping can have its own keys and further nesting. YAML lets you mix mappings and sequences arbitrarily to represent complex hierarchies.
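Two details of the features above are easy to miss: keys set locally override merged ones, and block scalars accept a chomping indicator that controls the trailing newline. A minimal sketch, assuming a parser that supports the << merge key; service2, admin_email, and the other key names are hypothetical:

```yaml
defaults: &default_settings
  retries: 3
  timeout: 30

service2:
  <<: *default_settings   # merge key: copies retries and timeout
  timeout: 60             # a key set locally takes precedence over the merged one
  host: example.org

# An alias can also stand in for a whole value, with no merge involved
admin_email: &admin admin@example.com
alert_recipient: *admin

# Chomping indicators control the trailing newline of a block scalar
keep_default: |           # value ends with one newline
  one line
strip_newline: |-         # value has no trailing newline
  one line
folded_strip: >-          # folded into "folded into a single line", no newline
  folded into
  a single line
```

The stripped forms (|- and >-) are worth considering when the value is later compared against or concatenated with other strings, where a stray newline can cause subtle mismatches.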
Block vs Flow Styles and Multi-Document Files
By default YAML uses block style (indentation) for clarity. However, it also supports a more compact flow style (JSON-like) for collections:
- Block style lists/mappings: Each item on its own line, indented.
colors:
  - red
  - green
  - blue
user:
  name: Alice
  age: 30
- Flow style: Enclose lists in [ ] and maps in { }, separating items with commas. For example:
colors: [red, green, blue]
user: { name: "Alice", age: 30 }
Flow style is valid YAML and mirrors JSON syntax. It can make short lists/maps more compact. Generally, block style is preferred for readability, but flow style can be handy for inline or short lists.
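The two styles can also be combined in one document. The sketch below uses made-up keys (build, os, env, build_flow) to show a block mapping with short flow-style values, followed by the same data written entirely in flow style:

```yaml
# Block mapping whose short values use flow style
build:
  os: [ubuntu, macos, windows]
  env: { NODE_ENV: test, CI: "true" }

# The same data written entirely in flow style is close to JSON
build_flow: { "os": ["ubuntu", "macos", "windows"], "env": { "NODE_ENV": "test", "CI": "true" } }
```

A reasonable rule of thumb is to keep flow style to collections that fit comfortably on one line and switch to block style once they wrap.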
YAML also supports multiple documents in one file. Separate documents with ---. You can end a document with ... (though it’s optional). For example:
# Document 1
---
name: Document1
value: 123
# Document 2
---
name: Document2
value: 456
...
Each --- starts a new YAML document (often used in Kubernetes multi-resource files, CI pipelines, etc.).
Real-World Usage Examples
GitHub Actions Workflows
GitHub Actions CI/CD pipelines are defined by YAML files. Each workflow (in .github/workflows/*.yml) specifies triggers and jobs. For example:
name: CI Pipeline
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "Building the project..."
      - run: make test
This workflow runs on every push. The YAML fields (name, on, jobs, etc.) are specific to GitHub Actions, but the syntax is pure YAML. Notice how steps under a job is a list, with each item either using an action (uses) or running a command (run). Any YAML error (wrong indent, missing colon) here will prevent the workflow from loading.
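The on key accepts a mapping as well as a list, which is where nesting starts to matter. A hedged sketch of the same workflow with the trigger written in block style; the branch name main and the step name Run tests are assumptions for illustration:

```yaml
name: CI Pipeline
on:
  push:
    branches: [main]      # assumed branch name
  pull_request:           # empty value: no filters configured for this event
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run tests   # illustrative step name
        run: make test
```

Here push and pull_request are keys of a mapping rather than items of a list, so each one can carry its own nested options.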
Kubernetes Configuration
Kubernetes uses YAML for all its resource definitions (Pods, Deployments, Services, etc.). Here’s a snippet of a Deployment manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
        - name: example-container
          image: example-image
          ports:
            - containerPort: 8080
This defines a Deployment (kind: Deployment) with 3 replicas of a pod whose container, example-container, runs the image example-image. Notice the keys apiVersion, kind, metadata, spec: these are Kubernetes conventions, but the YAML syntax (indentation, lists, maps) follows the rules above. Kubernetes will reject the file if the YAML structure is wrong.
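Because one file can hold several documents, a Deployment is often accompanied by a Service in the same manifest. A minimal sketch, assuming a hypothetical Service named example-service that forwards port 80 to the container port used above:

```yaml
# ... the Deployment document shown above would come first ...
---
apiVersion: v1
kind: Service
metadata:
  name: example-service   # hypothetical name
spec:
  selector:
    app: example          # matches the pod labels set in the Deployment
  ports:
    - port: 80
      targetPort: 8080    # the containerPort of example-container
```

The --- line is what lets a single file carry both resources; without it the second apiVersion key would collide with the first as a duplicate key.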
Docker Compose
Docker Compose uses a docker-compose.yml file (YAML) to define multi-container applications. Example:
version: '3'
services:
  web:
    image: nginx:latest
    ports:
      - "8080:80"
  db:
    image: postgres:13
    environment:
      POSTGRES_USER: example
      POSTGRES_DB: exampledb
Here services is a mapping of service names (web, db) to their configurations. This example (from Docker’s docs) shows two services, each with an image and other settings. Again, correct indentation and syntax are crucial: e.g., ports is a list, so its elements are prefixed with -.
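Compose files are a common place to apply anchors and aliases. The sketch below assumes a Compose version that supports top-level extension fields (keys prefixed with x-); the x-logging name, the anchor name, and the logging values are illustrative choices rather than anything taken from the example above:

```yaml
# Extension field (x- prefix) holding settings shared by both services
x-logging: &default_logging
  driver: json-file
  options:
    max-size: "10m"

services:
  web:
    image: nginx:latest
    logging: *default_logging
  db:
    image: postgres:13
    logging: *default_logging
```

Changing the log settings then means editing one mapping instead of every service.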
Ansible Playbooks
Ansible playbooks (automation tasks) are written in YAML. A simple playbook might look like:
- name: Update web servers
  hosts: webservers
  tasks:
    - name: Install Apache
      ansible.builtin.yum:
        name: httpd
        state: latest
    - name: Copy config
      ansible.builtin.copy:
        src: httpd.conf
        dest: /etc/httpd.conf
This defines a play (notice the leading - at top level) that targets the webservers group. Under tasks, each task is a mapping with a name and the module to run (ansible.builtin.yum, ansible.builtin.copy, etc.) with its arguments. Incorrect YAML (like wrong indent before - name: Install Apache) will cause Ansible to fail parsing the playbook.
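One YAML-specific trap in playbooks is templated values. A value that begins with {{ has to be quoted, because an unquoted leading { starts a flow mapping. A short sketch, where config_dir is a hypothetical variable introduced only for this example:

```yaml
- name: Update web servers
  hosts: webservers
  vars:
    config_dir: /etc/httpd      # hypothetical variable
  tasks:
    - name: Copy config
      ansible.builtin.copy:
        src: httpd.conf
        # Quoted because an unquoted value starting with { is read as a flow mapping
        dest: "{{ config_dir }}/httpd.conf"
```

When the template expression appears later in the value rather than at its start, the quotes are not strictly required, though quoting consistently is easier to review.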
YAML Example
# String
name: "John Doe"
# Integer
age: 30
# Boolean
is_active: true
# Null
address: null
# List (Array)
languages:
  - English
  - Spanish
  - French
# Map (Key-Value pairs)
contact:
  email: "johndoe@example.com"
  phone: "+1234567890"
# Nested Map
company:
  name: "Tech Innovators Inc."
  address:
    street: "123 Tech Avenue"
    city: "Innovapolis"
    country: "Techland"
# List of Maps
employees:
  - name: "Alice"
    role: "Developer"
  - name: "Bob"
    role: "Designer"
  - name: "Charlie"
    role: "Manager"
# Mixed types (Map with List)
project:
  name: "YAML Parser"
  status: "In Progress"
  team:
    - Alice
    - Bob
    - Charlie
  milestones:
    - "Design"
    - "Development"
    - "Testing"
Best Practices
Writing clean, maintainable YAML is important. Some recommended practices include:
- Consistent naming: Use clear, descriptive keys in a uniform style (e.g. snake_case or camelCase). Avoid cryptic abbreviations. Consistency helps readability.
- Indentation: Always use spaces (no tabs) for indentation. Standardize on 2 spaces per level. Consistent indentation prevents hard-to-find errors.
- Avoid unnecessary complexity: Do not mix flow and block styles in confusing ways. Stick to block style for large or nested structures for clarity.
- Use anchors/aliases wisely: If several sections share identical settings, define them once with an anchor and reuse with aliases. This DRY approach reduces errors when updating common values.
- Comments: Add comments (#) to explain non-obvious configurations or reasons for certain values. Comments are ignored by parsers but are invaluable for human readers.
- Validation: Use YAML linters (e.g., yamllint) or editor plugins to enforce style rules. These tools can catch indentation errors, duplicate keys, and more. Also, many systems have a “test config” mode (like kubectl apply --dry-run=client or ansible-playbook --syntax-check) to verify YAML before running.
- Quoting: When in doubt, quote strings that include special characters (:, @, spaces) or begin with YAML-sensitive words. For example, wrap regex patterns or file paths in quotes to prevent parsing issues.
- Schema awareness: Know the expected schema of your YAML (e.g. Kubernetes API spec). Some values must be in quotes or a specific format. Follow official style guides when available (e.g., Home Assistant forbids flow style).
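Several of these practices can be enforced automatically. As a rough sketch, a yamllint configuration along these lines would check indentation, boolean spelling, and duplicate keys; the file location and the 120-character limit are assumptions, not recommendations from the text above:

```yaml
# .yamllint (assumed to sit at the repository root)
extends: default

rules:
  indentation:
    spaces: 2                           # enforce two spaces per level
  line-length:
    max: 120                            # assumed limit; the default is 80
  truthy:
    allowed-values: ['true', 'false']   # flag yes/no/on/off used as values
  key-duplicates: enable
```

Running yamllint . from the same directory would then apply these rules to every YAML file it finds.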
Common Pitfalls and Debugging
Even simple mistakes can break YAML. Watch out for:
- Tabs vs Spaces: Using a tab instead of spaces for indentation will cause a parsing error.
- Indentation errors: Misaligned keys (e.g. indented one level too far) will confuse the structure. A common error is “mapping values are not allowed” when you forget a dash or colon.
- Missing dashes: Forgetting the - before list items can turn what should be a list into a mapping key.
- Wrong quoting: A colon followed by a space inside an unquoted string, or a leading special character, can break the file. Backslashes need care too: inside double quotes they start escape sequences, so path: C:\Users is safest written as path: "C:\\Users" or path: 'C:\Users'.
- Boolean and null confusion: YAML 1.1 parsers treat unquoted yes, no, on, off as booleans alongside true/false, so an unquoted yes becomes boolean true. Likewise, unquoted null, ~, and empty values become null. If you need the literal string, quote it ("yes").
- Leading zeros: A number like 09 or 010 without quotes may be parsed as an octal integer, a decimal integer, or a string, depending on the parser and YAML version. Quoting such values avoids misinterpretation.
- Duplicate keys: YAML forbids duplicate keys in the same mapping (some parsers allow it, but behavior is undefined). Double-check for typos.
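Several of the pitfalls above come down to the same fix. The sketch below pairs risky unquoted values with quoted equivalents; the keys risky and safer, and the values themselves, are made up for illustration, and the comments describe typical rather than guaranteed parser behavior:

```yaml
# Risky: the parsed type can depend on the parser and YAML version
risky:
  country: NO        # a YAML 1.1 parser may read this as boolean false
  version: 1.10      # read as the float 1.1, so the trailing zero is lost
  zip_code: 01234    # a YAML 1.1 parser may read this as an octal integer
  path: "C:\new"     # \n inside double quotes is a newline escape

# Safer: quoting keeps each value a literal string
safer:
  country: "NO"
  version: "1.10"
  zip_code: "01234"
  path: 'C:\new'     # single quotes take backslashes literally
```

None of the risky lines is a syntax error, so a linter or a round trip through the consuming tool is often the only way to spot them.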
To debug YAML issues:
- Use online validators or command-line tools (e.g. python -c 'import yaml,sys; yaml.safe_load(sys.stdin)' < file.yaml, which requires PyYAML).
- Many editors (VS Code, IntelliJ, etc.) have YAML support that highlights syntax errors.
- Read error messages carefully – they usually include the line and column of the problem.
- Compare to a working example or schema if available.
By following these guidelines and learning from examples, you can write robust, readable YAML for any use case.