Forem: Aayush Gid

Building a Production-Grade Tool Access Control Guardrail for LLM Agents

Aayush Gid — Tue, 09 Dec 2025 11:58:58 +0000

A Technical Breakdown with Code, Algorithms, and Internal Workflows

Modern AI agents increasingly act as autonomous operators inside real systems: querying databases, sending emails, initiating financial operations, retrieving secrets, orchestrating workflows… and that means they must obey security boundaries just like any human engineer.

This article breaks down a tool access control guardrail that goes well beyond a simple “if/else allow/deny” check.
The system combines:

Zero-trust principles
Capability-based access control
Cryptographic verification
Context-aware decision logic
Rate limiting
Anomaly detection
Immutable audit logs
Human-in-the-loop approval

High-Level Architecture

1. Tool Access Policy (TAP): The Source of Truth

Every tool in the system is defined by a ToolPolicy object.
This defines:

Sensitivity level
Allowed agent roles
Required identity verification
Rate limits
Allowed environments
Optional geo restrictions
Whether human approval is required
Input sanitization or output redaction flags
Custom validators

Sample Policy Registration

policy.register_tool(ToolPolicy(
    tool_name="finance.transfer",
    sensitivity=ToolSensitivity.SENSITIVE_WRITE,
    allowed_roles={AgentRole.ORCHESTRATOR, AgentRole.ADMIN},
    required_identity_strength=IdentityStrength.MFA_VERIFIED,
    requires_approval=True,
    approval_type="multi",
    max_invocations_per_hour=10,
    input_sanitization_required=True,
    audit_required=True
))

This immediately gives you a mental map:

If the tool handles money or secrets → strict permissions, approval required, logs enforced.

2. Agent Identity: Strong, Tiered Trust

Each agent is authenticated & classified through an identity object:

@dataclass
class AgentIdentity:
    agent_id: str
    agent_type: PrincipalType
    agent_role: AgentRole
    identity_strength: IdentityStrength
    attestation_signature: Optional[str]

A trust score is generated:

def get_trust_score(self):
    base = strength_scores[self.identity_strength]
    if self.attestation_signature:
        base += 0.1
    return min(1.0, base)

Agents with low identity strength show up as high-risk later in the anomaly detection pipeline.
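To make the scoring concrete, here is a minimal runnable sketch. The article only shows the lookup and the 0.1 attestation bonus, so the `IdentityStrength` members other than `MFA_VERIFIED` and every numeric value in `STRENGTH_SCORES` are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class IdentityStrength(Enum):
    # Only MFA_VERIFIED appears in the article; the other members are hypothetical.
    ANONYMOUS = "anonymous"
    API_KEY = "api_key"
    MFA_VERIFIED = "mfa_verified"


# Assumed base scores; tune these for your own deployment.
STRENGTH_SCORES = {
    IdentityStrength.ANONYMOUS: 0.1,
    IdentityStrength.API_KEY: 0.5,
    IdentityStrength.MFA_VERIFIED: 0.95,
}


@dataclass
class AgentIdentity:
    agent_id: str
    identity_strength: IdentityStrength
    attestation_signature: Optional[str] = None

    def get_trust_score(self) -> float:
        base = STRENGTH_SCORES[self.identity_strength]
        if self.attestation_signature:
            base += 0.1  # bonus for a signed attestation
        return min(1.0, base)  # never exceed full trust


weak = AgentIdentity("agent_a", IdentityStrength.API_KEY)
strong = AgentIdentity("agent_b", IdentityStrength.MFA_VERIFIED, attestation_signature="sig")
```

With these assumed scores, the API-key agent lands at 0.5 while the attested MFA agent is capped at 1.0.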

3. Capability Tokens - Cryptographic, Time-Bound Permission Slips

A capability token is tied to:

a specific tool
specific allowed actions
specific constraints
expiration timestamp
a cryptographic signature

Example generation:

token = CapabilityToken(
    token_id=uuid4().hex,
    agent_id=agent_id,
    tool_name=tool_name,
    allowed_actions=[ToolAction.READ],
    constraints={"max_rows": 100},
    issued_at=now,
    expires_at=now + timedelta(hours=1)
)
# payload is a canonical serialization of the token fields above
token.signature = hashlib.sha256(f"{payload}:{signing_key}".encode()).hexdigest()

This ensures:

Tokens can’t be forged
Tokens can’t be reused outside validity window
Tokens can’t be used on the wrong tool

Pseudocode validation:

if token.expired → deny
if token.tool_name != requested_tool → deny
if signature != sha256(payload + ":" + signing_key) → deny
if any constraint violated → deny
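The generation snippet above hashes the payload and key with plain SHA-256. A common hardening step is to use a keyed MAC instead, so the sketch below swaps in HMAC-SHA256 from Python's standard library. This is a substitution, not the article's implementation: the payload serialization, the `sign`/`verify` method shapes, and the demo key are all assumptions, and constraint checking is left out for brevity.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from uuid import uuid4


@dataclass
class CapabilityToken:
    agent_id: str
    tool_name: str
    allowed_actions: list
    issued_at: datetime
    expires_at: datetime
    token_id: str = field(default_factory=lambda: uuid4().hex)
    signature: str = ""

    def payload(self) -> bytes:
        # Canonical serialization (an assumption): sorted keys, ISO timestamps.
        body = {
            "token_id": self.token_id,
            "agent_id": self.agent_id,
            "tool_name": self.tool_name,
            "allowed_actions": sorted(self.allowed_actions),
            "issued_at": self.issued_at.isoformat(),
            "expires_at": self.expires_at.isoformat(),
        }
        return json.dumps(body, sort_keys=True).encode()

    def sign(self, key: bytes) -> None:
        self.signature = hmac.new(key, self.payload(), hashlib.sha256).hexdigest()

    def verify(self, key: bytes, requested_tool: str, now: datetime) -> bool:
        if now >= self.expires_at:
            return False  # outside the validity window
        if self.tool_name != requested_tool:
            return False  # token was issued for a different tool
        expected = hmac.new(key, self.payload(), hashlib.sha256).hexdigest()
        # Constant-time comparison avoids leaking how many characters matched.
        return hmac.compare_digest(expected, self.signature)


key = b"demo-signing-key"  # hypothetical; load from a secret store in practice
now = datetime.now(timezone.utc)
token = CapabilityToken("agent_x", "db.query", ["read"], now, now + timedelta(hours=1))
token.sign(key)
```

Because the tool name is part of the signed payload, editing `tool_name` on an issued token invalidates the signature rather than retargeting the token.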

4. Runtime Context: Where Stateful Intelligence Lives

Runtime context includes:

recent tool calls
rate limit counters
user verification
environment (dev/staging/prod)
geo location
device fingerprint
IP address
risk score

Example:

runtime = RuntimeContext(
    session_id="xyz",
    user_identity="user_123",
    user_verified=True,
    environment="production",
    geo_location="US"
)

This enables contextual rule enforcement:

Tool allowed in dev but not in prod
Tool allowed only for US traffic
User not verified → downgrade trust

5. Tool Call Workflow (End-to-End)

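Each tool call passes through the same sequence of checks:

Validate agent identity and runtime context
Verify the capability token
Run anomaly detection (quarantine if risk exceeds the threshold)
Enforce rate limits
Evaluate the tool policy (TAP)
Create an approval request if one is required
Write an audit entry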

6. Anomaly Detection Engine

Risk score combines:

(A) Low-trust identity → higher risk

risk += (1 - trust_score) * 0.3

(B) Tool sensitivity

Sensitive tools automatically raise risk:

sensitivity_risk = {
    PUBLIC_READ: 0.0,
    INTERNAL_WRITE: 0.3,
    SENSITIVE_WRITE: 0.6,
    PRIVILEGED_ADMIN: 0.8
}

(C) Behavioral anomalies

Excessive repeated calls
Too many unique tools in a burst
Suspicious arguments (SQLi, JS, eval patterns)

if suspicious_args(tool_args):
    risk += 0.1

If final score > threshold → quarantine
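Putting the three components together, a risk calculation might look like the sketch below. The weights come from the snippets above; adding the sensitivity value directly to the score, the small pattern list, and the 0.8 quarantine threshold are assumptions for illustration.

```python
import re

# Sensitivity weights from the article, keyed by plain strings for brevity.
SENSITIVITY_RISK = {
    "PUBLIC_READ": 0.0,
    "INTERNAL_WRITE": 0.3,
    "SENSITIVE_WRITE": 0.6,
    "PRIVILEGED_ADMIN": 0.8,
}

# Hypothetical, deliberately small pattern list (SQLi, script tags, eval).
SUSPICIOUS_PATTERNS = [
    re.compile(r"(?i)\bunion\s+select\b"),
    re.compile(r"(?i)<script\b"),
    re.compile(r"\beval\s*\("),
]

QUARANTINE_THRESHOLD = 0.8  # assumed value


def suspicious_args(tool_args: dict) -> bool:
    text = " ".join(str(value) for value in tool_args.values())
    return any(pattern.search(text) for pattern in SUSPICIOUS_PATTERNS)


def calculate_risk(trust_score: float, sensitivity: str, tool_args: dict) -> float:
    risk = (1 - trust_score) * 0.3          # (A) low-trust identity
    risk += SENSITIVITY_RISK[sensitivity]   # (B) tool sensitivity
    if suspicious_args(tool_args):          # (C) behavioral signal
        risk += 0.1
    return min(1.0, risk)
```

A half-trusted agent calling a `SENSITIVE_WRITE` tool with SQL-injection-like arguments scores about 0.85 here, which would exceed the assumed threshold and be quarantined.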

7. Rate Limiting

A simple but effective mechanism:

rate_limit_counters[(agent, tool)] = timestamps[]

Every request:

remove timestamps older than 1 hour
if count >= policy.max → deny
else → append timestamp

This protects against runaway loops & spammy agents.
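The steps above map directly onto a sliding-window limiter. The sketch below is one possible shape; the class name and the injectable `now` parameter (handy for testing) are assumptions.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowRateLimiter:
    def __init__(self, window_seconds: float = 3600.0):
        self.window = window_seconds
        # (agent_id, tool_name) -> timestamps of recent allowed calls
        self._calls = defaultdict(deque)

    def allow(self, agent_id: str, tool_name: str, max_calls: int,
              now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        calls = self._calls[(agent_id, tool_name)]
        # Drop timestamps that have aged out of the window.
        while calls and calls[0] <= now - self.window:
            calls.popleft()
        if len(calls) >= max_calls:
            return False  # over the limit: deny without recording
        calls.append(now)
        return True


limiter = SlidingWindowRateLimiter()
```

Denied calls are not recorded, so an agent stuck in a retry loop does not extend its own lockout.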

8. Approval System (Human-in-the-Loop)

Most production systems need humans to approve critical actions:

finance tools
secret retrieval
privileged admin tasks

Approval object:

ApprovalRequest(
    request_id="abcd1234",
    tool_name="finance.transfer",
    agent_id="agent_x",
    reason="Tool requires multi approval",
    risk_score=0.92
)

Workflow:

Guardrail detects that approval is needed
Create request
Return “awaiting approval”

9. Immutable Audit Trail

Every tool call — successful, denied, quarantined — is logged:

AuditEntry(
    agent_id, tool_name, decision, reason,
    tool_args_hash, context_snapshot, metadata
)

Arguments are hashed so:

sensitive data isn’t stored
but auditors can still compare hashes

This helps support compliance audits (SOC 2, ISO, etc.).
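A small sketch of argument hashing follows. Canonical JSON keeps the hash stable regardless of key order. The hash chain linking each entry to the previous one is an assumption about how "immutable" could be approximated in application code; it is not confirmed by the article.

```python
import hashlib
import json


def hash_args(tool_args: dict) -> str:
    # Canonical form: sorted keys, no whitespace, so equal args hash equally.
    canonical = json.dumps(tool_args, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


def _digest(entry: dict) -> str:
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()


class AuditLog:
    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    def append(self, agent_id, tool_name, decision, reason, tool_args) -> dict:
        entry = {
            "agent_id": agent_id,
            "tool_name": tool_name,
            "decision": decision,
            "reason": reason,
            "tool_args_hash": hash_args(tool_args),
            "prev_hash": self.entries[-1]["entry_hash"] if self.entries else self.GENESIS,
        }
        entry["entry_hash"] = _digest(entry)  # computed before the key is added
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        prev = self.GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "entry_hash"}
            if entry["prev_hash"] != prev or entry["entry_hash"] != _digest(body):
                return False  # an entry was altered, removed, or reordered
            prev = entry["entry_hash"]
        return True
```

One caveat worth keeping in mind: an unkeyed hash of low-entropy arguments (for example a four-digit amount) can be guessed by brute force, so a keyed hash may be preferable for sensitive fields.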

10. The Core Algorithm: check_tool_call()

Here is a high-level version of the real function:

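def check_tool_call(tool, args, ctx):

    def finish(decision, reason):
        # 7. Log every outcome, including denials and quarantines
        audit_log(agent, tool, args, decision, reason)
        return decision

    # 1. Validate identity & context
    if not agent_identity:
        return finish(DENY, "missing agent identity")

    # 2. Verify capability token signature
    if not capability.verify(signing_key):
        return finish(DENY, "invalid capability token")

    # 3. Run anomaly detection
    risk = calculate_risk(agent, tool, args)
    if risk > threshold:
        return finish(QUARANTINE, "risk above threshold")

    # 4. Enforce rate limits
    if exceeded_rate_limit(agent, tool):
        return finish(DENY, "rate limit exceeded")

    # 5. Policy evaluation (TAP)
    decision, reason = policy.evaluate(...)

    # 6. Handle approval workflows
    if decision == REQUIRE_APPROVAL:
        create_approval_request(...)
        finish(decision, reason)
        return "awaiting approval"

    return finish(decision, reason)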

This is the “guardian” for every tool call.

11. Dependency Graph

The guardrail is composed of the following components:

ToolAccessControlGuardrail
│
├── ToolAccessPolicy
│     ├── ToolPolicy
│     └── Global Rules
│
├── ApprovalSystem
│
├── AuditLogger
│
├── CapabilityToken
│
└── RuntimeContext

This modular structure enables:

swapping components
customizing policy behavior
integrating external approval systems
plugging into enterprise security infrastructure

12. Why This Guardrail Model Scales in Production

It solves real-world concerns:

Prevents privilege escalation
Prevents prompt-induced dangerous actions
Controls tool surface area
Enforces least-privilege
Provides visibility & traceability
Aligns with security frameworks (zero trust, NIST RMF)
Enables human approval for sensitive tasks
Handles noisy or misbehaving agents gracefully

This is not a toy guardrail — it is an enterprise-ready security layer.

Closing Thoughts

LLM agents are becoming more autonomous every month.
This system ensures they stay safe, predictable, and accountable.

The combination of:

strong cryptographic identity
capability tokens
context-aware policies
anomaly detection
audit logging
human oversight

gives you a security architecture that can actually withstand real-world failures, attacks, and unpredictable LLM behavior.

GitHub link: https://github.com/aayush598/agnoguard/blob/main/src/agnoguard/guardrails/tool_access_control.py

The LLM Shield: How to Build Production-Grade NSFW Guardrails for AI Agents

Aayush Gid — Sat, 06 Dec 2025 10:42:06 +0000

Content moderation is one of the most critical yet challenging aspects of building AI applications. As developers, we're tasked with creating systems that can understand context, detect harmful content, and make nuanced decisions—all while maintaining a positive user experience. Today, I want to share insights from building a production-grade NSFW detection system that goes beyond simple keyword blocking.

Why Simple Keyword Filtering Isn't Enough

When I first started working on content moderation, I thought a simple blocklist would suffice. Flag a few explicit words, block them, and call it a day. Reality quickly proved me wrong.

Users are creative. They use character substitutions ("s3x"), deliberate spacing ("p o r n"), and roleplay scenarios to bypass filters. Meanwhile, legitimate medical and educational content was getting incorrectly flagged. The system needed to be smarter—it needed context awareness.

The Multi-Layered Approach

The solution I developed uses a four-tier severity classification system, inspired by industry standards from organizations like OpenAI and Microsoft. Here's how it breaks down:

Level 0: Allowed Content

This includes medical, educational, and scientific content. Think anatomy textbooks, reproductive health articles, or clinical research papers. The system looks for contextual indicators like "doctor," "diagnosis," "textbook," or "peer-reviewed" to identify this category.

Level 1: Restricted Content

Mature themes that aren't explicitly sexual but may require age verification. This includes content about kissing, attraction, or sexual health education. It's the gray area that needs careful handling.

Level 2: Contextual Content

This is where things get interesting. Terms like "aroused," "seductive," or "naked" can be perfectly appropriate in some contexts (art history, literature analysis) but inappropriate in others. The system analyzes surrounding text to make informed decisions.

Level 3: Critical Content

Explicit sexual content, pornographic material, and sexual violence. This gets blocked immediately, no questions asked. The patterns here are carefully designed to catch both direct language and obfuscated attempts.

Detecting Jailbreak Attempts

One pattern I've seen repeatedly is users trying to bypass filters through roleplay: "Let's pretend we're characters in a story where..." The system specifically watches for roleplay indicators combined with sexual content, treating these as high-risk attempts to circumvent protections.

Handling Obfuscation

Users employ various tricks to evade detection:

Character separation: "p.o.r.n" or "s-e-x"
Deliberate misspellings: "p0rn" or "s3xy"
Leetspeak substitutions: "nak3d" or "h0rny"

Obfuscation Patterns

Character separation:

p[._-]?o[._-]?r[._-]?n

Leetspeak:

p[o0]rn
s[e3]x
h[o0]rny

The obfuscation detector uses regex patterns that account for these variations. It looks for suspicious patterns like excessive punctuation between characters or common number-for-letter substitutions.
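As an illustration, the patterns above can be generated from a list of per-letter character classes. In this sketch the separator class is widened to include whitespace so that spaced-out variants are also caught, and a leading word boundary is added to cut false positives; both choices, and the helper names, are assumptions rather than the author's pattern library.

```python
import re

# Up to two separator characters (whitespace, dot, underscore, hyphen)
# may appear between letters.
SEP = r"[\s._-]{0,2}"


def obfuscated(letter_classes):
    # The leading \b keeps the pattern from firing mid-word, e.g. "this exam".
    return re.compile(r"\b" + SEP.join(letter_classes), re.IGNORECASE)


PATTERNS = {
    "porn": obfuscated(["p", "[o0]", "r", "n"]),
    "sex": obfuscated(["s", "[e3]", "x"]),
}


def find_obfuscated(text: str):
    """Return the sorted names of all patterns that match the text."""
    return sorted(name for name, pattern in PATTERNS.items() if pattern.search(text))
```

Regexes like these are still blunt instruments, which is why the matches feed into the confidence-weighted decision step rather than blocking on their own.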

The Ensemble Decision Engine

Here's where all the pieces come together. When content is analyzed:

Signal Collection: Each detector (explicit content, contextual analysis, obfuscation) generates a signal with a confidence score
Context Modification: The base confidence is adjusted based on context (medical terms present? roleplay detected? user verified?)
Weighted Aggregation: Signals are combined, with critical content getting more weight
Threshold Evaluation: The final decision compares against configurable thresholds

if severity_scores[L3] > 0:
    action = BLOCK
elif severity_scores[L2] > threshold:
    action = WARN or BLOCK   # chosen by the active configuration
elif severity_scores[L1] > threshold:
    action = ALLOW or WARN   # chosen by the active configuration
else:
    action = ALLOW

This ensemble approach is more robust than any single detector. Multiple weak signals can combine to indicate problematic content, while strong contextual indicators can override false positives.
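The four steps can be sketched as a small function. The article does not specify the aggregation formula, so this version simply takes the strongest signal per level; that choice, the 0.5 discount for medical context, and the parameter names are assumptions.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Signal:
    level: int         # severity tier, 0 to 3
    confidence: float  # 0.0 to 1.0


def decide(
    signals: Iterable[Signal],
    threshold: float = 0.7,
    l2_action: str = "BLOCK",
    l1_action: str = "WARN",
    medical_context: bool = False,
) -> str:
    scores = {1: 0.0, 2: 0.0, 3: 0.0}
    for signal in signals:
        if signal.level == 0:
            continue  # allowed content contributes no risk
        confidence = signal.confidence
        if medical_context and signal.level < 3:
            confidence *= 0.5  # assumed discount; never applied to Level 3
        scores[signal.level] = max(scores[signal.level], confidence)

    if scores[3] > 0:
        return "BLOCK"
    if scores[2] > threshold:
        return l2_action
    if scores[1] > threshold:
        return l1_action
    return "ALLOW"
```

The `l2_action` and `l1_action` parameters stand in for the "WARN or BLOCK" and "ALLOW or WARN" branches, which depend on the configured mode.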

Practical Implementation Considerations

Configuration Flexibility

Real-world applications need different strictness levels. The system supports three preset configurations:

Strict Mode: For general audience apps. Blocks Level 1+ content with a low confidence threshold (0.6). Best for platforms accessible to minors.

Age-Verified Mode: For adult platforms with user verification. Allows Level 1 content and requires higher confidence (0.7) before blocking Level 2 content.

Educational Mode: Optimized for academic settings. Only blocks Level 3 critical content and uses a high threshold (0.8) to minimize false positives on legitimate educational material.

Custom Rules

Every application has unique needs. The system allows:

Custom blocklists: Add domain-specific terms that should always block
Custom allowlists: Override detections for known safe terms in your context
Confidence thresholds: Adjust how aggressive the filtering should be

Transparency and Auditability

One crucial aspect often overlooked is transparency. When content is blocked, users deserve to understand why. The system provides detailed metadata:

Severity level and confidence score
Specific signals that triggered detection
Which patterns were matched (without exposing the full pattern library)
Whether the content appears to be in an educational/medical context

This transparency helps with:

User trust: People can understand and potentially appeal decisions
Debugging: Developers can identify false positives
Compliance: Audit trails for regulatory requirements

Future Enhancements

Content moderation is an evolving challenge. Some areas for future development:

Machine Learning Integration: Pattern-based detection has limits. ML models can learn nuanced patterns and adapt to new evasion techniques.

Multi-Language Support: The current system is English-focused. Expanding to other languages requires language-specific patterns and cultural context awareness.

Image and Video: Text is just the beginning. Visual content moderation adds another dimension of complexity.

User Feedback Loop: Allow users to report false positives/negatives, feeding improvements back into the system.

Conclusion

Building effective content moderation requires balancing multiple competing goals: safety, accuracy, user experience, and performance. A multi-layered approach with context awareness provides the flexibility to handle diverse scenarios while maintaining high accuracy.

The key takeaways:

Simple keyword blocking fails in production environments
Context analysis is essential for reducing false positives
Multiple detection signals provide robustness
Configuration flexibility allows adaptation to different use cases
Transparency builds user trust

Content moderation isn't a solved problem—it's an ongoing challenge that requires continuous refinement. But with thoughtful architecture and careful implementation, we can build systems that protect users while respecting legitimate content.

If you're building AI applications with user-generated content, I hope this guide provides a solid foundation for your moderation strategy. The code and patterns discussed here are based on real-world production experience and industry best practices.

GitHub code: https://github.com/aayush598/agnoguard/blob/main/src/agnoguard/guardrails/nsfw_advanced.py

Stay safe, and happy coding!


LLM Guardrails: 50+ Safety Layers Every AI Application Needs

Aayush Gid — Sun, 16 Nov 2025 11:39:55 +0000

In 2024 alone, 68% of enterprises deploying Large Language Models (LLMs) reported security incidents due to inadequate guardrails. If you’re building with LLMs—whether ChatGPT, Claude, Llama, or proprietary models—understanding guardrails isn't optional anymore. It's the difference between a production-ready application and a compliance nightmare waiting to happen.

This comprehensive guide breaks down 50+ guardrails across 8 critical categories. Whether you're a security engineer hardening enterprise AI systems, a developer building your first LLM application, or a compliance officer evaluating AI risks, you’ll find actionable insights here.

What Are LLM Guardrails and Why Do They Matter?

LLM guardrails are safety mechanisms that monitor, filter, and control what goes into and comes out of your AI system. Think of them as security checkpoints at multiple stages of your AI pipeline—validating inputs before they reach the model, intercepting malicious prompt patterns, and sanitizing outputs before they reach users.

Without guardrails, your LLM application is vulnerable to:

Prompt injection attacks that manipulate model behavior
Data leakage exposing sensitive customer information
Jailbreak attempts bypassing safety policies
Compliance violations under GDPR, HIPAA, or industry regulations
Toxic content generation damaging brand reputation
Unauthorized tool access leading to system compromise

The cost of these failures? A single data breach averages $4.45 million, not counting reputational damage and regulatory fines.

The 8 Categories of LLM Guardrails

Here is a list of the critical guardrails organized by category:

1. Input Validation Guardrails

Purpose: Stop malicious, sensitive, or malformed inputs before they reach your LLM. These are your first line of defense.

Critical Input Guardrails	Description	Compliance Relevance
PII Detection (Extended)	Identifies and blocks personally identifiable information (names, addresses, phone numbers, SSNs, credit cards, etc.).	GDPR, CCPA
PHI Awareness	Detects Protected Health Information (medical record numbers, diagnoses, treatment details).	HIPAA
URL and File Blocker	Prevents SSRF attacks, data exfiltration, or malicious file inclusion attempts.	Security
Binary Attachment Blocker	Rejects binary data disguised as text input (payload injection vector).	Security
Secrets in Input Detection	Scans for API keys, passwords, tokens, and other credentials accidentally or maliciously included in prompts.	Security, Logging
Encoding Obfuscation Detection	Identifies attempts to bypass filters using Base64, URL encoding, or Unicode manipulation.	Security
Input Size Limits	Enforces character/token limits to prevent Denial-of-Service (DoS) and context window overflow.	Operational, Cost Control
Dangerous Pattern Detection	Blocks known malicious patterns like SQL injection syntax, shell commands, or script tags.	Security
Regex Filter (Configurable)	Allows custom pattern matching for domain-specific threats.	Domain Security
Language Restriction	Limits inputs to approved languages, preventing multilingual confusion exploits.	Operational
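A few of the input guardrails in the table can be sketched with nothing more than regular expressions. The patterns below are deliberately simple illustrations (an email shape, a US SSN shape, the documented `AKIA` prefix of AWS access key IDs, and a script tag); the 8,000-character limit and the function name are assumptions, and production systems typically pair such checks with dedicated detectors.

```python
import re
from typing import List

MAX_INPUT_CHARS = 8000  # assumed limit; pick one that fits your context window

# Check name -> pattern. Dict order determines the order of reported findings.
CHECKS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "script_tag": re.compile(r"(?i)<script\b"),
}


def validate_input(text: str) -> List[str]:
    """Return the names of all checks the input fails (empty list means clean)."""
    findings = []
    if len(text) > MAX_INPUT_CHARS:
        findings.append("input_too_long")
    findings.extend(name for name, pattern in CHECKS.items() if pattern.search(text))
    return findings
```

Returning every finding, rather than stopping at the first, makes it easier to log which guardrails fired and to apply graduated responses later.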

2. Prompt Injection & Jailbreak Guardrails

Purpose: Detect and block attempts to manipulate the LLM into ignoring safety instructions or performing unauthorized actions.

Prompt Injection Signature Detection: Identifies known injection patterns (e.g., “Ignore previous instructions,” “You are now in developer mode”).
LLM Classifier for Injection: Uses a secondary, smaller LLM to classify whether an input contains injection attempts.
System Prompt Leak Prevention: Blocks attempts to extract your system prompt (e.g., “Repeat the instructions given to you”).
Cross-Context Manipulation Detection: Identifies attempts to mix conversation contexts or inject fake history.
Jailbreak Pattern Recognition: Catches sophisticated techniques like hypothetical scenarios or role-play attacks (“Pretend you’re an AI without restrictions”).
Role-Play Injection Blocker: Targets attempts to make the AI assume unauthorized roles (e.g., “root administrator”).
Override Instruction Detection: Flags any input attempting to modify, disable, or override the AI’s core instructions.
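Signature detection, the first item in the list, is the easiest to sketch. The patterns below are built from the example phrases quoted above; the signature names and function are hypothetical, and a list this small would only ever be one layer alongside a classifier.

```python
import re
from typing import List

# (name, pattern) pairs derived from the example phrases in this section.
INJECTION_SIGNATURES = [
    ("override_instructions",
     re.compile(r"(?i)\bignore\s+(?:all\s+)?(?:previous|prior|above)\s+instructions\b")),
    ("developer_mode",
     re.compile(r"(?i)\byou\s+are\s+now\s+in\s+developer\s+mode\b")),
    ("system_prompt_leak",
     re.compile(r"(?i)\brepeat\s+the\s+instructions\s+given\s+to\s+you\b")),
    ("unrestricted_roleplay",
     re.compile(r"(?i)\bpretend\s+you(?:'|’)?re\s+an\s+ai\s+without\s+restrictions\b")),
]


def detect_injection(text: str) -> List[str]:
    """Return the names of all signatures found in the text."""
    return [name for name, pattern in INJECTION_SIGNATURES if pattern.search(text)]
```

Using `\s+` between words means extra spaces or line breaks inside a phrase do not defeat the match.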

3. Output Validation & Leakage Guardrails

Purpose: Sanitize and validate LLM outputs before they reach users, preventing data leakage and ensuring quality.

Output PII Redaction: Scans generated responses for PII that might have leaked, and automatically redacts or blocks them.
Secret Leak Detection in Output: Prevents the model from outputting API keys, passwords, internal URLs, or configuration details.
Internal Data Leak Prevention: Blocks outputs containing internal documentation references, employee names, proprietary methodologies, or infrastructure details.
Confidentiality Enforcement: Ensures the model never reveals information about other users or system internals.
Output Schema Validation: For structured outputs (JSON, XML), validates that responses match expected schemas.
Hallucination Risk Assessment: Flags outputs with high-confidence factual statements when the data is uncertain (critical for medical/legal/financial apps).
Citation Requirement Enforcement: Ensures the model includes verifiable citations and doesn't present hallucinated information as fact.
Sandboxed Output Verification: Tests outputs in isolated environments before delivery (important for generating code or executable content).

4. Content Safety Guardrails

Purpose: Prevent generation of harmful, offensive, or policy-violating content.

NSFW Content Filter: Blocks generation of sexually explicit or pornographic content.
Hate Speech Detection: Identifies and prevents outputs containing discrimination, slurs, or targeted harassment.
Violence Content Filter: Blocks detailed descriptions of violence, gore, or torture.
Self-Harm Prevention: Detects and intervenes in conversations involving suicide ideation or self-injury, and suggests crisis resources.
Political Persuasion Restriction: Prevents the model from engaging in political campaigning or presenting partisan views as objective fact.
Medical Advice Limitation: Blocks the AI from providing diagnosis or treatment recommendations and enforces appropriate disclaimers.
Defamation Prevention: Prevents generation of false, damaging statements about real individuals or organizations.

5. Tool & Capability Guardrails

Purpose: Control what external tools, APIs, and capabilities your LLM can access and execute.

Tool Access Control: Implements permission-based access to functions based on user or context.
Command Injection in Output Prevention: Ensures generated system commands, SQL queries, or API calls are sanitized.
Destructive Tool Call Detection: Flags and blocks tool calls that would delete data, modify critical configuration, or execute privileged operations without explicit human approval.
API Rate Limit Enforcement: Prevents excessive external API calls that could exhaust rate limits or generate unexpected costs.
File Write Restriction: Ensures the LLM can only write to approved directories, with approved extensions, and validated content.

6. Security Guardrails

Purpose: Protect system infrastructure and prevent security credential leakage.

Secrets in Logs Prevention: Ensures logging and telemetry never capture API keys, passwords, or sensitive data.
API Key Rotation Trigger: Monitors for compromise indicators and triggers automatic key rotation.
Internal Endpoint Leak Prevention: Blocks any output or log entry that would reveal internal service URLs or infrastructure topology.
IAM Permission Validation: Verifies that requested operations align with the user’s Identity and Access Management permissions.
Environment Variable Leak Detection: Prevents disclosure of configuration secrets or database connection strings stored in environment variables.

7. Privacy & Compliance Guardrails

Purpose: Ensure regulatory compliance with data protection laws and user privacy rights.

GDPR Data Minimization: Ensures the system only collects, processes, and retains the minimum necessary data.
User Consent Validation: Verifies that proper consent was obtained before processing personal data.
Retention Check: Enforces data retention policies by flagging or preventing access to data beyond its permitted period.
Right to Erasure Request Detection: Identifies when users invoke the GDPR Article 17 "right to be forgotten" and triggers deletion workflows.

8. Operational Guardrails

Purpose: Maintain system reliability, cost control, and quality standards.

Rate Limiting: Prevents abuse by limiting requests per user/IP. Protects against DoS and API quota exhaustion.
Cost Threshold Alerts: Monitors token usage and API costs in real-time. Triggers alerts or cutoffs when spending exceeds predefined thresholds.
Model Version Pinning: Ensures your application uses a specific, tested model version rather than automatically updating.
Telemetry Enforcement: Guarantees all LLM interactions are properly logged and traceable for audits and investigations.
Quality Threshold Validation: Measures output quality (coherence, relevance) and automatically rejects or regenerates low-quality responses.

Common Guardrail Implementation Mistakes to Avoid

Mistake	Consequence	Best Practice
Sequential implementation	Adds unacceptable latency to the user experience.	Run multiple guardrails simultaneously (in parallel).
Treating guardrails as binary pass/fail	Limits flexibility and can frustrate users.	Implement confidence scoring and graduated responses (block, warn, log).
Neglecting false positive rates	Overly aggressive blocking frustrates legitimate users.	Test extensively on real use cases and tune sensitivity.
Hardcoding patterns	Guardrails quickly become outdated as attacks evolve.	Build guardrails with adjustable thresholds and updateable pattern databases.
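The first two best practices, running guardrails in parallel and responding in graduated steps, can be combined in a short `asyncio` sketch. The two check functions are stand-ins with made-up scoring, and the 0.8 and 0.5 cutoffs are assumptions.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Verdict:
    guardrail: str
    score: float  # 0.0 (benign) to 1.0 (certain violation)


async def pii_check(text: str) -> Verdict:
    await asyncio.sleep(0.01)  # stands in for real detection work
    return Verdict("pii", 0.9 if "@" in text else 0.0)


async def injection_check(text: str) -> Verdict:
    await asyncio.sleep(0.01)
    hit = "ignore previous instructions" in text.lower()
    return Verdict("injection", 0.7 if hit else 0.0)


def graduated_action(score: float) -> str:
    if score >= 0.8:
        return "block"
    if score >= 0.5:
        return "warn"
    return "log"


async def run_guardrails(text: str) -> Tuple[str, List[Verdict]]:
    # gather() runs the checks concurrently, so total latency is roughly
    # that of the slowest check rather than the sum of all of them.
    verdicts = await asyncio.gather(pii_check(text), injection_check(text))
    worst = max(v.score for v in verdicts)
    return graduated_action(worst), list(verdicts)
```

Calling `asyncio.run(run_guardrails("some input"))` returns the chosen action together with the individual verdicts, which can then be logged.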

Monitoring and Metrics: Know Your Guardrail Health

Track these Key Performance Indicators (KPIs) to measure effectiveness:

Detection Metrics:
- Trigger rate: How often each guardrail fires.
- Block rate: Percentage of requests blocked vs. warned.
- False positive rate: Legitimate requests incorrectly blocked.
- False negative rate: Malicious requests that passed through.
Performance Metrics:
- Latency p50/p95/p99: Response time impact.
- Resource utilization: CPU, memory, API costs.
Security Metrics:
- Attack attempts: Detected injection/jailbreak tries.
- Successful bypasses: Known failures requiring patches.
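The detection metrics reduce to simple ratios once each request is labeled as a true or false positive or negative. A minimal sketch, assuming you already have those four counts from a labeled evaluation set (the function name is hypothetical):

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute guardrail detection rates from confusion-matrix counts.

    tp: malicious requests flagged     fp: legitimate requests flagged
    tn: legitimate requests passed     fn: malicious requests passed
    """
    total = tp + fp + tn + fn
    return {
        "trigger_rate": (tp + fp) / total if total else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }


metrics = detection_metrics(tp=8, fp=2, tn=88, fn=2)
```

Tracking these per guardrail, rather than for the pipeline as a whole, shows which individual check is responsible for user-facing false positives.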

Guardrails Are Not Optional

Every LLM application needs a comprehensive guardrail strategy from day one. Start with the critical tier—PII detection, prompt injection defense, rate limiting, and output sanitization—as these alone prevent 80% of common vulnerabilities.

The best time to implement guardrails was before you launched. The second best time is now.

Building a Bootloader from Scratch: An x86 Assembly Guide

Aayush Gid — Sat, 15 Nov 2025 09:24:16 +0000

When you press the power button, a complex, step-by-step procedure unfolds before your operating system (OS) appears. At the very core of this process lies the bootloader. This article guides you through building a simple, Stage-1 bootloader in x86 assembly that prints messages and reads a disk sector using BIOS interrupts.

What is a Bootloader?

A bootloader is the first program that executes after the system power-on sequence completes.

Location: It resides in the boot sector—the very first 512-byte sector of a bootable device (like a hard drive or USB).
Loading: The system's BIOS (Basic Input/Output System) loads this sector into memory at the specific address 0x7C00.
Signature: A valid boot sector must end with the two-byte signature 0xAA55 (stored little-endian, so byte 510 is 0x55 and byte 511 is 0xAA).
Role: Its primary function is to prepare the system environment and load the "next stage" of code, which could be the OS kernel or a more advanced second-stage bootloader.

Analogy: Think of the bootloader as the table of contents of a book—it’s the first thing the system sees and it points to where the essential content (the OS) can be found.

Without a bootloader, the CPU wouldn't know where the OS is, how to load it into memory, or what instruction to execute next. The BIOS does the basic hardware checks and initialization; the bootloader takes the hand-off and directs execution.

Project Goal: Stage-1 Bootloader

You will create a minimal Stage-1 bootloader with the following sequence:

BIOS loads the boot sector to 0x7C00.
Bootloader prints an initial message.
Bootloader uses the BIOS disk service (INT 0x13) to read a specific sector (e.g., Sector 2).
Bootloader prints the contents of the newly loaded sector from memory.
Bootloader halts in an infinite loop.

Tools Required

Tool	Description	Installation Command (Linux)
NASM	The Assembler used to convert assembly code into a binary file (`.bin`).	`sudo apt install nasm`
QEMU	A fast and reliable system emulator for testing the bootloader.	`sudo apt install qemu-system`
Optional: Bochs	For detailed, low-level debugging.	-

You'll run the assembled binary using QEMU:
qemu-system-i386 -fda boot.bin

Background Knowledge for Beginners

The Computer Boot Sequence

POST: The BIOS runs the Power-On Self-Test (POST) to check hardware.
Load: The BIOS loads the first 512-byte sector (the boot sector) of the bootable drive into memory at 0x7C00.
Execute: The CPU jumps to 0x7C00 and begins executing the bootloader's code.
Handoff: The bootloader loads the next stage of the OS or program.

Real Mode and 16-bit Basics

Upon reset, the CPU operates in 16-bit Real Mode.

Addressing: It uses segment:offset addressing.
Access: It can only access the first 1 MB of memory.
Registers: Key registers are 16-bit, including AX, BX, CX, DX (general purpose), SI, DI (index), and the segment registers DS, ES, SS, CS.

Segmentation and Addressing

The CPU calculates a 20-bit Physical Address using the 16-bit Segment and Offset registers:

Physical Address = Segment * 16 + offset

Common pairs: DS:SI for string/data manipulation, ES:BX for disk buffers, and CS:IP for code execution.
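A quick worked example of the formula, written in Python purely as a calculator (the helper name is made up). It also shows that different segment:offset pairs can name the same physical byte.

```python
def physical_address(segment: int, offset: int) -> int:
    """Real-mode address translation: segment * 16 + offset."""
    return (segment << 4) + offset  # shifting left by 4 bits multiplies by 16


# The boot sector load address can be written two equivalent ways.
boot_a = physical_address(0x0000, 0x7C00)
boot_b = physical_address(0x07C0, 0x0000)

# The disk buffer used later in this guide (ES:BX = 0x0000:0x0500).
buffer_addr = physical_address(0x0000, 0x0500)

# The largest value the formula can produce, slightly above 1 MB.
highest = physical_address(0xFFFF, 0xFFFF)
```

Both `boot_a` and `boot_b` evaluate to 0x7C00, which is why bootloaders must set their segment registers explicitly instead of assuming which form the BIOS used.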

BIOS Interrupts Overview

BIOS provides services through software interrupts, which are called using the int instruction. We'll focus on two:

Interrupt	Purpose	Example Register Setup
INT 0x10	Video services (e.g., printing characters).	`mov ah, 0x0E` (Teletype function)
INT 0x13	Disk services (e.g., reading/writing sectors).	`mov ah, 0x02` (Read function)

Source Code Structure and Logic

The project is split into three modular assembly files for clarity and reusability:

File	Purpose	Key Function
`print.asm`	Reusable routines for text output.	`print_string` (using INT 0x10)
`disk_read.asm`	Handles disk I/O with minimal error handling.	`read_sector` (using INT 0x13)
`stage1_bootloader.asm`	The main entry point and execution logic.	Entry at `0x7C00`

1. Printing Functions (`print.asm`)

These functions use INT 0x10, AH=0x0E (Teletype mode) to display characters.

; Print a single character in AL
print_char:
    mov ah, 0x0E    ; Teletype function
    mov bh, 0x00    ; Display page 0
    mov bl, 0x07    ; Foreground color (light gray; only used in graphics modes)
    int 0x10
    ret

; Print a null-terminated string (DS:SI -> string)
print_string:
.print_loop:
    lodsb           ; Load byte from [DS:SI] into AL, increment SI
    cmp al, 0       ; Check for null-terminator (0)
    je .done
    call print_char
    jmp .print_loop
.done:
    ret

2. Disk Reading Function (`disk_read.asm`)

This function uses INT 0x13, AH=0x02 to read one sector.

Register	Value	Description
AH	`0x02`	Function: Read Sector(s)
AL	`0x01`	Number of sectors to read
CH	Set by caller	Cylinder (0-based)
CL	Set by caller	Sector (1-based)
DH	Set by caller	Head (0-based)
DL	Set by caller	Drive (0x00 for first floppy, 0x80 for first hard disk)
ES:BX	Set by caller	Destination buffer address

Important: Sector numbering starts at 1.

read_sector:
    ; Prerequisites: ES:BX (dest), DL (drive), CH/DH/CL (CHS)
    mov ah, 0x02
    mov al, 0x01
    int 0x13
    jc .fail        ; Jump if Carry Flag (CF) is set (failure)
    ret
.fail:
    mov si, read_error_msg
    call print_string
    jmp $           ; Halt forever on error
read_error_msg db "Disk Read Error", 0

3. Bootloader Entry Point (`stage1_bootloader.asm`)

This is the main logic. We initialize segment registers, print the message, then configure the parameters for read_sector.

Parameter	Value	Description
ES	`0x0000`	Destination segment (Data to be loaded at 0x0000:0x0500)
BX	`0x0500`	Destination offset (Safe memory buffer)
CL	`0x02`	Sector 2 (The sector we are reading)

[BITS 16]
[ORG 0x7C00]
start:
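    ; 1. Initialize segment registers and the stack
    cli                 ; No interrupts while SS:SP is being changed
    xor ax, ax
    mov ds, ax
    mov es, ax
    mov ss, ax
    mov sp, 0x7C00      ; Stack grows down from just below the boot sector
    sti
    cld                 ; lodsb in print_string must increment SI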

    ; 2. Print initial message
    mov si, msg
    call print_string

    ; 3. Configure and call read_sector to load Sector 2 to 0x0500
    mov ax, 0x0000
    mov es, ax          ; Destination Segment ES=0x0000
    mov bx, 0x0500      ; Destination Offset BX=0x0500
    mov dl, 0x00        ; Drive 0 (Floppy)
    mov ch, 0x00        ; Cylinder 0
    mov cl, 0x02        ; Sector 2
    mov dh, 0x00        ; Head 0
    call read_sector

    ; 4. Print the loaded data (at 0x0500)
    mov si, 0x0500
    call print_string

    ; 5. Loop forever (halt)
    jmp $

msg db "Reading sector 2...", 0
%include "asm/print.asm"
%include "asm/disk_read.asm"

; Boot sector padding and signature
times 510 - ($ - $$) db 0
dw 0xAA55
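
The last two lines of the listing pad the code with zeros up to byte 510 and then append the signature. Because x86 is little-endian, `dw 0xAA55` is stored as the bytes `0x55`, `0xAA`. Here is a small Python sketch of the same layout, handy for sanity-checking an assembled `boot.bin`; the function names are hypothetical.

```python
SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xAA"   # dw 0xAA55, stored little-endian


def pad_boot_sector(code: bytes) -> bytes:
    """Mimic `times 510 - ($ - $$) db 0` followed by `dw 0xAA55`."""
    if len(code) > SECTOR_SIZE - len(BOOT_SIGNATURE):
        raise ValueError("code does not fit in the 510 bytes before the signature")
    padding = bytes(SECTOR_SIZE - len(BOOT_SIGNATURE) - len(code))
    return code + padding + BOOT_SIGNATURE


def has_boot_signature(sector: bytes) -> bool:
    """True if bytes 510 and 511 hold 0x55, 0xAA."""
    return len(sector) >= SECTOR_SIZE and sector[510:512] == BOOT_SIGNATURE


# 0xEB 0xFE is the machine code for `jmp $`, used here as a stand-in program.
sector = pad_boot_sector(b"\xEB\xFE")
print(len(sector), has_boot_signature(sector))  # prints: 512 True
```

If the code grows past 510 bytes, NASM rejects the negative `times` count in the same way this sketch raises `ValueError`.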

Running the Project

1. Assemble with NASM

This command converts the assembly code into a raw 512-byte binary file (boot.bin).

nasm -f bin asm/stage1_bootloader.asm -o boot.bin

2. Run in QEMU

QEMU emulates the hardware (BIOS, CPU, disk). The -fda flag tells QEMU to load our binary as the floppy disk image, which the BIOS will then boot from.

qemu-system-i386 -boot a -fda boot.bin

Expected Output:

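The first line will be "Reading sector 2..." from the bootloader itself. What follows depends on the contents of sector 2 of the disk image. Because boot.bin is exactly 512 bytes, the image passed to QEMU holds only sector 1, so the read may fail (printing "Disk Read Error") or return nothing printable. To see a real message, the image needs a second sector that contains null-terminated text.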
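
One way to give the bootloader something readable in sector 2 is to build a larger image: the 512-byte boot sector, then a sector holding a null-terminated message, then zero padding. The sketch below is only an illustration. The file name `disk.img`, the helper names, and padding to a full 1.44 MB floppy (2880 sectors) are assumptions chosen for this example.

```python
from pathlib import Path

SECTOR_SIZE = 512
FLOPPY_SECTORS = 2880   # 1.44 MB floppy; an assumption for this example


def build_image(boot_sector: bytes, message: str) -> bytes:
    """Return boot sector + message sector, zero-padded to floppy size."""
    if len(boot_sector) != SECTOR_SIZE:
        raise ValueError("boot sector must be exactly 512 bytes")
    payload = message.encode("ascii") + b"\x00"      # print_string needs the 0
    if len(payload) > SECTOR_SIZE:
        raise ValueError("message does not fit in one sector")
    sector2 = payload.ljust(SECTOR_SIZE, b"\x00")
    image = boot_sector + sector2
    return image.ljust(FLOPPY_SECTORS * SECTOR_SIZE, b"\x00")


def write_image(boot_path: str, out_path: str, message: str) -> None:
    """Read an assembled boot sector from disk and write the full image."""
    Path(out_path).write_bytes(build_image(Path(boot_path).read_bytes(), message))
```

After calling `write_image("boot.bin", "disk.img", "Hello from sector 2!")`, booting with `qemu-system-i386 -boot a -fda disk.img` should print the message on the line after "Reading sector 2...".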

Conclusion

By building this basic bootloader, you have seen the computer's startup process at its lowest level: you called the BIOS directly through interrupts, worked with real-mode addressing, and followed the hand-off from firmware to your own code.

This knowledge underpins much of system-level development, from writing device drivers to building a full operating system.

Appendix: Quick BIOS Interrupt Reference

Interrupt	AH	Purpose	Key Registers
INT 0x10	0x0E	Teletype Output	AL (Character), BH (Page), BL (Color, graphics modes only)
INT 0x13	0x02	Read Sectors	AL (Count), ES:BX (Buffer), CH/CL/DH (CHS), DL (Drive)
INT 0x16	0x00	Wait for Keypress	Returns ASCII code in AL, scan code in AH

Glossary

Bootloader: The program loaded by the BIOS to load and start the OS.
Sector: The smallest addressable unit of disk storage (512 bytes).
CHS: Cylinder-Head-Sector, the legacy addressing scheme for disk I/O.
Boot Signature (0xAA55): The required 2-byte marker at the end of the boot sector, stored as the bytes 0x55, 0xAA at offsets 510 and 511.
Real Mode: The 16-bit operating mode of the x86 CPU at reset.

GitHub repo: https://github.com/aayush598/basic-bootloader-assembly
