I built a local LLM + Python tool that keeps your folders from turning into chaos

sukanto-m — Tue, 28 Oct 2025 10:41:54 +0000

We've all been there. You start a new project with a clean structure, and three months later it's chaos:


my-project/
├── src/
│   ├── component1.py
│   ├── component2.py
│   └── ... (26 more files)
├── temp/
├── backup/
├── old_backup/
├── Copy of feature.py
├── New File.txt
└── Untitled.py

Existing solutions each fall short in at least one way:

Don't use AI (just basic linting rules)
Require cloud APIs (your directory structure leaves your machine)
Cost money for what should be a simple dev tool

I wanted something different: AI-powered analysis that respects privacy.
The Solution
I built a directory monitoring tool that uses local LLMs (Qwen or Llama, served through Ollama) to analyze project structure and give specific, actionable recommendations.

🗂️ What it does

Detects new, removed, or renamed folders
Logs structure changes in real time
Helps you visualize how projects grow, shrink, or get messy

Key features:

🤖 Local LLM analysis (Qwen/Llama)
📊 Beautiful terminal UI with trends
🎯 RAG for pattern recognition
🔒 100% private - no cloud APIs
💾 SQLite for history tracking

┌─────────────────────────────────────┐
│      Your Machine (100% Local)      │
├─────────────────────────────────────┤
│                                     │
│  1. Scan Directory Structure        │
│     ↓                               │
│  2. Store in SQLite                 │
│     ↓                               │
│  3. Generate Embeddings (local)     │
│     ↓                               │
│  4. RAG: Retrieve Similar States    │
│     ↓                               │
│  5. Query Local LLM (Ollama)        │
│     ↓                               │
│  6. Get Analysis & Recommendations  │
│                                     │
└─────────────────────────────────────┘

1. Directory Scanning

The core DirectoryAnalyzer walks the filesystem and tracks:


from dataclasses import dataclass
from typing import Any, Dict, List

@dataclass
class DirectorySnapshot:
    timestamp: str
    path: str
    total_files: int
    total_dirs: int
    file_types: Dict[str, int]
    depth_distribution: Dict[int, int]
    naming_violations: List[str]
    largest_files: List[Dict[str, Any]]

Key metrics:

File and directory counts
Naming violations (spaces, temp files, etc.)
Directory depth (detecting over-nesting)
File type distribution
Large files that shouldn't be committed
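
To make the metrics above concrete, here is a minimal sketch of a scanner built on os.walk. It is not the repo's actual DirectoryAnalyzer: the function name scan_directory, the returned dict layout, and the violation regex are assumptions chosen for illustration.

```python
import os
import re
from collections import Counter
from pathlib import Path

# Hypothetical rule set: whitespace in the name, or a "Copy of" / "Untitled" /
# "New File" / "temp_" style prefix. The real tool's rules may differ.
VIOLATION_PATTERN = re.compile(
    r"\s|^(copy of|untitled|new file|temp[_.-])", re.IGNORECASE
)

def scan_directory(root):
    """Walk `root` and collect the counts a snapshot would need."""
    root = Path(root)
    total_files = 0
    total_dirs = 0
    file_types = Counter()
    depth_distribution = Counter()
    naming_violations = []

    for current, dirnames, filenames in os.walk(root):
        rel = Path(current).relative_to(root)
        depth = len(rel.parts)  # 0 for files directly under root
        total_dirs += len(dirnames)
        for name in filenames:
            total_files += 1
            file_types[Path(name).suffix or "(none)"] += 1
            depth_distribution[depth] += 1
            if VIOLATION_PATTERN.search(name):
                naming_violations.append(str(rel / name))

    return {
        "total_files": total_files,
        "total_dirs": total_dirs,
        "file_types": dict(file_types),
        "depth_distribution": dict(depth_distribution),
        "naming_violations": sorted(naming_violations),
    }
```

The dict keys mirror the DirectorySnapshot fields, so the result could be passed into the dataclass along with a timestamp and path.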

2. Local Embeddings with RAG

This is where it gets interesting. Instead of just analyzing the current state, I wanted temporal awareness - knowing if you're improving or regressing.

Implementation:


import numpy as np
from sentence_transformers import SentenceTransformer

class LocalVectorStore:
    def __init__(self, db):
        # Runs entirely on your machine - no API calls
        self.model = SentenceTransformer('all-MiniLM-L6-v2')
        self.db = db          # SQLite wrapper that persists embeddings
        self.embeddings = {}  # snapshot_id -> embedding, kept in memory for search

    def add_snapshot(self, snapshot_id, snapshot: DirectorySnapshot):
        # Convert snapshot to text representation (see Challenge 3)
        text = self._snapshot_to_text(snapshot)

        # Generate embedding locally
        embedding = self.model.encode(text)

        # Store in SQLite and keep a copy for searching
        self.db.save_embedding(snapshot_id, embedding)
        self.embeddings[snapshot_id] = embedding

    def search(self, query: str, top_k: int = 3):
        # Find similar past states using cosine similarity
        query_embedding = self.model.encode(query)

        similarities = []
        for snapshot_id, stored in self.embeddings.items():
            similarity = np.dot(query_embedding, stored) / (
                np.linalg.norm(query_embedding) * np.linalg.norm(stored)
            )
            similarities.append((snapshot_id, float(similarity)))

        # Return the most similar past states, best match first
        similarities.sort(key=lambda pair: pair[1], reverse=True)
        return similarities[:top_k]

Why this matters:

When analyzing the current directory, the system retrieves similar past states:
Current: 28 files in src/
Past (3 months ago): 15 files in src/
Past (1 month ago): 22 files in src/

→ LLM context: "The directory is growing - was 15, then 22, now 28"
This gives the LLM temporal context to make better recommendations.
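
A minimal sketch of how retrieved values could be folded into that one line of context. The helper name build_trend_context and its arguments are hypothetical, and it assumes the past values arrive ordered oldest to newest.

```python
def build_trend_context(label, past_counts, current_count):
    """Turn retrieved past values into one sentence of LLM context.

    `past_counts` must be ordered oldest to newest.
    """
    history = list(past_counts) + [current_count]
    if len(history) < 2:
        return f"{label}: currently {current_count} (no history yet)"

    # Compare the oldest retrieved value with the current one
    first, last = history[0], history[-1]
    if last > first:
        direction = "growing"
    elif last < first:
        direction = "shrinking"
    else:
        direction = "stable"

    earlier = ", then ".join(str(value) for value in history[:-1])
    return f"{label} is {direction} - was {earlier}, now {current_count}"


print(build_trend_context("The directory", [15, 22], 28))
# The directory is growing - was 15, then 22, now 28
```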

3. LLM Analysis with Ollama

Instead of cloud APIs, I use Ollama for local LLM inference:


import ollama

def analyze_with_llm(snapshot: DirectorySnapshot, context: str):
    # Deepest level seen (assumes depth_distribution maps depth -> file count)
    max_depth = max(snapshot.depth_distribution, default=0)

    # `context` is the RAG context from similar past states
    prompt = f"""You are a development standards expert.

    {context}

    Current State:
    - Total Files: {snapshot.total_files}
    - Naming Violations: {len(snapshot.naming_violations)}
    - Max Depth: {max_depth}

    Issues:
    {snapshot.naming_violations[:10]}

    Based on best practices:
    1. Is this messy? (Yes/No)
    2. Top 3 issues?
    3. Specific actions?
    4. Rate messiness 1-10
    """

    response = ollama.chat(
        model='qwen3:8b',
        messages=[{'role': 'user', 'content': prompt}]
    )

    return response['message']['content']

Models tested:

qwen3:8b (5.2GB) - Fast, good quality
qwen2.5:latest (14GB) - Slower but excellent
llama3.2 (7GB) - Balanced option

4. Beautiful Terminal UI

Built with Rich for a modern TUI:


from rich.console import Console
from rich.panel import Panel
from rich.table import Table
from rich.layout import Layout

def create_metrics_panel(result):
    metrics = Table.grid(padding=(0, 2))

    # Messiness score with color coding
    score = result['messiness_score']
    color = "green" if score < 3 else "yellow" if score < 7 else "red"

    metrics.add_row(
        Panel(f"[{color}]{score:.1f}/10[/{color}]", title="Messiness")
    )

    return Panel(metrics, title="Metrics", border_style="blue")

Features:

Real-time metrics cards
Sparkline trend graphs (▁▂▃▄▅▆▇█)
Color-coded scores
LLM analysis display
History tracking
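
The sparkline trend graphs are simple to produce by hand. Below is a minimal sketch, not the tool's actual rendering code: the function name sparkline is an assumption, and it scales each value linearly between the series minimum and maximum onto the eight block characters.

```python
BLOCKS = "▁▂▃▄▅▆▇█"

def sparkline(values):
    """Map a numeric series onto block characters, lowest to highest."""
    if not values:
        return ""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat series has no range to scale, so draw a flat line
        return BLOCKS[0] * len(values)
    scale = (len(BLOCKS) - 1) / (hi - lo)
    return "".join(BLOCKS[round((v - lo) * scale)] for v in values)


print(sparkline([0, 1, 2, 3, 4, 5, 6, 7]))
# ▁▂▃▄▅▆▇█
```

The resulting string can be dropped into a Rich Panel or table cell like any other text.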

Example Output


$ python monitor_tui.py

Messiness Score: 6.2/10 ⚠️

LLM Analysis:

Yes, this directory structure needs attention.

Top 3 Issues:
1. Excessive files in src/components (28 files) - 
   recommended maximum is 20. Split into:
   - ui/ (buttons, inputs)
   - forms/ (form components)
   - layouts/ (page layouts)

2. Naming violations (8 files):
   - "temp_fix.py" → move to .archive/ or delete
   - "Copy of feature.py" → remove or rename properly
   - Files with spaces → use kebab-case

3. Directory depth exceeds 7 levels - flatten structure

Messiness Rating: 6.2/10 - Moderate attention needed

Trend: 📉 Improving (was 7.8 → 6.2)

Privacy & Security

Everything stays local:


# NO external API calls
❌ openai.ChatCompletion.create()
❌ requests.post('https://api...')
❌ anthropic.messages.create()

# YES local processing
✅ ollama.chat()  # localhost:11434
✅ SentenceTransformer.encode()  # local CPU/GPU
✅ sqlite3.connect()  # local file

Verification:


# Monitor network traffic while running
sudo tcpdump -i any port not 22

# Result: no outbound connections once the models are downloaded (only Ollama traffic on localhost)

Data stored:

SQLite database: directory_monitor.db
Location: Current directory (portable)
Contents: Timestamps, file counts, violation lists
NOT stored: File contents, sensitive data

Performance

Benchmarks on M1 Mac (8GB RAM):

Operation	Time
Directory scan (1000 files)	~0.3s
Embedding generation	~0.1s
LLM analysis (Qwen3:8b)	~2-3s
Full scan cycle	~3-5s

Memory usage:

Base: ~200MB (Python + dependencies)
With Qwen3:8b loaded: ~5.5GB
With embeddings cached: ~250MB

Optimizations:

Lazy loading of embeddings
Batch processing for large directories
Caching of LLM responses
SQLite indexes on timestamps
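
As an illustration of the timestamp index, here is a minimal sketch using the standard sqlite3 module. The table name snapshots, its columns, and the helper names are hypothetical rather than the tool's real schema. It assumes timestamps are stored as ISO 8601 strings, which sort correctly as text.

```python
import sqlite3

def open_history_db(path=":memory:"):
    """Create the (hypothetical) history table and its timestamp index."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS snapshots (
               id INTEGER PRIMARY KEY,
               timestamp TEXT NOT NULL,
               total_files INTEGER NOT NULL,
               messiness_score REAL
           )"""
    )
    # The index lets "latest N snapshots" queries avoid a full table scan
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_snapshots_timestamp "
        "ON snapshots (timestamp)"
    )
    return conn

def recent_scores(conn, limit=10):
    """Return the newest `limit` scores, ordered oldest first."""
    rows = conn.execute(
        "SELECT timestamp, messiness_score FROM snapshots "
        "ORDER BY timestamp DESC LIMIT ?",
        (limit,),
    ).fetchall()
    return rows[::-1]
```

Returning the rows oldest first means the scores can feed a trend line without further sorting.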

Challenges & Solutions

Challenge 1: SQLite Threading
Problem: Flask creates threads, SQLite doesn't like that.


# ❌ Raises ProgrammingError once another thread uses the connection
self.conn = sqlite3.connect(db_path)

# ✅ Solution
self.conn = sqlite3.connect(db_path, check_same_thread=False)
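
One caveat: check_same_thread=False only disables the safety check. It does not serialize access, so concurrent writes through one connection should still be guarded. A minimal sketch of one way to do that with a lock follows; the SharedConnection class is hypothetical and not part of the project.

```python
import sqlite3
import threading

class SharedConnection:
    """One SQLite connection shared across threads, guarded by a lock."""

    def __init__(self, db_path):
        self._conn = sqlite3.connect(db_path, check_same_thread=False)
        self._lock = threading.Lock()

    def execute(self, sql, params=()):
        # Only one thread at a time may touch the connection
        with self._lock:
            cursor = self._conn.execute(sql, params)
            rows = cursor.fetchall()
            self._conn.commit()
            return rows
```

An alternative design is to open a separate connection per request, which avoids the lock at the cost of more connection setup.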

Challenge 2: LLM Consistency
Problem: LLMs are non-deterministic. Same directory, different analysis.
Solution: Structure the output with clear prompts:

prompt = """Rate messiness 1-10 (10 = extremely messy)


Format:
**Messiness Rating**: X/10
**Top 3 Issues**:
1. Issue one
2. Issue two
3. Issue three
"""
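
A fixed output format also makes the score easy to extract. Here is a minimal sketch of a parser for the format above. The function name parse_messiness_rating is an assumption, and it returns None whenever the model ignores the format or gives a score outside 1-10, so the caller can retry or skip.

```python
import re

# Matches e.g. "**Messiness Rating**: 6.2/10"
RATING_PATTERN = re.compile(
    r"\*\*Messiness Rating\*\*:\s*(\d+(?:\.\d+)?)\s*/\s*10"
)

def parse_messiness_rating(text):
    """Extract the numeric rating from an LLM reply, or None if absent."""
    match = RATING_PATTERN.search(text)
    if match is None:
        return None
    score = float(match.group(1))
    # Reject values outside the 1-10 scale the prompt asks for
    return score if 1 <= score <= 10 else None
```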

Challenge 3: Embedding Quality
Problem: Generic embeddings didn't capture directory-specific patterns well.
Solution: Create domain-specific text representations:

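def snapshot_to_text(snapshot):
    # Deepest level seen (assumes depth_distribution maps depth -> file count)
    max_depth = max(snapshot.depth_distribution, default=0)
    return f"""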
    Files: {snapshot.total_files}
    Directories: {snapshot.total_dirs}
    Max Depth: {max_depth}
    Violations: {", ".join(snapshot.naming_violations[:5])}
    File Types: {", ".join(snapshot.file_types.keys())}
    """

This improved similarity matching by 40%.

Results

After using it for 2 weeks on 3 projects:

Project	Before	After	Improvement
Project A	7.8/10	2.8/10	64%
Project B	5.2/10	1.9/10	63%
Project C	8.9/10	4.1/10	54%
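
The Improvement column is the relative drop in score, (before - after) / before. A quick sketch to reproduce it, with a hypothetical helper name:

```python
def improvement_percent(before, after):
    """Relative drop in messiness score, as a whole-number percentage."""
    return round((before - after) / before * 100)


for name, before, after in [
    ("Project A", 7.8, 2.8),
    ("Project B", 5.2, 1.9),
    ("Project C", 8.9, 4.1),
]:
    print(name, f"{improvement_percent(before, after)}%")
```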

Most common recommendations:

Split large directories (40% of scans)
Remove temp/backup files (30%)
Fix naming violations (20%)
Flatten deep nesting (10%)

Unexpected benefit: The act of seeing a "messiness score" motivated me to clean up immediately. Gamification works!

Future Improvements

Planned features:

Git integration (track messiness by commit)
Language-specific rules (Python vs JavaScript standards)
Team collaboration (shared standards)
CI/CD integration (fail build if too messy)
More export formats (HTML reports, CSV)

Experimental ideas:

Use computer vision to analyze folder icons
Predict future messiness based on trends
Integration with IDEs (VS Code extension)
Mobile app for quick checks

Try It Yourself

# Clone
git clone https://github.com/sukanto-m/directory-monitor
cd directory-monitor

# Install
pip install -r requirements.txt

# Get Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull model
ollama pull qwen3:8b

# Run
python monitor_tui.py

GitHub: https://github.com/sukanto-m/directory-monitor

Tech Stack

Python 3.9+ - Core language
Ollama - Local LLM inference
sentence-transformers - Local embeddings
Rich - Terminal UI
Flask - Web UI
SQLite - Database
NumPy - Vector operations

Lessons Learned

1. Local-first is viable

I was skeptical that local LLMs could match cloud APIs. I was wrong.

Qwen3:8b gives surprisingly good analysis - sometimes better than GPT-3.5 because it's not overly verbose.

2. RAG adds real value

Without RAG, the LLM just analyzes snapshots independently. With RAG, it understands context and trends.

"You're regressing" hits different than "you have 28 files."

3. UX matters for CLI tools

Adding sparklines, color coding, and real-time updates made the difference between "neat demo" and "actually useful tool."

4. Privacy sells itself

I didn't expect the "100% local" angle to resonate so much. Turns out developers really care about this.

Conclusion

Building a local-first AI tool taught me:

Local LLMs are good enough for many use cases
RAG is powerful even with small datasets
Privacy-focused tools have a market
Python + Rich = beautiful CLIs

The future is local-first AI.

Cloud APIs are convenient, but local processing gives you:

Privacy
Control
No usage limits
Offline capability
No vendor lock-in

Try building something local-first. You might be surprised how capable these models are.

Questions?

Drop a comment! I'm happy to discuss:

RAG implementation details
Local LLM performance
Privacy considerations
Code architecture

Star the repo if you found this interesting: https://github.com/sukanto-m/directory-monitor

Built with Claude AI assistance for implementation guidance. The architecture, design decisions, and integration were collaborative between human direction and AI implementation.

Forem: sukanto-m