Forem: Freemen HOUNGBEDJI

I Built a URL Threat Analyzer That Detects Phishing in Real-Time — Here's How It Works

Freemen HOUNGBEDJI — Sat, 09 May 2026 12:29:40 +0000

Every day, millions of people click malicious URLs without knowing it. Phishing pages look legitimate. Shortened links hide their destination. Freshly registered domains slip past blockers.
I got tired of copy-pasting URLs into clunky online scanners and waiting 10 seconds for a result — so I built SnifURL.
Live: https://snifurl.online
GitHub: https://github.com/FreemenTech/Snifurl

What is SnifURL?

SnifURL is a real-time URL threat analyzer. You paste a URL, it runs it through 13 heuristic checks in parallel, and returns a risk score from 0 to 100 with a full breakdown of every signal it found.
No black box. No "we flagged it, trust us." Every point added or subtracted is explained.

{
  "url": "https://paypa1-secure-login.tk/verify",
  "score": 91,
  "risk_level": "CRITICAL",
  "recommendation": "Phishing almost certain — block immediately",
  "details": [
    "Suspicious TLD (.tk — Freenom free domain) (+18)",
    "Brand impersonation detected: 'paypa1' looks like 'paypal' (+25)",
    "Domain registered 3 days ago (+20)",
    "Homograph character detected: '1' replacing 'l' (+15)",
    "No valid SSL certificate (+13)"
  ]
}

The Risk Score System

🟢 0–14: SAFE. No significant indicators. The URL looks clean.
🟡 15–34: LOW. Probably safe, but worth a second look before clicking.
🟠 35–54: MEDIUM. Suspicious. Inspect manually before trusting it.
🔴 55–74: HIGH. Very suspicious. Block it unless you're 100% sure of the source.
🚨 75–100: CRITICAL. Phishing almost certain. Block immediately.
The score is additive: each indicator adds or subtracts points based on its confidence weight. This makes results explainable and debuggable.
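
The bands map cleanly onto a small lookup. Here's a minimal sketch, assuming the thresholds listed above; `risk_level` and `BANDS` are illustrative names, not necessarily what SnifURL uses internally:

```python
# Hypothetical helper: maps a 0-100 score onto the bands listed above.
BANDS = [
    (75, "CRITICAL"),
    (55, "HIGH"),
    (35, "MEDIUM"),
    (15, "LOW"),
    (0, "SAFE"),
]


def risk_level(score: int) -> str:
    """Return the band name for a score, clamping to the 0-100 range first."""
    score = max(0, min(100, score))
    for lower_bound, name in BANDS:
        if score >= lower_bound:
            return name
    return "SAFE"  # unreachable after clamping, kept as a safe default


print(risk_level(91))  # CRITICAL
print(risk_level(12))  # SAFE
```

Keeping the thresholds in one ordered table means a band can be tuned without touching the lookup logic.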

The 13 Heuristic Indicators

Here's exactly what SnifURL checks — and why each one matters.
1. TLD Reputation
Free TLDs like .tk, .ml, .cf, .ga, .gq (Freenom) are massively over-represented in phishing campaigns. Cheap generic TLDs like .xyz and .top also rank high. High-risk ccTLDs are weighted accordingly.

2. Direct IP in URL
A URL like http://203.0.113.10/login is a strong phishing signal: legitimate public services almost never ask you to log in via a raw IP address.
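
To give a feel for this check, here's a rough sketch using only the standard library. The function name `has_ip_host` is hypothetical, and SnifURL's real implementation may differ:

```python
import ipaddress
from urllib.parse import urlsplit


def has_ip_host(url: str) -> bool:
    """True when the URL's host is an IPv4/IPv6 literal instead of a domain name."""
    host = urlsplit(url).hostname  # drops port, userinfo and IPv6 brackets
    if not host:
        return False
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        # Not parseable as an IP address, so it is a regular hostname
        return False


print(has_ip_host("http://203.0.113.10/login"))  # True
print(has_ip_host("https://example.com/login"))  # False
```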

3. Brand Impersonation
The engine scans the subdomain and path against a dictionary of major brands (paypal, google, apple, amazon, microsoft, netflix...) and checks for typosquatting variations.

4. Homograph Attacks
Unicode lookalike characters are a sneaky attack vector. pаypal.com with a Cyrillic а looks identical to paypal.com. SnifURL detects non-ASCII characters and flags them.
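
A non-ASCII check like this fits in a few lines. The sketch below is an assumption about how such a check could look, not SnifURL's actual code; the helper names are made up for illustration:

```python
import unicodedata
from urllib.parse import urlsplit


def non_ascii_host_chars(url: str) -> list[str]:
    """Return every non-ASCII character found in the URL's hostname."""
    host = urlsplit(url).hostname or ""
    return [ch for ch in host if ord(ch) > 127]


def is_punycode_host(url: str) -> bool:
    """True when any hostname label uses the IDNA 'xn--' prefix."""
    host = urlsplit(url).hostname or ""
    return any(label.startswith("xn--") for label in host.split("."))


# "\u0430" is the Cyrillic letter that renders like a Latin "a"
suspicious = "https://p\u0430ypal.com/login"
for ch in non_ascii_host_chars(suspicious):
    print(unicodedata.name(ch))  # CYRILLIC SMALL LETTER A
```

Checking for the `xn--` prefix as well matters because a lookalike domain may arrive already encoded as Punycode, in which case it contains only ASCII characters.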

5. SSL Certificate Validity
Checks if the certificate is valid and evaluates issuer trust. A self-signed cert on a "bank login" page? Red flag.

6. WHOIS Domain Age
New domains are suspicious. A domain registered 2 days ago asking for your password is a massive red flag. SnifURL queries WHOIS and penalizes recently created domains.

7. DNS Resolution
If the domain doesn't even resolve — it's either dead or a trap. Simple check, non-zero value.

8. URL Shorteners
Shorteners such as bit.ly, tinyurl.com, and t.co hide the real destination. SnifURL flags them and encourages redirect inspection.

9. Double File Extensions
invoice.pdf.exe is a classic malware trick. The engine scans the URL path for chained extensions.
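
Here's one way a chained-extension scan could work. This is a sketch under assumptions: the two extension sets are illustrative and far from exhaustive, and `has_double_extension` is a hypothetical name:

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit

# Illustrative lists only (assumptions, not SnifURL's real dictionaries)
EXECUTABLE = {".exe", ".scr", ".bat", ".cmd", ".js", ".vbs", ".jar", ".msi"}
DECOY = {".pdf", ".doc", ".docx", ".xls", ".xlsx", ".jpg", ".png", ".txt"}


def has_double_extension(url: str) -> bool:
    """True when the file name ends with a document-like extension
    followed by an executable one, e.g. invoice.pdf.exe."""
    path = unquote(urlsplit(url).path)  # decode %2E-style tricks first
    suffixes = [s.lower() for s in PurePosixPath(path).suffixes]
    return (
        len(suffixes) >= 2
        and suffixes[-1] in EXECUTABLE
        and suffixes[-2] in DECOY
    )


print(has_double_extension("https://example.com/files/invoice.pdf.exe"))  # True
print(has_double_extension("https://example.com/files/archive.tar.gz"))   # False
```

Requiring a decoy extension before the executable one keeps harmless names like archive.tar.gz from being flagged.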

10. @ Character in URL
http://legitimate.com@evil.com/phish — browsers follow the part after @. This old trick still catches people off guard.
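
Python's own URL parser shows the same behavior: everything before the @ is treated as userinfo, and the real host comes after it. A small sketch (the helper name `has_userinfo` is hypothetical):

```python
from urllib.parse import urlsplit


def has_userinfo(url: str) -> bool:
    """True when the authority part contains userinfo (text before '@')."""
    return urlsplit(url).username is not None


parts = urlsplit("http://legitimate.com@evil.com/phish")
print(parts.username)  # legitimate.com
print(parts.hostname)  # evil.com

print(has_userinfo("https://example.com/@someone"))  # False
```

Parsing the URL instead of searching the raw string for "@" avoids flagging harmless paths such as /@someone.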

11. Non-Standard Ports
https://mybank.com:8080/login is a sign something is off. Legitimate HTTPS services almost always run on the default port, 443.

12. Excessive URL Encoding
%68%74%74%70%73%3A%2F%2F... — heavy encoding often signals an attempt to obfuscate a malicious destination.
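
One simple way to quantify "heavy" is the share of the URL taken up by %XX escapes. The sketch below assumes a 30% cutoff, which is an arbitrary value chosen for illustration rather than SnifURL's real threshold:

```python
import re
from urllib.parse import unquote

PERCENT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def encoded_ratio(url: str) -> float:
    """Fraction of the URL's characters that belong to %XX escapes."""
    if not url:
        return 0.0
    escapes = PERCENT_ESCAPE.findall(url)
    return (3 * len(escapes)) / len(url)  # each escape is 3 characters long


def is_heavily_encoded(url: str, threshold: float = 0.3) -> bool:
    # threshold is a hypothetical default, tune it against real data
    return encoded_ratio(url) >= threshold


obfuscated = "%68%74%74%70%73%3A%2F%2F"
print(unquote(obfuscated))             # https://
print(is_heavily_encoded(obfuscated))  # True
print(is_heavily_encoded("https://example.com/a%20b"))  # False
```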

13. Subdomain Depth & Hyphens
secure-login-verify.paypal.accounts.malicious.com — deep subdomains and excessive hyphens are classic phishing patterns.
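
Both signals can be read straight off the hostname. This sketch makes a deliberately naive assumption: the last two labels are the registered domain, which is wrong for suffixes like .co.uk. A real check would consult the Public Suffix List. `host_shape` is a hypothetical name:

```python
from urllib.parse import urlsplit


def host_shape(url: str) -> dict:
    """Count subdomain labels and hyphens in the URL's hostname."""
    host = urlsplit(url).hostname or ""
    labels = host.split(".") if host else []
    # Naive assumption: registered domain = last two labels
    subdomain_labels = labels[:-2]
    return {
        "subdomain_depth": len(subdomain_labels),
        "hyphens": host.count("-"),
    }


print(host_shape("https://secure-login-verify.paypal.accounts.malicious.com/"))
# {'subdomain_depth': 3, 'hyphens': 2}
```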

How the Parallel Analysis Works

Network checks (DNS, WHOIS, SSL) are the slow part. Running them sequentially would add 3–5 seconds of latency. Instead, SnifURL runs them concurrently with Python's ThreadPoolExecutor:
# analyseur_url.py (simplified)
from concurrent.futures import ThreadPoolExecutor, as_completed

import scoring  # scoring.py, covered in the next section


def analyze(url: str) -> dict:
    results = {}

    # Network checks (slow I/O) run concurrently, one thread each
    with ThreadPoolExecutor(max_workers=4) as executor:
        futures = {
            executor.submit(check_ssl, url): "ssl",
            executor.submit(check_whois, url): "whois",
            executor.submit(check_dns, url): "dns",
            executor.submit(check_redirects, url): "redirects",
        }
        # Collect each result as soon as its check finishes
        for future in as_completed(futures):
            key = futures[future]
            results[key] = future.result()

    # Heuristic checks (no I/O, instant)
    results["tld"] = check_tld(url)
    results["homograph"] = check_homograph(url)
    results["brand"] = check_brand_impersonation(url)
    # ...

    return scoring.compute(results)

Total analysis time on most URLs: under 2 seconds.
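
One detail worth knowing about this pattern: `future.result()` re-raises any exception thrown inside the worker, so a single failed lookup can abort the whole analysis. The self-contained sketch below uses stub checks (`slow_ok` and `slow_fail` are stand-ins, not real SnifURL functions) to show the concurrency gain and one possible way to isolate failures. Whether SnifURL handles errors like this is an assumption:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_ok(url: str) -> dict:
    time.sleep(0.2)  # simulates network latency
    return {"ok": True}


def slow_fail(url: str) -> dict:
    time.sleep(0.2)
    raise TimeoutError("lookup timed out")


def run_checks(url: str, checks: dict) -> dict:
    """Run every check in its own thread and record errors per check."""
    results = {}
    with ThreadPoolExecutor(max_workers=len(checks)) as executor:
        futures = {executor.submit(fn, url): name for name, fn in checks.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:
                # A failed check becomes data instead of crashing the analysis
                results[name] = {"error": str(exc)}
    return results


start = time.perf_counter()
out = run_checks(
    "https://example.com",
    {"dns": slow_ok, "whois": slow_fail, "ssl": slow_ok},
)
elapsed = time.perf_counter() - start
print(out["whois"])       # {'error': 'lookup timed out'}
print(f"{elapsed:.2f}s")  # close to 0.2s, versus about 0.6s if run one after another
```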

The Scoring Engine

scoring.py takes the raw results and computes the final score. Each signal has a weight based on its phishing correlation strength:
# scoring.py (simplified)

# Points added to the score when a signal is present
WEIGHTS = {
    "suspicious_tld": 18,
    "brand_impersonation": 25,
    "homograph": 15,
    "no_ssl": 13,
    "recently_registered": 20,  # < 30 days
    "direct_ip": 22,
    "url_shortener": 10,
    "double_extension": 15,
    "at_char": 12,
    "non_standard_port": 8,
    "excessive_encoding": 10,
    "subdomain_depth": 8,
}


def compute(results: dict) -> dict:
    # DESCRIPTIONS and get_level() are not shown in this simplified excerpt
    score = 50  # neutral start
    details = []

    for key, weight in WEIGHTS.items():
        if results.get(key):
            score += weight
            details.append(f"{DESCRIPTIONS[key]} (+{weight})")

    # Legitimate signals reduce the score
    if results.get("ssl_valid"):
        score -= 10
    if results.get("domain_age_days", 0) > 365:
        score -= 12
    # ...

    score = max(0, min(100, score))  # clamp to the 0-100 range
    return {
        "score": score,
        "risk_level": get_level(score),
        "details": details,
    }

The weights were calibrated manually against a dataset of known phishing URLs from PhishTank and legitimate domains. It's not ML — it's deliberate, transparent logic.

The API

Everything is accessible via a simple REST API:
curl -X POST https://snifurl.online/analyze \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

{
  "url": "https://example.com",
  "score": 12,
  "risk_level": "SAFE",
  "recommendation": "LEGITIMATE — No significant indicators",
  "details": ["Known legitimate root domain (example.com) (-12)"],
  "indicators": {
    "uses_https": true,
    "dns_exists": true,
    "has_ip": false,
    "suspicious_tld": null,
    "recently_created": false,
    "ssl_certificate": { "valid": true, "issuer_org": "DigiCert" },
    "whois": { "age_days": 9862, "registrar": "..." }
  }
}
You can integrate this into your own app, browser extension, or Slack bot to flag links before users click them.
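
For example, a small Python client could look like the sketch below. It relies only on the endpoint and the `score` field shown in the response above; the function names and the blocking threshold of 55 (the start of the HIGH band) are my own assumptions:

```python
import json
import urllib.request

API_URL = "https://snifurl.online/analyze"  # endpoint from the curl example


def build_request(url_to_check: str) -> urllib.request.Request:
    """Build the same POST request the curl example sends."""
    body = json.dumps({"url": url_to_check}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(url_to_check: str, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON report."""
    with urllib.request.urlopen(build_request(url_to_check), timeout=timeout) as resp:
        return json.load(resp)


def should_block(report: dict, threshold: int = 55) -> bool:
    # 55 is a hypothetical cutoff: the lower bound of the HIGH band
    return report.get("score", 0) >= threshold


# Usage (performs a real network call, so it is left commented out):
# report = analyze("https://example.com")
# print(report["risk_level"], should_block(report))
```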

Stack & Deployment

Backend → Python 3.11 / Flask
Analysis engine → analyseur_url.py + scoring.py
Network checks → dnspython, python-whois, ssl (parallel execution)
Frontend → Vanilla HTML / CSS / JS
Server → Vultr VPS / Ubuntu 22.04 / Nginx + Gunicorn

No framework overhead on the frontend. The UI is intentionally lean — the value is in the analysis, not the interface.

Run It Yourself

git clone https://github.com/FreemenTech/Snifurl
cd Snifurl
pip install -r requirements.txt
python app.py

Open http://localhost:5000 — that's it.

What's Next

A few things I'm thinking about:

Browser extension — flag URLs inline before you click
Bulk analysis endpoint — scan a list of URLs in one request
ML scoring layer — train a model on PhishTank data to complement the heuristics
Redirect chain analysis — follow shorteners and score the final destination

Try It

🔗 Website: https://snifurl.online
Paste a suspicious URL you've received recently. I'd be curious what score it gets. Drop it in the comments.
And if you find a false positive or a phishing URL that slips through, open an issue. That's how the weights get better.

Built by Freemen HOUNGBEDJI — MIT License

🔥 Building Vigilo: A 15MB File Integrity Monitor That Outperforms OSSEC

Freemen HOUNGBEDJI — Fri, 06 Feb 2026 13:37:21 +0000

🚨 The Night Everything Broke
My former employer got hacked.

At 3:07 AM, an attacker modified /etc/sudoers.
No alerts.
No logs reviewed.
No alarms.

We noticed it 3 days later.
That night, I opened a blank Python file:
file_monitoring.py
That file became Vigilo.

❌ Why Existing Tools Failed Us
We didn’t ignore security tools.
We tried them.

OSSEC
❌ 200+ MB RAM on idle
❌ 50+ lines of XML config for a single file
❌ False positives drowning real alerts

Wazuh
❌ 30+ minutes installation
❌ YAML + agents + dashboards
❌ Massive overkill for < 50 servers

What we needed was simple:

“Tell me immediately when a critical file changes.
Nothing more. Nothing less.”

🛠️ What I Built Instead

Vigilo is a lightweight File Integrity Monitor built for real-world ops teams.

💾 < 15 MB RAM
⚡ < 1 second alert latency
🧠 Zero configuration hell
🐍 100% Python, easy to hack & extend

🎯 Core Design Principles

Install in under 60 seconds
Minimal memory footprint
Readable, auditable code
Production-ready from day one

🧩 Technical Architecture
vigilo/
├── file_monitoring.py # SHA-256 + metadata tracking
├── FileWatcher.py # inotify wrapper with smart filtering
├── logger.py # thread-safe persistent storage
├── alert_manager.py # system / future email / webhook alerts
└── main.py # CLI entrypoint

⚡ Performance Optimizations That Matter
1️⃣ In-Memory Baseline Cache
Before (slow, disk-bound):
def handle_event(self, path):
    baseline = read_from_disk(path)  # disk read on every event

After (fast, O(1)):
def handle_event(self, path):
    baseline = self.cache[path]  # dict lookup, no disk I/O
📈 Result: 10× faster event processing.
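
To make the idea concrete, here's a self-contained sketch of an in-memory SHA-256 baseline. `BaselineCache` and its methods are hypothetical names for illustration; Vigilo's real classes may be organized differently:

```python
import hashlib
import os
import tempfile
from pathlib import Path


class BaselineCache:
    """Hypothetical in-memory baseline: path -> SHA-256 hex digest."""

    def __init__(self) -> None:
        self.cache: dict[str, str] = {}

    @staticmethod
    def digest(path: str) -> str:
        h = hashlib.sha256()
        with open(path, "rb") as f:
            # Read in chunks so large files never sit fully in memory
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        return h.hexdigest()

    def add(self, path: str) -> None:
        self.cache[path] = self.digest(path)

    def has_changed(self, path: str) -> bool:
        # The baseline comes from the dict, only the current file is read
        return self.digest(path) != self.cache[path]


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "nginx.conf")
    Path(target).write_text("worker_processes 1;\n")
    baseline = BaselineCache()
    baseline.add(target)
    unchanged = baseline.has_changed(target)
    Path(target).write_text("worker_processes 4;\n")
    changed = baseline.has_changed(target)

print(unchanged, changed)  # False True
```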

2️⃣ Atomic Writes (No Corrupted State)
import os

temp = "file_info.json.tmp"
write_to(temp)  # write the full new state to a temporary file first
os.replace(temp, "file_info.json")  # atomic rename on POSIX

Even a crash won’t break your baseline.
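
A fuller version of this pattern, as a sketch: `save_state_atomically` is a hypothetical name, and the fsync step is something I'd add for durability rather than something the snippet above confirms Vigilo does.

```python
import json
import os
import tempfile


def save_state_atomically(state: dict, dest: str) -> None:
    """Write state to a temp file in the same directory, then rename it over dest."""
    directory = os.path.dirname(os.path.abspath(dest))
    # Same directory as dest, so the rename never crosses filesystems.
    # mkstemp creates the file readable and writable only by the owner.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(state, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, dest)  # readers see the old or the new file, never half
    except BaseException:
        # Clean up the temp file if anything failed before the rename
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


with tempfile.TemporaryDirectory() as workdir:
    dest = os.path.join(workdir, "file_info.json")
    save_state_atomically({"/etc/nginx/nginx.conf": "abc123"}, dest)
    with open(dest, encoding="utf-8") as f:
        loaded = json.load(f)
    leftovers = os.listdir(workdir)

print(loaded)     # {'/etc/nginx/nginx.conf': 'abc123'}
print(leftovers)  # ['file_info.json']
```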

3️⃣ Thread Safety (Because Events Are Brutal)
import threading

_db_lock = threading.Lock()

with _db_lock:  # only one thread at a time may save
    save_state()
No race conditions. No silent corruption.
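
Here's a runnable illustration of the same locking idea. `EventCounter` is a made-up class for the demo, not part of Vigilo: eight threads hammer a shared dict, and the lock keeps every read-modify-write step intact.

```python
import threading


class EventCounter:
    """Hypothetical shared state guarded by a single lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counts: dict[str, int] = {}

    def record(self, path: str) -> None:
        with self._lock:  # read-modify-write happens as one unit
            self._counts[path] = self._counts.get(path, 0) + 1

    def snapshot(self) -> dict[str, int]:
        with self._lock:
            return dict(self._counts)  # copy, so callers never see live state


def worker(counter: EventCounter, path: str, n: int) -> None:
    for _ in range(n):
        counter.record(path)


counter = EventCounter()
threads = [
    threading.Thread(target=worker, args=(counter, "/etc/nginx/nginx.conf", 1000))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.snapshot())  # {'/etc/nginx/nginx.conf': 8000}
```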

📊 Benchmarks

Test: Monitoring /etc/nginx/nginx.conf
Load: 10 modifications / second

Tool     CPU    RAM      False Positives
Vigilo   0.8%   11 MB    0
OSSEC    3.2%   187 MB   14
Wazuh    5.1%   243 MB   23

🚀 Usage

Install

pip install -r requirements.txt

Add file to monitoring

vigilo add /etc/nginx/nginx.conf --preset full --alert system

Start monitoring

vigilo start
Modify the file → desktop alert in < 1 second.

🔐 Security First (Yes, Even the Tool)

✅ Path whitelisting (no /etc/shadow)
✅ Command injection protection (shlex.quote)
✅ Strict file permissions (0o600)
✅ Input validation on all CLI arguments

🌙 Lessons Learned (The Hard Way)

Night 1 — The Watchdog Spam
One file triggered 1000+ events/min.
👉 Fixed by filtering events before processing.
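
One common way to filter an event storm is debouncing: ignore repeat events for the same path inside a short window. The sketch below is an assumption about what such a filter could look like, not Vigilo's actual code. `Debouncer` and the 0.5 second window are illustrative:

```python
import time


class Debouncer:
    """Hypothetical filter: let one event per path through per time window."""

    def __init__(self, window: float = 0.5, clock=time.monotonic) -> None:
        self.window = window
        self.clock = clock  # injectable so the demo below is deterministic
        self._last_processed: dict[str, float] = {}

    def should_process(self, path: str) -> bool:
        now = self.clock()
        last = self._last_processed.get(path)
        if last is not None and now - last < self.window:
            return False  # too soon after the last processed event
        self._last_processed[path] = now
        return True


# Simulated burst using a fake clock instead of real sleeps
fake_time = [0.0]
debouncer = Debouncer(window=0.5, clock=lambda: fake_time[0])
decisions = []
for t in (0.0, 0.1, 0.2, 0.6, 0.7):
    fake_time[0] = t
    decisions.append(debouncer.should_process("/etc/nginx/nginx.conf"))

print(decisions)  # [True, False, False, True, False]
```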

Night 2 — The Performance Breakthrough
Added in-memory cache.
👉 Everything became 10× faster.

Night 3 — The Security Obsession
Found a command injection flaw in alert execution.
👉 6 hours replacing everything with shlex.quote().

Worth it.
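
For anyone who hasn't used it, this is what `shlex.quote()` buys you. The command string below is a made-up example (`echo` stands in for whatever alert command is configured), so treat it as a sketch rather than Vigilo's real alert code:

```python
import shlex


def build_alert_command(path: str) -> str:
    """Build a shell command line with the file path safely quoted."""
    # echo is only a placeholder for a real alert command
    return f"echo changed: {shlex.quote(path)}"


safe_path = "/etc/nginx/nginx.conf"
hostile_path = "/tmp/x; rm -rf ~"

print(build_alert_command(safe_path))     # echo changed: /etc/nginx/nginx.conf
print(build_alert_command(hostile_path))  # echo changed: '/tmp/x; rm -rf ~'

# The hostile path stays a single argument instead of becoming a second command
tokens = shlex.split(build_alert_command(hostile_path))
print(tokens)  # ['echo', 'changed:', '/tmp/x; rm -rf ~']
```

Where possible, passing an argument list to `subprocess.run` without `shell=True` avoids the shell entirely, which removes this class of bug at the source.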

❌ When NOT to Use Vigilo
You manage 1000+ servers
You need advanced event correlation
You require enterprise SLAs
You must meet strict compliance (→ use Tripwire / Wazuh)
✅ When Vigilo Is Perfect:
< 100 servers
You want something that just works
You hate false positives
You like tools you can actually read and modify

🌍 Open Source
🔗 GitHub: https://github.com/FreemenTech/Vigilo
📄 License: MIT
Contributions are welcome 🙌
💬 Questions or feedback? Drop them below 👇