Forem: Jer Catallo

Automated Web Content Discovery: How Attackers Find Hidden Paths on Your Web Server in Minutes Using Free Tools

Jer Catallo — Fri, 08 May 2026 12:08:00 +0000

Web applications often have directories and files that are not linked from the main pages. These paths can expose admin panels, backup files, logs, and config data. Automated content discovery tools like Gobuster use wordlists to test hundreds or thousands of paths quickly. Finding these paths before attackers do is a key part of web application security testing.

Using the Acme IT Support practice target on TryHackMe, you can see exactly how an attacker builds up knowledge of a target step by step, starting from a small fast scan and moving to deeper coverage with file extension checks.

Ethical Considerations

Only scan systems you own or have written permission to test.
Set clear scope limits before scanning, including target hosts, paths, time windows, and allowed methods.
Start with safe scan settings to avoid breaking services.
Handle found data with care. Do not take, share, or publish sensitive content from logs, backups, or archives.
Remove IPs, tokens, credentials, usernames, and other sensitive details before sharing findings publicly.
Report high-risk findings through proper disclosure channels.
Follow all applicable laws, platform rules, and company policies.

Step 1: Run a Baseline Scan with a Small Wordlist

You can start with a small and fast wordlist to find common directories and files. This gives you quick results without waiting too long.

gobuster dir --url http://<target-ip>/ \
  -w /usr/share/wordlists/SecLists/Discovery/Web-Content/common.txt

The dir mode brute-forces URL paths, so it finds both directories and files. The --url flag sets the target. The -w flag points to the wordlist file. Gobuster uses 10 threads by default and treats 404 responses as negative results.

The scan found 9 paths:

Path	Status	Notes
`/assets`	301	Redirect, static resources directory
`/contact`	200	Contact page
`/customers`	302	Redirect, possible user area
`/development.log`	200	Sensitive, exposed development log
`/monthly`	200	Monthly content endpoint
`/news`	200	News section
`/private`	301	Redirect, restricted area
`/robots.txt`	200	Crawler exclusion file, useful for recon
`/sitemap.xml`	200	Sitemap, reveals additional paths

Finding /development.log at status 200 shows a high-risk misconfiguration. Development logs can contain stack traces, database queries, and sometimes credentials.

Remediation: Remove development logs from production servers. Use proper logging systems that store logs outside the web root. Add access controls if logs must be kept on the server.
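The following minimal Python sketch shows what dir mode does. For each wordlist entry, it requests the base URL plus the word, using a pool of 10 worker threads. It keeps every response whose status is not blacklisted, which is 404 by default. This illustrates the idea under those assumptions and is not gobuster's implementation. The function names `probe` and `scan` are hypothetical. Only run it against hosts you are authorized to test.

```python
import concurrent.futures
import http.client
import urllib.parse

# Defaults that mirror the gobuster dir behavior described above
DEFAULT_THREADS = 10
DEFAULT_BLACKLIST = frozenset({404})


def probe(base_url, word, timeout=5.0):
    """Send one GET for base_url + word and return (path, status).

    http.client does not follow redirects, so 301/302 come back as-is.
    """
    parts = urllib.parse.urlsplit(base_url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    path = parts.path.rstrip("/") + "/" + word.lstrip("/")
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()  # drain the body before closing
        return path, resp.status
    finally:
        conn.close()


def scan(base_url, words, threads=DEFAULT_THREADS, blacklist=DEFAULT_BLACKLIST):
    """Probe every word concurrently and keep responses not in the blacklist."""
    hits = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(probe, base_url, w) for w in words]
        for fut in concurrent.futures.as_completed(futures):
            try:
                path, status = fut.result()
            except (OSError, http.client.HTTPException):
                continue  # skip timeouts and connection errors, similar to --no-error
            if status not in blacklist:
                hits.append((path, status))
    return sorted(hits)
```

Calling `scan(base, words, threads=50, blacklist={404, 403})` approximates the Step 2 settings. `http.client` does not follow redirects, so 301 and 302 responses are reported as-is. This matches how the result tables list them.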

Step 2: Expand Coverage with a Larger Wordlist

You can use a bigger wordlist to find less common paths. Adding more threads and filtering noise makes the scan faster and the output cleaner.

gobuster dir -u http://<target-ip>/ \
  -w /usr/share/wordlists/dirb/big.txt \
  -t 50 -b 404,403 --no-error

The -u flag sets the target URL. The -w flag points to the larger dirb wordlist with over 20,000 entries. The -t 50 flag increases threads for faster scanning. The -b 404,403 flag hides not-found and forbidden responses. The --no-error flag removes error output for cleaner results.

This scan found 2 new paths not seen in Step 1:

Path	Status	Notes
`/cookie-test`	200	New, cookie testing endpoint exposed
`/sitemap_xml`	200	New, alternate sitemap path

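Using -b 404,403 removes noise so the output shows mostly useful results. However, a 403 can still confirm that a path exists. If you need those paths, rerun the scan without 403 in the blacklist. Setting threads to 50 makes the scan much faster on stable targets.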

Remediation: Remove internal test endpoints like /cookie-test from production. Use a single sitemap path and redirect alternates to avoid confusion.

Step 3: Deep Scan with File Extension Checking

You can add file extension checking to find backup files, config files, and other sensitive file types. This multiplies your test cases and gives wider coverage.

gobuster dir -u http://<target-ip>/ \
  -w /usr/share/wordlists/dirbuster/directory-list-2.3-medium.txt \
  -x txt,json,bak,zip,md

The -x flag adds each extension to every wordlist entry. For example, backup becomes backup.txt, backup.json, backup.bak, and so on. This helps you find files that directory-only scans will miss.

This scan found 1 critical path:

Path	Status	Notes
`/tmp.zip`	200	Critical, archive file exposed on web root

Finding /tmp.zip shows why extension scanning matters. Without -x zip, the scan would only have requested /tmp, not /tmp.zip. Backup and temp files left in web-accessible paths are a common issue.

Remediation: Remove all backup and archive files from the web root. Use deployment scripts that clean up temp files. Store backups in secure locations outside the web server.
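A short sketch shows how -x multiplies the number of requests. The helper names are hypothetical. The sketch assumes each wordlist entry is requested once as-is, plus once per extension.

```python
def parse_extensions(arg):
    """Split a gobuster-style -x value such as 'txt,json,bak,zip,md'."""
    return [ext.strip().lstrip(".") for ext in arg.split(",") if ext.strip()]


def expand(words, extensions):
    """Yield each word, then word.ext for every extension."""
    for word in words:
        yield word
        for ext in extensions:
            yield f"{word}.{ext}"


exts = parse_extensions("txt,json,bak,zip,md")
candidates = list(expand(["backup", "tmp"], exts))
print(len(candidates))  # 12 (2 words x (1 bare + 5 extensions))
print(candidates[:3])  # ['backup', 'backup.txt', 'backup.json']
```

With five extensions, a wordlist the size of directory-list-2.3-medium.txt produces six times as many requests. Plan thread counts and time windows accordingly.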

Key Findings

Path	Status	Risk
`/development.log`	200	High, may contain credentials or stack traces
`/tmp.zip`	200	High, archive with unknown contents exposed
`/private`	301	Medium, restricted area worth investigating
`/customers`	302	Medium, potential user data area
`/cookie-test`	200	Low-Medium, exposes internal test endpoint
`/robots.txt`	200	Informational, reveals disallowed paths
`/sitemap.xml`	200	Informational, additional path disclosure

Summary

Small wordlists are fast but miss many paths. You should layer scans with bigger wordlists to get better coverage.
File extension scanning with -x is needed to find backup files (such as .bak and .zip) and config leaks.
Filtering noise with -b (blacklisted status codes) gives cleaner output for faster review.
A 200 response means the content is accessible. 301 and 302 are redirects worth following. Even a 403 response confirms that a path exists.
Exposed files like development.log and tmp.zip are real-world issues you will often see in penetration tests.

You can use these steps on your own assessments to find hidden content and improve your security posture. Always stay within scope and handle any sensitive data you find with care.

If you found this helpful, drop a like and share it with someone learning security. If you have questions, ran into something different in your own lab, or want to share your results, leave a comment below. Always happy to connect and talk about security, recon techniques, or anything AppSec related.

Feel free to connect with me on LinkedIn

Always open to connecting with people in security, development, or both. Whether you are building something, breaking something, or just getting started, feel free to reach out.

OSINT Content Discovery: Why You Need to Know What's Publicly Exposed About Your Web Assets

Jer Catallo — Thu, 07 May 2026 12:15:00 +0000

Passive content discovery helps you map attack surfaces without touching target systems. You can use public search engines, browser extensions, web archives, code repositories, and cloud storage references to find exposed assets. This guide covers five methods you can apply in your own authorized security assessments.

Ethical Considerations

Only use these methods on assets you own or have clear written permission to test.

Get written permission before you target any domain, repo, or cloud resource.
Follow all laws, platform terms, and bug bounty scope rules.
Do not try to access accounts, use found credentials, steal data, or leave backdoors.
Stop and report right away if you find sensitive data.
Do not proceed if you are not sure about your authorization.

Google Dorking

Google search operators let you filter results to specific domains, file types, URL paths, and page titles. These operators are passive and use only public indexed data.

Step 1: Use `site:` to Scope Your Search

The site: operator limits results to one domain or hosting platform.

site:<target-domain> "<keyword>"

This query shows only pages from the target domain that contain your keyword. You can use it to find public pages hosted on a specific platform.

You can see indexed GitHub Pages sites that match the keyword. This shows how site: limits search to one hosting domain.

Step 2: Use `filetype:` to Find Exposed Documents

The filetype: operator filters results by file extension.

"<target-phrase>" filetype:<extension>

This query finds indexed files of a specific type that contain your target phrase. You can use it to map exposed documents and artifacts.

You can see public Jupyter notebooks that may hold code, data samples, or analysis work.

Remediation: Treat found documents as sensitive even if they are public. Do not copy or share private content. Report exposure through approved channels only.

Step 3: Use `inurl:` for Path-Based Discovery

The inurl: operator targets pages with specific words in the URL path.

inurl:<path-keyword> "<target-phrase>"

This query finds pages with your keyword in the URL path. You can use it to find specific page types.

You can see personal or professional about pages that give more context about the target.

Remediation: Avoid personal targeting, doxxing, or profiling. Collect only the data you need for your security task.

Step 4: Use `intitle:` for Title-Based Discovery

The intitle: operator matches pages with specific text in the HTML title tag.

intitle:"<title-text>" "<keyword1>" "<keyword2>"

This query finds pages with your text in the title plus extra keywords. You can use it to find project pages tied to certain technologies.

You can see developer portfolio pages that list their technology stack in the page title.

Remediation: Keep searches within approved scope. Do not use findings to target hobby or student projects.
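The four operators combine into one query string, so a small helper can build queries consistently. This is a convenience sketch, not a Google API. `build_dork` and `google_search_url` are hypothetical names, and the `https://www.google.com/search?q=` URL format is assumed. Open the resulting URL in a browser rather than scraping the results.

```python
from urllib.parse import urlencode


def build_dork(terms=(), site=None, exclude_sites=(), filetype=None, inurl=None, intitle=None):
    """Compose a Google query string from site:, filetype:, inurl:, and intitle:."""
    parts = []
    if site:
        parts.append(f"site:{site}")
    parts.extend(f"-site:{s}" for s in exclude_sites)
    if filetype:
        parts.append(f"filetype:{filetype}")
    if inurl:
        parts.append(f"inurl:{inurl}")
    if intitle:
        parts.append(f'intitle:"{intitle}"')
    parts.extend(f'"{t}"' for t in terms)  # quote each phrase for exact matching
    return " ".join(parts)


def google_search_url(query):
    """Turn a query into a browser-ready search URL (assumed URL format)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


print(build_dork(terms=["login"], site="example.com"))  # site:example.com "login"
```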

Wappalyzer Technology Fingerprinting

Wappalyzer detects web technologies from the browser. It reads HTTP headers, HTML, JavaScript files, and loaded resources to identify frameworks, CDNs, and services.
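The following toy sketch illustrates header and HTML matching. The rule set is tiny and hypothetical. It does not use Wappalyzer's real rule format or data. Each rule names where to look (a response header or the HTML body) and gives a regex. If the regex has a first capture group, that group holds the version.

```python
import re

# Hypothetical mini rule set for illustration only; not Wappalyzer's format or data.
# Each entry: (source, header name or None, regex with optional version group 1)
RULES = {
    "nginx": ("header", "server", r"nginx(?:/([\d.]+))?"),
    "Express": ("header", "x-powered-by", r"\bExpress\b"),
    "jQuery": ("html", None, r"jquery[.-]?([\d.]*?)(?:\.min)?\.js"),
}


def fingerprint(headers, html):
    """Return {technology: version-or-empty-string} for every rule that matches."""
    lowered = {k.lower(): v for k, v in headers.items()}
    found = {}
    for tech, (source, header_name, pattern) in RULES.items():
        text = lowered.get(header_name, "") if source == "header" else html
        match = re.search(pattern, text, re.IGNORECASE)
        if match:
            # Only read group 1 if the pattern defines one
            found[tech] = (match.group(1) or "") if match.re.groups else ""
    return found
```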

Step 5: Fingerprint OWASP Juice Shop Stack

Open the target URL in a browser with Wappalyzer installed.

https://juice-shop.github.io

The extension scans the page and shows detected technologies in its panel.

Wappalyzer found front-end libraries, CDN providers, and hosting indicators on juice-shop.github.io. You can use this stack data to plan your next assessment steps.

Remediation: Only fingerprint where recon is allowed. Do not assume you can attack just because you see stack details. Use this data for defensive testing.

Step 6: Analyze GitHub Technology Profile

Apply Wappalyzer to large platforms to see their technology footprint.

https://github.com

The extension finds frameworks, analytics tools, CDN providers, and cloud services.

Wappalyzer found React, React Router, GSAP, AWS-related services, and more on github.com. This shows how fingerprinting works on large applications.

Remediation: Follow platform terms and rate limits. Do not scrape data in an abusive way. Use collected data only for authorized tasks.

Wayback Machine Archive Analysis

The Wayback Machine stores historical snapshots of web pages. You can use it to find old URLs, retired endpoints, and content versions no longer on the live site.

Step 7: Search for Historical Snapshots

Enter the target domain into the Wayback Machine search.

https://web.archive.org/web/*/<target-domain>

Browse the calendar timeline to see archived snapshots from different dates.

This is your entry point for historical content analysis. Older snapshots can link to pages and endpoints that the live site no longer serves.

Remediation: Just because content is archived does not mean you can test current systems. Do not use archived findings to access restricted areas without approval. Check ownership and scope before you test any found endpoint.
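You may want the list of archived URLs rather than the calendar view. For that, the Wayback Machine also exposes a CDX query endpoint. The sketch below assumes that endpoint (`https://web.archive.org/cdx/search/cdx`) and its JSON output, where the first row holds the field names. `cdx_query_url` and `parse_cdx` are hypothetical helper names. The sketch leaves out fetching the URL (for example with `urllib.request.urlopen`), so the parsing can be checked offline.

```python
import json
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(domain, limit=500):
    """Build a CDX query for archived URLs under a domain."""
    params = {
        "url": f"{domain}/*",                   # prefix match on the domain
        "output": "json",                       # first row holds field names
        "fl": "original,timestamp,statuscode",  # only the fields we need
        "collapse": "urlkey",                   # one row per unique URL
        "limit": str(limit),
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx(body):
    """Turn CDX JSON rows into dicts keyed by the header row."""
    rows = json.loads(body) if body.strip() else []
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, row)) for row in records]
```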

GitHub OSINT

GitHub search helps you find code references, config files, and metadata. Public repos often contain clues about infrastructure, dependencies, and potential misconfigurations.

Step 8: Search GitHub for Target Artifacts

Use the GitHub search page with targeted queries.

https://github.com/search?q=<target-query>

You can find repos, code snippets, and config files in your assessment scope.

Step 9: Use GitHub Dork Patterns for Credential Discovery

Organization-scoped searches limit results to one company's public repos.

org:<company-name> <secret-keyword>

Add keywords like "password", "token", "api_key", or "secret" to check for credential exposure.
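The org-scoped pattern can be turned into ready-to-open GitHub search URLs. In this small sketch, the helper names and the `example-org` organization are hypothetical. It also assumes the `type=code` parameter selects code results on github.com/search.

```python
from urllib.parse import urlencode

SECRET_KEYWORDS = ("password", "token", "api_key", "secret")


def org_secret_queries(org, keywords=SECRET_KEYWORDS):
    """One org-scoped query per keyword, e.g. 'org:example-org password'."""
    return [f"org:{org} {kw}" for kw in keywords]


def github_search_url(query, search_type="code"):
    """Browser URL for a GitHub search; type=code is assumed to select code results."""
    return "https://github.com/search?" + urlencode({"q": query, "type": search_type})


for q in org_secret_queries("example-org"):
    print(github_search_url(q))
```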

# =========================================================
# GITHUB OSINT: HIGH-VALUE TARGET DORKS
# =========================================================

# --- Cloud & Infrastructure Secrets ---

# Searches for AWS Access Key IDs within PEM certificate files
"AKIA" extension:pem

# Locates exposed AWS credential configuration files
"AWS_ACCESS_KEY_ID" filename:credentials

# Finds unprotected SSH private keys for server access
"BEGIN OPENSSH PRIVATE KEY" filename:id_rsa

# Finds config files that reference GOOGLE_APPLICATION_CREDENTIALS, the variable pointing to a GCP service account key file
filename:config "google_application_credentials"


# --- Database & Authentication Leaks ---

# Finds hardcoded MongoDB connection strings in JavaScript files
"mongodb+srv://" extension:js

# Searches for Java/MySQL database connection strings with passwords
"jdbc:mysql://" "password"

# Locates WordPress configuration files containing database passwords
filename:wp-config.php "DB_PASSWORD"

# Finds PostgreSQL password files for local database instances
filename:.pgpass "localhost:5432"


# --- API Keys & Tokens ---

# Hunts for hardcoded Bearer authentication tokens in Python scripts
"authorization: bearer" extension:py

# Locates exposed Django/Python web framework secret keys
filename:settings.py "SECRET_KEY="

# Finds live Stripe payment processing API keys
"api.stripe.com" "sk_live_"

# Discovers exposed Slack webhook URLs
"hooks.slack.com/services/" extension:js


# --- Targeted Corporate Recon ---
# (Replace 'companyname' with your target organization)

# Searches a specific organization's repos for internal Jira passwords
org:companyname "jira_password"

# Finds Atlassian/Confluence access tokens for a specific target domain
"companyname.atlassian.net" "token"

# Locates terminal history files showing SSH connections to a target
filename:.bash_history "ssh user@companyname"

# Discovers internal corporate network routing or configuration files
"corp.companyname.internal" extension:conf

This list covers high-value search patterns for cloud credentials, database leaks, and token discovery. It also includes org-scoped searches like org:companyname for focused recon.

Remediation: Only use this in authorized training, internal audits, or approved bug bounty scopes. Never use found secrets or credentials. Report exposed credentials through approved incident channels right away.

S3 Bucket Discovery

Amazon S3 buckets often show up in public references through naming patterns, source code, and config files. You can find them using search operators and verify access with AWS CLI.

Step 10: Find S3 Buckets Through Public References

Search for public S3 bucket references with Google dorking.

site:s3.amazonaws.com "<target-company>"

You can also search GitHub for bucket names in source code and config files.

Step 11: Check Bucket Access with AWS CLI

Use the AWS CLI to check if a bucket allows public listing without credentials.

aws s3 ls s3://<bucket-name> --no-sign-request

Get the bucket ACL to see access permissions. A successful response means anonymous users can read the ACL itself. Check its grants for the AllUsers group to see what else is open to anonymous access.

aws s3api get-bucket-acl --bucket <bucket-name> --no-sign-request

# =========================================================
# S3 BUCKET OSINT & RECONNAISSANCE
# =========================================================

# --- 1. Passive Discovery (Google & GitHub Dorks) ---

# Google Dork: Finds publicly indexed S3 buckets for a target
site:s3.amazonaws.com intitle:"index of" "companyname"

# Google Dork: Searches for exposed bucket URLs in target's documents
"s3.amazonaws.com" ext:pdf "companyname"

# GitHub Dork: Locates bucket URLs hardcoded in a company's repositories
"s3.amazonaws.com" org:companyname

# GitHub Dork: Finds virtual-hosted-style URLs for a bucket named after the target
"companyname.s3.amazonaws.com"


# --- 2. Active Enumeration (Brute-Force Naming Conventions) ---
# Common permutations used in automated tools (e.g., ffuf, Gobuster)
# Format: https://{target}-{keyword}.s3.amazonaws.com

companyname-assets
companyname-public
companyname-private
companyname-dev
companyname-backup
companyname-staging
companyname-prod
companyname-www


# --- 3. Access Verification (AWS CLI) ---
# Testing for insecure permissions (Requires AWS CLI installed)

# Attempt to list the contents of a bucket anonymously (No credentials)
aws s3 ls s3://companyname-assets --no-sign-request

# Attempt to copy a sensitive file from the public bucket to local machine
aws s3 cp s3://companyname-backup/db_dump.sql . --no-sign-request

# Attempt to write a harmless file to test for insecure "Write" permissions
aws s3 cp test_file.txt s3://companyname-public/ --no-sign-request

The list above covers passive discovery patterns for S3 references using Google and GitHub queries. It also includes naming permutation examples and CLI commands to check bucket permissions.
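The naming permutations, and what an anonymous response means, can also be expressed in code. This sketch only builds names and interprets status codes you already have. `bucket_candidates`, `bucket_url`, and `interpret_status` are hypothetical names. The status mapping is a rough assumption:

- 200 means a publicly listable bucket.
- 403 means the bucket exists but denies access.
- 404 means the bucket does not exist.

Sending requests to these URLs is active testing and needs written authorization.

```python
# Keyword list copied from the permutations above; extend only as your scope allows
KEYWORDS = ("assets", "public", "private", "dev", "backup", "staging", "prod", "www")


def bucket_candidates(company, keywords=KEYWORDS):
    """Build '{target}-{keyword}' names; S3 bucket names are lowercase."""
    base = company.strip().lower()
    return [f"{base}-{kw}" for kw in keywords]


def bucket_url(name):
    """Virtual-hosted-style URL in the format shown above."""
    return f"https://{name}.s3.amazonaws.com"


def interpret_status(status):
    """Rough (assumed) meaning of an unauthenticated GET on a bucket URL."""
    return {
        200: "exists, anonymous listing allowed",
        403: "exists, access denied",
        404: "no such bucket",
    }.get(status, "inconclusive (redirect or other response)")
```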

Remediation: Cloud enumeration needs explicit permission from the asset owner. Do not list, download, upload, or change bucket content unless you have written authorization. If you find an exposed bucket, stop testing and report it with minimal proof.

Summary

You now have five content discovery methods you can use in authorized assessments:

- Google dorking helps you map indexed content with targeted operators.
- Wappalyzer gives fast technology stack details.
- The Wayback Machine reveals historical web data and retired endpoints.
- GitHub OSINT uncovers code references and config metadata.
- S3 recon shows cloud storage discovery patterns.

Most of these methods rely on public data. The S3 naming permutations and AWS CLI checks are the exception: they send requests to AWS and count as active testing. Use all of them only within authorized scopes with clear permission.

Manual Web Content Discovery: How You Can Find Hidden Paths Before Attackers Do

Jer Catallo — Mon, 04 May 2026 12:05:00 +0000

Manual content discovery is a core skill in application security testing. Instead of relying only on automated scanners, you can use simple HTTP requests and browser tools to find exposed files, hidden paths, and technology fingerprints. This covers techniques like checking robots.txt, fingerprinting favicons, reading sitemap.xml, inspecting HTTP headers, and spotting framework markers in HTML source.

These methods help you understand a target's structure and find information disclosure issues early, before running heavy scanning tools.

Ethical Considerations

Only test systems you own or have explicit written permission to assess.
Follow the defined scope, timing, and rules of engagement set by the owner.
Stop immediately if you find data outside scope and report it through approved channels.
Use findings for defense and remediation, not exploitation.
Treat discovered paths like admin or staff portals as sensitive data. Do not brute-force or abuse them.
Do not publish sensitive headers, tokens, or internal values outside approved reports.

Robots.txt Analysis

The robots.txt file tells web crawlers which paths to avoid. It can accidentally reveal sensitive routes like admin panels or staff portals.

curl -s https://<target-domain>/robots.txt

This command fetches the robots.txt file so you can check Disallow and Allow directives for hidden paths.

The response shows a Disallow: /staff-portal directive under User-agent: *. This means the site owner does not want crawlers to index the staff portal, but the path is still visible to anyone who checks this file.

Result: The /staff-portal route is exposed through robots.txt. While this does not mean the path is vulnerable, it gives you a starting point for further authorized testing.
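Extracting the Disallow entries is easy to script when you review many hosts. This minimal sketch uses the hypothetical helper `robots_paths`. It strips comments and collects the values of one directive. It deliberately ignores User-agent grouping, so treat every result as a path to review, not as a rule for a particular crawler.

```python
def robots_paths(robots_txt, directive="disallow"):
    """Return the values of one directive (Disallow by default) in file order."""
    paths = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        if field.lower() == directive.lower() and value:
            paths.append(value)
    return paths


sample = "User-agent: *\nDisallow: /staff-portal\n"
print(robots_paths(sample))  # ['/staff-portal']
```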

Remediation: Remove sensitive paths from robots.txt. Use proper authentication and authorization controls to protect those routes instead. Security through obscurity is not a reliable protection.

Favicon Fingerprinting

Favicons are small icons that browsers display in tabs. Different frameworks and products use unique favicon files, so you can calculate a hash and match it against known databases to identify the technology.

curl -s https://<target-domain>/favicon.ico | md5sum

This downloads the favicon and calculates its MD5 hash for comparison.

The browser network tab confirms a successful HTTP 200 response for favicon.ico.

The calculated MD5 hash is f276b19aabcb4ae8cda4d22625c6735f.

Searching this hash in the OWASP favicon database returns a match for cgiirc (0.5.9).

Result: The favicon hash maps to cgiirc (0.5.9), an IRC web client. This suggests the target may reuse assets from this product or may be running it. You can use this information to check for known issues with this version.
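The same hash can be computed in Python, which helps when you compare many favicons. This sketch assumes you already have the favicon bytes, for example from `Path("favicon.ico").read_bytes()`. The lookup table holds only the single mapping reported above. Until you load the full OWASP favicon database, `identify` returns `None` for every other hash.

```python
import hashlib

# Only the mapping reported in this write-up; real lookups use the full OWASP favicon database
KNOWN_FAVICONS = {"f276b19aabcb4ae8cda4d22625c6735f": "cgiirc (0.5.9)"}


def favicon_md5(data: bytes) -> str:
    """Same hex digest that `curl -s .../favicon.ico | md5sum` prints."""
    return hashlib.md5(data).hexdigest()


def identify(data: bytes):
    """Return the matching product name, or None if the hash is unknown."""
    return KNOWN_FAVICONS.get(favicon_md5(data))
```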

Remediation: Replace default framework or third-party favicons with a custom one. This prevents passive technology identification through favicon hashing.

Sitemap.xml Enumeration

The sitemap.xml file lists pages that the site wants search engines to index. It often reveals old routes, API endpoints, or parameterized URLs you might not find through normal browsing.

curl -s https://<target-domain>/sitemap.xml

This retrieves the sitemap to find discoverable paths and endpoints.

The sitemap contains multiple URL entries including /news/, /contact, and parameterized article paths with sequential IDs like news/article?id=1, news/article?id=2, and news/article?id=3.

Result: The sitemap exposes several routes and a pattern for article IDs. You can use this to map out the content structure and check for IDOR or other parameter-based issues on these endpoints.
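You can pull the URLs and parameter patterns out of a sitemap with the standard library. The helper names are hypothetical. The sketch assumes a standard `<urlset>` file in the sitemaps.org namespace; sitemap index files would need another pass. The Python documentation discusses XML parsing vulnerabilities, so consider a hardened parser such as defusedxml for untrusted input.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlsplit

# Standard sitemap namespace from the sitemaps.org protocol
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return every <loc> inside <url> entries."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text]


def parameter_values(urls):
    """Map (path, parameter) to the values seen, e.g. ('/news/article', 'id')."""
    groups = {}
    for url in urls:
        parts = urlsplit(url)
        for name, values in parse_qs(parts.query).items():
            groups.setdefault((parts.path, name), []).extend(values)
    return groups
```

Sequential values such as 1, 2, 3 under one parameter are the pattern worth checking for IDOR, within scope.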

Remediation: Avoid listing sensitive or internal endpoints in sitemap.xml. Only include public-facing, intended content. For parameterized URLs, validate and authorize each request server-side.

HTTP Header Inspection

HTTP response headers contain metadata about the server, security configuration, and sometimes version information. Missing security headers or verbose server details can reveal weaknesses.

curl -I https://<target-domain>

This sends a HEAD request to get only the response headers without the full page body.

The headers show Server: nginx/1.18.0 (Ubuntu) and a custom X-FLAG: THM{HEADER_FLAG} header.

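Result: The Server header leaks the exact web server version and operating system, which helps you narrow down potential version-specific issues. The response also lacks important security headers, each with its own risk:

- Without Content-Security-Policy, the browser has no policy limiting script sources or, through the frame-ancestors directive, framing. This weakens XSS and clickjacking defenses.
- Without Strict-Transport-Security, connections are more exposed to HTTPS-to-HTTP downgrade (SSL stripping) attacks.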
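A scripted header review runs the same way against every host in scope. The following sketch makes these assumptions:

- `fetch_headers` and `audit_headers` are hypothetical helpers.
- The header list covers only the four headers named in the remediation that follows.
- The version-leak check is a simple heuristic that flags a digit in Server or X-Powered-By.

```python
import urllib.request

SECURITY_HEADERS = (
    "Content-Security-Policy",
    "Strict-Transport-Security",
    "X-Frame-Options",
    "X-Content-Type-Options",
)
VERSION_HEADERS = ("Server", "X-Powered-By")


def fetch_headers(url, timeout=5.0):
    """HEAD request like `curl -I`, except urllib follows redirects."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return dict(resp.headers.items())


def audit_headers(headers):
    """Return (missing security headers, headers that look version-revealing)."""
    present = {k.lower(): v for k, v in headers.items()}
    missing = [h for h in SECURITY_HEADERS if h.lower() not in present]
    # Heuristic: a digit in Server/X-Powered-By usually means a version number
    leaks = {
        h: present[h.lower()]
        for h in VERSION_HEADERS
        if h.lower() in present and any(ch.isdigit() for ch in present[h.lower()])
    }
    return missing, leaks
```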

Remediation: Configure your web server to suppress or mask the Server header. Add security headers like Content-Security-Policy, Strict-Transport-Security, X-Frame-Options, and X-Content-Type-Options. You can use tools like securityheaders.com to check your current header posture.

Framework Stack Identification

Web frameworks often leave markers in HTML source code, such as generator comments or meta tags. These markers reveal the technology stack and sometimes the exact version.

curl -s https://<target-domain> | grep -iE "generated|framework"

This fetches the homepage HTML and filters for framework-related comments.

The HTML source contains a comment showing the page was generated using the THM Framework.

Visiting the framework reference URL confirms it is the THM Web Framework with visible version details.

Result: The source comment reveals THM Framework v1.2 as the underlying technology. You can now research this framework for known misconfigurations, default paths, or version-specific vulnerabilities.
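To review many pages, you can use the standard library HTML parser instead of grep to extract comments and generator meta tags. In this sketch, `MarkerParser` and `framework_markers` are hypothetical names. The keyword list mirrors the grep pattern above.

```python
from html.parser import HTMLParser


class MarkerParser(HTMLParser):
    """Collect HTML comments and <meta name="generator"> values."""

    def __init__(self):
        super().__init__()
        self.comments = []
        self.generators = []

    def handle_comment(self, data):
        self.comments.append(data.strip())

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # attribute names arrive lowercased
        if tag == "meta" and (attributes.get("name") or "").lower() == "generator":
            self.generators.append(attributes.get("content") or "")


def framework_markers(html, keywords=("generated", "framework")):
    """Return (comments mentioning a keyword, generator meta values)."""
    parser = MarkerParser()
    parser.feed(html)
    parser.close()
    hits = [c for c in parser.comments if any(k in c.lower() for k in keywords)]
    return hits, parser.generators
```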

Remediation: Strip generator comments and version markers from production HTML before deployment. Configure your build pipeline or template engine to exclude debug and version metadata from rendered output.

Summary

Manual content discovery gives you a clear picture of a target without heavy tooling. You can see how robots.txt can leak sensitive paths, favicon hashes can identify technologies, sitemap.xml can map out hidden routes, HTTP headers can expose server versions and missing security controls, and HTML source comments can reveal framework details. These techniques work well as a first step before running automated scanners and help build a stronger picture of the target's attack surface.

Subdomain Enumeration: How Attackers Find What You Forgot to Hide

Jer Catallo — Sat, 02 May 2026 11:41:50 +0000

Subdomain enumeration is the process of finding all subdomains that belong to a target domain. Each subdomain is a potential entry point, making this a key step in external reconnaissance. In this write-up, we walk through the subdomain enumeration techniques tested in a hands-on lab, so you can see the tools, commands, and results along the way.

There are two main approaches:

Passive enumeration: Uses public data sources like search engines, certificate transparency logs, and third-party APIs. This method does not send direct requests to the target, so it has low risk of detection.
Active enumeration: Sends direct requests to DNS servers or web servers using wordlists. This method finds more results but creates network traffic that can be logged or detected.

We demonstrate both approaches below, so you can see how they complement each other and why relying on only one method leaves blind spots.

Ethical Considerations

Subdomain enumeration is a reconnaissance activity. It has legal and operational impact. You should follow these rules in every engagement:

Authorization is mandatory. Active DNS brute forcing or host-header fuzzing without written permission may violate laws such as the US Computer Fraud and Abuse Act (CFAA) and local statutes.
Passive does not always mean harmless. Passive OSINT can still show sensitive information, so handle it with care.
Rate limiting matters. Use low thread counts (for example, -t 1) and wait between requests to avoid service disruption.
Scope validation is required. A discovered host is not automatically in scope. Always check every asset against the approved scope list.
Responsible disclosure applies. Report unintended exposed infrastructure through proper authorized channels.

All activity in this write-up ran against:

Personal domain (jercarlocatallo.com), owned by the author.
TryHackMe lab environments, authorized training infrastructure.

Demonstration

Search Engine Dorking

Search engine dorking is the simplest passive technique. By using the site: operator, you can ask Google which subdomains it has indexed for a domain. No traffic reaches the target, and results are available in seconds.

We use HackTheBox below so you can see how this technique works.

HackTheBox query:

site:*.hackthebox.com -site:www.hackthebox.com

Indexed results include subdomains such as roadmap, jobs, ctf, status, and trust. This confirms indexing-based visibility and helps you prioritize deeper checks.

Takeaway: Search engines know only what they have crawled. Results are quick to get but limited to indexed assets. This technique alone will miss subdomains that search engines never discovered.

Certificate Transparency Lookup

Certificate Transparency (CT) logs record every SSL/TLS certificate issued by participating authorities. crt.sh is a public search interface for these logs. Unlike search engines, CT logs can reveal hostnames that were never indexed or are no longer active.

We check tryhackme.com below so you can see how CT logs compare to search engine results.

tryhackme.com:

https://crt.sh/?q=tryhackme.com

Certificate entries show wildcard and service-specific names. You can see that CT logs include historical names not currently indexed by search engines.

Takeaway: CT logs provide deeper hostname visibility through certificate history. They often reveal names that search engines miss, including wildcard entries and expired certificates. However, CT logs cannot find subdomains that never had a certificate issued.
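crt.sh can also return JSON, which makes it easier to collect unique hostnames. This sketch makes these assumptions about crt.sh:

- It supports an `output=json` parameter.
- It treats `%` in the query as a wildcard.
- Each result has a `name_value` field that can hold several newline-separated names per certificate.

The helper names are hypothetical. The sketch strips wildcard prefixes (`*.`) so they merge with the base names.

```python
import json
from urllib.parse import urlencode


def crtsh_url(domain):
    """crt.sh query for the domain and all subdomains ('%' is its wildcard)."""
    return "https://crt.sh/?" + urlencode({"q": f"%.{domain}", "output": "json"})


def names_from_crtsh(body, domain):
    """Unique hostnames under `domain` from crt.sh JSON output."""
    domain = domain.lower()
    names = set()
    for entry in json.loads(body):
        # name_value may hold several newline-separated names per certificate
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower().removeprefix("*.")
            if name == domain or name.endswith("." + domain):
                names.add(name)
    return sorted(names)
```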

Passive Aggregation with Sublist3r

Sublist3r automates passive enumeration by querying multiple third-party sources (search engines, DNS aggregators, VirusTotal, and more) in a single run. We run it against a personal domain so you can see what public data sources know.

Command:

sublist3r -d jercarlocatallo.com

We can see 3 unique subdomains were discovered:

www.jercarlocatallo.com
m.jercarlocatallo.com
wwww.jercarlocatallo.com

Note that wwww.jercarlocatallo.com (four w's) is a typo subdomain. It is interesting because it shows up in third-party data sources, possibly from a crawl or a misconfigured link somewhere.

Two sources failed during collection: DNSDumpster (CSRF token error) and VirusTotal (blocking requests). Passive output quality depends on source availability and rate limits.

Remediation:

Monitor certificate transparency logs for your domains to detect certificates issued for unexpected or unauthorized subdomains.
Remove unused or forgotten subdomains from DNS to reduce attack surface.

Takeaway: Sublist3r is efficient for gathering what third-party sources already know, but it cannot discover subdomains that no source has indexed. Infrastructure names like dev, admin, and vpn often do not appear in passive data because no one has publicly referenced them.

Virtual Host Discovery with ffuf

Not all subdomains resolve in public DNS. Some exist only as virtual hosts behind a single IP address. ffuf can find these by fuzzing the Host header in HTTP requests and filtering out default responses.

We run this in a TryHackMe lab environment so you can see the technique in action.

Command:

ffuf -w /usr/share/wordlists/SecLists/Discovery/DNS/namelist.txt \
  -H "Host: FUZZ.acmeitsupport.thm" \
  -u http://10.48.142.81 \
  -fs 2395

We can see two virtual hosts were found: delta and yellow. This confirms that HTTP-layer discovery can expose hosts not visible in public OSINT datasets.
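The idea behind this ffuf run can be sketched in Python. The sketch sends the same request to one IP with a different Host header per word. It then drops responses whose body size equals the default page size, which is what `-fs 2395` does. The helper names and the plain-HTTP, port-80 default are assumptions. ffuf adds concurrency, matchers, and filters that this sketch leaves out.

```python
import http.client


def probe_vhost(ip, host, port=80, timeout=5.0):
    """GET / from `ip` with a custom Host header; return (status, body size)."""
    conn = http.client.HTTPConnection(ip, port, timeout=timeout)
    try:
        conn.request("GET", "/", headers={"Host": host})
        resp = conn.getresponse()
        return resp.status, len(resp.read())
    finally:
        conn.close()


def fuzz_vhosts(ip, words, domain, filter_size, port=80):
    """Try word.domain for each word; keep responses whose size differs from the default."""
    found = []
    for word in words:
        host = f"{word}.{domain}"
        try:
            status, size = probe_vhost(ip, host, port)
        except (OSError, http.client.HTTPException):
            continue
        if size != filter_size:  # equivalent of ffuf -fs
            found.append((host, status, size))
    return found
```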

Remediation:

Configure web servers to return consistent responses for unknown Host headers.
Use a default catch-all virtual host that does not leak information about other hosted services.

Takeaway: Virtual host fuzzing reveals assets that DNS and passive sources cannot find. It is an active technique that requires sending real HTTP requests, so it is more detectable but more complete for web-hosted infrastructure.

Active DNS Brute Force with Gobuster

Gobuster sends direct DNS queries for each word in a wordlist, resolving subdomains that passive sources never collected. We run this against a personal domain to find subdomains that Sublist3r missed.

Command:

gobuster dns \
  -d jercarlocatallo.com \
  -w subdomains-top1million-5000.txt \
  -t 1 \
  --resolver 8.8.8.8 \
  --protocol tcp

Standard DNS typically uses UDP, which can drop packets under load. TCP reduces timeout noise and improves consistency for wordlist-based enumeration.

You can see 8 subdomains resolved:

Subdomain	IP Resolved
`www.jercarlocatallo.com`	64.29.17.65, 216.198.79.65
`mail.jercarlocatallo.com`	127.0.0.1
`dev.jercarlocatallo.com`	127.0.0.1
`admin.jercarlocatallo.com`	127.0.0.1
`vpn.jercarlocatallo.com`	127.0.0.1
`api.jercarlocatallo.com`	127.0.0.1
`staging.jercarlocatallo.com`	127.0.0.1
`uat.jercarlocatallo.com`	127.0.0.1

www returned two public IPs (64.29.17.65 and 216.198.79.65), which suggests a round-robin or load-balanced setup. Some words (autos, soap, chemie) produced i/o timeouts, which is expected resolver noise. The scan completed all 4,989 words. The queries went to a public resolver (8.8.8.8), so the loopback answers are published DNS records that point to 127.0.0.1, not reachable hosts. They still leak internal naming such as dev, admin, and vpn, which may map to non-public infrastructure.
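For comparison, the resolve-and-keep loop behind DNS brute forcing can be sketched in Python. This is an approximation with these assumptions:

- It uses the operating system's resolver through `socket.getaddrinfo`, not a chosen server such as 8.8.8.8.
- It queries IPv4 only.
- It has no TCP option or wildcard detection.

The `resolve` parameter is injectable, so you can check the logic with a fake resolver. The function names are hypothetical.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def system_resolve(name):
    """IPv4 addresses from the OS resolver, or [] if the name does not resolve.

    Unlike gobuster's --resolver 8.8.8.8, this uses whatever resolver the OS
    is configured with.
    """
    try:
        infos = socket.getaddrinfo(name, None, socket.AF_INET)
    except (socket.gaierror, UnicodeError):
        return []
    return sorted({info[4][0] for info in infos})


def dns_bruteforce(domain, words, resolve=system_resolve, threads=1):
    """Resolve word.domain for every word; keep only names that returned answers."""
    names = [f"{w.strip()}.{domain}" for w in words if w.strip()]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        answers = list(pool.map(resolve, names))
    return {name: ips for name, ips in zip(names, answers) if ips}
```

The default of one thread matches the low-rate `-t 1` setting used above.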

Remediation:

Use split-horizon DNS to separate internal and external DNS records.
Avoid exposing internal infrastructure names (dev, staging, uat, admin) in public DNS.
Implement DNSSEC to prevent DNS spoofing and cache poisoning attacks.

Takeaway: Active brute forcing found 8 subdomains where Sublist3r found only 3. The 7 additional names (mail, dev, admin, vpn, api, staging, uat) were not indexed by any passive source. This gap shows why active enumeration is necessary for full coverage.
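The comparison is just set arithmetic over the two result lists in this write-up, which makes the gap easy to verify.

```python
# Subdomain labels taken from the results above
passive = {"www", "m", "wwww"}  # Sublist3r
active = {"www", "mail", "dev", "admin", "vpn", "api", "staging", "uat"}  # gobuster dns

only_active = sorted(active - passive)
only_passive = sorted(passive - active)
combined = passive | active

print(len(only_active), only_active)
# 7 ['admin', 'api', 'dev', 'mail', 'staging', 'uat', 'vpn']
print(len(only_passive), only_passive)
# 2 ['m', 'wwww']
print(len(combined))
# 10
```

The passive-only names (m and wwww) show the reverse gap: the wordlist run did not report them. This is another reason to combine both methods.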

Summary

Passive techniques show what public data sources know. Active techniques show what DNS and application behavior expose. When you combine both approaches, you improve attack surface visibility and reduce blind spots.

Sublist3r found 3 publicly known subdomains from third-party data, including a typo domain (wwww.jercarlocatallo.com). Gobuster found 8 subdomains through direct DNS resolution, adding infrastructure names like dev, admin, vpn, api, staging, uat, and mail that no passive source had indexed. That gap between passive and active results is why using both methods together matters.

For defenders, the takeaway is clear: monitor certificate logs, clean up unused DNS records, and use split-horizon DNS to keep internal names out of public resolvers.
