<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Sanjeev Kumar</title>
    <description>The latest articles on Forem by Sanjeev Kumar (@ksanjeev284).</description>
    <link>https://forem.com/ksanjeev284</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F256654%2F9cab6257-93d3-4929-ba2c-17e59bcc7439.jpg</url>
      <title>Forem: Sanjeev Kumar</title>
      <link>https://forem.com/ksanjeev284</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/ksanjeev284"/>
    <language>en</language>
    <item>
      <title>Introducing Splunk Native Embedder: Secure Dashboard Embedding, Done Right</title>
      <dc:creator>Sanjeev Kumar</dc:creator>
      <pubDate>Wed, 04 Feb 2026 12:06:48 +0000</pubDate>
      <link>https://forem.com/ksanjeev284/introducing-splunk-native-embedder-secure-dashboard-embedding-done-right-8o5</link>
      <guid>https://forem.com/ksanjeev284/introducing-splunk-native-embedder-secure-dashboard-embedding-done-right-8o5</guid>
      <description>

&lt;h1&gt;
  
  
  I’m happy to share that Splunk Native Embedder has been approved and is now available on Splunkbase.
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Splunk Native Embedder&lt;/strong&gt; is a lightweight configuration manager built on Splunk’s native capabilities. In this post, I’ll walk through the technical details behind how the app enables secure cross-origin dashboard embedding, allowing developers to integrate Splunk visualizations into external portals with fine-grained control.&lt;/p&gt;

&lt;p&gt;URL: &lt;a href="https://splunkbase.splunk.com/app/8405" rel="noopener noreferrer"&gt;https://splunkbase.splunk.com/app/8405&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Technical Challenge: X-Frame-Options &amp;amp; Cookie Security
&lt;/h2&gt;

&lt;p&gt;Splunk Enterprise is secure by default. While this is a major strength, it introduces two common challenges when embedding Splunk content into external web applications:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Clickjacking Protection
&lt;/h3&gt;

&lt;p&gt;Splunk sets the &lt;code&gt;X-Frame-Options: SAMEORIGIN&lt;/code&gt; HTTP header by default. This tells browsers to block rendering when the parent page is hosted on a different domain.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Cookie Policies
&lt;/h3&gt;

&lt;p&gt;Modern browsers such as Chrome, Safari, and Edge enforce &lt;code&gt;SameSite=Lax&lt;/code&gt; by default. This prevents session cookies from being sent in cross-site contexts (like iframes). The result is a familiar authentication loop: users log in successfully, but the session immediately drops because the browser refuses to send the cookie.&lt;/p&gt;
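
&lt;p&gt;Concretely, the difference shows up in the session cookie the browser receives. The header values below are illustrative only, not captured from a live Splunk instance:&lt;/p&gt;

```
# No SameSite attribute, so modern browsers treat the cookie as Lax
# and withhold it inside a cross-site iframe
Set-Cookie: splunkd_8000=...; HttpOnly

# What cross-site embedding needs (HTTPS only)
Set-Cookie: splunkd_8000=...; HttpOnly; Secure; SameSite=None
```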




&lt;h2&gt;
  
  
  The Solution: Native Configuration Management
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;Splunk Native Embedder&lt;/strong&gt; app removes this friction by acting as a UI wrapper around Splunk’s native &lt;code&gt;web.conf&lt;/code&gt; configuration endpoints.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Managing Frame Security
&lt;/h3&gt;

&lt;p&gt;When embedding is enabled from the app dashboard, the JavaScript controller (&lt;code&gt;embedder_config.js&lt;/code&gt;) makes a REST call to the &lt;code&gt;configs/conf-web&lt;/code&gt; endpoint. This updates &lt;code&gt;local/web.conf&lt;/code&gt; and toggles the required security flags:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[settings]&lt;/span&gt;
&lt;span class="c"&gt;# Disables the header that blocks cross-origin framing
&lt;/span&gt;&lt;span class="py"&gt;x_frame_options_sameorigin&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;false&lt;/span&gt;

&lt;span class="c"&gt;# Explicitly permits HTML dashboards to function within frames
&lt;/span&gt;&lt;span class="py"&gt;dashboard_html_allow_iframes&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;dashboard_html_allow_embeddable_content&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By managing these values directly at the platform level, the app preserves Splunk’s native behavior rather than introducing a proxy layer or custom middleware.&lt;/p&gt;
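
&lt;p&gt;For readers who want to script the same change without the app’s UI, the call can be reproduced against Splunk’s configuration REST API. The sketch below is mine, written in Python rather than the app’s actual &lt;code&gt;embedder_config.js&lt;/code&gt;; the app namespace, port, and credentials are placeholder assumptions:&lt;/p&gt;

```python
# Sketch: update [settings] in local/web.conf through Splunk's configuration
# REST API on the management port (8089). App name is a placeholder.
import urllib.parse
import urllib.request

def build_webconf_request(base_url="https://localhost:8089",
                          app="splunk_native_embedder"):
    """Build a POST to configs/conf-web that toggles the embedding flags."""
    settings = {
        "x_frame_options_sameorigin": "false",
        "dashboard_html_allow_iframes": "true",
        "dashboard_html_allow_embeddable_content": "true",
    }
    url = f"{base_url}/servicesNS/nobody/{app}/configs/conf-web/settings"
    data = urllib.parse.urlencode(settings).encode()
    return urllib.request.Request(url, data=data, method="POST")

req = build_webconf_request()
print(req.full_url)
```

&lt;p&gt;Note that some &lt;code&gt;web.conf&lt;/code&gt; changes may only take effect after Splunk Web restarts.&lt;/p&gt;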

&lt;h3&gt;
  
  
  2. Solving the SameSite Cookie Issue
&lt;/h3&gt;

&lt;p&gt;For authentication to persist inside an iframe, the session cookie must be marked &lt;code&gt;SameSite=None; Secure&lt;/code&gt;. The app provides a simple toggle to apply this globally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[settings]&lt;/span&gt;
&lt;span class="c"&gt;# REQUIRED for cross-site embedding over HTTPS
&lt;/span&gt;&lt;span class="py"&gt;cookieSameSite&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;none&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt; Setting &lt;code&gt;cookieSameSite = none&lt;/code&gt; requires HTTPS. If Splunk is accessed over HTTP, modern browsers will reject the cookie entirely due to current security standards.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  3. Handling Reverse Proxies &amp;amp; TLS Termination
&lt;/h3&gt;

&lt;p&gt;In many deployments, SSL/TLS is terminated at a load balancer (NGINX, F5), while Splunk runs on HTTP internally. In this setup, Splunk may not detect that traffic is secure and therefore won’t mark cookies as Secure.&lt;/p&gt;

&lt;p&gt;To handle this, the app exposes an additional setting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[settings]&lt;/span&gt;
&lt;span class="c"&gt;# Forces cookies to be marked 'Secure' even if Splunk sees HTTP traffic
&lt;/span&gt;&lt;span class="py"&gt;tools.sessions.secure&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This ensures cookies are accepted by browsers even in reverse-proxy scenarios.&lt;/p&gt;
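
&lt;p&gt;For context, this is the kind of proxy setup where that flag matters. A minimal, hypothetical NGINX TLS-termination block in front of Splunk Web might look like this (hostnames, ports, and certificate paths are placeholders):&lt;/p&gt;

```nginx
server {
    listen 443 ssl;
    server_name portal.example.com;

    ssl_certificate     /etc/nginx/certs/portal.crt;
    ssl_certificate_key /etc/nginx/certs/portal.key;

    location / {
        # Splunk Web listens on plain HTTP behind the proxy
        proxy_pass http://splunk-internal:8000;
        proxy_set_header Host $host;
        # Tells the upstream that the original request was HTTPS
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```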




&lt;p&gt;The app is open for use and feedback. By relying entirely on native configuration, the goal is to provide the most stable and Splunk-aligned way to share dashboards externally.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br&gt;
&lt;strong&gt;Sanjeev&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>splunk</category>
      <category>cybersecurity</category>
    </item>
    <item>
      <title>Building the Ultimate Reddit Scraper: A Full-Featured, API-Free Data Collection Suite</title>
      <dc:creator>Sanjeev Kumar</dc:creator>
      <pubDate>Sun, 14 Dec 2025 00:30:13 +0000</pubDate>
      <link>https://forem.com/ksanjeev284/building-the-ultimate-reddit-scraper-a-full-featured-api-free-data-collection-suite-4al3</link>
      <guid>https://forem.com/ksanjeev284/building-the-ultimate-reddit-scraper-a-full-featured-api-free-data-collection-suite-4al3</guid>
      <description>&lt;p&gt;Building the Ultimate Reddit Scraper: A Full-Featured, API-Free Data Collection Suite&lt;/p&gt;

&lt;p&gt;December 2024 | By Sanjeev Kumar&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; I built a complete Reddit scraper suite that requires zero API keys. It comes with a beautiful Streamlit dashboard, a REST API for integration with tools like Grafana and Metabase, a plugin system for post-processing, scheduled scraping, notifications, and much more. Best of all—it’s completely open source.&lt;br&gt;
🔗 GitHub: reddit-universal-scraper&lt;/p&gt;




&lt;h2&gt;The Problem&lt;/h2&gt;

&lt;p&gt;If you’ve ever tried to scrape Reddit data for analysis, research, or just personal projects, you know the pain:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Reddit’s API is heavily rate-limited (especially after the 2023 API changes)&lt;/li&gt;
&lt;li&gt; API keys require approval and are increasingly restricted&lt;/li&gt;
&lt;li&gt; Existing scrapers are often single-purpose - scrape posts OR comments, not both&lt;/li&gt;
&lt;li&gt; No easy way to visualize or analyze the data after scraping&lt;/li&gt;
&lt;li&gt; Running scrapes manually is tedious - you want automation&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I decided to solve all of these problems at once.&lt;/p&gt;

&lt;h2&gt;The Solution: Universal Reddit Scraper Suite&lt;/h2&gt;

&lt;p&gt;After weeks of development, I created a full-featured scraper with the following capabilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;📊 &lt;strong&gt;Full Scraping:&lt;/strong&gt; posts, comments, images, videos, galleries—everything&lt;/li&gt;
&lt;li&gt;🚫 &lt;strong&gt;No API Keys:&lt;/strong&gt; uses Reddit’s public JSON endpoints and mirrors&lt;/li&gt;
&lt;li&gt;📈 &lt;strong&gt;Web Dashboard:&lt;/strong&gt; beautiful 7-tab Streamlit UI for analysis&lt;/li&gt;
&lt;li&gt;🚀 &lt;strong&gt;REST API:&lt;/strong&gt; connect Metabase, Grafana, DuckDB, and more&lt;/li&gt;
&lt;li&gt;🔌 &lt;strong&gt;Plugin System:&lt;/strong&gt; extensible post-processing (sentiment analysis, deduplication, keywords)&lt;/li&gt;
&lt;li&gt;📅 &lt;strong&gt;Scheduled Scraping:&lt;/strong&gt; cron-style automation&lt;/li&gt;
&lt;li&gt;📧 &lt;strong&gt;Notifications:&lt;/strong&gt; Discord &amp;amp; Telegram alerts when scrapes complete&lt;/li&gt;
&lt;li&gt;🐳 &lt;strong&gt;Docker Ready:&lt;/strong&gt; one command to deploy anywhere&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Architecture Deep Dive&lt;/h2&gt;

&lt;h3&gt;How It Works Without API Keys&lt;/h3&gt;

&lt;p&gt;The secret sauce is in the approach. Instead of using Reddit’s official (and restricted) API, I leverage:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Reddit’s public JSON endpoints: Every Reddit page has a .json suffix that returns structured data&lt;/li&gt;
&lt;li&gt; Multiple mirror fallbacks: When one source is rate-limited, the scraper automatically rotates through alternatives like Redlib instances&lt;/li&gt;
&lt;li&gt; Smart rate limiting: Built-in delays and cool-down periods to stay under the radar&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MIRRORS = [
    "https://old.reddit.com",
    "https://redlib.catsarch.com",
    "https://redlib.vsls.cz",
    "https://r.nf",
    "https://libreddit.northboot.xyz",
    "https://redlib.tux.pizza"
]
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;When one source fails, it automatically tries the next. No manual intervention needed.&lt;/p&gt;

&lt;h3&gt;The Core Scraping Engine&lt;/h3&gt;

&lt;p&gt;The scraper operates in three modes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Full Mode&lt;/strong&gt; - the complete package:&lt;br&gt;
&lt;code&gt;python main.py python --mode full --limit 100&lt;/code&gt;&lt;br&gt;
This scrapes posts, downloads all media (images, videos, galleries), and fetches comments with their full thread hierarchy.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;History Mode&lt;/strong&gt; - fast, metadata-only:&lt;br&gt;
&lt;code&gt;python main.py python --mode history --limit 500&lt;/code&gt;&lt;br&gt;
Perfect for quickly building a dataset of post metadata without the overhead of media downloads.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Monitor Mode&lt;/strong&gt; - live watching:&lt;br&gt;
&lt;code&gt;python main.py python --mode monitor&lt;/code&gt;&lt;br&gt;
Continuously checks for new posts every 5 minutes. Ideal for tracking breaking news or trending discussions.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;The Dashboard Experience&lt;/h2&gt;

&lt;p&gt;One of the standout features is the 7-tab Streamlit dashboard that makes data exploration a joy:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;📊 Overview:&lt;/strong&gt; at a glance, see total posts and comments, cumulative score across all posts, the media post breakdown, a posts-over-time chart, and the top 10 posts by score.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;📈 Analytics:&lt;/strong&gt; this is where it gets interesting. Run VADER-based sentiment scoring on your entire dataset, see the most frequently used terms in a keyword cloud, and get data-driven insights on when posts earn the most engagement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🔍 Search:&lt;/strong&gt; full-text search across all scraped data, with filters for minimum score, post type (text, image, video, gallery, link), author, and custom sorting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;💬 Comments Analysis:&lt;/strong&gt; view top-scoring comments, see who the most active commenters are, and track comment patterns over time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;⚙️ Scraper Controls:&lt;/strong&gt; start new scrapes right from the dashboard! Configure the target subreddit/user, post limits, mode (full/history), and media and comment toggles.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;📋 Job History:&lt;/strong&gt; full observability into every scrape job, with status tracking (running, completed, failed), duration metrics, post/comment/media counts, and error logging.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🔌 Integrations:&lt;/strong&gt; pre-configured instructions for connecting Metabase, Grafana, DreamFactory, and DuckDB.&lt;/p&gt;

&lt;h2&gt;The Plugin Architecture&lt;/h2&gt;

&lt;p&gt;I designed a plugin system to allow extensible post-processing. The architecture is simple but powerful:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class Plugin:
    """Base class for all plugins."""
    name = "base"
    description = "Base plugin"
    enabled = True

    def process_posts(self, posts):
        return posts

    def process_comments(self, comments):
        return comments
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;h3&gt;Built-in Plugins&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Sentiment Tagger:&lt;/strong&gt; analyzes the emotional tone of every post and comment using VADER sentiment analysis:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class SentimentTagger(Plugin):
    name = "sentiment_tagger"
    description = "Adds sentiment scores and labels to posts"

    def process_posts(self, posts):
        for post in posts:
            text = f"{post.get('title', '')} {post.get('selftext', '')}"
            score, label = analyze_sentiment(text)
            post['sentiment_score'] = score
            post['sentiment_label'] = label
        return posts
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Deduplicator:&lt;/strong&gt; removes duplicate posts that may appear across multiple scraping sessions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Keyword Extractor:&lt;/strong&gt; pulls out the most significant terms from your scraped content for trend analysis.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;Creating Your Own Plugin&lt;/h3&gt;

&lt;p&gt;Drop a new Python file in the &lt;code&gt;plugins/&lt;/code&gt; directory:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from plugins import Plugin

class MyCustomPlugin(Plugin):
    name = "my_plugin"
    description = "Does something cool"
    enabled = True

    def process_posts(self, posts):
        # Your logic here
        return posts
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Enable plugins during scraping:&lt;br&gt;
&lt;code&gt;python main.py python --mode full --plugins&lt;/code&gt;&lt;/p&gt;
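
&lt;p&gt;The post doesn’t show the loader side, but discovering plugins in a &lt;code&gt;plugins/&lt;/code&gt; directory is straightforward. Here is a minimal sketch of how such discovery could work; the package layout and the registry-by-subclass approach are my assumptions, not the project’s exact code:&lt;/p&gt;

```python
# Sketch: discover Plugin subclasses in a plugins/ package and run their
# process_posts hooks in sequence. Class and package names mirror the post's
# examples but are assumptions about the real project.
import importlib
import pkgutil

class Plugin:
    """Base class, matching the interface shown in the post."""
    name = "base"
    enabled = True

    def process_posts(self, posts):
        return posts

def load_plugins(package_name="plugins"):
    """Import every module in the package, then instantiate enabled plugins."""
    package = importlib.import_module(package_name)
    for info in pkgutil.iter_modules(package.__path__):
        importlib.import_module(f"{package_name}.{info.name}")
    return [cls() for cls in Plugin.__subclasses__() if cls.enabled]

def run_pipeline(posts, plugins):
    """Apply each plugin's post hook in order."""
    for plugin in plugins:
        posts = plugin.process_posts(posts)
    return posts
```

&lt;p&gt;Registering by subclass keeps the plugin API to a single base class: any file dropped into the package that subclasses &lt;code&gt;Plugin&lt;/code&gt; is picked up automatically.&lt;/p&gt;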




&lt;h2&gt;REST API for External Integrations&lt;/h2&gt;

&lt;p&gt;The REST API opens up the scraper to a whole ecosystem of tools:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python main.py --api
# API at http://localhost:8000
# Docs at http://localhost:8000/docs
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
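
&lt;p&gt;Once the server is up, any HTTP client can consume it. The snippet below is my own sketch of building a query against the &lt;code&gt;/posts&lt;/code&gt; route; the filter names follow the endpoint table below, but the exact response shape is an assumption:&lt;/p&gt;

```python
# Sketch: query the scraper's REST API using only the standard library.
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"

def posts_url(subreddit=None, limit=50, offset=0, base_url=BASE_URL):
    """Build the GET /posts URL with the documented filters."""
    params = {"limit": limit, "offset": offset}
    if subreddit:
        params["subreddit"] = subreddit
    return f"{base_url}/posts?{urllib.parse.urlencode(params)}"

def fetch_posts(**kwargs):
    """Fetch and decode one page of posts (requires the API to be running)."""
    with urllib.request.urlopen(posts_url(**kwargs), timeout=10) as resp:
        return json.load(resp)

print(posts_url(subreddit="python", limit=5))
```

&lt;p&gt;Point &lt;code&gt;BASE_URL&lt;/code&gt; at wherever the API is deployed; &lt;code&gt;fetch_posts&lt;/code&gt; assumes the route returns JSON.&lt;/p&gt;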

&lt;h3&gt;Key Endpoints&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;GET /posts&lt;/code&gt;: list posts with filters (subreddit, limit, offset)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;GET /comments&lt;/code&gt;: list comments&lt;/li&gt;
&lt;li&gt;&lt;code&gt;GET /subreddits&lt;/code&gt;: all scraped subreddits&lt;/li&gt;
&lt;li&gt;&lt;code&gt;GET /jobs&lt;/code&gt;: job history&lt;/li&gt;
&lt;li&gt;&lt;code&gt;GET /query?sql=...&lt;/code&gt;: raw SQL queries for power users&lt;/li&gt;
&lt;li&gt;&lt;code&gt;GET /grafana/query&lt;/code&gt;: Grafana-compatible time-series data&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Real-World Integration: Grafana Dashboard&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt; Install the “JSON API” or “Infinity” plugin in Grafana&lt;/li&gt;
&lt;li&gt; Add a datasource pointing to &lt;a href="http://localhost:8000" rel="noopener noreferrer"&gt;http://localhost:8000&lt;/a&gt;&lt;/li&gt;
&lt;li&gt; Use the &lt;code&gt;/grafana/query&lt;/code&gt; endpoint for time-series panels&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT date(created_utc) as time, COUNT(*) as posts
FROM posts GROUP BY date(created_utc)
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Now you have a real-time dashboard tracking Reddit activity!&lt;/p&gt;

&lt;h2&gt;Scheduled Scraping &amp;amp; Notifications&lt;/h2&gt;

&lt;h3&gt;Automation Made Easy&lt;/h3&gt;

&lt;p&gt;Set up recurring scrapes with cron-style scheduling:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Scrape every 60 minutes
python main.py --schedule delhi --every 60
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# With custom options
python main.py --schedule delhi --every 30 --mode full --limit 50
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;h3&gt;Get Notified&lt;/h3&gt;

&lt;p&gt;Configure Discord or Telegram alerts when scrapes complete:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Environment variables
export DISCORD_WEBHOOK_URL="https://discord.com/api/webhooks/..."
export TELEGRAM_BOT_TOKEN="123456:ABC..."
export TELEGRAM_CHAT_ID="987654321"
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Now you get notified with scrape summaries directly in your preferred platform.&lt;/p&gt;




&lt;h2&gt;Dry Run Mode: Test Before You Commit&lt;/h2&gt;

&lt;p&gt;One of my favorite features is dry run mode. It simulates the entire scrape without saving any data:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python main.py python --mode full --limit 50 --dry-run
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;🧪 DRY RUN MODE - No data will be saved
🧪 DRY RUN COMPLETE!
   📊 Would scrape: 100 posts
   💬 Would scrape: 245 comments
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Perfect for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Testing your scrape configuration&lt;/li&gt;
&lt;li&gt;Estimating data volume before committing&lt;/li&gt;
&lt;li&gt;Debugging without cluttering your dataset&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;Docker Deployment&lt;/h2&gt;

&lt;h3&gt;Quick Start&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Build
docker build -t reddit-scraper .

# Run a scrape
docker run -v ./data:/app/data reddit-scraper python --limit 100

# Run with plugins
docker run -v ./data:/app/data reddit-scraper python --plugins
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;h3&gt;Full Stack with Docker Compose&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker-compose up -d
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;This spins up:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dashboard at &lt;a href="http://localhost:8501" rel="noopener noreferrer"&gt;http://localhost:8501&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;REST API at &lt;a href="http://localhost:8000" rel="noopener noreferrer"&gt;http://localhost:8000&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Deploy to Any VPS&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ssh user@your-server-ip
git clone https://github.com/ksanjeev284/reddit-universal-scraper.git
cd reddit-universal-scraper
docker-compose up -d

# Open the firewall
sudo ufw allow 8000
sudo ufw allow 8501
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;You now have a production-ready Reddit scraping platform!&lt;/p&gt;




&lt;h2&gt;Data Export Options&lt;/h2&gt;

&lt;h3&gt;CSV (Default)&lt;/h3&gt;

&lt;p&gt;All scraped data is saved as CSV files: &lt;code&gt;data/r_/posts.csv&lt;/code&gt; and &lt;code&gt;data/r_/comments.csv&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;Parquet (Analytics-Optimized)&lt;/h3&gt;

&lt;p&gt;Export to columnar format for analytics tools:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python main.py --export-parquet python
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Query directly with DuckDB:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import duckdb
duckdb.query("SELECT * FROM 'data/parquet/*.parquet'").df()
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;h3&gt;Database Maintenance&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Backup
python main.py --backup

# Optimize/vacuum
python main.py --vacuum

# View job history
python main.py --job-history
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;




&lt;h2&gt;Data Schema&lt;/h2&gt;

&lt;h3&gt;Posts Table&lt;/h3&gt;

&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Column&lt;/th&gt;&lt;th&gt;Description&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;id&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Reddit post ID&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;title&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Post title&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;author&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Username&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;score&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Net upvotes&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;num_comments&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Comment count&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;post_type&lt;/code&gt;&lt;/td&gt;&lt;td&gt;text/image/video/gallery/link&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;selftext&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Post body (for text posts)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;created_utc&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Timestamp&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;permalink&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Reddit URL&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;is_nsfw&lt;/code&gt;&lt;/td&gt;&lt;td&gt;NSFW flag&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;flair&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Post flair&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;sentiment_score&lt;/code&gt;&lt;/td&gt;&lt;td&gt;-1.0 to 1.0 (with plugins)&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;h3&gt;Comments Table&lt;/h3&gt;

&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Column&lt;/th&gt;&lt;th&gt;Description&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;comment_id&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Comment ID&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;post_permalink&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Parent post URL&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;author&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Username&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;body&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Comment text&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;score&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Upvotes&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;depth&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Nesting level&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;is_submitter&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Whether the author is the OP&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;




&lt;h2&gt;Use Cases&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Academic Research:&lt;/strong&gt; analyze subreddit community dynamics, track sentiment over time during events, and study user engagement patterns&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Market Research:&lt;/strong&gt; monitor brand mentions, track product feedback, and identify emerging trends&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Content Creation:&lt;/strong&gt; find popular topics in your niche, analyze what makes posts go viral, and discover optimal posting times&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Data Journalism:&lt;/strong&gt; archive discussions around breaking news, analyze public sentiment during events, and track narrative evolution&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Personal Projects:&lt;/strong&gt; build a dataset for ML training, create Reddit-based recommendation systems, and archive communities you care about&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;Performance Considerations&lt;/h2&gt;

&lt;h3&gt;Respect Reddit’s Servers&lt;/h3&gt;

&lt;p&gt;The scraper includes built-in delays: a 3-second cooldown between requests, a 30-second wait if all mirrors fail, and automatic mirror rotation to distribute load.&lt;/p&gt;

&lt;h3&gt;Optimize Your Scrapes&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;code&gt;--mode history&lt;/code&gt; for faster metadata-only scrapes&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;--no-media&lt;/code&gt; if you don’t need images/videos&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;--no-comments&lt;/code&gt; for post-only data&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Handle Large Datasets&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Parquet export for analytics queries&lt;/li&gt;
&lt;li&gt;SQLite database for structured storage&lt;/li&gt;
&lt;li&gt;Automatic deduplication to avoid bloat&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;What’s Next? Roadmap&lt;/h2&gt;

&lt;p&gt;I’m actively developing new features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;☐ Async scraping for even faster data collection&lt;/li&gt;
&lt;li&gt;☐ Multi-subreddit monitoring in a single command&lt;/li&gt;
&lt;li&gt;☐ Email notifications in addition to Discord/Telegram&lt;/li&gt;
&lt;li&gt;☐ Cloud deployment templates (AWS, GCP, Azure)&lt;/li&gt;
&lt;li&gt;☐ Web-based scraper configuration (no CLI needed)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Getting Started&lt;/h2&gt;

&lt;h3&gt;Prerequisites&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.10+&lt;/li&gt;
&lt;li&gt;pip&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Installation&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Clone the repo
git clone https://github.com/ksanjeev284/reddit-universal-scraper.git
cd reddit-universal-scraper
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Install dependencies
pip install -r requirements.txt

# Your first scrape
python main.py python --mode full --limit 50

# Launch the dashboard
python main.py --dashboard
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;That’s it! You’re now scraping Reddit like a pro.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Contributing&lt;/strong&gt;&lt;br&gt;
This is an open-source project and contributions are welcome! Whether it’s:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bug fixes&lt;/li&gt;
&lt;li&gt;New plugins&lt;/li&gt;
&lt;li&gt;Documentation improvements&lt;/li&gt;
&lt;li&gt;Feature suggestions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Open an issue or submit a PR on GitHub.&lt;/p&gt;




&lt;p&gt;If you found this useful, consider giving the project a ⭐ on GitHub!&lt;/p&gt;




&lt;p&gt;Connect&lt;br&gt;
• GitHub: &lt;a class="mentioned-user" href="https://dev.to/ksanjeev284"&gt;@ksanjeev284&lt;/a&gt;&lt;br&gt;
• Project: reddit-universal-scraper&lt;/p&gt;

</description>
      <category>reddit</category>
      <category>scraper</category>
      <category>powerplatform</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Getting Started with Open Source: A Beginner’s Guide</title>
      <dc:creator>Sanjeev Kumar</dc:creator>
      <pubDate>Mon, 16 Dec 2024 04:47:35 +0000</pubDate>
      <link>https://forem.com/ksanjeev284/getting-started-with-open-source-a-beginners-guide-4o63</link>
      <guid>https://forem.com/ksanjeev284/getting-started-with-open-source-a-beginners-guide-4o63</guid>
      <description>&lt;p&gt;Have you ever wanted to contribute to open-source projects but felt overwhelmed by where to begin? You're not alone! Many aspiring developers face the same challenge, but the good news is that the open-source community is one of the most welcoming spaces for learners and professionals alike. Let’s break it down step by step.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Contribute to Open Source?&lt;/strong&gt;&lt;br&gt;
Contributing to open-source projects isn’t just about writing code. It’s about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gaining real-world experience.&lt;/li&gt;
&lt;li&gt;Building a portfolio of meaningful work.&lt;/li&gt;
&lt;li&gt;Collaborating with developers across the globe.&lt;/li&gt;
&lt;li&gt;Giving back to the tech community.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;How to Get Started?&lt;/strong&gt;&lt;br&gt;
Here’s a simple roadmap to begin your open-source journey:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose a Language or Framework You Know&lt;/strong&gt;&lt;br&gt;
Start with a language or framework you’re comfortable with. This makes understanding the project easier.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Find Beginner-Friendly Issues&lt;/strong&gt;&lt;br&gt;
Look for issues labeled “Good First Issue”, “Help Wanted”, or similar tags. These are specifically designed for new contributors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Read the Contribution Guide&lt;/strong&gt;&lt;br&gt;
Most projects have a CONTRIBUTING.md file that outlines how to get started, set up the project, and make your first contribution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Start Small&lt;/strong&gt;&lt;br&gt;
Begin with documentation fixes, bug reports, or adding simple tests. These tasks are a great way to ease into a project.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where to Find Projects?&lt;/strong&gt;&lt;br&gt;
This is where things can get tricky. With so many projects out there, how do you find the right one for you? That’s where ContributeOpenSource.com comes in.&lt;/p&gt;

&lt;p&gt;Our platform helps you discover open-source projects that match your interests and skill level. You can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sort projects by programming languages or frameworks.&lt;/li&gt;
&lt;li&gt;Find beginner-friendly issues curated for first-time contributors.&lt;/li&gt;
&lt;li&gt;Explore trending repositories in the open-source community.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We make it easier for you to dive into open source without the overwhelm.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your First PR (Pull Request)&lt;/strong&gt;&lt;br&gt;
Once you find an issue to work on:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Fork the repository.&lt;/li&gt;
&lt;li&gt;Make your changes in a new branch.&lt;/li&gt;
&lt;li&gt;Test your changes.&lt;/li&gt;
&lt;li&gt;Submit your pull request with a clear description.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Celebrate when it gets merged—congratulations, you’re officially an open-source contributor! 🎉&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Final Thoughts&lt;/strong&gt;&lt;br&gt;
Open source isn’t just about contributing code. It’s about learning, growing, and becoming a part of a global community. Don’t be afraid to ask questions, seek help, and make mistakes—that’s how you grow.&lt;/p&gt;

&lt;p&gt;Ready to get started? Head over to &lt;a href="https://www.contributeopensource.com/" rel="noopener noreferrer"&gt;ContributeOpenSource.com&lt;/a&gt; to find your first project today!&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
