<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Sanjeev Kumar</title>
    <description>The latest articles on Forem by Sanjeev Kumar (@ksanjeev284).</description>
    <link>https://forem.com/ksanjeev284</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F256654%2F9cab6257-93d3-4929-ba2c-17e59bcc7439.jpg</url>
      <title>Forem: Sanjeev Kumar</title>
      <link>https://forem.com/ksanjeev284</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/ksanjeev284"/>
    <language>en</language>
    <item>
      <title>Introducing Splunk Native Embedder: Secure Dashboard Embedding, Done Right</title>
      <dc:creator>Sanjeev Kumar</dc:creator>
      <pubDate>Wed, 04 Feb 2026 12:06:48 +0000</pubDate>
      <link>https://forem.com/ksanjeev284/introducing-splunk-native-embedder-secure-dashboard-embedding-done-right-8o5</link>
      <guid>https://forem.com/ksanjeev284/introducing-splunk-native-embedder-secure-dashboard-embedding-done-right-8o5</guid>
      <description>

&lt;h1&gt;
  
  
  I’m happy to share that Splunk Native Embedder has been approved and is now available on Splunkbase.
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Splunk Native Embedder&lt;/strong&gt; is a lightweight configuration manager built on Splunk’s native capabilities. In this post, I’ll walk through the technical details behind how the app enables secure cross-origin dashboard embedding, allowing developers to integrate Splunk visualizations into external portals with fine-grained control.&lt;/p&gt;

&lt;p&gt;URL: &lt;a href="https://splunkbase.splunk.com/app/8405" rel="noopener noreferrer"&gt;https://splunkbase.splunk.com/app/8405&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Technical Challenge: X-Frame-Options &amp;amp; Cookie Security
&lt;/h2&gt;

&lt;p&gt;Splunk Enterprise is secure by default. While this is a major strength, it introduces two common challenges when embedding Splunk content into external web applications:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Clickjacking Protection
&lt;/h3&gt;

&lt;p&gt;Splunk sets the &lt;code&gt;X-Frame-Options: SAMEORIGIN&lt;/code&gt; HTTP header by default. This tells browsers to block rendering when the parent page is hosted on a different domain.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Cookie Policies
&lt;/h3&gt;

&lt;p&gt;Modern browsers such as Chrome, Safari, and Edge enforce &lt;code&gt;SameSite=Lax&lt;/code&gt; by default. This prevents session cookies from being sent in cross-site contexts (like iframes). The result is a familiar authentication loop: users log in successfully, but the session immediately drops because the browser refuses to send the cookie.&lt;/p&gt;
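
&lt;p&gt;Concretely, the difference shows up in the session cookie the browser receives. The header values below are illustrative only, not captured from a live Splunk instance:&lt;/p&gt;

```
# No SameSite attribute, so modern browsers treat the cookie as Lax
# and withhold it inside a cross-site iframe
Set-Cookie: splunkd_8000=...; HttpOnly

# What cross-site embedding needs (HTTPS only)
Set-Cookie: splunkd_8000=...; HttpOnly; Secure; SameSite=None
```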




&lt;h2&gt;
  
  
  The Solution: Native Configuration Management
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;Splunk Native Embedder&lt;/strong&gt; app removes this friction by acting as a UI wrapper around Splunk’s native &lt;code&gt;web.conf&lt;/code&gt; configuration endpoints.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Managing Frame Security
&lt;/h3&gt;

&lt;p&gt;When embedding is enabled from the app dashboard, the JavaScript controller (&lt;code&gt;embedder_config.js&lt;/code&gt;) makes a REST call to the &lt;code&gt;configs/conf-web&lt;/code&gt; endpoint. This updates &lt;code&gt;local/web.conf&lt;/code&gt; and toggles the required security flags:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[settings]&lt;/span&gt;
&lt;span class="c"&gt;# Disables the header that blocks cross-origin framing
&lt;/span&gt;&lt;span class="py"&gt;x_frame_options_sameorigin&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;false&lt;/span&gt;

&lt;span class="c"&gt;# Explicitly permits HTML dashboards to function within frames
&lt;/span&gt;&lt;span class="py"&gt;dashboard_html_allow_iframes&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;dashboard_html_allow_embeddable_content&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By managing these values directly at the platform level, the app preserves Splunk’s native behavior rather than introducing a proxy layer or custom middleware.&lt;/p&gt;
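
&lt;p&gt;For readers who want to script the same change without the app’s UI, the call can be reproduced against Splunk’s configuration REST API. The sketch below is mine, written in Python rather than the app’s actual &lt;code&gt;embedder_config.js&lt;/code&gt;; the app namespace, port, and credentials are placeholder assumptions:&lt;/p&gt;

```python
# Sketch: update [settings] in local/web.conf through Splunk's configuration
# REST API on the management port (8089). App name is a placeholder.
import urllib.parse
import urllib.request

def build_webconf_request(base_url="https://localhost:8089",
                          app="splunk_native_embedder"):
    """Build a POST to configs/conf-web that toggles the embedding flags."""
    settings = {
        "x_frame_options_sameorigin": "false",
        "dashboard_html_allow_iframes": "true",
        "dashboard_html_allow_embeddable_content": "true",
    }
    url = f"{base_url}/servicesNS/nobody/{app}/configs/conf-web/settings"
    data = urllib.parse.urlencode(settings).encode()
    return urllib.request.Request(url, data=data, method="POST")

req = build_webconf_request()
print(req.full_url)
```

&lt;p&gt;Note that some &lt;code&gt;web.conf&lt;/code&gt; changes may only take effect after Splunk Web restarts.&lt;/p&gt;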

&lt;h3&gt;
  
  
  2. Solving the SameSite Cookie Issue
&lt;/h3&gt;

&lt;p&gt;For authentication to persist inside an iframe, the session cookie must be marked &lt;code&gt;SameSite=None; Secure&lt;/code&gt;. The app provides a simple toggle to apply this globally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[settings]&lt;/span&gt;
&lt;span class="c"&gt;# REQUIRED for cross-site embedding over HTTPS
&lt;/span&gt;&lt;span class="py"&gt;cookieSameSite&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;none&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt; Setting &lt;code&gt;cookieSameSite = none&lt;/code&gt; requires HTTPS. If Splunk is accessed over HTTP, modern browsers will reject the cookie entirely due to current security standards.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  3. Handling Reverse Proxies &amp;amp; TLS Termination
&lt;/h3&gt;

&lt;p&gt;In many deployments, SSL/TLS is terminated at a load balancer (NGINX, F5), while Splunk runs on HTTP internally. In this setup, Splunk may not detect that traffic is secure and therefore won’t mark cookies as Secure.&lt;/p&gt;

&lt;p&gt;To handle this, the app exposes an additional setting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[settings]&lt;/span&gt;
&lt;span class="c"&gt;# Forces cookies to be marked 'Secure' even if Splunk sees HTTP traffic
&lt;/span&gt;&lt;span class="py"&gt;tools.sessions.secure&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This ensures cookies are accepted by browsers even in reverse-proxy scenarios.&lt;/p&gt;
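
&lt;p&gt;For context, this is the kind of proxy setup where that flag matters. A minimal, hypothetical NGINX TLS-termination block in front of Splunk Web might look like this (hostnames, ports, and certificate paths are placeholders):&lt;/p&gt;

```nginx
server {
    listen 443 ssl;
    server_name portal.example.com;

    ssl_certificate     /etc/nginx/certs/portal.crt;
    ssl_certificate_key /etc/nginx/certs/portal.key;

    location / {
        # Splunk Web listens on plain HTTP behind the proxy
        proxy_pass http://splunk-internal:8000;
        proxy_set_header Host $host;
        # Tells the upstream that the original request was HTTPS
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```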




&lt;p&gt;The app is open for use and feedback. By relying entirely on native configuration, the goal is to provide the most stable and Splunk-aligned way to share dashboards externally.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br&gt;
&lt;strong&gt;Sanjeev&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>splunk</category>
      <category>cybersecurity</category>
    </item>
    <item>
      <title>Building the Ultimate Reddit Scraper: A Full-Featured, API-Free Data Collection Suite</title>
      <dc:creator>Sanjeev Kumar</dc:creator>
      <pubDate>Sun, 14 Dec 2025 00:30:13 +0000</pubDate>
      <link>https://forem.com/ksanjeev284/building-the-ultimate-reddit-scraper-a-full-featured-api-free-data-collection-suite-4al3</link>
      <guid>https://forem.com/ksanjeev284/building-the-ultimate-reddit-scraper-a-full-featured-api-free-data-collection-suite-4al3</guid>
      <description>&lt;p&gt;Building the Ultimate Reddit Scraper: A Full-Featured, API-Free Data Collection Suite&lt;/p&gt;

&lt;p&gt;December 2024 | By Sanjeev Kumar&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; I built a complete Reddit scraper suite that requires zero API keys. It comes with a beautiful Streamlit dashboard, a REST API for integration with tools like Grafana and Metabase, a plugin system for post-processing, scheduled scraping, notifications, and much more. Best of all—it’s completely open source.&lt;br&gt;
🔗 GitHub: reddit-universal-scraper&lt;/p&gt;




&lt;h2&gt;The Problem&lt;/h2&gt;

&lt;p&gt;If you’ve ever tried to scrape Reddit data for analysis, research, or just personal projects, you know the pain:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Reddit’s API is heavily rate-limited (especially after the 2023 API changes)&lt;/li&gt;
&lt;li&gt; API keys require approval and are increasingly restricted&lt;/li&gt;
&lt;li&gt; Existing scrapers are often single-purpose - scrape posts OR comments, not both&lt;/li&gt;
&lt;li&gt; No easy way to visualize or analyze the data after scraping&lt;/li&gt;
&lt;li&gt; Running scrapes manually is tedious - you want automation&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I decided to solve all of these problems at once.&lt;/p&gt;

&lt;h2&gt;The Solution: Universal Reddit Scraper Suite&lt;/h2&gt;

&lt;p&gt;After weeks of development, I created a full-featured scraper with the following capabilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;📊 &lt;strong&gt;Full Scraping:&lt;/strong&gt; posts, comments, images, videos, galleries—everything&lt;/li&gt;
&lt;li&gt;🚫 &lt;strong&gt;No API Keys:&lt;/strong&gt; uses Reddit’s public JSON endpoints and mirrors&lt;/li&gt;
&lt;li&gt;📈 &lt;strong&gt;Web Dashboard:&lt;/strong&gt; beautiful 7-tab Streamlit UI for analysis&lt;/li&gt;
&lt;li&gt;🚀 &lt;strong&gt;REST API:&lt;/strong&gt; connect Metabase, Grafana, DuckDB, and more&lt;/li&gt;
&lt;li&gt;🔌 &lt;strong&gt;Plugin System:&lt;/strong&gt; extensible post-processing (sentiment analysis, deduplication, keywords)&lt;/li&gt;
&lt;li&gt;📅 &lt;strong&gt;Scheduled Scraping:&lt;/strong&gt; cron-style automation&lt;/li&gt;
&lt;li&gt;📧 &lt;strong&gt;Notifications:&lt;/strong&gt; Discord &amp;amp; Telegram alerts when scrapes complete&lt;/li&gt;
&lt;li&gt;🐳 &lt;strong&gt;Docker Ready:&lt;/strong&gt; one command to deploy anywhere&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Architecture Deep Dive&lt;/h2&gt;

&lt;h3&gt;How It Works Without API Keys&lt;/h3&gt;

&lt;p&gt;The secret sauce is in the approach. Instead of using Reddit’s official (and restricted) API, I leverage:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Reddit’s public JSON endpoints: Every Reddit page has a .json suffix that returns structured data&lt;/li&gt;
&lt;li&gt; Multiple mirror fallbacks: When one source is rate-limited, the scraper automatically rotates through alternatives like Redlib instances&lt;/li&gt;
&lt;li&gt; Smart rate limiting: Built-in delays and cool-down periods to stay under the radar&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MIRRORS = [
    "https://old.reddit.com",
    "https://redlib.catsarch.com",
    "https://redlib.vsls.cz",
    "https://r.nf",
    "https://libreddit.northboot.xyz",
    "https://redlib.tux.pizza"
]
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;When one source fails, it automatically tries the next. No manual intervention needed.&lt;/p&gt;

&lt;h3&gt;The Core Scraping Engine&lt;/h3&gt;

&lt;p&gt;The scraper operates in three modes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Full Mode&lt;/strong&gt; - the complete package:&lt;br&gt;
&lt;code&gt;python main.py python --mode full --limit 100&lt;/code&gt;&lt;br&gt;
This scrapes posts, downloads all media (images, videos, galleries), and fetches comments with their full thread hierarchy.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;History Mode&lt;/strong&gt; - fast, metadata-only:&lt;br&gt;
&lt;code&gt;python main.py python --mode history --limit 500&lt;/code&gt;&lt;br&gt;
Perfect for quickly building a dataset of post metadata without the overhead of media downloads.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Monitor Mode&lt;/strong&gt; - live watching:&lt;br&gt;
&lt;code&gt;python main.py python --mode monitor&lt;/code&gt;&lt;br&gt;
Continuously checks for new posts every 5 minutes. Ideal for tracking breaking news or trending discussions.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;The Dashboard Experience&lt;/h2&gt;

&lt;p&gt;One of the standout features is the 7-tab Streamlit dashboard that makes data exploration a joy:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;📊 Overview:&lt;/strong&gt; at a glance, see total posts and comments, cumulative score across all posts, the media post breakdown, a posts-over-time chart, and the top 10 posts by score.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;📈 Analytics:&lt;/strong&gt; this is where it gets interesting. Run VADER-based sentiment scoring on your entire dataset, see the most frequently used terms in a keyword cloud, and get data-driven insights on when posts earn the most engagement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🔍 Search:&lt;/strong&gt; full-text search across all scraped data, with filters for minimum score, post type (text, image, video, gallery, link), author, and custom sorting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;💬 Comments Analysis:&lt;/strong&gt; view top-scoring comments, see who the most active commenters are, and track comment patterns over time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;⚙️ Scraper Controls:&lt;/strong&gt; start new scrapes right from the dashboard! Configure the target subreddit/user, post limits, mode (full/history), and media and comment toggles.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;📋 Job History:&lt;/strong&gt; full observability into every scrape job, with status tracking (running, completed, failed), duration metrics, post/comment/media counts, and error logging.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🔌 Integrations:&lt;/strong&gt; pre-configured instructions for connecting Metabase, Grafana, DreamFactory, and DuckDB.&lt;/p&gt;

&lt;h2&gt;The Plugin Architecture&lt;/h2&gt;

&lt;p&gt;I designed a plugin system to allow extensible post-processing. The architecture is simple but powerful:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class Plugin:
    """Base class for all plugins."""
    name = "base"
    description = "Base plugin"
    enabled = True

    def process_posts(self, posts):
        return posts

    def process_comments(self, comments):
        return comments
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;h3&gt;Built-in Plugins&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Sentiment Tagger:&lt;/strong&gt; analyzes the emotional tone of every post and comment using VADER sentiment analysis:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class SentimentTagger(Plugin):
    name = "sentiment_tagger"
    description = "Adds sentiment scores and labels to posts"

    def process_posts(self, posts):
        for post in posts:
            text = f"{post.get('title', '')} {post.get('selftext', '')}"
            score, label = analyze_sentiment(text)
            post['sentiment_score'] = score
            post['sentiment_label'] = label
        return posts
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Deduplicator:&lt;/strong&gt; removes duplicate posts that may appear across multiple scraping sessions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Keyword Extractor:&lt;/strong&gt; pulls out the most significant terms from your scraped content for trend analysis.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;Creating Your Own Plugin&lt;/h3&gt;

&lt;p&gt;Drop a new Python file in the &lt;code&gt;plugins/&lt;/code&gt; directory:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from plugins import Plugin

class MyCustomPlugin(Plugin):
    name = "my_plugin"
    description = "Does something cool"
    enabled = True

    def process_posts(self, posts):
        # Your logic here
        return posts
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Enable plugins during scraping:&lt;br&gt;
&lt;code&gt;python main.py python --mode full --plugins&lt;/code&gt;&lt;/p&gt;
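
&lt;p&gt;The post doesn’t show the loader side, but discovering plugins in a &lt;code&gt;plugins/&lt;/code&gt; directory is straightforward. Here is a minimal sketch of how such discovery could work; the package layout and the registry-by-subclass approach are my assumptions, not the project’s exact code:&lt;/p&gt;

```python
# Sketch: discover Plugin subclasses in a plugins/ package and run their
# process_posts hooks in sequence. Class and package names mirror the post's
# examples but are assumptions about the real project.
import importlib
import pkgutil

class Plugin:
    """Base class, matching the interface shown in the post."""
    name = "base"
    enabled = True

    def process_posts(self, posts):
        return posts

def load_plugins(package_name="plugins"):
    """Import every module in the package, then instantiate enabled plugins."""
    package = importlib.import_module(package_name)
    for info in pkgutil.iter_modules(package.__path__):
        importlib.import_module(f"{package_name}.{info.name}")
    return [cls() for cls in Plugin.__subclasses__() if cls.enabled]

def run_pipeline(posts, plugins):
    """Apply each plugin's post hook in order."""
    for plugin in plugins:
        posts = plugin.process_posts(posts)
    return posts
```

&lt;p&gt;Registering by subclass keeps the plugin API to a single base class: any file dropped into the package that subclasses &lt;code&gt;Plugin&lt;/code&gt; is picked up automatically.&lt;/p&gt;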




&lt;h2&gt;REST API for External Integrations&lt;/h2&gt;

&lt;p&gt;The REST API opens up the scraper to a whole ecosystem of tools:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python main.py --api
# API at http://localhost:8000
# Docs at http://localhost:8000/docs
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
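
&lt;p&gt;Once the server is up, any HTTP client can consume it. The snippet below is my own sketch of building a query against the &lt;code&gt;/posts&lt;/code&gt; route; the filter names follow the endpoint table below, but the exact response shape is an assumption:&lt;/p&gt;

```python
# Sketch: query the scraper's REST API using only the standard library.
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"

def posts_url(subreddit=None, limit=50, offset=0, base_url=BASE_URL):
    """Build the GET /posts URL with the documented filters."""
    params = {"limit": limit, "offset": offset}
    if subreddit:
        params["subreddit"] = subreddit
    return f"{base_url}/posts?{urllib.parse.urlencode(params)}"

def fetch_posts(**kwargs):
    """Fetch and decode one page of posts (requires the API to be running)."""
    with urllib.request.urlopen(posts_url(**kwargs), timeout=10) as resp:
        return json.load(resp)

print(posts_url(subreddit="python", limit=5))
```

&lt;p&gt;Point &lt;code&gt;BASE_URL&lt;/code&gt; at wherever the API is deployed; &lt;code&gt;fetch_posts&lt;/code&gt; assumes the route returns JSON.&lt;/p&gt;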

&lt;h3&gt;Key Endpoints&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;GET /posts&lt;/code&gt;: list posts with filters (subreddit, limit, offset)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;GET /comments&lt;/code&gt;: list comments&lt;/li&gt;
&lt;li&gt;&lt;code&gt;GET /subreddits&lt;/code&gt;: all scraped subreddits&lt;/li&gt;
&lt;li&gt;&lt;code&gt;GET /jobs&lt;/code&gt;: job history&lt;/li&gt;
&lt;li&gt;&lt;code&gt;GET /query?sql=...&lt;/code&gt;: raw SQL queries for power users&lt;/li&gt;
&lt;li&gt;&lt;code&gt;GET /grafana/query&lt;/code&gt;: Grafana-compatible time-series data&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Real-World Integration: Grafana Dashboard&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt; Install the “JSON API” or “Infinity” plugin in Grafana&lt;/li&gt;
&lt;li&gt; Add a datasource pointing to &lt;a href="http://localhost:8000" rel="noopener noreferrer"&gt;http://localhost:8000&lt;/a&gt;&lt;/li&gt;
&lt;li&gt; Use the &lt;code&gt;/grafana/query&lt;/code&gt; endpoint for time-series panels&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT date(created_utc) as time, COUNT(*) as posts
FROM posts GROUP BY date(created_utc)
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Now you have a real-time dashboard tracking Reddit activity!&lt;/p&gt;

&lt;h2&gt;Scheduled Scraping &amp;amp; Notifications&lt;/h2&gt;

&lt;h3&gt;Automation Made Easy&lt;/h3&gt;

&lt;p&gt;Set up recurring scrapes with cron-style scheduling:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Scrape every 60 minutes
python main.py --schedule delhi --every 60
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# With custom options
python main.py --schedule delhi --every 30 --mode full --limit 50
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;h3&gt;Get Notified&lt;/h3&gt;

&lt;p&gt;Configure Discord or Telegram alerts when scrapes complete:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Environment variables
export DISCORD_WEBHOOK_URL="https://discord.com/api/webhooks/..."
export TELEGRAM_BOT_TOKEN="123456:ABC..."
export TELEGRAM_CHAT_ID="987654321"
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Now you get notified with scrape summaries directly in your preferred platform.&lt;/p&gt;




&lt;h2&gt;Dry Run Mode: Test Before You Commit&lt;/h2&gt;

&lt;p&gt;One of my favorite features is dry run mode. It simulates the entire scrape without saving any data:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python main.py python --mode full --limit 50 --dry-run
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;🧪 DRY RUN MODE - No data will be saved
🧪 DRY RUN COMPLETE!
   📊 Would scrape: 100 posts
   💬 Would scrape: 245 comments
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Perfect for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Testing your scrape configuration&lt;/li&gt;
&lt;li&gt;Estimating data volume before committing&lt;/li&gt;
&lt;li&gt;Debugging without cluttering your dataset&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;Docker Deployment&lt;/h2&gt;

&lt;h3&gt;Quick Start&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Build
docker build -t reddit-scraper .

# Run a scrape
docker run -v ./data:/app/data reddit-scraper python --limit 100

# Run with plugins
docker run -v ./data:/app/data reddit-scraper python --plugins
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;h3&gt;Full Stack with Docker Compose&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker-compose up -d
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;This spins up:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dashboard at &lt;a href="http://localhost:8501" rel="noopener noreferrer"&gt;http://localhost:8501&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;REST API at &lt;a href="http://localhost:8000" rel="noopener noreferrer"&gt;http://localhost:8000&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Deploy to Any VPS&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ssh user@your-server-ip
git clone https://github.com/ksanjeev284/reddit-universal-scraper.git
cd reddit-universal-scraper
docker-compose up -d

# Open the firewall
sudo ufw allow 8000
sudo ufw allow 8501
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;You now have a production-ready Reddit scraping platform!&lt;/p&gt;




&lt;h2&gt;Data Export Options&lt;/h2&gt;

&lt;h3&gt;CSV (Default)&lt;/h3&gt;

&lt;p&gt;All scraped data is saved as CSV files: &lt;code&gt;data/r_/posts.csv&lt;/code&gt; and &lt;code&gt;data/r_/comments.csv&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;Parquet (Analytics-Optimized)&lt;/h3&gt;

&lt;p&gt;Export to columnar format for analytics tools:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python main.py --export-parquet python
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Query directly with DuckDB:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import duckdb
duckdb.query("SELECT * FROM 'data/parquet/*.parquet'").df()
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;h3&gt;Database Maintenance&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Backup
python main.py --backup

# Optimize/vacuum
python main.py --vacuum

# View job history
python main.py --job-history
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;




&lt;h2&gt;Data Schema&lt;/h2&gt;

&lt;h3&gt;Posts Table&lt;/h3&gt;

&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Column&lt;/th&gt;&lt;th&gt;Description&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;id&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Reddit post ID&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;title&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Post title&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;author&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Username&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;score&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Net upvotes&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;num_comments&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Comment count&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;post_type&lt;/code&gt;&lt;/td&gt;&lt;td&gt;text/image/video/gallery/link&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;selftext&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Post body (for text posts)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;created_utc&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Timestamp&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;permalink&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Reddit URL&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;is_nsfw&lt;/code&gt;&lt;/td&gt;&lt;td&gt;NSFW flag&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;flair&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Post flair&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;sentiment_score&lt;/code&gt;&lt;/td&gt;&lt;td&gt;-1.0 to 1.0 (with plugins)&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;h3&gt;Comments Table&lt;/h3&gt;

&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Column&lt;/th&gt;&lt;th&gt;Description&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;comment_id&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Comment ID&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;post_permalink&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Parent post URL&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;author&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Username&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;body&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Comment text&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;score&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Upvotes&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;depth&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Nesting level&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;is_submitter&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Whether the author is the OP&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;




&lt;h2&gt;Use Cases&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Academic Research:&lt;/strong&gt; analyze subreddit community dynamics, track sentiment over time during events, and study user engagement patterns&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Market Research:&lt;/strong&gt; monitor brand mentions, track product feedback, and identify emerging trends&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Content Creation:&lt;/strong&gt; find popular topics in your niche, analyze what makes posts go viral, and discover optimal posting times&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Data Journalism:&lt;/strong&gt; archive discussions around breaking news, analyze public sentiment during events, and track narrative evolution&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Personal Projects:&lt;/strong&gt; build a dataset for ML training, create Reddit-based recommendation systems, and archive communities you care about&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;Performance Considerations&lt;/h2&gt;

&lt;h3&gt;Respect Reddit’s Servers&lt;/h3&gt;

&lt;p&gt;The scraper includes built-in delays: a 3-second cooldown between requests, a 30-second wait if all mirrors fail, and automatic mirror rotation to distribute load.&lt;/p&gt;

&lt;h3&gt;Optimize Your Scrapes&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;code&gt;--mode history&lt;/code&gt; for faster metadata-only scrapes&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;--no-media&lt;/code&gt; if you don’t need images/videos&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;--no-comments&lt;/code&gt; for post-only data&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Handle Large Datasets&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Parquet export for analytics queries&lt;/li&gt;
&lt;li&gt;SQLite database for structured storage&lt;/li&gt;
&lt;li&gt;Automatic deduplication to avoid bloat&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;What’s Next? Roadmap&lt;/h2&gt;

&lt;p&gt;I’m actively developing new features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;☐ Async scraping for even faster data collection&lt;/li&gt;
&lt;li&gt;☐ Multi-subreddit monitoring in a single command&lt;/li&gt;
&lt;li&gt;☐ Email notifications in addition to Discord/Telegram&lt;/li&gt;
&lt;li&gt;☐ Cloud deployment templates (AWS, GCP, Azure)&lt;/li&gt;
&lt;li&gt;☐ Web-based scraper configuration (no CLI needed)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Getting Started&lt;/h2&gt;

&lt;h3&gt;Prerequisites&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.10+&lt;/li&gt;
&lt;li&gt;pip&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Installation&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Clone the repo
git clone https://github.com/ksanjeev284/reddit-universal-scraper.git
cd reddit-universal-scraper
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Install dependencies
pip install -r requirements.txt

# Your first scrape
python main.py python --mode full --limit 50

# Launch the dashboard
python main.py --dashboard
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;That’s it! You’re now scraping Reddit like a pro.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Contributing&lt;/strong&gt;&lt;br&gt;
This is an open-source project and contributions are welcome! Whether it’s:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bug fixes&lt;/li&gt;
&lt;li&gt;New plugins&lt;/li&gt;
&lt;li&gt;Documentation improvements&lt;/li&gt;
&lt;li&gt;Feature suggestions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Open an issue or submit a PR on GitHub.&lt;/p&gt;




&lt;p&gt;If you found this useful, consider giving the project a ⭐ on GitHub!&lt;/p&gt;




&lt;p&gt;Connect&lt;br&gt;
• GitHub: &lt;a class="mentioned-user" href="https://dev.to/ksanjeev284"&gt;@ksanjeev284&lt;/a&gt;&lt;br&gt;
• Project: reddit-universal-scraper&lt;/p&gt;

</description>
      <category>reddit</category>
      <category>scraper</category>
      <category>powerplatform</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Getting Started with Open Source: A Beginner’s Guide</title>
      <dc:creator>Sanjeev Kumar</dc:creator>
      <pubDate>Mon, 16 Dec 2024 04:47:35 +0000</pubDate>
      <link>https://forem.com/ksanjeev284/getting-started-with-open-source-a-beginners-guide-4o63</link>
      <guid>https://forem.com/ksanjeev284/getting-started-with-open-source-a-beginners-guide-4o63</guid>
      <description>&lt;p&gt;Have you ever wanted to contribute to open-source projects but felt overwhelmed by where to begin? You're not alone! Many aspiring developers face the same challenge, but the good news is that the open-source community is one of the most welcoming spaces for learners and professionals alike. Let’s break it down step by step.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Contribute to Open Source?&lt;/strong&gt;&lt;br&gt;
Contributing to open-source projects isn’t just about writing code. It’s about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gaining real-world experience.&lt;/li&gt;
&lt;li&gt;Building a portfolio of meaningful work.&lt;/li&gt;
&lt;li&gt;Collaborating with developers across the globe.&lt;/li&gt;
&lt;li&gt;Giving back to the tech community.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;How to Get Started?&lt;/strong&gt;&lt;br&gt;
Here’s a simple roadmap to begin your open-source journey:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose a Language or Framework You Know&lt;/strong&gt;&lt;br&gt;
Start with a language or framework you’re comfortable with. This makes understanding the project easier.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Find Beginner-Friendly Issues&lt;/strong&gt;&lt;br&gt;
Look for issues labeled “Good First Issue”, “Help Wanted”, or similar tags. These are specifically designed for new contributors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Read the Contribution Guide&lt;/strong&gt;&lt;br&gt;
Most projects have a CONTRIBUTING.md file that outlines how to get started, set up the project, and make your first contribution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Start Small&lt;/strong&gt;&lt;br&gt;
Begin with documentation fixes, bug reports, or adding simple tests. These tasks are a great way to ease into a project.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where to Find Projects?&lt;/strong&gt;&lt;br&gt;
This is where things can get tricky. With so many projects out there, how do you find the right one for you? That’s where ContributeOpenSource.com comes in.&lt;/p&gt;

&lt;p&gt;Our platform helps you discover open-source projects that match your interests and skill level. You can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sort projects by programming languages or frameworks.&lt;/li&gt;
&lt;li&gt;Find beginner-friendly issues curated for first-time contributors.&lt;/li&gt;
&lt;li&gt;Explore trending repositories in the open-source community.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We make it easier for you to dive into open source without the overwhelm.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your First PR (Pull Request)&lt;/strong&gt;&lt;br&gt;
Once you find an issue to work on:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Fork the repository.&lt;/li&gt;
&lt;li&gt;Make your changes in a new branch.&lt;/li&gt;
&lt;li&gt;Test your changes.&lt;/li&gt;
&lt;li&gt;Submit your pull request with a clear description.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Celebrate when it gets merged—congratulations, you’re officially an open-source contributor! 🎉&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Final Thoughts&lt;/strong&gt;&lt;br&gt;
Open source isn’t just about contributing code. It’s about learning, growing, and becoming a part of a global community. Don’t be afraid to ask questions, seek help, and make mistakes—that’s how you grow.&lt;/p&gt;

&lt;p&gt;Ready to get started? Head over to &lt;a href="https://www.contributeopensource.com/" rel="noopener noreferrer"&gt;ContributeOpenSource.com&lt;/a&gt; to find your first project today!&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
