DEV Community

Tamizh

Scrapebase + Permit.io: Web Scraping with API-First Authorization

This is a submission for the Permit.io Authorization Challenge: Permissions Redefined

What I Built

I built Scrapebase - a web scraping service with tiered access controls that demonstrates API-first authorization using Permit.io. The project separates business logic from authorization concerns using Permit.io's policy-as-code approach.

In many applications, authorization is implemented as an afterthought, resulting in security vulnerabilities and technical debt. Scrapebase demonstrates how to build with authorization as a first-class concern from day one.

Key Features

  • Tiered Service Levels: Free, Pro, and Admin tiers with different capabilities
  • API Key Authentication: Simple authentication using API keys
  • Role-Based Access Control: Permissions managed through Permit.io
  • Domain Blacklist System: Resource-level restrictions for sensitive domains
  • Text Processing: Basic and advanced text processing with role-based restrictions

How It Works

The core authentication and authorization flow:

  1. User sends request with x-api-key header
  2. permitAuth middleware intercepts the request
  3. Middleware maps API key to user role (free_user, pro_user, or admin)
  4. User is synced to Permit.io
  5. Permission check runs against Permit.io cloud PDP
  6. Request is allowed or denied based on policy decision
┌──────────┐    ┌───────────────┐    ┌────────────┐    ┌──────────────┐
│  Client  │───▶│ Scrapebase API│───▶│permitAuth  │───▶│  Permit.io   │
│          │◀───│               │◀───│ middleware │◀───│  Cloud PDP   │
└──────────┘    └───────────────┘    └────────────┘    └──────────────┘
     │                                                        ▲
     │                                                        │
     └────────────────────────────────────────────────────────┘
       Permission policies defined in Permit.io dashboard
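
The six steps above can be sketched as one framework-independent function. This is a minimal sketch under assumptions, not the project's actual middleware: `PolicyClient`, `KEY_TO_ROLE`, and `authorize` are hypothetical names, the Permit.io SDK is replaced by an injected interface so the flow can be read without it, and only the two API keys that appear in this post are mapped (the Pro key is not shown, so it is left out).

```typescript
// Roles named in the post.
type Role = 'free_user' | 'pro_user' | 'admin';

// Hypothetical stand-in for the Permit.io client: sync a user, then ask for a decision.
interface PolicyClient {
  syncUser(user: { key: string; attributes: { tier: Role } }): Promise<void>;
  check(userKey: string, action: string, resource: string): Promise<boolean>;
}

// Step 3: map the x-api-key value to a role (keys taken from the demo commands).
const KEY_TO_ROLE = new Map<string, Role>([
  ['2025DEVChallenge_free', 'free_user'],
  ['2025DEVChallenge_admin', 'admin'],
]);

interface Decision {
  status: 200 | 401 | 403;
  role?: Role;
}

async function authorize(
  apiKey: string | undefined,
  advanced: boolean,
  policy: PolicyClient,
): Promise<Decision> {
  // Steps 1-2: the middleware receives the header value (or nothing).
  if (apiKey === undefined) return { status: 401 };
  const role = KEY_TO_ROLE.get(apiKey);
  if (role === undefined) return { status: 401 };

  // Step 4: make sure the policy service knows this user.
  // Assumption: the API key doubles as the user key, as in the demo.
  await policy.syncUser({ key: apiKey, attributes: { tier: role } });

  // Step 5: ask the policy service whether this action is allowed.
  const action = advanced ? 'scrape_advanced' : 'scrape_basic';
  const allowed = await policy.check(apiKey, action, 'website');

  // Step 6: translate the decision into an HTTP status.
  return { status: allowed ? 200 : 403, role };
}
```

Keeping the policy client behind an interface also makes the flow easy to unit test with a stub that allows or denies specific actions.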

Demo

Scrapebase Demo

You can test the API using the following endpoints:

# Test with free user
curl -X POST http://localhost:8080/api/processLinks \
  -H "Content-Type: application/json" \
  -H "x-api-key: 2025DEVChallenge_free" \
  -d '{"url": "https://example.com"}'

# Test with admin user
curl -X POST http://localhost:8080/api/processLinks \
  -H "Content-Type: application/json" \
  -H "x-api-key: 2025DEVChallenge_admin" \
  -d '{"url": "https://example.com", "advanced": true}'
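
The same two requests can be built from TypeScript. This is a minimal sketch, assuming the server from the commands above is listening on localhost:8080 and that the runtime provides a global `fetch` (Node 18+ or a browser); `buildScrapeRequest` is a hypothetical helper, not part of the project.

```typescript
interface ScrapeRequest {
  url: string;
  init: { method: 'POST'; headers: Record<string, string>; body: string };
}

// Build the request that the curl commands above send.
function buildScrapeRequest(apiKey: string, target: string, advanced = false): ScrapeRequest {
  // Only include "advanced" when requested, mirroring the two curl payloads.
  const payload: { url: string; advanced?: boolean } = { url: target };
  if (advanced) payload.advanced = true;
  return {
    url: 'http://localhost:8080/api/processLinks',
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', 'x-api-key': apiKey },
      body: JSON.stringify(payload),
    },
  };
}

// Usage (needs a running server):
//   const { url, init } = buildScrapeRequest('2025DEVChallenge_admin', 'https://example.com', true);
//   const response = await fetch(url, init);
```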

Project Repo

Scrapebase with Permit.io Authorization

A web scraping API with fine-grained authorization controls powered by Permit.io. This project demonstrates how to implement tiered, policy-driven authorization in a real-world API service. Demo: https://scrapebase-permit.up.railway.app/

Features

  • Tiered Access Control: Different permissions for Free, Pro, and Admin users
  • Resource-Based Authorization: Control access based on target domains
  • Rate Limiting: Tier-specific rate limits enforced through policies
  • Advanced Scraping Features: Premium capabilities restricted to Pro users
  • Real-time Policy Updates: Changes to permissions take effect immediately
  • Audit Logging: Track all authorization decisions

Quick Start

  1. Clone the repository:
git clone https://github.com/0xtamizh/scrapebase-permit-IO
cd scrapebase-permit-IO
  2. Install dependencies:
npm install
  3. Set up environment variables:
cp .env.example .env

Edit .env with your Permit.io API key and other configurations:

PERMIT_API_KEY=your_permit_api_key
ADMIN_API_KEY=2025DEVChallenge_admin
USER_API_KEY=2025DEVChallenge_user
  4. Start the development server:
npm run dev
  5. Visit http://localhost:3000 to access the testing UI
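
A missing entry in `.env` would otherwise only surface at the first request, so it can help to validate the configuration at startup. The following is a minimal sketch, assuming the three variable names from the `.env` example above; `loadConfig` and `AppConfig` are hypothetical names, and the environment is passed in as a plain object (in Node this would be `process.env`) so the function stays easy to test.

```typescript
interface AppConfig {
  permitApiKey: string;
  adminApiKey: string;
  userApiKey: string;
}

type Env = Record<string, string | undefined>;

// Fail fast with one message that lists every missing variable.
function loadConfig(env: Env): AppConfig {
  const required = ['PERMIT_API_KEY', 'ADMIN_API_KEY', 'USER_API_KEY'];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error('Missing environment variables: ' + missing.join(', '));
  }
  return {
    permitApiKey: env['PERMIT_API_KEY'] as string,
    adminApiKey: env['ADMIN_API_KEY'] as string,
    userApiKey: env['USER_API_KEY'] as string,
  };
}
```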

Testing the Authorization Features

Test Credentials

Admin User:

  • Username: admin
  • API Key…

My Journey

The Problem with Traditional Authorization

Traditional approaches to authorization often result in permission checks scattered throughout application code, creating maintenance nightmares and security risks. When I started this project, I wanted to demonstrate how modern applications can embrace externalized authorization as a core architectural principle.

I chose to build a web scraping service because it presents meaningful access control requirements:

  1. Tiered service levels that mirror real-world SaaS subscription models
  2. Administrative functions that require elevated permissions
  3. Resource-based restrictions through a domain blacklist system

The Power of API-First Authorization

The key insight that drove this project was the separation of concerns: business logic should be distinct from authorization decisions. By using Permit.io, I was able to:

  1. Define all permission policies in one place
  2. Enforce consistent access control across all endpoints
  3. Update policies without changing application code

The implementation was straightforward - here's the core middleware that powers the authorization flow:

// Map API key to user role
switch (apiKey) {
  case process.env.ADMIN_API_KEY:
    userKey = '2025DEVChallenge_admin';
    tier = 'admin';
    break;
  // ...other keys
}

// Sync user to Permit.io (the same object provides user.key for the check below)
const user = {
  key: userKey,
  email: `${userKey}@scrapebase.xyz`,
  attributes: { tier, roles: [tier] }
};
await permit.api.syncUser(user);

// Check permission
const action = req.body.advanced ? 'scrape_advanced' : 'scrape_basic';
const permissionCheck = await permit.check(user.key, action, 'website');

if (!permissionCheck) {
  return res.status(403).json({
    success: false,
    error: 'Access denied by Permit.io'
  });
}

Challenges Faced

Cloud PDP Limitations

Initially, I tried implementing Attribute-Based Access Control (ABAC) by passing resource attributes:

// This DIDN'T work with cloud PDP
const resource = {
  type: 'website',
  key: hostname,
  attributes: {
    is_blacklisted: isBlacklistedDomain
  }
};

const permissionCheck = await permit.check(user.key, action, resource);

The cloud PDP returned 501 (Not Implemented) errors because it only supports basic RBAC checks, so I had to simplify to a pure RBAC approach:

// This works with cloud PDP
const permissionCheck = await permit.check(user.key, action, resourceType);
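
One way to keep domain-level restrictions while the policy check stays pure RBAC is to evaluate the blacklist in application code before calling `permit.check`. The sketch below is an assumption about how that could look, not the project's code: `BLACKLIST`, its two entries, and `isBlacklisted` are hypothetical.

```typescript
// Hypothetical blacklist; the project's real list is not shown in this post.
const BLACKLIST: ReadonlySet<string> = new Set(['internal.example', 'bank.example']);

// True when the URL's hostname is a blacklisted domain or one of its subdomains.
function isBlacklisted(targetUrl: string, blacklist: ReadonlySet<string> = BLACKLIST): boolean {
  let hostname: string;
  try {
    hostname = new URL(targetUrl).hostname.toLowerCase();
  } catch {
    return true; // unparsable URLs are rejected rather than scraped
  }
  const labels = hostname.split('.');
  // For "a.b.c", test "a.b.c", then "b.c", then "c" against the list.
  for (let i = 0; i < labels.length; i++) {
    if (blacklist.has(labels.slice(i).join('.'))) return true;
  }
  return false;
}
```

A request for a blacklisted host could then be refused with a 403 unless the caller's role is permitted to access such domains, with that role decision still delegated to Permit.io.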

Role Assignment

Another challenge was ensuring roles were properly synchronized and recognized. The solution was two-fold:

  1. Properly sync users with their role information
  2. Manually configure role permissions in the Permit.io dashboard

Using Permit.io for Authorization

Setting up Permit.io involved these key steps:

  1. Creating a project in the Permit.io dashboard
  2. Defining resources (website), actions (scrape_basic, scrape_advanced), and roles (free_user, pro_user, admin)
  3. Configuring the permission matrix in the dashboard
  4. Integrating the Permit.io SDK into my application
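
The resource, actions, and roles from step 2 can also be mirrored as TypeScript types, so a misspelled action is caught by the compiler before it reaches `permit.check`. This is a small sketch, not the project's code: `chooseAction`, `isRole`, and `checkArgs` are hypothetical helper names, and the dashboard remains the source of truth for who may do what.

```typescript
// Names as defined in the Permit.io dashboard (step 2).
type Role = 'free_user' | 'pro_user' | 'admin';
type Action = 'scrape_basic' | 'scrape_advanced';
type Resource = 'website';

const ROLES: readonly Role[] = ['free_user', 'pro_user', 'admin'];

// Narrow an arbitrary string (for example a stored tier) to a known role.
function isRole(value: string): value is Role {
  return (ROLES as readonly string[]).indexOf(value) !== -1;
}

// Pick the action from the request body, as the middleware in this post does.
function chooseAction(body: { advanced?: boolean }): Action {
  return body.advanced ? 'scrape_advanced' : 'scrape_basic';
}

// Arguments in the order permit.check expects: user key, action, resource type.
function checkArgs(userKey: string, body: { advanced?: boolean }): [string, Action, Resource] {
  return [userKey, chooseAction(body), 'website'];
}
```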

Here's the role-based capability matrix I implemented:

| Feature | Free User | Pro User | Admin |
|---|---|---|---|
| Basic Scraping | | | |
| Advanced Scraping | | | |
| Text Cleaning | | | |
| AI Summarization | | | |
| View Blacklist | | | |
| Manage Blacklist | | | |
| Access Blacklisted Domains | | | |

Permission Enforcement

Permissions are enforced in two places:

  1. The permitAuth middleware for API endpoints:
   const permissionCheck = await permit.check(user.key, action, 'website');
   if (!permissionCheck) {
     return res.status(403).json({ success: false, error: 'Access denied' });
   }
  2. Directly in route handlers for specific features:
   // src/routes/summarize.ts
   if (summarize) {
     const userTier = req.user?.attributes?.tier;
     if (userTier !== 'pro_user' && userTier !== 'admin') {
       return res.status(403).json({
         success: false,
         error: 'Access denied',
         details: 'Text summarization is only available for Pro and Admin users'
       });
     }
   }
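
If more features end up gated this way, the inline tier comparison can be pulled into one helper so each route states its requirement in a single line. This is a minimal sketch, assuming the `tier` attribute holds one of the three role names used in this post; `hasTier` and `denyUnlessTier` are hypothetical names, and the response is returned as plain data rather than sent through Express.

```typescript
type Tier = 'free_user' | 'pro_user' | 'admin';

interface Denial {
  status: 403;
  body: { success: false; error: string; details: string };
}

// True when the user's tier is one of the allowed tiers.
function hasTier(userTier: string | undefined, allowed: readonly Tier[]): boolean {
  return userTier !== undefined && (allowed as readonly string[]).indexOf(userTier) !== -1;
}

// Returns null when access is allowed, or the 403 payload to send otherwise.
function denyUnlessTier(
  userTier: string | undefined,
  allowed: readonly Tier[],
  feature: string,
): Denial | null {
  if (hasTier(userTier, allowed)) return null;
  return {
    status: 403,
    body: {
      success: false,
      error: 'Access denied',
      details: feature + ' is only available for: ' + allowed.join(', '),
    },
  };
}
```

In a route handler, a non-null result would be passed to `res.status(...).json(...)`. Modeling the feature as its own action in Permit.io would be another option, and would keep this rule next to the other policies.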

What I Learned

Building Scrapebase with Permit.io taught me how to:

  1. Separate authorization concerns from business logic
  2. Implement role-based access control with external policy management
  3. Design a flexible permission system that doesn't require code changes to update policies

The advantages of this approach are clear:

  1. Separation of concerns: Business logic remains focused on core functionality while authorization is handled externally
  2. Adaptable policies: Permissions can be updated without code changes or redeployments
  3. Consistent enforcement: Authorization decisions follow the same rules across all application endpoints
  4. Improved security: Centralized policy management reduces the risk of inconsistent permission checks
  5. Developer experience: Cleaner codebase with reduced authorization-related complexity

This externalized approach enables business stakeholders to manage authorization policies directly through the Permit.io dashboard, while developers focus on building features - the hallmark of a well-designed API-first authorization system.

Future Improvements

With more time, I would:

  1. Set up a local PDP to enable ABAC with resource attributes
  2. Implement tenant isolation for multi-tenant support
  3. Add UI components in the admin dashboard to view permission audit logs
  4. Create more granular roles and permissions beyond the three tiers
  5. Add a user management section to assign roles through the UI

Scrapebase demonstrates how modern SaaS apps can delegate complex authorization to a specialized service like Permit.io, allowing developers to focus on core features while maintaining robust access controls.


