<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Mayank Parashar</title>
    <description>The latest articles on Forem by Mayank Parashar (@mahirr).</description>
    <link>https://forem.com/mahirr</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3698934%2Fabce3a65-3054-4671-b2c6-9ebacafe8c55.jpg</url>
      <title>Forem: Mayank Parashar</title>
      <link>https://forem.com/mahirr</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/mahirr"/>
    <language>en</language>
    <item>
      <title>How I Served 80,000+ Recommendations in Under 50ms</title>
      <dc:creator>Mayank Parashar</dc:creator>
      <pubDate>Wed, 15 Apr 2026 04:53:56 +0000</pubDate>
      <link>https://forem.com/mahirr/how-i-served-80000-recommendations-in-under-50ms-1l32</link>
      <guid>https://forem.com/mahirr/how-i-served-80000-recommendations-in-under-50ms-1l32</guid>
      <description>&lt;p&gt;Every recommendation tutorial I found was either a Netflix black box or a 1,000-row Jupyter notebook toy. I wanted something in between — real, deployable, and something I actually understood.&lt;br&gt;
That's how Inkpick was born: a hybrid recommendation engine across cinema, music, and courses with sub-50ms inference on 80,000+ items. Just NumPy, FastAPI, and deliberate design choices.&lt;/p&gt;

&lt;p&gt;What "&lt;strong&gt;Hybrid&lt;/strong&gt;" Means&lt;/p&gt;

&lt;p&gt;Content-Based Filtering — works on day one, no user history needed. But it traps users in a bubble.&lt;br&gt;
Collaborative Filtering — discovers surprising cross-user patterns. Falls apart for new users (cold-start problem).&lt;/p&gt;

&lt;p&gt;A hybrid blends both:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;score_hybrid(i) = α · score_cb(i) + (1 - α) · score_cf(i)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Inkpick defaults to α = 0.65 — content-biased for cold-start users, shifting toward collaborative as history grows.&lt;br&gt;
&lt;/p&gt;
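
&lt;p&gt;The blend above is one line of NumPy. A minimal sketch (function and sample scores are illustrative, not Inkpick's actual code):&lt;/p&gt;

```python
import numpy as np

def hybrid_score(score_cb, score_cf, alpha=0.65):
    """Blend content-based and collaborative scores; alpha is the content weight."""
    score_cb = np.asarray(score_cb, dtype=float)
    score_cf = np.asarray(score_cf, dtype=float)
    return alpha * score_cb + (1.0 - alpha) * score_cf

# Made-up scores for two items: a cold-start user leans on the content side.
blended = hybrid_score([0.9, 0.2], [0.1, 0.8])
```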

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plaintext
The Architecture
Client (Vanilla JS)
       │
  FastAPI (Async)
  ┌────┴────┬──────────┐
TF-IDF   Latent    Levenshtein
+ CSR    Factor    Fuzzy Search
  └────┬────┘
  Hybrid Layer
       │
 Service Registry
(cinema / audio / edu)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each domain is fully decoupled. Adding a new domain = one new service file.&lt;/p&gt;
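
&lt;p&gt;A registry of decoupled domain services could be sketched like this (a hypothetical illustration; the class and function names are not Inkpick's actual code):&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class DomainService:
    name: str
    recommend: Callable[[str, int], List[str]]  # (item_id, top_k) -> item ids

# Central registry: the API layer looks up the domain by name.
REGISTRY: Dict[str, DomainService] = {}

def register(service: DomainService) -> None:
    REGISTRY[service.name] = service

# Adding a new domain is one registration call from its own service file.
register(DomainService("cinema", lambda item_id, k: [f"rec-{i}" for i in range(k)]))

recs = REGISTRY["cinema"].recommend("tt0111161", 3)
```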

&lt;p&gt;Content-Based: &lt;strong&gt;TF-IDF + Cosine Similarity&lt;/strong&gt;&lt;br&gt;
TF-IDF turns item metadata (title, genre, tags) into vectors. Words unique to one item = high weight. Common words like "the" = penalized.&lt;/p&gt;

&lt;p&gt;Similarity between items is then a dot product:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;similarity(q, i) = (q · i) / (‖q‖ · ‖i‖)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why not SciPy? Inkpick implements CSR (Compressed Sparse Row) ops directly in NumPy — cutting a ~30MB dependency, reducing memory, and keeping full control over the pipeline. An 80,000-item matrix is ~98% zeros; CSR stores only non-zero values.&lt;/p&gt;
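
&lt;p&gt;The idea behind a hand-rolled CSR layer can be sketched in a few lines of plain NumPy (an illustrative toy, not Inkpick's implementation; the standard CSR layout stores three arrays: values, column indices, and row pointers):&lt;/p&gt;

```python
import numpy as np

# Toy dense matrix standing in for the mostly-zero TF-IDF matrix.
dense = np.array([
    [0.0, 2.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 0.0],
])

# CSR stores only the non-zeros plus bookkeeping arrays.
data = dense[dense != 0]                                      # non-zero values
indices = np.nonzero(dense)[1]                                # their column ids
indptr = np.concatenate(([0], np.cumsum((dense != 0).sum(axis=1))))

def row(i):
    """Rebuild one dense row from the CSR arrays."""
    out = np.zeros(dense.shape[1])
    out[indices[indptr[i]:indptr[i + 1]]] = data[indptr[i]:indptr[i + 1]]
    return out

def cosine(a, b):
    """Cosine similarity: dot product over the product of norms."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sim = cosine(row(0), row(1))  # these two rows share no terms, so similarity is 0
```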

&lt;p&gt;Collaborative Filtering: Latent Factors&lt;br&gt;
CF decomposes the user–item interaction matrix into lower-dimensional embeddings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;R ≈ U × Vᵀ
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These latent dimensions learn hidden patterns — "likes slow-burn thrillers" — without being told. In Inkpick, this module is a production-ready stub awaiting a trained ALS/BPR model. Honest limitation, next on the roadmap.&lt;/p&gt;
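
&lt;p&gt;Once U and V exist, scoring is just a matrix-vector product. A sketch with random stand-in factors (the real module awaits trained ALS/BPR embeddings; every number here is made up):&lt;/p&gt;

```python
import numpy as np

# Pre-trained embeddings would be loaded here; random factors stand in.
rng = np.random.default_rng(0)
U = rng.normal(size=(100, 16))   # 100 users x 16 latent dims
V = rng.normal(size=(500, 16))   # 500 items x 16 latent dims

def cf_scores(user_id):
    """One row of R ≈ U @ V.T: this user's predicted score for every item."""
    return V @ U[user_id]

top5 = np.argsort(-cf_scores(7))[:5]   # ids of the five highest-scoring items
```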

&lt;p&gt;&lt;strong&gt;Fuzzy Search Fallback&lt;/strong&gt;&lt;br&gt;
Search "Godfater" → no match → system fails. Not ideal.&lt;br&gt;
Inkpick uses Levenshtein edit-distance as a safety net:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Godfater" → "Godfather" = 1 edit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When exact search fails, fuzzy kicks in and returns the closest matches. Small addition, big UX improvement.&lt;br&gt;
&lt;/p&gt;
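
&lt;p&gt;The fallback is the classic dynamic-programming edit distance plus a threshold. A self-contained sketch (the helper names and the two-edit cutoff are illustrative):&lt;/p&gt;

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic DP: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def fuzzy_fallback(query, titles, max_dist=2):
    """When exact search misses, return titles within max_dist edits, closest first."""
    scored = [(levenshtein(query.lower(), t.lower()), t) for t in titles]
    return [t for d, t in sorted(scored) if d <= max_dist]

matches = fuzzy_fallback("Godfater", ["The Matrix", "Godfather"])
```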

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;The API
GET /recommend/cinema?item_id=tt0111161&amp;amp;top_k=5&amp;amp;mode=hybrid
json{
  "domain": "cinema",
  "results": [{ "title": "The Godfather", "score": 0.94 }],
  "latency_ms": 38
}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The mode param accepts content, collaborative, or hybrid — handy for debugging.&lt;/p&gt;

&lt;p&gt;What I'd Fix in v2&lt;/p&gt;

&lt;p&gt;Train the CF model — ALS or BPR. The hybrid is only as good as both components.&lt;br&gt;
SBERT over TF-IDF — semantic similarity that keyword matching completely misses.&lt;/p&gt;

&lt;p&gt;Add evaluation metrics — Precision@K, NDCG. Fast latency is measurable; recommendation quality currently isn't.&lt;/p&gt;

&lt;p&gt;Dynamic α — learn the blend weight per user instead of hardcoding 0.65.&lt;/p&gt;

&lt;p&gt;Diversity control — MMR to avoid returning "10 Batman movies."&lt;/p&gt;

&lt;p&gt;Try It&lt;/p&gt;

&lt;p&gt;live   : inkpick.vercel.app&lt;br&gt;
github : github.com/MayankParashar28/inkpick&lt;/p&gt;

&lt;p&gt;Drop a comment if you're building something similar — would love to exchange notes.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>fastapi</category>
      <category>performance</category>
    </item>
    <item>
      <title>Lumina — Where Blogs Meet AI</title>
      <dc:creator>Mayank Parashar</dc:creator>
      <pubDate>Mon, 12 Jan 2026 16:39:26 +0000</pubDate>
      <link>https://forem.com/mahirr/lumina-where-blogs-meet-ai-4m30</link>
      <guid>https://forem.com/mahirr/lumina-where-blogs-meet-ai-4m30</guid>
      <description>&lt;p&gt;&lt;strong&gt;GitHub Repository:&lt;/strong&gt; &lt;a href="https://github.com/MayankParashar28/Lumina" rel="noopener noreferrer"&gt;https://github.com/MayankParashar28/Lumina&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Live Demo:&lt;/strong&gt; &lt;a href="https://lluminaa.vercel.app" rel="noopener noreferrer"&gt;https://lluminaa.vercel.app&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Developer:&lt;/strong&gt; Mahir (Mayank Parashar)&lt;/p&gt;




&lt;h2&gt;
  
  
  Project Overview
&lt;/h2&gt;

&lt;p&gt;Lumina is a full-stack intelligent blogging platform that integrates Artificial Intelligence and semantic search to enhance how users create, discover, and interact with content. The platform transforms traditional blogging into a context-aware knowledge ecosystem using vector embeddings and AI-powered analysis.&lt;/p&gt;

&lt;p&gt;This project demonstrates strong capabilities in modern web development, backend architecture, API integrations, and scalable system design. Lumina was built with production readiness, performance optimization, and maintainability in mind.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  AI-Powered Semantic Search
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Context-aware search powered by Google Gemini embeddings
&lt;/li&gt;
&lt;li&gt;Cosine similarity matching for accurate content retrieval &lt;/li&gt;
&lt;li&gt;Smart recommendations based on semantic relevance
&lt;/li&gt;
&lt;/ul&gt;
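
&lt;p&gt;The core of embedding-based semantic search is cosine-similarity ranking. A minimal sketch (Lumina itself runs on Node.js with Gemini embeddings; this is shown in Python for brevity, and the post slugs and vectors are made up):&lt;/p&gt;

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Stored post embeddings (in Lumina these would come from the embeddings API).
posts = {
    "intro-to-vectors": [0.9, 0.1, 0.0],
    "cooking-pasta":    [0.0, 0.2, 0.9],
}
query_embedding = [0.8, 0.2, 0.1]   # embedding of the user's search query

# Rank posts by semantic relevance to the query, most similar first.
ranked = sorted(posts, key=lambda p: cosine(query_embedding, posts[p]), reverse=True)
```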

&lt;h3&gt;
  
  
  Intelligent Content Processing
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Automated content summarization &lt;/li&gt;
&lt;li&gt;Related article discovery using embeddings&lt;/li&gt;
&lt;li&gt;Improved search ranking and relevance&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Real-Time Experience
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Live notifications using Socket.IO&lt;/li&gt;
&lt;li&gt;Infinite scrolling feed for optimized UX&lt;/li&gt;
&lt;li&gt;Responsive layout across devices&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Security and Access Control
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Secure authentication using OAuth and password hashing&lt;/li&gt;
&lt;li&gt;Role-based access control&lt;/li&gt;
&lt;li&gt;Admin moderation dashboard&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Technology Stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technologies&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Frontend&lt;/td&gt;
&lt;td&gt;EJS Templates, HTML, CSS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Backend&lt;/td&gt;
&lt;td&gt;Node.js, Express.js&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Database&lt;/td&gt;
&lt;td&gt;MongoDB Atlas&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI Services&lt;/td&gt;
&lt;td&gt;Google Gemini API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Real-Time&lt;/td&gt;
&lt;td&gt;Socket.IO&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deployment&lt;/td&gt;
&lt;td&gt;Vercel, Docker&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Version Control&lt;/td&gt;
&lt;td&gt;GitHub&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;

&lt;p&gt;Lumina follows a modular and scalable architecture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Client Layer:&lt;/strong&gt; User interface rendering and user interaction&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Application Layer:&lt;/strong&gt; Routing, middleware, authentication&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Business Logic Layer:&lt;/strong&gt; Blog services, AI processing, search engine
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;External Services:&lt;/strong&gt; Gemini API, MongoDB Atlas, CDN&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This separation of concerns improves maintainability, scalability, and development velocity.&lt;/p&gt;




&lt;h2&gt;
  
  
  Engineering Highlights
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Implemented semantic search pipelines using vector embeddings&lt;/li&gt;
&lt;li&gt;Designed secure authentication and authorization workflows&lt;/li&gt;
&lt;li&gt;Built real-time communication channels using WebSockets&lt;/li&gt;
&lt;li&gt;Optimized backend performance with modular service architecture&lt;/li&gt;
&lt;li&gt;Containerized the application for scalable deployments&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Learning Outcomes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Practical experience integrating AI APIs into production systems
&lt;/li&gt;
&lt;li&gt;Full-stack development with real-world scalability considerations
&lt;/li&gt;
&lt;li&gt;Secure system design and data handling
&lt;/li&gt;
&lt;li&gt;Performance optimization and system modularization
&lt;/li&gt;
&lt;li&gt;Deployment automation and cloud hosting workflows
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Repository and Demo
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Source Code: &lt;a href="https://github.com/MayankParashar28/Lumina" rel="noopener noreferrer"&gt;https://github.com/MayankParashar28/Lumina&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Live Application: &lt;a href="https://lluminaa.vercel.app" rel="noopener noreferrer"&gt;https://lluminaa.vercel.app&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Career Objective
&lt;/h2&gt;

&lt;p&gt;I am actively seeking internship opportunities in Software Engineering, Full Stack Development, and AI-driven applications where I can contribute to impactful products while continuing to strengthen my engineering skills.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
      <category>ai</category>
      <category>beginners</category>
    </item>
  </channel>
</rss>
