Cameron Archer for Tinybird

Posted on May 13

Build a Real-Time Product Recommendation Engine with Tinybird

#analytics #database #api #tutorial

A real-time product recommendation engine can improve the user experience by suggesting products based on each user's recent interactions. This tutorial walks through building such an API with Tinybird.

Tinybird is a data analytics backend for software developers. You use it to build real-time analytics APIs without setting up or managing the underlying infrastructure. It offers a local-first development workflow, git-based deployments, resource definitions as code, and features for AI-native developers.

The solution captures user interactions with products, such as views, clicks, and purchases, and uses Tinybird's data sources and pipes to turn that data into product recommendations that respond to user behavior as it happens. We will first model the interaction data and ingest it into Tinybird, then transform it and publish an API endpoint that serves real-time recommendations, and finally deploy the solution to production.

Understanding the data

Imagine your data looks like this:

```json
{"user_id": "user_413", "product_id": "prod_413", "interaction_type": "favorite", "timestamp": "2025-04-26 12:43:38", "session_id": "session_413", "value": 266338041300}
{"user_id": "user_5", "product_id": "prod_5", "interaction_type": "view", "timestamp": "2025-05-02 19:07:06", "session_id": "session_2005", "value": 52184700500}
...
```

This sample data from `user_interactions.ndjson` represents various user interactions with products, capturing the type of interaction, when it occurred, and other related details. To store this data in Tinybird, we create a data source with the following schema:

```
DESCRIPTION >
    Stores user interactions with products such as views, clicks, purchases

SCHEMA >
    `user_id` String `json:$.user_id`,
    `product_id` String `json:$.product_id`,
    `interaction_type` String `json:$.interaction_type`,
    `timestamp` DateTime `json:$.timestamp`,
    `session_id` String `json:$.session_id`,
    `value` Float32 `json:$.value`

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
ENGINE_SORTING_KEY "user_id, product_id, timestamp"
```

This schema is designed for efficient lookups by user and product. The sorting key starts with `user_id`, followed by `product_id` and `timestamp`, so common access patterns such as retrieving all interactions for a specific user read a contiguous range of rows. The data is partitioned by month via `toYYYYMM(timestamp)`.

For data ingestion, Tinybird's Events API lets you stream JSON/NDJSON events from your application frontend or backend with a single HTTP request, so new data is available for querying with low latency. Here's how you can ingest data:

```bash
curl -X POST "https://api.europe-west2.gcp.tinybird.co/v0/events?name=user_interactions" \
     -H "Authorization: Bearer $TB_ADMIN_TOKEN" \
     -d '{
         "user_id": "user123",
         "product_id": "prod456",
         "interaction_type": "view",
         "timestamp": "2023-05-22 10:30:45",
         "session_id": "sess789",
         "value": 1.0
     }'
```
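
For application code, the same request can be assembled programmatically. The following is a minimal Python sketch, not an official client: it assumes the host and data source name from the curl example above, and `build_events_request` and `SAMPLE_EVENT` are hypothetical names introduced here for illustration. It joins several events into one NDJSON body (one JSON object per line) and builds the request without sending it.

```python
import json
import urllib.request

# Assumption: host and data source name are taken from the curl example above.
EVENTS_URL = "https://api.europe-west2.gcp.tinybird.co/v0/events"

# Illustrative event with the same fields as the data source schema.
SAMPLE_EVENT = {
    "user_id": "user123",
    "product_id": "prod456",
    "interaction_type": "view",
    "timestamp": "2023-05-22 10:30:45",
    "session_id": "sess789",
    "value": 1.0,
}


def build_events_request(events, token, datasource="user_interactions"):
    """Build (without sending) a POST request that carries events as NDJSON."""
    # NDJSON means one JSON object per line.
    body = "\n".join(json.dumps(event) for event in events)
    return urllib.request.Request(
        url=f"{EVENTS_URL}?name={datasource}",
        data=body.encode("utf-8"),
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )


# To actually send it (requires a valid token and network access):
#   import os
#   request = build_events_request([SAMPLE_EVENT], os.environ["TB_ADMIN_TOKEN"])
#   urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes the payload easy to inspect and test before any data reaches Tinybird.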

Beyond the Events API, you might consider the Kafka connector for streaming data or the Data Sources API and S3 connector for batch data ingestion.

Transforming data and publishing APIs

Tinybird transforms data and publishes APIs through pipes. Pipes can perform batch transformations, act as materialized views, and serve as API endpoints. For our recommendation engine, we define the following endpoint in `user_based_recommendations.pipe`:

```
DESCRIPTION >
    Generates product recommendations based on user interactions

NODE user_based_recommendations_node
SQL >
    SELECT
        product_id,
        count() AS popularity_score,
        arrayStringConcat(groupArray(DISTINCT user_id), ',') AS users_who_interacted
    FROM user_interactions
    WHERE {% if defined(user_id) %}user_id != {{String(user_id, '')}} AND{% end %}
          interaction_type IN ('purchase', 'view', 'click')
    GROUP BY product_id
    ORDER BY popularity_score DESC
    LIMIT {{Int32(limit, 10)}}

TYPE endpoint
```

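This pipe counts `purchase`, `view`, and `click` interactions per product to compute a popularity score. When `user_id` is provided, that user's own interactions are excluded from the counts. Note that this does not remove a product the user has already interacted with if other users interacted with it too. Query parameters make the API flexible: consumers can pass a `user_id` and a `limit` on the number of recommendations (10 by default). Example API calls: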

```bash
# General recommendations
curl -X GET "https://api.europe-west2.gcp.tinybird.co/v0/pipes/user_based_recommendations.json?token=$TB_ADMIN_TOKEN&limit=5"

# Recommendations excluding a specific user's interactions
curl -X GET "https://api.europe-west2.gcp.tinybird.co/v0/pipes/user_based_recommendations.json?token=$TB_ADMIN_TOKEN&user_id=user123&limit=10"
```
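
To make the pipe's logic concrete, the same aggregation can be sketched in plain Python over an in-memory list of interactions. This is an illustration of what the SQL computes, not how Tinybird executes it; `recommend` and `COUNTED_TYPES` are hypothetical names, and ties are broken by first appearance here, whereas the SQL leaves tie order unspecified.

```python
from collections import defaultdict

# Interaction types counted by the pipe's WHERE clause.
COUNTED_TYPES = {"purchase", "view", "click"}


def recommend(interactions, user_id=None, limit=10):
    """Rank products by interaction count, mirroring the pipe's SQL."""
    counts = defaultdict(int)
    users = defaultdict(set)
    for row in interactions:
        if row["interaction_type"] not in COUNTED_TYPES:
            continue
        # Like the optional SQL filter: skip rows created by the given user.
        if user_id is not None and row["user_id"] == user_id:
            continue
        counts[row["product_id"]] += 1
        users[row["product_id"]].add(row["user_id"])
    # Stable sort by score, highest first (ORDER BY popularity_score DESC).
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [
        {
            "product_id": product_id,
            "popularity_score": score,
            "users_who_interacted": ",".join(sorted(users[product_id])),
        }
        for product_id, score in ranked[:limit]
    ]


# Made-up rows for illustration only.
rows = [
    {"user_id": "u1", "product_id": "p1", "interaction_type": "view"},
    {"user_id": "u2", "product_id": "p1", "interaction_type": "view"},
    {"user_id": "u2", "product_id": "p2", "interaction_type": "purchase"},
    {"user_id": "u3", "product_id": "p1", "interaction_type": "click"},
    {"user_id": "u1", "product_id": "p3", "interaction_type": "favorite"},
    {"user_id": "u1", "product_id": "p2", "interaction_type": "purchase"},
]
```

Calling `recommend(rows, user_id="u1")` still returns `p1`, which `u1` has viewed, because the filter drops only `u1`'s rows and the other users' interactions with `p1` remain. Excluding already-seen products would need an additional step, such as an anti-join against that user's products.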

Deploying to production

Deploy your project to Tinybird Cloud with `tb --cloud deploy`. This command publishes your data sources and pipes as production API endpoints. Because Tinybird manages resources as code, the project fits into CI/CD pipelines and moves from development to production through the same files. For security, Tinybird protects endpoints with token-based authentication, so call the deployed endpoint with a read token rather than your admin token. Here's an example of how to call your deployed endpoint:

```bash
curl "https://api.tinybird.co/v0/pipes/user_based_recommendations.json?token=YOUR_READ_TOKEN&limit=10"
```
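
From application code, the call might look like the following Python sketch. It assumes the host and pipe name from the curl example above and that the token can be sent in an `Authorization: Bearer` header instead of the `token` query parameter, which keeps it out of logged URLs. Verify both against your workspace. `build_recommendations_request` is a hypothetical helper name, and the request is built but not sent.

```python
import urllib.request
from urllib.parse import urlencode

# Assumption: host and pipe name are taken from the curl example above.
BASE_URL = "https://api.tinybird.co/v0/pipes/user_based_recommendations.json"


def build_recommendations_request(token, user_id=None, limit=10):
    """Build (without sending) a GET request for the recommendations endpoint."""
    params = {"limit": limit}
    if user_id is not None:
        # Only send user_id when set, so the pipe's optional filter stays off.
        params["user_id"] = user_id
    return urllib.request.Request(
        url=f"{BASE_URL}?{urlencode(params)}",
        headers={"Authorization": f"Bearer {token}"},
    )


# To actually call the endpoint (requires a valid read token and network access):
#   import json
#   request = build_recommendations_request("YOUR_READ_TOKEN", user_id="user123")
#   with urllib.request.urlopen(request) as response:
#       payload = json.load(response)
```

Using `urlencode` rather than string concatenation ensures that user IDs containing special characters are escaped correctly.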

Conclusion

In this tutorial, we've built a real-time product recommendation engine using Tinybird's data sources and pipes. The solution streams user interaction data, aggregates it to rank products by popularity, and serves recommendations through a REST API that can be adjusted per user. Tinybird handles the underlying data engineering, so you can focus on creating value from your data. Sign up for Tinybird to build and deploy your first real-time data APIs in a few minutes.
