<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: SNEHASISH DUTTA</title>
    <description>The latest articles on Forem by SNEHASISH DUTTA (@datasosneh).</description>
    <link>https://forem.com/datasosneh</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1649477%2Fae4410a9-fa5e-414b-a077-bc87ddff5528.png</url>
      <title>Forem: SNEHASISH DUTTA</title>
      <link>https://forem.com/datasosneh</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/datasosneh"/>
    <language>en</language>
    <item>
      <title>From Reddit Trolls to Real-Time Analytics: Building an LLM-Powered Flink Deployment System</title>
      <dc:creator>SNEHASISH DUTTA</dc:creator>
      <pubDate>Thu, 29 May 2025 20:51:30 +0000</pubDate>
      <link>https://forem.com/datasosneh/from-reddit-trolls-to-real-time-analytics-building-an-llm-powered-flink-deployment-system-25o3</link>
      <guid>https://forem.com/datasosneh/from-reddit-trolls-to-real-time-analytics-building-an-llm-powered-flink-deployment-system-25o3</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6u4w1tkdofnl3o11pr5t.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6u4w1tkdofnl3o11pr5t.jpeg" alt="sshot" width="800" height="1244"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;The Origin Story: When Reddit Roasts Spark Innovation&lt;/h2&gt;

&lt;p&gt;Picture this: You're a data engineer scrolling through Reddit, genuinely asking about emerging AI trends to stay ahead of the curve. You post a thoughtful question about what new technologies you should learn, expecting insights about MLOps, vector databases, or maybe the latest streaming frameworks.&lt;/p&gt;

&lt;p&gt;Instead, you get: &lt;em&gt;"You still have time to sell yourself on OnlyFans."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Most people would roll their eyes and move on. But sometimes, the most ridiculous comments spark the most interesting ideas. What if we took that sarcastic comment and turned it into a legitimate technical challenge? What if we built a sophisticated real-time data processing system that could handle the scale and complexity of a content platform, complete with an AI-powered deployment interface?&lt;/p&gt;

&lt;p&gt;That's exactly what happened here, and the result is a fascinating exploration of modern data engineering architecture that combines LLM-powered DevOps automation with Apache Flink stream processing.&lt;/p&gt;

&lt;h2&gt;The Technical Vision: Beyond the Meme&lt;/h2&gt;

&lt;p&gt;What started as a Reddit joke evolved into a comprehensive demonstration of cutting-edge data engineering patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Natural Language DevOps&lt;/strong&gt;: Using OpenAI GPT-4 to parse deployment commands and automatically provision Apache Flink jobs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-Time Stream Processing&lt;/strong&gt;: Apache Flink jobs processing events with sub-second latency&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Modern Data Lakehouse&lt;/strong&gt;: Apache Iceberg tables providing ACID transactions and schema evolution&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Event-Driven Architecture&lt;/strong&gt;: Kafka-based event streaming with automatic scaling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Infrastructure as Code&lt;/strong&gt;: Complete Docker Compose orchestration for reproducible deployments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system architecture demonstrates enterprise-grade patterns while maintaining the flexibility to experiment with emerging technologies.&lt;/p&gt;

&lt;h2&gt;System Architecture: Three Pillars of Modern Data Processing&lt;/h2&gt;

&lt;h3&gt;1. Event Publisher: The Data Generator&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────┐    ┌──────────────┐    ┌─────────────────┐
│  Go Publisher   │───▶│   Redpanda   │───▶│  Event Topics   │
│                 │    │   (Kafka)    │    │                 │
│ • GPU Temp Sim  │    │              │    │ • content       │
│ • Configurable  │    │ • Multi-node │    │ • creator       │
│ • Docker Ready  │    │ • Web UI     │    │ • temperature   │
└─────────────────┘    └──────────────┘    └─────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first component simulates realistic event streams. While themed around content platforms, it actually generates GPU temperature data, a perfect proxy for any time-series monitoring system. The publisher includes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Smart Simulation Features:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Configurable anomaly injection (5% abnormal readings by default)&lt;/li&gt;
&lt;li&gt;Multiple device simulation (scalable from 1 to N devices)&lt;/li&gt;
&lt;li&gt;Adjustable publishing intervals (milliseconds to minutes)&lt;/li&gt;
&lt;li&gt;Built-in Docker orchestration&lt;/li&gt;
&lt;/ul&gt;
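&lt;p&gt;The anomaly injection above can be sketched in a few lines of Go. This is a minimal illustration, not the project's actual publisher; the function name, temperature ranges, and the &lt;code&gt;abnormalRate&lt;/code&gt; parameter are assumptions:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// TemperatureReading mirrors the event schema published to Kafka.
type TemperatureReading struct {
	DeviceID    string    `json:"device_id"`
	Temperature float64   `json:"temperature"`
	IsAbnormal  bool      `json:"is_abnormal"`
	Timestamp   time.Time `json:"timestamp"`
}

// newReading emits a normal reading (40-80°C) most of the time, and an
// abnormal spike (95-120°C) with probability abnormalRate (e.g. 0.05 for 5%).
func newReading(deviceID string, abnormalRate float64) TemperatureReading {
	r := TemperatureReading{DeviceID: deviceID, Timestamp: time.Now()}
	if rand.Float64() < abnormalRate {
		r.IsAbnormal = true
		r.Temperature = 95 + rand.Float64()*25
	} else {
		r.Temperature = 40 + rand.Float64()*40
	}
	return r
}

func main() {
	fmt.Printf("%+v\n", newReading("gpu-1", 0.05))
}
```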

&lt;p&gt;&lt;strong&gt;Production-Ready Architecture:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;TemperatureReading&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;DeviceID&lt;/span&gt;    &lt;span class="kt"&gt;string&lt;/span&gt;    &lt;span class="s"&gt;`json:"device_id"`&lt;/span&gt;
    &lt;span class="n"&gt;Temperature&lt;/span&gt; &lt;span class="kt"&gt;float64&lt;/span&gt;   &lt;span class="s"&gt;`json:"temperature"`&lt;/span&gt;
    &lt;span class="n"&gt;IsAbnormal&lt;/span&gt;  &lt;span class="kt"&gt;bool&lt;/span&gt;      &lt;span class="s"&gt;`json:"is_abnormal"`&lt;/span&gt;
    &lt;span class="n"&gt;Timestamp&lt;/span&gt;   &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Time&lt;/span&gt; &lt;span class="s"&gt;`json:"timestamp"`&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The publisher demonstrates real-world patterns for event generation, including proper error handling, graceful shutdowns, and configurable parameters through environment variables.&lt;/p&gt;

&lt;h3&gt;2. LLM-Powered Deployment Service: The AI Operations Layer&lt;/h3&gt;

&lt;p&gt;This is where things get interesting. Instead of traditional deployment scripts or complex CI/CD pipelines, the system uses OpenAI GPT-4 to interpret natural language commands and automatically deploy Apache Flink jobs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────┐    ┌──────────────┐    ┌─────────────────┐
│   Chat Input    │───▶│  OpenAI GPT  │───▶│  Flink Jobs     │
│                 │    │              │    │                 │
│ "deploy content │    │ • Parse NL   │    │ • Auto Deploy   │
│  event"         │    │ • Validate   │    │ • Docker/CLI    │
│                 │    │ • Generate   │    │ • Monitoring    │
└─────────────────┘    └──────────────┘    └─────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Natural Language Processing Examples:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;"deploy content event processor"&lt;/em&gt; → Launches content stream processing job&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;"I need creator analytics running"&lt;/em&gt; → Deploys creator event processor&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;"start processing video events"&lt;/em&gt; → Spins up video content pipeline&lt;/li&gt;
&lt;/ul&gt;
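&lt;p&gt;Whatever the model replies, the service still has to validate it against a closed set of deployable processors before touching infrastructure. A sketch of that guard follows; the allow-list and function name are assumptions, not the project's actual code:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

// knownProcessors is the closed set of Flink jobs the service may deploy.
var knownProcessors = map[string]bool{
	"content":     true,
	"creator":     true,
	"temperature": true,
}

// resolveEventType maps a (possibly noisy) LLM answer onto an allowed
// processor, rejecting anything outside the allow-list.
func resolveEventType(llmOutput string) (string, error) {
	normalized := strings.ToLower(strings.TrimSpace(llmOutput))
	for name := range knownProcessors {
		if strings.Contains(normalized, name) {
			return name, nil
		}
	}
	return "", fmt.Errorf("unknown event type in %q", llmOutput)
}

func main() {
	et, err := resolveEventType("Deploy the creator analytics processor")
	fmt.Println(et, err)
}
```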

&lt;p&gt;&lt;strong&gt;Dual Deployment Strategies:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Docker-Based Deployment:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;DockerClient&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;deployFlinkJob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;eventType&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;JobInfo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;containerName&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"flink-%s-processor-%s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;eventType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"20060102-150405"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c"&gt;// Create container with automatic port assignment&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;container&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"flink-event-processor:latest"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;ExposedPorts&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;nat&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PortSet&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"8081/tcp"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="p"&gt;{}{}},&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;createAndStartContainer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;containerName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;CLI-Based Deployment:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;FlinkClient&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;submitJob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;eventType&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;JobInfo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;cmd&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;exec&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Command&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"flink"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"run"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"--jobmanager"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JobManagerAddress&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"--class"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"com.eventprocessor.FlinkStreamingJob"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JarPath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"--event-type"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;eventType&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;executeWithMonitoring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The service provides intelligent error handling, automatic retry logic, and comprehensive monitoring integration.&lt;/p&gt;

&lt;h3&gt;3. Flink Event Processor: The Stream Processing Engine&lt;/h3&gt;

&lt;p&gt;The heart of the system is a sophisticated Apache Flink application that processes multiple event types in real time. This isn't a toy example: it's a production-ready streaming application with proper error handling, exactly-once processing guarantees, and multiple sink strategies.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────┐    ┌──────────────┐    ┌─────────────────┐
│  Kafka Source   │───▶│ Flink Stream │───▶│ Iceberg Tables  │
│                 │    │              │    │                 │
│ • Content Events│    │ • Transform  │    │ • ACID Trans    │
│ • Creator Events│    │ • Validate   │    │ • Time Travel   │
│ • Temp Events   │    │ • Enrich     │    │ • Schema Evolve │
└─────────────────┘    └──────────────┘    └─────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Event Processing Architecture:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;FlinkStreamingJob&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="kd"&gt;throws&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;StreamExecutionEnvironment&lt;/span&gt; &lt;span class="n"&gt;env&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StreamExecutionEnvironment&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getExecutionEnvironment&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

        &lt;span class="c1"&gt;// Configure for production&lt;/span&gt;
        &lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;enableCheckpointing&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;30000&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// 30-second checkpoints&lt;/span&gt;
        &lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getCheckpointConfig&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
           &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setCheckpointingMode&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;CheckpointingMode&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;EXACTLY_ONCE&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

        &lt;span class="c1"&gt;// Create event-specific processors&lt;/span&gt;
        &lt;span class="nc"&gt;EventProcessorFactory&lt;/span&gt; &lt;span class="n"&gt;factory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;EventProcessorFactory&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
        &lt;span class="nc"&gt;BaseEventProcessor&lt;/span&gt; &lt;span class="n"&gt;processor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;factory&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;createProcessor&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;eventType&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

        &lt;span class="c1"&gt;// Execute streaming pipeline&lt;/span&gt;
        &lt;span class="n"&gt;processor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;buildPipeline&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;execute&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Multi-Event Support:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The system processes different event types with specialized handling:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Content Events&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ContentEvent&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;creatorId&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;contentType&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;BigDecimal&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;Long&lt;/span&gt; &lt;span class="n"&gt;viewCount&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;Boolean&lt;/span&gt; &lt;span class="n"&gt;isLocked&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="c1"&gt;// ... additional fields&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Creator Events  &lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CreatorEvent&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;displayName&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;Boolean&lt;/span&gt; &lt;span class="n"&gt;isVerified&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;Long&lt;/span&gt; &lt;span class="n"&gt;subscriberCount&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;BigDecimal&lt;/span&gt; &lt;span class="n"&gt;monthlyPrice&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="c1"&gt;// ... additional fields&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Advanced Storage Integration:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The system supports multiple storage strategies, from simple file sinks to full Apache Iceberg integration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;IcebergTableManager&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;createTable&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;tableName&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Schema&lt;/span&gt; &lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;Table&lt;/span&gt; &lt;span class="n"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;catalog&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;buildTable&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;TableIdentifier&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"default"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tableName&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withSchema&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withPartitionSpec&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;PartitionSpec&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builderFor&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;day&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"created_at"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withProperty&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;TableProperties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;FORMAT_VERSION&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"2"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;create&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;DataStream&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Row&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;createIcebergSink&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;DataStream&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;sinkTo&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
            &lt;span class="nc"&gt;IcebergSinks&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;forRow&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;TableSchema&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fromTypeInfo&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;typeInfo&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Key Technical Innovations&lt;/h2&gt;

&lt;h3&gt;1. LLM-Powered Infrastructure Automation&lt;/h3&gt;

&lt;p&gt;The most fascinating aspect of this system is the natural language interface for infrastructure deployment. Instead of remembering complex CLI commands or navigating web UIs, operators can use conversational language:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: "deploy creator event"
System: 🤖 Processing: deploy creator event 
        📋 Parsed command: deploy creator event  
        🚀 Submitting creator processing job...
        ✅ Successfully submitted creator processor job: flink-creator-processor-20240529-143022
        🌐 Monitor at: http://localhost:8081
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This demonstrates a powerful pattern for the future of DevOps: using LLMs to abstract away the complexity of infrastructure management while maintaining full control and visibility.&lt;/p&gt;
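&lt;p&gt;One robust way to keep that control is to ask the model for structured JSON rather than free text, then unmarshal and validate it server-side. The contract below is an assumption about how such a schema could look, not the project's actual prompt format:&lt;/p&gt;

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DeployCommand is the JSON contract the LLM is asked to fill in.
type DeployCommand struct {
	Action    string `json:"action"`     // e.g. "deploy"
	EventType string `json:"event_type"` // e.g. "creator"
}

// parseDeployCommand decodes a model response and rejects anything
// that is not an explicit deploy request.
func parseDeployCommand(raw string) (DeployCommand, error) {
	var cmd DeployCommand
	if err := json.Unmarshal([]byte(raw), &cmd); err != nil {
		return cmd, fmt.Errorf("model returned invalid JSON: %w", err)
	}
	if cmd.Action != "deploy" || cmd.EventType == "" {
		return cmd, fmt.Errorf("refusing non-deploy command: %+v", cmd)
	}
	return cmd, nil
}

func main() {
	cmd, err := parseDeployCommand(`{"action":"deploy","event_type":"creator"}`)
	fmt.Println(cmd, err)
}
```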

&lt;h3&gt;2. Hybrid Deployment Architecture&lt;/h3&gt;

&lt;p&gt;The system supports both containerized and traditional CLI-based deployments, providing flexibility for different operational environments:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Docker Deployment&lt;/strong&gt;: Perfect for development, testing, and containerized production environments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CLI Deployment&lt;/strong&gt;: Integrates with existing Flink clusters and traditional operational workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;3. Modern Data Lakehouse Patterns&lt;/h3&gt;

&lt;p&gt;The Apache Iceberg integration showcases modern data lakehouse architecture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ACID Transactions&lt;/strong&gt;: Ensuring data consistency even with concurrent writers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Schema Evolution&lt;/strong&gt;: Adding new fields without breaking existing queries
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time Travel&lt;/strong&gt;: Querying historical states of data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Partition Management&lt;/strong&gt;: Automatic daily partitioning for optimal query performance&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Performance and Scalability Considerations&lt;/h2&gt;

&lt;p&gt;The system is designed with production scalability in mind:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Flink Configuration:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Optimized for throughput&lt;/span&gt;
&lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setParallelism&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getConfig&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;setLatencyTrackingInterval&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Memory management&lt;/span&gt;
&lt;span class="nc"&gt;Configuration&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Configuration&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setString&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"taskmanager.memory.process.size"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"2g"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setString&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"jobmanager.memory.process.size"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"1g"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Kafka Integration:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// High-throughput consumer configuration&lt;/span&gt;
&lt;span class="nc"&gt;Properties&lt;/span&gt; &lt;span class="n"&gt;properties&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Properties&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setProperty&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"fetch.min.bytes"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"1048576"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// 1MB&lt;/span&gt;
&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setProperty&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"fetch.max.wait.ms"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"500"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setProperty&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"max.partition.fetch.bytes"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"10485760"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// 10MB&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Monitoring and Observability:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The system includes comprehensive monitoring:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Flink Web UI for job monitoring and metrics&lt;/li&gt;
&lt;li&gt;Structured logging with configurable levels&lt;/li&gt;
&lt;li&gt;Docker container health checks&lt;/li&gt;
&lt;li&gt;Kafka consumer lag monitoring&lt;/li&gt;
&lt;/ul&gt;
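&lt;p&gt;Consumer lag (the gap between a partition's end offset and the group's last committed offset) is the key signal behind that last bullet. A minimal Go sketch of the computation, with an illustrative threshold; in practice the offsets come from the Kafka admin API or tools like &lt;code&gt;rpk&lt;/code&gt;:&lt;/p&gt;

```go
package main

import "fmt"

// consumerLag is the standard definition: how far behind the
// group's committed offset is from the partition's end offset.
func consumerLag(endOffset, committedOffset int64) int64 {
	return endOffset - committedOffset
}

func main() {
	lag := consumerLag(10500, 10200)
	if lag > 250 { // hypothetical alert threshold
		fmt.Printf("consumer lag %d exceeds threshold\n", lag)
	}
}
```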

&lt;h2&gt;
  
  
  Real-World Applications
&lt;/h2&gt;

&lt;p&gt;While the "OnlyFans" theming is obviously humorous, the underlying architecture patterns are applicable to numerous real-world scenarios:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Content Platforms:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Video streaming analytics&lt;/li&gt;
&lt;li&gt;User engagement tracking&lt;/li&gt;
&lt;li&gt;Content recommendation engines&lt;/li&gt;
&lt;li&gt;Creator monetization systems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;IoT and Monitoring:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sensor data processing (the GPU temperature simulation)&lt;/li&gt;
&lt;li&gt;Infrastructure monitoring&lt;/li&gt;
&lt;li&gt;Anomaly detection systems&lt;/li&gt;
&lt;li&gt;Predictive maintenance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Financial Services:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Transaction processing&lt;/li&gt;
&lt;li&gt;Risk assessment&lt;/li&gt;
&lt;li&gt;Fraud detection&lt;/li&gt;
&lt;li&gt;Regulatory reporting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;E-commerce:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User behavior analytics&lt;/li&gt;
&lt;li&gt;Inventory management&lt;/li&gt;
&lt;li&gt;Price optimization&lt;/li&gt;
&lt;li&gt;Recommendation systems&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Lessons Learned and Technical Insights
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. LLM Integration Complexity
&lt;/h3&gt;

&lt;p&gt;Integrating LLMs into operational systems requires careful consideration of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Error Handling&lt;/strong&gt;: What happens when the LLM misinterprets a command?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost Management&lt;/strong&gt;: OpenAI API costs can accumulate quickly in production&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency&lt;/strong&gt;: An LLM call typically adds 1-3 seconds to deployment workflows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security&lt;/strong&gt;: Ensuring the LLM can't be tricked into executing malicious commands&lt;/li&gt;
&lt;/ul&gt;
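&lt;p&gt;The security point deserves emphasis: the safest pattern is to let the LLM select from a fixed whitelist of actions rather than emit free-form commands. A minimal Go sketch of that idea (the action names here are hypothetical, not the system's actual command set):&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

// allowedActions whitelists the operations the LLM may trigger.
// Anything outside this map is refused, no matter what the model says.
var allowedActions = map[string]bool{
	"deploy": true,
	"stop":   true,
	"status": true,
}

// validateAction normalizes the LLM's output and rejects
// anything that is not explicitly whitelisted.
func validateAction(raw string) (string, error) {
	action := strings.ToLower(strings.TrimSpace(raw))
	if allowedActions[action] {
		return action, nil
	}
	return "", fmt.Errorf("action %q is not permitted", action)
}

func main() {
	if a, err := validateAction("Deploy"); err == nil {
		fmt.Println("accepted:", a)
	}
	if _, err := validateAction("drop all tables"); err != nil {
		fmt.Println("rejected:", err)
	}
}
```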

&lt;h3&gt;
  
  
  2. Multi-Language Microservices
&lt;/h3&gt;

&lt;p&gt;The combination of Go (for the LLM service) and Java (for Flink processing) demonstrates the power of polyglot architectures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Go&lt;/strong&gt;: Excellent for HTTP services, concurrent operations, and simple deployment&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Java&lt;/strong&gt;: Rich ecosystem for data processing, mature Flink integration, and a robust type system&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Stream Processing Design Patterns
&lt;/h3&gt;

&lt;p&gt;The Flink application showcases several important patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Factory Pattern&lt;/strong&gt;: For creating event-specific processors&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strategy Pattern&lt;/strong&gt;: For different sink implementations (files vs. Iceberg)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Builder Pattern&lt;/strong&gt;: For configuring complex streaming pipelines&lt;/li&gt;
&lt;/ul&gt;
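&lt;p&gt;The actual processors are Java/Flink classes, but the factory idea is language-agnostic. A minimal Go sketch (the event types and processors are illustrative, not the project's real ones):&lt;/p&gt;

```go
package main

import "fmt"

// EventProcessor is the behavior every event-specific processor shares.
type EventProcessor interface {
	Process(payload string) string
}

type subscriptionProcessor struct{}

func (subscriptionProcessor) Process(p string) string { return "subscription: " + p }

type contentViewProcessor struct{}

func (contentViewProcessor) Process(p string) string { return "content-view: " + p }

// NewProcessor is the factory: it maps an event type to the
// processor that knows how to handle it.
func NewProcessor(eventType string) (EventProcessor, error) {
	switch eventType {
	case "subscription":
		return subscriptionProcessor{}, nil
	case "content_view":
		return contentViewProcessor{}, nil
	}
	return nil, fmt.Errorf("unknown event type %q", eventType)
}

func main() {
	p, err := NewProcessor("subscription")
	if err == nil {
		fmt.Println(p.Process("user-42"))
	}
}
```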

&lt;h2&gt;
  
  
  Future Enhancements and Roadmap
&lt;/h2&gt;

&lt;p&gt;The system provides a solid foundation for several interesting extensions:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Advanced LLM Capabilities
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multi-step Deployments&lt;/strong&gt;: "Deploy a content processing pipeline with anomaly detection"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource Optimization&lt;/strong&gt;: LLM-driven resource allocation based on workload patterns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Troubleshooting Assistant&lt;/strong&gt;: AI-powered diagnosis of failing jobs&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Enhanced Stream Processing
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Machine Learning Integration&lt;/strong&gt;: Real-time feature engineering and model serving&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complex Event Processing&lt;/strong&gt;: Pattern detection across multiple event streams&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-scaling&lt;/strong&gt;: Dynamic parallelism adjustment based on throughput&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Operational Excellence
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitOps Integration&lt;/strong&gt;: Version control for deployment configurations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-tenancy&lt;/strong&gt;: Support for multiple teams and environments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Advanced Monitoring&lt;/strong&gt;: Custom metrics and alerting integrations&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion: From Meme to Modern Architecture
&lt;/h2&gt;

&lt;p&gt;What started as a sarcastic Reddit comment evolved into a legitimate exploration of cutting-edge data engineering patterns. The system demonstrates several important trends in modern data infrastructure:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;AI-Powered Operations&lt;/strong&gt;: Using LLMs to simplify complex operational tasks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Event-Driven Architecture&lt;/strong&gt;: Building resilient, scalable systems around event streams&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Modern Data Lakehouse&lt;/strong&gt;: Combining the flexibility of data lakes with the reliability of data warehouses&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Polyglot Microservices&lt;/strong&gt;: Choosing the right tool for each specific task&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The technical implementation showcases production-ready patterns while maintaining the experimental spirit needed to explore emerging technologies. It proves that sometimes the best innovations come from the most unexpected inspirations.&lt;/p&gt;

&lt;p&gt;Whether you're building content platforms, IoT systems, or financial services, the architectural patterns demonstrated here provide a solid foundation for modern real-time data processing systems. And if nothing else, it's a reminder that great engineering can emerge from the most unlikely sources - even Reddit trolls.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Technologies Used:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Apache Flink 1.17.1 (Stream Processing)&lt;/li&gt;
&lt;li&gt;Apache Iceberg 1.3.1 (Data Lakehouse)&lt;/li&gt;
&lt;li&gt;Apache Kafka / Redpanda (Event Streaming)&lt;/li&gt;
&lt;li&gt;OpenAI GPT-4 (Natural Language Processing)&lt;/li&gt;
&lt;li&gt;Go 1.21+ (LLM Service)&lt;/li&gt;
&lt;li&gt;Java 11+ (Stream Processing)&lt;/li&gt;
&lt;li&gt;Docker &amp;amp; Docker Compose (Orchestration)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Repository Links:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/snepar/onlyfans-event-publisher" rel="noopener noreferrer"&gt;Event Publisher&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/snepar/go-llm-service" rel="noopener noreferrer"&gt;LLM Service&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/snepar/onlyfans-flink-event-processor" rel="noopener noreferrer"&gt;Flink Processor&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Built with ❤️ and a healthy sense of humor about Reddit comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>eventdriven</category>
      <category>dataengineering</category>
      <category>llm</category>
      <category>go</category>
    </item>
    <item>
      <title>"Our GPUs Are Melting": Building a Real-Time Streaming System with Go and Redpanda Ghibli Style</title>
      <dc:creator>SNEHASISH DUTTA</dc:creator>
      <pubDate>Wed, 02 Apr 2025 19:55:00 +0000</pubDate>
      <link>https://forem.com/datasosneh/kafka-with-go-ghibli-style-3m04</link>
      <guid>https://forem.com/datasosneh/kafka-with-go-ghibli-style-3m04</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In the world of AI research and development, computational resources are pushed to their limits. This fact was humorously highlighted by OpenAI CEO Sam Altman's memorable tweet: &lt;strong&gt;"our GPUs are melting"&lt;/strong&gt; - a testament to the intense workloads these specialized processors endure during advanced AI training.&lt;br&gt;
While Altman's tweet was likely hyperbolic, the concern about GPU health is very real. Modern data centers and AI research facilities invest millions in GPU infrastructure, making temperature monitoring a critical operational concern. Overheating can lead to hardware damage, reduced lifespan, and even catastrophic failures that interrupt crucial workloads.&lt;br&gt;
This article explores a practical implementation of a GPU temperature monitoring system built with Go (Golang). We'll demonstrate how to create a complete pipeline for:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Generating simulated GPU temperature data&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Publishing these events to Redpanda (a Kafka API-compatible streaming platform)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Consuming these events in real-time&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Triggering alerts when temperatures exceed critical thresholds&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Go's combination of performance, concurrency model, and straightforward syntax makes it an excellent choice for building such systems. &lt;/p&gt;

&lt;p&gt;The solution we'll explore is lightweight yet robust, capable of handling thousands of events per second while maintaining low latency - perfect for mission-critical monitoring applications.&lt;/p&gt;

&lt;p&gt;Whether you're managing a small GPU cluster for research or a massive data center for production AI workloads, the patterns demonstrated here can be adapted to build a reliable early warning system that might just prevent your own GPUs from metaphorically (or literally) melting down.&lt;/p&gt;
&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm0pt55k57yagjsy19xdy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm0pt55k57yagjsy19xdy.png" alt="architecture" width="785" height="117"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The GPU Temperature Monitoring System employs an event-driven microservices architecture to track GPU temperatures in real-time and alert on potentially dangerous conditions. Built with Go, it leverages Redpanda (Kafka API-compatible) as the central event streaming platform.&lt;/p&gt;
&lt;h3&gt;
  
  
  Core Components
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Temperature Publisher Service&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Purpose&lt;/strong&gt;: Simulates/collects GPU temperature readings and publishes to Redpanda&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Features&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Configurable simulation of multiple GPU devices&lt;/li&gt;
&lt;li&gt;Randomized temperature patterns with occasional anomalies&lt;/li&gt;
&lt;li&gt;Efficient event serialization using JSON&lt;/li&gt;
&lt;li&gt;Batched publishing for throughput optimization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Alert Consumer Service&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Purpose:&lt;/strong&gt; Monitors temperature stream and triggers alerts when thresholds are exceeded&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Features&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Real-time consumption of temperature events&lt;/li&gt;
&lt;li&gt;Threshold-based alerting logic with configurable thresholds&lt;/li&gt;
&lt;li&gt;Cooldown mechanism to prevent alert storms&lt;/li&gt;
&lt;li&gt;Integration with Slack for immediate notifications&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Redpanda (Event Streaming Platform)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Purpose&lt;/strong&gt;: Provides reliable, high-throughput messaging between services&lt;br&gt;
&lt;strong&gt;Configuration&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Two-node cluster for redundancy&lt;/li&gt;
&lt;li&gt;Topic-based event segregation&lt;/li&gt;
&lt;li&gt;Persistent storage of temperature data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Data Flow&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GPU temperature readings are generated/collected by the Publisher&lt;/li&gt;
&lt;li&gt;Events are serialized to JSON and published to the gpu-temperature topic&lt;/li&gt;
&lt;li&gt;Alert Consumer subscribes to the topic and processes each reading&lt;/li&gt;
&lt;li&gt;When temperatures exceed thresholds, the Consumer formats and sends Slack alerts&lt;/li&gt;
&lt;li&gt;All temperature data remains available in Redpanda for historical analysis&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Technical Considerations
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Scalability:&lt;/strong&gt; Both services can be horizontally scaled to handle more GPUs&lt;br&gt;
&lt;strong&gt;Resilience:&lt;/strong&gt; Components reconnect automatically after network disruptions&lt;br&gt;
&lt;strong&gt;Observability:&lt;/strong&gt; Structured logging throughout the pipeline&lt;br&gt;
&lt;strong&gt;Configuration:&lt;/strong&gt; Environment-based configuration for deployment flexibility&lt;br&gt;
&lt;strong&gt;Containerization:&lt;/strong&gt; Docker-based deployment for consistent environments&lt;/p&gt;

&lt;p&gt;This architecture demonstrates how Go's concurrency model and Redpanda's streaming capabilities can be combined into an efficient, real-time monitoring system that helps prevent hardware damage from overheating GPUs.&lt;/p&gt;


&lt;h2&gt;
  
  
  Temperature Publisher Service :: Producer
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Repository:&lt;/strong&gt; &lt;a href="https://github.com/snepar/gpu-temp-publisher" rel="noopener noreferrer"&gt;https://github.com/snepar/gpu-temp-publisher&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Temperature Publisher Service forms the foundation of our GPU monitoring system. Written in Go, this service simulates multiple GPU devices and publishes temperature readings to Redpanda. Let's explore the core components with key code snippets.&lt;/p&gt;
&lt;h3&gt;
  
  
  Data Model
&lt;/h3&gt;

&lt;p&gt;At the heart of our system is the temperature reading model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;type TemperatureReading struct {
    DeviceID     string    `json:"device_id"`
    Temperature  float64   `json:"temperature"`
    Timestamp    time.Time `json:"timestamp"`
    GPUModel     string    `json:"gpu_model"`
    HostName     string    `json:"host_name"`
    PowerConsume float64   `json:"power_consume"`
    GPUUtil      float64   `json:"gpu_util"`
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This structure encapsulates all the essential data points for monitoring GPU health.&lt;/p&gt;

&lt;h3&gt;
  
  
  Temperature Simulation
&lt;/h3&gt;

&lt;p&gt;The service generates realistic temperature patterns using a combination of baseline values, utilization-based adjustments, and random fluctuations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func (s *TemperatureSimulator) generateDeviceReading(deviceIndex int, device GPUDevice) TemperatureReading {
    // Decide if this reading should be abnormal
    isAbnormal := s.rng.Float64() &amp;lt; s.abnormalProbability

    // Update GPU utilization with realistic movements
    targetUtil := s.utilization[deviceIndex]
    if s.rng.Float64() &amp;lt; 0.1 {
        // 10% chance of changing target utilization significantly
        targetUtil = s.rng.Float64() * 100
    }

    // Calculate temperature based on utilization and trend
    baseTemp := device.BaseTemp
    utilizationFactor := s.utilization[deviceIndex] / 100 * 40
    temperature := baseTemp + utilizationFactor + s.tempTrend[deviceIndex]

    // Apply abnormal spike if needed
    if isAbnormal {
        temperature += s.rng.Float64() * 100
    }

    return TemperatureReading{
        DeviceID:     device.ID,
        Temperature:  temperature,
        Timestamp:    time.Now(),
        GPUModel:     device.Model,
        HostName:     device.Host,
        PowerConsume: s.utilization[deviceIndex] * 3.5 + temperature/4,
        GPUUtil:      s.utilization[deviceIndex],
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This simulation logic creates realistic temperature patterns that mimic actual GPU behavior, including occasional temperature spikes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Publishing to Redpanda
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// NewRedpandaPublisher creates a new Redpanda publisher
func NewRedpandaPublisher(ctx context.Context, brokers, topic string) (*RedpandaPublisher, error) {
    // Create client options
    opts := []kgo.Opt{
        kgo.SeedBrokers(strings.Split(brokers, ",")...),
        kgo.DefaultProduceTopic(topic),
        kgo.AllowAutoTopicCreation(),
        kgo.ProducerBatchMaxBytes(1024 * 1024), // 1MB
        kgo.ProducerLinger(5 * time.Millisecond),
        kgo.RecordRetries(5),
    }

    // Create client with retry logic
    client, err := kgo.NewClient(opts...)
    if err != nil {
        return nil, fmt.Errorf("failed to create Redpanda client: %w", err)
    }

    return &amp;amp;RedpandaPublisher{
        client: client,
        topic:  topic,
    }, nil
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The actual publishing of readings happens through a simple but effective method:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// PublishReading publishes a temperature reading to Redpanda
func (p *RedpandaPublisher) PublishReading(ctx context.Context, reading TemperatureReading) error {
    // Marshal reading to JSON
    data, err := json.Marshal(reading)
    if err != nil {
        return fmt.Errorf("failed to marshal reading: %w", err)
    }

    // Create record with metadata
    record := &amp;amp;kgo.Record{
        Key:   []byte(reading.DeviceID),
        Value: data,
        Topic: p.topic,
        Headers: []kgo.RecordHeader{
            {Key: "content-type", Value: []byte("application/json")},
        },
    }

    // Produce record with error handling
    result := p.client.ProduceSync(ctx, record)
    if err := result.FirstErr(); err != nil {
        return fmt.Errorf("failed to produce record: %w", err)
    }

    return nil
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Main Service Loop
&lt;/h3&gt;

&lt;p&gt;The service's main loop ties everything together:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func main() {
    // Load configuration
    cfg, err := config.Load()
    if err != nil {
        log.Fatalf("Failed to load configuration: %v", err)
    }

    // Create temperature simulator
    sim := simulator.NewTemperatureSimulator(cfg.NumDevices, cfg.AbnormalProbability)

    // Create publisher
    pub, err := publisher.NewRedpandaPublisher(ctx, cfg.RedpandaBrokers, cfg.Topic)
    if err != nil {
        log.Fatalf("Failed to create Redpanda publisher: %v", err)
    }
    defer pub.Close()

    // Set up ticker for regular publishing
    ticker := time.NewTicker(time.Duration(cfg.IntervalMs) * time.Millisecond)
    defer ticker.Stop()

    // Main loop
    for {
        select {
        case &amp;lt;-ticker.C:
            // Generate and publish temperature readings
            readings := sim.GenerateReadings()
            for _, reading := range readings {
                if err := pub.PublishReading(ctx, reading); err != nil {
                    log.Printf("Error publishing reading: %v", err)
                } else if reading.Temperature &amp;gt; 80.0 {
                    log.Printf("Published HIGH TEMP: Device %s - %.2f°C", 
                        reading.DeviceID, reading.Temperature)
                }
            }
        case &amp;lt;-sigChan:
            return // Graceful shutdown
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Configuration via Environment Variables
&lt;/h3&gt;

&lt;p&gt;The service is easily configurable through environment variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func Load() (*Config, error) {
    config := &amp;amp;Config{
        RedpandaBrokers:     getEnv("REDPANDA_BROKERS", "localhost:9092"),
        Topic:               getEnv("REDPANDA_TOPIC", "gpu-temperature"),
        NumDevices:          getEnvAsInt("NUM_DEVICES", 5),
        IntervalMs:          getEnvAsInt("INTERVAL_MS", 1000),
        AbnormalProbability: getEnvAsFloat("ABNORMAL_PROBABILITY", 0.05),
    }
    return config, nil
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This approach makes the service highly configurable while maintaining sensible defaults.&lt;/p&gt;

&lt;h3&gt;
  
  
  Containerization
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FROM golang:1.21-alpine AS builder
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o /gpu-temp-publisher ./cmd/publisher

FROM alpine:3.18
COPY --from=builder /gpu-temp-publisher .
ENTRYPOINT ["/gpu-temp-publisher"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Alert Generator Service :: Consumer
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Repository:&lt;/strong&gt; &lt;a href="https://github.com/snepar/gpu-temp-alert" rel="noopener noreferrer"&gt;https://github.com/snepar/gpu-temp-alert&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Temperature Alert Service is the vigilant guardian in our GPU monitoring system. This service consumes temperature readings from Redpanda, analyzes them in real-time, and triggers alerts when temperatures reach dangerous levels. Let's explore how this service works with key code snippets.&lt;/p&gt;

&lt;h3&gt;
  
  
  Consumer Implementation
&lt;/h3&gt;

&lt;p&gt;The core of the alert service is the Redpanda consumer that processes the stream of temperature readings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// RedpandaConsumer consumes temperature readings from Redpanda
type RedpandaConsumer struct {
    client               *kgo.Client
    topic                string
    temperatureThreshold float64
    notifier             *notifier.SlackNotifier
    seenAlerts           map[string]time.Time
    seenAlertsMutex      sync.Mutex
    alertCooldown        time.Duration
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This structure maintains the Redpanda client connection, configuration values, and a map to track recent alerts to prevent alert storms.&lt;/p&gt;

&lt;h3&gt;
  
  
  Connecting to Redpanda
&lt;/h3&gt;

&lt;p&gt;The consumer establishes a connection to Redpanda with careful error handling and retry logic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func NewRedpandaConsumer(ctx context.Context, brokers, topic, consumerGroup string, 
                        temperatureThreshold float64, notifier *notifier.SlackNotifier) (*RedpandaConsumer, error) {

    // Create Redpanda client options
    opts := []kgo.Opt{
        kgo.SeedBrokers(strings.Split(brokers, ",")...),
        kgo.ConsumerGroup(consumerGroup),
        kgo.ConsumeTopics(topic),
        kgo.ConsumeResetOffset(kgo.NewOffset().AtEnd()), // Start from newest messages
        kgo.DisableAutoCommit(),                         // Manual commit for better control
        kgo.SessionTimeout(30 * time.Second),
    }

    // Create client with retry logic
    client, err := kgo.NewClient(opts...)
    if err != nil {
        return nil, fmt.Errorf("failed to create Redpanda client: %w", err)
    }

    return &amp;amp;RedpandaConsumer{
        client:               client,
        topic:                topic,
        temperatureThreshold: temperatureThreshold,
        notifier:             notifier,
        seenAlerts:           make(map[string]time.Time),
        alertCooldown:        5 * time.Minute, // Prevent alert storms
    }, nil
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Processing Temperature Readings
&lt;/h3&gt;

&lt;p&gt;The service continuously polls for new temperature readings and processes them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func (c *RedpandaConsumer) Start(ctx context.Context) error {
    log.Printf("Starting to consume from topic %s", c.topic)
    log.Printf("Monitoring for temperatures above %.2f°C", c.temperatureThreshold)

    for {
        select {
        case &amp;lt;-ctx.Done():
            return nil // Graceful shutdown

        default:
            // Poll for messages
            fetches := c.client.PollFetches(ctx)
            if fetches.IsClientClosed() {
                return fmt.Errorf("client closed")
            }

            // Process each record
            fetches.EachRecord(func(record *kgo.Record) {
                c.processRecord(ctx, record)
            })

            // Commit offsets
            if err := c.client.CommitUncommittedOffsets(ctx); err != nil {
                log.Printf("Error committing offsets: %v", err)
            }
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Temperature Threshold Detection
&lt;/h3&gt;

&lt;p&gt;The core logic for detecting high temperatures and triggering alerts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func (c *RedpandaConsumer) processRecord(ctx context.Context, record *kgo.Record) {
    // Parse the temperature reading
    var reading model.TemperatureReading
    if err := json.Unmarshal(record.Value, &amp;amp;reading); err != nil {
        log.Printf("Error parsing record: %v", err)
        return
    }

    // Check if temperature exceeds threshold
    if reading.Temperature &amp;gt; c.temperatureThreshold {
        // Check if we've already alerted for this device recently
        if c.shouldSendAlert(reading.DeviceID) {
            log.Printf("HIGH TEMPERATURE ALERT: Device %s - %.2f°C exceeds threshold of %.2f°C",
                reading.DeviceID, reading.Temperature, c.temperatureThreshold)

            // Create alert event
            alert := &amp;amp;model.AlertEvent{
                DeviceID:            reading.DeviceID,
                Temperature:         reading.Temperature,
                Timestamp:           reading.Timestamp,
                TemperatureThreshold: c.temperatureThreshold,
                GPUModel:            reading.GPUModel,
                HostName:            reading.HostName,
                PowerConsume:        reading.PowerConsume,
                GPUUtil:             reading.GPUUtil,
            }

            // Send notification to Slack
            if err := c.notifier.SendTemperatureAlert(ctx, alert); err != nil {
                log.Printf("Failed to send alert to Slack: %v", err)
            }
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Alert Cooldown Mechanism
&lt;/h3&gt;

&lt;p&gt;To prevent alert storms, the service implements a cooldown period for each device:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func (c *RedpandaConsumer) shouldSendAlert(deviceID string) bool {
    c.seenAlertsMutex.Lock()
    defer c.seenAlertsMutex.Unlock()

    now := time.Now()
    if lastAlerted, exists := c.seenAlerts[deviceID]; exists {
        // If last alert was less than cooldown period ago, don't alert
        if now.Sub(lastAlerted) &amp;lt; c.alertCooldown {
            return false
        }
    }

    // Update last alerted time
    c.seenAlerts[deviceID] = now
    return true
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This prevents notification fatigue by limiting alerts for the same device to a reasonable frequency.&lt;/p&gt;

&lt;h3&gt;
  
  
  Slack Integration
&lt;/h3&gt;

&lt;p&gt;The service integrates with Slack to deliver immediate alerts when GPUs are overheating:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func (s *SlackNotifier) SendTemperatureAlert(ctx context.Context, alert *model.AlertEvent) error {
    // Determine color based on severity
    color := "#FF0000" // Default to red
    if alert.Temperature &amp;lt; alert.TemperatureThreshold+20 {
        color = "#FFA500" // Orange for less severe
    }

    // Create attachment
    attachment := Attachment{
        Fallback:  fmt.Sprintf("GPU IS MELTING: %.2f°C on %s", alert.Temperature, alert.DeviceID),
        Color:     color,
        Title:     "🔥 GPU IS MELTING 🔥",
        Text:      fmt.Sprintf("Temperature of *%.2f°C* detected, exceeding threshold of *%.2f°C*", 
                   alert.Temperature, alert.TemperatureThreshold),
        Timestamp: alert.Timestamp.Unix(),
        Fields: []Field{
            {
                Title: "Device ID",
                Value: alert.DeviceID,
                Short: true,
            },
            {
                Title: "Temperature",
                Value: fmt.Sprintf("%.2f°C", alert.Temperature),
                Short: true,
            },
            // Additional fields omitted for brevity
        },
    }

    slackMsg := SlackMessage{
        Channel:     s.channel,
        Username:    "GPU Temperature Monitor",
        IconEmoji:   ":fire:",
        Attachments: []Attachment{attachment},
    }

    return s.sendMessage(ctx, slackMsg)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates visually striking alerts that draw immediate attention to overheating GPUs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Main Service Initialization
&lt;/h3&gt;

&lt;p&gt;The service's main function handles setup and graceful shutdown:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func main() {
    // Load configuration
    cfg, err := config.Load()
    if err != nil {
        log.Fatalf("Failed to load configuration: %v", err)
    }

    // Create context that can be cancelled
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    // Create Slack notifier
    slack, err := notifier.NewSlackNotifier(cfg.SlackWebhookURL, cfg.SlackChannel)
    if err != nil {
        log.Fatalf("Failed to create Slack notifier: %v", err)
    }

    // Create consumer
    cons, err := consumer.NewRedpandaConsumer(
        ctx, 
        cfg.RedpandaBrokers, 
        cfg.SourceTopic, 
        cfg.ConsumerGroup,
        cfg.TemperatureThreshold,
        slack,
    )
    if err != nil {
        log.Fatalf("Failed to create Redpanda consumer: %v", err)
    }
    defer cons.Close()

    // Start consuming in a separate goroutine
    go func() {
        if err := cons.Start(ctx); err != nil {
            log.Printf("Consumer error: %v, shutting down", err)
            cancel()
        }
    }()

    // Wait for shutdown signal (SIGINT or SIGTERM); requires the
    // os/signal and syscall packages
    sigChan := make(chan os.Signal, 1)
    signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM)
    &amp;lt;-sigChan
    log.Println("Shutting down gracefully")
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Environment-Based Configuration
&lt;/h3&gt;

&lt;p&gt;Like the publisher service, the alert service uses environment variables for configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func Load() (*Config, error) {
    config := &amp;amp;Config{
        RedpandaBrokers:      getEnv("REDPANDA_BROKERS", "localhost:9092"),
        SourceTopic:          getEnv("SOURCE_TOPIC", "gpu-temperature"),
        ConsumerGroup:        getEnv("CONSUMER_GROUP", "gpu-temp-alert-group"),
        TemperatureThreshold: getEnvAsFloat("TEMPERATURE_THRESHOLD", 190.0),
        SlackWebhookURL:      getEnv("SLACK_WEBHOOK_URL", ""),
        SlackChannel:         getEnv("SLACK_CHANNEL", "#alerts"),
    }

    // Validation omitted for brevity
    return config, nil
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Temperature Alert Service exemplifies how Go's strong concurrency model and straightforward error handling make it ideal for building reliable monitoring systems. By consuming temperature data from Redpanda and triggering immediate alerts when thresholds are exceeded, this service provides the critical "last mile" in preventing GPU damage.&lt;br&gt;
The combination of efficient stream processing, intelligent alert suppression, and actionable notifications creates a monitoring solution that can help prevent the scenario that Sam Altman humorously tweeted about - truly melting GPUs.&lt;/p&gt;


&lt;h2&gt;
  
  
  Implementation Guide
&lt;/h2&gt;

&lt;p&gt;This guide will walk you through setting up and running both services of our GPU temperature monitoring system. Follow these steps to deploy the publisher for generating temperature readings and the alert service for monitoring critical temperatures.&lt;/p&gt;
&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Docker and Docker Compose installed&lt;/li&gt;
&lt;li&gt;A Slack workspace with webhook URL configured&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Build and Run the Publisher
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker-compose up -d
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This will start the Redpanda cluster and the publisher service.&lt;/p&gt;
&lt;h3&gt;
  
  
  Monitor the Redpanda Console
&lt;/h3&gt;

&lt;p&gt;Open your browser and navigate to &lt;code&gt;http://localhost:8080&lt;/code&gt; to access the Redpanda Console.&lt;/p&gt;
&lt;h3&gt;
  
  
  Verify Publisher is Running
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker logs -f gpu-temp-publisher
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;You should see output similar to this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5i6me8zvp0trsa0aeikz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5i6me8zvp0trsa0aeikz.png" alt="logs-gpu-temp-publisher" width="800" height="613"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Build and Run the Alert Consumer
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker-compose up -d
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Verify Alert Service is Running
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker logs -f gpu-temp-alert
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;You should see output similar to this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4y016kr1vhst3jc6rp5y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4y016kr1vhst3jc6rp5y.png" alt="logs-gpu-temp-alert" width="800" height="191"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Update Slack Webhook URL
&lt;/h3&gt;

&lt;p&gt;Edit &lt;code&gt;docker-compose.yml&lt;/code&gt; and replace &lt;code&gt;YOUR_WEBHOOK_URL&lt;/code&gt; with your actual Slack webhook URL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- SLACK_WEBHOOK_URL=https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Check Slack for Alerts
&lt;/h3&gt;

&lt;p&gt;When a temperature exceeds the threshold, you should see an alert in your configured Slack channel:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjij02g82holjzb4d64oz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjij02g82holjzb4d64oz.png" alt="Slack" width="800" height="385"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;This GPU temperature monitoring system, while inspired by Sam Altman's humorous "our GPUs are melting" tweet, is primarily a hypothetical scenario designed to demonstrate how Go and Kafka-compatible messaging systems like Redpanda can be effectively combined to build real-time event processing pipelines. Through this example, we've explored fundamental patterns in event-driven architecture, stream processing, and alerting that can be applied to many real-world monitoring challenges, all while learning practical Go programming techniques.&lt;/p&gt;

</description>
      <category>go</category>
      <category>kafka</category>
      <category>eventdriven</category>
      <category>dataengineering</category>
    </item>
    <item>
      <title>Data Engineer as a Real-Time Algo Trader – Turning Pipelines into Profit (or at Least Trying)!</title>
      <dc:creator>SNEHASISH DUTTA</dc:creator>
      <pubDate>Mon, 09 Dec 2024 09:25:59 +0000</pubDate>
      <link>https://forem.com/datasosneh/data-engineer-as-a-real-time-algo-trader-turning-pipelines-into-profit-or-at-least-trying-3phl</link>
      <guid>https://forem.com/datasosneh/data-engineer-as-a-real-time-algo-trader-turning-pipelines-into-profit-or-at-least-trying-3phl</guid>
      <description>&lt;p&gt;In an era where data drives decisions, this project explores the intersection of trading and real-time analytics. &lt;/p&gt;

&lt;p&gt;Using Alpaca’s paper trading API, transactional data is streamed into Redpanda and analyzed with Apache Flink to extract actionable insights. &lt;br&gt;
Sentiment analysis powers buy and sell signals, seamlessly delivered via Slack, creating a streamlined and responsive trading workflow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt; :: &lt;a href="https://github.com/snepar/flink-algo-trading" rel="noopener noreferrer"&gt;https://github.com/snepar/flink-algo-trading&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Data Flow Diagram (Architecture)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm616qi08wgh74a6f3u13.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm616qi08wgh74a6f3u13.png" alt="Architecture" width="800" height="283"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Introducing the Building Blocks
&lt;/h2&gt;

&lt;p&gt;The following components are the building blocks of this system; here is what each one does.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Alpaca ::&lt;/strong&gt; Alpaca Trading APIs offer commission-free, programmatic access to U.S. stock and ETF trading through a modern REST interface. The platform stands out for its developer-friendly paper trading environment, which allows risk-free testing of trading strategies using real market data. With comprehensive SDKs in multiple languages (Python, JavaScript, Go), real-time WebSocket data streams, and support for various order types (market, limit, stop), Alpaca enables developers to quickly build and test algorithmic trading systems. Whether you're developing a trading strategy or building a full-scale automated trading platform, Alpaca's combination of zero-cost paper trading and production-ready infrastructure makes it an ideal choice for both learning and deployment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;VADER ::&lt;/strong&gt;  (Valence Aware Dictionary and sEntiment Reasoner) is a powerful sentiment analysis tool specifically designed for social media text. It's part of the NLTK library and uses a combination of lexical features and rule-based analysis to assess text sentiment.&lt;br&gt;
Key Features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pre-built sentiment lexicon with emotion-word ratings&lt;/li&gt;
&lt;li&gt;Handles informal language, emojis, and social media context&lt;/li&gt;
&lt;li&gt;Considers punctuation and capitalization for emphasis&lt;/li&gt;
&lt;li&gt;Returns scores for positive, negative, neutral, and compound sentiment&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No training required (rule-based approach)&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Advantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Social Media Optimized:&lt;/strong&gt; Accurately handles slang, emoticons, and informal language common in social media posts&lt;/li&gt;
&lt;li&gt;Fast Processing: Being rule-based, it's computationally efficient and doesn't require model training&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context Sensitive:&lt;/strong&gt; Understands sentiment intensifiers, contractions, and negations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multiple Scores:&lt;/strong&gt; Provides granular sentiment analysis with separate scores for different sentiments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Easy Integration:&lt;/strong&gt; Simple to use with just a few lines of code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Domain Adaptable:&lt;/strong&gt; Works well across various domains, from social media to product reviews&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The compound score (-1 to +1) makes it particularly useful for quick sentiment classification, while the individual positive, negative, and neutral scores provide deeper insight into the text's emotional content.&lt;/p&gt;
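&lt;p&gt;As a quick illustration of using the compound score, the conventional VADER cutoffs treat scores of 0.05 and above as positive and -0.05 and below as negative, with everything in between neutral:&lt;/p&gt;

```python
# Classify a VADER compound score (-1 to +1) using the conventional
# cutoffs: 0.05 and above is positive, -0.05 and below is negative.
def classify(compound: float) -> str:
    if compound >= 0.05:
        return "positive"
    if compound > -0.05:
        return "neutral"
    return "negative"

print(classify(0.6))   # positive
print(classify(0.0))   # neutral
print(classify(-0.4))  # negative
```

&lt;p&gt;The trading signals later in this article use a simpler split at zero, but the same idea applies: the compound score collapses the per-class scores into a single actionable number.&lt;/p&gt;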

&lt;p&gt;&lt;strong&gt;Redpanda ::&lt;/strong&gt; Redpanda is a modern streaming data platform designed as a Kafka API-compatible alternative with significantly improved performance and simplified operations. It's written in C++ and provides a zero-copy architecture, eliminating the need for a separate JVM, Zookeeper, or replication controller. Redpanda offers real-time data streaming with sub-millisecond latency, making it ideal for high-throughput scenarios like algorithmic trading. The platform stands out for its self-tuning capabilities, transparent data replication, and ability to handle millions of events per second while maintaining data consistency.&lt;br&gt;
Key advantages include:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Kafka API compatibility without configuration overhead&lt;/li&gt;
&lt;li&gt;Single binary deployment with no external dependencies&lt;/li&gt;
&lt;li&gt;Hardware optimized performance with lower resource consumption&lt;/li&gt;
&lt;li&gt;Built-in disaster recovery and data durability&lt;/li&gt;
&lt;li&gt;Simple scaling and maintenance without complex configurations&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Flink-SQL ::&lt;/strong&gt; Flink SQL is a powerful query interface in Apache Flink that enables real-time stream processing and analytics using standard SQL syntax. It allows developers to write SQL queries that can analyze both streaming and batch data without changing the underlying code. What sets Flink SQL apart is its ability to handle continuous queries over streaming data, with support for event time processing, windowing operations, and complex event pattern matching.&lt;br&gt;
Key features include:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Real-time continuous querying&lt;/li&gt;
&lt;li&gt;Advanced window operations (sliding, tumbling, session)&lt;/li&gt;
&lt;li&gt;Stream-table joins and temporal table joins&lt;/li&gt;
&lt;li&gt;Pattern detection using MATCH_RECOGNIZE&lt;/li&gt;
&lt;li&gt;Built-in connectors for various data sources/sinks&lt;/li&gt;
&lt;li&gt;Dynamic table concepts for stream processing&lt;/li&gt;
&lt;li&gt;Support for user-defined functions (UDFs)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Py-Flink ::&lt;/strong&gt; PyFlink is Python's API for Apache Flink, offering a powerful stream processing framework with the accessibility of Python. It enables developers to write scalable stream processing applications using familiar Python syntax while leveraging Flink's robust distributed computing capabilities. PyFlink supports both the DataStream API for low-level stream processing and the Table API/SQL for declarative data analytics, making it particularly valuable for real-time data analysis and complex event processing.&lt;br&gt;
Key features include:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Native Python UDFs (User Defined Functions)&lt;/li&gt;
&lt;li&gt;Seamless integration with Python data science libraries&lt;/li&gt;
&lt;li&gt;Support for both batch and stream processing&lt;/li&gt;
&lt;li&gt;Real-time data analytics using SQL&lt;/li&gt;
&lt;li&gt;Stateful stream processing capabilities&lt;/li&gt;
&lt;li&gt;Event-time processing and windowing operations&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The combination of Python's ease of use with Flink's performance makes PyFlink an excellent choice for building real-time data pipelines and streaming analytics applications.&lt;/p&gt;
&lt;h2&gt;
  
  
  Goal
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Setup Paper Trading:&lt;/strong&gt; using Alpaca&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Ingestion:&lt;/strong&gt; using Kafka APIs into RedPanda&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streaming Extract Transform and Aggregate:&lt;/strong&gt; using Flink SQL &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generate Trade Signals:&lt;/strong&gt; using Flink Source APIs to &lt;strong&gt;Slack&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backtest:&lt;/strong&gt; Algorithmic Trading Strategies&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Let Us Begin
&lt;/h2&gt;
&lt;h2&gt;
  
  
  &amp;gt;&amp;gt; Paper-Trading With Alpaca
&lt;/h2&gt;

&lt;p&gt;Sign Up for Trading API&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fufn4w8cuy3xdw0owzkfh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fufn4w8cuy3xdw0owzkfh.png" alt="AlpacaLogin" width="800" height="265"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Copy The API Keys&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9n2ryktqvzo8jjcomriv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9n2ryktqvzo8jjcomriv.png" alt="AlpacaKeys" width="565" height="352"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Configure the keys in your Python Project&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pip install alpaca_trade_api&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;config = {
    'key_id': ' API KEY ',
    'secret_key': ' SECRET KEY ',
    'redpanda_brokers': 'localhost:9092,localhost:9093',
    'base_url': 'https://data.alpaca.markets/v1beta1/',
    'trade_api_base_url': 'https://paper-api.alpaca.markets/v2',
    'slack_token': '',
    'slack_channel_id': ''
} 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt; ::  &lt;a href="https://github.com/snepar/flink-algo-trading/blob/main/alpaca_config/keys.py" rel="noopener noreferrer"&gt;https://github.com/snepar/flink-algo-trading/blob/main/alpaca_config/keys.py&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &amp;gt;&amp;gt; Infrastructure With Docker
&lt;/h2&gt;

&lt;p&gt;Configure &lt;code&gt;docker-compose.yml&lt;/code&gt;&lt;br&gt;
And &lt;code&gt;Dockerfile-sql&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;services:
  redpanda-1:
...
  redpanda-console:
      ports:
        - 8080:8080
# Flink cluster
  jobmanager:
    container_name: jobmanager
    build:
      context: .
      dockerfile: Dockerfile-sql
    ports:
      - 8081:8081
  sql-client:
    container_name: sql-client
    build:
      context: .
      dockerfile: Dockerfile-sql
    command:
      - /opt/flink/bin/sql-client.sh
...

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Execute Command &lt;code&gt;docker compose up -d --build&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Check the Status of RedPanda&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;http://localhost:8080/overview&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj2q4zerz4y1dw7nulfi8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj2q4zerz4y1dw7nulfi8.png" alt="redpanda-overview" width="800" height="352"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Check the Status of  Flink SQL&lt;/strong&gt; &lt;br&gt;
&lt;code&gt;http://localhost:8081/#/overview&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3uziphhuuxnewacgoenk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3uziphhuuxnewacgoenk.png" alt="flink-overview" width="800" height="256"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test Flink-SQL Client&lt;/strong&gt;&lt;br&gt;
Execute Command &lt;code&gt;docker compose run sql-client&lt;/code&gt;&lt;br&gt;
Execute Test Query &lt;code&gt;Flink SQL&amp;gt; select 'hello world';&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ejyx7gp05hm1d1qtm4y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ejyx7gp05hm1d1qtm4y.png" alt="flink-sql-hw" width="800" height="234"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt; :: &lt;a href="https://github.com/snepar/flink-algo-trading/blob/main/docker-compose.yml" rel="noopener noreferrer"&gt;https://github.com/snepar/flink-algo-trading/blob/main/docker-compose.yml&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  &amp;gt;&amp;gt; The Redpanda Producer With Sentiment Analysis from Past News Headlines
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Install the NLTK Library&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;pip install nltk&lt;/code&gt;&lt;br&gt;
Then download the VADER lexicon: &lt;code&gt;python -m nltk.downloader vader_lexicon&lt;/code&gt;&lt;br&gt;
&lt;strong&gt;Or, if SSL certificate verification fails, bypass verification and run the interactive downloader:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import nltk
import ssl

try:
  _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
  pass

else:
  ssl._create_default_https_context = _create_unverified_https_context

nltk.download()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Select The Appropriate Model from the list&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8hiy5jh0fiudj1qh50yv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8hiy5jh0fiudj1qh50yv.png" alt="vader-downloader" width="712" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Define the sentiment analyser function&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from nltk.sentiment.vader import SentimentIntensityAnalyzer as SIA

sia = SIA()


def get_sentiment(text):
    scores = sia.polarity_scores(text)
    return scores['compound']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Install Kafka Libraries to Produce Data in RedPanda&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pip install kafka-python requests pandas&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def get_producer(brokers: List[str]):
    producer = KafkaProducer(
        bootstrap_servers=brokers,
        key_serializer=str.encode,
        value_serializer=lambda v: json.dumps(v).encode('utf-8')
    )
    return producer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
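&lt;p&gt;To see what these serializers put on the wire, they can be exercised standalone: the key is UTF-8 encoded as-is, and each value dict becomes UTF-8 JSON bytes:&lt;/p&gt;

```python
import json

# The producer's serializers, reproduced outside KafkaProducer so their
# output can be inspected directly.
key_serializer = str.encode
value_serializer = lambda v: json.dumps(v).encode('utf-8')

# A sample record shaped like the enriched news articles (illustrative data)
record = {"headline": "Apple beats earnings estimates", "sentiment": 0.66}

print(key_serializer("AAPL"))  # b'AAPL'
print(value_serializer(record))
```

&lt;p&gt;Serializing to JSON on the producer side is what lets the Flink tables later in the article declare &lt;code&gt;'format' = 'json'&lt;/code&gt; and map fields by name.&lt;/p&gt;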



&lt;p&gt;Define a producer function that fetches historical news from Alpaca for a given date range and publishes it to a RedPanda topic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def produce_historical_news(
        redpanda_client: KafkaProducer,
        start_date: str,
        end_date: str,
        symbols: List[str],
        topic: str
    ):
    key_id = config['key_id']
    secret_key = config['secret_key']
    base_url = config['base_url']

    api = REST(key_id=key_id, secret_key=secret_key, base_url=URL(base_url)) ... 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt; :: &lt;a href="https://github.com/snepar/flink-algo-trading/blob/main/news-producer.py" rel="noopener noreferrer"&gt;https://github.com/snepar/flink-algo-trading/blob/main/news-producer.py&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Invoke the sentiment function to calculate a sentiment score from each news headline, skipping articles that do not mention the target symbols:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;article = row._raw
            should_proceed = any(term in article['headline'] for term in symbols)
            if not should_proceed:
                continue
article['sentiment'] = get_sentiment(article['headline'])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Publish to RedPanda Topic &lt;code&gt;market-news&lt;/code&gt; For the Company Name &lt;code&gt;Apple / AAPL&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;produce_historical_news(
        get_producer(config['redpanda_brokers']),
        topic='market-news',
        start_date='2024-01-01',
        end_date='2024-12-08',
        symbols=['AAPL', 'Apple']
    )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check the Topic &lt;code&gt;market-news&lt;/code&gt; in RedPanda UI at &lt;code&gt;http://localhost:8080/topics/market-news&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0kii7nx1dleexrec2tti.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0kii7nx1dleexrec2tti.png" alt="histnews" width="800" height="432"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  &amp;gt;&amp;gt; The Redpanda Producer for Historical and Real-Time Stock Price Changes
&lt;/h2&gt;

&lt;p&gt;Use the Alpaca &lt;code&gt;StockBarsRequest&lt;/code&gt; API&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;start_date = datetime.strptime(start_date, '%Y-%m-%d')
end_date = datetime.strptime(end_date, '%Y-%m-%d')
granularity = TimeFrame.Minute

request_params = StockBarsRequest(
        symbol_or_symbols=symbol,
        timeframe=granularity,
        start=start_date,
        end=end_date)

prices_df = api.get_stock_bars(request_params).df
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt; :: &lt;a href="https://github.com/snepar/flink-algo-trading/blob/main/prices-producer.py" rel="noopener noreferrer"&gt;https://github.com/snepar/flink-algo-trading/blob/main/prices-producer.py&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The response contains the fields below, including the volume-weighted average price (VWAP):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "symbol":"AAPL"
  "timestamp":1707246180000
  "open":188.7199
  "high":188.74
  "low":188.585
  "close":188.64
  "volume":108418
  "trade_count":1313
  "vwap":188.660311
  "provider":"alpaca"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
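&lt;p&gt;The &lt;code&gt;vwap&lt;/code&gt; field is the volume-weighted average price of the bar: each trade price weighted by its volume. Alpaca precomputes it, but the metric can be recomputed from individual trades to make it concrete (sample numbers below are illustrative, not Alpaca data):&lt;/p&gt;

```python
# VWAP weights each trade price by its traded volume, so large trades
# pull the average more than small ones.
def vwap(trades):
    """trades: list of (price, volume) pairs."""
    total_volume = sum(volume for _, volume in trades)
    weighted = sum(price * volume for price, volume in trades)
    return weighted / total_volume

trades = [(188.72, 100), (188.60, 300), (188.66, 200)]
print(round(vwap(trades), 2))  # 188.64
```

&lt;p&gt;This is why VWAP is a popular execution benchmark: it reflects where the volume actually traded, not just the midpoint of the bar.&lt;/p&gt;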



&lt;p&gt;Check the Topic &lt;code&gt;stock-prices&lt;/code&gt; in RedPanda UI at &lt;code&gt;http://localhost:8080/topics/stock-prices&lt;/code&gt; for key &lt;code&gt;AAPL&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flmyhi8s8o7h0j3iyu3k6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flmyhi8s8o7h0j3iyu3k6.png" alt="pricehist" width="800" height="356"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &amp;gt;&amp;gt; Flink SQL Based Table Creation on RedPanda Topics
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Market News Table&lt;/strong&gt;&lt;br&gt;
This table captures financial news data from Kafka with sentiment analysis:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stores news articles with metadata (author, headline, source)&lt;/li&gt;
&lt;li&gt;Includes sentiment scores for each news item&lt;/li&gt;
&lt;li&gt;Uses event time processing with 5-second watermark&lt;/li&gt;
&lt;li&gt;Connects to Redpanda topic &lt;code&gt;market-news&lt;/code&gt; for data streaming
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE OR REPLACE TABLE market_news (
    id BIGINT,
    author VARCHAR,
    headline VARCHAR,
    source VARCHAR,
    summary VARCHAR,
    data_provider VARCHAR,
    `url` VARCHAR,
    symbol VARCHAR,
    sentiment DECIMAL,
    timestamp_ms BIGINT,
    event_time AS TO_TIMESTAMP_LTZ(timestamp_ms, 3),
    WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
) WITH (
    'connector' = 'kafka',
    'topic' = 'market-news',
    'properties.bootstrap.servers' = 'redpanda-1:29092,redpanda-2:29092',
    'properties.group.id' = 'test-group',
    'properties.auto.offset.reset' = 'earliest',
    'format' = 'json'
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Stock Prices Table&lt;/strong&gt;&lt;br&gt;
Captures real-time stock price data&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stores OHLCV (Open, High, Low, Close, Volume) data&lt;/li&gt;
&lt;li&gt;Includes additional metrics like VWAP and trade count&lt;/li&gt;
&lt;li&gt;Uses event time processing with 5-second watermark&lt;/li&gt;
&lt;li&gt;Streams data from a separate RedPanda topic &lt;code&gt;stock-prices&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE OR REPLACE TABLE stock_prices (
    symbol VARCHAR,
    `open` FLOAT,
    high FLOAT,
    low FLOAT,
    `close` FLOAT,
    volume DECIMAL,
    trade_count FLOAT,
    vwap DECIMAL,
    provider VARCHAR,
    `timestamp` BIGINT,
    event_time AS TO_TIMESTAMP_LTZ(`timestamp`, 3),
    WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
) WITH (
    'connector' = 'kafka',
    'topic' = 'stock-prices',
    'properties.bootstrap.servers' = 'redpanda-1:29092,redpanda-2:29092',
    'properties.group.id' = 'test-group',
    'properties.auto.offset.reset' = 'earliest',
    'format' = 'json'
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  &amp;gt;&amp;gt; Real-Time Aggregation Using Flink-SQL and Simple Moving Average (Algorithm)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Moving Average Views (sma_20 and sma_50)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Creates 20-period and 50-period simple moving averages&lt;/li&gt;
&lt;li&gt;Uses window functions for continuous calculation&lt;/li&gt;
&lt;li&gt;Partitions by symbol for multiple stock analysis&lt;/li&gt;
&lt;li&gt;Maintains temporal order using event_time
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE OR REPLACE VIEW sma_20 AS
SELECT symbol, `close`, event_time,
    AVG(`close`) OVER (PARTITION BY symbol ORDER BY event_time ROWS BETWEEN 19 PRECEDING AND CURRENT ROW) AS sma_20
FROM stock_prices;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE OR REPLACE VIEW sma_50 AS
SELECT
    symbol,
    `close`,
    event_time,
    AVG(`close`) OVER (PARTITION BY symbol ORDER BY event_time ROWS BETWEEN 49 PRECEDING AND CURRENT ROW) AS sma_50
FROM stock_prices;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
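&lt;p&gt;The windowed &lt;code&gt;AVG(`close`) OVER (... ROWS BETWEEN 19 PRECEDING AND CURRENT ROW)&lt;/code&gt; above computes, for each row, the mean of the current close and up to 19 prior closes. A pure-Python equivalent makes the per-row semantics concrete:&lt;/p&gt;

```python
# Per-row simple moving average, mirroring Flink's
# AVG(close) OVER (PARTITION BY symbol ORDER BY event_time
#                  ROWS BETWEEN 19 PRECEDING AND CURRENT ROW):
# early rows average only the closes seen so far, just as the SQL does.
def rolling_sma(closes, window=20):
    out = []
    for i in range(len(closes)):
        start = max(0, i - window + 1)
        chunk = closes[start:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

print(rolling_sma([1.0, 2.0, 3.0, 4.0], window=2))  # [1.0, 1.5, 2.5, 3.5]
```

&lt;p&gt;Note that the SQL view emits a value for every row, including the first 19, where the "20-period" average is really an average over fewer points.&lt;/p&gt;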


&lt;p&gt;&lt;strong&gt;Price with Moving Averages View&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Combines both SMAs (20 and 50 period)&lt;/li&gt;
&lt;li&gt;Joins the moving averages on symbol and event_time&lt;/li&gt;
&lt;li&gt;Provides a consolidated view of price and technical indicators
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE OR REPLACE VIEW price_with_movavg AS
SELECT
    s20.symbol,
    s20.`close`,
    s20.event_time,
    s20.sma_20,
    s50.sma_50
FROM sma_20 s20
JOIN sma_50 s50
    ON s20.symbol = s50.symbol AND s20.event_time = s50.event_time;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;News and Prices View&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Correlates news events with price movements&lt;/li&gt;
&lt;li&gt;Uses a 2-minute window (±1 minute) to match news with prices&lt;/li&gt;
&lt;li&gt;Combines sentiment data with technical indicators&lt;/li&gt;
&lt;li&gt;Enables analysis of news impact on price
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE OR REPLACE VIEW news_and_prices AS
SELECT
    n.symbol,
    n.headline,
    n.sentiment,
    p.`close`,
    p.sma_20,
    p.sma_50,
    n.event_time AS news_time,
    p.event_time AS price_time
FROM market_news n
JOIN price_with_movavg p
    ON n.symbol = p.symbol
    AND n.event_time BETWEEN p.event_time - INTERVAL '1' MINUTE AND p.event_time + INTERVAL '1' MINUTE;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
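&lt;p&gt;The ±1 minute matching condition can be expressed in plain Python (a hypothetical helper over epoch-millisecond timestamps, for illustration):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def within_one_minute(news_time_ms, price_time_ms):
    # Mirrors: n.event_time BETWEEN p.event_time - INTERVAL '1' MINUTE
    #                            AND p.event_time + INTERVAL '1' MINUTE
    return abs(news_time_ms - price_time_ms) &lt;= 60_000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Note that BETWEEN is inclusive on both ends, so a news event exactly one minute away still matches.&lt;/p&gt;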


&lt;p&gt;&lt;strong&gt;Trading Signals View&lt;/strong&gt;&lt;br&gt;
Implements the trading strategy:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;BUY Signal Conditions:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Positive sentiment (sentiment &amp;gt; 0)&lt;/li&gt;
&lt;li&gt;Price crosses below SMA20 (current &amp;lt; SMA20 &amp;amp;&amp;amp; previous &amp;gt;= SMA20)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;SELL Signal Conditions:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Negative sentiment (sentiment &amp;lt; 0)&lt;/li&gt;
&lt;li&gt;Price crosses above SMA20 (current &amp;gt; SMA20 &amp;amp;&amp;amp; previous &amp;lt;= SMA20)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;LAG&lt;/code&gt; function detects the crossover by comparing each close with the previous one for the same symbol.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE OR REPLACE VIEW trading_signals AS
SELECT
    symbol,
    news_time,
    `close`,
    1 as quantity,
    CASE
        WHEN sentiment &amp;gt; 0 AND `close` &amp;lt; sma_20 AND lag(`close`, 1) OVER (PARTITION BY symbol ORDER BY news_time) &amp;gt;= sma_20 THEN 'BUY'
        WHEN sentiment &amp;lt; 0 AND `close` &amp;gt; sma_20 AND lag(`close`, 1) OVER (PARTITION BY symbol ORDER BY news_time) &amp;lt;= sma_20 THEN 'SELL'
        ELSE NULL
    END AS signal
FROM news_and_prices;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
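&lt;p&gt;The CASE expression above can be mirrored in plain Python (a hypothetical helper for illustration, with the previous close passed in explicitly in place of LAG):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def signal(sentiment, close, prev_close, sma_20):
    # BUY: positive sentiment and price crossing below SMA-20
    if sentiment &gt; 0 and close &lt; sma_20 and prev_close &gt;= sma_20:
        return 'BUY'
    # SELL: negative sentiment and price crossing above SMA-20
    if sentiment &lt; 0 and close &gt; sma_20 and prev_close &lt;= sma_20:
        return 'SELL'
    return None
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;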



&lt;h2&gt;
  
  
  &amp;gt;&amp;gt; Publish Trading Signals to a Topic in RedPanda using Flink-SQL
&lt;/h2&gt;

&lt;p&gt;Create a topic in RedPanda named &lt;code&gt;trading-signals&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3c8kkwhxghgwi2ofepss.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3c8kkwhxghgwi2ofepss.png" alt="createtopic" width="800" height="381"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Create a Table using Flink SQL&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE OR REPLACE TABLE trading_signals_sink (
    symbol STRING,
    signal_time TIMESTAMP_LTZ,
    signal STRING
) WITH (
    'connector' = 'kafka',
    'topic' = 'trading-signals',
    'properties.bootstrap.servers' = 'redpanda-1:29092,redpanda-2:29093',
    'properties.group.id' = 'test-group',
    'format' = 'json'
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start Publishing Trade Signals to the topic&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INSERT INTO trading_signals_sink
SELECT symbol, news_time, signal
FROM trading_signals
WHERE signal IS NOT NULL;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify Data from RedPanda UI&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fznka268cbpihdwxtcdyo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fznka268cbpihdwxtcdyo.png" alt="trade-signal-topic" width="800" height="537"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &amp;gt;&amp;gt; PyFlink to Consume Trade Signals
&lt;/h2&gt;

&lt;p&gt;Install Apache Flink Library &lt;code&gt;pip install apache-flink&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create Flink Kafka Consumer Group&lt;/li&gt;
&lt;li&gt;Add Relevant JAVA JARs
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;env = StreamExecutionEnvironment.get_execution_environment()
    env.set_parallelism(4)
    env.add_jars('&amp;lt;&amp;lt;location to&amp;gt;&amp;gt;/flink-sql-connector-kafka-3.1.0-1.18.jar')

    kafka_consumer_properties = {
        'bootstrap.servers': 'localhost:9092,localhost:9093',
        'group.id': 'news_trading_consumer_group',
        'auto.offset.reset': 'earliest'
    }

    kafka_consumer = FlinkKafkaConsumer(
        topics='trading-signals',
        deserialization_schema=SimpleStringSchema(),
        properties=kafka_consumer_properties
    )

kafka_stream = env.add_source(kafka_consumer, type_info=Types.STRING())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Process The Event
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;message_dict = json.loads(message)
        symbol = message_dict.get('symbol', 'N/A')
        signal_time = message_dict.get('signal_time', 'N/A')
        signal = message_dict.get('signal', 'N/A')

        formatted_message = """
        =============================
        ALERT ⚠️ New Trading Signal!\n
        Symbol: {symbol} \n
        Signal: {signal} \n
        Time: {signal_time}
        =============================
        """.format(
            symbol=symbol,
            signal=signal,
            signal_time=signal_time
        )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt; :: &lt;a href="https://github.com/snepar/flink-algo-trading/blob/main/signal_handler.py" rel="noopener noreferrer"&gt;https://github.com/snepar/flink-algo-trading/blob/main/signal_handler.py&lt;/a&gt; &lt;/p&gt;

&lt;h2&gt;
  
  
  &amp;gt;&amp;gt; Slack to Post Alerts
&lt;/h2&gt;

&lt;p&gt;Configure a Slack channel with a bot to publish messages: &lt;code&gt;OAuth Scope = chat:write&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjhnjbmp2dqu0mg5cr2oi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjhnjbmp2dqu0mg5cr2oi.png" alt="SlackSettings" width="720" height="687"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Start Pushing Alert Messages to This Channel&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def send_to_slack(message, token, channel_id):
    url = 'https://slack.com/api/chat.postMessage'
    headers = {
        'Content-Type': 'application/json',
        'Authorization': f'Bearer {token}'
    }

    data = {
        'channel': channel_id,
        'text': message
    }

    response = requests.post(url, headers=headers, json=data)

    if response.status_code != 200:
        raise ValueError(f'Failed to send message to slack, {response.status_code}, response: {response.text}')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt; :: &lt;a href="https://github.com/snepar/flink-algo-trading/blob/main/signal_handler.py" rel="noopener noreferrer"&gt;https://github.com/snepar/flink-algo-trading/blob/main/signal_handler.py&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Alerts in Slack&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs3jcmmxtnot5jbgmjgbo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs3jcmmxtnot5jbgmjgbo.png" alt="SlackAlerts" width="800" height="574"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &amp;gt;&amp;gt; Place Order Based on Trade Signals
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Configure the Trade API to place order based on Buy and Sell Signal in Alpaca Broker
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def place_order(symbol, qty, side, order_type, time_in_force):
    try:
        order = api.submit_order(
            symbol=symbol,
            qty=qty,
            side=side,
            type=order_type,
            time_in_force=time_in_force
        )
        print(f'Order submitted: {order}')
        return order
    except Exception as e:
        print(f'An error occurred while submitting order: {e}')
        return None
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt; :: &lt;a href="https://github.com/snepar/flink-algo-trading/blob/main/signal_handler.py" rel="noopener noreferrer"&gt;https://github.com/snepar/flink-algo-trading/blob/main/signal_handler.py&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Console Log &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flket41cqhr95eadgu6ke.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flket41cqhr95eadgu6ke.png" alt="order" width="738" height="1230"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Alpaca Dashboard&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fni0hpnmpbxtunn525kro.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fni0hpnmpbxtunn525kro.png" alt="AlpacaDashboard1" width="800" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ijsfsw5tog8epntql7e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ijsfsw5tog8epntql7e.png" alt="AlpacaDashboard2" width="800" height="776"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &amp;gt;&amp;gt; BackTesting The Algorithm Strategies
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;All Weather Strategy&lt;/strong&gt; :: This implements Ray Dalio's All Weather strategy, which aims to perform well in any economic environment by balancing growth assets with protection against different economic scenarios (inflation, deflation, growth, recession).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt; :: &lt;a href="https://github.com/snepar/flink-algo-trading/blob/main/strategies/AllWeatherStrategy.py" rel="noopener noreferrer"&gt;https://github.com/snepar/flink-algo-trading/blob/main/strategies/AllWeatherStrategy.py&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import backtrader as bt

class AllWeatherStrategy(bt.Strategy):
    def __init__(self):
        self.year_last_rebalanced = -1
        self.weights = {"VTI": 0.30, 'TLT': 0.40, 'IEF': 0.15, 'GLD': 0.075, 'DBC': 0.075}

    def next(self):
        if self.datetime.date().year == self.year_last_rebalanced:
            return

        self.year_last_rebalanced = self.datetime.date().year

        for i, d in enumerate(self.datas):
            symbol = d._name
            self.order_target_percent(d, target=self.weights[symbol])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This code defines a trading strategy class &lt;strong&gt;AllWeatherStrategy&lt;/strong&gt; that inherits from Backtrader's Strategy class. The strategy implements annual portfolio rebalancing with predefined asset allocations:&lt;/p&gt;

&lt;p&gt;In the &lt;code&gt;__init__&lt;/code&gt; method:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;self.year_last_rebalanced = -1&lt;/code&gt; tracks the last rebalancing year, and the &lt;code&gt;self.weights&lt;/code&gt; dictionary defines the asset allocation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;30% in VTI (Vanguard Total Stock Market ETF) - Growth&lt;/li&gt;
&lt;li&gt;40% in TLT (Long-term Treasury Bonds) - Deflation protection&lt;/li&gt;
&lt;li&gt;15% in IEF (Intermediate Treasury Bonds) - Income&lt;/li&gt;
&lt;li&gt;7.5% in GLD (Gold) - Inflation protection&lt;/li&gt;
&lt;li&gt;7.5% in DBC (Commodity Index) - Inflation protection&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;next&lt;/code&gt; method executes the rebalancing logic: it checks whether we are in a new year compared to the last rebalance, and if so:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Updates the last rebalanced year&lt;/li&gt;
&lt;li&gt;Iterates through each asset in the portfolio&lt;/li&gt;
&lt;li&gt;Uses &lt;code&gt;order_target_percent&lt;/code&gt; to adjust each position to match its target weight&lt;/li&gt;
&lt;/ul&gt;
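&lt;p&gt;A quick sanity check on the allocation above: the five target weights sum to exactly 100%, so the portfolio is always fully invested. In plain Python:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;weights = {"VTI": 0.30, "TLT": 0.40, "IEF": 0.15, "GLD": 0.075, "DBC": 0.075}
# target percentages passed to order_target_percent must sum to 1.0
assert abs(sum(weights.values()) - 1.0) &lt; 1e-9
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;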

&lt;p&gt;&lt;strong&gt;Golden Cross Strategy&lt;/strong&gt; :: A trend-following strategy that buys when a fast EMA crosses above a slow EMA and exits on the reverse crossover.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt; :: &lt;a href="https://github.com/snepar/flink-algo-trading/blob/main/strategies/GoldenCrossStrategy.py" rel="noopener noreferrer"&gt;https://github.com/snepar/flink-algo-trading/blob/main/strategies/GoldenCrossStrategy.py&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import backtrader as bt

class GoldenCrossStrategy(bt.Strategy):
    params = (
        ('short_window', 50),
        ('long_window', 200)
    )

    def __init__(self):
        self.short_ema = bt.indicators.EMA(self.datas[0].close, period=self.params.short_window)
        self.long_ema = bt.indicators.EMA(self.datas[0].close, period=self.params.long_window)
        self.crossover = bt.indicators.CrossOver(self.short_ema, self.long_ema)

    def next(self):
        if not self.position:
            if self.crossover &amp;gt; 0:
                self.buy()
        elif self.crossover &amp;lt; 0:
            self.close()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This implements the Golden Cross trading strategy, a popular technical analysis method.&lt;/p&gt;

&lt;p&gt;The GoldenCrossStrategy class defines a trend-following strategy based on exponential moving average (EMA) crossovers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Strategy Parameters (params):&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;short_window = 50: the 50-day EMA period&lt;/li&gt;
&lt;li&gt;long_window = 200: the 200-day EMA period&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;In the &lt;code&gt;__init__&lt;/code&gt; method:
&lt;strong&gt;Creates two EMAs using closing prices:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;short_ema: 50-day EMA (faster moving average)&lt;/li&gt;
&lt;li&gt;long_ema: 200-day EMA (slower moving average)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Creates a crossover indicator to detect when the EMAs cross&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;next&lt;/code&gt; method contains the trading logic:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If no position is held (&lt;code&gt;if not self.position&lt;/code&gt;): buys when the short EMA crosses above the long EMA (crossover &amp;gt; 0)&lt;/li&gt;
&lt;li&gt;If holding a position: sells when the short EMA crosses below the long EMA (crossover &amp;lt; 0)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This strategy follows the principle that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A "Golden Cross" (short EMA crossing above long EMA) signals an uptrend and triggers a buy&lt;/li&gt;
&lt;li&gt;A "Death Cross" (short EMA crossing below long EMA) signals a downtrend and triggers a sell&lt;/li&gt;
&lt;/ul&gt;
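&lt;p&gt;For reference, an EMA of the kind used here can be computed recursively. This plain-Python sketch seeds with the first close; note that Backtrader's EMA indicator seeds itself with an SMA over the first &lt;code&gt;period&lt;/code&gt; bars, so early values differ slightly:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def ema(closes, period):
    # Standard recursive EMA with smoothing alpha = 2 / (period + 1)
    alpha = 2.0 / (period + 1)
    value = closes[0]
    out = [value]
    for close in closes[1:]:
        value = alpha * close + (1 - alpha) * value
        out.append(value)
    return out
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The higher the period, the smaller alpha becomes, and the more slowly the average reacts to new prices.&lt;/p&gt;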
&lt;p&gt;&lt;strong&gt;MomentumStrategy Backtesting Implementation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Key features of this strategy:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Uses momentum to identify strong upward price movements&lt;/li&gt;
&lt;li&gt;Uses EMA as a trailing stop mechanism&lt;/li&gt;
&lt;li&gt;Combines momentum and trend following concepts&lt;/li&gt;
&lt;li&gt;Momentum &amp;gt; 100 indicates price is moving up significantly&lt;/li&gt;
&lt;li&gt;Price below EMA suggests trend might be weakening&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The strategy aims to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Catch strong upward price movements (momentum &amp;gt; 100)&lt;/li&gt;
&lt;li&gt;Stay in the trade while trend remains positive&lt;/li&gt;
&lt;li&gt;Exit when trend weakens (price &amp;lt; EMA)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt; :: &lt;a href="https://github.com/snepar/flink-algo-trading/blob/main/strategies/MomentumStrategy.py" rel="noopener noreferrer"&gt;https://github.com/snepar/flink-algo-trading/blob/main/strategies/MomentumStrategy.py&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import backtrader as bt

class MomentumStrategy(bt.Strategy):
    params = (
        ('momentum_period', 12),
        ('exit_period', 26)
    )

    def __init__(self):
        self.momentum = bt.indicators.MomentumOscillator(
            self.datas[0].close, period=self.params.momentum_period)
        self.exit_signal = bt.indicators.EMA(
            self.datas[0].close, period=self.params.exit_period)

    def next(self):
        if not self.position:
            if self.momentum &amp;gt; 100:
                self.buy()
        elif self.datas[0].close[0] &amp;lt; self.exit_signal[0]:
            self.close()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Test Using Backtester (Momentum Strategy)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Backtrader backtesting script integrates with Alpaca's API.&lt;/li&gt;
&lt;li&gt;Creates Backtrader's main engine (Cerebro)&lt;/li&gt;
&lt;li&gt;Sets initial cash amount&lt;/li&gt;
&lt;li&gt;Adds the specified trading strategy&lt;/li&gt;
&lt;li&gt;Runs the backtest&lt;/li&gt;
&lt;li&gt;Prints initial and final portfolio values&lt;/li&gt;
&lt;li&gt;Calculates percentage return&lt;/li&gt;
&lt;li&gt;Reports Sharpe Ratio&lt;/li&gt;
&lt;li&gt;Generates performance plots
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def run_backtest(strategy, symbols, start, end, timeframe, cash):
    rest_api = REST(config['key_id'], config['secret_key'], base_url=config['trade_api_base_url'])

    #initialize backtrader broker
    cerebro = bt.Cerebro(stdstats=True)
    cerebro.broker.setcash(cash)

    # add strategy
    cerebro.addstrategy(strategy)

    # add analytics
    cerebro.addobserver(bt.observers.Value)
    cerebro.addobserver(bt.observers.BuySell)

    cerebro.addanalyzer(bt.analyzers.SharpeRatio, _name='mysharpre')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
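&lt;p&gt;The percentage return and Sharpe ratio the script reports can be sketched in plain Python (hypothetical helpers for intuition, not Backtrader's own analyzers, which also handle annualisation and risk-free rates):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import statistics

def pct_return(initial_value, final_value):
    # Percentage return between starting and final portfolio value
    return (final_value - initial_value) / initial_value * 100.0

def sharpe_ratio(period_returns, risk_free=0.0):
    # Simple (unannualised) Sharpe ratio: mean excess return over its stdev
    excess = [r - risk_free for r in period_returns]
    return statistics.mean(excess) / statistics.stdev(excess)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;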



&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt; :: &lt;a href="https://github.com/snepar/flink-algo-trading/blob/main/backtester.py" rel="noopener noreferrer"&gt;https://github.com/snepar/flink-algo-trading/blob/main/backtester.py&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5cpdy4gxm9l0y1kucvk6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5cpdy4gxm9l0y1kucvk6.png" alt="Momentum" width="800" height="442"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &amp;gt;&amp;gt; Paper Vs Real Verifications
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Paper Trading&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsv71t3dj9vwyjq39pfec.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsv71t3dj9vwyjq39pfec.png" alt="PaperView" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real Trading&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd0s6009hmqzhrwkzpp7u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd0s6009hmqzhrwkzpp7u.png" alt="TradingView" width="800" height="492"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Realtime Algorithmic Trading with Apache Flink
&lt;a href="https://www.youtube.com/watch?v=7r_oO_uLbSM" rel="noopener noreferrer"&gt;https://www.youtube.com/watch?v=7r_oO_uLbSM&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>sql</category>
      <category>python</category>
      <category>dataengineering</category>
      <category>eventdriven</category>
    </item>
    <item>
      <title>Apache Spark-Structured Streaming :: Cab Aggregator Use-case</title>
      <dc:creator>SNEHASISH DUTTA</dc:creator>
      <pubDate>Sun, 30 Jun 2024 17:50:09 +0000</pubDate>
      <link>https://forem.com/datasosneh/apache-spark-structured-streaming-cab-aggregator-use-case-2od0</link>
      <guid>https://forem.com/datasosneh/apache-spark-structured-streaming-cab-aggregator-use-case-2od0</guid>
      <description>&lt;p&gt;&lt;em&gt;Building helps you retain more knowledge.&lt;br&gt;
But teaching helps you retain even more. Teaching is another modality that locks in the experience you gain from building.--Dan Koe&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Objective
&lt;/h2&gt;

&lt;p&gt;Imagine a very simple system that can automatically warn cab companies whenever a driver rejects a bunch of rides in a short time. This system would use Kafka to send ride information (accepted, rejected) and Spark Structured Streaming to analyze it in real-time. If a driver rejects too many rides, the system would trigger an alert so the cab company can investigate.&lt;/p&gt;
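&lt;p&gt;Before wiring up Kafka and Spark, the alert rule itself can be sketched in plain Python (the threshold and window values here are hypothetical, purely for illustration):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from collections import deque

def make_rejection_monitor(threshold, window_ms):
    # Alert when one driver rejects `threshold` or more rides
    # within a sliding window of `window_ms` milliseconds
    rejections = {}

    def on_event(id_driver, tour_status, event_date):
        if tour_status != 'rejected':
            return False
        q = rejections.setdefault(id_driver, deque())
        q.append(event_date)
        # drop rejections that fell out of the window
        while q and event_date - q[0] &gt; window_ms:
            q.popleft()
        return len(q) &gt;= threshold

    return on_event
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For example, &lt;code&gt;make_rejection_monitor(3, 60_000)&lt;/code&gt; flags a driver on the third rejection within one minute.&lt;/p&gt;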
&lt;h2&gt;
  
  
  What is Spark Structured Streaming ?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjvnndwdl0ulog4bexcxg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjvnndwdl0ulog4bexcxg.png" alt="Structured" width="800" height="354"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Spark Structured Streaming is a powerful tool for processing data streams in real-time. It's built on top of Apache Spark SQL, which means it leverages the familiar DataFrame and Dataset APIs you might already use for batch data processing in Spark. This offers several advantages:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Unified Programming Model:&lt;/strong&gt; You can use the same set of operations for both streaming and batch data, making it easier to develop and maintain code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Declarative API:&lt;/strong&gt; Spark Structured Streaming lets you describe what you want to achieve with your data processing, rather than writing complex low-level code to handle the streaming aspects.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fault Tolerance:&lt;/strong&gt; Spark Structured Streaming ensures your processing jobs can recover from failures without losing data. It achieves this through techniques like checkpointing and write-ahead logs.&lt;/p&gt;

&lt;p&gt;Here's a breakdown of how Spark Structured Streaming works:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Streaming Data Source:&lt;/strong&gt; Your data comes from a streaming source like Kafka, Flume, or custom code that generates a continuous stream of data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Micro-Batching:&lt;/strong&gt; Spark Structured Streaming breaks down the continuous stream into small chunks of data called micro-batches.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Structured Processing:&lt;/strong&gt; Each micro-batch is processed like a regular DataFrame or Dataset using Spark SQL operations. This allows you to perform transformations, aggregations, and other data manipulations on the streaming data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Updated Results:&lt;/strong&gt; As new micro-batches arrive, the processing continues, and the results are constantly updated, reflecting the latest data in the stream.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sinks:&lt;/strong&gt; The final output can be written to various destinations like databases, dashboards, or other streaming systems for further analysis or action.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Benefits of Spark Structured Streaming:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real-time Insights:&lt;/strong&gt; Analyze data as it arrives, enabling quicker decision-making and proactive responses to events.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scalability:&lt;/strong&gt; Handles large volumes of streaming data efficiently by leveraging Spark's distributed processing capabilities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ease of Use:&lt;/strong&gt; The familiar DataFrame/Dataset API makes it easier to develop and maintain streaming applications.&lt;/p&gt;

&lt;p&gt;In essence, Spark Structured Streaming bridges the gap between batch processing and real-time analytics, allowing you to analyze data as it's generated and gain valuable insights from continuous data streams.&lt;/p&gt;
&lt;h2&gt;
  
  
  Project Architecture
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2mw8k9f6919h4insllwi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2mw8k9f6919h4insllwi.png" alt="Architecture" width="699" height="90"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Extract&lt;/strong&gt; From : Apache Kafka&lt;br&gt;
&lt;strong&gt;Transform&lt;/strong&gt; Using : Apache Spark&lt;br&gt;
&lt;strong&gt;Load&lt;/strong&gt; Into : Apache Kafka&lt;/p&gt;
&lt;h2&gt;
  
  
  Producer and Infrastructure
&lt;/h2&gt;

&lt;p&gt;Repository : &lt;a href="https://github.com/snepar/cab-producer-infra" rel="noopener noreferrer"&gt;https://github.com/snepar/cab-producer-infra&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It is a simple application that ingests data into Kafka, emitting random events marked either accepted or rejected.&lt;/p&gt;

&lt;p&gt;Sample event&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{ 
   "id": 3949106,
   "event_date": 1719749696532,
   "tour_value": 29.75265579847153,
   "id_driver": 3,
   "id_passenger": 11,
   "tour_status": rejected
} 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start the Infrastructure&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker compose up
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Random Events Generator&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;val statuses = List("accepted", "rejected")
    val status = statuses(Random.nextInt(statuses.length))
    while (true) {
      val topic = "ride"
      val r = scala.util.Random
      val id = r.nextInt(10000000)
      val tour_value = r.nextDouble() * 100
      val id_driver = r.nextInt(10)
      val id_passenger = r.nextInt(100)
      val event_date = System.currentTimeMillis
      val payload =
        s"""{"id":$id,"event_date":$event_date,"tour_value":$tour_value,"id_driver":$id_driver,"id_passenger":$id_passenger,"tour_status":"$status"}""".stripMargin

      EventProducer.send(topic, payload)
      Thread.sleep(1000)
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Send Random Events to Producer&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def send(topic: String, payload: String): Unit = {
    val record = new ProducerRecord[String, String](topic, key, payload)
    producer.send(record)
  }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;See the produced events from Topic named &lt;strong&gt;ride&lt;/strong&gt; in the Docker Terminal&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kafka-console-consumer --topic ride --bootstrap-server broker:9092
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4btbuyegia986ok7ejie.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4btbuyegia986ok7ejie.png" alt="Ride" width="800" height="334"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Spark Structured Streaming Application
&lt;/h2&gt;

&lt;p&gt;Repository : &lt;a href="https://github.com/snepar/spark-streaming-cab" rel="noopener noreferrer"&gt;https://github.com/snepar/spark-streaming-cab&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Create Spark Session to Execute the application locally ::&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;val spark = SparkSession.builder()
      .appName("Integrating Kafka")
      .master("local[2]")
      .getOrCreate()

spark.sparkContext.setLogLevel("WARN")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Configure the reader and writer Kafka topics&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    val kafkahost = "localhost:9092"
    val inputTopic = "ride"
    val outputTopic = "rejectalert"
    val props = new Properties()
    props.put("host", kafkahost)
    props.put("input_topic",inputTopic)
    props.put("output_host", kafkahost)
    props.put("output_topic",outputTopic)
    props.put("checkpointLocation","/tmp")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
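The job later reads these keys back with `getProperty` and a fall-back default, so a missing key does not crash the application. A minimal sketch of that pattern with plain `java.util.Properties` (no Spark required):

```scala
import java.util.Properties

// Same Properties bag as above; "checkpointLocation" is deliberately left
// out here to show the fall-back default kicking in.
val props = new Properties()
props.put("host", "localhost:9092")
props.put("input_topic", "ride")
props.put("output_topic", "rejectalert")

// getProperty(key, default) returns the default when the key is absent
val checkpoint = props.getProperty("checkpointLocation", "/tmp")
```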



&lt;p&gt;Define Schema for the Events&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;val schema = StructType(Seq(
      StructField("id", IntegerType, nullable = true), 
      StructField("event_date", LongType, nullable = false), 
      StructField("tour_value", DoubleType, nullable = true), 
      StructField("id_driver", StringType, nullable = false), 
      StructField("id_passenger", IntegerType, nullable = false), 
      StructField("tour_status", StringType, nullable = false) 
    ))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Read from the Kafka topic and create the streaming DataFrame&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;val df = spark.readStream.format("kafka")
      .option("kafka.bootstrap.servers","localhost:9092")
      .option("failOnDataLoss","false")
      .option("startingOffsets", "latest")
      .option("subscribe", "ride").load()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Parse the DataFrame with the schema and keep only the events marked as rejected.&lt;br&gt;
&lt;strong&gt;A rejected event signifies that a driver has rejected a ride.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;val parsedDF = df.selectExpr("cast(value as string) as value")
      .select(from_json(col("value"), schema).as("data"))
      .select("data.*").where("tour_status='rejected'")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Aggregate, over a 1-minute window grouped by driver ID, how many rides were rejected, and also calculate how much money was lost to those rejections&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;val driverPerformance: DataFrame = parsedDF.groupBy(
      window(to_utc_timestamp(from_unixtime(col("event_date") / 1000, "yyyy-MM-dd HH:mm:ss"), "UTC+1")
        .alias("event_timestamp"),
        "1 minute"),
      col("id_driver"))
      .agg(count(col("id")).alias("total_rejected_tours"),
        sum("tour_value").alias("total_loss"))
      .select("id_driver", "total_rejected_tours", "total_loss")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
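The tumbling-window aggregation can be sketched in plain Scala: bucket each event's epoch-millis timestamp to the start of its minute, then group by (window, driver) and count rejections / sum the lost value. `RideEvent` and `windowStart` are illustrative names, not from the repository:

```scala
// Illustrative stand-in for one parsed, already-rejected event
case class RideEvent(id: Int, eventDate: Long, tourValue: Double, idDriver: String)

// Start of the 1-minute tumbling window containing this timestamp
def windowStart(epochMillis: Long): Long = epochMillis - (epochMillis % 60000L)

// (window, driver) -> (total_rejected_tours, total_loss)
def driverPerformance(rejected: Seq[RideEvent]): Map[(Long, String), (Int, Double)] =
  rejected
    .groupBy(e => (windowStart(e.eventDate), e.idDriver))
    .map { case (key, es) => key -> (es.size, es.map(_.tourValue).sum) }
```

Spark evaluates the same grouping incrementally over the stream; this batch version only shows the per-window arithmetic.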



&lt;p&gt;Set a threshold of 3 rejections; if a driver exceeds it within the window, generate an alert event&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;val thresholdCrossedDF = driverPerformance.where(col("total_rejected_tours").gt(3))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Write this DataFrame to a Kafka Topic &lt;strong&gt;rejectalert&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;thresholdCrossedDF.selectExpr("CAST(id_driver AS STRING) AS key", "to_json(struct(*)) AS value")
      .writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers",
      prop.getProperty("output_host","localhost:9092"))
      .option("topic",prop.getProperty("output_topic","rejectalert"))
.outputMode("update".option("checkpointLocation",prop.getProperty("checkpoint","/tmp"))
      .start().awaitTermination()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the Complete Application : &lt;a href="https://github.com/snepar/spark-streaming-cab/blob/master/src/main/scala/rideevent/AlertGenerator.scala" rel="noopener noreferrer"&gt;https://github.com/snepar/spark-streaming-cab/blob/master/src/main/scala/rideevent/AlertGenerator.scala&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Using a consumer on the Kafka broker, subscribe to these alerts&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcr3m9ujzr95gz5szqb9f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcr3m9ujzr95gz5szqb9f.png" alt="Reject" width="800" height="380"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html" rel="noopener noreferrer"&gt;https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/rockthejvm/spark-streaming" rel="noopener noreferrer"&gt;https://github.com/rockthejvm/spark-streaming&lt;/a&gt;&lt;/p&gt;

</description>
      <category>apachespark</category>
      <category>dataengineering</category>
      <category>streaming</category>
      <category>realtimedata</category>
    </item>
    <item>
      <title>Apache Flink :: E-commerce Data Pipeline Usecase</title>
      <dc:creator>SNEHASISH DUTTA</dc:creator>
      <pubDate>Wed, 19 Jun 2024 07:35:15 +0000</pubDate>
      <link>https://forem.com/datasosneh/apache-flink-e-commerce-data-pipeline-usecase-3ha9</link>
      <guid>https://forem.com/datasosneh/apache-flink-e-commerce-data-pipeline-usecase-3ha9</guid>
      <description>&lt;p&gt;In today's data-driven world, real-time streaming is essential for businesses to gain instant insights and respond swiftly to market changes. Technologies like Apache Flink, Kafka, Postgres, Elasticsearch, and Kibana combine to create a powerful stack for processing, storing, searching, and visualizing streaming data in real time. &lt;/p&gt;

&lt;p&gt;This article explores the integration of these technologies to build a robust real-time streaming architecture for an e-commerce business.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt; :: &lt;a href="https://github.com/snepar/flink-ecom" rel="noopener noreferrer"&gt;https://github.com/snepar/flink-ecom&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Introducing the Building Blocks
&lt;/h2&gt;

&lt;p&gt;A concise overview of each component's purpose follows.&lt;br&gt;
&lt;strong&gt;Apache Kafka ::&lt;/strong&gt; A distributed streaming platform that excels in handling real-time data feeds. It enables the seamless transmission of data across systems, ensuring high-throughput, low-latency, fault-tolerant communication.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Apache Flink ::&lt;/strong&gt; A powerful real-time stream-processing engine (with APIs in Java/Scala/Python) for low-latency analysis &amp;amp; transformations on massive data streams.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PostgreSQL (Postgres) ::&lt;/strong&gt; An advanced open-source relational database known for its robustness, scalability, and SQL compliance. It supports complex queries, ACID transactions, and extensibility with custom functions and types.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Elasticsearch ::&lt;/strong&gt; A distributed, open-source search and analytics engine. It excels at full-text search, real-time data indexing, and querying. It is highly scalable and ideal for log and event data analysis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kibana ::&lt;/strong&gt; A powerful visualization tool for Elasticsearch. It provides real-time insights with interactive charts, graphs, and dashboards, making it easy to explore and analyze large datasets visually.&lt;/p&gt;
&lt;h2&gt;
  
  
  Data Flow Diagram (Architecture)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpqoje8fwheifbf7st0bf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpqoje8fwheifbf7st0bf.png" alt="Data Flow Diagram" width="800" height="293"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Goal
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data Ingestion:&lt;/strong&gt; using Kafka&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streaming Extract:&lt;/strong&gt; using Flink Source APIs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transform and Aggregate:&lt;/strong&gt; using Flink SerDe and transformations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Load:&lt;/strong&gt; using Flink Sink APIs to Postgres and Elasticsearch&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Visualise:&lt;/strong&gt; using Kibana&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Let Us Begin With the Infrastructure
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Install the following&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;Python 3, pip&lt;/li&gt;
&lt;li&gt;Scala 2.12, sbt&lt;/li&gt;
&lt;li&gt;Docker Desktop&lt;/li&gt;
&lt;li&gt;Docker Compose&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Refer to this repository for the complete setup &lt;br&gt;
&lt;a href="https://github.com/snepar/flink-ecom-infra" rel="noopener noreferrer"&gt;https://github.com/snepar/flink-ecom-infra&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Execute command &lt;code&gt;docker compose up -d&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run the Python file &lt;code&gt;main.py&lt;/code&gt; to generate events to Kafka&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx7t6gjvlbzmihybu6fyj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx7t6gjvlbzmihybu6fyj.png" alt="main-py-output" width="800" height="150"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Execute the Kafka consumer from your Docker service to verify that the events are flowing as expected&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;kafka-console-consumer --topic financial_transactions --bootstrap-server broker:9092&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb0rav3k1tjtw5rt9u1n1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb0rav3k1tjtw5rt9u1n1.png" alt="kafka-consumer-log" width="800" height="155"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Set Up Apache Flink
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;I have used Flink 1.16.3 &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run Flink Locally using -&amp;gt;&lt;br&gt;
&lt;a href="https://nightlies.apache.org/flink/flink-docs-stable/docs/try-flink/local_installation/" rel="noopener noreferrer"&gt;https://nightlies.apache.org/flink/flink-docs-stable/docs/try-flink/local_installation/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  The Flink Project Deep Dive
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;build.sbt&lt;/strong&gt; manages all the dependencies the project needs to connect to Kafka, Postgres, and Elasticsearch.&lt;br&gt;
&lt;a href="https://github.com/snepar/flink-ecom/blob/master/build.sbt" rel="noopener noreferrer"&gt;https://github.com/snepar/flink-ecom/blob/master/build.sbt&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Flink Kafka Connector&lt;/strong&gt; uses &lt;code&gt;org.apache.flink.connector.kafka.source.KafkaSource&lt;/code&gt;&lt;br&gt;
Example Snippet::&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;KafkaSource.builder[Transaction]()
      .setBootstrapServers("localhost:9092")
      .setTopics(topic)
      .setGroupId("ecom-group")
      .setStartingOffsets(OffsetsInitializer.earliest())
      .setValueOnlyDeserializer(new JSONValueDeserializationSchema())
      .build()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;JSON Deserializer&lt;/strong&gt; is used to deserialize the Kafka payload into &lt;code&gt;Transaction&lt;/code&gt;.&lt;br&gt;
It can be referred to here :: &lt;a href="https://github.com/snepar/flink-ecom/blob/master/src/main/scala/ecom/deserializer/JSONValueDeserializationSchema.scala" rel="noopener noreferrer"&gt;https://github.com/snepar/flink-ecom/blob/master/src/main/scala/ecom/deserializer/JSONValueDeserializationSchema.scala&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Important:&lt;/em&gt; since we are dealing with Scala case classes, register the &lt;code&gt;DefaultScalaModule&lt;/code&gt;:&lt;br&gt;
&lt;code&gt;import com.fasterxml.jackson.module.scala.DefaultScalaModule&lt;br&gt;
mapper.registerModule(DefaultScalaModule)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;See the data consumed from Kafka in Flink using&lt;/strong&gt; &lt;code&gt;transactionStream.print()&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbuhtaec09n746bxotoyu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbuhtaec09n746bxotoyu.png" alt="ide-output" width="800" height="283"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Flink-Postgres Sink&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;DDLs are Defined here : &lt;a href="https://github.com/snepar/flink-ecom/tree/master/src/main/scala/ecom/generators/DDL" rel="noopener noreferrer"&gt;https://github.com/snepar/flink-ecom/tree/master/src/main/scala/ecom/generators/DDL&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Aggregation Examples (Monthly)&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def writeToDBsalesPerMonth(transactionStream: DataStream[Transaction]) = {
    transactionStream.addSink(JdbcSink.sink(
      DDL.SalesPerMonthSQL.createTable,
      new JdbcStatementBuilder[Transaction] {
        // the statement here is the CREATE TABLE DDL itself, so there are no parameters to bind
        override def accept(t: PreparedStatement, u: Transaction): Unit = {
        }
      },
      execOptions,
      connOptions
    ))

    transactionStream.map(transaction =&amp;gt;
      {
        val transactionDate = new Date(transaction.transactionDate.getTime);
        val year = transactionDate.toLocalDate().getYear();
        val month = transactionDate.toLocalDate().getMonth().getValue();
        SalesPerMonth(year, month, totalSales = transaction.totalAmount)
      }
    ).keyBy(spm=&amp;gt;(spm.year,spm.month)).reduce((acc,curr) =&amp;gt; acc.copy(totalSales = acc.totalSales + curr.totalSales))
      .addSink(JdbcSink.sink(
        DDL.SalesPerMonthSQL.insertStmt,
        new JdbcStatementBuilder[SalesPerMonth] {
          override def accept(preparedStatement: PreparedStatement, salesPerMonth: SalesPerMonth): Unit = {
            preparedStatement.setInt(1, salesPerMonth.year)
            preparedStatement.setInt(2, salesPerMonth.month)
            preparedStatement.setDouble(3, salesPerMonth.totalSales)
          }
        },
        execOptions,
        connOptions
      ))

  }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
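The keyBy/reduce rollup above can be sketched in plain Scala without Flink. `Txn` is an illustrative stand-in for the repository's `Transaction` class:

```scala
import java.time.{Instant, ZoneOffset}

// Illustrative stand-ins for the repository's Transaction / SalesPerMonth
case class Txn(transactionDate: Long, totalAmount: Double)

// (year, month) -> running totalSales, mirroring keyBy((year, month)).reduce(_ + _)
def rollup(txns: Seq[Txn]): Map[(Int, Int), Double] =
  txns.groupBy { t =>
    val d = Instant.ofEpochMilli(t.transactionDate).atZone(ZoneOffset.UTC).toLocalDate
    (d.getYear, d.getMonthValue)
  }.map { case (ym, ts) => ym -> ts.map(_.totalAmount).sum }
```

Flink maintains the same running sum incrementally per key; this batch version only shows the key derivation and the fold.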


&lt;p&gt;&lt;strong&gt;Flink-Elastic Sink&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important :: When defining the emitter function, an explicit type annotation is required (the documentation examples do not mention this)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def writeToElastic(transactionStream: DataStream[Transaction]) = {

    val sink: ElasticsearchSink[Transaction] = new Elasticsearch7SinkBuilder[Transaction]
      .setHosts(new HttpHost("localhost", 9200, "http"))
      .setBulkFlushMaxActions(2)
      .setBulkFlushInterval(10L)
      .setEmitter[Transaction]{
        (transaction, context, indexer) =&amp;gt; {
          val mapper = new ObjectMapper()
          mapper.registerModule(DefaultScalaModule)
          val json: String = mapper.writeValueAsString(transaction)

          val indexRequest = Requests.indexRequest()
            .index("transactions")
            .id(transaction.transactionId)
            .source(json, XContentType.JSON);
          indexer.add(indexRequest)
        }
      }.build()

    transactionStream.sinkTo(sink)
  }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Flink Job Execution
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;KafkaPGESIntegrationEcom.scala&lt;/code&gt; : can be run directly from the IDE,
OR&lt;/li&gt;
&lt;li&gt;install a Flink cluster and deploy using &lt;code&gt;$ flink run -c ecom.KafkaPGESIntegrationEcom flink-ecom_2.12-0.1.jar&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzy49k2ol8dm8r9ihkppm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzy49k2ol8dm8r9ihkppm.png" alt="From_IDE_Run" width="800" height="283"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  A Glimpse into Postgres SQL
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;transactions&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu8t1dm5d9e7dcq6pqh7j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu8t1dm5d9e7dcq6pqh7j.png" alt="transactions" width="800" height="288"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;sales_per_category&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmgrqg58gslsdbbbamu23.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmgrqg58gslsdbbbamu23.png" alt="sales_per_category" width="778" height="514"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;sales_per_day&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs2lfyxdh2wfvg6jogfw6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs2lfyxdh2wfvg6jogfw6.png" alt="sales_per_day" width="726" height="254"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;sales_per_month&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3qt1v24cj0upag35dba4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3qt1v24cj0upag35dba4.png" alt="sales_per_month" width="718" height="184"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Data on Elasticsearch
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Indexing ::&lt;/strong&gt; here is the structure of the &lt;code&gt;transactions&lt;/code&gt; index on Elasticsearch.&lt;br&gt;
You can get this by running &lt;code&gt;GET transactions&lt;/code&gt; in the Dev Tools.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fanx6gi16qcuw3r9idgd6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fanx6gi16qcuw3r9idgd6.png" alt="Indexing" width="800" height="399"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can query them by running &lt;code&gt;GET transactions/_search&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpzg8v7ui49lwcbep47fb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpzg8v7ui49lwcbep47fb.png" alt="_search" width="800" height="325"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Reindexing Data on Elasticsearch
&lt;/h2&gt;

&lt;p&gt;To get a readable transaction date, we need to reindex into a different index. To reindex data on Elasticsearch, we use the &lt;code&gt;_reindex&lt;/code&gt; API.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;POST _reindex
{
 "source": {"index": "transactions"}, 
 "dest": {"index": "transaction_part1"},
 "script": {"source":"""
   ctx._source.transactionDate = new 
   Date(ctx._source.transactionDate).toString();
"""}
}

GET transaction_part1/_search
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkr6oa70o7zic19c5hfk0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkr6oa70o7zic19c5hfk0.png" alt="reindex" width="800" height="248"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;However, &lt;code&gt;toString()&lt;/code&gt; gives us little control over the format, so we need a more robust way to format the date.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;POST _reindex
{
"source": {"index": "transactions"}, 
"dest": {"index": "transaction_part2"},
"script": {"source": 
 """SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss");
  formatter.setTimeZone(TimeZone.getTimeZone('UTC'));
  ctx._source.transactionDate = formatter.format (new 
  Date(ctx._source.transactionDate));"""
 }
}

GET transaction_part2/_search
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
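The Painless script above uses the same `java.text` API that is available on any JVM, so the formatting logic can be checked standalone in Scala (a sketch, not code from the repository):

```scala
import java.text.SimpleDateFormat
import java.util.{Date, TimeZone}

// Same pattern and timezone as the _reindex script: epoch millis -> ISO-like string in UTC
def formatTransactionDate(epochMillis: Long): String = {
  val formatter = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss")
  formatter.setTimeZone(TimeZone.getTimeZone("UTC"))
  formatter.format(new Date(epochMillis))
}
```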



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs5f00fj4i9zvvgsg88lh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs5f00fj4i9zvvgsg88lh.png" alt="_reindex2" width="800" height="293"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Dashboarding in Real Time With Kibana
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Index on transaction_part2&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Creating Donut Chart&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fei6gd49qsbwf8w2oophh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fei6gd49qsbwf8w2oophh.png" alt="Donut" width="800" height="348"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Number Of Transactions&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpbsf3a9upm3kh5jbtlou.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpbsf3a9upm3kh5jbtlou.png" alt="Count" width="800" height="364"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Top Brands&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffli944wup25qfeozes38.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffli944wup25qfeozes38.png" alt="Brands" width="800" height="291"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Final Dashboard&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frs4tkv2nncukmz2ulcyt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frs4tkv2nncukmz2ulcyt.png" alt="Dashboard" width="800" height="177"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Thank You For Spending Some Time Here
&lt;/h2&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://medium.com/towards-data-engineering/real-time-streaming-at-scale-integrating-apache-flink-kafka-postgres-elasticsearch-kibana-and-132a7fd59e00" rel="noopener noreferrer"&gt;https://medium.com/towards-data-engineering/real-time-streaming-at-scale-integrating-apache-flink-kafka-postgres-elasticsearch-kibana-and-132a7fd59e00&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/rockthejvm/flink" rel="noopener noreferrer"&gt;https://github.com/rockthejvm/flink&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Image Reference&lt;/strong&gt; &lt;a href="https://www.youtube.com/watch?v=deepQRXnniM" rel="noopener noreferrer"&gt;https://www.youtube.com/watch?v=deepQRXnniM&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>dataengineering</category>
      <category>streaming</category>
      <category>scala</category>
      <category>apacheflink</category>
    </item>
  </channel>
</rss>
