<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Drew Schillinger</title>
    <description>The latest articles on Forem by Drew Schillinger (@doctorew).</description>
    <link>https://forem.com/doctorew</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F740976%2Fab288a59-9f30-4803-a48a-9dc7d9ddae28.png</url>
      <title>Forem: Drew Schillinger</title>
      <link>https://forem.com/doctorew</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/doctorew"/>
    <language>en</language>
    <item>
      <title>Public vs. Private LLMs: Another One Rides the Bus</title>
      <dc:creator>Drew Schillinger</dc:creator>
      <pubDate>Sat, 16 Aug 2025 03:02:26 +0000</pubDate>
      <link>https://forem.com/doctorew/public-vs-private-llms-another-one-rides-the-bus-37a7</link>
      <guid>https://forem.com/doctorew/public-vs-private-llms-another-one-rides-the-bus-37a7</guid>
      <description>&lt;p&gt;Weird Al popped into my head at lunch this week (not exactly unusual if we're being honest) while I was talking with a friend about the pros and cons of public vs. private LLMs. We work in wildly different industries, with vastly different AI experience — he's just starting to explore LLMs, and I've been building GPT-powered pipelines since the GPT-2 days — yet our perspectives on the tech, the people, and the problems lined up almost perfectly.&lt;/p&gt;

&lt;p&gt;Somewhere in our conversation about data sanitization and between bites (&lt;em&gt;Just Eat It!&lt;/em&gt;), the line from &lt;em&gt;Another One Rides The Bus&lt;/em&gt; popped into my head:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Ridin' in the bus down the boulevard, and the place was pretty packed… It was smellin' like a locker room // There was junk all over the floor…&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It fit our conversation perfectly. What happens when you don't want your proprietary dataset helping your competitors — or worse, seeing traces of their &lt;em&gt;smelly data&lt;/em&gt; (my friend's words) in your results?&lt;/p&gt;

&lt;p&gt;And then there's reliability. The bus can break down — sometimes for minutes, sometimes for hours — and you're stuck on the curb waiting for service to resume. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzmcwu1q1s3slw9sgkdc6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzmcwu1q1s3slw9sgkdc6.png" alt="ChatGPT Down" width="800" height="1000"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Or maybe the bus driver (read: OpenAI, Anthropic, Google, etc.) decides overnight to change the route entirely. One day you're happily riding GPT-4o, the next you're handed GPT-5 — a completely different personality — without so much as a stop-announcement. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flkrp5b53wop9ptb1nl9s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flkrp5b53wop9ptb1nl9s.png" alt="personalities of 4o vs 5" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That's not hypothetical; it happened this past week when GPT-5 replaced GPT-4o, and the uproar from developers and end-users alike was proof that we've all gotten a bit too comfortable trusting someone else's transit map.&lt;/p&gt;

&lt;p&gt;To quote a comment (&lt;a href="https://www.linkedin.com/in/jimdiroffii/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/jimdiroffii/&lt;/a&gt;) I read on LinkedIn:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Imagine if when Python 3 was released, the public wasn't allowed to test it, and all access to Python 2 was simultaneously restricted. It would have been calamitous."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When you're dependent on someone else's fleet, you inherit their scheduling, their maintenance priorities, and their budget decisions — even if those decisions wreck your carefully tuned workflows. In other words: you might be paying for a ticket, but you're not the one driving.&lt;/p&gt;

&lt;p&gt;Weird Al might have been singing about public transportation, but the analogy works frighteningly well for public Large Language Models (LLMs) like GPT, Gemini, Perplexity, Grok, or Claude: easy to hop on, zero maintenance (for us), and they usually get you where you need to go. Usually. But when &lt;em&gt;another one rides the bus&lt;/em&gt; — or a few million others do — the limitations of a shared ride become very clear.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Private LLMs: Owning the Car&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;If public LLMs are the bus, then private LLMs are your own set of wheels. You choose the route, pick the passengers, and set the speed — no surprise detours (unless a build fails or a dependency breaks), no mystery riders peeking into your data, and definitely no &lt;em&gt;smelly data&lt;/em&gt; stinking up the back seat.&lt;/p&gt;

&lt;p&gt;But car ownership isn't free. Running your own model — whether on-prem, in a private cloud, or on rented GPUs — means you're paying for the &lt;em&gt;car&lt;/em&gt; &lt;strong&gt;and&lt;/strong&gt; the &lt;em&gt;garage&lt;/em&gt;. The garage is the capital expenditure (or OpEx if you're renting) of the infrastructure: racks of servers, high-end GPUs that can run into the tens of thousands each (or $2–$5/hour per A100/H100 on AWS if you're renting &lt;a href="https://www.trgdatacenters.com/resource/aws-gpu-pricing/" rel="noopener noreferrer"&gt;source&lt;/a&gt;), the network backbone, and the cooling and power to keep it all alive.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fatc8y6yob63i3tis2avs.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fatc8y6yob63i3tis2avs.jpg" alt="Pricey cars" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then there's the &lt;em&gt;gas&lt;/em&gt;: training your models isn't just costly in dollars — it burns time, patience, and talent. Fine-tuning or pretraining requires huge datasets, long-running compute jobs, and often, iteration after iteration before you get something production-worthy.&lt;/p&gt;

&lt;p&gt;Owning the car also means you're both the &lt;strong&gt;driver&lt;/strong&gt; and the &lt;strong&gt;mechanic&lt;/strong&gt;. You handle the monitoring, patching, scaling, and security hardening. You plan the oil changes (retraining schedules) and replace the tires (deprecated libraries and outdated dependencies). You keep a constant eye on performance, guard against drift, and respond fast if something breaks down at 2 a.m.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flsj2ais6sz9cwfpyvdr2.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flsj2ais6sz9cwfpyvdr2.webp" alt="Car breakdowns" width="765" height="512"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And don't overlook the &lt;em&gt;opportunity cost&lt;/em&gt;. Every hour your engineers spend babysitting infrastructure and wrangling GPUs is an hour they're not building new customer-facing features — the stuff that actually differentiates your brand and makes money.&lt;/p&gt;

&lt;p&gt;If public LLMs sometimes feel like a noisy city bus at rush hour, private models are the quiet, climate-controlled road trip where you control the playlist. But that playlist comes with a service manual — and you're the one holding the wrench.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Choosing Your Ride&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Whether you're boarding the bus or grabbing the keys, the choice between public and private LLMs boils down to trade-offs. Public models are fast, cheap to board, and require zero maintenance — but you share the ride, the baggage, and the consequences when the driver changes the route without warning. Private models give you control over every turn of the wheel, every passenger, and every mile per hour — but you pay for the privilege in cost, complexity, and upkeep.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi19jgu8yet7yxhdm91ic.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi19jgu8yet7yxhdm91ic.jpg" alt="Busy town traffic" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Public vs. Private LLMs at a Glance&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Public LLMs – The Bus&lt;/strong&gt;&lt;br&gt;
[✓] Fast boarding — API key and go&lt;br&gt;
[✓] Maintenance-free — someone else handles scaling and patches&lt;br&gt;
[✓] Great for quick, low-sensitivity projects&lt;br&gt;
[!] Noisy neighbors can cause slowdowns&lt;br&gt;
[!] Shared baggage — your data may mix into the communal pool&lt;br&gt;
[!] Privacy leaks possible&lt;br&gt;
[!] Routes change without notice (model swaps, price hikes)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Private LLMs – The Car&lt;/strong&gt;&lt;br&gt;
[✓] Full control over architecture, data, and updates&lt;br&gt;
[✓] No smelly data from competitors&lt;br&gt;
[✓] Fine-tuned for your exact needs&lt;br&gt;
[✓] Predictable privacy boundaries&lt;br&gt;
[!] You’re the driver &lt;em&gt;and&lt;/em&gt; the mechanic&lt;br&gt;
[!] Gas isn’t cheap — GPUs, training costs, MLOps overhead&lt;br&gt;
[!] Longer on-ramp to production&lt;br&gt;
[!] Feature velocity can slow if you’re stuck under the hood&lt;/p&gt;




&lt;p&gt;In the end, it's about priorities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Need speed, flexibility, and low commitment?&lt;/strong&gt; Hop on the bus and enjoy the ride.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need control, privacy, and consistency?&lt;/strong&gt; Grab the keys and drive yourself.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Just remember: in both cases, you're not immune to road hazards. Public buses can get rerouted without notice, and private cars can break down if you skip maintenance. &lt;/p&gt;

&lt;p&gt;You could be sharing a ride that's &lt;em&gt;"smellin' like a locker room [with] junk all over the floor"&lt;/em&gt;, or you could spend far longer than expected training your own model.&lt;/p&gt;

&lt;p&gt;The trick is knowing which trade-offs you can live with — and which ones will leave you stranded by the side of the road, humming Weird Al while you wait for a tow.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdflnumrgcswi63k6o7we.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdflnumrgcswi63k6o7we.jpg" alt="Weird AI" width="728" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>llm</category>
      <category>chatgpt</category>
      <category>ai</category>
      <category>aiops</category>
    </item>
    <item>
      <title>Building a RAG Chatbot with LlamaIndex and eBay API Integration</title>
      <dc:creator>Drew Schillinger</dc:creator>
      <pubDate>Tue, 20 Aug 2024 21:52:12 +0000</pubDate>
      <link>https://forem.com/doctorew/building-a-rag-chatbot-with-llamaindex-and-ebay-api-integration-2kee</link>
      <guid>https://forem.com/doctorew/building-a-rag-chatbot-with-llamaindex-and-ebay-api-integration-2kee</guid>
      <description>&lt;p&gt;RAG (Retrieval-Augmented Generation) is all the rage. And there's a good reason why. Like so many others, I instinctively felt an air of excitement at the beginning of the internet. The Browser Wars, Java vs Mocha. And then again in 2007 when the iPhone led a paradigm shift to how, where, and when we consume media. Just as I do now,&lt;/p&gt;

&lt;p&gt;In the rapidly advancing field of AI, Retrieval-Augmented Generation (RAG) has become a crucial technique, enhancing the capabilities of large language models by integrating external knowledge sources. By leveraging RAG, you can build chatbots that generate responses informed by real-time data, ensuring both coherence and relevance. This guide will provide you with a step-by-step walkthrough of integrating the eBay API with LlamaIndex to develop your own RAG-powered chatbot.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Why RAG?&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;RAG enhances the capabilities of your chatbot by allowing it to access and retrieve information from external sources in real time. Instead of relying solely on pre-trained data, your chatbot can now query external APIs or databases to obtain the most relevant and up-to-date information. This ensures that the responses generated are not only accurate but also contextually relevant, reflecting the latest available data. It’s like moving from managing a static collection of DVDs or Blu-Rays to streaming on-demand content, where the latest information is always at your fingertips.&lt;/p&gt;
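&lt;p&gt;As a rough sketch of that retrieve-then-generate flow (with a toy in-memory corpus and a naive keyword retriever standing in for a real vector store — nothing here is LlamaIndex's actual API):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Hypothetical sketch of the RAG flow; the docs and retriever are stand-ins.
const docs = [
  "Charizard 1st edition sells for about $300",
  "Pikachu promo cards are common",
];

// Naive keyword retriever: return the doc sharing the most words with the query.
function retrieve(query: string): string {
  const words = query.toLowerCase().split(" ");
  let best = docs[0];
  let bestScore = -1;
  for (const doc of docs) {
    let score = 0;
    for (const w of words) {
      if (w.length &amp;gt; 2 &amp;amp;&amp;amp; doc.toLowerCase().includes(w)) score++;
    }
    if (score &amp;gt; bestScore) {
      bestScore = score;
      best = doc;
    }
  }
  return best;
}

// Augment the user's question with retrieved context before calling the LLM.
function buildPrompt(query: string): string {
  return "Context: " + retrieve(query) + "\n\nQuestion: " + query;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In production, the retriever queries a vector index (which LlamaIndex manages for you) rather than scanning an array, but the shape of the flow is the same.&lt;/p&gt;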

&lt;h4&gt;
  
  
  &lt;strong&gt;Step 1: Setting Up LlamaIndex&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;To kick off, you’ll need to set up LlamaIndex, a powerful tool that simplifies the integration of external data sources into your chatbot.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Installation&lt;/strong&gt;: Start by running the following command in your terminal:
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;code&gt;npx create-llama@latest&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;This command scaffolds out a Next.js project and walks you through the initial setup, including key concepts of RAG and LLMs like document scraping and multi-agent systems. It will provide sample pdfs to work off of. You'll want to remove this if you have an idea in mind, or expect to scrape the web, and want to host your app on a worker like Vercel or Cloudflare.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Configuration&lt;/strong&gt;: Once the setup is complete, navigate to the &lt;code&gt;llama.config.js&lt;/code&gt; file. Here, you’ll define the sources your chatbot will retrieve information from. For our purposes, we’ll be focusing on integrating the eBay API.&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Step 2: Integrating the eBay API&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Now, let's connect your chatbot to the vast repository of data available through the eBay API.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;OAuth Authentication&lt;/strong&gt;: eBay’s API requires OAuth for secure access. You’ll first need to generate an OAuth token. Here’s a quick function to handle this:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const eBayAuthToken = require("ebay-oauth-nodejs-client");

const ebayClientId = process.env.EBAY_API_KEY || "";
const ebayClientSecret = process.env.EBAY_CLIENT_SECRET || "";
const redirectUri = process.env.EBAY_REDIRECT_URI || ""; // Optional unless you're doing user consent flow

const authToken = new eBayAuthToken({
  clientId: ebayClientId,
  clientSecret: ebayClientSecret,
  redirectUri: redirectUri, // Optional unless you're doing user consent flow
});

let cachedToken: string | null = null;
let tokenExpiration: number | null = null;

export async function getOAuthToken(): Promise&amp;lt;string | null&amp;gt; {
  if (cachedToken &amp;amp;&amp;amp; tokenExpiration &amp;amp;&amp;amp; Date.now() &amp;lt; tokenExpiration) {
    return cachedToken;
  }

  try {
    const response = await authToken.getApplicationToken("PRODUCTION"); // or 'SANDBOX'
    let tokenData;

    if (typeof response === "string") {
      // Parse the response string into a JSON object
      tokenData = JSON.parse(response);
    } else {
      tokenData = response;
    }

    cachedToken = tokenData.access_token;
    tokenExpiration = Date.now() + tokenData.expires_in * 1000 - 60000; // Set expiration time

    return cachedToken;
  } catch (error) {
    console.error("Error obtaining OAuth token:", error);
    throw new Error("Failed to obtain OAuth token");
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Replace `YOUR_BASE64_ENCODED_CREDENTIALS` with your actual credentials.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Fetching Data&lt;/strong&gt;: With your token in hand, you can now query the eBay API to fetch relevant data. Here’s how you can fetch the price of a specific item:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import axios from 'axios';

export async function fetchCardPrices(searchTerm: string) {
  const token = await getOAuthToken();
  if (!token) throw new Error("No OAuth token available");

  const response = await axios.get(`https://api.ebay.com/buy/browse/v1/item_summary/search?q=${encodeURIComponent(searchTerm)}`, {
    headers: {
      'Authorization': `Bearer ${token}`,
    },
  });

  return response.data.itemSummaries.map(item =&amp;gt; ({
    title: item.title,
    price: item.price.value,
    currency: item.price.currency,
  }));
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This function returns an array of items with their titles and prices, which your chatbot can use to provide users with up-to-date information.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Step 3: Querying and Responding&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;With LlamaIndex and eBay integrated, it’s time to build the logic that allows your chatbot to query these sources and generate informed responses.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Extracting Search Terms&lt;/strong&gt;: Before querying the eBay API, you need to extract relevant search terms from the user's input. Here’s a helper function to do that:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;function extractSearchTerm(query: string): string {
   // Simple keyword extraction logic
      return query.replace(/.*price of/i, '').trim();
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
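&lt;p&gt;For example, given a question like "What is the price of a 1st edition Charizard?", the helper strips everything up to and including "price of":&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// extractSearchTerm as defined above, repeated so this snippet runs standalone
function extractSearchTerm(query: string): string {
  return query.replace(/.*price of/i, "").trim();
}

const term = extractSearchTerm("What is the price of a 1st edition Charizard?");
// term === "a 1st edition Charizard?"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A regex this simple passes trailing punctuation through and misses other phrasings, so treat it as a placeholder for a real intent parser or an LLM-based extraction step.&lt;/p&gt;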



&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Handling API Responses&lt;/strong&gt;: Finally, you can tie everything together by creating a route in your Next.js app to handle incoming requests, query eBay, and return the results:&lt;/p&gt;

&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import {NextApiRequest, NextApiResponse} from 'next';
import {fetchCardPrices} from './utils/fetchCardPrices';

export default async (req: NextApiRequest, res: NextApiResponse) =&amp;gt; {
    const query = req.query.q as string;
    const searchTerm = extractSearchTerm(query);
    const prices = await fetchCardPrices(searchTerm);
    res.status(200).json(prices);
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  &lt;strong&gt;Challenges and Solutions&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;As you build your RAG chatbot, you might encounter some common pitfalls. For instance, I use GPT-4o and Claude 3.5 Sonnet as coding interns, and while setting up LlamaIndex in a Python app, I asked GPT to help me debug; the code snippets it suggested were outdated. When a model's training data lags behind a fast-moving library, cross-check its suggestions against the current documentation.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;By following these steps, you’ve empowered your chatbot to fetch and utilize real-time data from eBay, enhancing its usefulness and relevance to your users. RAG is a powerful technique that unlocks a wide range of possibilities, and with the right tools and guidance, you can leverage it to create truly intelligent applications.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Deploying to Vercel: Challenges with Edge Functions&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;When it comes to deploying your application on Vercel, it's important to be aware of some limitations, particularly when using edge functions. Vercel's edge functions are designed for low-latency responses, but they do come with some constraints:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Unsupported Modules on Edge Functions&lt;/strong&gt;: Certain Node.js modules, like &lt;code&gt;sharp&lt;/code&gt; and &lt;code&gt;onnxruntime-node&lt;/code&gt;, are not supported in Vercel Edge Functions due to the limitations of the edge runtime environment. If your application relies on these modules, you'll need to ensure they are only used in serverless functions or consider replacing them with alternative solutions that are compatible with the edge environment.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Module Not Found Errors&lt;/strong&gt;: You might encounter "Module not found" errors during the build process, especially when modules are not correctly installed or are being used in the wrong environment. To resolve these issues, double-check that all necessary modules are installed and that they are configured to run in the appropriate environment (Edge vs. Serverless). It's crucial to separate the logic that requires these modules from the parts of your app that run on the edge.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
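&lt;p&gt;One mitigation — shown here for a Pages Router API route, and worth checking against the docs for your Next.js version — is to pin routes that depend on Node-only modules to the Node.js serverless runtime rather than the edge:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// pages/api/prices.ts
// Pin this route to the Node.js serverless runtime so Node-only modules
// like `sharp` stay out of the Edge bundle. ("nodejs" is the default for
// Pages API routes; "edge" opts into Edge Functions. App Router routes
// use `export const runtime = "nodejs"` instead.)
export const config = {
  runtime: "nodejs",
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;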

&lt;p&gt;While Vercel provides a powerful platform for deploying your applications with minimal overhead, being mindful of these challenges will save you from headaches during deployment and ensure your app runs smoothly in production.&lt;/p&gt;

</description>
      <category>rag</category>
      <category>ai</category>
      <category>typescript</category>
      <category>webdev</category>
    </item>
    <item>
      <title>OpenAI can detect ChatGPT-written content, but couldn't we all?</title>
      <dc:creator>Drew Schillinger</dc:creator>
      <pubDate>Thu, 08 Aug 2024 16:34:18 +0000</pubDate>
      <link>https://forem.com/doctorew/openai-can-detect-chatgpt-written-content-but-couldnt-we-all-17a8</link>
      <guid>https://forem.com/doctorew/openai-can-detect-chatgpt-written-content-but-couldnt-we-all-17a8</guid>
      <description>&lt;p&gt;&lt;a href="https://www.tomshardware.com/tech-industry/artificial-intelligence/openai-has-built-a-text-watermarking-method-to-detect-chatgpt-written-content-company-has-mulled-its-release-over-the-past-year" rel="noopener noreferrer"&gt;OpenAI has built a text watermarking method to detect ChatGPT-written content — company has mulled its release over the past year&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Ah, the mouse has become the cat, now able to catch people "sneaking a cheat" and detect when content is crafted by AI. (My rendition of how AI writes.) &lt;/p&gt;

&lt;p&gt;Although GPT has made significant strides from 3.5 turbo to 4o, I thought it was obvious which content was written by a person and which was artificially generated. In fact, I've had to tone down my writing over the last few years. &lt;/p&gt;

&lt;p&gt;I am in that 69% camp fearing being falsely accused of using AI to write for me. My emails have always leaned a bit to the pedantic, academic, and "fussy" side. Years back, I was beyond proud to achieve a Flesch-Kincaid grade level of 18. (I was an English major and have always been a logophile, after all.)&lt;/p&gt;

&lt;p&gt;Commentary aside, I am very interested to watch this play out if it is true that GPT has been embedding thumbprint patterns in its responses (and whether that thumbprint looks like "ah, the old thumbprint in the code...").&lt;/p&gt;

&lt;p&gt;On a side note, I have seen some job posters surreptitiously embed a prompt aimed at AI bots in their listings to flag automated applications. The debate surrounding this technology highlights the complex interplay between advancing AI capabilities and maintaining trust and transparency.&lt;/p&gt;

&lt;p&gt;As OpenAI continues to refine its approach, it remains crucial for users and developers alike to stay informed and critical, ensuring that the integration of AI into our daily lives is both ethical and beneficial. &lt;br&gt;
(Flesch-Kincaid grade level: 17.6)&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>chatgpt</category>
      <category>ai</category>
    </item>
    <item>
      <title>Building a Modern Full-Stack MonoRepo Application: A Journey with GraphQL, NextJS, Bun, and AWS</title>
      <dc:creator>Drew Schillinger</dc:creator>
      <pubDate>Sun, 26 Nov 2023 22:43:34 +0000</pubDate>
      <link>https://forem.com/doctorew/building-a-modern-full-stack-monorepo-application-a-journey-with-graphql-nextjs-bun-and-aws-4225</link>
      <guid>https://forem.com/doctorew/building-a-modern-full-stack-monorepo-application-a-journey-with-graphql-nextjs-bun-and-aws-4225</guid>
      <description>&lt;p&gt;Welcome to my exploration of building a modern full-stack application using a monorepo approach. With over 20 years of experience in web development, including roles at NBA.com, Adult Swim, and Goodr, I've had the opportunity to delve deeply into these various technologies. I just never took the time to write them up. And since I worked for companies with closed-source code and NDAs, it made sense to build something from the ground up!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://pwag.doctorew.com/rick-and-morty/" rel="noopener noreferrer"&gt;Ricks Associated with their Morties&lt;/a&gt; is the NextJS running on CloudFront by way of Amplify and relying on a GraphQL Apollo "server" running on a lambda @ the Edge, both in a &lt;a href="https://github.com/doctor-ew/graph-spa-bun" rel="noopener noreferrer"&gt;monorepo&lt;/a&gt; that at some point used bun and then switched back to yarn:&lt;/p&gt;

&lt;p&gt;This blog post is a journey through a project leveraging GraphQL, Apollo Server, AWS Amplify, Lambda @ Edge, monorepos, and Progressive Web Apps (PWAs), demonstrating practical applications and insights gained along the way. &lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Project Overview:&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F79ktoabxpwen5l973f8i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F79ktoabxpwen5l973f8i.png" alt="Ricks and Morties" width="800" height="342"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This project was conceived as a coding challenge and a demonstration of integrating modern web technologies into a cohesive application. The core of this project involves leveraging GraphQL to interact with the open-source &lt;a href="https://rickandmortyapi.com/" rel="noopener noreferrer"&gt;Rick and Morty API&lt;/a&gt;, a task I often set for potential hires. &lt;/p&gt;

&lt;p&gt;This task was not just a technical exercise but also an opportunity to showcase my expertise in federating data via GraphQL, developing PWAs with NextJS, and experimenting with Bun, especially in relation to Express, Docker, and Serverless technologies. And as I write this blog entry, I am expanding my GraphQL resolver to include the Morties from the PocketMorties game.&lt;/p&gt;

&lt;p&gt;One of the unique aspects of this project was creating associations between Ricks and their corresponding Morties within the API—a relationship that doesn't exist in the original data sources. This challenge was an exciting way to demonstrate the power of GraphQL and my experience in crafting complex data relationships.&lt;/p&gt;

&lt;p&gt;In addition to GraphQL, the project also focused on exploring the capabilities of Bun, particularly its speed in compilation and Docker compatibility. However, practical challenges led to a pivot back to Yarn, highlighting the importance of choosing the right tool for the job and balancing efficiency with stability.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deep Dives into Key Technologies
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Bun: Performance and Design Philosophy
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbt3yf8w6cjyicuzy1jun.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbt3yf8w6cjyicuzy1jun.png" alt="Bun's Speed" width="800" height="149"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Bun: Performance and Design Philosophy&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Bun emerged as a compelling choice for this project due to its significant differentiation from traditional package managers like npm, pnpm, and Yarn. It's built using Zig, a language known for its performance and safety, allowing for rapid task execution. Key features include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Speed&lt;/strong&gt;: Bun's core feature is its speed in various operations, from installing packages to running scripts, significantly outpacing traditional package managers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Concurrency Model&lt;/strong&gt;: Unlike NPM and Yarn, Bun adopts a concurrency model similar to languages like Go, enabling efficient resource utilization and faster I/O-bound tasks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Package Installation&lt;/strong&gt;: Bun's approach to package installation, involving a global cache and concurrent downloading, contrasts with the more linear methods of NPM and Yarn.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Despite these advantages, practical challenges, such as the &lt;a href="https://github.com/oven-sh/bun/issues/4947" rel="noopener noreferrer"&gt;GraphQL integration issue&lt;/a&gt;, led me back to Yarn, emphasizing the need for stability in web development.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. AWS Amplify: Simplifying Cloud Integration
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://camo.githubusercontent.com/b0221b7ebe904cfd5e7b338a9aa49dd8a001a472f74ca69b14da60dc4d1f6abd/68747470733a2f2f73332e616d617a6f6e6177732e636f6d2f6177732d6d6f62696c652d6875622d696d616765732f6177732d616d706c6966792d6c6f676f2e706e67" class="article-body-image-wrapper"&gt;&lt;img src="https://camo.githubusercontent.com/b0221b7ebe904cfd5e7b338a9aa49dd8a001a472f74ca69b14da60dc4d1f6abd/68747470733a2f2f73332e616d617a6f6e6177732e636f6d2f6177732d6d6f62696c652d6875622d696d616765732f6177732d616d706c6966792d6c6f676f2e706e67" alt="AWS Amplify" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;The Role of AWS Amplify in the Project&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;AWS Amplify played a critical role in this project, streamlining the deployment and management of cloud services, and offering an integrated approach for both backend and frontend development. Amplify's suite of tools and services provided a cohesive platform that significantly simplified complex cloud operations.&lt;/p&gt;

&lt;h5&gt;
  
  
  Streamlined Workflow and Integration
&lt;/h5&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Streamlined Workflow&lt;/strong&gt;: Amplify's ability to abstract the complexities of cloud infrastructure provided a more user-friendly approach compared to managing custom Docker scripts. Its automated CI/CD pipelines facilitated continuous integration and delivery, ensuring a smooth deployment process.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backend and Frontend Integration&lt;/strong&gt;: Amplify's integration with backend AWS services and frontend optimizations, like server-side rendering and edge deployment, significantly enhanced the application's performance and user experience.&lt;/li&gt;
&lt;/ul&gt;

&lt;h5&gt;
  
  
  Cloud Storage, Delivery, and Security
&lt;/h5&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cloud Storage and Delivery&lt;/strong&gt;: Amplify utilized AWS S3 for storing front-end assets, ensuring high durability and availability. The integration with Amazon CloudFront, AWS's CDN, allowed for efficient content delivery, reducing latency and improving load times globally.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SSL and Custom Domains&lt;/strong&gt;: The provision of SSL encryption and support for custom domains enhanced the security and brand identity of the application, making it more professional and trustworthy.&lt;/li&gt;
&lt;/ul&gt;

&lt;h5&gt;
  
  
  Amplify's Developer-Friendly Nature
&lt;/h5&gt;

&lt;p&gt;Amplify's developer-friendly interface, backed by extensive community support, made it a superior choice over custom Docker builds. This approach not only optimized the development process but also aligned with best practices for cloud-based applications.&lt;/p&gt;

&lt;h4&gt;
  
  
  Insights Gained from CloudFormation Designer Template
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6cf5k8hcuvcgsnirzfl1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6cf5k8hcuvcgsnirzfl1.png" alt="CloudFormation Designer Template" width="800" height="753"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The project's use of AWS CloudFormation, visualized in the CloudFormation Designer template, represents the infrastructure as code (IaC) aspect of our deployment. This template is a visual representation of the serverless architecture, encompassing various AWS services and configurations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;S3 Bucket Configuration&lt;/strong&gt;: It includes settings for an S3 bucket, essential for storing deployment packages.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lambda Functions and IAM Roles&lt;/strong&gt;: The template details Lambda functions and their associated IAM roles, outlining the permissions and policies necessary for secure and efficient function execution.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API Gateway&lt;/strong&gt;: It illustrates the setup of the API Gateway, which acts as the entry point for the application's backend, handling requests and routing them to the appropriate Lambda functions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Logging and Monitoring&lt;/strong&gt;: The inclusion of log groups and CloudWatch roles highlights the focus on monitoring and logging, crucial for maintaining application health and performance.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The CloudFormation Designer template is a testament to the robustness and scalability of the AWS infrastructure utilized in the project. It showcases the intricate setup of serverless components, emphasizing the project's commitment to leveraging AWS services for optimal performance and security.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. GraphQL: Enhancing API Interactions
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fv2%2Fresize%3Afit%3A4800%2Fformat%3Awebp%2F0%2A1zL-rFFqK9qtOkll" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fv2%2Fresize%3Afit%3A4800%2Fformat%3Awebp%2F0%2A1zL-rFFqK9qtOkll" alt="GraphQL" width="1000" height="350"&gt;&lt;/a&gt;&lt;br&gt;
The heart of this application lies in its use of GraphQL. The custom &lt;code&gt;rickAndMortyAssociations&lt;/code&gt; function within the GraphQL schema highlights the ability to create new relationships in existing APIs. This implementation demonstrates GraphQL's power in crafting flexible and efficient data queries, offering significant enhancements over traditional REST APIs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Case Study - The GraphQL Implementation
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3anqzrkr6xlexltkzt79.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3anqzrkr6xlexltkzt79.png" alt="GraphQL queries" width="800" height="216"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Crafting the GraphQL Queries
&lt;/h3&gt;

&lt;p&gt;In this project, the GraphQL queries were not just a tool for data retrieval; they were the linchpin for transforming and enriching data from diverse sources. The schema and resolvers were intricately designed to not only fetch data from the Rick and Morty API but to also incorporate data from the Pocket Morties game, demonstrating GraphQL's ability to federate disparate data sources.&lt;/p&gt;

&lt;h3&gt;
  
  
  A Refresher On How GraphQL Works
&lt;/h3&gt;

&lt;p&gt;At its core, GraphQL is more than just a query language; it's a powerful tool for API design and data federation. Unlike REST APIs, which require multiple requests to fetch different types of data, GraphQL allows for fetching all necessary data in a single request. This capability is particularly beneficial in situations like this project, where data from fundamentally different sources needs to be federated and presented in a unified format. This approach significantly improves performance, especially on slow mobile network connections, by reducing the number of required network requests.&lt;/p&gt;
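&lt;p&gt;As a concrete sketch of that single-request benefit (the field names below follow the public Rick and Morty API and are illustrative, not taken from this project's schema), one query can pull a character and its nested location in a single round trip, where REST would need two:&lt;/p&gt;

```graphql
# One request replaces GET /character/1 followed by GET /location/20
query {
  character(id: 1) {
    name
    status
    location {
      name
      dimension
    }
  }
}
```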

&lt;h3&gt;
  
  
  &lt;code&gt;rickAndMortyAssociations&lt;/code&gt; Functionality
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frb4ckpydtpmfyp73ph99.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frb4ckpydtpmfyp73ph99.png" alt="Associated Ricks and Morties" width="800" height="311"&gt;&lt;/a&gt;&lt;br&gt;
The &lt;code&gt;rickAndMortyAssociations&lt;/code&gt; function in the GraphQL schema was a custom implementation addressing the absence of direct associations between Ricks and Morties in the original API. This function showcases GraphQL's flexibility in data manipulation and presentation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Schema Definition&lt;/strong&gt;: The schema defines the &lt;code&gt;rickAndMortyAssociations&lt;/code&gt; type, specifying the structure and types of the data that can be queried.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resolvers&lt;/strong&gt;: The resolver logic processes these queries, combining data from the Rick and Morty API with the Pocket Morties game. This involves fetching relevant data and then applying custom logic to create meaningful associations between Ricks and their corresponding Morties.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Query Execution&lt;/strong&gt;: When a query for &lt;code&gt;rickAndMortyAssociations&lt;/code&gt; is made, the GraphQL server executes the resolver, returning a combined set of data that enhances the original API's capabilities with additional context and relationships.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The implementation of the &lt;code&gt;rickAndMortyAssociations&lt;/code&gt; function is a prime example of GraphQL's power to federate and enrich data from multiple sources, producing a more comprehensive and nuanced data set than the original API offers on its own.&lt;/p&gt;
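&lt;p&gt;The post doesn't reproduce the schema itself, so the type and field names below are hypothetical, but a minimal SDL for the steps above might look like:&lt;/p&gt;

```graphql
# Hypothetical shape: a Rick paired with the Morties associated to him
type RickAndMortyAssociation {
  rick: Character!
  morties: [Character!]!
}

type Query {
  # Resolver combines the Rick and Morty API with Pocket Morties data
  rickAndMortyAssociations(first: Int): [RickAndMortyAssociation!]!
}
```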

&lt;h2&gt;
  
  
  Navigating IaC Challenges: Terraform to Lambda Shift
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5sn3h5zpy69e7jug62n8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5sn3h5zpy69e7jug62n8.png" alt="Journey from Fargate via Terraform to Lambda via Serverless" width="800" height="111"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Challenges with Terraform in AWS Fargate and ElastiCache Setup
&lt;/h3&gt;

&lt;p&gt;The initial approach to infrastructure involved using Terraform for configuring AWS Fargate and ElastiCache. While I've been using Terraform since its initial alpha days, I still encounter specific challenges as AWS and HashiCorp add new features. In this case:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Health Checks in ECS and ELB&lt;/strong&gt;: Configuring health checks within Elastic Container Service (ECS) and Elastic Load Balancer (ELB) proved complex, crucial for ensuring service reliability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Service Discovery Issues&lt;/strong&gt;: Ensuring effective service discovery of ElastiCache within the same Virtual Private Cloud (VPC) as Fargate was intricate and required precise configuration.&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  Pivot to AWS Lambda
&lt;/h4&gt;

&lt;p&gt;Given these challenges, I made a strategic pivot to AWS Lambda for its simplicity and speed; at this stage, having something to show interviewers and prospective employers matters more than the specific technology I leverage.&lt;/p&gt;

&lt;p&gt;This shift was influenced by:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Simplicity and Efficiency&lt;/strong&gt;: Lambda's streamlined approach was better suited for our project's time constraints and requirements.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Statelessness and Resource Optimization&lt;/strong&gt;: The stateless nature of Lambda aligned with our needs, offering cost-effectiveness and efficient resource utilization.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Project Progression Focus&lt;/strong&gt;: The pivot to Lambda was a pragmatic decision to keep the project moving forward, balancing the complexities of Terraform with the functionalities required.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Future Plans with Terraform
&lt;/h3&gt;

&lt;p&gt;The journey with Terraform will be revisited in future updates, providing an opportunity to explore and document overcoming its initial challenges. This effort underscores a commitment to mastering complex IaC solutions and sharing these experiences.&lt;/p&gt;

&lt;h2&gt;
  
  
  Server vs. Server-Lambda: Dockerfile, Redis, and Lambda Enhancements
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb571cdhonkbenqlxtu2f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb571cdhonkbenqlxtu2f.png" alt="Server-Lambda" width="800" height="304"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Distinct Server Configurations: Local and Lambda
&lt;/h3&gt;

&lt;p&gt;In this project, I utilized two distinct server files to cater to different environments: &lt;code&gt;server.ts&lt;/code&gt; for local development and &lt;code&gt;server-lambda.ts&lt;/code&gt; for AWS Lambda deployment.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;code&gt;server.ts&lt;/code&gt; for Local Development&lt;/strong&gt;: This file configures a traditional server setup, optimized for local development and testing. It provides a streamlined and efficient process, free from the complexities of a serverless environment.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;code&gt;server-lambda.ts&lt;/code&gt; for AWS Lambda&lt;/strong&gt;: Tailored for deployment in a serverless architecture, this file includes specific adjustments and integrations, like Redis for efficient data caching, ensuring optimal performance and scalability in AWS Lambda.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Dockerfile Solution&lt;/strong&gt;: A Dockerfile was used to containerize the application, ensuring consistent environments and simplifying deployment. This approach aids in managing dependencies and environmental configurations across development and production stages.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The dual server file strategy reflects the project's adaptability, ensuring each environment's unique demands are met efficiently. This approach demonstrates the importance of tailoring the architecture to suit different deployment scenarios.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion and Reflections
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Embracing Challenges and Learning
&lt;/h3&gt;

&lt;p&gt;This journey through building a modern full-stack monorepo application has been as enlightening as it has been challenging. It reaffirmed my passion for digital architecture and the pursuit of innovative solutions in the web development realm.&lt;/p&gt;

&lt;h4&gt;
  
  
  Key takeaways from this project include:
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Adaptability in Technology Choices&lt;/strong&gt;: The need to pivot from Bun to Yarn and from Terraform to AWS Lambda highlighted the importance of flexibility in technology choices. It showed that while cutting-edge tools can offer significant advantages, sometimes established technologies provide the necessary stability and reliability.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Enhancing API Capabilities with GraphQL&lt;/strong&gt;: The use of GraphQL to enrich the Rick and Morty API showcased the power of this query language in creating efficient, flexible data interactions, going beyond the limitations of traditional REST APIs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Balancing Innovation with Practicality&lt;/strong&gt;: The project underlined the balance between embracing new technologies and ensuring practical, stable solutions, especially in a professional setting where reliability and maintainability are paramount.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Infrastructure as Code (IaC) Learning Curve&lt;/strong&gt;: The challenges faced with Terraform, and the subsequent switch to AWS Lambda, provided valuable insights into cloud infrastructure management, emphasizing the need for continuous learning and adaptability.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Server Configuration for Different Environments&lt;/strong&gt;: The use of separate server configurations for local development and AWS Lambda deployment highlighted the importance of environment-specific optimizations in software development.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Looking Ahead
&lt;/h2&gt;

&lt;p&gt;As I continue to explore and master various technologies, I plan to revisit some of the initial challenges, like those encountered with Terraform, and document these experiences. This ongoing journey not only contributes to my professional growth but also serves as a resource for others navigating similar paths.&lt;/p&gt;

&lt;p&gt;In summary, this project was a testament to the dynamic nature of web development and cloud infrastructure, where continuous learning, adaptability, and a pragmatic approach are key to success.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>graphql</category>
      <category>nextjs</category>
      <category>aws</category>
    </item>
    <item>
      <title>Building a Fargate API Server with Go, Gin, Docker, and AWS Copilot</title>
      <dc:creator>Drew Schillinger</dc:creator>
      <pubDate>Wed, 25 Oct 2023 16:16:42 +0000</pubDate>
      <link>https://forem.com/doctorew/building-a-fargate-api-server-with-go-gin-docker-and-aws-copilot-3iim</link>
      <guid>https://forem.com/doctorew/building-a-fargate-api-server-with-go-gin-docker-and-aws-copilot-3iim</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiaacyy8y5i76h8md9wxd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiaacyy8y5i76h8md9wxd.png" alt="GPT-Powered Translator with LangChain Hosted on Fargate" width="800" height="1164"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Hello fellow tech enthusiasts!&lt;/p&gt;

&lt;p&gt;As a self-taught engineer with over two decades of experience, starting back in the days of the browser wars, my journey has been driven by curiosity, innovation, and a passion for understanding why and how technology does its "magic". &lt;/p&gt;

&lt;p&gt;I chose to use Golang over Python not just because it's fun, but because it presented an opportunity to demystify its "magic" and deepen my understanding (even though I've been using it since it was in beta). This became especially relevant during a conversation with a respected leader who posed a seemingly simple question about Go's struct vs. interface. As a self-taught engineer who's gone toe-to-toe with imposter syndrome, I get sweaty palms from vocabulary questions like this. But I realized this was a great opportunity to break down these concepts and really understand them at a deeper level. &lt;/p&gt;

&lt;p&gt;(P.S. A struct is a composite data type that groups variables together under a single name, while an interface is a collection of method signatures that a type must implement. Understanding the difference between the two is crucial for writing clean, efficient Go code.) &lt;/p&gt;
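&lt;p&gt;That difference is easiest to see in a few lines of Go. This is a generic illustration (the &lt;code&gt;Speaker&lt;/code&gt; and &lt;code&gt;Robot&lt;/code&gt; names are invented for the example, not taken from the project):&lt;/p&gt;

```go
package main

// Speaker is an interface: a collection of method signatures.
// Any type with a matching Speak method satisfies it implicitly.
type Speaker interface {
	Speak() string
}

// Robot is a struct: a composite type grouping variables under one name.
type Robot struct {
	Name string
}

// Robot satisfies Speaker by implementing Speak.
func (r Robot) Speak() string {
	return "beep boop, I am " + r.Name
}

// announce accepts any Speaker, so it works with Robot today and with
// any future type that implements Speak, without changing this code.
func announce(s Speaker) string {
	return s.Speak()
}
```

&lt;p&gt;Structs hold data, while interfaces describe behavior; &lt;code&gt;announce&lt;/code&gt; depends only on the behavior.&lt;/p&gt;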

&lt;p&gt;LangChain was another exciting venture, and I am thrilled that Harrison Chase figured this one out, because chaining GPT-2 and GPT-3 calls in Python in 2021 was a pain in the keister! &lt;/p&gt;

&lt;p&gt;And then there's AWS Fargate, a game-changer in serverless container platforms. Having used Fargate at NBA Digital and NBATV to serve content to 50 million concurrent users at scale nightly, its efficiency and scalability are undeniable. &lt;/p&gt;

&lt;p&gt;Through all these experiences, my hope is to inspire fellow self-taught engineers, innovators, and anyone with a thirst for knowledge. Let's explore, learn, and innovate together! &lt;/p&gt;

&lt;p&gt;In this blog post, we'll walk through the process of building an API server using Go and the Gin framework, containerizing it with Docker, and deploying it to AWS Fargate using AWS Copilot. We'll also delve into the importance of environment management and testing. Let's get started.&lt;/p&gt;

&lt;p&gt;Please refer to the code in this git repo: &lt;a href="https://github.com/doctor-ew/go_skippy_lc" rel="noopener noreferrer"&gt;https://github.com/doctor-ew/go_skippy_lc&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Docker: Containerizing the Go Application
&lt;/h2&gt;

&lt;p&gt;Docker allows us to package our application and its dependencies into a container, ensuring consistent behavior across different environments. Let's break down the Dockerfile from the &lt;a href="https://github.com/doctor-ew/go_skippy_lc" rel="noopener noreferrer"&gt;go_skippy_lc repository&lt;/a&gt;:&lt;/p&gt;

&lt;h4&gt;
  
  
  1.1. Dockerfile
&lt;/h4&gt;

&lt;p&gt;&lt;code&gt;FROM golang:1.17-alpine&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt; This line specifies the base image for our Docker container. In this case, you're using the official Go image (&lt;code&gt;golang&lt;/code&gt;) with version &lt;code&gt;1.17&lt;/code&gt; based on the lightweight Alpine Linux distribution (&lt;code&gt;alpine&lt;/code&gt;). This image will have the Go runtime and tools pre-installed.&lt;/p&gt;




&lt;h4&gt;
  
  
  1.2. Dockerfile
&lt;/h4&gt;

&lt;p&gt;&lt;code&gt;WORKDIR /app&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt; This line sets the working directory inside the container to &lt;code&gt;/app&lt;/code&gt;. All subsequent commands in the Dockerfile (like &lt;code&gt;COPY&lt;/code&gt; or &lt;code&gt;RUN&lt;/code&gt;) will be executed in this directory. Essentially, this directory will be the root for our application inside the container.&lt;/p&gt;




&lt;h4&gt;
  
  
  1.3. Dockerfile
&lt;/h4&gt;

&lt;p&gt;&lt;code&gt;COPY go.mod .&lt;/code&gt;&lt;br&gt;&lt;code&gt;COPY go.sum .&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt; These lines copy the &lt;code&gt;go.mod&lt;/code&gt; and &lt;code&gt;go.sum&lt;/code&gt; files from our local machine (outside the container) to the current directory inside the container (&lt;code&gt;/app&lt;/code&gt;). These files are essential for Go's module system, ensuring that the correct dependencies are used when building our application.&lt;/p&gt;




&lt;h4&gt;
  
  
  1.4. Dockerfile
&lt;/h4&gt;

&lt;p&gt;&lt;code&gt;RUN go mod download&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt; This line runs the &lt;code&gt;go mod download&lt;/code&gt; command inside the container. This command fetches all the dependencies listed in &lt;code&gt;go.mod&lt;/code&gt; and &lt;code&gt;go.sum&lt;/code&gt;, ensuring they're available in the container for the build process.&lt;/p&gt;




&lt;h4&gt;
  
  
  1.5. Dockerfile
&lt;/h4&gt;

&lt;p&gt;&lt;code&gt;COPY . .&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt; This line copies everything from our current directory on our local machine (i.e., the root of our Go project) to the current directory inside the container (&lt;code&gt;/app&lt;/code&gt;). This ensures that all our application's source code and other necessary files are available inside the container.&lt;/p&gt;




&lt;h4&gt;
  
  
  1.6. Dockerfile
&lt;/h4&gt;

&lt;p&gt;&lt;code&gt;RUN go build -o ./out/myapi .&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt; This line runs the &lt;code&gt;go build&lt;/code&gt; command inside the container to compile our Go application. The &lt;code&gt;-o ./out/myapi&lt;/code&gt; flag specifies the output directory and name for the compiled binary. In this case, the binary will be named &lt;code&gt;myapi&lt;/code&gt; and will be located in the &lt;code&gt;/app/out/&lt;/code&gt; directory inside the container.&lt;/p&gt;




&lt;h4&gt;
  
  
  1.7. Dockerfile
&lt;/h4&gt;

&lt;p&gt;&lt;code&gt;CMD ["./out/myapi"]&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt; This line specifies the command that will be executed when the container starts. In this case, it's running the compiled Go application (&lt;code&gt;./out/myapi&lt;/code&gt;). The &lt;code&gt;CMD&lt;/code&gt; instruction allows the container to behave like an executable, meaning when you run the container, it will automatically start our Go application.&lt;/p&gt;
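&lt;p&gt;Assembled in order, the lines walked through above form the complete Dockerfile:&lt;/p&gt;

```dockerfile
FROM golang:1.17-alpine

WORKDIR /app

COPY go.mod .
COPY go.sum .
RUN go mod download

COPY . .
RUN go build -o ./out/myapi .

CMD ["./out/myapi"]
```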

&lt;h2&gt;
  
  
  2. Setting Up the Gin Server, Environment Management, and .gitignore
&lt;/h2&gt;

&lt;p&gt;Before diving into the code, it's essential to understand the importance of environment management and the role of &lt;code&gt;.gitignore&lt;/code&gt; in safeguarding sensitive information.&lt;/p&gt;

&lt;h3&gt;
  
  
  2.1. Environment Files and &lt;code&gt;.gitignore&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Before we delve into the code, it's essential to mention the importance of environment files and &lt;code&gt;.gitignore&lt;/code&gt;. Storing sensitive information, like API keys, in environment variables is a best practice. This ensures that these keys are not hard-coded into the application, reducing the risk of accidental exposure. The &lt;code&gt;.gitignore&lt;/code&gt; file ensures that certain files, like the &lt;code&gt;.env&lt;/code&gt; containing these keys, are not committed to version control, further safeguarding them.&lt;/p&gt;




&lt;h3&gt;
  
  
  2.2. Imports
&lt;/h3&gt;

&lt;p&gt;Here's a brief overview of each import:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Standard Library Imports&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;context&lt;/code&gt;: Provides a way to carry deadlines, cancellations, and other request-scoped values across API boundaries.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;fmt&lt;/code&gt;: Implements formatted I/O functions.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;log&lt;/code&gt;: Provides logging capabilities.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;net/http&lt;/code&gt;: Provides HTTP client and server implementations.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;os&lt;/code&gt;: Provides a platform-independent interface to operating system functionality.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;os/signal&lt;/code&gt;: Provides a way to intercept and act upon signals sent to the application.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;syscall&lt;/code&gt;: Contains an interface to the low-level operating system primitives.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;time&lt;/code&gt;: Provides functionality for measuring and displaying time.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Third-party Imports&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;github.com/gin-gonic/gin&lt;/code&gt;: Gin is a web framework for building APIs in Go. It's known for its performance and small memory footprint.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;github.com/joho/godotenv&lt;/code&gt;: A package to load environment variables from a &lt;code&gt;.env&lt;/code&gt; file.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;github.com/tmc/langchaingo/llms&lt;/code&gt;: The core LangChain Go package, defining common interfaces and call options for large language models.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;github.com/tmc/langchaingo/llms/openai&lt;/code&gt;: The OpenAI-specific implementation for LangChain Go.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;github.com/tmc/langchaingo/schema&lt;/code&gt;: Defines the schema or structure for the messages and responses with OpenAI.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;




&lt;h3&gt;
  
  
  2.3. Code Structure
&lt;/h3&gt;

&lt;h4&gt;
  
  
  a. Interface vs. Struct
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Interface&lt;/strong&gt;:&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;type Chat interface {
    Call(ctx context.Context, messages []schema.ChatMessage, options ...llms.CallOption) (*schema.AIChatMessage, error)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;Explanation&lt;/strong&gt;: An interface &lt;code&gt;Chat&lt;/code&gt; is defined, which any type must satisfy if it has a &lt;code&gt;Call&lt;/code&gt; method with the specified signature. This allows for flexibility and can be used to mock the OpenAI chat for testing or to use different implementations.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Struct&lt;/strong&gt;:&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;type RequestBody struct {
    Message string `json:"message"`
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;Explanation&lt;/strong&gt;: This struct &lt;code&gt;RequestBody&lt;/code&gt; defines the structure of the request body that the &lt;code&gt;/ask-skippy&lt;/code&gt; endpoint expects. The &lt;code&gt;json:"message"&lt;/code&gt; tag indicates that when this struct is unmarshaled from a JSON object, the &lt;code&gt;Message&lt;/code&gt; field corresponds to the &lt;code&gt;message&lt;/code&gt; key in the JSON.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
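&lt;p&gt;The &lt;code&gt;json:"message"&lt;/code&gt; tag mapping can be demonstrated in isolation. This sketch (the &lt;code&gt;parseRequest&lt;/code&gt; helper is invented for illustration) unmarshals a payload roughly the way Gin's binding does internally:&lt;/p&gt;

```go
package main

import "encoding/json"

// RequestBody mirrors the struct from the post: the `json:"message"` tag
// maps the lowercase "message" key in the JSON payload onto the exported
// Message field.
type RequestBody struct {
	Message string `json:"message"`
}

// parseRequest unmarshals a raw JSON body, roughly what Gin does when
// binding the /ask-skippy request.
func parseRequest(raw []byte) (RequestBody, error) {
	var body RequestBody
	err := json.Unmarshal(raw, &body)
	return body, err
}
```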




&lt;h3&gt;
  
  
  2.4. Functions
&lt;/h3&gt;

&lt;h4&gt;
  
  
  a. &lt;code&gt;srv&lt;/code&gt; Function
&lt;/h4&gt;

&lt;p&gt;This function sets up and starts the Gin server. It defines two endpoints: &lt;code&gt;/ask-skippy&lt;/code&gt; for POST requests and &lt;code&gt;/&lt;/code&gt; for GET requests. It also sets up graceful shutdown for the server when it receives an interrupt or termination signal.&lt;/p&gt;

&lt;h4&gt;
  
  
  b. &lt;code&gt;askSkippy&lt;/code&gt; Function
&lt;/h4&gt;

&lt;p&gt;This function interfaces with the OpenAI chat (or any other implementation that satisfies the &lt;code&gt;Chat&lt;/code&gt; interface). It sends a system message to set the context for the AI and then sends the user's message. It then waits for a response from the AI and returns it.&lt;/p&gt;

&lt;h4&gt;
  
  
  c. &lt;code&gt;main&lt;/code&gt; Function
&lt;/h4&gt;

&lt;p&gt;This is the entry point of the application. It loads the environment variables from the &lt;code&gt;.env&lt;/code&gt; file, retrieves the OpenAI API key, initializes the OpenAI chat, and then starts the server.&lt;/p&gt;




&lt;p&gt;This breakdown provides a high-level overview of the code's structure and functionality. Each section and function plays a crucial role in setting up the server, interfacing with OpenAI, and serving responses to the user.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Testing: Ensuring Our Application Works as Expected
&lt;/h2&gt;

&lt;p&gt;Testing is a crucial aspect of software development. It ensures that our application behaves as expected and helps catch issues early in the development process. Let's dive into the testing code for the &lt;code&gt;askSkippy&lt;/code&gt; function:&lt;/p&gt;

&lt;h3&gt;
  
  
  3.1. Imports
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Standard Library Imports&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;context&lt;/code&gt;: As before, this provides a way to carry request-scoped values.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;errors&lt;/code&gt;: Provides functions to manipulate errors.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;testing&lt;/code&gt;: The standard Go testing package.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Third-party Imports&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;github.com/stretchr/testify/mock&lt;/code&gt;: The &lt;code&gt;testify&lt;/code&gt; library's mocking package. It provides utilities to easily mock interfaces for testing.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;github.com/tmc/langchaingo/schema&lt;/code&gt;: Defines the schema or structure for the messages and responses with OpenAI.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;




&lt;h3&gt;
  
  
  3.2. MockChat Struct
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;type MockChat struct {
    mock.Mock
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Explanation&lt;/strong&gt;: The &lt;code&gt;MockChat&lt;/code&gt; struct embeds the &lt;code&gt;mock.Mock&lt;/code&gt; type from the testify library. This allows &lt;code&gt;MockChat&lt;/code&gt; to have all the methods and functionalities provided by &lt;code&gt;mock.Mock&lt;/code&gt;, enabling easy setup of expected method calls and their return values.&lt;/p&gt;




&lt;h3&gt;
  
  
  3.3. Test Functions
&lt;/h3&gt;

&lt;h4&gt;
  
  
  a. &lt;code&gt;TestAskSkippy&lt;/code&gt;
&lt;/h4&gt;

&lt;p&gt;This test function checks the "happy path" scenario where everything works as expected.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Mock Setup&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mockChat.On("Call", mock.Anything, mock.Anything, mock.Anything).Return(&amp;amp;schema.AIChatMessage{Content: "Mocked response"}, nil)
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Here, we're setting up the expectation that the &lt;code&gt;Call&lt;/code&gt; method of &lt;code&gt;mockChat&lt;/code&gt; will be invoked with any arguments (&lt;code&gt;mock.Anything&lt;/code&gt; is a placeholder that matches any value). When invoked, it should return a mocked AI chat message with the content "Mocked response" and no error.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Function Call&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;response, err := askSkippy(context.Background(), mockChat, "English", "Test message for Skippy")
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;We then call the &lt;code&gt;askSkippy&lt;/code&gt; function with our mock chat and check the response.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Assertions&lt;/strong&gt;: The test checks two things:&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. That there's no error.

&lt;ol&gt;
&lt;li&gt;That the response matches the expected "Mocked response".
&lt;/li&gt;
&lt;/ol&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h4&gt;


b. &lt;code&gt;TestAskSkippy_Error&lt;/code&gt;
&lt;/h4&gt;


&lt;p&gt;This test function checks the scenario where the &lt;code&gt;Call&lt;/code&gt; method returns an error.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Mock Setup&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mockChat.On("Call", mock.Anything, mock.Anything, mock.Anything).Return(nil, errors.New("Mocked error"))
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Here, we're setting up the expectation that the &lt;code&gt;Call&lt;/code&gt; method of &lt;code&gt;mockChat&lt;/code&gt; will be invoked and will return an error "Mocked error".&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Function Call&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;_, err := askSkippy(context.Background(), mockChat, "English", "Test error message for Skippy")
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;We then call the &lt;code&gt;askSkippy&lt;/code&gt; function with our mock chat and check for an error.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Assertion&lt;/strong&gt;: The test checks that an error is returned.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Potential Issues and Solutions:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;MockChat Implementation&lt;/strong&gt;: The &lt;code&gt;MockChat&lt;/code&gt; struct is defined, but its methods aren't. For the tests to work, the &lt;code&gt;MockChat&lt;/code&gt; needs to have a &lt;code&gt;Call&lt;/code&gt; method that uses the testify mock's functionalities. This method should look something like:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func (m *MockChat) Call(ctx context.Context, messages []schema.ChatMessage, options ...llms.CallOption) (*schema.AIChatMessage, error) {
    args := m.Called(ctx, messages, options)
    return args.Get(0).(*schema.AIChatMessage), args.Error(1)
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Dependencies&lt;/strong&gt;: Ensure that the testify library is installed. If not, it can be added using:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;go get github.com/stretchr/testify
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Integration with Main Code&lt;/strong&gt;: Ensure that the &lt;code&gt;Chat&lt;/code&gt; interface in the main code and the &lt;code&gt;MockChat&lt;/code&gt; in the test code are in sync. If the interface changes, the mock and tests need to be updated accordingly.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;This breakdown provides an overview of the testing code's structure and functionality. The tests are designed to validate the behavior of the &lt;code&gt;askSkippy&lt;/code&gt; function in both normal and error scenarios.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. AWS Fargate and Copilot: Deploying and Managing the Service
&lt;/h2&gt;

&lt;p&gt;AWS Fargate and Copilot provide a seamless experience for deploying and managing containerized applications on AWS without the need to manage the underlying infrastructure. With AWS Copilot, you can define, release, and manage services using simple CLI commands. Let's dive into the Copilot configuration files from the &lt;a href="https://github.com/doctor-ew/go_skippy_lc/tree/aws-copilot/copilot" rel="noopener noreferrer"&gt;go_skippy_lc/copilot directory&lt;/a&gt; to understand how the service and environment are set up:&lt;/p&gt;

&lt;h2&gt;
  
  
  4.1. Service Configuration: &lt;code&gt;copilot/go-skippy-lc/manifest.yml&lt;/code&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Overview:
&lt;/h3&gt;

&lt;p&gt;This file defines the configuration for the &lt;code&gt;go-skippy-lc&lt;/code&gt; service. It's a Load Balanced Web Service, which means it's a public-facing web service that's behind a load balancer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Breakdown:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;name &amp;amp; type&lt;/strong&gt;: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;name: go-skippy-lc&lt;/code&gt; specifies the name of the service.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;type: Load Balanced Web Service&lt;/code&gt; indicates the type of service.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;http&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;path: '/'&lt;/code&gt; specifies the path that the load balancer should forward requests to.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;alias&lt;/code&gt;: This is a list of domain names that should route to this service. This is useful for custom domain routing.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;healthcheck&lt;/code&gt;: (commented out) would specify a custom health check path for the service.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;image&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;build: Dockerfile&lt;/code&gt; specifies the Dockerfile to use for building the container image.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;port: 80&lt;/code&gt; is the port the container listens on.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;cpu, memory, count&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;These fields specify the resources for the ECS task: 256 CPU units, 512 MiB of memory, and exactly 1 task running at all times.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;exec&lt;/strong&gt;: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;exec: true&lt;/code&gt; enables ECS Exec, allowing us to run commands inside the running container.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;network&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;connect: true&lt;/code&gt; enables Service Connect for intra-environment traffic between services.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;storage&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;readonly_fs&lt;/code&gt;: (commented out) would limit the mounted root filesystems to read-only access.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;variables &amp;amp; secrets&lt;/strong&gt;: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;(Both commented out) would allow us to pass environment variables and secrets to our service.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;environments&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;(Commented out) would allow us to override any of the above values for specific environments.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
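&lt;p&gt;Assembled from the breakdown above, the service manifest might look like the sketch below; the alias domain and the commented-out values are placeholders, not the repository's actual settings:&lt;/p&gt;

```yaml
# Sketch of copilot/go-skippy-lc/manifest.yml, reconstructed from the
# breakdown above. skippy.example.com is a placeholder alias.
name: go-skippy-lc
type: Load Balanced Web Service

http:
  path: '/'
  alias: skippy.example.com
  # healthcheck: '/health'

image:
  build: Dockerfile
  port: 80

cpu: 256
memory: 512
count: 1
exec: true

network:
  connect: true

# storage:
#   readonly_fs: true

# variables:
#   LOG_LEVEL: info
# secrets:
#   OPENAI_API_KEY: /copilot/go-skippy-lc/secrets/openai-api-key

# environments:
#   test:
#     count: 1
```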

&lt;h2&gt;
  
  
  4.2. Environment Configuration: &lt;code&gt;copilot/environments/test/manifest.yml&lt;/code&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Overview:
&lt;/h3&gt;

&lt;p&gt;This file defines the configuration for the &lt;code&gt;test&lt;/code&gt; environment.&lt;/p&gt;

&lt;h3&gt;
  
  
  Breakdown:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;name &amp;amp; type&lt;/strong&gt;: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;name: test&lt;/code&gt; specifies the name of the environment.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;type: Environment&lt;/code&gt; indicates the type of configuration.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;network&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;(Commented out) would allow us to specify a custom VPC or configure how the VPC should be created.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;http&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;public&lt;/code&gt;: Specifies the configuration for the public load balancer in the environment.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;certificates&lt;/code&gt;: Lists the ARN of the SSL certificate to use with the load balancer.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;observability&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;container_insights: false&lt;/code&gt; specifies that container insights (for monitoring) should not be enabled for this environment.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
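&lt;p&gt;Put together, the environment manifest might look like this sketch; the certificate ARN is a placeholder:&lt;/p&gt;

```yaml
# Sketch of copilot/environments/test/manifest.yml, reconstructed from
# the breakdown above.
name: test
type: Environment

# network:
#   vpc:
#     id: 'vpc-12345'

http:
  public:
    certificates:
      - arn:aws:acm:us-east-1:123456789012:certificate/00000000-0000-0000-0000-000000000000

observability:
  container_insights: false
```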

&lt;h3&gt;
  
  
  Manual Step:
&lt;/h3&gt;

&lt;p&gt;I had to manually point the A Record to the ALB. This step is necessary because while AWS Copilot can automate the creation of resources like the ALB, the final step of updating DNS records to point to the ALB often requires manual intervention, especially if you're using a third-party DNS provider or have specific routing requirements.&lt;/p&gt;
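&lt;p&gt;If your DNS happens to live in Route 53, that manual step can be scripted with an alias-record UPSERT. Every ID and hostname below is a placeholder (the ALB's canonical hosted zone ID is region-specific and is reported by &lt;code&gt;aws elbv2 describe-load-balancers&lt;/code&gt;):&lt;/p&gt;

```shell
# Write the change batch. Substitute your own record name, the ALB's
# DNS name, and the ALB's region-specific canonical hosted zone ID.
cat > change.json <<'EOF'
{
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "skippy.example.com",
        "Type": "A",
        "AliasTarget": {
          "HostedZoneId": "Z35SXDOTRQ7X7K",
          "DNSName": "go-skippy-lc-1234567890.us-east-1.elb.amazonaws.com",
          "EvaluateTargetHealth": false
        }
      }
    }
  ]
}
EOF

# Then apply it (requires AWS credentials, so it is only shown here):
# aws route53 change-resource-record-sets \
#   --hosted-zone-id Z1EXAMPLEZONE \
#   --change-batch file://change.json
```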




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;As architects and engineers, we need to think through the atoms that compose the thing we're building. In this case, a robust and scalable API server encompasses everything from server setup and environment management to containerization and rigorous testing. I've shared this guide to provide a solid foundation for deploying a Go-based API server to AWS Fargate.&lt;/p&gt;

&lt;p&gt;My hope is that this not only serves as a practical guide but also inspires fellow self-taught engineers, innovators, and all curious minds. Let's continue to explore, learn, and push the boundaries of innovation together. &lt;/p&gt;

&lt;p&gt;Excelsior!&lt;/p&gt;

</description>
      <category>go</category>
      <category>aws</category>
      <category>tutorial</category>
      <category>docker</category>
    </item>
    <item>
      <title>Is it safe to cross the streams (er, cloud providers)?</title>
      <dc:creator>Drew Schillinger</dc:creator>
      <pubDate>Tue, 14 Dec 2021 19:05:52 +0000</pubDate>
      <link>https://forem.com/doctorew/is-it-safe-to-cross-the-streams-er-cloud-providers-4dbo</link>
      <guid>https://forem.com/doctorew/is-it-safe-to-cross-the-streams-er-cloud-providers-4dbo</guid>
      <description>&lt;p&gt;Hello, fellow technophiles!&lt;/p&gt;

&lt;p&gt;I'm tasked with rearchitecting an existing app for a start-up. There are 3 top priorities: &lt;u&gt;&lt;strong&gt;&lt;em&gt;speed&lt;/em&gt;&lt;/strong&gt;&lt;/u&gt; to rebuild a crippled tech stack and create needed functionality, &lt;em&gt;&lt;strong&gt;&lt;u&gt;stability&lt;/u&gt;&lt;/strong&gt;&lt;/em&gt; for said crippled stack while we woo investors and clients, and &lt;strong&gt;&lt;u&gt;&lt;em&gt;security&lt;/em&gt;&lt;/u&gt;&lt;/strong&gt;: build the new stack right (and right now).&lt;/p&gt;

&lt;p&gt;The former team mismanaged a DotNet stack on Azure (like, think diaper-fire mishmash of misused VMs and outdated SDKs/packages), and I have a ton of confidence and experience in the AWS space.&lt;/p&gt;

&lt;p&gt;If I were to say "let's take a 6-week hiatus," the other dev and I could rebuild what they have now with pgsql, DocumentDB, and some Lambdas in Node, Python, or Go. But the reality is we'll be using the current DotNet stack until we've moved over to the new one (and likely running both, since new features will be built in new Thing-X).&lt;/p&gt;

&lt;p&gt;All that said, what's y'all's experience in coming in new and rebuilding better?&lt;/p&gt;

&lt;p&gt;My concern in holding steady on Azure while building new on AWS mirrors Egon's warning about crossing the streams: "Try to imagine all life as you know it stopping instantaneously and every molecule in your body exploding at the speed of light."&lt;/p&gt;

&lt;p&gt;Thank you for any advice, or for at least listening as I get my thoughts out on e-paper.&lt;/p&gt;

&lt;p&gt;Excelsior!&lt;/p&gt;

&lt;p&gt;DoctorEw&lt;/p&gt;

</description>
      <category>azure</category>
      <category>aws</category>
      <category>refactorit</category>
      <category>discuss</category>
    </item>
  </channel>
</rss>
