<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Tomas Fernandez</title>
    <description>The latest articles on Forem by Tomas Fernandez (@tomfern).</description>
    <link>https://forem.com/tomfern</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F246163%2Fd3e35cfd-7503-407d-a461-cccb8657c26f.jpg</url>
      <title>Forem: Tomas Fernandez</title>
      <link>https://forem.com/tomfern</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/tomfern"/>
    <language>en</language>
    <item>
      <title>What is MCP Really?</title>
      <dc:creator>Tomas Fernandez</dc:creator>
      <pubDate>Mon, 03 Nov 2025 13:02:00 +0000</pubDate>
      <link>https://forem.com/semaphore/what-is-mcp-really-545l</link>
      <guid>https://forem.com/semaphore/what-is-mcp-really-545l</guid>
      <description>&lt;p&gt;Photo by Pavel Danilyuk: &lt;a href="https://www.pexels.com/photo/a-robot-holding-a-wine-8439094/" rel="noopener noreferrer"&gt;https://www.pexels.com/photo/a-robot-holding-a-wine-8439094/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt; has been described as “the USB-C for AI”. It’s a fitting analogy, but what does it &lt;em&gt;really&lt;/em&gt; mean? What is MCP?&lt;/p&gt;

&lt;p&gt;Large language models (LLMs) are incredibly capable, but they only know what they know. Once trained, an LLM can’t access real-time information or specialized systems until it connects to external tools.&lt;/p&gt;

&lt;p&gt;MCP provides a &lt;strong&gt;communication protocol&lt;/strong&gt; that lets models like GPT or Claude interact with any compatible tool or service. Instead of relying on proprietary APIs or one-off integrations, MCP introduces a &lt;strong&gt;shared language for interaction&lt;/strong&gt; between AIs (as clients) and software (as servers).&lt;/p&gt;

&lt;h2&gt;
  
  
  How the MCP Works
&lt;/h2&gt;

&lt;p&gt;At its core, MCP is a simple &lt;strong&gt;client–server model&lt;/strong&gt;. The large language model acts as the &lt;em&gt;client&lt;/em&gt;, while a &lt;em&gt;server&lt;/em&gt; provides one or more tools the AI can use. Communication between the two happens through JSON-RPC.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frysfuxy5noah5xs2keln.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frysfuxy5noah5xs2keln.jpeg" alt="Communication Protocol" width="800" height="1072"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;During initialization, the AI and server negotiate capabilities. Then the AI sends the server a &lt;code&gt;tools/list&lt;/code&gt; request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"jsonrpc"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"method"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tools/list"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"params"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"cursor"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"optional-cursor-value"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the server responds with a manifest of available tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"jsonrpc"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"result"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tools"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"get_weather"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Get current weather information for a location"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"inputSchema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"location"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"City name or zip code"&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"required"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"location"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"nextCursor"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"next-page-cursor"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The AI now knows which tools are available and can pick the one that matches the user's request. In our example, the AI would call the &lt;code&gt;get_weather&lt;/code&gt; tool:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"jsonrpc"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"method"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tools/call"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"params"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"get_weather"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"arguments"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"location"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Lisbon"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the MCP server responds with structured output, in this case the current weather in Lisbon:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"jsonrpc"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"result"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Current weather in Lisbon:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;Temperature: 32°C&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;Conditions: Partly cloudy"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"isError"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
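
&lt;p&gt;The whole exchange above can be sketched in a few lines of Python. This is a toy, illustrative dispatcher, not the real MCP SDK: the tool registry and the weather reply are made up, and a real server would speak JSON-RPC over stdio or HTTP rather than plain function calls.&lt;/p&gt;

```python
import json

# Toy stand-in for an MCP server's tool registry.
# The schema mirrors the manifest shown above; it is not the real MCP SDK.
TOOLS = [
    {
        "name": "get_weather",
        "description": "Get current weather information for a location",
        "inputSchema": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    }
]

def handle(request: dict) -> dict:
    """Dispatch one JSON-RPC request the way an MCP server would."""
    method = request["method"]
    if method == "tools/list":
        result = {"tools": TOOLS}
    elif method == "tools/call":
        # A real server would validate the arguments against inputSchema
        # and query an actual weather backend; here we fake the reply.
        location = request["params"]["arguments"]["location"]
        result = {
            "content": [
                {"type": "text", "text": f"Current weather in {location}: 32°C, partly cloudy"}
            ],
            "isError": False,
        }
    else:
        return {
            "jsonrpc": "2.0",
            "id": request["id"],
            "error": {"code": -32601, "message": "Method not found"},
        }
    return {"jsonrpc": "2.0", "id": request["id"], "result": result}

reply = handle(
    {
        "jsonrpc": "2.0",
        "id": 2,
        "method": "tools/call",
        "params": {"name": "get_weather", "arguments": {"location": "Lisbon"}},
    }
)
print(json.dumps(reply, ensure_ascii=False))
```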



&lt;h2&gt;
  
  
  Local vs Remote MCP Servers
&lt;/h2&gt;

&lt;p&gt;The easiest way to run an MCP server is on the same host as the client. For example, if I'm using OpenAI Codex or Claude Desktop, the client application can spawn an MCP server locally and communicate with it over &lt;strong&gt;standard input and output&lt;/strong&gt; (stdio).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw4y38u78vqvjilme2rog.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw4y38u78vqvjilme2rog.jpeg" alt="STDIO Mode" width="800" height="717"&gt;&lt;/a&gt;&lt;/p&gt;
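
&lt;p&gt;As a concrete example, Claude Desktop reads a JSON configuration file listing the stdio servers it should spawn. A minimal sketch, assuming a hypothetical &lt;code&gt;weather_server.py&lt;/code&gt; that implements MCP over stdio:&lt;/p&gt;

```json
{
  "mcpServers": {
    "weather": {
      "command": "python",
      "args": ["weather_server.py"]
    }
  }
}
```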

&lt;p&gt;For more complex setups, the MCP allows communication over HTTP and provides mechanisms for &lt;a href="https://modelcontextprotocol.io/specification/2025-06-18/basic/transports#sequence-diagram" rel="noopener noreferrer"&gt;authentication and authorization&lt;/a&gt;. These servers can require credentials, API keys, or tokens, depending on how sensitive their capabilities are.&lt;/p&gt;

&lt;h2&gt;
  
  
  The State of the Standard
&lt;/h2&gt;

&lt;p&gt;MCP is still an emerging standard. Introduced by Anthropic in late 2024 as an &lt;strong&gt;open specification&lt;/strong&gt;, it is being developed collaboratively by several players in the AI ecosystem.&lt;/p&gt;

&lt;p&gt;The specification is published at &lt;a href="https://modelcontextprotocol.io" rel="noopener noreferrer"&gt;modelcontextprotocol.io&lt;/a&gt;, and the work is happening in the open with input from AI companies, open-source developers, and infrastructure providers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;MCP represents a quiet but fundamental shift in how AI systems interact with the world.&lt;br&gt;&lt;br&gt;
It offers a &lt;strong&gt;shared, open standard&lt;/strong&gt;: a common language that any model and any tool can use to talk to each other.&lt;/p&gt;

&lt;p&gt;For developers, this means fewer one-off connectors and more reusable, interoperable systems. For users, it means AI assistants that can reach beyond their training data and tap into live information, files, or applications with precision and context.&lt;/p&gt;

&lt;p&gt;Thanks for reading, and happy building!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>programming</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Lightweight Containers With Docker and WebAssembly</title>
      <dc:creator>Tomas Fernandez</dc:creator>
      <pubDate>Mon, 18 Dec 2023 14:14:22 +0000</pubDate>
      <link>https://forem.com/semaphore/lightweight-containers-with-docker-and-webassembly-19fl</link>
      <guid>https://forem.com/semaphore/lightweight-containers-with-docker-and-webassembly-19fl</guid>
      <description>&lt;p&gt;🐳Did you know you can now build and run WebAssembly with Docker Desktop? That's the only tool you need to run the same binary file at native speed on any platform: Linux, Windows, Mac or any browser.&lt;/p&gt;

&lt;p&gt;WebAssembly, or Wasm, is increasingly relevant in software development. It's a portable binary code format designed for efficient and fast execution on any platform, including web browsers. &lt;/p&gt;

&lt;p&gt;Watch the hands-on tutorial:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/B5qNiyqd80U"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;WebAssembly is so significant that Solomon Hykes, the founder of Docker, remarked that if Wasm and WASI had existed in 2008, there would have been no need to create Docker.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Ftomfern%2Fblog-assets%2Fmaster%2Fimages%2Fc115766b5cd3cf91f855325b4e38db7dd6f18314990ea07682b409ed60599420.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Ftomfern%2Fblog-assets%2Fmaster%2Fimages%2Fc115766b5cd3cf91f855325b4e38db7dd6f18314990ea07682b409ed60599420.jpg"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Despite its advantages, WebAssembly has yet to reach the same level of adoption as Docker. Part of the challenge lies in the complexity of the tooling for Wasm, particularly in building, running, and debugging across different languages. For example, creating a Wasm binary involves installing a language-specific compiler toolchain. &lt;/p&gt;

&lt;p&gt;Docker offers a solution here, providing a reproducible and isolated build environment. Let's see how it works.&lt;/p&gt;

&lt;h2&gt;
  
  
  Docker and WebAssembly
&lt;/h2&gt;

&lt;p&gt;In July 2023, Docker introduced experimental support for WebAssembly, adding a new dimension to running Wasm applications. This integration brings several advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Simplified Process&lt;/strong&gt;: Using Docker for building and running Wasm applications reduces the learning curve by minimizing the required tools.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enhanced Portability&lt;/strong&gt;: Wasm containers don't require different builds for different machine architectures, simplifying deployment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consistent Builds&lt;/strong&gt;: Docker's isolated environment ensures consistent builds across various platforms.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration with Existing Tools&lt;/strong&gt;: Docker's compatibility with Docker Compose and Kubernetes facilitates complex deployments and scaling.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Ftomfern%2Fblog-assets%2Fmaster%2Fimages%2F15414c20da20bbd6f3d6828e6b813719b0de1ab72eb134ac688f2a3d1584c16a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Ftomfern%2Fblog-assets%2Fmaster%2Fimages%2F15414c20da20bbd6f3d6828e6b813719b0de1ab72eb134ac688f2a3d1584c16a.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Building a Wasm container
&lt;/h2&gt;

&lt;p&gt;Let's start by creating a new project:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="nv"&gt;$ &lt;/span&gt;cargo new docker-wasm-demo-rust


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The default project is a "Hello, World" application printing to console, perfect as a first step. &lt;/p&gt;

&lt;p&gt;To build and run it, we use cargo:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="nv"&gt;$ &lt;/span&gt;cargo build &lt;span class="nt"&gt;--release&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;cargo run
Hello, world!


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;But this is a native binary, not WebAssembly. To compile a Wasm target, let's install the WebAssembly toolchain. This command requires that you have installed &lt;a href="https://www.rust-lang.org/tools/install" rel="noopener noreferrer"&gt;rustup&lt;/a&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="nv"&gt;$ &lt;/span&gt;rustup target add wasm32-wasi


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Next, we can build a Wasm target with:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="nv"&gt;$ &lt;/span&gt;cargo build &lt;span class="nt"&gt;--target&lt;/span&gt; wasm32-wasi &lt;span class="nt"&gt;--release&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;We can't run this directly from the command line unless we install some runtime like &lt;a href="https://wasmtime.dev/" rel="noopener noreferrer"&gt;wasmtime&lt;/a&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="nv"&gt;$ &lt;/span&gt;wasmtime target/wasm32-wasi/release/docker-wasm-demo-rust.wasm
Hello, world!


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This works, but there's a lot of tooling involved. We need the Rust compilers, the Wasm toolchain, and some runtime to test it. We can streamline all this with Docker.&lt;/p&gt;

&lt;h2&gt;
  
  
  Running Wasm in Docker
&lt;/h2&gt;

&lt;p&gt;Before using Docker with WebAssembly, we need to ensure we have the latest version of Docker Desktop installed and then enable containerd and Wasm support in the options:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Ftomfern%2Fblog-assets%2Fmaster%2Fimages%2F9fa02598b3a277ddafe6050e82940f29071088c2e97d51d497b06a2ec854348d.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Ftomfern%2Fblog-assets%2Fmaster%2Fimages%2F9fa02598b3a277ddafe6050e82940f29071088c2e97d51d497b06a2ec854348d.jpg"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then we create a simple Dockerfile that copies the Wasm binary into the container:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;

&lt;span class="c"&gt;# Dockerfile&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; scratch&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; target/wasm32-wasi/release/docker-wasm-demo-rust.wasm /hello.wasm&lt;/span&gt;
&lt;span class="k"&gt;ENTRYPOINT&lt;/span&gt;&lt;span class="s"&gt; [ "/hello.wasm" ]&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Now we can build the image; note that in this case we must target the &lt;code&gt;wasi/wasm&lt;/code&gt; platform:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="nv"&gt;$ &lt;/span&gt;docker build &lt;span class="nt"&gt;--platform&lt;/span&gt; wasi/wasm &lt;span class="nt"&gt;-t&lt;/span&gt; hello-wasm &lt;span class="nb"&gt;.&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If we check the Images tab in Docker Desktop, we should see the new image with a WASM badge, indicating that it contains WebAssembly rather than a regular container image:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Ftomfern%2Fblog-assets%2Fmaster%2Fimages%2Fc36b02d453eeed6aee73c5082742370db711eb870870e8a6619fbc23f3f8cc37.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Ftomfern%2Fblog-assets%2Fmaster%2Fimages%2Fc36b02d453eeed6aee73c5082742370db711eb870870e8a6619fbc23f3f8cc37.jpg"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To run the image, we use the familiar &lt;code&gt;docker run&lt;/code&gt; with two extra flags:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;--runtime=io.containerd.wasmedge.v1&lt;/code&gt; selects one of the Wasm runtimes supported by Docker Desktop.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--platform=wasi/wasm&lt;/code&gt; tells Docker we want to run the Wasm image. Without it, Docker fails to find the image.&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="nv"&gt;$ &lt;/span&gt;docker run &lt;span class="nt"&gt;--runtime&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;io.containerd.wasmedge.v1 &lt;span class="nt"&gt;--platform&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;wasi/wasm hello-wasm


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
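
&lt;p&gt;The same flags can be captured in a Compose file, which is handy once you have more services alongside the Wasm container. A minimal sketch, assuming the &lt;code&gt;hello-wasm&lt;/code&gt; image built above:&lt;/p&gt;

```yaml
# compose.yaml -- minimal sketch; assumes the hello-wasm image built above
services:
  hello:
    image: hello-wasm
    platform: wasi/wasm
    runtime: io.containerd.wasmedge.v1
```
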
&lt;h2&gt;
  
  
  Building Wasm with Docker
&lt;/h2&gt;

&lt;p&gt;We can take the process a step further by using Docker to build the image itself. This lets us run the build in a clean, shareable environment, making it easier to run the build stage in CI/CD.&lt;/p&gt;

&lt;p&gt;The following 2-stage Dockerfile builds and creates the Wasm image:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The first stage uses the Rust base image to build the Wasm binary inside the container.&lt;/li&gt;
&lt;li&gt;The second stage copies the Wasm binary from the first stage and creates the Wasm image.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;

&lt;span class="c"&gt;# Dockerfile&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;--platform=$BUILDPLATFORM rust:1.74&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;build&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;rustup target add wasm32-wasi
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; /build
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /build&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; Cargo.toml .&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; src ./src&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;cargo build &lt;span class="nt"&gt;--target&lt;/span&gt; wasm32-wasi &lt;span class="nt"&gt;--release&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;chmod &lt;/span&gt;a+x /build/target/wasm32-wasi/release/docker-wasm-demo-rust.wasm

&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; scratch&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --link --from=build /build/target/wasm32-wasi/release/docker-wasm-demo-rust.wasm /hello.wasm&lt;/span&gt;
&lt;span class="k"&gt;ENTRYPOINT&lt;/span&gt;&lt;span class="s"&gt; [ "/hello.wasm" ]&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Let's build the image again:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="nv"&gt;$ &lt;/span&gt;docker build &lt;span class="nt"&gt;--platform&lt;/span&gt; wasi/wasm &lt;span class="nt"&gt;-t&lt;/span&gt; hello-wasm &lt;span class="nb"&gt;.&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;And then run it:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="nv"&gt;$ &lt;/span&gt;docker run &lt;span class="nt"&gt;--runtime&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;io.containerd.wasmedge.v1 &lt;span class="nt"&gt;--platform&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;wasi/wasm hello-wasm


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;As you can see, this new Docker feature lets us work with Wasm without needing anything other than Docker Desktop on our machines. Neat!&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;WebAssembly and Docker make an interesting combination. At the end of the day, having more options as developers is always good news. Be sure to check the YouTube tutorial above for a more in-depth explanation.&lt;/p&gt;

&lt;p&gt;Thanks for reading, and happy building!&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>containers</category>
      <category>webassembly</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>State of DevOps 2023 Report: Key Findings and Insights</title>
      <dc:creator>Tomas Fernandez</dc:creator>
      <pubDate>Thu, 14 Dec 2023 17:39:36 +0000</pubDate>
      <link>https://forem.com/semaphore/state-of-devops-2023-report-key-findings-and-insights-hl</link>
      <guid>https://forem.com/semaphore/state-of-devops-2023-report-key-findings-and-insights-hl</guid>
      <description>&lt;p&gt;The much-anticipated State of DevOps 2023 report is finally out. Developer by the &lt;a href="https://dora.dev/"&gt;DORA team at Google&lt;/a&gt;, this annual report is a synthesis of trends and insights collected from professionals and organizations working in the realm of DevOps. &lt;/p&gt;

&lt;p&gt;Key takeaways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The ramifications of AI on productivity.&lt;/li&gt;
&lt;li&gt;What factors contribute to job satisfaction and burnout.&lt;/li&gt;
&lt;li&gt;What matters when choosing a cloud.&lt;/li&gt;
&lt;li&gt;The non-linear nature of investing in reliability and SRE practices.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I've published a video covering all the key insights. Scroll down to see a summary.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/toktof3m78E"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Here’s a summarized look at some of the major takeaways. You can download the report for free from Google:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ShGR6n4q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/be1c0af77bc3c4f271c0b67da6d3140c0a91a10da1c190944123dd6d11e51dec.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ShGR6n4q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/be1c0af77bc3c4f271c0b67da6d3140c0a91a10da1c190944123dd6d11e51dec.jpeg" alt="" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;📕 &lt;a href="https://cloud.google.com/devops/state-of-devops/"&gt;2023 State of DevOps Report&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Key findings
&lt;/h2&gt;

&lt;p&gt;From the outset, the report presented several new areas of focus. On page 4, the spotlight was on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The pivotal role documentation plays in the ecosystem.&lt;/li&gt;
&lt;li&gt;How underrepresented groups are at a higher risk of burnout.&lt;/li&gt;
&lt;li&gt;The growing emphasis on prioritizing user feedback and needs.&lt;/li&gt;
&lt;li&gt;The role of the cloud in boosting productivity and job satisfaction.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--kBICin1E--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/c1d218dc27680ace75bb027736dace6b687b8fb59eabeedf6d06337430827467.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--kBICin1E--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/c1d218dc27680ace75bb027736dace6b687b8fb59eabeedf6d06337430827467.png" alt="" width="800" height="1035"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Page 6 of the report outlines key organizational outcomes like enhancing productivity and job satisfaction while curbing burnout.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--CI8c1CjJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/95ed8ddc567f61267ec63e6e14a58c91d63b3bbdfd0214b25a11630e99721887.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--CI8c1CjJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/95ed8ddc567f61267ec63e6e14a58c91d63b3bbdfd0214b25a11630e99721887.png" alt="" width="800" height="1035"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Prioritizing Users
&lt;/h2&gt;

&lt;p&gt;One of the standout findings is how tuning into users during development can amplify performance metrics. Whether they are customers, developers, analysts, or testers, focusing on the needs of our users yields a substantial increase in performance and reliability across the board.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--NaRj75NR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/35ab6871899600551837fa19bdc99cd7e19ee06f786067acd8015998473ed265.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--NaRj75NR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/35ab6871899600551837fa19bdc99cd7e19ee06f786067acd8015998473ed265.jpg" alt="" width="543" height="874"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A user-centric approach pays off in several ways: it fosters an environment conducive to enhanced performance, CI/CD, job satisfaction, and productivity, and it also helps mitigate burnout.&lt;/p&gt;

&lt;h2&gt;
  
  
  The technical capabilities that matter
&lt;/h2&gt;

&lt;p&gt;New technologies call for a refreshed set of metrics. The results show that the impact of AI on team performance was less pronounced than expected. Instead, more foundational capabilities like CI/CD, robust documentation, trunk-based development, and rapid code reviews emerged as stronger pillars of performance.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ojCR_qhm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/c1692144f7cf2028599b1799a8c8a05ffd7076f48db0ccf0e917563e38f79a54.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ojCR_qhm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/c1692144f7cf2028599b1799a8c8a05ffd7076f48db0ccf0e917563e38f79a54.png" alt="" width="800" height="1035"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The power of flexibility
&lt;/h2&gt;

&lt;p&gt;The flexibility of infrastructure directly correlates with performance, with most organizations turning to the cloud for this adaptability. However, the type of cloud chosen holds significance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Private clouds improve operational and team performance.&lt;/li&gt;
&lt;li&gt;Public clouds enhance organizational and team performance at the expense of operational and software delivery effectiveness. This dip in performance may be due to teams shifting to the cloud and encountering unfamiliar tools.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--q_0Vdmtl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/750591d156ebe5eb5afb009280c8d2fa51b521de1253f10c2d6c4f184079ee38.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--q_0Vdmtl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/750591d156ebe5eb5afb009280c8d2fa51b521de1253f10c2d6c4f184079ee38.jpg" alt="" width="800" height="194"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Documentation: an amplifier of technical capabilities
&lt;/h2&gt;

&lt;p&gt;State of DevOps 2023 underscored the immense value of quality documentation. Beyond being foundational, it magnifies other technical capabilities, with the potential to boost performance by nearly 13x when paired with trunk-based development. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--eP3kD3aT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/487003797011057e01e533f674c5acaec22b256f976c41bde5d7157ca6f7bf6f.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--eP3kD3aT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/487003797011057e01e533f674c5acaec22b256f976c41bde5d7157ca6f7bf6f.jpg" alt="" width="536" height="498"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In addition, well-maintained documentation positively influences the human side of development, elevating job satisfaction and productivity while reducing burnout.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--CuJY051F--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/e8a8c84c48c020eccef469661df13386c6c9f305d593b110f2661781978a3ae9.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--CuJY051F--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/e8a8c84c48c020eccef469661df13386c6c9f305d593b110f2661781978a3ae9.jpg" alt="" width="548" height="270"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Cultivating a Generative Culture
&lt;/h2&gt;

&lt;p&gt;The idea of a generative culture has been a recurring theme in the State of DevOps reports. Rooted in the studies of American sociologist Ron Westrum, a generative culture paves the way for innovation, higher productivity, and a more fulfilling work environment. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--qpgOVv0L--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/9bfe419a53c003b4d311db6fbbbc22aaafe65968f7f94070ec64a0cdef784862.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--qpgOVv0L--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/9bfe419a53c003b4d311db6fbbbc22aaafe65968f7f94070ec64a0cdef784862.png" alt="" width="800" height="487"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The data confirms that a generative culture is a robust indicator of high performance.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--dC7D5ZVx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/0633d03325a18d9703904b25768dd9b41580af06b5cc416201ae68158c678f86.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--dC7D5ZVx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/0633d03325a18d9703904b25768dd9b41580af06b5cc416201ae68158c678f86.jpg" alt="" width="800" height="501"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;I have only scratched the surface of the insights that State of DevOps 2023 reveals. I highly recommend &lt;a href="https://cloud.google.com/devops/state-of-devops/"&gt;downloading the report&lt;/a&gt; and checking it out yourself.&lt;/p&gt;

&lt;p&gt;Thanks for reading!&lt;/p&gt;

</description>
      <category>devops</category>
      <category>analytics</category>
      <category>productivity</category>
      <category>devex</category>
    </item>
    <item>
      <title>Function Calling: Integrate Your GPT Chatbot With Anything</title>
      <dc:creator>Tomas Fernandez</dc:creator>
      <pubDate>Wed, 09 Aug 2023 14:02:14 +0000</pubDate>
      <link>https://forem.com/semaphore/function-calling-integrate-your-gpt-chatbot-with-anything-3l5d</link>
      <guid>https://forem.com/semaphore/function-calling-integrate-your-gpt-chatbot-with-anything-3l5d</guid>
      <description>&lt;p&gt;&lt;em&gt;Yes we do!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Imagine creating an AI assistant to which you can say something like, "Book me the earliest reservation for the nearest Thai restaurant and update my calendar." Language models continue to push boundaries and evolve. OpenAI, the company behind ChatGPT, recently introduced a powerful new feature called &lt;a href="https://openai.com/blog/function-calling-and-other-api-updates"&gt;function calling&lt;/a&gt; in their GPT models. Function calling simplifies the creation of chatbots that communicate with external tools and APIs, opening up a new realm of possibilities for AI-powered applications.&lt;/p&gt;

&lt;p&gt;In this article, we will delve into the concept of function calling, its implications, and its transformative impact on how we interact with AI systems by creating &lt;strong&gt;NewsGPT&lt;/strong&gt;, a chatbot that brings you breaking news worldwide.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--8_dKoa-X--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/1a84c9b54df40126bedc83e448c159af74579b9bb050c18541e610747643afbc.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--8_dKoa-X--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/1a84c9b54df40126bedc83e448c159af74579b9bb050c18541e610747643afbc.jpg" alt="Example run of NewsGPT. A command line interface shows the user asking, &amp;quot;What's the latest news in the US?&amp;quot; and receiving a list of five articles with links to the journals or sources." width="800" height="684"&gt;&lt;/a&gt;&lt;br&gt;
Example run of NewsGPT. A command line interface shows the user asking, "What's the latest news in the US?" and receiving a list of five articles with links to the journals or sources.&lt;/p&gt;
&lt;h2&gt;
  
  
  What is Function Calling?
&lt;/h2&gt;

&lt;p&gt;Function calling is a new feature in OpenAI's &lt;strong&gt;GPT-4-0613&lt;/strong&gt; and &lt;strong&gt;GPT-3.5 Turbo-0613&lt;/strong&gt; models. These AI models are trained to detect the need for function calling based on the user's prompt and respond with a structured call request instead of regular text.&lt;/p&gt;

&lt;p&gt;Function calling allows chatbots to interact with other systems, enabling the GPT models to respond to questions they otherwise could not, such as those requiring real-time information or data not included in their training set. In other words, function calling provides another way to &lt;strong&gt;teach AI models how to interact with the external world&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  What is the Purpose of Function Calling?
&lt;/h3&gt;

&lt;p&gt;Before function calling, there were only two ways of augmenting the capabilities of a GPT language model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fine-tuning&lt;/strong&gt;: further training the language model by providing example responses. &lt;a href="https://platform.openai.com/docs/guides/fine-tuning"&gt;Fine-tuning&lt;/a&gt; is a powerful technique, but it requires significant work (and cost) to prepare the training data. In addition, fine-tuning is limited to a few older models until OpenAI enables it for the GPT-3.5 and GPT-4 models.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embeddings&lt;/strong&gt;: &lt;a href="https://semaphoreci.com/blog/word-embeddings"&gt;enriching the prompt with context data&lt;/a&gt; can extend the bot's knowledge and create more accurate responses. The downside is that this context can take up a lot of tokens, increasing the cost and leaving fewer tokens free for building complex responses.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Function calling adds a third way of extending the GPT capabilities by allowing it to ask us to run functions on its behalf. The model can then take the function's result and build a human-readable response that fits seamlessly into the current conversation.&lt;/p&gt;
&lt;h2&gt;
  
  
  How to Use Function Calling
&lt;/h2&gt;

&lt;p&gt;The introduction of function calling changes how we interact with the GPT API. Before these functions, the interaction was simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Send a prompt to the API.&lt;/li&gt;
&lt;li&gt;Receive a response.&lt;/li&gt;
&lt;li&gt;Repeat.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--bMAfCyJZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/ad354bd337fa39a4d6aa6f6899d24a4880a41cb5958c8bb4935602ed0a453cc5.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--bMAfCyJZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/ad354bd337fa39a4d6aa6f6899d24a4880a41cb5958c8bb4935602ed0a453cc5.jpg" alt="A diagram showing the basic interaction with the GPT API before function calling." width="728" height="418"&gt;&lt;/a&gt;&lt;br&gt;
A diagram showing the basic interaction with the GPT API before function calling.&lt;/p&gt;

&lt;p&gt;With function calling, the sequence becomes more involved:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Send the user prompt along with a list of callable functions.&lt;/li&gt;
&lt;li&gt;The GPT model responds with either a regular text response or a function call request.&lt;/li&gt;
&lt;li&gt;If the model requests a function call, your chatbot's job is to execute it and return the results to the API.&lt;/li&gt;
&lt;li&gt;Using the supplied data, the model then forms a coherent text response. However, in some cases, the API may request a new function call.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--03ckVtyB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/8f47ec7267d9c2784975b1a5b5d6e7ee224d9939201c7299ee92521c54eca8f2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--03ckVtyB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/8f47ec7267d9c2784975b1a5b5d6e7ee224d9939201c7299ee92521c54eca8f2.jpg" alt="A diagram showing the more complex interaction with the GPT API using function calling." width="728" height="628"&gt;&lt;/a&gt;&lt;br&gt;
A diagram showing the more complex interaction with the GPT API using function calling.&lt;/p&gt;
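&lt;p&gt;The dispatch logic behind steps 2 and 3 can be sketched in a few lines of Python. This is a minimal sketch: the mock response below stands in for the real API reply, and &lt;code&gt;get_current_weather&lt;/code&gt; is a hypothetical stub, not a real service.&lt;/p&gt;

```python
import json

# Local implementation of a function the model may request.
def get_current_weather(location, unit="celsius"):
    # Hypothetical stub: a real version would query a weather service.
    return {"location": location, "temperature": 22, "unit": unit}

AVAILABLE_FUNCTIONS = {"get_current_weather": get_current_weather}

def handle_response(message):
    """Steps 2-3: inspect the model's reply and run a function if asked."""
    call = message.get("function_call")
    if call is None:
        return message["content"]          # plain text answer, nothing to run
    func = AVAILABLE_FUNCTIONS[call["name"]]
    args = json.loads(call["arguments"])   # arguments arrive as a JSON string
    result = func(**args)
    # The result goes back to the API as a message with role "function".
    return {"role": "function", "name": call["name"], "content": json.dumps(result)}

# Mocked API reply asking us to call get_current_weather.
mock = {"role": "assistant", "content": None,
        "function_call": {"name": "get_current_weather",
                          "arguments": '{"location": "New York, NY"}'}}
follow_up = handle_response(mock)
print(follow_up["name"])  # get_current_weather
```

&lt;p&gt;The returned &lt;code&gt;role: function&lt;/code&gt; message is what gets appended to the conversation and sent back, so the model can compose its final text answer (step 4).&lt;/p&gt;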
&lt;h3&gt;
  
  
  Function Calling with the Chat Completions API
&lt;/h3&gt;

&lt;p&gt;To allow the model to call functions, we must use the &lt;a href="https://platform.openai.com/docs/guides/gpt/chat-completions-api"&gt;Chat Completions API&lt;/a&gt;. The API takes a POST request with a JSON payload containing a list of messages to process. A typical prompt sent to the API looks like the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gpt-3.5-turbo"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"messages"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"How many planets does the solar system have?"&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;role: user&lt;/code&gt; informs the API that the &lt;code&gt;content&lt;/code&gt; is user-generated. The GPT API might reply with something along these lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"chatcmpl-7WVo3fYwerpAptzeqU46JamOvgBzh"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"chat.completion"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"created"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;1687983115&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"gpt-3.5-turbo-0613"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"choices"&lt;/span&gt;&lt;span class="p"&gt;:[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
         &lt;/span&gt;&lt;span class="nl"&gt;"index"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
         &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"There are eight planets in the solar system. They are:&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;1. Mercury&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;2. Venus&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;3. Earth&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;4. Mars&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;5. Jupiter&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;6. Saturn&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;7. Uranus&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;8. Neptune"&lt;/span&gt;&lt;span class="w"&gt;
         &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
         &lt;/span&gt;&lt;span class="nl"&gt;"finish_reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"stop"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"usage"&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"prompt_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"completion_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;44&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"total_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;59&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;role: assistant&lt;/code&gt; corresponds to messages generated by the GPT model. To keep the conversation flow, we must supply the entire message history back to the API on each request. For example, if we want to delve deeper into our previous question, the corresponding JSON payload would be:&lt;br&gt;
&lt;/p&gt;
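&lt;p&gt;The history replay can be sketched as a small wrapper that appends each exchange before building the payload. A minimal sketch, with a canned reply standing in for the real API call:&lt;/p&gt;

```python
# A minimal conversation buffer: every request replays the full history.
history = []

def ask(content):
    history.append({"role": "user", "content": content})
    # Snapshot the history as it stands when this request is built.
    payload = {"model": "gpt-3.5-turbo", "messages": list(history)}
    # A real client would POST `payload` here; we record a canned reply instead.
    reply = {"role": "assistant",
             "content": "There are eight planets in the solar system."}
    history.append(reply)
    return payload

first = ask("How many planets does the solar system have?")
second = ask("Tell me more about the second planet.")
# The second payload carries three messages: user, assistant, user.
```

&lt;p&gt;Because each payload contains everything said so far, the model can resolve "the second planet" against its earlier answer; the cost is that token usage grows with every turn.&lt;/p&gt;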

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"gpt-3.5-turbo"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"messages"&lt;/span&gt;&lt;span class="p"&gt;:[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
         &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
         &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"How many planets does the solar system have?"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
         &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
         &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"There are eight planets in the solar system. They are:&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;1. Mercury&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;2. Venus&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;3. Earth&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;4. Mars&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;5. Jupiter&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;6. Saturn&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;7. Uranus&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;8. Neptune"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
         &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
         &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"Tell me more about the second planet."&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To let the language model know it can call functions, we need to add a list of them to the payload. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"gpt-3.5-turbo-0613"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"messages"&lt;/span&gt;&lt;span class="p"&gt;:[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
         &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
         &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"How is the weather in NYC?"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"functions"&lt;/span&gt;&lt;span class="p"&gt;:[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
         &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"get_current_weather"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
         &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"Get the current weather in a given location"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
         &lt;/span&gt;&lt;span class="nl"&gt;"parameters"&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;&lt;span class="w"&gt;
               &lt;/span&gt;&lt;span class="nl"&gt;"location"&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;&lt;span class="w"&gt;
                  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                  &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"The city and state, e.g. San Francisco, CA"&lt;/span&gt;&lt;span class="w"&gt;
               &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
               &lt;/span&gt;&lt;span class="nl"&gt;"unit"&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;&lt;span class="w"&gt;
                  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                  &lt;/span&gt;&lt;span class="nl"&gt;"enum"&lt;/span&gt;&lt;span class="p"&gt;:[&lt;/span&gt;&lt;span class="w"&gt;
                     &lt;/span&gt;&lt;span class="s2"&gt;"celsius"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                     &lt;/span&gt;&lt;span class="s2"&gt;"fahrenheit"&lt;/span&gt;&lt;span class="w"&gt;
                  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
               &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"required"&lt;/span&gt;&lt;span class="p"&gt;:[&lt;/span&gt;&lt;span class="w"&gt;
               &lt;/span&gt;&lt;span class="s2"&gt;"location"&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
         &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
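&lt;p&gt;Since the &lt;code&gt;functions&lt;/code&gt; array is plain JSON Schema, it can be generated with a small helper instead of hand-written JSON. A minimal sketch; &lt;code&gt;function_spec&lt;/code&gt; is our own helper, not part of any OpenAI SDK:&lt;/p&gt;

```python
def function_spec(name, description, properties, required):
    # Builds one entry for the "functions" array; "parameters" is JSON Schema.
    return {
        "name": name,
        "description": description,
        "parameters": {"type": "object",
                       "properties": properties,
                       "required": required},
    }

weather = function_spec(
    "get_current_weather",
    "Get the current weather in a given location",
    {
        "location": {"type": "string",
                     "description": "The city and state, e.g. San Francisco, CA"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    ["location"],
)

payload = {
    "model": "gpt-3.5-turbo-0613",
    "messages": [{"role": "user", "content": "How is the weather in NYC?"}],
    "functions": [weather],
}
```

&lt;p&gt;Serializing &lt;code&gt;payload&lt;/code&gt; produces the same JSON body shown above, which keeps the function descriptions in one place as the list of callable functions grows.&lt;/p&gt;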



&lt;p&gt;You may have noticed that we switched the model to "gpt-3.5-turbo-0613" because it supports function calling. If the model decides to call the function, we will receive a response of type &lt;code&gt;role: assistant&lt;/code&gt; with a &lt;code&gt;function_call&lt;/code&gt; property defined like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"chatcmpl-7WWG94C1DCFlAk5xmUwrZ9OOhFnOq"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"chat.completion"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"created"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;1687984857&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"gpt-3.5-turbo-0613"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"choices"&lt;/span&gt;&lt;span class="p"&gt;:[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
         &lt;/span&gt;&lt;span class="nl"&gt;"index"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
         &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"function_call"&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;&lt;span class="w"&gt;
               &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"get_current_weather"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
               &lt;/span&gt;&lt;span class="nl"&gt;"arguments"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"{&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;location&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;New York, NY&lt;/span&gt;&lt;span class="se"&gt;\"\n&lt;/span&gt;&lt;span class="s2"&gt;}"&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
         &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
         &lt;/span&gt;&lt;span class="nl"&gt;"finish_reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"function_call"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"usage"&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"prompt_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;81&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"completion_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;19&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"total_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Our task is to execute &lt;code&gt;get_current_weather&lt;/code&gt; with the provided arguments. OpenAI &lt;strong&gt;does not&lt;/strong&gt; execute the function. Instead, it's the job of our chatbot to run it and parse the returned data.&lt;/p&gt;
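&lt;p&gt;Concretely, the dispatch step can be sketched as follows. This is a minimal, illustrative example: &lt;code&gt;get_current_weather&lt;/code&gt; here is a local stand-in that returns canned data instead of querying a real weather service. The key detail is that &lt;code&gt;arguments&lt;/code&gt; arrives as a JSON-encoded &lt;em&gt;string&lt;/em&gt;, so it must be parsed before calling the function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json

def get_current_weather(location, unit="fahrenheit"):
    # stand-in: a real implementation would query a weather API
    return "Temperature: 57F, Condition: Raining"

# the "function_call" object taken from the assistant message
function_call = {
    "name": "get_current_weather",
    "arguments": "{\n \"location\": \"New York, NY\"\n}"
}

# map function names to callables so the model can only run what we allow
available_functions = {"get_current_weather": get_current_weather}

args = json.loads(function_call["arguments"])  # parse the JSON string
result = available_functions[function_call["name"]](**args)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Using an explicit allow-list of callables, rather than looking up arbitrary names, keeps the model from invoking anything we did not intend to expose.&lt;/p&gt;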

&lt;p&gt;Once we retrieve the weather data, we send it back to the model using a new type of role called &lt;code&gt;function&lt;/code&gt;. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"gpt-3.5-turbo-0613"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"messages"&lt;/span&gt;&lt;span class="p"&gt;:[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
         &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
         &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"How is the weather in NYC?"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
         &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
         &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
         &lt;/span&gt;&lt;span class="nl"&gt;"function_call"&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"get_current_weather"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"arguments"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"{&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;location&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;New York, NY&lt;/span&gt;&lt;span class="se"&gt;\"\n&lt;/span&gt;&lt;span class="s2"&gt;}"&lt;/span&gt;&lt;span class="w"&gt;
         &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
         &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"function"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
         &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"get_current_weather"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
         &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"Temperature: 57F, Condition: Raining"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"functions"&lt;/span&gt;&lt;span class="p"&gt;:[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
         &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"get_current_weather"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
         &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"Get the current weather in a given location"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
         &lt;/span&gt;&lt;span class="nl"&gt;"parameters"&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;&lt;span class="w"&gt;
               &lt;/span&gt;&lt;span class="nl"&gt;"location"&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;&lt;span class="w"&gt;
                  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                  &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"The city and state, e.g. San Francisco, CA"&lt;/span&gt;&lt;span class="w"&gt;
               &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
               &lt;/span&gt;&lt;span class="nl"&gt;"unit"&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;&lt;span class="w"&gt;
                  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                  &lt;/span&gt;&lt;span class="nl"&gt;"enum"&lt;/span&gt;&lt;span class="p"&gt;:[&lt;/span&gt;&lt;span class="w"&gt;
                     &lt;/span&gt;&lt;span class="s2"&gt;"celsius"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                     &lt;/span&gt;&lt;span class="s2"&gt;"fahrenheit"&lt;/span&gt;&lt;span class="w"&gt;
                  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
               &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"required"&lt;/span&gt;&lt;span class="p"&gt;:[&lt;/span&gt;&lt;span class="w"&gt;
               &lt;/span&gt;&lt;span class="s2"&gt;"location"&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
         &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that we passed the entire message history to the API, including our original prompt, the function call from the model, and the result of executing the weather function in our code. This enables the language model to understand the context in which the function was called.&lt;/p&gt;
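&lt;p&gt;In Python, that history is simply a list of dictionaries; here is a sketch with the same values as the JSON above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;messages = [
    # our original prompt
    {"role": "user", "content": "How is the weather in NYC?"},
    # the model's function call, echoed back verbatim
    {
        "role": "assistant",
        "content": None,
        "function_call": {
            "name": "get_current_weather",
            "arguments": "{ \"location\": \"New York, NY\" }",
        },
    },
    # the result we computed locally, sent as a "function" role message
    {
        "role": "function",
        "name": "get_current_weather",
        "content": "Temperature: 57F, Condition: Raining",
    },
]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;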

&lt;p&gt;Finally, the model may reply with a properly formatted answer, responding to our initial question:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"chatcmpl-7WWQUccvLUfjhbIcuvFrj2MDJVEiN"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"chat.completion"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"created"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;1687985498&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"gpt-3.5-turbo-0613"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"choices"&lt;/span&gt;&lt;span class="p"&gt;:[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
         &lt;/span&gt;&lt;span class="nl"&gt;"index"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
         &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"The weather in New York City is currently raining with a temperature of 57 degrees Fahrenheit."&lt;/span&gt;&lt;span class="w"&gt;
         &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
         &lt;/span&gt;&lt;span class="nl"&gt;"finish_reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"stop"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"usage"&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"prompt_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;119&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"completion_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;19&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"total_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;138&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Building NewsGPT
&lt;/h2&gt;

&lt;p&gt;To learn more about function calling, we will build &lt;strong&gt;NewsGPT&lt;/strong&gt;, a Python chatbot capable of accessing breaking news in real time. The bot will use function calling to determine what kind of data to fetch from &lt;a href="https://newsapi.org"&gt;NewsAPI.org&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To build the bot, you will need the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An OpenAI API key. This requires a credit card because API requests have a cost. However, new accounts receive $5 in credit, valid for the first three months.&lt;/li&gt;
&lt;li&gt;A NewsAPI API key. Register at &lt;a href="https://newsapi.org/"&gt;NewsAPI.org&lt;/a&gt; and get a starter key for free.&lt;/li&gt;
&lt;li&gt;Python 3.&lt;/li&gt;
&lt;/ul&gt;
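&lt;p&gt;Before running any code, export both keys as environment variables: the &lt;code&gt;openai&lt;/code&gt; library picks up &lt;code&gt;OPENAI_API_KEY&lt;/code&gt; automatically, and our code will read &lt;code&gt;NEWS_API_KEY&lt;/code&gt; explicitly. Replace the placeholder values with your own keys:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$ export OPENAI_API_KEY="your-openai-key"
$ export NEWS_API_KEY="your-newsapi-key"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;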

&lt;h3&gt;
  
  
  Setting Up the Project
&lt;/h3&gt;

&lt;p&gt;Install the required dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;openai tiktoken
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The project consists of only one file; let's name it &lt;code&gt;newsgpt.py&lt;/code&gt;. I will begin by adding all the necessary imports:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;openai&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;tiktoken&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;requests&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, I will define a few constants:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The GPT model to use. I will use &lt;code&gt;gpt-3.5-turbo-16k&lt;/code&gt; as it has a 16k token limit, allowing me to process longer conversations with more context.&lt;/li&gt;
&lt;li&gt;The system prompt that instructs the model on its basic purpose.&lt;/li&gt;
&lt;li&gt;The encoding used to count tokens in strings and messages; required to ensure we do not exceed the language model limits.&lt;/li&gt;
&lt;li&gt;The maximum number of functions to call in a chain (more on this later).
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;llm_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"gpt-3.5-turbo-16k"&lt;/span&gt;
&lt;span class="n"&gt;llm_max_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;15500&lt;/span&gt;
&lt;span class="n"&gt;llm_system_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"You are an assistant that provides news and headlines to user requests. Always try to get the lastest breaking stories using the available function calls."&lt;/span&gt;
&lt;span class="n"&gt;encoding_model_messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"gpt-3.5-turbo-0613"&lt;/span&gt;
&lt;span class="n"&gt;encoding_model_strings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"cl100k_base"&lt;/span&gt;
&lt;span class="n"&gt;function_call_limit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All OpenAI models have a token limit. If this limit is exceeded, the API throws an error instead of responding to our request, so we need a function to count tokens. I will use this function, adapted from OpenAI's official examples:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;num_tokens_from_messages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
 &lt;span class="s"&gt;"""Returns the number of tokens used by a list of messages."""&lt;/span&gt;
 &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
 &lt;span class="n"&gt;encoding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tiktoken&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;encoding_for_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;encoding_model_messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;KeyError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
 &lt;span class="n"&gt;encoding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tiktoken&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_encoding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;encoding_model_strings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

 &lt;span class="n"&gt;num_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
 &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
 &lt;span class="n"&gt;num_tokens&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;
 &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
 &lt;span class="n"&gt;num_tokens&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
 &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
 &lt;span class="n"&gt;num_tokens&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
 &lt;span class="n"&gt;num_tokens&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
 &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;num_tokens&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Defining a Function to Call
&lt;/h3&gt;

&lt;p&gt;Now, I will define a function to query the &lt;a href="https://newsapi.org"&gt;NewsAPI.org&lt;/a&gt; API to get the breaking news:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_top_headlines&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;country&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
 &lt;span class="s"&gt;"""Retrieve top headlines from newsapi.org (API key required)"""&lt;/span&gt;

    &lt;span class="n"&gt;base_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"https://newsapi.org/v2/top-headlines"&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="s"&gt;"x-api-key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'NEWS_API_KEY'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"general"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'q'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;country&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'country'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;country&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;category&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'category'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;category&lt;/span&gt;

    &lt;span class="c1"&gt;# Fetch from newsapi.org - reference: https://newsapi.org/docs/endpoints/top-headlines
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'status'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;'ok'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s"&gt;"Processing &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'totalResults'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; articles from newsapi.org"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'articles'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Request failed with message:"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'message'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;'No articles found'&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To inform GPT about this function, we need to describe it using a specific JSON structure. The format is described in the &lt;a href="https://platform.openai.com/docs/guides/gpt/function-calling"&gt;official documentation&lt;/a&gt; as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;signature_get_top_headlines&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
   &lt;span class="s"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s"&gt;"get_top_headlines"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="s"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s"&gt;"Get top news headlines by country and/or category"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="s"&gt;"parameters"&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;
      &lt;span class="s"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;
         &lt;span class="s"&gt;"query"&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;
            &lt;span class="s"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s"&gt;"Freeform keywords or a phrase to search for."&lt;/span&gt;
         &lt;span class="p"&gt;},&lt;/span&gt;
         &lt;span class="s"&gt;"country"&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;
            &lt;span class="s"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s"&gt;"The 2-letter ISO 3166-1 code of the country you want to get headlines for"&lt;/span&gt;
         &lt;span class="p"&gt;},&lt;/span&gt;
         &lt;span class="s"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;
            &lt;span class="s"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s"&gt;"The category you want to get headlines for"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"enum"&lt;/span&gt;&lt;span class="p"&gt;:[&lt;/span&gt;
               &lt;span class="s"&gt;"business"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
               &lt;span class="s"&gt;"entertainment"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
               &lt;span class="s"&gt;"general"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
               &lt;span class="s"&gt;"health"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
               &lt;span class="s"&gt;"science"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
               &lt;span class="s"&gt;"sports"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
               &lt;span class="s"&gt;"technology"&lt;/span&gt;
            &lt;span class="p"&gt;]&lt;/span&gt;
         &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="s"&gt;"required"&lt;/span&gt;&lt;span class="p"&gt;:[&lt;/span&gt;

      &lt;span class="p"&gt;]&lt;/span&gt;
   &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Using the Chat Completions API with Function Calling
&lt;/h3&gt;

&lt;p&gt;Next, I will define the &lt;code&gt;complete&lt;/code&gt; function, which performs several tasks:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Adds a system prompt at the end of the messages. This system message helps define the role that the GPT model will fulfill.&lt;/li&gt;
&lt;li&gt;Removes old messages if the total token count exceeds the model's limit.&lt;/li&gt;
&lt;li&gt;Sends the request to the GPT API.&lt;/li&gt;
&lt;li&gt;Removes the system message from the end of the list.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;function_call&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"auto"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
 &lt;span class="s"&gt;"""Fetch completion from OpenAI's GPT"""&lt;/span&gt;

 &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"system"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;llm_system_prompt&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

 &lt;span class="c1"&gt;# delete older completions to keep conversation under token limit
&lt;/span&gt; &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;num_tokens_from_messages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;llm_max_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
 &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

 &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'Working...'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ChatCompletion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;llm_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;functions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;signature_get_top_headlines&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
 &lt;span class="n"&gt;function_call&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;function_call&lt;/span&gt;
 &lt;span class="p"&gt;)&lt;/span&gt;

 &lt;span class="c1"&gt;# remove system message and append response from the LLM
&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"choices"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
 &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

 &lt;span class="c1"&gt;# call functions requested by the model
&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"function_call"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
 &lt;span class="n"&gt;function_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"function_call"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
 &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;function_name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"get_top_headlines"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
 &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"function_call"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;"arguments"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
 &lt;span class="n"&gt;headlines&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;get_top_headlines&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"query"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
 &lt;span class="n"&gt;country&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"country"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
 &lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
 &lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="s"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"function"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"get_top_headline"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;headlines&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To complete the bot, we will create the main loop that continuously prompts the user for input and provides the answers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Hi, I'm a NewsGPT, a breaking news AI assistant. I can give you news for most countries over a wide range of categories."&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Here are some example prompts:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt; - Tell me about the recent science discoveries&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt; - What is the latest news in the US?&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt; - What has Elon Musk been up to recently?"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;What would you like to know? =&amp;gt; "&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="n"&gt;complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# the LLM can chain function calls, this implements a limit
&lt;/span&gt;    &lt;span class="n"&gt;call_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;'role'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"function"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;call_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;call_count&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;call_count&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;function_call_limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;function_call&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"none"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# print last message
&lt;/span&gt;    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;==Response==&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;==End of response=="&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we need to define environment variables for the OpenAI and NewsAPI API keys:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;YOUR_API_KEY
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;NEWS_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;YOUR_API_KEY
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, load the file and run the chatbot.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;source&lt;/span&gt; .env
&lt;span class="nv"&gt;$ &lt;/span&gt;python newsgpt.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Testing NewsGPT
&lt;/h3&gt;

&lt;p&gt;We can inquire about a specific topic or person. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; What is Elon Musk up these days?
Working...
Processing 1 articles from newsapi.org
Working...


&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="nv"&gt;Response&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;

Elon Musk has been &lt;span class="k"&gt;in &lt;/span&gt;the news recently regarding a feud with Mark Zuckerberg, the CEO of Meta. According to an article from The Washington Post, Mark Zuckerberg is trying to make himself more relevant to the tech elite, and there seems to be a rivalry between him and Elon Musk. The article discusses their ongoing fight and how Zuckerberg is trying to compete with Musk&lt;span class="s1"&gt;'s online antagonism and offline antics. You can read more about it [here](https://www.washingtonpost.com/technology/2023/06/24/elon-musk-mark-zuckerberg-fight-meta-twitter-rival/).

==End of response==
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The bot is also capable of recognizing categories, allowing us to request health-related news:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;Are there important developments &lt;span class="k"&gt;in &lt;/span&gt;health?
Working...
Processing 1000 articles from newsapi.org
Working...


&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="nv"&gt;Response&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;

Based on the latest headlines &lt;span class="k"&gt;in &lt;/span&gt;the health category, here are some important developments:

1. &lt;span class="o"&gt;[&lt;/span&gt;A Case of Herpes Zoster Ophthalmicus &lt;span class="k"&gt;in &lt;/span&gt;a Recently Transplanted Renal Patient]&lt;span class="o"&gt;(&lt;/span&gt;https://www.cureus.com/articles/164429-a-case-of-herpes-zoster-ophthalmicus-in-a-recently-transplanted-renal-patient?score_article&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; - This &lt;span class="k"&gt;case&lt;/span&gt; study presents the clinical course of a 51-year-old male &lt;span class="nb"&gt;who &lt;/span&gt;underwent a renal transplant and developed Herpes Zoster Ophthalmicus.

2. &lt;span class="o"&gt;[&lt;/span&gt;Cuáles son los principales consejos a seguir para mejorar el estado de ánimo a base de una buena alimentación]&lt;span class="o"&gt;(&lt;/span&gt;https://news.google.com/rss/articles/CBMilQFodHRwczovL3d3dy5jYW5hbDI2LmNvbS9nZW5lcmFsL2N1YWxlcy1zb24tbG9zLXByaW5jaXBhbGVzLWNvbnNlam9zLWEtc2VndWlyLXBhcmEtbWVqb3Jhci1lbC1lc3RhZG8tZGUtYW5pbW8tYS1iYXNlLWRlLXVuYS1idWVuYS1hbGltZW50YWNpb24tLTM0NTMwM9IBAA?oc&lt;span class="o"&gt;=&lt;/span&gt;5&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;in &lt;/span&gt;Spanish&lt;span class="o"&gt;)&lt;/span&gt; - This article provides tips on improving mood through a good diet.

3. &lt;span class="o"&gt;[&lt;/span&gt;Как да стопите коремните мазнини за лятото]&lt;span class="o"&gt;(&lt;/span&gt;https://news.google.com/rss/articles/CBMiXmh0dHBzOi8vbS5hei1qZW5hdGEuYmcvYS81LXpkcmF2ZS1pLWtyYXNvdGEvNjM1Mzkta2FrLWRhLXN0b3BpdGUta29yZW1uaXRlLW1hem5pbmktemEtbGlhdG90by_SAWJodHRwczovL20uYXotamVuYXRhLmJnL2EvNS16ZHJhdmUtaS1rcmFzb3RhLzYzNTM5LWthay1kYS1zdG9waXRlLWtvcmVtbml0ZS1tYXpuaW5pLXphLWxpYXRvdG8vYW1wLw?oc&lt;span class="o"&gt;=&lt;/span&gt;5&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;in &lt;/span&gt;Bulgarian&lt;span class="o"&gt;)&lt;/span&gt; - This article provides tips on losing belly fat &lt;span class="k"&gt;for &lt;/span&gt;the summer.

4. &lt;span class="o"&gt;[&lt;/span&gt;Recap: From RCC Diagnosis to Treatment, Toxicity Management, and Beyond]&lt;span class="o"&gt;(&lt;/span&gt;https://news.google.com/rss/articles/CBMiZ2h0dHBzOi8vd3d3LmNhbmNlcm5ldHdvcmsuY29tL3ZpZXcvcmVjYXAtZnJvbS1yY2MtZGlhZ25vc2lzLXRvLXRyZWF0bWVudC10b3hpY2l0eS1tYW5hZ2VtZW50LWFuZC1iZXlvbmTSAQA?oc&lt;span class="o"&gt;=&lt;/span&gt;5&lt;span class="o"&gt;)&lt;/span&gt; - This article discusses the diagnosis, treatment, and management of renal cell carcinoma.

5. &lt;span class="o"&gt;[&lt;/span&gt;Review Highlights Progress, Challenges With CRS &lt;span class="k"&gt;in &lt;/span&gt;Cancer Immunotherapies]&lt;span class="o"&gt;(&lt;/span&gt;https://news.google.com/rss/articles/CBMiYmh0dHBzOi8vd3d3LmFqbWMuY29tL3ZpZXcvcmV2aWV3LWhpZ2hsaWdodHMtcHJvZ3Jlc3MtY2hhbGxlbmdlcy13aXRoLWNycy1pbi1jYW5jZXItaW1tdW5vdGhlcmFwaWVz0gEA?oc&lt;span class="o"&gt;=&lt;/span&gt;5&lt;span class="o"&gt;)&lt;/span&gt; - This review highlights the progress and challenges &lt;span class="k"&gt;in &lt;/span&gt;cancer immunotherapies targeting CRS &lt;span class="o"&gt;(&lt;/span&gt;cytokine release syndrome&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="nb"&gt;.&lt;/span&gt;

These are just a few of the recent developments &lt;span class="k"&gt;in &lt;/span&gt;the field of health. For more detailed information, you can &lt;span class="nb"&gt;read &lt;/span&gt;the full articles by clicking on the provided links.

&lt;span class="o"&gt;==&lt;/span&gt;End of &lt;span class="nv"&gt;response&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It can recognize countries as well, enabling us to query the bot for news about a particular region:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;What is the latest news from France?
Working...
Processing 34 articles from newsapi.org
Working...


&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="nv"&gt;Response&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;

Here are some of the latest news headlines &lt;span class="k"&gt;in &lt;/span&gt;France:

1. &lt;span class="s2"&gt;"Diplôme national du brevet session 2023"&lt;/span&gt; - The Ministry of National Education announces the national diploma &lt;span class="k"&gt;for &lt;/span&gt;the 2023 session. &lt;span class="o"&gt;[&lt;/span&gt;Read more]&lt;span class="o"&gt;(&lt;/span&gt;https://news.google.com/rss/articles/CBMiTGh0dHBzOi8vd3d3LmVkdWNhdGlvbi5nb3V2LmZyL2RpcGxvbWUtbmF0aW9uYWwtZHUtYnJldmV0LXNlc3Npb24tMjAyMy0zNzg1NjDSAQA?oc&lt;span class="o"&gt;=&lt;/span&gt;5&lt;span class="o"&gt;)&lt;/span&gt;

2. &lt;span class="s2"&gt;"Cyclisme: la Nordiste Victoire Berteau sacrée championne de France après sa victoire sur les routes de Cassel"&lt;/span&gt; - Victoire Berteau from Nord wins the championship &lt;span class="k"&gt;in &lt;/span&gt;cycling &lt;span class="k"&gt;in &lt;/span&gt;France. &lt;span class="o"&gt;[&lt;/span&gt;Read more]&lt;span class="o"&gt;(&lt;/span&gt;https://news.google.com/rss/articles/CBMiiQFodHRwczovL3d3dy5mcmFuY2V0dmluZm8uZnIvc3BvcnRzL2N5Y2xpc21lL2N5Y2xpc21lLXZpY3RvaXJlLWJlcnRlYXUtc2FjcmVlLWNoYW1waW9ubmUtZGUtZnJhbmNlLWFwcmVzLXNhLXZpY3RvaXJlLWEtY2Fzc2VsXzU5MDg4NDcuaHRtbNIBAA?oc&lt;span class="o"&gt;=&lt;/span&gt;5&lt;span class="o"&gt;)&lt;/span&gt;

3. &lt;span class="s2"&gt;"Guerre en Ukraine: comment les capitales étrangères réagissent-elles à la rébellion de la milice Wagner en Ru"&lt;/span&gt; - Foreign capitals&lt;span class="s1"&gt;' reactions to the rebellion of the Wagner militia in Ukraine. [Read more](https://news.google.com/rss/articles/CBMiwAFodHRwczovL3d3dy5mcmFuY2V0dmluZm8uZnIvbW9uZGUvZXVyb3BlL21hbmlmZXN0YXRpb25zLWVuLXVrcmFpbmUvZ3VlcnJlLWVuLXVrcmFpbmUtY29tbWVudC1sZXMtY2FwaXRhbGVzLWV0cmFuZ2VyZXMtcmVhZ2lzc2VudC1lbGxlcy1hLXJlYmVsbGlvbi1kZS1sYS1taWxpY2Utd2FnbmVyLWVuLXJ1c3NpZV81OTA4NzY2Lmh0bWzSAQA?oc=5)

4. "Marche des fiertés LGBT+: six jeunes mineurs interpellés pour homophobie" - Six minors arrested for homophobia during the LGBT+ Pride March. [Read more](https://news.google.com/rss/articles/CBMifmh0dHBzOi8vd3d3LnJ0bC5mci9hY3R1L2p1c3RpY2UtZmFpdHMtZGl2ZXJzL21hcmNoZS1kZXMtZmllcnRlcy1sZ2J0LXNpeC1qZXVuZXMtbWluZXVycy1pbnRlcnBlbGxlcy1wb3VyLWhvbW9waG9iaWUtNzkwMDI3Nzg4M9IBAA?oc=5)

5. "ATP 500 Queen'&lt;/span&gt;s - De Minaur a dominé Rune avec autorité: le film de la demi-finale&lt;span class="s2"&gt;" - Alex de Minaur dominates Rune in the ATP 500 Queen's semifinals. [Read more](https://news.google.com/rss/articles/CBMimwFodHRwczovL3d3dy5ldXJvc3BvcnQuZnIvdGVubmlzL2F0cC1sb25kcmVzLzIwMjMvYXRwLTUwMC1xdWVlbi1zLXN1aXZlei1sYS1kZW1pLWZpbmFsZS1lbnRyZS1hbGV4LWRlLW1pbmF1ci1ldC1ob2xnZXItcnVuZS1lbi1kaXJlY3Rfc3RvOTY3MTM4My9zdG9yeS5zaHRtbNIBAA?oc=5)

These are just a few of the latest news headlines in France. Let me know if you want more information about any specific news article.

==End of response==
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Ideas for Improvement
&lt;/h3&gt;

&lt;p&gt;This simple bot is quite capable, even with a single function call. Now, imagine the possibilities if we integrate more features. Here are a few ideas to augment NewsGPT:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Retrieve the original articles to get summaries and analyze the news. We would need to navigate paywalls, perform web scraping, or check if RSS feeds or APIs provide content.&lt;/li&gt;
&lt;li&gt;Add more endpoints. NewsAPI offers endpoints for searching news by date, categories, and filtering through &lt;a href="https://newsapi.org/docs/endpoints/sources"&gt;sources&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Incorporate extra integrations, such as obtaining real-time data from sources like weather or finance.&lt;/li&gt;
&lt;/ul&gt;
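&lt;p&gt;For the extra-endpoints idea, most of the work is assembling the request URL. Below is a minimal sketch of how a query against NewsAPI's &lt;code&gt;/v2/sources&lt;/code&gt; endpoint could be built; the helper name and structure are my own, while the endpoint and its &lt;code&gt;category&lt;/code&gt; and &lt;code&gt;country&lt;/code&gt; parameters come from the NewsAPI documentation:&lt;/p&gt;

```python
from urllib.parse import urlencode

NEWS_API_BASE = "https://newsapi.org/v2"

def build_sources_url(api_key, category=None, country=None):
    """Build a request URL for NewsAPI's /v2/sources endpoint.

    Only parameters that were actually supplied end up in the
    query string, since NewsAPI treats missing ones as "all".
    """
    params = {"apiKey": api_key}
    if category:
        params["category"] = category
    if country:
        params["country"] = country
    return f"{NEWS_API_BASE}/sources?{urlencode(params)}"

# Example: list the technology sources available in the US
url = build_sources_url("YOUR_API_KEY", category="technology", country="us")
print(url)
```

&lt;p&gt;The response can then be fetched and passed back to the model the same way &lt;code&gt;get_top_headlines&lt;/code&gt; results are.&lt;/p&gt;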

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Function calling is a powerful feature in OpenAI's GPT models, enabling them to interact with external tools and APIs in a more deterministic and structured manner. This feature lays the groundwork for more dynamic and responsive AI applications capable of providing current information and executing tasks beyond what was previously possible.&lt;/p&gt;

&lt;p&gt;Happy building!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>chatgpt</category>
      <category>tutorial</category>
      <category>python</category>
    </item>
    <item>
      <title>Word Embeddings: Giving Your ChatBot Context For Better Answers</title>
      <dc:creator>Tomas Fernandez</dc:creator>
      <pubDate>Tue, 25 Jul 2023 22:19:52 +0000</pubDate>
      <link>https://forem.com/semaphore/word-embeddings-giving-your-chatbot-context-for-better-answers-156n</link>
      <guid>https://forem.com/semaphore/word-embeddings-giving-your-chatbot-context-for-better-answers-156n</guid>
      <description>&lt;p&gt;Photo by &lt;a href="https://unsplash.com/@iammottakin?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText"&gt;Mojahid Mottakin&lt;/a&gt; on &lt;a href="https://unsplash.com/s/photos/ai?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText"&gt;Unsplash&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There is no doubt that OpenAI's ChatGPT is exceptionally intelligent — it has passed the &lt;a href="https://www.cnet.com/tech/chatgpt-can-pass-the-bar-exam-does-that-actually-matter/"&gt;lawyer's bar test&lt;/a&gt;, it possesses &lt;a href="https://aibusiness.com/verticals/chatgpt-passes-medical-board-exam"&gt;knowledge akin to a doctor&lt;/a&gt;, and some tests have &lt;a href="https://www.scientificamerican.com/article/i-gave-chatgpt-an-iq-test-heres-what-i-discovered/"&gt;clocked its IQ at 155&lt;/a&gt;. However, it tends to &lt;a href="https://news.ycombinator.com/item?id=33841672"&gt;fabricate information&lt;/a&gt; instead of conceding ignorance. This tendency, coupled with the fact that its knowledge ceases in 2021, poses challenges in building specialized products using the GPT API.&lt;/p&gt;

&lt;p&gt;How can we surmount these obstacles? How can we impart new knowledge to a model like GPT-3? My goal is to address these questions by constructing a question-answering bot employing Python, the OpenAI API, and word embeddings.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Will Be Building
&lt;/h2&gt;

&lt;p&gt;I intend to create a bot that generates &lt;a href="https://semaphoreci.com/continuous-integration"&gt;continuous integration&lt;/a&gt; pipelines from a prompt, which, as you may know, &lt;a href="https://docs.semaphoreci.com/reference/pipeline-yaml-reference/"&gt;are formatted in YAML&lt;/a&gt; in Semaphore CI/CD.&lt;/p&gt;

&lt;p&gt;Here is an example of the bot in action:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--eGdUPIzD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/2c5670035a90da6ba1ad8ab9fb989857ba6614301dc8244e65302965df60e04c.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--eGdUPIzD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/2c5670035a90da6ba1ad8ab9fb989857ba6614301dc8244e65302965df60e04c.jpg" alt='Screenshot of the running program. On the screen, the command is executed: python query.py "Create a CI pipeline that builds and uploads a Docker image to Docker Hub", and the program prints out YAML corresponding to a CI pipeline that performs the requested action.' width="800" height="493"&gt;&lt;/a&gt;&lt;br&gt;
Screenshot of the running program. On the screen, the command &lt;code&gt;python query.py "Create a CI pipeline that builds and uploads a Docker image to Docker Hub"&lt;/code&gt; is executed, and the program prints out YAML corresponding to a CI pipeline that performs the requested action.&lt;/p&gt;

&lt;p&gt;In the spirit of projects like &lt;a href="https://docsgpt.arc53.com"&gt;DocsGPT&lt;/a&gt;, &lt;a href="https://myaskai.com/"&gt;My AskAI&lt;/a&gt;, and &lt;a href="https://libraria.dev/"&gt;Libraria&lt;/a&gt;, I plan to "teach" the GPT-3 model about Semaphore and how to generate pipeline configuration files. I will achieve this by leveraging the &lt;a href="https://docs.semaphoreci.com"&gt;existing documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I will not assume prior knowledge of bot building and will maintain clean code so that you can adapt it to your requirements.&lt;/p&gt;
&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;You do not need experience in coding a bot or knowledge of neural networks to follow this tutorial. However, you will need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.&lt;/li&gt;
&lt;li&gt;A Pinecone account (sign up for the &lt;a href="https://www.pinecone.io/pricing/"&gt;Starter plan&lt;/a&gt; for free).&lt;/li&gt;
&lt;li&gt;An OpenAI API Key (&lt;a href="https://openai.com/pricing"&gt;paid, requires a credit card&lt;/a&gt;); new users can experiment with $5 in free credit during the first 3 months.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  But ChatGPT Can't Learn, Can It?
&lt;/h2&gt;

&lt;p&gt;ChatGPT, or more accurately, GPT-3 and GPT-4, the Large Language Models (LLMs) powering them, have been trained on a massive dataset with a cutoff date around September 2021.&lt;/p&gt;

&lt;p&gt;In essence, GPT-3 knows very little about events beyond that date. We can verify this with a simple prompt:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ilYSCUnh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/fbd26c2e0c5e6eaefbce23bb12ff899808e7b10344c44e9c9c67c95eff97dc9a.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ilYSCUnh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/fbd26c2e0c5e6eaefbce23bb12ff899808e7b10344c44e9c9c67c95eff97dc9a.jpg" alt="Screenshot of ChatGPT. The question is: 'Who won the World Cup 2022?' ChatGPT's answer is: 'As an AI language model, I don't have real-time information or the ability to browse the internet. As of my last knowledge update in September 2021, the FIFA World Cup 2022 has not occurred yet. The tournament is scheduled to be held in Qatar from November 21 to December 18, 2022. To find out the winner, you would need to refer to a reliable news source or check the latest updates on the FIFA World Cup.'" width="627" height="307"&gt;&lt;/a&gt;&lt;br&gt;
ChatGPT doesn't know who won the World Cup in 2022.&lt;/p&gt;

&lt;p&gt;While some OpenAI models can undergo &lt;a href="https://platform.openai.com/docs/guides/fine-tuning"&gt;fine-tuning&lt;/a&gt;, the more advanced models, such as the ones we're interested in, cannot; we cannot augment their training data.&lt;/p&gt;

&lt;p&gt;How can we get answers from GPT-3 beyond its training data? One method involves exploiting its text comprehension abilities; by enhancing the prompt with relevant context, we can likely obtain the correct answer.&lt;/p&gt;

&lt;p&gt;In the example below, I provide context from &lt;a href="https://www.fifa.com/fifaplus/en/tournaments/mens/worldcup/qatar2022"&gt;FIFA's official site&lt;/a&gt;, and the response differs significantly:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--pBB6pU3P--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/d620932dfc29ceea5499f83e37070034fdf47bc5a3984597709762bb1aa3d66b.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--pBB6pU3P--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/d620932dfc29ceea5499f83e37070034fdf47bc5a3984597709762bb1aa3d66b.jpg" alt="Second attempt at responding to the question. Screenshot of ChatGPT. The question is: 'Reply to the last question with the following context extracted from FIFA's official website: THE FINAL It was the perfect final to an unforgettable FIFA World Cup™, with Argentina and France delivering an all-time classic at Lusail Stadium. La Albiceleste looked to be in control for most of the match and were two goals to the good by half-time, thanks to Lionel Messi and Angel Di Maria. However, France produced a stirring late comeback inspired by two goals from star man Kylian Mbappé. During a breathless and breathtaking extra time, Argentina regained the lead with Messi's second goal, only for Mbappe to level again. Ultimately, the match was decided by penalties, with Gonzalo Montiel clinching Argentina's historic triumph.' ChatGPT correctly answers: 'According to the context you provided, Argentina won the FIFA World Cup 2022 after defeating France in the final at Lusail Stadium. Argentina initially led the match with goals from Lionel Messi and Angel Di Maria, but France made a late comeback with two goals from Kylian Mbappé. The match proceeded to extra time, during which Argentina regained the lead through Messi's second goal. However, Mbappé equalized once more. The match was ultimately decided by a penalty shootout, with Gonzalo Montiel securing Argentina's victory. 
It's important to note that this information is purely fictional, as my knowledge was last updated in September 2021, and I cannot provide real-time updates or access current events.'" width="620" height="665"&gt;&lt;/a&gt;&lt;br&gt;
With the supplied context, ChatGPT can answer accurately.&lt;/p&gt;

&lt;p&gt;We can deduce that the model can respond to any prompt if given enough relevant context. The question remains: how can we know what's relevant given an arbitrary prompt? To address this, we need to explore what &lt;strong&gt;word embeddings&lt;/strong&gt; are.&lt;/p&gt;
&lt;h2&gt;
  
  
  What Are Word Embeddings?
&lt;/h2&gt;

&lt;p&gt;In the context of language models, an embedding is a way of representing words, sentences, or entire documents as vectors or lists of numbers.&lt;/p&gt;

&lt;p&gt;To calculate embeddings, we will need a neural network such as &lt;a href="https://www.tensorflow.org/tutorials/text/word2vec"&gt;word2vec&lt;/a&gt; or &lt;a href="https://platform.openai.com/docs/guides/embeddings/what-are-embeddings"&gt;text-embedding-ada-002&lt;/a&gt;. These networks have been trained on massive amounts of text and can find relationships between words by analyzing the frequencies with which specific patterns appear in the training data.&lt;/p&gt;

&lt;p&gt;Let’s say we have the following words:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cat&lt;/li&gt;
&lt;li&gt;Dog&lt;/li&gt;
&lt;li&gt;Ball&lt;/li&gt;
&lt;li&gt;House&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Imagine we use one of these embedding networks to calculate the vectors for each word. For example:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Word&lt;/th&gt;
&lt;th&gt;Vector&lt;/th&gt;
&lt;th&gt;Context&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cat&lt;/td&gt;
&lt;td&gt;[0.1, 0.2, 0.3, 0.4, 0.5]&lt;/td&gt;
&lt;td&gt;Animals, objects, small things&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dog&lt;/td&gt;
&lt;td&gt;[0.6, 0.7, 0.8, 0.9, 1.0]&lt;/td&gt;
&lt;td&gt;Animals, objects, large things&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ball&lt;/td&gt;
&lt;td&gt;[0.2, 0.4, 0.6, 0.8, 1.0]&lt;/td&gt;
&lt;td&gt;Objects, toys, small things&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;House&lt;/td&gt;
&lt;td&gt;[0.3, 0.6, 0.9, 1.2, 1.5]&lt;/td&gt;
&lt;td&gt;Buildings, homes, large things&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Once we have the vectors for each word, we can use them to represent the meaning of the text. For example, the sentence “The cat chased the ball” could be represented by summing the word vectors: [0.1, 0.2, 0.3, 0.4, 0.5] + [0.2, 0.4, 0.6, 0.8, 1.0] = [0.3, 0.6, 0.9, 1.2, 1.5]. The resulting vector stands for a sentence about an animal chasing an object. Summing word vectors is a simplification; modern models embed whole sentences directly, but it illustrates the idea.&lt;/p&gt;

&lt;p&gt;Word embeddings can be visualized as multidimensional spaces where words or sentences with similar meanings are close together. We can compute the "distance" between vectors to find similar meanings for any input text.&lt;/p&gt;
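As a toy illustration (assuming NumPy is installed, and using the made-up vectors from the table above, not real model output), cosine similarity is a common way to compute that distance:

```python
import numpy as np

def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two vectors: close to 1.0 means similar direction/meaning."""
    a, b = np.array(a), np.array(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors from the table above (illustrative values only)
cat  = [0.1, 0.2, 0.3, 0.4, 0.5]
dog  = [0.6, 0.7, 0.8, 0.9, 1.0]
ball = [0.2, 0.4, 0.6, 0.8, 1.0]

print(cosine_similarity(cat, dog))
print(cosine_similarity(cat, ball))
```

With real embeddings, pairs of texts with related meanings score closer to 1.0 than unrelated pairs.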

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--E9jcwhHS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/a47e98dc9fe5e345ab50b39ac0c5c56fc429d3db92d4aca16db94de1d0eefbae.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--E9jcwhHS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/a47e98dc9fe5e345ab50b39ac0c5c56fc429d3db92d4aca16db94de1d0eefbae.jpg" alt="Three tridimensional representations of vectors. The first one is labeled 'Male-Female' and have data points man-woman and king-queen, the second is labeled 'Verb-Tense' and has verbs like walking-walked swimming-swam. The last is labeled 'Country-Capital' and has several capitals connected to their countries" width="800" height="305"&gt;&lt;/a&gt;&lt;br&gt;
3D representation of embeddings as vector spaces. In reality, these spaces can have hundreds or thousands of dimensions. Source: &lt;a href="https://cloud.google.com/blog/topics/developers-practitioners/meet-ais-multitool-vector-embeddings"&gt;Meet AI’s Multitool: Vector Embeddings&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The actual mathematics behind all this is beyond the scope of this article. However, the key takeaway is that &lt;strong&gt;vector operations allow us to manipulate or determine meaning using mathematics&lt;/strong&gt;. Take the vector that represents the word “queen,” subtract the “woman” vector from it, and add the “man” vector. The result should be a vector in the vicinity of “king.” If we add “son,” we should get somewhere close to “prince.”&lt;/p&gt;
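That arithmetic can be sketched with hand-picked 3-dimensional vectors; the numbers below are invented so that the "gender" offset is consistent between the two pairs, which is the kind of regularity a real model learns from data:

```python
import numpy as np

# Hand-picked toy vectors where the "gender" offset is the same for both pairs
woman = np.array([1.0, 0.0, 0.2])
man   = np.array([0.0, 1.0, 0.2])
queen = np.array([1.0, 0.0, 0.9])
king  = np.array([0.0, 1.0, 0.9])

# queen - woman + man should land on (or near) king
result = queen - woman + man
print(np.allclose(result, king))  # → True
```

In a real embedding space the result is only in the vicinity of "king", not exactly equal to it.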
&lt;h3&gt;
  
  
  Embedding Neural Networks with Tokens
&lt;/h3&gt;

&lt;p&gt;So far, we have discussed embedding neural networks that take words as input and produce vectors as output. However, many modern networks have moved from processing words to processing tokens.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;token&lt;/strong&gt; is the smallest unit of text that can be processed by the model. Tokens can be words, characters, punctuation marks, symbols, or parts of words.&lt;/p&gt;

&lt;p&gt;We can see how words are converted to tokens by experimenting with the &lt;a href="https://platform.openai.com/tokenizer"&gt;OpenAI online tokenizer&lt;/a&gt;, which uses &lt;a href="https://en.wikipedia.org/wiki/Byte_pair_encoding"&gt;Byte-Pair Encoding&lt;/a&gt; (BPE) to convert text to tokens and represent each one with a number:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--vbykLyH5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/445c1200f80302b3609f11b46571281e0ff39f34f3d1c4a01690fb6eb6e4c1ba.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--vbykLyH5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/445c1200f80302b3609f11b46571281e0ff39f34f3d1c4a01690fb6eb6e4c1ba.jpg" alt="Screenshot of OpenAI tokenizer. Some text has been input, and each token is represented by different colors, allowing us to see how words are mapped to tokens. The text reads: Behind any embedding model, there is a neural network that converts the input text into vectors. Each type of embedding model has different capabilities and speeds. Word2vec, for instance, takes words and produces vectors in the range of 100 to 300 dimensions." width="708" height="283"&gt;&lt;/a&gt;&lt;br&gt;
There is often a 1-to-1 relationship between tokens and words. Most tokens include the word and a leading space. However, there are special cases like "embedding," which consists of two tokens, "embed" and "ding," or "capabilities," which consists of four tokens. If you click "Token IDs," you can see the model's numeric representation of each token.&lt;/p&gt;
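As a rough sketch of the idea behind Byte-Pair Encoding (a toy illustration; the real OpenAI vocabulary and merge rules are learned from massive corpora), a tokenizer repeatedly merges the most frequent pair of adjacent symbols:

```python
from collections import Counter

def bpe_merge_once(symbols: list) -> list:
    """One BPE training step: merge the most frequent adjacent pair of symbols."""
    pairs = Counter(zip(symbols, symbols[1:]))
    if not pairs:
        return symbols
    (a, b), _ = pairs.most_common(1)[0]
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == (a, b):
            merged.append(a + b)  # the pair becomes a single new symbol
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged

tokens = list("embedding")  # start from individual characters
for _ in range(4):          # apply a few merge rounds
    tokens = bpe_merge_once(tokens)
print(tokens)
```

After enough rounds over a large corpus, frequent fragments like "embed" end up as single tokens, while rarer words stay split into several pieces.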
&lt;h2&gt;
  
  
  Designing a Smarter Bot Using Embeddings
&lt;/h2&gt;

&lt;p&gt;Now that we have an understanding of what embeddings are, the next question is: &lt;strong&gt;how can they help us build a smarter bot?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;First, let's consider what happens when we use the GPT-3 API directly. The user issues a prompt, and the model responds to the best of its ability.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--OQPkXdbp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/2234a6c50522cdd689eb40c82fd5baa67952d0b85a9f7543735d9dd0f1f4a2e6.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--OQPkXdbp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/2234a6c50522cdd689eb40c82fd5baa67952d0b85a9f7543735d9dd0f1f4a2e6.jpg" alt="Diagram showing interaction between user and GPT-3. User sends a prompt, the model responds." width="568" height="236"&gt;&lt;/a&gt; However, when we add context to the equation, things change. For example, when I asked ChatGPT about the winner of the World Cup after providing context, it made all the difference.&lt;/p&gt;

&lt;p&gt;So, the plan to build a smarter bot is as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Intercept the user's prompt.&lt;/li&gt;
&lt;li&gt;Calculate the embeddings for that prompt, yielding a vector.&lt;/li&gt;
&lt;li&gt;Search a database for documents near the vector, as they should be semantically relevant to the initial prompt.&lt;/li&gt;
&lt;li&gt;Send the original prompt to GPT-3, along with any relevant context.&lt;/li&gt;
&lt;li&gt;Forward GPT-3's response to the user.&lt;/li&gt;
&lt;/ol&gt;
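The five steps can be sketched as a single function. This is only an outline: `embed`, `search_context`, and `complete` are stand-in stubs for the OpenAI and database calls implemented later in the article:

```python
# Stub implementations so the outline runs end to end; the real versions
# call OpenAI and Pinecone as shown later in the article.
def embed(text: str) -> list:
    return [float(len(word)) for word in text.split()]  # fake vector

def search_context(vector: list) -> str:
    return "Sample document text goes here"  # fake database match

def complete(prompt: str) -> str:
    return "pipeline YAML for: " + prompt.splitlines()[-1]  # fake model reply

def answer(prompt: str) -> str:
    vector = embed(prompt)                               # 2. embeddings for the prompt
    context = search_context(vector)                     # 3. nearest documents
    enriched = f"Context:\n{context}\n\nTask: {prompt}"  # 4. enrich the prompt
    return complete(enriched)                            # 5. forward the response

print(answer("build and push a Docker image"))
```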

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--efy55dPD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/cdf18fb74391632d413028b51c58f10c1bebdeda0ea2fe6612a638755ce74d32.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--efy55dPD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/cdf18fb74391632d413028b51c58f10c1bebdeda0ea2fe6612a638755ce74d32.jpg" alt="A more complex implementation of a bot. The user sends the prompt to a chatbot app, which searches a context database and uses that to enrich the prompt. The prompt is sent to GPT-3, and its response is forwarded to the user." width="800" height="552"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's begin like most projects, by designing the database.&lt;/p&gt;
&lt;h2&gt;
  
  
  Creating a Knowledge Database with Embeddings
&lt;/h2&gt;

&lt;p&gt;Our context database must include the original documents and their respective vectors. In principle, we can employ any type of database for this task, but a &lt;strong&gt;vector database&lt;/strong&gt; is the optimal tool for the job.&lt;/p&gt;

&lt;p&gt;Vector databases are specialized databases designed to store and retrieve high-dimensional vector data. Instead of employing a query language such as SQL for searching, we supply a vector and request the N closest neighbors.&lt;/p&gt;
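What the database does internally can be approximated with a brute-force search (a sketch using cosine similarity and made-up vectors; real vector databases use approximate nearest-neighbor indexes to stay fast at scale):

```python
import numpy as np

def nearest(query: list, vectors: list, n: int = 1) -> list:
    """Return indices of the n stored vectors most similar to the query (cosine similarity)."""
    q = np.array(query)
    m = np.array(vectors)
    scores = (m @ q) / (np.linalg.norm(m, axis=1) * np.linalg.norm(q))
    return list(np.argsort(scores)[::-1][:n])  # best scores first

# Three made-up document vectors
docs = [
    [0.1, 0.2, 0.3],
    [0.9, 0.1, 0.0],
    [0.0, 0.5, 0.5],
]

print(nearest([0.1, 0.2, 0.3], docs, n=2))  # → [0, 2]
```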

&lt;p&gt;To generate the vectors, we will use &lt;a href="https://openai.com/blog/new-and-improved-embedding-model"&gt;text-embedding-ada-002&lt;/a&gt; from OpenAI, as it is the fastest and most cost-effective model they offer. The model converts the input text into tokens and uses the attention-based &lt;a href="https://arxiv.org/abs/1706.03762"&gt;Transformer&lt;/a&gt; architecture to learn their relationships. The output of this neural network is a vector representing the meaning of the text.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--h_NKRwgV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/c10b15aec9a65372833a8bad619dd9a71a43c9e135a244f009a66bbc5a49ec84.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--h_NKRwgV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/c10b15aec9a65372833a8bad619dd9a71a43c9e135a244f009a66bbc5a49ec84.jpg" alt="Diagram illustrating the tokenization process. A document is tokenized and then sent to an embedding neural network. The output of the network is a vector." width="800" height="161"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To create a context database, I will:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Collect all the source documentation.&lt;/li&gt;
&lt;li&gt;Filter out irrelevant documents.&lt;/li&gt;
&lt;li&gt;Calculate the embeddings for each document.&lt;/li&gt;
&lt;li&gt;Store the vectors, original text, and any other relevant metadata in the database.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--fnIWhyTS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/6e6a6ad40e4b959867b2a3c28ff7f5b3bebc61a006b7cf03efd78a577690e934.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--fnIWhyTS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/6e6a6ad40e4b959867b2a3c28ff7f5b3bebc61a006b7cf03efd78a577690e934.jpg" alt="Diagram illustrating the process of storing data in the context database. The source document is sent to the embedding neural network. The database stores the vector along with the original text." width="781" height="400"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Converting Documents into Vectors
&lt;/h3&gt;

&lt;p&gt;First, I must initialize an environment file with the OpenAI API key. This file should never be committed to version control, as the API key is private and tied to your account.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;YOUR_API_KEY
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, I'll create a virtualenv for my Python application:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;virtualenv venv
&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;source &lt;/span&gt;venv/bin/activate
&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;source&lt;/span&gt; .env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And install the OpenAI package:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
bash&lt;br&gt;
$ pip install openai numpy&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Let's try calculating the embedding for the string "Docker Container". You can run this on the Python REPL or as a Python script:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ python
&amp;gt;&amp;gt;&amp;gt; import openai
&amp;gt;&amp;gt;&amp;gt; embeddings = openai.Embedding.create(input="Docker Containers", engine="text-embedding-ada-002")
&amp;gt;&amp;gt;&amp;gt; embeddings
{
  "data": [
    {
      "embedding": [
        -0.00530336843803525,
        0.0013223182177171111,
        ... 1533 more items ...,
        -0.015645816922187805
      ],
      "index": 0,
      "object": "embedding"
    }
  ],
  "model": "text-embedding-ada-002-v2",
  "object": "list",
  "usage": {
    "prompt_tokens": 2,
    "total_tokens": 2
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
As you can see, OpenAI's model responds with an `embedding` list containing 1536 items — the vector size for text-embedding-ada-002.

## Storing the embeddings in Pinecone

While there are multiple vector database engines to choose from, like [Chroma](https://www.trychroma.com/) which is open-source, I chose [Pinecone](https://www.pinecone.io/) because its a managed database with a free tier, which makes things simpler. Their [Starter plan](https://www.pinecone.io/pricing/) is more than capable of handling all the data I will need.

After creating my Pinecone account and retrieving my API key and environment, I add both values to my `.env` file.


![Pinecone API Key generation screenshot](https://raw.githubusercontent.com/tomfern/blog-assets/master/images/aa79dbcadd804a2b1aefe607553f50e960619717590f573da6ca04e967668802.jpg)

Now `.env` should contain my Pinecone and OpenAI secrets.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;export OPENAI_API_KEY=YOUR_API_KEY

# Pinecone secrets
export PINECONE_API_KEY=YOUR_API_KEY
export PINECONE_ENVIRONMENT=YOUR_PINECONE_DATACENTER
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Then, I install the Pinecone client for Python:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$ pip install pinecone-client
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
I need to initialize a database; these are the contents of the `db_create.py` script:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# db_create.py
import pinecone
import openai
import os

index_name = "semaphore"
embed_model = "text-embedding-ada-002"

api_key = os.getenv("PINECONE_API_KEY")
env = os.getenv("PINECONE_ENVIRONMENT")
pinecone.init(api_key=api_key, environment=env)

embedding = openai.Embedding.create(
    input=[
        "Sample document text goes here",
        "there will be several phrases in each batch"
    ], engine=embed_model
)

if index_name not in pinecone.list_indexes():
    print("Creating pinecone index: " + index_name)
    pinecone.create_index(
        index_name,
        dimension=len(embedding['data'][0]['embedding']),
        metric='cosine',
        metadata_config={'indexed': ['source', 'id']}
    )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
The script can take a few minutes to create the database.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$ python db_create.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Next, I will install the [tiktoken](https://github.com/openai/tiktoken) package. I'll use it to calculate how many tokens the source documents have. This is important because the embedding model can only handle up to 8191 tokens.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$ pip install tiktoken
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
While installing packages, let's also install `tqdm` to produce a nice-looking progress bar.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$ pip install tqdm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Now I need to upload the documents to the database. The script for this will be called `index_docs.py`. Let's start by importing the required modules and defining some constants:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# index_docs.py

# Pinecone db name and upload batch size
index_name = 'semaphore'
upsert_batch_size = 20

# OpenAI embedding and tokenizer models
embed_model = "text-embedding-ada-002"
encoding_model = "cl100k_base"
max_tokens_model = 8191
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Next, we'll need a function to count tokens. There is a [token counter example](https://platform.openai.com/docs/guides/embeddings/how-can-i-tell-how-many-tokens-a-string-has-before-i-embed-it) on OpenAI page:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import tiktoken

def num_tokens_from_string(string: str) -&amp;gt; int:
    """Returns the number of tokens in a text string."""
    encoding = tiktoken.get_encoding(encoding_model)
    num_tokens = len(encoding.encode(string))
    return num_tokens
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Finally, I'll need some filtering functions to convert the original document into usable examples. Most examples in the documentation are between code fences, so I'll just extract all YAML code from every file:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import re

def extract_yaml(text: str) -&amp;gt; str:
    """Returns list with all the YAML code blocks found in text."""
    matches = [m.group(1) for m in re.finditer(r"```yaml([\w\W]*?)```", text)]
    return matches
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;That's it for the helper functions. Next, the script loads the files into memory and extracts the examples:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
 python

from tqdm import tqdm
import sys
import os
import pathlib

repo_path = sys.argv[1]
repo_path = os.path.abspath(repo_path)
repo = pathlib.Path(repo_path)

markdown_files = list(repo.glob("**/*.md")) + list(
 repo.glob("**/*.mdx")
)

print(f"Extracting YAML from Markdown files in {repo_path}")
new_data = []
for i in tqdm(range(0, len(markdown_files))):
 markdown_file = markdown_files[i]
 with open(markdown_file, "r") as f:
 relative_path = markdown_file.relative_to(repo_path)
 text = str(f.read())
 if text == '':
 continue
 yamls = extract_yaml(text)
 j = 0
 for y in yamls:
 j = j+1
 new_data.append({
 "source": str(relative_path),
 "text": y,
 "id": f"github.com/semaphore/docs/{relative_path}[{j}]"
 })


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;At this point, all the YAMLs should be stored in the &lt;code&gt;new_data&lt;/code&gt; list. The final step is to upload the embeddings into Pinecone.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
 python

import pinecone
import openai

api_key = os.getenv("PINECONE_API_KEY")
env = os.getenv("PINECONE_ENVIRONMENT")
pinecone.init(api_key=api_key, enviroment=env)
index = pinecone.Index(index_name)

print(f"Creating embeddings and uploading vectors to database")
for i in tqdm(range(0, len(new_data), upsert_batch_size)):

 i_end = min(len(new_data), i+upsert_batch_size)
 meta_batch = new_data[i:i_end]
 ids_batch = [x['id'] for x in meta_batch]
 texts = [x['text'] for x in meta_batch]

 embedding = openai.Embedding.create(input=texts, engine=embed_model)
 embeds = [record['embedding'] for record in embedding['data']]

 # clean metadata before upserting
 meta_batch = [{
 'id': x['id'],
 'text': x['text'],
 'source': x['source']
 } for x in meta_batch] 

 to_upsert = list(zip(ids_batch, embeds, meta_batch))
 index.upsert(vectors=to_upsert)


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;As a reference, you can find the full &lt;a href="https://github.com/TomFern/semaphore-demo-pipelinegpt/blob/master/database/index_docs.py"&gt;index_docs.py&lt;/a&gt; file in the demo repository.&lt;/p&gt;

&lt;p&gt;Let's run the index script to finish with the database setup:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
 bash
$ git clone https://github.com/semaphoreci/docs.git /tmp/docs
$ source .env
$ python index_docs.py /tmp/docs


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Testing the database
&lt;/h3&gt;

&lt;p&gt;The Pinecone dashboard should show vectors in the database.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--hj7xBu2N--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/a1ac70a539a89621e4370a1f8abc6833cd7449e3a636f008b3be17b9e00dc456.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--hj7xBu2N--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/a1ac70a539a89621e4370a1f8abc6833cd7449e3a636f008b3be17b9e00dc456.jpg" alt="Screenshot of Pinecone dashboard showing the database with a total of 79 vectors" width="800" height="469"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can query the database with the following code, which you can run as a script or in the Python REPL directly:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
 python
$ python

&amp;gt;&amp;gt;&amp;gt; import os
&amp;gt;&amp;gt;&amp;gt; import pinecone
&amp;gt;&amp;gt;&amp;gt; import openai

# Compute embeddings for string "Docker Container"
&amp;gt;&amp;gt;&amp;gt; embeddings = openai.Embedding.create(input="Docker Containers", engine="text-embedding-ada-002")


# Connect to database
&amp;gt;&amp;gt;&amp;gt; index_name = "semaphore"
&amp;gt;&amp;gt;&amp;gt; api_key = os.getenv("PINECONE_API_KEY")
&amp;gt;&amp;gt;&amp;gt; env = os.getenv("PINECONE_ENVIRONMENT")
&amp;gt;&amp;gt;&amp;gt; pinecone.init(api_key=api_key, environment=env)
&amp;gt;&amp;gt;&amp;gt; index = pinecone.Index(index_name)

# Query database
&amp;gt;&amp;gt;&amp;gt; matches = index.query(embeddings['data'][0]['embedding'], top_k=1, include_metadata=True)

&amp;gt;&amp;gt;&amp;gt; matches['matches'][0]
{'id': 'github.com/semaphore/docs/docs/ci-cd-environment/docker-authentication.md[3]',
 'metadata': {'id': 'github.com/semaphore/docs/docs/ci-cd-environment/docker-authentication.md[3]',
 'source': 'docs/ci-cd-environment/docker-authentication.md',
 'text': '\n'
 '# .semaphore/semaphore.yml\n'
 'version: v1.0\n'
 'name: Using a Docker image\n'
 'agent:\n'
 ' machine:\n'
 ' type: e1-standard-2\n'
 ' os_image: ubuntu1804\n'
 '\n'
 'blocks:\n'
 ' - name: Run container from Docker Hub\n'
 ' task:\n'
 ' jobs:\n'
 ' - name: Authenticate docker pull\n'
 ' commands:\n'
 ' - checkout\n'
 ' - echo $DOCKERHUB_PASSWORD | docker login '
 '--username "$DOCKERHUB_USERNAME" --password-stdin\n'
 ' - docker pull /\n'
 ' - docker images\n'
 ' - docker run /\n'
 ' secrets:\n'
 ' - name: docker-hub\n'},
 'score': 0.796259582,
 'values': []}


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;As you can see, the first match is the YAML for a Semaphore pipeline that pulls a Docker image and runs it. It's a good start since it's relevant to our "Docker Containers" search string.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building the bot
&lt;/h2&gt;

&lt;p&gt;We have the data, and we know how to query it. Let's put it to work in the bot.&lt;/p&gt;

&lt;p&gt;The steps for processing the prompt are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Take the user's prompt. &lt;/li&gt;
&lt;li&gt;Calculate its vector. &lt;/li&gt;
&lt;li&gt;Retrieve relevant context from the database. &lt;/li&gt;
&lt;li&gt;Send the user's prompt along with context to GPT-3. &lt;/li&gt;
&lt;li&gt;Forward the model's response to the user.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--fn4S-K1l--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/b843a57a97ccaed9476725b74680fa0b53dff9e37f6ba31505bd3e4a57d9a621.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--fn4S-K1l--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/b843a57a97ccaed9476725b74680fa0b53dff9e37f6ba31505bd3e4a57d9a621.jpg" alt="Diagram of the data flow for the bot. On the left the user prompt enters, which is processed by the embedding neural network, then sent to the context database. The search yields relevant text which is sent to the GPT-3 model. The output of the model is sent to the user as the final answer." width="800" height="172"&gt;&lt;/a&gt; As usual, I'll start by defining some constants in &lt;code&gt;complete.py&lt;/code&gt;, the bot's main script:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
 python
# complete.py

# Pinecone database name, number of matched to retrieve
# cutoff similarity score, and how much tokens as context
index_name = 'semaphore'
context_cap_per_query = 30
match_min_score = 0.75
context_tokens_per_query = 3000

# OpenAI LLM model parameters
chat_engine_model = "gpt-3.5-turbo"
max_tokens_model = 4096
temperature = 0.2 
embed_model = "text-embedding-ada-002"
encoding_model_messages = "gpt-3.5-turbo-0301"
encoding_model_strings = "cl100k_base"

import pinecone
import os

# Connect with Pinecone db and index
api_key = os.getenv("PINECONE_API_KEY")
env = os.getenv("PINECONE_ENVIRONMENT")
pinecone.init(api_key=api_key, environment=env)
index = pinecone.Index(index_name)


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Next, I'll add functions to count tokens as shown in the &lt;a href="https://platform.openai.com/docs/guides/chat/managing-tokens"&gt;OpenAI examples&lt;/a&gt;. The first function counts tokens in a string, while the second counts tokens in messages. We'll see messages in detail in a bit. For now, let's just say it's a structure that keeps the state of the conversation in memory.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
 python
import tiktoken

def num_tokens_from_string(string: str) -&amp;gt; int:
 """Returns the number of tokens in a text string."""
 encoding = tiktoken.get_encoding(encoding_model_strings)
 num_tokens = len(encoding.encode(string))
 return num_tokens


def num_tokens_from_messages(messages):
 """Returns the number of tokens used by a list of messages. Compatible with model """

 try:
 encoding = tiktoken.encoding_for_model(encoding_model_messages)
 except KeyError:
 encoding = tiktoken.get_encoding(encoding_model_strings)

 num_tokens = 0
 for message in messages:
 num_tokens += 4 # every message follows {role/name}\n{content}\n
 for key, value in message.items():
 num_tokens += len(encoding.encode(value))
 if key == "name": # if there's a name, the role is omitted
 num_tokens += -1 # role is always required and always 1 token
 num_tokens += 2 # every reply is primed with assistant
 return num_tokens


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The following function takes the original prompt and context strings to return an enriched prompt for GPT-3:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
 python
def get_prompt(query: str, context: str) -&amp;gt; str:
 """Return the prompt with query and context."""
 return (
 f"Create the continuous integration pipeline YAML code to fullfil the requested task.\n" +
 f"Below you will find some context that may help. Ignore it if it seems irrelevant.\n\n" +
 f"Context:\n{context}" +
 f"\n\nTask: {query}\n\nYAML Code:"
 )


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;get_message&lt;/code&gt; function wraps the prompt in the message format the API expects:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
 python
def get_message(role: str, content: str) -&amp;gt; dict:
 """Generate a message for OpenAI API completion."""
 return {"role": role, "content": content}


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;There are three types of roles that affect how the model reacts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;User&lt;/strong&gt;: for the user's original prompt.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;System&lt;/strong&gt;: helps set the behavior of the assistant. While there is &lt;a href="https://community.openai.com/t/the-system-role-how-it-influences-the-chat-behavior/87353"&gt;some controversy&lt;/a&gt; regarding its effectiveness, it appears to be more effective when sent at the end of the messages list.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Assistant&lt;/strong&gt;: represents past responses of the model. The OpenAI API does not have a "memory"; instead, we must send the model's previous responses back during each interaction to maintain the conversation.&lt;/li&gt;
&lt;/ul&gt;
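&lt;p&gt;Putting the three roles together, a conversation is just a list of dictionaries. Here is a minimal sketch; the prompts and the replayed answer are illustrative, not output from the real model:&lt;/p&gt;

```python
def get_message(role: str, content: str) -> dict:
    """Generate a message for the OpenAI chat API."""
    return {"role": role, "content": content}

messages = [
    # the user's enriched prompt
    get_message("user", "Task: create a CI pipeline that runs the test suite"),
    # a previous model reply, replayed to simulate memory
    get_message("assistant", "version: v1.0\nname: Test pipeline"),
    # a follow-up question that relies on that "memory"
    get_message("user", "Now add a block that builds a Docker image"),
    # behavior instructions, sent last as noted above
    get_message("system", "You are a helpful assistant that writes YAML pipelines."),
]
```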

&lt;p&gt;Now for the engaging part. The &lt;code&gt;get_context&lt;/code&gt; function takes the prompt, queries the database, and builds a context string. Matches with a similarity score below &lt;code&gt;match_min_score&lt;/code&gt; are discarded; the rest are appended until one of these conditions is met:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The complete text exceeds &lt;code&gt;context_tokens_per_query&lt;/code&gt;, the space I reserved for context.&lt;/li&gt;
&lt;li&gt;The search function retrieves all requested matches.&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
import openai

def get_context(query: str, max_tokens: int) -&amp;gt; str:
    """Generate context for the OpenAI model. Add context until hitting the `max_tokens` limit. Returns the context string."""

    embeddings = openai.Embedding.create(
        input=[query],
        engine=embed_model
    )

    # search the database
    vectors = embeddings['data'][0]['embedding']
    results = index.query(vectors, top_k=context_cap_per_query, include_metadata=True)
    matches = results['matches']

    # filter and aggregate context
    usable_context = ""
    context_count = 0
    for i in range(0, len(matches)):

        source = matches[i]['metadata']['source']
        if matches[i]['score'] &amp;lt; match_min_score:
            # skip context with low similarity score
            continue

        context = matches[i]['metadata']['text']
        token_count = num_tokens_from_string(usable_context + '\n---\n' + context)

        if token_count &amp;lt; max_tokens:
            usable_context = usable_context + '\n---\n' + context
            context_count = context_count + 1

    print(f"Found {context_count} contexts for your query")

    return usable_context


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The next and final function, &lt;code&gt;complete&lt;/code&gt;, issues the API request to OpenAI and returns the model's response.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
def complete(messages):
    """Query the OpenAI model. Returns the first answer."""

    res = openai.ChatCompletion.create(
        model=chat_engine_model,
        messages=messages,
        temperature=temperature
    )
    return res.choices[0].message.content.strip()


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;That's all; now I only have to deal with the command line arguments and call the functions in the correct order:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
import sys

query = sys.argv[1]

context = get_context(query, context_tokens_per_query)
prompt = get_prompt(query, context)

# initialize messages list to send to OpenAI API
messages = []
messages.append(get_message('user', prompt))
messages.append(get_message('system', 'You are a helpful assistant that writes YAML code for Semaphore continuous integration pipelines and explains them. Return YAML code inside code fences.'))

if num_tokens_from_messages(messages) &amp;gt;= max_tokens_model:
    raise Exception('Model token size limit reached')

print("Working on your query... ")
answer = complete(messages)
print("Answer:\n")
print(answer)
messages.append(get_message('assistant', answer))


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;It's time to run the script and see how it fares:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
$ python complete.py "Create a CI pipeline that builds and uploads a Docker image to Docker Hub"


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The result is:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
version: v1.0
name: Docker Build and Push
agent:
  machine:
    type: e1-standard-2
    os_image: ubuntu1804

blocks:
  - name: "Build and Push Docker Image"
    task:
      jobs:
        - name: "Docker Build and Push"
          commands:
            - checkout
            - docker build -t /: .
            - echo "$DOCKERHUB_PASSWORD" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin
            - docker push /:

promotions:
  - name: Deploy to production
    pipeline_file: deploy-production.yml
    auto_promote:
      when: "result = 'passed' and branch = 'master'"


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This is a good first result. The model has inferred the syntax from the context examples we provided.&lt;/p&gt;

&lt;h2&gt;
  
  
  Thoughts on Expanding the Bot's Capabilities
&lt;/h2&gt;

&lt;p&gt;Remember that I started with a modest goal: creating an assistant to write YAML pipelines. With richer content in my vector database, I can generalize the bot to answer any question about Semaphore (or any product — remember cloning the docs into &lt;code&gt;/tmp&lt;/code&gt;?).&lt;/p&gt;

&lt;p&gt;The key to obtaining good answers is, unsurprisingly, quality context. Merely uploading every document into the vector database is unlikely to yield good results. The context database should be curated, tagged with descriptive metadata, and concise. Otherwise, we risk filling the token quota in the prompt with irrelevant context.&lt;/p&gt;

&lt;p&gt;So, in a sense, there is an art — and a great deal of trial and error — involved in fine-tuning the bot to meet our needs. We can experiment with the context limit, remove low-quality content, summarize, and filter out irrelevant context by adjusting the similarity score.&lt;/p&gt;

&lt;h3&gt;
  
  
  Implementing a Proper Chatbot
&lt;/h3&gt;

&lt;p&gt;You may have noticed that my bot does not let us hold an actual conversation like ChatGPT does. We ask one question and get one answer.&lt;/p&gt;

&lt;p&gt;Converting the bot into a fully-fledged chatbot is, in principle, not too challenging. We can maintain the conversation by resending previous responses to the model with each API request. Prior GPT-3 answers are sent back under the "assistant" role. For example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
messages = []

while True:

    query = input('Type your prompt:\n')

    context = get_context(query, context_tokens_per_query)
    prompt = get_prompt(query, context)
    messages.append(get_message('user', prompt))
    messages.append(get_message('system', 'You are a helpful assistant that writes YAML code for Semaphore continuous integration pipelines and explains them. Return YAML code inside code fences.'))

    if num_tokens_from_messages(messages) &amp;gt;= max_tokens_model:
        raise Exception('Model token size limit reached')

    print("Working on your query... ")
    answer = complete(messages)
    print("Answer:\n")
    print(answer)

    # remove system message and append model's answer
    messages.pop()
    messages.append(get_message('assistant', answer))


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Unfortunately, this implementation is rather rudimentary. It will not support extended conversations as the token count increases with each interaction. Soon enough, we will reach the 4096-token limit for GPT-3, preventing further dialogue.&lt;/p&gt;

&lt;p&gt;So, we have to find some way of keeping the request within token limits. A few strategies follow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Delete older messages. While this is the simplest solution, it limits the conversation's "memory" to only the most recent messages.&lt;/li&gt;
&lt;li&gt;Summarize previous messages. We can ask the model to condense earlier messages and substitute the summary for the original questions and answers. Though this approach increases the cost and lag between queries, it may produce better outcomes than simply deleting past messages.&lt;/li&gt;
&lt;li&gt;Set a strict limit on the number of interactions.&lt;/li&gt;
&lt;li&gt;Wait for general availability of the GPT-4 API, which is not only more capable but also has double the token capacity.&lt;/li&gt;
&lt;li&gt;Use a newer model like "gpt-3.5-turbo-16k" which &lt;a href="https://openai.com/blog/function-calling-and-other-api-updates"&gt;can handle up to 16k tokens&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
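&lt;p&gt;The first strategy, deleting older messages, takes only a few lines. This is a sketch using a toy token counter as a stand-in for &lt;code&gt;num_tokens_from_messages&lt;/code&gt;:&lt;/p&gt;

```python
def trim_history(messages: list, max_tokens: int, count_tokens) -> list:
    """Drop the oldest messages until the conversation fits the token budget."""
    trimmed = list(messages)
    while len(trimmed) > 1 and count_tokens(trimmed) >= max_tokens:
        trimmed.pop(0)  # sacrifice the oldest message first
    return trimmed

def toy_count(messages: list) -> int:
    """Stand-in counter: 4 tokens of overhead per message plus one per word."""
    return sum(4 + len(m["content"].split()) for m in messages)

history = [{"role": "user", "content": "one two three"} for _ in range(10)]
short = trim_history(history, 30, toy_count)  # keeps only the newest messages
```

In the real script we would pass <code>num_tokens_from_messages</code> and <code>max_tokens_model</code> instead of the toy counter.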

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Enhancing the bot's responses is possible with word embeddings and a good context database. To achieve this, we need good-quality documentation. There is a substantial amount of trial and error involved in developing a bot that seemingly possesses a grasp of the subject matter.&lt;/p&gt;

&lt;p&gt;I hope this in-depth exploration of word embeddings and large language models aids you in building a more potent bot, customized to your requirements.&lt;/p&gt;

&lt;p&gt;Happy building!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gpt3</category>
      <category>tutorial</category>
      <category>chatgpt</category>
    </item>
    <item>
      <title>Breaking Free From the Cloud With MRSK: Just Enough Orchestration for Your Apps</title>
      <dc:creator>Tomas Fernandez</dc:creator>
      <pubDate>Wed, 19 Jul 2023 21:07:32 +0000</pubDate>
      <link>https://forem.com/semaphore/breaking-free-from-the-cloud-with-mrsk-just-enough-orchestration-for-your-apps-48fe</link>
      <guid>https://forem.com/semaphore/breaking-free-from-the-cloud-with-mrsk-just-enough-orchestration-for-your-apps-48fe</guid>
      <description>&lt;p&gt;&lt;em&gt;A lightweight alternative to Kubernetes. Powering the likes of Hey.com and Basecamp.&lt;/em&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Breaking Free From the Cloud With MRSK: Just Enough Orchestration for Your Apps
&lt;/h1&gt;

&lt;p&gt;There is no doubt that the cloud has changed the way we run our software. Startups, for instance, can get started without buying expensive hardware and scale flexibly. Also, the cloud has enabled novel solutions such as serverless, managed Kubernetes and Docker, or edge functions. For a time, cloud-native applications seemed to be the way forward for most teams, big or small.&lt;/p&gt;

&lt;p&gt;But in exchange for all this power, we pay a cost. And it can be a steep one. &lt;a href="https://37signals.com/"&gt;37signals&lt;/a&gt; — the company behind HEY.com and Basecamp.com — has calculated that by buying a few servers and moving from the cloud to on-premise, they can save &lt;a href="https://world.hey.com/dhh/we-stand-to-save-7m-over-five-years-from-our-cloud-exit-53996caa"&gt;7 million dollars over 5 years&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;And it is not an isolated case. Recently, Amazon Prime — the poster child of serverless architectures — moved part of its infrastructure from &lt;a href="https://thestack.technology/amazon-prime-video-microservices-monolith/"&gt;serverless into a monolith&lt;/a&gt; and cut costs by 90%.&lt;/p&gt;

&lt;p&gt;Does this mean we should go back to bare-metal servers like in the old days? Not quite. We can still enjoy many of the benefits of the cloud, like horizontal scalability and no-outage upgrades, using containers in combination with an orchestration platform. On that note, 37signals recently released &lt;a href="https://mrsk.dev"&gt;MRSK&lt;/a&gt;, a tool that allowed them &lt;a href="https://world.hey.com/dhh/we-have-left-the-cloud-251760fb"&gt;to completely leave the cloud&lt;/a&gt;, saving them money and improving performance for their users.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is MRSK?
&lt;/h2&gt;

&lt;p&gt;MRSK is a deployment tool for containerized applications. It uses Docker containers to run the software and Traefik, a reverse proxy and load balancer, to perform rolling updates.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--2SBy8jaA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/1db846d502cd12113336e4f5f8249775a87c364bf936c9f9eb3561a2907b50b0.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--2SBy8jaA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/1db846d502cd12113336e4f5f8249775a87c364bf936c9f9eb3561a2907b50b0.jpg" alt="Diagram showing the typical usage of an MRSK-deployed application. We have an application server running a Traefik instance as a reverse proxy. Behind the proxy, we run the application container. Users connect to Traefik, which routes the traffic to the application." width="616" height="292"&gt;&lt;/a&gt;&lt;br&gt;
The application container runs behind an instance of Traefik running as a reverse proxy. User requests are routed to the active application container.&lt;/p&gt;

&lt;p&gt;MRSK keeps things simple by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Using standard Docker images.&lt;/li&gt;
&lt;li&gt;Connecting to the servers via SSH.&lt;/li&gt;
&lt;li&gt;Giving each server a single role.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The last point gives us the first clue about how MRSK works. It assumes a multi-server setup, where every machine fulfills only one role. In other words, if your application requires a database, MRSK expects to have at least two servers: one for the application and one for the database.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--g-6hwnmn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/559fac6b8f5d0bf517f06cd7e6dab47a6ffe5be8af2be1c85f2f7a943f0a4694.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--g-6hwnmn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/559fac6b8f5d0bf517f06cd7e6dab47a6ffe5be8af2be1c85f2f7a943f0a4694.jpg" alt="The diagram shows two servers. The user's browser accesses the Application container in the application server via a Traefik proxy running on the same machine. The application server connects with a second server hosting the database in a Docker container. The developer orchestrates all changes using MRSK." width="800" height="574"&gt;&lt;/a&gt;&lt;br&gt;
Deploying the application and database with a single MRSK command.&lt;/p&gt;

&lt;p&gt;MRSK cares about your servers and little else. Load balancing is limited to the server level. MRSK uses Traefik to forward HTTP traffic to every container running on a machine. If you want to do horizontal scaling, you'll need to put a separate load balancer in front of everything.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--7LgXut8B--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/e3d337f45cd63879d491116cd5f0467398c5b5d3af57e61bfb802db3be1f8a02.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--7LgXut8B--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/e3d337f45cd63879d491116cd5f0467398c5b5d3af57e61bfb802db3be1f8a02.jpg" alt="This is the same diagram as before, but in this case, we have two application servers, each with its own Traefik proxy running on the same machine. Because MRSK does not do load balancing when multiple servers are involved, we need to add a load balancer in front of both application servers." width="800" height="610"&gt;&lt;/a&gt;&lt;br&gt;
MRSK does not manage multi-server load balancing. You need to add your own.&lt;/p&gt;

&lt;p&gt;Because MRSK doesn't care where the application runs, you can use a cloud server, on-premise machines, or even VMs on your laptop to test drive the tool.&lt;/p&gt;
&lt;h2&gt;
  
  
  Getting started with MRSK
&lt;/h2&gt;

&lt;p&gt;You'll need the following to build and deploy an application with MRSK:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Docker.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A Docker Hub account or similar.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A software project with its Dockerfile.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Two servers with SSH access.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: you will need at least one server for each role. A deployment will typically need at least two machines: one for the application and one for the database. &lt;strong&gt;You cannot host the database on the same machine as the application.&lt;/strong&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Ensure you have SSH access&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Before starting, ensure that you have SSH access to the deployment server. It can be any cloud, bare metal server, or even a VM on your laptop. What matters is that it already has your SSH key authorized for root access.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;$ ssh root@SERVER_IP_ADDRESS&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If that doesn't work, but you know the root's password, you can add your key with the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;$ ssh-copy-id root@SERVER_IP_ADDRESS&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Installing MRSK
&lt;/h3&gt;

&lt;p&gt;Now that we have our server, we can install MRSK on our local machine, either as a Ruby gem:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;gem &lt;span class="nb"&gt;install &lt;/span&gt;mrsk &lt;span class="nt"&gt;-v&lt;/span&gt; 0.13.2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or, you can try the ready-made official Docker image by aliasing the command in your shell like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;alias &lt;/span&gt;&lt;span class="nv"&gt;mrsk&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'docker run --rm -it -v $HOME/.ssh:/root/.ssh -v /var/run/docker.sock:/var/run/docker.sock -v ${PWD}/:/workdir ghcr.io/mrsked/mrsk:v0.13.2'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Since MRSK is in active development and quite new, I suggest picking a version and sticking with it to prevent updates from breaking your deployments.&lt;/p&gt;

&lt;h3&gt;
  
  
  Configuring MRSK
&lt;/h3&gt;

&lt;p&gt;For this next step, we will need a project to deploy. MRSK works with any codebase as long as it includes a Dockerfile. For this tutorial, I will be using the following demo project, which exposes an API-based addressbook with PostgreSQL for persistence:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/TomFern/dockerizing-nodejs"&gt;TomFern/dockerizing-nodejs&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the project's root folder, run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;mrsk init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will create three files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;config/deploy.yml&lt;/code&gt;: the main configuration file. It declares how to deploy the application to all your servers.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;.env&lt;/code&gt;: for sensitive environment variables, including passwords and tokens. It should be added to &lt;code&gt;.gitignore&lt;/code&gt; so it's never checked in the repository.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;.mrsk/hooks&lt;/code&gt;: contains Git-like hooks. We won't be using any of these features in this tutorial.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The building blocks of MRSK deployments are called &lt;strong&gt;applications&lt;/strong&gt;: containers running on one or more servers, connected to the Internet via &lt;a href="https://traefik.io/traefik/"&gt;Traefik&lt;/a&gt;, a reverse proxy.&lt;/p&gt;

&lt;p&gt;Let's configure our deployment. Open &lt;code&gt;config/deploy.yml&lt;/code&gt; and edit the following values near the top of the file, uncommenting lines as needed. You will, at the very least, need to define the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A name for the application.&lt;/li&gt;
&lt;li&gt;The server IP address. This will be your application server.&lt;/li&gt;
&lt;li&gt;The Docker image name without the tag.&lt;/li&gt;
&lt;li&gt;Your Docker Hub username (you can also use a different registry).&lt;/li&gt;
&lt;li&gt;Under &lt;code&gt;registry.password&lt;/code&gt;, add "DOCKER_PASSWORD". This is a reference to a variable defined in &lt;code&gt;.env&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Environment variables for the application. The values under &lt;code&gt;clear&lt;/code&gt; are stored in plaintext. Passwords should be stored in &lt;code&gt;.env&lt;/code&gt;, and their variable names should be listed under &lt;code&gt;secret&lt;/code&gt; in the config file.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'm configuring an "addressbook" application in the example below. Its image name is &lt;code&gt;tomfern/addressbook&lt;/code&gt; (notice the lack of tags, which are handled by MRSK). My application uses a PostgreSQL database, so I set &lt;code&gt;DB_HOST&lt;/code&gt; to the IP address of the database server and reference the variable containing the password in &lt;code&gt;.env&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;addressbook&lt;/span&gt;
&lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tomfern/addressbook&lt;/span&gt;

&lt;span class="c1"&gt;# Deploy to these servers.&lt;/span&gt;
&lt;span class="na"&gt;servers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;216.238.101.228&lt;/span&gt;

&lt;span class="c1"&gt;# Credentials for your image host.&lt;/span&gt;
&lt;span class="na"&gt;registry&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tomfern&lt;/span&gt;
 &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;DOCKER_PASSWORD&lt;/span&gt;

&lt;span class="c1"&gt;# Environment variables&lt;/span&gt;
&lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="na"&gt;clear&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="na"&gt;DB_HOST&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;216.238.113.141&lt;/span&gt;
 &lt;span class="na"&gt;secret&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;DB_PASSWORD&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;MRSK can handle containerized dependencies like databases, caches, or search services. MRSK calls them &lt;strong&gt;accessories&lt;/strong&gt;. If you need to define an accessory, scroll down the config file until you find the &lt;code&gt;accessories&lt;/code&gt; section. Then, set the following values:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The db engine image name with its tag, e.g. &lt;code&gt;postgres:14&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The db server IP address and the database port.&lt;/li&gt;
&lt;li&gt;Any secrets or environment variables needed for database initialization.&lt;/li&gt;
&lt;li&gt;One or more volume mappings for data persistence — so data is not wiped out when the container stops.&lt;/li&gt;
&lt;li&gt;You can also define startup scripts to run during database initialization.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The following snippet shows how to configure a PostgreSQL database accessory for our demo app:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;accessories&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="na"&gt;db&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres:14&lt;/span&gt;
 &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;216.238.113.141&lt;/span&gt;
 &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5432&lt;/span&gt;
 &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="na"&gt;secret&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;POSTGRES_PASSWORD&lt;/span&gt;
 &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;/var/lib/postgresql/data:/var/lib/postgresql/data&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, open the &lt;code&gt;.env&lt;/code&gt; file. Fill in the password for your Docker Hub account and for the &lt;code&gt;postgres&lt;/code&gt; user on your PostgreSQL server. The value of every secret environment variable defined in &lt;code&gt;deploy.yml&lt;/code&gt; must be set here.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;DOCKER_PASSWORD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;YOUR_DOCKER_HUB_PASSWORD
&lt;span class="nv"&gt;POSTGRES_PASSWORD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;THE_POSTGRES_ADMIN_PASSWORD
&lt;span class="nv"&gt;DB_PASSWORD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;THE_ADDRESSBOOK_APP_PASSWORD
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You may use the same password for &lt;code&gt;POSTGRES_PASSWORD&lt;/code&gt; and &lt;code&gt;DB_PASSWORD&lt;/code&gt; or create a dedicated user for the app in the database. If you do so, you'll also need to define &lt;code&gt;DB_USER&lt;/code&gt; in the config file like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;addressbook&lt;/span&gt;
&lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tomfern/addressbook&lt;/span&gt;

&lt;span class="c1"&gt;# ...&lt;/span&gt;

&lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="na"&gt;clear&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="na"&gt;DB_HOST&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;216.238.113.141&lt;/span&gt;
 &lt;span class="na"&gt;DB_USER&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my_app_username&lt;/span&gt;
 &lt;span class="na"&gt;secret&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;DB_PASSWORD&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Add a healthcheck route
&lt;/h3&gt;

&lt;p&gt;By default, MRSK checks that the application container is up by running curl on the &lt;code&gt;/up&lt;/code&gt; route. You can change the healthcheck endpoint in the config file.&lt;/p&gt;
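&lt;p&gt;If your application already exposes a healthcheck on a different route or port, you can point MRSK at it from &lt;code&gt;deploy.yml&lt;/code&gt;. The keys below follow the MRSK README, but the path and port values are illustrative; verify them against your MRSK version:&lt;/p&gt;

```yaml
healthcheck:
  path: /healthz   # route MRSK curls instead of the default /up
  port: 3000       # container port the application listens on
  max_attempts: 7  # retries before the deploy is aborted
```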

&lt;p&gt;Since the demo does not have a healthcheck route, let's add one. In &lt;code&gt;app.js&lt;/code&gt;, add the following lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// app.js&lt;/span&gt;

&lt;span class="c1"&gt;// ....&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;healthRouter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./routes/health&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/up&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;healthRouter&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// ...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And create a new file called &lt;code&gt;routes/health.js&lt;/code&gt; with the following content, which checks if the app can connect with the database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// routes/health.js&lt;/span&gt;


&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;express&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;express&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;router&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;express&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Router&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;../database&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nx"&gt;router&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
 &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sequelize&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;authenticate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
 &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;})))&lt;/span&gt;
 &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;catch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)));&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;exports&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;router&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;MRSK uses curl inside the container to perform the health check, so ensure that your Dockerfile installs the tool. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; node:18.16.0-alpine3.17&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;apk update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apk add curl

...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Prepare servers and deploy
&lt;/h3&gt;

&lt;p&gt;We're all set to deploy. MRSK can do everything with a single command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;mrsk setup

Acquiring the deploy lock
Ensure curl and Docker are installed...
Log into image registry...
Build and push app image...
Ensure Traefik is running...
Start container with version c439617 using a 7s readiness delay &lt;span class="o"&gt;(&lt;/span&gt;or reboot &lt;span class="k"&gt;if &lt;/span&gt;already running&lt;span class="o"&gt;)&lt;/span&gt;...
Releasing the deploy lock
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command does the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Installs Docker on all machines.&lt;/li&gt;
&lt;li&gt;Starts Traefik on the app server.&lt;/li&gt;
&lt;li&gt;Starts the PostgreSQL container on the database server.&lt;/li&gt;
&lt;li&gt;Builds the Docker image for your application on your laptop and uploads it to Docker Hub.&lt;/li&gt;
&lt;li&gt;Pulls the image and starts the application on the server.&lt;/li&gt;
&lt;li&gt;Routes inbound traffic into the application container.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;MRSK runs a health check to verify that the application is ready before exposing it to the Internet. Once the check passes, Traefik routes traffic into the app's container.&lt;/p&gt;

&lt;p&gt;By default, the health check is a GET request to the &lt;code&gt;/up&lt;/code&gt; route (expecting status code 200), but you can change that in &lt;code&gt;deploy.yml&lt;/code&gt;.&lt;/p&gt;
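&lt;p&gt;For example, if your app exposed its probe on a different route, the override in &lt;code&gt;deploy.yml&lt;/code&gt; might look like this (the &lt;code&gt;/healthz&lt;/code&gt; path and port are illustrative values, not from the demo app):&lt;/p&gt;

```yaml
# Illustrative healthcheck override in deploy.yml
healthcheck:
  path: /healthz   # route to probe instead of the default /up
  port: 3000       # container port the probe connects to
```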

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--iE2CBnbp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/e57ea7c11dbe8e7e6d206de3a2107cda0495187d2aeb89f90865dd76b32ac08f.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--iE2CBnbp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/e57ea7c11dbe8e7e6d206de3a2107cda0495187d2aeb89f90865dd76b32ac08f.jpg" alt="Deployment diagram. The developer runs 'mrsk build' to build the image, push it to Docker Hub and run it on the server. The application server has two containers running: the application and a Traefik instance running in reverse proxy mode. Traffic is routed to the application once the health check passes." width="800" height="416"&gt;&lt;/a&gt;&lt;br&gt;
MRSK deployment mechanism. The image is built into the developer's machine, uploaded to Docker Hub, and pulled into the server. Then, a health check ensures it started correctly. Once verified, Traefik routes traffic into the application container.&lt;/p&gt;

&lt;p&gt;We can check what containers are running with the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;mrsk details

Traefik Host: 216.238.101.228
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5d08d56b760b traefik:v2.9 &lt;span class="s2"&gt;"/entrypoint.sh --pr…"&lt;/span&gt; 2 minutes ago Up 2 minutes 0.0.0.0:80-&amp;gt;80/tcp, :::80-&amp;gt;80/tcp traefik

App Host: 216.238.101.228
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
28acfd2cda02 tomfern/addressbook:3ecd87430ab7ab6cc30a1542784ddb75fbfd8e74 &lt;span class="s2"&gt;"docker-entrypoint.s…"&lt;/span&gt; 2 minutes ago Up 2 minutes 3000/tcp addressbook-web-3ecd87430ab7ab6cc30a1542784ddb75fbfd8e74


CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a843852686b6 postgres:14 &lt;span class="s2"&gt;"docker-entrypoint.s…"&lt;/span&gt; 2 minutes ago Up 2 minutes 0.0.0.0:5432-&amp;gt;5432/tcp, :::5432-&amp;gt;5432/tcp addressbook-db
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see, MRSK tags every built image with a unique identifier, which lets us track changes and roll back or forward as needed.&lt;/p&gt;

&lt;p&gt;Before testing the application, we should run any database migration or setup scripts to initialize its contents. MRSK lets us execute commands inside the running containers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;mrsk app &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="s2"&gt;"npm run migrate"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, we can test the application, which should be ready to receive user requests.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; PUT http://216.238.101.228/persons &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s1"&gt;'Content-Type: application/json'&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
"id": 1,
"firstName": "David",
"lastName": "Bowie"
}'&lt;/span&gt;

&lt;span class="nv"&gt;$ &lt;/span&gt;curl 216.238.101.228/persons/all
&lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="s2"&gt;"firstName"&lt;/span&gt;: &lt;span class="s2"&gt;"David"&lt;/span&gt;,
 &lt;span class="s2"&gt;"lastName"&lt;/span&gt;: &lt;span class="s2"&gt;"Bowie"&lt;/span&gt;,
 &lt;span class="s2"&gt;"id"&lt;/span&gt;: 1,
 &lt;span class="s2"&gt;"updatedAt"&lt;/span&gt;: &lt;span class="s2"&gt;"2023-04-30T22:44:29.115Z"&lt;/span&gt;,
 &lt;span class="s2"&gt;"createdAt"&lt;/span&gt;: &lt;span class="s2"&gt;"2023-04-30T22:44:29.115Z"&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Deploying updates with MRSK
&lt;/h3&gt;

&lt;p&gt;Let's make a change to the application so we can see how MRSK handles updates. In the demo code, we have a &lt;code&gt;/persons&lt;/code&gt; API endpoint; it would be a good idea to version it. So let's change this line in &lt;code&gt;app.js&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/persons&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;personsRouter&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Change the line so the base endpoint route is &lt;code&gt;/persons/v1/&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/persons/v1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;personsRouter&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The fastest way to update the container is with &lt;code&gt;mrsk redeploy&lt;/code&gt;, which skips a few setup tasks, such as checking whether Docker is installed on the servers.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;mrsk redeploy

Acquiring the deploy lock
Running docker buildx build
Running docker image &lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;--force&lt;/span&gt; tomfern/addressbook:7db892 on 216.238.101.228
Running docker pull tomfern/addressbook:latest on 216.238.101.228
Health check against /up succeeded with 200 OK!
Finished all &lt;span class="k"&gt;in &lt;/span&gt;35.8 seconds
Releasing the deploy lock
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The redeploy command will rebuild the image, upload it to Docker Hub and start it on the application server. As soon as the health check passes, the traffic is routed to the new version, and the old container shuts down.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5hERl8Un--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/e374025fe35e3481250c6a7a2990caa0ede768e84edb26eeb413d71832d75d73.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5hERl8Un--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/e374025fe35e3481250c6a7a2990caa0ede768e84edb26eeb413d71832d75d73.jpg" alt="Redeployment mechanism. Same as in the last diagram, but this time, we have two application container instances, the old and the new. Once the health check passes on the new instance, the traffic is switched over, and the old container is stopped." width="800" height="514"&gt;&lt;/a&gt;&lt;br&gt;
MRSK redeploy mechanism. A new instance of the application is built and deployed. Once its health check passes, traffic is routed to the new instance, and the old one is shut down.&lt;/p&gt;

&lt;p&gt;After deployment, we can check that the new route is working:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;curl 216.238.101.228/persons/v1/all

&lt;span class="o"&gt;[&lt;/span&gt;
 &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="s2"&gt;"id"&lt;/span&gt;: 1,
 &lt;span class="s2"&gt;"firstName"&lt;/span&gt;: &lt;span class="s2"&gt;"David"&lt;/span&gt;,
 &lt;span class="s2"&gt;"lastName"&lt;/span&gt;: &lt;span class="s2"&gt;"Bowie"&lt;/span&gt;,
 &lt;span class="s2"&gt;"createdAt"&lt;/span&gt;: &lt;span class="s2"&gt;"2023-05-07T17:41:45.580Z"&lt;/span&gt;,
 &lt;span class="s2"&gt;"updatedAt"&lt;/span&gt;: &lt;span class="s2"&gt;"2023-05-07T17:41:45.580Z"&lt;/span&gt;
 &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Rolling back updates with MRSK
&lt;/h3&gt;

&lt;p&gt;MRSK gives us a safe path for rolling back updates. If the new version is causing trouble, we can return to the last working version with a single command.&lt;/p&gt;

&lt;p&gt;To roll back, we first need to find out which images are available on the server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;mrsk app containers

App Host: 216.238.101.228

CONTAINER ID   IMAGE                                                          COMMAND                  CREATED         STATUS                     PORTS      NAMES

f648fec5f604   tomfern/addressbook:66347a86f8a123e35492dd43463540c23f7db892   &lt;span class="s2"&gt;"docker-entrypoint.s…"&lt;/span&gt;   4 minutes ago   Up 4 minutes               3000/tcp   addressbook-web-66347a86f8a123e35492dd43463540c23f7db892

b3b1d13b8a1c   9500e07b6387                                                   &lt;span class="s2"&gt;"docker-entrypoint.s…"&lt;/span&gt;   9 minutes ago   Exited &lt;span class="o"&gt;(&lt;/span&gt;1&lt;span class="o"&gt;)&lt;/span&gt; 4 minutes ago              addressbook-web-66347a86f8a123e35492dd43463540c23f7db892_05d41f3ba39d2b1b

ce4a5c31e6fc   tomfern/addressbook:f043325e3984ec245a94b21bd236afcc537a9739   &lt;span class="s2"&gt;"docker-entrypoint.s…"&lt;/span&gt;   3 hours ago     Exited &lt;span class="o"&gt;(&lt;/span&gt;1&lt;span class="o"&gt;)&lt;/span&gt; 9 minutes ago              addressbook-web-f043325e3984ec245a94b21bd236afcc537a9739
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can see that the previous version was tagged as "f043325e3984ec245a94b21bd236afcc537a9739". We can run &lt;code&gt;mrsk rollback&lt;/code&gt; to go back to it. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;mrsk rollback f043325e3984ec245a94b21bd236afcc537a9739
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the container does not exist on the server — MRSK routinely prunes older images — you can always look up the last good version in Docker Hub or by checking the logs in your CI/CD platform.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0_tffdA6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/df43d3999784f6d9f62fff66e22b7b781c8ece70a5eb46cfa9d1f05e7144450f.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0_tffdA6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/df43d3999784f6d9f62fff66e22b7b781c8ece70a5eb46cfa9d1f05e7144450f.jpg" alt="Rollback diagram. This diagram is like the last one but in reverse. The old instance starts from the Docker cache, and traffic is routed to the old version." width="800" height="432"&gt;&lt;/a&gt;&lt;br&gt;
MRSK rollback mechanism. The old version is restarted and health-checked. Once working, traffic is switched over to the old version, and the new container stops.&lt;/p&gt;

&lt;p&gt;Once you're done with your application, you can remove everything, including the database, with &lt;code&gt;mrsk remove&lt;/code&gt;. This stops all applications, destroys the database, and removes Docker from the servers.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;MRSK’s limitations&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;MRSK development is ongoing, so we should expect behavior changes and breaking changes in the future. But do not mistake its newness for immaturity; 37signals has reportedly begun using it to &lt;a href="https://world.hey.com/dhh/introducing-mrsk-9330a267"&gt;move away from the cloud with great success already&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;That being said, there are a few things MRSK cannot do, and that makes it a bad fit for some use cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The default behavior is to expose the application's HTTP port without SSL. You are supposed to set up an SSL terminator, a load balancer, or a CDN in front of the whole deployment.&lt;/li&gt;
&lt;li&gt;Containers on the same server cannot communicate with each other. This is &lt;a href="https://github.com/mrsked/mrsk/issues/41"&gt;by design&lt;/a&gt;. You can configure Docker networks manually on the server, but it's better to stick to one server per role; putting the application and the database on the same machine is a bad idea.&lt;/li&gt;
&lt;li&gt;You will need to configure a firewall in front of your whole setup to ensure users can only access the web application.&lt;/li&gt;
&lt;li&gt;If you want to distribute the load among many servers, you should set up a load balancer in front of your servers. MRSK does not provide this feature.&lt;/li&gt;
&lt;/ul&gt;
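&lt;p&gt;For the SSL and load-balancing gaps, one lightweight option is a TLS-terminating proxy in front of the MRSK hosts. As a sketch (not part of MRSK; &lt;code&gt;example.com&lt;/code&gt; and the second upstream IP are placeholders), a Caddyfile like this makes Caddy obtain certificates automatically and spread requests across the app servers:&lt;/p&gt;

```
example.com {
    reverse_proxy 216.238.101.228:80 203.0.113.10:80
}
```

&lt;p&gt;With a single upstream, the same two lines act as a plain SSL terminator.&lt;/p&gt;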

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--7LgXut8B--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/e3d337f45cd63879d491116cd5f0467398c5b5d3af57e61bfb802db3be1f8a02.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--7LgXut8B--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/e3d337f45cd63879d491116cd5f0467398c5b5d3af57e61bfb802db3be1f8a02.jpg" alt="In this diagram, we have two application servers, each with its own Traefik proxy running on the same machine. Because MRSK does not do load balancing when multiple servers are involved, we need to add a load balancer in front of both application servers." width="800" height="610"&gt;&lt;/a&gt;&lt;br&gt;
MRSK is designed for each server to have one role and does not provide load balancing or SSL termination. You need to handle it by yourself.&lt;/p&gt;

&lt;p&gt;At the end of the day, MRSK offers a simplified workflow — especially when compared with beasts like Kubernetes — by not trying to handle every aspect of the system.&lt;/p&gt;
&lt;h2&gt;
  
  
  Deploying applications with MRSK and CI/CD
&lt;/h2&gt;

&lt;p&gt;MRSK can be easily integrated into &lt;a href="https://semaphoreci.com/cicd"&gt;Continuous Integration and Delivery&lt;/a&gt; to get automated deployments. This section will show how to configure a Semaphore CI/CD workflow for continuous deployment. If you've never used Semaphore, check out our &lt;a href="https://docs.semaphoreci.com/guided-tour/getting-started/"&gt;getting started guide&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;First, we'll need to configure a couple of &lt;a href="https://docs.semaphoreci.com/essentials/using-secrets/"&gt;secrets&lt;/a&gt;. Secrets are Semaphore's way of storing sensitive data such as passwords or API keys.&lt;/p&gt;

&lt;p&gt;Log in or create an account on &lt;a href="https://semaphoreci.com"&gt;Semaphore&lt;/a&gt;. Then, go to your organization in the top right corner and select &lt;strong&gt;Settings&lt;/strong&gt; &amp;gt; &lt;strong&gt;Secrets&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Use the &lt;strong&gt;New Secret&lt;/strong&gt; button to create a secret called "mrsk-deploy" with the following values:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;POSTGRES_PASSWORD&lt;/code&gt;: the password for the "postgres" database user.&lt;/li&gt;
&lt;li&gt;Your SSH keys (public and private), uploaded to the path &lt;code&gt;/home/semaphore/.ssh/&lt;/code&gt; as shown in the picture below. Semaphore needs these to access your servers via SSH.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--GkJVsVBO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/353f391b9111b015c0fee90a8b2e2a97e0c330c71f8036bbffeef2a2247d710b.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--GkJVsVBO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/353f391b9111b015c0fee90a8b2e2a97e0c330c71f8036bbffeef2a2247d710b.jpg" alt="Editing a secret called mask-deploy. It defines a variable called POSTGRES_PASSWORD and two SSH key files in the path /home/semaphore/.ssh" width="499" height="501"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, create a second secret called "dockerhub" with your Docker Hub credentials, as shown below. This is required to push images into the Docker Registry.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jH6pSlLN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/ecaed6469894efd6f578723b57526f6b261c8062ed421a2a500fefff5fd21662.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jH6pSlLN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/ecaed6469894efd6f578723b57526f6b261c8062ed421a2a500fefff5fd21662.jpg" alt="Creating a dockerhub secret. We define two variables: DOCKER_USERNAME and DOCKER_PASSWORD" width="471" height="434"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now open or add your project in Semaphore. This will connect Semaphore to your repository. If you don't already have a CI pipeline, you can select one of the examples or create your own.&lt;/p&gt;

&lt;p&gt;Select &lt;strong&gt;+Add Promotion&lt;/strong&gt; to create a continuous deployment pipeline. Select the new pipeline and type the following commands in the &lt;strong&gt;Prologue&lt;/strong&gt;. Pinning the MRSK and Ruby versions ensures that future updates don't break the automated deployment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--TIoYofEJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/e08c3525cfc05ee692dcf13b67741579f1c88a8214aa02186483639dd9d1dc49.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--TIoYofEJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/e08c3525cfc05ee692dcf13b67741579f1c88a8214aa02186483639dd9d1dc49.jpg" alt="The workflow visual editor in Semaphore shows an empty pipeline. On the right side, we define commands in the global prologue." width="800" height="379"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now click on &lt;strong&gt;Add Block&lt;/strong&gt; and type the following commands in the job. Then, enable the "dockerhub" and "mrsk-deploy" secrets.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;DOCKER_PASSWORD&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | docker login &lt;span class="nt"&gt;-u&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;DOCKER_USERNAME&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--password-stdin&lt;/span&gt;
checkout
mrsk build create
mrsk redeploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--wIZR-48O--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/8dcd6c1857c179e861e9dbc8eba58a13c1b578dd990c67ddc4c756a108c299f7.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--wIZR-48O--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/8dcd6c1857c179e861e9dbc8eba58a13c1b578dd990c67ddc4c756a108c299f7.jpg" alt="The workflow visual editor in Semaphore shows a block with a single job called deploy. It contains the deployment commands and has the mrsk-deploy and dockerhub secrets enabled." width="800" height="337"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, we'll create a job to migrate the database, in case the new version includes schema changes. Select &lt;strong&gt;Add Block&lt;/strong&gt; and type the following commands in the job.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;checkout
mrsk app &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;--reuse&lt;/span&gt; &lt;span class="s2"&gt;"npm run migrate"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--LMAw9j9d--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/6c149d4f1b527fa1bc9eee5567f68c3b99c1edb90c409051ff0ab56838611f0a.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--LMAw9j9d--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/6c149d4f1b527fa1bc9eee5567f68c3b99c1edb90c409051ff0ab56838611f0a.jpg" alt="The workflow visual editor in Semaphore shows the migrate block. It shows the migrate commands and has the mrsk-deploy secret enabled" width="800" height="387"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Enable the "mrsk-deploy" secret and select &lt;strong&gt;Run the workflow&lt;/strong&gt;. Wait for the CI pipeline to finish, then press the &lt;strong&gt;Deploy to Prod&lt;/strong&gt; button to run the deployment pipeline.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--r5KDo3I1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/a40f5b7bf31f752a5090887b575b3bcbfbf5169e8287a419d47c06e4c9673ca3.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--r5KDo3I1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/a40f5b7bf31f752a5090887b575b3bcbfbf5169e8287a419d47c06e4c9673ca3.jpg" alt="The final deployment pipeline with two jobs: Deploy and Migrate" width="558" height="236"&gt;&lt;/a&gt;&lt;br&gt;
The final continuous deployment pipeline.&lt;/p&gt;

&lt;p&gt;And we're done! Congratulations on automating your deployment with CI/CD and MRSK. You can see the final state of the code and pipeline in the "mrsk" branch of the demo repository:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/TomFern/dockerizing-nodejs/tree/mrsk"&gt;TomFern/dockerizing-nodejs&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Troubleshooting
&lt;/h2&gt;

&lt;p&gt;If you experience issues with deployment, try the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If the health check fails, ensure that curl is installed in the application container image.&lt;/li&gt;
&lt;li&gt;If it still fails with curl installed, check that the application server can reach the database server. You may need to configure a VPC or set firewall rules.&lt;/li&gt;
&lt;li&gt;You can start the containers manually on the servers and inspect their logs with docker logs. The output will help you find the root cause of the problem.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you get an error message stating that there is a lock, run &lt;code&gt;mrsk lock release&lt;/code&gt;. This can happen when MRSK fails during deployment and the deploy lock is never released.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;MRSK is minimalistic to the point of elegance. By combining traditional servers with the flexibility of containers, it allows us to run our containerized services with ease on any server.&lt;/p&gt;

&lt;p&gt;MRSK presents a compelling solution if you're seeking to optimize your software deployments, reduce vendor lock-in, and maintain flexibility in choosing deployment environments while enjoying the benefits of cloud-native architectures.&lt;/p&gt;

</description>
      <category>containers</category>
      <category>docker</category>
      <category>devops</category>
      <category>news</category>
    </item>
    <item>
      <title>A First Look at Neon: A Postgres Database That Branches</title>
      <dc:creator>Tomas Fernandez</dc:creator>
      <pubDate>Mon, 06 Mar 2023 21:25:05 +0000</pubDate>
      <link>https://forem.com/semaphore/a-first-look-at-neon-a-postgres-database-that-branches-10e6</link>
      <guid>https://forem.com/semaphore/a-first-look-at-neon-a-postgres-database-that-branches-10e6</guid>
      <description>&lt;p&gt;Photo by &lt;a href="https://unsplash.com/@kpzhnv?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText" rel="noopener noreferrer"&gt;Damir Kopezhanov&lt;/a&gt; on &lt;a href="https://unsplash.com/wallpapers/colors/neon?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText" rel="noopener noreferrer"&gt;Unsplash&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Relational databases have a long, long history. The first appeared in the 1970s, and while the technology has certainly evolved, the relational model has &lt;a href="https://db-engines.com/en/ranking" rel="noopener noreferrer"&gt;proved the most popular over the last 5 decades&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Is there room for innovation after 50 years of history? The folks at Neon are showing us that one &lt;em&gt;can&lt;/em&gt;, in fact, teach an old dog new tricks.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Neon?
&lt;/h2&gt;

&lt;p&gt;Neon is an &lt;a href="https://github.com/neondatabase/neon" rel="noopener noreferrer"&gt;open-source&lt;/a&gt; (Apache 2.0) alternative to &lt;a href="https://aws.amazon.com/rds/aurora/features/" rel="noopener noreferrer"&gt;AWS Aurora&lt;/a&gt; or Google's &lt;a href="https://cloud.google.com/sql/docs/postgres" rel="noopener noreferrer"&gt;Cloud SQL&lt;/a&gt; for Postgres. Neon is a serverless, scalable implementation of PostgreSQL that you can run on-premise or try through its &lt;a href="https://neon.tech" rel="noopener noreferrer"&gt;managed service&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Neon decomposes the PostgreSQL architecture into two layers: compute and storage. The compute layer consists of stateless PostgreSQL running on Kubernetes, allowing pods to be scaled on demand — even to zero.&lt;/p&gt;

&lt;p&gt;Persistence is achieved with the &lt;em&gt;Neon storage engine&lt;/em&gt;, a custom-made layer that handles both transactions and data. The transaction log is processed through a set of redundant &lt;em&gt;safekeeper&lt;/em&gt; services, while data pages are managed on disk by the &lt;em&gt;pageserver&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Ftomfern%2Fblog-assets%2Fmaster%2Fimages%2Fcf3d1832aea430075000e7e847c4abdf033159a985a2c3a2876a642d26439a66.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Ftomfern%2Fblog-assets%2Fmaster%2Fimages%2Fcf3d1832aea430075000e7e847c4abdf033159a985a2c3a2876a642d26439a66.jpg" alt="An achitecture diagram. Many compute nodes are running as pods in a Kubernetes cluster. The nodes are part of the compute plane. The plane connects with the storage plane in two ways: first, compute nodes connect to a pageserver. The page server then stores the data in the storage backend. Second, compute nodes communicate with safekeeper nodes, which deal with the transaction log (WAL) stream."&gt;&lt;/a&gt;&lt;br&gt;
Neon architecture splits the database into scalable compute and storage planes. Compute nodes can be started and stopped depending on demand.&lt;/p&gt;

&lt;p&gt;Currently, the managed service is running a free &lt;a href="https://neon.tech/docs/introduction/technical-preview-free-tier/" rel="noopener noreferrer"&gt;tech preview&lt;/a&gt; with some limitations that we'll discuss later.&lt;/p&gt;
&lt;h2&gt;
  
  
  Neon's killer feature: branches
&lt;/h2&gt;

&lt;p&gt;Neon was &lt;a href="https://neon.tech/blog/hello-world/" rel="noopener noreferrer"&gt;launched in June 2022&lt;/a&gt;. As Neon is a new project, its managed service may have fewer features than the competition. But Neon has one feature that, to my knowledge, no one else has: branches.&lt;/p&gt;

&lt;p&gt;Every developer is familiar with branches. In Neon, branches work pretty much the same as in Git, except they cannot be merged (although there are plans to add schema-based merging in the future). You can, at any point, branch off the main trunk, effectively creating an “alternate timeline”.&lt;/p&gt;

&lt;p&gt;Since branches in Neon are writable, this feature allows us to do things no other database engine can do. For instance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Freely experiment without impacting the main branch.&lt;/li&gt;
&lt;li&gt;Instantly back up the database: if data is lost by mistake, we can switch to the last good branch.&lt;/li&gt;
&lt;li&gt;Simplify &lt;a href="https://semaphoreci.com/blog/integration-tests" rel="noopener noreferrer"&gt;integration testing&lt;/a&gt;. Developers can run tests in disposable test-specific branches.&lt;/li&gt;
&lt;li&gt;Safely try out &lt;a href="https://semaphoreci.com/blog/database-management" rel="noopener noreferrer"&gt;automated database migrations&lt;/a&gt; on production.&lt;/li&gt;
&lt;li&gt;Run analytics or machine learning workloads in isolation.&lt;/li&gt;
&lt;li&gt;Instantly duplicate all the databases that serve a specific cluster of &lt;a href="https://semaphoreci.com/blog/microservice-architecture" rel="noopener noreferrer"&gt;microservices&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can't do any of these things with traditional database engines, at least not easily. Some engines, like SQL Server, have snapshots, which can indeed create instant copies of a database. But snapshots are read-only, which limits their utility. On most database engines, we have to fall back on clunkier mechanisms like backup-and-restore or replication.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Ftomfern%2Fblog-assets%2Fmaster%2Fimages%2Fe6d8617070e74da96f90e09bb2bcbdb3c7e99ebae0e8e92eead0047cee1c3bdb.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Ftomfern%2Fblog-assets%2Fmaster%2Fimages%2Fe6d8617070e74da96f90e09bb2bcbdb3c7e99ebae0e8e92eead0047cee1c3bdb.jpg" alt="The diagram shows 4 branches. Three branches split from the main one. They are called 'test A', 'test B', and 'test C'. The branches have the contents of the database at the branching point. From that moment on, they follow their own timelines and can differ from each other and the main branch."&gt;&lt;/a&gt;&lt;br&gt;
A few use cases for Neon branches.&lt;/p&gt;

&lt;p&gt;Branches are per-project, and a project can have multiple databases. That means creating a branch duplicates all the databases in that project. We can take advantage of this &lt;a href="https://neon.tech/docs/manage/overview/" rel="noopener noreferrer"&gt;project &amp;gt; database hierarchy&lt;/a&gt; to clone a group of related databases in one operation.&lt;/p&gt;
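&lt;p&gt;Branches can also be created through the API from the shell. The following is a rough sketch: the &lt;code&gt;/projects/{id}/branches&lt;/code&gt; path and JSON payload are assumptions based on the Neon v2 API, so verify them against the current API reference before relying on them.&lt;/p&gt;

```shell
# Build the branches URL for a project (the endpoint path is an
# assumption based on the Neon v2 API; check the API reference).
branches_url() {
  echo "https://console.neon.tech/api/v2/projects/${1}/branches"
}

# Create a branch named $2 in project $1 (requires NEON_API_KEY).
# This duplicates every database in the project into a new writable branch.
create_branch() {
  curl -s -X POST "$(branches_url "$1")" \
    -H "Authorization: Bearer ${NEON_API_KEY}" \
    -H "Content-Type: application/json" \
    -d "{\"branch\": {\"name\": \"${2}\"}}"
}
```

&lt;p&gt;For example, &lt;code&gt;create_branch "$NEON_PROJECT_ID" backup-before-migration&lt;/code&gt; would snapshot the whole project in one call.&lt;/p&gt;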
&lt;h2&gt;
  
  
  Getting started with Neon
&lt;/h2&gt;

&lt;p&gt;Let's try out Neon's managed service. To create a tech preview free account, just follow these steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Sign up at &lt;a href="https://neon.tech/sign_in" rel="noopener noreferrer"&gt;neon.tech/sign_in&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Click on &lt;strong&gt;Create a project&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Click on &lt;strong&gt;Download env.txt&lt;/strong&gt;. This file contains everything you need to connect to the database instance.&lt;/li&gt;
&lt;li&gt;Click on &lt;strong&gt;Settings&lt;/strong&gt; and copy the project id.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We also need to generate an API Key, as shown below:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Click on your avatar and select &lt;strong&gt;Account&lt;/strong&gt; &amp;gt; &lt;strong&gt;Developer Settings&lt;/strong&gt; &amp;gt; &lt;strong&gt;Create new API key&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;Edit &lt;code&gt;env.txt&lt;/code&gt; and add the following lines:

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;export NEON_API_KEY=Your-API-Key&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;export NEON_PROJECT_ID=Your-Project-ID&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Add the keyword &lt;code&gt;export&lt;/code&gt; before each of the variables already in the file.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The final &lt;code&gt;env.txt&lt;/code&gt; file should look like this example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Connection details&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PGHOST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ep-random-name.us-east-2.aws.neon.tech
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PGDATABASE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;neondb
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PGUSER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;Tommy
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PGPASSWORD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sekret1

&lt;span class="c"&gt;# Connection string&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;DATABASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;postgres://Tommy:sekret1@ep-random-name.us-east-2.aws.neon.tech/neondb

&lt;span class="c"&gt;# Neon config&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;NEON_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;MyApiKey
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;NEON_PROJECT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;random-name-140532
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We’ll need this file to connect to the Neon database and API.&lt;/p&gt;
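&lt;p&gt;Before moving on, it can help to sanity-check that sourcing the file worked. Here is a small helper, using the variable names from the example file above:&lt;/p&gt;

```shell
# Verify that the variables defined in env.txt are set in the current
# shell; print the missing names and return non-zero otherwise.
check_env() {
  local var val ok=0
  for var in PGHOST PGDATABASE PGUSER PGPASSWORD NEON_API_KEY NEON_PROJECT_ID; do
    eval "val=\${$var:-}"
    if [ -z "$val" ]; then
      echo "missing: $var"
      ok=1
    fi
  done
  return $ok
}
```

&lt;p&gt;Run &lt;code&gt;source env.txt &amp;&amp; check_env&lt;/code&gt; to confirm everything is in place.&lt;/p&gt;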

&lt;h2&gt;
  
  
  The Neon UI
&lt;/h2&gt;

&lt;p&gt;The Neon dashboard includes an SQL editor to run commands and controls for creating branches or endpoints.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Ftomfern%2Fblog-assets%2Fmaster%2Fimages%2Fb227ef90cf1057175cfcc306eda7e73be346f59b5c44909fcdcef8e30da961e6.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Ftomfern%2Fblog-assets%2Fmaster%2Fimages%2Fb227ef90cf1057175cfcc306eda7e73be346f59b5c44909fcdcef8e30da961e6.jpg" alt="Screenshot of the Neon dashboard managed service at console.neon.tech."&gt;&lt;/a&gt;&lt;br&gt;
The managed service dashboard.&lt;/p&gt;

&lt;p&gt;On the &lt;strong&gt;Branches&lt;/strong&gt; page, we’ll find options for creating a new branch. Here, you can select the &lt;em&gt;what&lt;/em&gt; and &lt;em&gt;when&lt;/em&gt;. You must choose the parent branch and how much data to include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Head&lt;/strong&gt;: the new branch is a copy of the current database.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time&lt;/strong&gt;: the branch has the parent's data up to a specified date and time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LSN&lt;/strong&gt;: the branch has the parent's data up to a specified log sequence number.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Ftomfern%2Fblog-assets%2Fmaster%2Fimages%2Ff1a10f3958d2848814a5b609393c596e95e05ed84b9398e9d096b1f0e51474d9.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Ftomfern%2Fblog-assets%2Fmaster%2Fimages%2Ff1a10f3958d2848814a5b609393c596e95e05ed84b9398e9d096b1f0e51474d9.jpg" alt="Screenshot of the create branch UI. It shows a parent branch selector, three options to create the branch: head, time, and LSN, and the option to create an endpoint for the new branch."&gt;&lt;/a&gt;&lt;br&gt;
The create branch UI.&lt;/p&gt;

&lt;p&gt;Endpoints for the branches can be created on the same screen or on the &lt;strong&gt;Endpoints&lt;/strong&gt; page.&lt;/p&gt;
&lt;h2&gt;
  
  
  Connecting to Neon
&lt;/h2&gt;

&lt;p&gt;Neon is a PostgreSQL database, so we'll need the standard client tools. Check which version you have installed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;psql &lt;span class="nt"&gt;--version&lt;/span&gt;
psql &lt;span class="o"&gt;(&lt;/span&gt;PostgreSQL&lt;span class="o"&gt;)&lt;/span&gt; 15.1 &lt;span class="o"&gt;(&lt;/span&gt;Ubuntu 15.1-1.pgdg20.04+1&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Neon works best with versions 14 and 15 of the client tools. So, if needed, head to &lt;a href="https://www.postgresql.org/download/" rel="noopener noreferrer"&gt;postgresql.org/download&lt;/a&gt; to get the latest release.&lt;/p&gt;
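&lt;p&gt;If you want to check the version from a script, for example to fail fast in CI, the major number can be extracted from the banner. A small sketch:&lt;/p&gt;

```shell
# Extract the major version number from `psql --version` output,
# e.g. "psql (PostgreSQL) 15.1 (Ubuntu ...)" -> "15".
parse_psql_major() {
  sed -n 's/^psql (PostgreSQL) \([0-9][0-9]*\).*/\1/p'
}

# Example: warn when the installed client is older than version 14.
check_client() {
  local major
  major=$(psql --version | parse_psql_major)
  [ "${major:-0}" -ge 14 ] || echo "psql client is older than 14; consider upgrading"
}
```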

&lt;p&gt;Now, let's source the &lt;code&gt;env.txt&lt;/code&gt; and try connecting.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;source &lt;/span&gt;env.txt
&lt;span class="nv"&gt;$ &lt;/span&gt;psql

psql &lt;span class="o"&gt;(&lt;/span&gt;15.1 &lt;span class="o"&gt;(&lt;/span&gt;Ubuntu 15.1-1.pgdg20.04+1&lt;span class="o"&gt;)&lt;/span&gt;, server 14.6&lt;span class="o"&gt;)&lt;/span&gt;

&lt;span class="nv"&gt;neondb&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; SELECT version&lt;span class="o"&gt;()&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
 version
&lt;span class="nt"&gt;---------------------------------------------------------------------------------------------------&lt;/span&gt;
 PostgreSQL 14.6 on x86_64-pc-linux-gnu, compiled by gcc &lt;span class="o"&gt;(&lt;/span&gt;Debian 10.2.1-6&lt;span class="o"&gt;)&lt;/span&gt; 10.2.1 20210110, 64-bit
&lt;span class="o"&gt;(&lt;/span&gt;1 row&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also do browser-based authentication by running the command shown below. This will open a browser window and let you select the project you want to connect to.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$ psql -h pg.neon.tech
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;We may also test the Neon API key with curl:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="s1"&gt;'https://console.neon.tech/api/v2/projects'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s1"&gt;'accept: application/json'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$NEON_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="s2"&gt;"projects"&lt;/span&gt;: &lt;span class="o"&gt;[&lt;/span&gt;
 &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="s2"&gt;"id"&lt;/span&gt;: &lt;span class="s2"&gt;"calm-guy-140532"&lt;/span&gt;,
 &lt;span class="s2"&gt;"platform_id"&lt;/span&gt;: &lt;span class="s2"&gt;"aws"&lt;/span&gt;,
 &lt;span class="s2"&gt;"region_id"&lt;/span&gt;: &lt;span class="s2"&gt;"aws-us-east-2"&lt;/span&gt;,
 &lt;span class="s2"&gt;"name"&lt;/span&gt;: &lt;span class="s2"&gt;"test-project"&lt;/span&gt;,
 &lt;span class="s2"&gt;"provisioner"&lt;/span&gt;: &lt;span class="s2"&gt;"k8s-pod"&lt;/span&gt;,
 &lt;span class="s2"&gt;"pg_version"&lt;/span&gt;: 14,
 &lt;span class="s2"&gt;"locked"&lt;/span&gt;: &lt;span class="nb"&gt;false&lt;/span&gt;,
 &lt;span class="s2"&gt;"created_at"&lt;/span&gt;: &lt;span class="s2"&gt;"2022-12-22T18:16:13Z"&lt;/span&gt;,
 &lt;span class="s2"&gt;"updated_at"&lt;/span&gt;: &lt;span class="s2"&gt;"2022-12-22T18:16:13Z"&lt;/span&gt;,
 &lt;span class="s2"&gt;"proxy_host"&lt;/span&gt;: &lt;span class="s2"&gt;"us-east-2.aws.neon.tech"&lt;/span&gt;
 &lt;span class="o"&gt;}&lt;/span&gt;
 &lt;span class="o"&gt;]&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
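&lt;p&gt;The project id can be pulled out of that JSON response for use in later API calls. &lt;code&gt;jq&lt;/code&gt; is the cleanest option (&lt;code&gt;jq -r '.projects[0].id'&lt;/code&gt;), but here is a dependency-free sketch with &lt;code&gt;sed&lt;/code&gt; that works for the response shown above:&lt;/p&gt;

```shell
# Extract the first "id" value from the API response. A rough,
# single-purpose parser -- for anything serious, use jq instead.
extract_project_id() {
  sed -n 's/.*"id": *"\([^"]*\)".*/\1/p' | head -n 1
}
```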



&lt;h2&gt;
  
  
  Automated branching with CI/CD
&lt;/h2&gt;

&lt;p&gt;One of the most delicate parts of deployment is the database migration step because there is always the possibility of data loss. There &lt;a href="https://semaphoreci.com/blog/database-management" rel="noopener noreferrer"&gt;are several techniques to make this process safer&lt;/a&gt;. But with Neon, we can leverage branches to rehearse the migration on an instant copy of production, making the step far safer.&lt;/p&gt;

&lt;p&gt;You can see a live example of using branches in a continuous integration and delivery pipeline in &lt;a href="https://semaphoreci.com/blog/neon-database#automated-branching-with-ci-cd" rel="noopener noreferrer"&gt;this tutorial&lt;/a&gt;.&lt;/p&gt;
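&lt;p&gt;As a rough sketch of what such a pipeline step could look like: create a throwaway branch named after the build, run the migrations against it, and delete it when done. The API path and payload below are assumptions based on the Neon v2 API, so check the API reference before using them:&lt;/p&gt;

```shell
# Name the throwaway branch after the CI build number so runs don't collide.
migration_branch() {
  echo "migration-test-${1}"
}

# Sketch of the pipeline step (API path and payload are assumptions
# based on the Neon v2 API; requires NEON_API_KEY and NEON_PROJECT_ID).
run_migration_test() {
  local build_id="$1"
  curl -s -X POST "https://console.neon.tech/api/v2/projects/${NEON_PROJECT_ID}/branches" \
    -H "Authorization: Bearer ${NEON_API_KEY}" \
    -H "Content-Type: application/json" \
    -d "{\"branch\": {\"name\": \"$(migration_branch "$build_id")\"}}"
  # ...run the migration tool against the new branch's endpoint here...
  # ...then delete the branch through the API once the test passes...
}
```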

&lt;h2&gt;
  
  
  Technical preview limitations
&lt;/h2&gt;

&lt;p&gt;At the time of writing, Neon is in a &lt;a href="https://neon.tech/docs/introduction/technical-preview-free-tier/" rel="noopener noreferrer"&gt;technical preview&lt;/a&gt; stage. While the managed service is entirely free, it comes with some limitations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can only have one project per user, but a project can have multiple databases.&lt;/li&gt;
&lt;li&gt;A project can have up to nine branches in addition to the main branch.&lt;/li&gt;
&lt;li&gt;You can have up to three endpoints. One is always reserved for the main database, which leaves two endpoints for other branches.&lt;/li&gt;
&lt;li&gt;The size limit is 3GB per branch on the free tier.&lt;/li&gt;
&lt;li&gt;Point-in-time branches can only go up to seven days into the past.&lt;/li&gt;
&lt;li&gt;There are no backup or restore options in the UI. The only alternative seems to be running &lt;code&gt;pg_dump&lt;/code&gt; against the database to take a remote backup.&lt;/li&gt;
&lt;li&gt;There are &lt;a href="https://neon.tech/docs/import/import-from-postgres/" rel="noopener noreferrer"&gt;a few observations around importing data&lt;/a&gt; from another PostgreSQL instance or a backup.&lt;/li&gt;
&lt;li&gt;There is a 100 concurrent connections limit. You can enable connection pooling to raise the limit to 1,000 connections.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The good news is that if you like the database, you can always run it on-premise or in your cloud of choice to remove these limitations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Neon's branching feature presents new options for development and database management. Even in its current technical preview stage, I can see great potential for this engine. Of course, we'll have to see how the project evolves, especially how the final pricing model will turn out.&lt;/p&gt;

&lt;p&gt;I hope you found this project interesting, and if so, you might want to &lt;a href="https://github.com/neondatabase/neon#join-the-development" rel="noopener noreferrer"&gt;consider contributing to Neon&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Thanks for reading!&lt;/p&gt;

</description>
      <category>database</category>
      <category>newsql</category>
      <category>tutorial</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Getting Started With Astro</title>
      <dc:creator>Tomas Fernandez</dc:creator>
      <pubDate>Fri, 03 Mar 2023 22:03:38 +0000</pubDate>
      <link>https://forem.com/semaphore/getting-started-with-astro-nb7</link>
      <guid>https://forem.com/semaphore/getting-started-with-astro-nb7</guid>
      <description>&lt;p&gt;Photo by &lt;a href="https://unsplash.com/@guanju223?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText"&gt;Ju Guan&lt;/a&gt; on &lt;a href="https://unsplash.com/s/photos/astronaut?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText"&gt;Unsplash&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Getting Started With Astro
&lt;/h1&gt;

&lt;p&gt;Web development, like most technologies, moves in waves and cycles. Static websites were all we had in the beginning. But pretty soon developers were hacking together the first server-generated sites thanks to Perl and PHP. This was the advance that would eventually kickstart frameworks like Laravel, Django, or Rails.&lt;/p&gt;

&lt;p&gt;Mobile devices would come to change how people consume the web. So long, server-generated websites; hello, client-rendered applications. The next wave brought frameworks that could give users a more app-like experience — without reloads — like React or AngularJS.&lt;/p&gt;

&lt;p&gt;But while single-page applications did provide a smoother experience, they have their drawbacks. Namely, longer page loads caused by all the extra JavaScript that has to be parsed and executed. Not to mention all the work it took to optimize for search engines.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://astro.build/"&gt;Astro&lt;/a&gt; is a prime example of the current wave that we have been in since Next.js, taking us full circle: a web framework that combines server and client rendering to get the best of both worlds.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Astro?
&lt;/h2&gt;

&lt;p&gt;Astro is an &lt;a href="https://github.com/withastro/astro"&gt;open-source&lt;/a&gt; framework for generating web applications on top of popular UI frameworks like React, Preact, Vue, or Svelte. An Astro page is composed of several independent components. To improve load times, Astro strips away all JavaScript and prerenders pages on the server, unless developers mark a component as interactive, in which case Astro sends the minimum amount of JavaScript required for interactivity.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--bjAWlody--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/e6b1ced46454cc27565cbe3c11d594313f2b859160b55c75dac62a3c07bd98e1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--bjAWlody--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/e6b1ced46454cc27565cbe3c11d594313f2b859160b55c75dac62a3c07bd98e1.jpg" alt="A diagram showing the types of content Astro can generate: we have static HTML websites and server-rendered multi-page applications. The first kind can be hosted statically on platforms such as AWS S3, Google Cloud, GitHub, and GitLab pages or Firebase hosting. The latter can be hosted on platforms like Deno Deploy, Netlify, Vercel, or Cloudflare." width="800" height="287"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Astro favors generating static and server-rendered sites without JavaScript in the browser. Developers can opt in to client-side rendering for given routes or parts of the page.&lt;/p&gt;



&lt;p&gt;Thanks to this strategy, Astro pages load quickly, as no JavaScript needs to execute for the first render. In a process called hydration, Astro will “pour” JavaScript into components to make them dynamic.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Astro?
&lt;/h2&gt;

&lt;p&gt;Astro stands out from the other web frameworks in many ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Speed&lt;/strong&gt;: Astro is statically and server-rendered when possible, making it an excellent fit for content-first, SEO-friendly websites like blogs, courseware, or marketing sites.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;UI agnostic&lt;/strong&gt;: unlike Next.js, which only works with React, or Nuxt.js, which requires Vue expertise, Astro doesn't force any UI framework on you. You can use none, roll your own, or use any of &lt;a href="https://docs.astro.build/en/guides/integrations-guide/#official-integrations"&gt;the many supported integrations&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Themes&lt;/strong&gt;: Astro has an impressive array of ready-made &lt;a href="https://astro.build/themes/"&gt;themes&lt;/a&gt; and &lt;a href="https://astro.new/"&gt;templates&lt;/a&gt; to get started quickly.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Easy to pick up&lt;/strong&gt;: you don't need to learn React or Vue to get started with Astro. Its templating engine looks like plain HTML, and the code is cleanly separated from the presentation. You can pick an empty project and build it up at your own pace.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Batteries included&lt;/strong&gt;: Astro comes with the features we expect from a modern JavaScript/TypeScript framework, like top-level await, Markdown and MDX support, and native ESM support.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Islands&lt;/strong&gt;: the island architecture lets us mix static, server-rendered, and client-rendered components on the same page without conflicts, so we can build interactive sites with minimal overhead. We can even mix different frameworks on the same page, giving us yet another way of building &lt;a href="https://semaphoreci.com/blog/microfrontends"&gt;microfrontends&lt;/a&gt;. We'll see more about how islands work in a bit.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Getting started with Astro
&lt;/h2&gt;

&lt;p&gt;To get started with Astro, install Node version 16.12.0 or higher and run the following command. Follow the on-screen wizard and choose to create an empty project when prompted:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;npm create astro@latest

 astro   v1.9.1 Launch sequence initiated.

✔ Where would you like to create your new project? … awesome-website
✔ How would you like to setup your new project? › an empty project
✔ Template copied!
✔ Would you like to &lt;span class="nb"&gt;install &lt;/span&gt;npm dependencies? &lt;span class="o"&gt;(&lt;/span&gt;recommended&lt;span class="o"&gt;)&lt;/span&gt; … &lt;span class="nb"&gt;yes&lt;/span&gt;
✔ Packages installed!
✔ Would you like to initialize a new git repository? &lt;span class="o"&gt;(&lt;/span&gt;optional&lt;span class="o"&gt;)&lt;/span&gt; … &lt;span class="nb"&gt;yes&lt;/span&gt;
✔ Git repository created!
✔ How would you like to setup TypeScript? › Relaxed
✔ TypeScript settings applied!

  next   Liftoff confirmed. Explore your project!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, start the website in developer mode by entering the directory of the project you just created, running &lt;code&gt;npm run dev&lt;/code&gt;, and visiting &lt;code&gt;http://localhost:3000&lt;/code&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  Pages and routes
&lt;/h3&gt;

&lt;p&gt;The interesting stuff in Astro happens inside the &lt;code&gt;src&lt;/code&gt; folder. Checking what's there, we see a single directory called &lt;code&gt;pages&lt;/code&gt; with an &lt;code&gt;index.astro&lt;/code&gt; file. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.astro.build/en/core-concepts/astro-pages/"&gt;Astro pages&lt;/a&gt; are a mix of HTML, Javascript, or TypeScript. This is the default &lt;code&gt;index.astro&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;---
---

&lt;span class="nt"&gt;&amp;lt;html&lt;/span&gt; &lt;span class="na"&gt;lang=&lt;/span&gt;&lt;span class="s"&gt;"en"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;head&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;meta&lt;/span&gt; &lt;span class="na"&gt;charset=&lt;/span&gt;&lt;span class="s"&gt;"utf-8"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;link&lt;/span&gt; &lt;span class="na"&gt;rel=&lt;/span&gt;&lt;span class="s"&gt;"icon"&lt;/span&gt; &lt;span class="na"&gt;type=&lt;/span&gt;&lt;span class="s"&gt;"image/svg+xml"&lt;/span&gt; &lt;span class="na"&gt;href=&lt;/span&gt;&lt;span class="s"&gt;"/favicon.svg"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;meta&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;"viewport"&lt;/span&gt; &lt;span class="na"&gt;content=&lt;/span&gt;&lt;span class="s"&gt;"width=device-width"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;meta&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;"generator"&lt;/span&gt; &lt;span class="na"&gt;content=&lt;/span&gt;&lt;span class="s"&gt;{Astro.generator}&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;title&amp;gt;&lt;/span&gt;Astro&lt;span class="nt"&gt;&amp;lt;/title&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/head&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;body&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;h1&amp;gt;&lt;/span&gt;Astro&lt;span class="nt"&gt;&amp;lt;/h1&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/body&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/html&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You may have noticed that Astro files start with a code fence delimited by &lt;code&gt;---&lt;/code&gt;. Whatever code we put inside this fence is executed on the server, and the result is prerendered before the page is served.&lt;/p&gt;

&lt;p&gt;Below the frontmatter, we find the content for this route: an augmented form of HTML that supports variables. We can, for example, define a variable in the frontmatter and use it in the HTML like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;---
// src/pages/index.astro
const title = "Astro";
---

&lt;span class="nt"&gt;&amp;lt;html&lt;/span&gt; &lt;span class="na"&gt;lang=&lt;/span&gt;&lt;span class="s"&gt;"en"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;head&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;meta&lt;/span&gt; &lt;span class="na"&gt;charset=&lt;/span&gt;&lt;span class="s"&gt;"utf-8"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;link&lt;/span&gt; &lt;span class="na"&gt;rel=&lt;/span&gt;&lt;span class="s"&gt;"icon"&lt;/span&gt; &lt;span class="na"&gt;type=&lt;/span&gt;&lt;span class="s"&gt;"image/svg+xml"&lt;/span&gt; &lt;span class="na"&gt;href=&lt;/span&gt;&lt;span class="s"&gt;"/favicon.svg"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;meta&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;"viewport"&lt;/span&gt; &lt;span class="na"&gt;content=&lt;/span&gt;&lt;span class="s"&gt;"width=device-width"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;meta&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;"generator"&lt;/span&gt; &lt;span class="na"&gt;content=&lt;/span&gt;&lt;span class="s"&gt;{Astro.generator}&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;title&amp;gt;&lt;/span&gt;{title}&lt;span class="nt"&gt;&amp;lt;/title&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/head&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;body&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;h1&amp;gt;&lt;/span&gt;{title}&lt;span class="nt"&gt;&amp;lt;/h1&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/body&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/html&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Astro uses &lt;strong&gt;file-based routing&lt;/strong&gt;, so every file in the &lt;code&gt;pages&lt;/code&gt; folder is mapped to a route on the website. For example, if we create a file called &lt;code&gt;greetings.astro&lt;/code&gt;, we should see its contents at &lt;code&gt;http://localhost:3000/greetings&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;    ---
    const greeting = "Hello, world!";
    ---

    &lt;span class="nt"&gt;&amp;lt;html&lt;/span&gt; &lt;span class="na"&gt;lang=&lt;/span&gt;&lt;span class="s"&gt;"en"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;head&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;meta&lt;/span&gt; &lt;span class="na"&gt;charset=&lt;/span&gt;&lt;span class="s"&gt;"utf-8"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;link&lt;/span&gt; &lt;span class="na"&gt;rel=&lt;/span&gt;&lt;span class="s"&gt;"icon"&lt;/span&gt; &lt;span class="na"&gt;type=&lt;/span&gt;&lt;span class="s"&gt;"image/svg+xml"&lt;/span&gt; &lt;span class="na"&gt;href=&lt;/span&gt;&lt;span class="s"&gt;"/favicon.svg"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;meta&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;"viewport"&lt;/span&gt; &lt;span class="na"&gt;content=&lt;/span&gt;&lt;span class="s"&gt;"width=device-width"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;meta&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;"generator"&lt;/span&gt; &lt;span class="na"&gt;content=&lt;/span&gt;&lt;span class="s"&gt;{Astro.generator}&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;title&amp;gt;&lt;/span&gt;Astro&lt;span class="nt"&gt;&amp;lt;/title&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;/head&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;body&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;h1&amp;gt;&lt;/span&gt;{greeting}&lt;span class="nt"&gt;&amp;lt;/h1&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;/body&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/html&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
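&lt;p&gt;The routing rule can be sketched as a tiny path-mapping function. This is a simplification: Astro's real router also handles dynamic &lt;code&gt;[param]&lt;/code&gt; routes and other file types, so treat it as an illustration only.&lt;/p&gt;

```shell
# Map a file under src/pages to the URL it is served at
# (simplified sketch of Astro's file-based routing).
route_for() {
  local f="${1#src/pages/}"    # drop the pages/ prefix
  f="${f%.astro}"              # drop the .astro extension...
  f="${f%.md}"                 # ...or the .md extension
  [ "$f" = "index" ] && f=""   # index maps to the site root
  echo "/$f"
}
```

&lt;p&gt;So &lt;code&gt;src/pages/index.astro&lt;/code&gt; maps to &lt;code&gt;/&lt;/code&gt; and &lt;code&gt;src/pages/greetings.astro&lt;/code&gt; maps to &lt;code&gt;/greetings&lt;/code&gt;.&lt;/p&gt;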



&lt;p&gt;In addition to &lt;code&gt;.astro&lt;/code&gt; files, Astro can parse Markdown, MDX, JSX, JavaScript, and TypeScript files. For instance, if we want to write a blog post in Markdown, we create a &lt;code&gt;post.md&lt;/code&gt; file under the &lt;code&gt;pages&lt;/code&gt; folder. Visiting the route will make Astro convert it into HTML on the fly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Learning&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Astro'&lt;/span&gt;
&lt;span class="na"&gt;pubDate&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2023-01-10&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;A&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;post&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;written&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;in&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Markdown.'&lt;/span&gt;
&lt;span class="na"&gt;author&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Tommy'&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gh"&gt;# Learning Astro&lt;/span&gt;

This Markdown file should be rendered as HTML when I visit http://localhost:3000/post
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Components
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://docs.astro.build/en/core-concepts/astro-components/"&gt;Astro components&lt;/a&gt; are &lt;code&gt;*.astro&lt;/code&gt; files with reusable code and HTML. We can use components to write elements like headers, footers, navigation bars, buttons, and forms — anything that can be expressed as HTML can constitute a component.&lt;/p&gt;

&lt;p&gt;Let's create our first component in &lt;code&gt;src/components/Header.astro&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;---
// src/components/Header.astro
---
&lt;span class="nt"&gt;&amp;lt;meta&lt;/span&gt; &lt;span class="na"&gt;charset=&lt;/span&gt;&lt;span class="s"&gt;"utf-8"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;link&lt;/span&gt; &lt;span class="na"&gt;rel=&lt;/span&gt;&lt;span class="s"&gt;"icon"&lt;/span&gt; &lt;span class="na"&gt;type=&lt;/span&gt;&lt;span class="s"&gt;"image/svg+xml"&lt;/span&gt; &lt;span class="na"&gt;href=&lt;/span&gt;&lt;span class="s"&gt;"/favicon.svg"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;meta&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;"viewport"&lt;/span&gt; &lt;span class="na"&gt;content=&lt;/span&gt;&lt;span class="s"&gt;"width=device-width"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;meta&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;"generator"&lt;/span&gt; 
&lt;span class="err"&gt;&amp;lt;&lt;/span&gt;&lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;Astro&lt;span class="nt"&gt;&amp;lt;/title&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once defined, we can import it to any page (or other components) and use it like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;---
import Header from "../components/Header.astro";
---

&lt;span class="nt"&gt;&amp;lt;html&lt;/span&gt; &lt;span class="na"&gt;lang=&lt;/span&gt;&lt;span class="s"&gt;"en"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;head&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;Header&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/head&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;body&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/body&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/html&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Astro components are no different from pages. Any code defined between fences is executed on the server. JavaScript is stripped out before sending the content to the browser.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layouts
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://docs.astro.build/en/core-concepts/layouts/"&gt;Layouts&lt;/a&gt; are used to define reusable UI structures, such as a common page shell. They are technically components, so the syntax stays the same.&lt;/p&gt;

&lt;p&gt;Let's replace the content of &lt;code&gt;index.astro&lt;/code&gt; with a layout:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;---
// src/pages/index.astro
import SiteLayout from "../layouts/SiteLayout.astro";
---

&lt;span class="nt"&gt;&amp;lt;SiteLayout&amp;gt;&amp;lt;/SiteLayout&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see, layouts are stored in the &lt;code&gt;src/layouts&lt;/code&gt; folder by convention.&lt;/p&gt;

&lt;p&gt;Layouts, like components, can include other components. Here we have extracted the structure in &lt;code&gt;index.astro&lt;/code&gt; and added a &lt;code&gt;Footer&lt;/code&gt; component:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;---
// src/layouts/SiteLayout.astro
import Header from "../components/Header.astro";
import Footer from "../components/Footer.astro";
---

&lt;span class="nt"&gt;&amp;lt;html&lt;/span&gt; &lt;span class="na"&gt;lang=&lt;/span&gt;&lt;span class="s"&gt;"en"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;head&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;Header&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/head&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;body&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;Footer&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/body&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/html&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Props and slots
&lt;/h3&gt;

&lt;p&gt;So far, our website has been completely static. In order to pass data&lt;br&gt;
between pages and components, we need to understand how &lt;strong&gt;props&lt;/strong&gt; and&lt;br&gt;
&lt;strong&gt;slots&lt;/strong&gt; work.&lt;/p&gt;

&lt;p&gt;Components and layouts can define and accept &lt;a href="https://docs.astro.build/en/core-concepts/astro-components/#component-props"&gt;props&lt;/a&gt; (short for properties) via the global &lt;code&gt;Astro.props&lt;/code&gt; object. Values passed through props are accessible to the component before rendering.&lt;/p&gt;

&lt;p&gt;We can read props in our component like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;---
// src/components/Header.astro
const { title } = Astro.props;
---

&lt;span class="nt"&gt;&amp;lt;meta&lt;/span&gt; &lt;span class="na"&gt;charset=&lt;/span&gt;&lt;span class="s"&gt;"utf-8"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;link&lt;/span&gt; &lt;span class="na"&gt;rel=&lt;/span&gt;&lt;span class="s"&gt;"icon"&lt;/span&gt; &lt;span class="na"&gt;type=&lt;/span&gt;&lt;span class="s"&gt;"image/svg+xml"&lt;/span&gt; &lt;span class="na"&gt;href=&lt;/span&gt;&lt;span class="s"&gt;"/favicon.svg"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;meta&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;"viewport"&lt;/span&gt; &lt;span class="na"&gt;content=&lt;/span&gt;&lt;span class="s"&gt;"width=device-width"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;meta&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;"generator"&lt;/span&gt; 
&lt;span class="err"&gt;&amp;lt;&lt;/span&gt;&lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;{title}&lt;span class="nt"&gt;&amp;lt;/title&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The value of &lt;code&gt;title&lt;/code&gt; can be supplied when the component is called, which in the following example happens through the &lt;code&gt;SiteLayout&lt;/code&gt; layout.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;---
// src/layouts/SiteLayout.astro
import Header from "../components/Header.astro";
import Footer from "../components/Footer.astro";
---
&lt;span class="nt"&gt;&amp;lt;html&lt;/span&gt; &lt;span class="na"&gt;lang=&lt;/span&gt;&lt;span class="s"&gt;"en"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;head&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;Header&lt;/span&gt; &lt;span class="na"&gt;title = &lt;/span&gt;&lt;span class="s"&gt;"Welcome my Astro Blog!"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/head&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;body&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;Footer&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/body&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/html&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;⚠️ Note that you need spaces surrounding the equal sign, i.e. &lt;code&gt;title="Hello"&lt;/code&gt; is NOT correct. Instead, it should be: &lt;code&gt;title = "Hello"&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.astro.build/en/core-concepts/astro-components/#slots"&gt;Slot&lt;/a&gt; elements create placeholders for content to be injected later. To see how it works, we can add a &lt;code&gt;&amp;lt;slot /&amp;gt;&lt;/code&gt; element in &lt;code&gt;src/layouts/SiteLayout.astro&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;---
// src/layouts/SiteLayout.astro
import Header from "../components/Header.astro";
import Footer from "../components/Footer.astro";
---

&lt;span class="nt"&gt;&amp;lt;html&lt;/span&gt; &lt;span class="na"&gt;lang=&lt;/span&gt;&lt;span class="s"&gt;"en"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;head&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;Header&lt;/span&gt; &lt;span class="na"&gt;title = &lt;/span&gt;&lt;span class="s"&gt;"Welcome my Astro Blog!"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/head&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;body&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;slot&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;Footer&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/body&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/html&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the HTML inside &lt;code&gt;&amp;lt;SiteLayout&amp;gt;&lt;/code&gt; is injected at the point where the slot is located.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;---
// src/pages/index.astro
import SiteLayout from "../layouts/SiteLayout.astro";
---

&lt;span class="nt"&gt;&amp;lt;SiteLayout&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;p&amp;gt;&lt;/span&gt;This content is rendered in the slot&lt;span class="nt"&gt;&amp;lt;/p&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/SiteLayout&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nesting components and layouts, combined with props and slots, gives us the flexibility to create reusable UI elements across the website.&lt;/p&gt;

&lt;h3&gt;
  
  
  Islands and hydration
&lt;/h3&gt;

&lt;p&gt;Up to this point, we have not shipped any JavaScript to the user; everything is prerendered and served as pure HTML+CSS. How do we make Astro send JavaScript to the browser? For that, we need to understand the &lt;strong&gt;islands architecture&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The islands architecture pattern aims to reduce the amount of JavaScript shipped to the browser. Less JavaScript means less data sent and less computational power required on the user's device. An island is an autonomous component that bundles HTML, CSS, and, optionally, JavaScript. In the islands pattern, a page is made up of several independent islands.&lt;/p&gt;


&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ysI5KIpe--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/90bbafa34424d32948e8011f41621a70914e577f4250d8fe491978fc7e2023c9.jpg" alt="A web page showing the island's architecture in action. The header and footer are static HTML and a sidebar navigation component that uses React for interactivity. The actual content is server-side rendered HTML." width="800" height="338"&gt;The islands architecture allows combining static HTML, server-rendered content, and interactive client-rendered components on one page without conflicts.



&lt;p&gt;Each island is prerendered, so there is no interactivity right after the page loads. Once the initial page is ready, the islands are converted into interactive content in a process called hydration. Hydration is a technique that converts static content delivered via static hosting or server-side rendering into a dynamic page by attaching event handlers to the HTML elements.&lt;/p&gt;
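&lt;p&gt;Conceptually, hydration boils down to "attach behavior to markup that already exists". The sketch below illustrates the idea in plain JavaScript; it uses a hypothetical stand-in object for a DOM element so it can run anywhere, and is not Astro's actual implementation:&lt;/p&gt;

```javascript
// Hydration in a nutshell: the HTML already exists on the page; client-side
// JavaScript only attaches event handlers to it afterwards.
function hydrate(element, handlers) {
  for (const [event, fn] of Object.entries(handlers)) {
    element.addEventListener(event, fn);
  }
  return element;
}

// Hypothetical stand-in for a server-rendered DOM node. In a real browser
// this would be something like document.querySelector('button').
function makeFakeElement() {
  const listeners = {};
  return {
    addEventListener(event, fn) { listeners[event] = fn; },
    dispatch(event) { if (listeners[event]) listeners[event](); },
  };
}

const button = makeFakeElement(); // "static" markup: no behavior yet
hydrate(button, { click: () => console.log('now interactive!') }); // hydration step
button.dispatch('click'); // logs 'now interactive!'
```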


&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--gN8GkZdT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/33f12a759a113ee19f710dc71eae508eb47e75c03fb5e0c89d6804f367904f5a.jpg" alt="A time sequence of a hydrated page. On the right side, we have the Astro server, and on the right, the browser. The browser requests the page, and the server sends only the HTML. This is the 'first contentful paint'. A little while later, the browser requests the JavaScript, and the server sends it. Once the JavaScript is loaded and executed, the page is interactive. " width="738" height="568"&gt;How hydration works. The first contentful paint happens right after all the HTML has been sent to the browser, which typically happens very quickly. The browser then requests and executes all the client-side JavaScript needed to make the page fully interactive.


&lt;h3&gt;
  
  
  Using Islands
&lt;/h3&gt;

&lt;p&gt;An example showing how Astro implements islands is in order. Let's first add a UI integration to the project. The following command installs &lt;code&gt;@astrojs/preact&lt;/code&gt; and &lt;code&gt;preact&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;npx astro add preact
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's create a simple button to test the integration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/components/MyButton.jsx&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nx"&gt;MyButton&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;clicked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Hello!&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;button&lt;/span&gt; &lt;span class="na"&gt;style&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;color&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;purple&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt; &lt;span class="na"&gt;onClick&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;clicked&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;Click me&lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;button&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As usual, Astro will try to strip away any JavaScript. So nothing would happen if we instantiated the component with &lt;code&gt;&amp;lt;MyButton /&amp;gt;&lt;/code&gt;. We need to tell Astro that we want this component treated as an island and hydrated accordingly by adding the &lt;a href="https://docs.astro.build/en/reference/directives-reference/#client-directives"&gt;template directive &lt;code&gt;client:load&lt;/code&gt;&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;---
import MyButton from "../components/MyButton.jsx";
---

&lt;span class="nt"&gt;&amp;lt;html&lt;/span&gt; &lt;span class="na"&gt;lang=&lt;/span&gt;&lt;span class="s"&gt;"en"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;body&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;MyButton&lt;/span&gt; &lt;span class="na"&gt;client:load&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/body&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/html&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Clicking the button should print "Hello!" in the browser console.&lt;/p&gt;

&lt;p&gt;The client directive told Astro to hydrate the component. There are five client directives, each with a different hydration priority: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;client:load&lt;/code&gt; hydrates the component as soon as possible. &lt;/li&gt;
&lt;li&gt;
&lt;code&gt;client:idle&lt;/code&gt; hydrates the component when the page is done loading. Useful for low-priority components that don't need immediate interactivity.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;client:visible&lt;/code&gt; hydrates the component as soon as it appears in the viewport. &lt;/li&gt;
&lt;li&gt;
&lt;code&gt;client:media={string}&lt;/code&gt; takes a CSS media query as an argument and hydrates the component as soon as the query is matched. &lt;/li&gt;
&lt;li&gt;
&lt;code&gt;client:only={string}&lt;/code&gt; skips server-side HTML rendering entirely and renders the component only in the browser. It takes the framework name as its value, e.g. &lt;code&gt;client:only="preact"&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
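&lt;p&gt;For instance, a page could hydrate several copies of the &lt;code&gt;MyButton&lt;/code&gt; component, each with a different strategy (a hypothetical sketch):&lt;/p&gt;

```html
---
import MyButton from "../components/MyButton.jsx";
---

&amp;lt;html lang="en"&amp;gt;
    &amp;lt;body&amp;gt;
        &amp;lt;MyButton client:idle /&amp;gt;
        &amp;lt;MyButton client:visible /&amp;gt;
        &amp;lt;MyButton client:media="(max-width: 600px)" /&amp;gt;
    &amp;lt;/body&amp;gt;
&amp;lt;/html&amp;gt;
```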

&lt;h2&gt;
  
  
  Deploying an Astro application with CI/CD
&lt;/h2&gt;

&lt;p&gt;Since Astro is a Node application, setting up a CI/CD build for it is very straightforward. We only need a couple of jobs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5wbOKjIM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/1cb815792494b3fbde8a3b6bf40671c8f4502f3f656ab9e27aba48d59b4e1dec.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5wbOKjIM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/1cb815792494b3fbde8a3b6bf40671c8f4502f3f656ab9e27aba48d59b4e1dec.jpg" alt="A screenshot of a Semaphore CI pipeline with two blocks: Install dependencies and Build. Each block has one job." width="324" height="151"&gt;&lt;/a&gt;&lt;/p&gt;
A possible continuous integration pipeline for Astro.



&lt;p&gt;The first job runs &lt;a href="https://semaphoreci.com/blog/reproducible-node-builds-with-npm-ci"&gt;npm ci&lt;/a&gt; to populate &lt;code&gt;node_modules&lt;/code&gt;. We use &lt;a href="https://docs.semaphoreci.com/ci-cd-environment/sem-version-managing-language-versions-on-linux/"&gt;sem-version&lt;/a&gt; to select a current Node version.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;checkout
sem-version node 19.4
npm ci
cache store
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The second job runs &lt;code&gt;npm run build&lt;/code&gt; and stores the built application (located in the &lt;code&gt;dist&lt;/code&gt; folder) as a &lt;a href="https://docs.semaphoreci.com/essentials/artifacts/"&gt;workflow&lt;br&gt;
artifact&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;checkout
sem-version node 19.4
cache restore
npm run build
artifact push workflow dist/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the site is built, we can configure &lt;a href="https://semaphoreci.com/cicd"&gt;continuous&lt;br&gt;
deployment&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  Deployment targets
&lt;/h3&gt;

&lt;p&gt;Depending on the nature of the web application, Astro can be deployed in two ways: as a static site, à la Hugo or Gatsby, in which case something as simple as an S3 bucket or GitHub Pages will do; or as a full-fledged server-side rendered (SSR) application, which requires a host that can run JavaScript or TypeScript endpoints.&lt;/p&gt;

&lt;p&gt;Astro has built-in support for various popular &lt;a href="https://docs.astro.build/en/guides/deploy/"&gt;deployment targets&lt;/a&gt; such as Netlify, Firebase, Vercel, and Deno. Some only support either SSR or static hosting, while others can do both.&lt;/p&gt;
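&lt;p&gt;For SSR targets, Astro uses an adapter configured in &lt;code&gt;astro.config.mjs&lt;/code&gt;. A minimal sketch for Netlify might look like the following; the exact adapter import path may vary between versions, so check the adapter's documentation:&lt;/p&gt;

```javascript
// astro.config.mjs -- illustrative sketch; verify the adapter import path
// against the docs for your Astro version.
import { defineConfig } from 'astro/config';
import netlify from '@astrojs/netlify/functions';

export default defineConfig({
  // 'server' enables SSR; leave the default (static) for a static build
  output: 'server',
  adapter: netlify(),
});
```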

&lt;p&gt;Once we have chosen our deployment method, we can add a continuous deployment &lt;a href="https://semaphoreci.com/blog/cicd-pipeline"&gt;pipeline&lt;/a&gt; to automatically deploy the website on every change.&lt;/p&gt;

&lt;p&gt;Here's an example deployment pipeline that targets a Netlify static site.&lt;/p&gt;


&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--U-PMnE6f--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/98fe3294f91fb96df5e17814184d95a09988c6a331f49c3c4319aa02fb4051c0.jpg" alt="A screenshot of Semaphore CI/CD pipelines. We have added to the initial pipeline shown before a continuous deployment pipeline with one job that deploys to Netlify." width="800" height="233"&gt;The complete CI/CD workflow with continuous deployment to Netlify.



&lt;p&gt;For reference, the job commands are as follows. This assumes we have already obtained an API token and saved it as a &lt;a href="https://docs.semaphoreci.com/essentials/using-secrets/"&gt;secret&lt;/a&gt; on Semaphore exposing the environment variables &lt;code&gt;NETLIFY_TOKEN&lt;/code&gt; and &lt;code&gt;NETLIFY_SITE&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;checkout
artifact pull workflow dist
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; netlify-cli
netlify deploy &lt;span class="nt"&gt;--dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;dist &lt;span class="nt"&gt;--prod&lt;/span&gt; &lt;span class="nt"&gt;--auth&lt;/span&gt; &lt;span class="nv"&gt;$NETLIFY_TOKEN&lt;/span&gt; &lt;span class="nt"&gt;--site&lt;/span&gt; &lt;span class="nv"&gt;$NETLIFY_SITE&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The popularity of this project has been nothing short of astronomical: in the first five months since the Astro 1.0 release, the project has amassed over 25,000 stars on GitHub. The success of Astro is not accidental. The Astro team has made a web framework that offers great ergonomics for developers and fast-loading websites for users, even if they are on low-power devices or slow connections.&lt;/p&gt;

&lt;p&gt;Thanks for reading, and happy building!&lt;/p&gt;

</description>
      <category>astro</category>
      <category>webdev</category>
      <category>tutorial</category>
      <category>programming</category>
    </item>
    <item>
      <title>Ruby Adds Support for WebAssembly: What is WebAssembly and how it benefits Ruby devs?</title>
      <dc:creator>Tomas Fernandez</dc:creator>
      <pubDate>Fri, 10 Feb 2023 21:18:38 +0000</pubDate>
      <link>https://forem.com/semaphore/ruby-adds-support-for-webassembly-what-is-webassembly-and-how-it-benefits-ruby-devs-2825</link>
      <guid>https://forem.com/semaphore/ruby-adds-support-for-webassembly-what-is-webassembly-and-how-it-benefits-ruby-devs-2825</guid>
      <description>&lt;p&gt;Photo by &lt;a href="https://unsplash.com/@joshuafuller?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText" rel="noopener noreferrer"&gt;Joshua Fuller&lt;/a&gt; on &lt;a href="https://unsplash.com/photos/p8w7krXVY1k?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText" rel="noopener noreferrer"&gt;Unsplash&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Ruby has joined the ranks of languages capable of targeting WebAssembly with its &lt;a href="https://www.ruby-lang.org/en/downloads/" rel="noopener noreferrer"&gt;latest 3.2 release&lt;/a&gt;. This seemingly minor update might be the biggest thing that has happened to the language since Rails, as it lets Ruby developers go beyond the backend. By porting their code to WebAssembly, they can run it anywhere: on the frontend, on embedded devices, as serverless functions, in place of containers, or on the edge. WebAssembly has the potential to make Ruby a universal language.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is WebAssembly?
&lt;/h2&gt;

&lt;p&gt;WebAssembly (commonly shortened to Wasm) is a low-level binary instruction format that runs on a virtual machine. It was designed as a portable compilation target that complements JavaScript, with the aim of running applications in any browser at near-native speed. Wasm can be targeted from high-level languages like C, Go, Rust, and now also Ruby.&lt;/p&gt;
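&lt;p&gt;To give a sense of what the format looks like, here is a tiny module in WebAssembly's human-readable text format (WAT) that exports an &lt;code&gt;add&lt;/code&gt; function:&lt;/p&gt;

```wat
;; add.wat -- a minimal module exporting add(a, b) = a + b
(module
  (func (export "add") (param i32 i32) (result i32)
    local.get 0
    local.get 1
    i32.add))
```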

&lt;p&gt;Wasm &lt;a href="https://www.w3.org/2019/12/pressrelease-wasm-rec.html.en" rel="noopener noreferrer"&gt;became a W3C standard&lt;/a&gt; in 2019, opening the path to writing high-performing applications for the Web. The standard itself is still evolving, and its ecosystem is growing. Currently, this technology is receiving a lot of focus from the Cloud Native Computing Foundation (CNCF), with &lt;a href="https://www.cncf.io/blog/2021/08/05/cloud-native-webassembly/" rel="noopener noreferrer"&gt;several projects under development&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Wasm's design sits on two pillars: &lt;em&gt;portability&lt;/em&gt; and &lt;em&gt;security&lt;/em&gt;. The Wasm binary can run on any modern browser, even on mobile devices. For security, Wasm programs run in a sandboxed, memory-safe VM. As such, they cannot access system resources by default: they can't touch the filesystem, the network, or any memory outside their own sandbox.&lt;/p&gt;

&lt;h2&gt;
  
  
  WebAssembly brings portability to the next level
&lt;/h2&gt;

&lt;p&gt;Let’s say you want to build an application targeting many systems, e.g. Linux, Windows, and macOS. What are your options?&lt;/p&gt;

&lt;p&gt;You could use a compiled language like C and build a binary for each target.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpas099viuffl9j1u8pd0.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpas099viuffl9j1u8pd0.jpg" alt="Code is compiled into three formats: ELF binary for Linux, PE binary for Windows, and Mach binary for macOS. We have one source code and three binaries." width="350" height="515"&gt;&lt;/a&gt;&lt;br&gt;
Compiler portability creates multiple executable files&lt;/p&gt;

&lt;p&gt;Or, if you can rely on having the appropriate runtime installed, you could choose an interpreted language like JavaScript, or one that compiles to bytecode, like Java.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw3s0mbd702smwzzsui46.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw3s0mbd702smwzzsui46.jpg" alt="The code is compiled into one universal bytecode format, which is executed on the platform via a runtime environment. The diagram shows Java compilation and the JRE running on each platform." width="350" height="662"&gt;&lt;/a&gt;&lt;br&gt;
Code is compiled into an intermediate bytecode. This system relies on having a runtime environment, or VM, installed on the client.&lt;/p&gt;

&lt;p&gt;What if you have a container runtime on the client? In that case, you could build a Docker image for each platform.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7t28hz6nuaho9y5xltoa.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7t28hz6nuaho9y5xltoa.jpg" alt="Diagram of a container workflow. The code is built with Docker. The process generates three types of images: one for Linux, one for Windows and one for ARM. On the client, the runtime pulls the correct image type and runs it." width="350" height="601"&gt;&lt;/a&gt;&lt;br&gt;
Code is compiled into platform-dependent images. A container runtime is required for clients, which pulls the correct image automatically.&lt;/p&gt;

&lt;p&gt;Historically, the only option for Ruby developers was to distribute the source code itself. That meant users had to install the Ruby interpreter (or developers had to package the interpreter along with the application) to run it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffv65d3qg6ghvdlpznqib.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffv65d3qg6ghvdlpznqib.jpg" alt="The code is distributed directly. Clients must install the appropiate interpreter to execte the application." width="350" height="426"&gt;&lt;/a&gt;&lt;br&gt;
Code is shipped directly to users, who must have the interpreter installed in their systems to be able to run it.&lt;/p&gt;

&lt;p&gt;All these mechanisms provide portability, but at a cost: you must build, test, and distribute many images. Sometimes, you must also ship a suitable runtime with the release or tell the user to install it independently.&lt;/p&gt;

&lt;p&gt;WebAssembly (shortened as Wasm) takes portability to the next level: it allows you to build ONE binary and run it in &lt;a href="https://caniuse.com/wasm" rel="noopener noreferrer"&gt;any modern browser&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx6dnl14cxk78l4lj219a.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx6dnl14cxk78l4lj219a.jpg" alt="This shows the WebAssembly workflow. Code is compiled into a unique Wasm binary, which can run unmodified on any browser. We have a single binary that runs on Linux, macOS and Linux." width="350" height="598"&gt;&lt;/a&gt;&lt;br&gt;
WebAssembly compiles into a low-level assembly that every modern browser can execute. As a result, the same Wasm binary can run, unmodified, on every platform (even mobile).&lt;/p&gt;

&lt;p&gt;The ability to run code at near-native speed has allowed developers to build sites like Figma and Google Earth, or even &lt;a href="https://www.vimonlineeditor.com/" rel="noopener noreferrer"&gt;run Vim in the browser&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  Ruby adds support for WebAssembly
&lt;/h2&gt;

&lt;p&gt;The latest Ruby release ships with a Wasm port of the interpreter. Therefore, we can run Ruby code directly in the browser without the need for a backend.&lt;/p&gt;

&lt;p&gt;As you can see in the example below, all it takes to get started with the Ruby Wasm port is a couple of lines. The script downloads &lt;code&gt;ruby.wasm&lt;/code&gt; and instantiates the interpreter in the browser. After that, it takes the contents of every script tag with the &lt;code&gt;text/ruby&lt;/code&gt; type and feeds them into the WebAssembly program.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;html&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;script &lt;/span&gt;&lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"https://cdn.jsdelivr.net/npm/ruby-head-wasm-wasi@0.5.0/dist/browser.script.iife.js"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;script &lt;/span&gt;&lt;span class="na"&gt;type=&lt;/span&gt;&lt;span class="s"&gt;"text/ruby"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nx"&gt;puts&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Hello, world!&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/html&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can confirm that Ruby is running in the browser, i.e. not connecting to a backend, by opening the developer tools. There you'll find that, once &lt;code&gt;ruby.wasm&lt;/code&gt; is downloaded, no further connections are made.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3clzgoxusxall70mpmix.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3clzgoxusxall70mpmix.jpg" alt=" The browser console log is opened and shows: Hello, world!" width="703" height="160"&gt;&lt;/a&gt;&lt;br&gt;
Traditionally, JavaScript has been touted as the best language to learn because you have it everywhere. With WebAssembly, everyone can learn and experiment with Ruby using a browser. The output is printed in the developer's console.&lt;/p&gt;

&lt;p&gt;You can even see the contents of &lt;code&gt;ruby.wasm&lt;/code&gt; disassembled into text format in the “Sources” tab:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhadp0oy59p651eysyag5.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhadp0oy59p651eysyag5.jpg" alt=" The browser’s developer tools are opened in the Sources tab. The list of sources includes a wasm folder with the Ruby interpreter downloaded. On the right pane, we see the contents of the disassembled binary in text form." width="800" height="339"&gt;&lt;/a&gt;&lt;br&gt;
We can see the downloaded Wasm file in the browser's developer tools.&lt;/p&gt;

&lt;p&gt;You can check out the Wasm port online at the &lt;a href="https://try.ruby-lang.org/playground/" rel="noopener noreferrer"&gt;Ruby playground&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  Working with the sandbox
&lt;/h2&gt;

&lt;p&gt;As mentioned, Wasm programs run in a sandboxed VM that lacks access to the rest of the system. Therefore, Wasm applications &lt;em&gt;do not&lt;/em&gt; have access to the browser's DOM, the filesystem, host memory, or the network. We'll need some JavaScript code to send and receive data from the sandbox.&lt;/p&gt;

&lt;p&gt;The following example shows how to read the output of a Ruby program and make changes to the page using the &lt;a href="https://www.npmjs.com/package/ruby-head-wasm-wasi" rel="noopener noreferrer"&gt;ruby-head-wasm-wasi&lt;/a&gt; NPM package:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;html&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;script &lt;/span&gt;&lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"https://cdn.jsdelivr.net/npm/ruby-head-wasm-wasi@latest/dist/browser.umd.js"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;script&amp;gt;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;DefaultRubyVM&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ruby-wasm-wasi&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;main&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://cdn.jsdelivr.net/npm/ruby-head-wasm-wasi@latest/dist/ruby.wasm&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
      &lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;arrayBuffer&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;module&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;WebAssembly&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;vm&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nc"&gt;DefaultRubyVM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;module&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

      &lt;span class="nx"&gt;vm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;printVersion&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
      &lt;span class="nx"&gt;vm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
        require "js"
        luckiness = ["Lucky", "Unlucky"].sample
        JS::eval("document.body.innerText = '#{luckiness}'")
      `&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;

    &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;body&amp;gt;&amp;lt;/body&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/html&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The same package can also run Ruby code inside a Node project, allowing you to mix Ruby and JavaScript on the backend. You'll need to install the NPM package &lt;code&gt;ruby-head-wasm-wasi&lt;/code&gt; for the example to work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;fs&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;fs/promises&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;DefaultRubyVM&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ruby-head-wasm-wasi/dist/node.cjs.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;main&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
 &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;binary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;fs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;readFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="c1"&gt;// Tips: Replace the binary with debug info if you want symbolicated stack trace.&lt;/span&gt;
 &lt;span class="c1"&gt;// (only nightly release for now)&lt;/span&gt;
 &lt;span class="c1"&gt;// "./node_modules/ruby-head-wasm-wasi/dist/ruby.debug+stdlib.wasm"&lt;/span&gt;
 &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;./node_modules/ruby-head-wasm-wasi/dist/ruby.wasm&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
 &lt;span class="p"&gt;);&lt;/span&gt;
 &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;module&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;WebAssembly&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;binary&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
 &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;vm&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nc"&gt;DefaultRubyVM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;module&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

 &lt;span class="nx"&gt;vm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
 luckiness = ["Lucky", "Unlucky"].sample
 puts "You are #{luckiness}"
 `&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Running ruby.wasm outside the browser
&lt;/h2&gt;

&lt;p&gt;While Wasm's primary design goal is running binary code in the browser, developers quickly realized the value of a fast, safe, and universally portable binary format for software delivery. Wasm has the potential to become as big as Docker, greatly simplifying application deployment for embedded systems, &lt;a href="https://www.cncf.io/blog/2021/08/25/webassembly-serverless-functions-in-aws-lambda/" rel="noopener noreferrer"&gt;serverless functions&lt;/a&gt;, edge computing, or as a &lt;a href="https://nigelpoulton.com/what-is-runwasi/" rel="noopener noreferrer"&gt;replacement for containers on Kubernetes&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Running a Wasm application outside the browser requires an appropriate runtime that implements the WebAssembly VM and provides interfaces to the underlying system. There are a few competing solutions in this field, the most popular being &lt;a href="https://wasmtime.dev/" rel="noopener noreferrer"&gt;wasmtime&lt;/a&gt;, &lt;a href="https://wasmer.io/" rel="noopener noreferrer"&gt;wasmer&lt;/a&gt;, and &lt;a href="https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/README.md" rel="noopener noreferrer"&gt;WAMR&lt;/a&gt;.&lt;/p&gt;
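&lt;p&gt;For example, once you have a WASI build of the Ruby interpreter, running it with wasmtime looks roughly like this. The download URL and asset name below are assumptions for illustration; check the ruby.wasm releases page for the current ones:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Download a WASI build of the Ruby interpreter (hypothetical asset name;
# see the ruby/ruby.wasm releases page for the real artifacts).
curl -LO https://github.com/ruby/ruby.wasm/releases/latest/download/ruby.wasm

# Everything after "--" is passed to the interpreter inside the sandboxed VM.
wasmtime ruby.wasm -- -e 'puts "Hello from WASI Ruby!"'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;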

&lt;p&gt;The Ruby repository provides a &lt;a href="https://github.com/ruby/ruby.wasm#quick-example-how-to-package-your-ruby-application-as-a-wasi-application" rel="noopener noreferrer"&gt;complete example&lt;/a&gt; for bundling your application code into a custom Ruby image.&lt;/p&gt;

&lt;h2&gt;
  
  
  Limitations
&lt;/h2&gt;

&lt;p&gt;Let’s remember that this is all cutting-edge tech. The whole Wasm ecosystem is moving fast. Right now, the Ruby Wasm port has a few limitations that restrict its usability in big projects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No thread support.&lt;/li&gt;
&lt;li&gt;Spawning processes does not work.&lt;/li&gt;
&lt;li&gt;No network support.&lt;/li&gt;
&lt;li&gt;The garbage collector can create memory leaks.&lt;/li&gt;
&lt;li&gt;Gems and modules are unavailable unless you build a custom Wasm image.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The future is bright
&lt;/h2&gt;

&lt;p&gt;WebAssembly opens a world of exciting possibilities. It allows Ruby developers to escape the backend. As tooling around WebAssembly improves, Ruby will be able to reach new frontiers: the browser is no longer off-limits, and there will be new opportunities to run Ruby on the edge and as serverless applications.&lt;/p&gt;

&lt;p&gt;With the latest release, Ruby developers can begin experimenting with WebAssembly. It's only the first step, and there is much more work to do before we see complex Ruby applications running on this technology.&lt;/p&gt;

&lt;p&gt;Thanks for reading, and happy assembling!&lt;/p&gt;

</description>
      <category>leadership</category>
      <category>management</category>
      <category>productivity</category>
      <category>career</category>
    </item>
    <item>
      <title>Release Management: Is Your Product Ready for Success?</title>
      <dc:creator>Tomas Fernandez</dc:creator>
      <pubDate>Wed, 08 Feb 2023 19:24:42 +0000</pubDate>
      <link>https://forem.com/semaphore/release-management-is-your-product-ready-for-success-58o</link>
      <guid>https://forem.com/semaphore/release-management-is-your-product-ready-for-success-58o</guid>
      <description>&lt;p&gt;Photo by &lt;a href="https://unsplash.com/@javaistan?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText"&gt;Afif Ramdhasuma&lt;/a&gt; on &lt;a href="https://unsplash.com/photos/jl4BQJs87Do?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText"&gt;Unsplash&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Release Management: is your product ready for success?
&lt;/h1&gt;

&lt;p&gt;You never get a second chance to make a first impression. So how do you make sure it’s a good one? This matter is constantly in the mind of release managers and product owners, because they know that once a product is set loose into the world all bets are off.&lt;/p&gt;

&lt;p&gt;Although it may be tempting to release a product or new feature as soon as it is functional, users expect (and deserve) quality and polish. Before even reaching the beta test stage, a lot of non-technical work must be done: documentation must be written, security must be assessed, and business objectives must be established. These are some of the things we have learned about software release at Semaphore.&lt;/p&gt;

&lt;h2&gt;
  
  
  Project stages
&lt;/h2&gt;

&lt;p&gt;From idea to release, we can break a software project into roughly three stages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Design and development&lt;/strong&gt;: comprises everything from idea to MVP or working feature. We’ve discussed design patterns in other blog posts, so I won’t spend too much time on that at this point. All I will say is that there is no shortage of practices to help us uncover good designs, like writing &lt;a href="https://semaphoreci.com/blog/the-benefits-of-acceptance-testing"&gt;acceptance criteria&lt;/a&gt; and following &lt;a href="https://semaphoreci.com/blog/test-driven-development"&gt;Test-Driven Development&lt;/a&gt;, &lt;a href="https://semaphoreci.com/blog/2016/11/03/how-bdd-and-continuous-delivery-help-developers-maintain-flow.html"&gt;Behavior-Driven Development&lt;/a&gt;, or &lt;a href="https://semaphoreci.com/blog/domain-driven-design-microservices"&gt;Domain-Driven Design&lt;/a&gt; methodologies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tech preview&lt;/strong&gt;: the sharp blade of the reality check. We bring a few users to get their early feedback so we can make improvements and reduce uncertainty at the release.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Release&lt;/strong&gt;: also known as &lt;em&gt;General Availability&lt;/em&gt; (GA). In this last stage, we make the release available to all users.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The stages are separated by &lt;em&gt;checkpoints&lt;/em&gt;, of which we have two kinds: &lt;em&gt;Definition of Ready&lt;/em&gt; and &lt;em&gt;Definition of Done&lt;/em&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Definition of Ready&lt;/strong&gt; (DoR): these are all required tasks before users can access the system or experience the feature.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Definition of Done&lt;/strong&gt; (DoD): consists of all the follow-up tasks closing a stage, like gathering user feedback, completing outstanding tickets, or doing a retrospective.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We cannot enter a new project stage or consider it done until all the tasks in DoR and DoD are ticked off.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--rDWfaiVE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/007cb6d285686ba9d435bc90c2cdaf67c11ae5b3988ff241d268d659c50e51f7.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--rDWfaiVE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/007cb6d285686ba9d435bc90c2cdaf67c11ae5b3988ff241d268d659c50e51f7.jpg" alt="A diagram showing the flow of release management. The first step is getting and idea. From there, it flows to design and development, an iterative process that produces a prototype or MVP. The next step is the technical preview. Before starting this step there is a checkpoint called Definition of Ready. Next we do the technical preview where we test with real users and get their feedback. This leads to more development. Before the preview is done we have to complete the Definition of Done checklist. The same process repeats for the general release. Namely, there is a Definition of Ready before the release and a Definition of Done after it. Only then, we can call the project done." width="800" height="325"&gt;&lt;/a&gt;&lt;br&gt;
The ready checkpoint (DoR) dictates if users can be allowed into our system. Thus, the technical preview can begin only once we meet the DoR. It does not end until we’ve completed all the tasks in the DoD. The same thing happens during GA release.&lt;/p&gt;

&lt;h2&gt;
  
  
  The technical preview
&lt;/h2&gt;

&lt;p&gt;The technical preview is an opportunity to get the user's feedback before the final release. We must bring them into the mix before evaluating whether a feature or product works. During the preview stage, we deploy the latest release in production and use &lt;a href="https://semaphoreci.com/blog/feature-flags"&gt;feature flags&lt;/a&gt; to control who can see the previewed feature.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Zm0aMtUX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/29e1ffdf889ae5dd159a675c212f07c1e58c19bb1406b833916b32635135f465.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Zm0aMtUX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/29e1ffdf889ae5dd159a675c212f07c1e58c19bb1406b833916b32635135f465.jpg" alt="Two groups of users access the production environment. The beta users have a feature flag enabled that allows them to see the previewed feature. General users do not have the feature flag and thus nothing changes for them." width="664" height="576"&gt;&lt;/a&gt;&lt;br&gt;
We can deploy the latest release in production and use feature flags to select which users access the previewed feature.&lt;/p&gt;
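&lt;p&gt;In its simplest form, a feature flag is just a per-user switch that the code checks before exposing the new behavior. Here is a minimal sketch in Ruby; the class and names are illustrative, not Semaphore's actual implementation:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;require "set"

# Minimal feature-flag sketch: a flag maps a feature name to the set of
# user ids allowed to see it. All names here are hypothetical.
class FeatureFlags
  def initialize
    # Default each unseen feature to an empty set of users.
    @flags = Hash.new { |hash, key| hash[key] = Set.new }
  end

  # Enable a feature for one user, e.g. a technical-preview tester.
  def enable(feature, user_id)
    @flags[feature].add(user_id)
  end

  def enabled?(feature, user_id)
    @flags[feature].include?(user_id)
  end
end

flags = FeatureFlags.new
flags.enable(:new_dashboard, "beta-tester-42")

puts flags.enabled?(:new_dashboard, "beta-tester-42") # prints "true"
puts flags.enabled?(:new_dashboard, "any-other-user") # prints "false"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Real-world setups usually back the flags with a database or a dedicated flag service, so previews can be opened to more users without redeploying.&lt;/p&gt;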

&lt;p&gt;Most users understand that they may find some kinks and will be happy to help us sort them out. Therefore, we can start the tech preview even if the application is not perfect (hint: it never will be).&lt;/p&gt;

&lt;p&gt;That being said, there are some precautions we need to take before any user is let into our systems. Here's where the DoR for the technical release comes into focus.&lt;/p&gt;

&lt;h3&gt;
  
  
  Definition of Ready for the technical preview
&lt;/h3&gt;

&lt;p&gt;Let’s say we’re shipping a new feature in our product. Early feedback from real users is essential for its success. We can reach out to users who may benefit from the feature or be interested in testing it out.&lt;/p&gt;

&lt;p&gt;How do we know when we’re ready to invite users to check it out? We define a list of tasks and conditions that must be met before we can call our system ready. These tasks must be completed for every feature before it can be shown to users:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Performance metrics&lt;/strong&gt;: every new component shipped should be tracked and added to the monitoring dashboard to observe its performance and stability. Semaphore, for instance, monitors the &lt;a href="https://sre.google/sre-book/monitoring-distributed-systems/#xref_monitoring_golden-signals"&gt;four golden signals&lt;/a&gt; (latency, traffic, errors, and saturation).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Utilization metrics&lt;/strong&gt;: we need to know &lt;em&gt;if&lt;/em&gt; the new features are being used and &lt;em&gt;how&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit Logs&lt;/strong&gt;: shipping anything to production that's not covered in audit logs is a security issue. Every event should be logged to maintain full audit compliance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security Assessment&lt;/strong&gt;: we must ensure the new components don’t have any critical vulnerabilities. This is important for two reasons. First, to protect users from security exploits. And second, to comply with any security standards the company follows, such as &lt;a href="https://semaphoreci.com/security"&gt;ISO 27001&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deployment to production&lt;/strong&gt;: the release should be deployed to production. This can be the primary production environment or a staging copy that's as close as possible to the real thing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Preliminary documentation&lt;/strong&gt;: the users will need instructions about what changed and how to use it. It doesn’t need to be full-blown documentation yet.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The list can be expanded to include more items as needed.&lt;/p&gt;

&lt;p&gt;Once we have our DoR defined and fulfilled, we’re ready to start the technical preview. The feature passes to the “in-review” state, and we can begin the feedback-gathering process that may last weeks or months.&lt;/p&gt;

&lt;p&gt;💡 Instead of technical previews, the engineering team at Semaphore used to have two kinds of betas: public and private, which had different processes and requirements. Having two sets of similar-but-not-quite-the-same DoRs and DoDs ended up being unnecessarily complicated. So, the processes were streamlined into one. Users are invited to the technical preview based on needs and interest in a given feature.&lt;/p&gt;

&lt;h3&gt;
  
  
  Definition of Done for the technical preview
&lt;/h3&gt;

&lt;p&gt;The technical preview ends when we've gathered enough feedback and refined the system to the point we consider it ready for release.&lt;/p&gt;

&lt;p&gt;Before we can close the technical preview, however, there are some tasks to complete, which will depend on the DoD for our project.&lt;/p&gt;

&lt;p&gt;The items in the DoD bucket vary, as they will change from team to team or from company to company. For Semaphore, the list includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Gathering feedback&lt;/strong&gt;: the whole point of this exercise is to gather actionable feedback from the users. Feedback lets us better incorporate the voice of the customer into the system. It lets us find bugs and improve usability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cleaning up any remaining tasks&lt;/strong&gt;: address and close outstanding tickets. Ensure all the problems and feedback obtained during the technical preview have either been fixed or have mitigation plans in motion.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Communicating internally&lt;/strong&gt;: everyone involved in supporting production should be notified about any problems found during the technical preview (and their solutions).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Setting up alerts&lt;/strong&gt;: we need to be aware of a problem in production as soon as it happens. This includes configuring Slack/Teams alerts, setting up pagers, and ensuring that the on-call personnel are on the same page.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  General availability release
&lt;/h2&gt;

&lt;p&gt;The moment of truth we’ve prepared for so hard is close at hand. Users will not be as forgiving during a general availability (GA) release as during a tech preview. Problems might disrupt their work. That’s why we went to such lengths to minimize the chance of problems in the release. However, no amount of preparation will guarantee a successful release. So, we need to prepare for any issues that might arise.&lt;/p&gt;

&lt;h3&gt;
  
  
  Definition of Ready for release
&lt;/h3&gt;

&lt;p&gt;The software might be ready to release, but that doesn't mean &lt;em&gt;we&lt;/em&gt; are ready for the release. So, we must define and fulfill the GA release DoR checklist before going into the next step.&lt;/p&gt;

&lt;p&gt;As always, every project has different needs. As a guideline, here are some of the tasks that are considered necessary at Semaphore:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Writing the public docs&lt;/strong&gt;: public docs must be updated with the details of the new feature. In addition, it should be clear which plans have support for it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skilling up internally&lt;/strong&gt;: everyone in the company, especially support and customer-facing staff, should be briefed on the upcoming changes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Setting KPIs&lt;/strong&gt;: based on technical preview utilization, we can set business objectives and key performance indicators (KPI) for the feature in question. This may include new revenue expected, utilization metrics, or feature adoption.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Writing a change log&lt;/strong&gt;: the change log should include everything that has changed since the last release.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Setting up alerts&lt;/strong&gt;: we need to be aware of any problem with the new features shipped as soon as it happens. This includes Slack/Teams alerts and pager notifications for the on-call personnel.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Planning a marketing strategy&lt;/strong&gt;: the marketing strategy communicates the changes and generates user interest. A good campaign can bring new customers or convert free users into paid ones.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Definition of Done for GA
&lt;/h3&gt;

&lt;p&gt;With some luck, the release went smoothly, and users are happy with the new features. What now? Can we move to the next project? Not yet. This project is not done until all items in the DoD are checked off.&lt;/p&gt;

&lt;p&gt;What's typically left at this point is doing some cleanup work and tidying up loose ends:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Clean leftover tickets&lt;/strong&gt;: after shipping the new features to GA, there might be some remaining low-priority tasks. They should be addressed or scheduled to be worked on later.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CLI and API Support&lt;/strong&gt;: provide support for the new feature via public APIs or the CLI tools, because this makes it easier for users to access resources from the terminal and integrate them into their custom tooling. If this support is integral to the new feature, we can move this item to the DoR checklist.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Do the project retrospective&lt;/strong&gt;: The retrospective is one of the most critical phases of a project. It’s a space for the team to discuss what went well and what can be improved for the next project. Also, it allows the team to identify potential pitfalls early and resolve conflict areas.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;It's a mistake to think that the hard part of software development is coding. In reality, the real challenge comes when you open the floodgates and users pour in. Unless everything is prepared, issues can compound and overwhelm any engineering team. Having a formal definition of what constitutes being ready and done helps us avoid rushing through a release.&lt;/p&gt;

&lt;p&gt;Thanks for reading, and happy releasing!&lt;/p&gt;

</description>
      <category>devops</category>
      <category>productivity</category>
      <category>programming</category>
      <category>startup</category>
    </item>
    <item>
      <title>Measuring Page Speed with Lighthouse</title>
      <dc:creator>Tomas Fernandez</dc:creator>
      <pubDate>Tue, 10 Jan 2023 21:49:31 +0000</pubDate>
      <link>https://forem.com/semaphore/measuring-page-speed-with-lighthouse-1m16</link>
      <guid>https://forem.com/semaphore/measuring-page-speed-with-lighthouse-1m16</guid>
      <description>&lt;p&gt;Photo by &lt;a href="https://unsplash.com/@paulius005?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText"&gt;Paulius Dragunas&lt;/a&gt; on &lt;a href="https://unsplash.com/images/things/lighthouse?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText"&gt;Unsplash&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Page speed matters more than you think. According to &lt;a href="https://www.thinkwithgoogle.com/marketing-strategies/app-and-mobile/mobile-page-speed-new-industry-benchmarks/"&gt;research by Google&lt;/a&gt;, the probability of users staying on your site plummets as loading slows down. As page load time goes from 1 to 10 seconds, the probability of a bounce increases by a whopping 123%. In other words, speed equals revenue.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ddP2edoZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/12ea625382d312655c16e7323d08d14af46381763659380b9b5a34983b2ca921.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ddP2edoZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/12ea625382d312655c16e7323d08d14af46381763659380b9b5a34983b2ca921.jpg" alt="Graph showing bounce rates for page load times. The values shown are 1 to 3 seconds = 32% bounce rate, 1 to 5 seconds = 90%, 1 to 6 seconds 106%, and 1 to 10 seconds 123%." width="800" height="558"&gt;&lt;/a&gt;&lt;br&gt;
Source: &lt;a href="https://www.thinkwithgoogle.com/marketing-strategies/app-and-mobile/mobile-page-speed-new-industry-benchmarks/"&gt;find out how you stack up to new industry benchmarks for mobile page speed&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;How can we ensure that our pages are loading at top speed? The answer is to measure them regularly with Lighthouse and CI/CD.&lt;/p&gt;

&lt;h2&gt;
  
  
  Measuring page speed with Lighthouse
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://developer.chrome.com/docs/lighthouse/overview/"&gt;Lighthouse&lt;/a&gt; is a page speed benchmark tool created by Google. It runs a battery of tests against your website and produces a report with detailed advice to improve performance.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--7M1sJq82--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/cf359bafc22b38ec1f37d80df83c814be675fc59fe14a9c5d10db735add800bf.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--7M1sJq82--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/cf359bafc22b38ec1f37d80df83c814be675fc59fe14a9c5d10db735add800bf.jpg" alt="The Lighthouse HTML report. Includes scores for Performance, Accessibility, PWA, and Best practices. It also shows a sequence of how the webpage is rendered over time." width="558" height="566"&gt;&lt;/a&gt;&lt;br&gt;
Lighthouse running inside Google Chrome.&lt;/p&gt;

&lt;p&gt;⚠️ You might be surprised at the low scores Lighthouse presents. This is because the tool simulates mid-tier mobile devices on a 4G connection.&lt;/p&gt;

&lt;p&gt;While Lighthouse's primary focus is performance, it can assess other things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Accessibility&lt;/strong&gt;: suggests opportunities to make the content more accessible. This covers ensuring that the page is optimized for screen readers, that all elements have labels, and that the site is browsable with the keyboard.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best practices&lt;/strong&gt;: checks for various sane practices that improve speed and security.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SEO&lt;/strong&gt;: performs various checks to ensure that the page is SEO-optimized.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PWA&lt;/strong&gt;: ensures the page passes progressive web application tests, which improves user experience on mobile devices.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4 ways of running Lighthouse
&lt;/h3&gt;

&lt;p&gt;Lighthouse is an open-source project that you can run in different ways:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Since it is included in Google Chrome, you can run it directly from the browser. Click on &lt;strong&gt;More tools&lt;/strong&gt; &amp;gt; &lt;strong&gt;Developer Tools&lt;/strong&gt; and open the Lighthouse tab.&lt;/li&gt;
&lt;li&gt;If you have Node installed, you can run &lt;code&gt;npm install -g lighthouse&lt;/code&gt; and run the tool in the command line like this: &lt;code&gt;lighthouse https://semaphoreci.com&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;You can include it in your code as a Node package.&lt;/li&gt;
&lt;li&gt;And finally, Lighthouse has a &lt;a href="https://github.com/GoogleChrome/lighthouse-ci"&gt;CI version&lt;/a&gt; you can run in your &lt;a href="https://semaphoreci.com/continuous-integration"&gt;continuous integration&lt;/a&gt; pipeline. We’ll use this method to schedule periodic benchmarks.&lt;/li&gt;
&lt;/ol&gt;
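&lt;p&gt;As a quick sketch of the CLI route (option 2), here is what a typical invocation looks like. The &lt;code&gt;--output&lt;/code&gt; and &lt;code&gt;--output-path&lt;/code&gt; flags select the report format and destination:&lt;/p&gt;

```shell
# Install the Lighthouse CLI globally (requires Node.js)
npm install -g lighthouse

# Audit a page and save an HTML report
lighthouse https://semaphoreci.com --output html --output-path ./report.html

# Headless run, useful on machines without a display
lighthouse https://semaphoreci.com --chrome-flags="--headless"
```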

&lt;p&gt;If you want to learn how to run Lighthouse with CI/CD to get a report on every change, check out this tutorial: &lt;a href="https://semaphoreci.com/blog/lighthouse-page-speed#setting-up-lighthouse-ci"&gt;Setting up Lighthouse CI&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Setting up a dashboard
&lt;/h3&gt;

&lt;p&gt;The Lighthouse CI project includes an &lt;a href="https://github.com/GoogleChrome/lighthouse-ci/blob/main/docs/server.md"&gt;optional dashboard&lt;/a&gt; that lets you browse historical data and find trends.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--QDF0Kumh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/eb78ab83e91917628693408da1542901efc75dbdc10ee7c3e8e37ae20f093dac.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--QDF0Kumh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/tomfern/blog-assets/master/images/eb78ab83e91917628693408da1542901efc75dbdc10ee7c3e8e37ae20f093dac.jpg" alt="The Lighthouse Server Dashboard. It shows a graph of the page score over time and entries for the last 5 runs." width="800" height="580"&gt;&lt;/a&gt;&lt;br&gt;
Lighthouse CI dashboard.&lt;/p&gt;

&lt;p&gt;Installing the dashboard requires a separate server and database. You’ll need a dedicated machine and persistent storage to save historical data.&lt;/p&gt;

&lt;p&gt;The downside of this approach is obvious — you need to manage yet another server. But it may be worth doing if you have a lot of sites to analyze.&lt;/p&gt;
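&lt;p&gt;For reference, a minimal self-hosted setup looks roughly like this. It is a sketch based on the Lighthouse CI server docs, using a local SQLite file as the database; a production deployment would point the same flags at a dedicated database:&lt;/p&gt;

```shell
# Install the Lighthouse CI CLI (requires Node.js)
npm install -g @lhci/cli

# Start the dashboard server backed by a local SQLite file
lhci server --storage.storageMethod=sql \
            --storage.sqlDialect=sqlite \
            --storage.sqlDatabasePath=./lhci.db
```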

&lt;h2&gt;
  
  
  Time is money
&lt;/h2&gt;

&lt;p&gt;Users are drawn to fast and responsive websites. The problem is that measuring page speed reliably is challenging since you cannot assume that everyone is on a fast connection and uses a top-tier device. With Lighthouse in your CI/CD pipeline, you can get results closer to real-life conditions and insights to help you continually improve.&lt;/p&gt;

&lt;p&gt;Thanks for reading!&lt;/p&gt;

</description>
      <category>performance</category>
      <category>webdev</category>
      <category>programming</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Taming Cloud Costs with Infracost</title>
      <dc:creator>Tomas Fernandez</dc:creator>
      <pubDate>Mon, 09 Jan 2023 18:25:43 +0000</pubDate>
      <link>https://forem.com/semaphore/taming-cloud-costs-with-infracost-2c88</link>
      <guid>https://forem.com/semaphore/taming-cloud-costs-with-infracost-2c88</guid>
      <description>&lt;p&gt;Photo by &lt;a href="https://unsplash.com/@mathieustern?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText" rel="noopener noreferrer"&gt;Mathieu Stern&lt;/a&gt; on &lt;a href="https://unsplash.com/images/things/money?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText" rel="noopener noreferrer"&gt;Unsplash&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When we combine the cloud with IaC tools like Terraform and &lt;a href="https://semaphoreci.com/cicd" rel="noopener noreferrer"&gt;continuous deployment&lt;/a&gt; we get the almost magical ability to create resources on demand. For all its benefits, however, the cloud has also introduced a set of difficulties, one of which is estimating cloud costs accurately.&lt;/p&gt;

&lt;p&gt;Cloud providers have &lt;a href="https://www.infracost.io/blog/why-are-cloud-costs-so-complex/" rel="noopener noreferrer"&gt;complex cost structures&lt;/a&gt; that are constantly changing. AWS, for example, offers &lt;a href="https://aws.amazon.com/ec2/pricing/on-demand/" rel="noopener noreferrer"&gt;536 types of EC2 Linux machines&lt;/a&gt;. Many of them have similar names and features. Take, for example, "m6g.2xlarge" and "m6gd.2xlarge" — the only difference is that the second comes with an SSD drive, which will add $60 to the bill. A small mistake in defining your infrastructure can cause your bill to balloon at the end of the month.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff4rwm96cqz3j4wzxlbbe.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff4rwm96cqz3j4wzxlbbe.jpeg" alt="Man in front of a blackboaard filled with equations. The text says: Calculating AWS Resources Cost" width="719" height="483"&gt;&lt;/a&gt;&lt;br&gt;
It’s so easy to go above budget.&lt;/p&gt;

&lt;p&gt;We can set up billing alerts, but there is no guarantee they will fire in time. An alert can land during the weekend or arrive delayed, letting us shoot past our budget in a matter of hours.&lt;/p&gt;

&lt;p&gt;&lt;iframe class="tweet-embed" id="tweet-1542449611440328704-250" src="https://platform.twitter.com/embed/Tweet.html?id=1542449611440328704"&gt;
&lt;/iframe&gt;




&lt;/p&gt;

&lt;p&gt;So, how can we avoid this problem and use the cloud with confidence?&lt;/p&gt;

&lt;h2&gt;
  
  
  Enter Infracost
&lt;/h2&gt;

&lt;p&gt;Infracost is an &lt;a href="https://github.com/infracost/infracost" rel="noopener noreferrer"&gt;open-source project&lt;/a&gt; that helps us understand how and where we’re spending our money. It gives a detailed breakdown of actual infrastructure costs and calculates how changes impact them. Basically, Infracost is a &lt;code&gt;git diff&lt;/code&gt; for billing.&lt;/p&gt;

&lt;p&gt;Infracost comes in two versions: a VSCode extension and a command-line program. Both do the same thing: parse Terraform code, pull current price points from a &lt;a href="https://github.com/infracost/cloud-pricing-api" rel="noopener noreferrer"&gt;cloud pricing API&lt;/a&gt;, and output an estimate.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbcxq3ocnphmnvj1lqvnm.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbcxq3ocnphmnvj1lqvnm.jpg" alt="How Infracost works. It reads code from the repository and pulls the appropriate service costs from an API service. The IDE version prints the estimates on the screen while editing. The CLI version prints the result in the terminal, posts comments on GitHub, Bitbucket, or Gitlab, and can stop a CI/CD deployment if limits are exceeded." width="800" height="545"&gt;&lt;/a&gt;&lt;br&gt;
You can use Infracost pricing API for free or host your own. The paid tier includes a cloud dashboard to track changes over time.&lt;/p&gt;

&lt;p&gt;We can see the estimates right in the IDE:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6cd3crkklq6dm3u2q443.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6cd3crkklq6dm3u2q443.gif" alt="A GIF showing how Infracost shows cost estimates in real-time as a developer changes a Terraform file." width="719" height="539"&gt;&lt;/a&gt;&lt;br&gt;
Real-time cost estimation on VSCode.&lt;/p&gt;

&lt;p&gt;Or as comments in pull requests or commits:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ubt2clatczehhan1nsj.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ubt2clatczehhan1nsj.jpg" alt="A GitHub Pull Request conversation showing an automated message from Infracost with the cost estimate." width="800" height="573"&gt;&lt;/a&gt;&lt;br&gt;
Cost change information in the PR.&lt;/p&gt;

&lt;p&gt;Infracost also has an optional &lt;a href="https://www.infracost.io/pricing/" rel="noopener noreferrer"&gt;Infracost Cloud&lt;/a&gt;, which comes with a paid tier and includes features like Jira integration, custom price books, and a dashboard to keep track of costs over time.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvf55ur5445qxkzzujevs.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvf55ur5445qxkzzujevs.jpg" alt="The Infracost paid dashboard showing how costs change over time." width="780" height="353"&gt;&lt;/a&gt;&lt;br&gt;
The paid tier includes a dashboard to track spending over time.&lt;/p&gt;
&lt;h2&gt;
  
  
  Setting up Infracost
&lt;/h2&gt;

&lt;p&gt;To try out Infracost, we’ll need the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An Infracost API key. You can get one by signing up for free at &lt;a href="https://www.infracost.io/" rel="noopener noreferrer"&gt;Infracost.io&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;The &lt;a href="https://www.infracost.io/docs/" rel="noopener noreferrer"&gt;Infracost CLI&lt;/a&gt; installed in your machine.&lt;/li&gt;
&lt;li&gt;Some Terraform files.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once the CLI tool is installed, run &lt;code&gt;infracost auth login&lt;/code&gt; to retrieve the API key. Now we’re ready to go.&lt;/p&gt;

&lt;p&gt;The first command we’ll try is &lt;code&gt;infracost breakdown&lt;/code&gt;. It analyzes Terraform plans and prints out a cost estimate. The &lt;code&gt;--path&lt;/code&gt; flag must point to the folder containing your Terraform files. For example, imagine we want to provision an "a1.medium" EC2 instance with the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="k"&gt;provider&lt;/span&gt; &lt;span class="s2"&gt;"aws"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
 &lt;span class="nx"&gt;region&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"us-east-1"&lt;/span&gt;
 &lt;span class="nx"&gt;skip_credentials_validation&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
 &lt;span class="nx"&gt;skip_requesting_account_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_instance"&lt;/span&gt; &lt;span class="s2"&gt;"myserver"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
 &lt;span class="nx"&gt;ami&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ami-674cbc1e"&lt;/span&gt;
 &lt;span class="nx"&gt;instance_type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"a1.medium"&lt;/span&gt;

 &lt;span class="nx"&gt;root_block_device&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
 &lt;span class="nx"&gt;volume_size&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
 &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At current rates, this instance costs $28.62 per month to run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;infracost breakdown &lt;span class="nt"&gt;--path&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;

 Name Monthly Qty Unit Monthly Cost

 aws_instance.myserver
 ├─ Instance usage &lt;span class="o"&gt;(&lt;/span&gt;Linux/UNIX, on-demand, a1.medium&lt;span class="o"&gt;)&lt;/span&gt; 730 hours &lt;span class="nv"&gt;$18&lt;/span&gt;.62
 └─ root_block_device
 └─ Storage &lt;span class="o"&gt;(&lt;/span&gt;general purpose SSD, gp2&lt;span class="o"&gt;)&lt;/span&gt; 100 GB &lt;span class="nv"&gt;$10&lt;/span&gt;.00

 OVERALL TOTAL &lt;span class="nv"&gt;$28&lt;/span&gt;.62
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If we add some extra storage (600 GB of provisioned IOPS EBS), the cost increases to $155.62, as shown below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;infracost breakdown &lt;span class="nt"&gt;--path&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;

 Name Monthly Qty Unit Monthly Cost

 aws_instance.myserver
 ├─ Instance usage &lt;span class="o"&gt;(&lt;/span&gt;Linux/UNIX, on-demand, a1.medium&lt;span class="o"&gt;)&lt;/span&gt; 730 hours &lt;span class="nv"&gt;$18&lt;/span&gt;.62
 ├─ root_block_device
 │ └─ Storage &lt;span class="o"&gt;(&lt;/span&gt;general purpose SSD, gp2&lt;span class="o"&gt;)&lt;/span&gt; 100 GB &lt;span class="nv"&gt;$10&lt;/span&gt;.00
 └─ ebs_block_device[0]
 ├─ Storage &lt;span class="o"&gt;(&lt;/span&gt;provisioned IOPS SSD, io1&lt;span class="o"&gt;)&lt;/span&gt; 600 GB &lt;span class="nv"&gt;$75&lt;/span&gt;.00
 └─ Provisioned IOPS 800 IOPS &lt;span class="nv"&gt;$52&lt;/span&gt;.00

 OVERALL TOTAL &lt;span class="nv"&gt;$155&lt;/span&gt;.62
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Infracost can also calculate usage-based resources like AWS Lambda. Let's see what happens when we swap the EC2 instance for serverless functions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="k"&gt;provider&lt;/span&gt; &lt;span class="s2"&gt;"aws"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
 &lt;span class="nx"&gt;region&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"us-east-1"&lt;/span&gt;
 &lt;span class="nx"&gt;skip_credentials_validation&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
 &lt;span class="nx"&gt;skip_requesting_account_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_lambda_function"&lt;/span&gt; &lt;span class="s2"&gt;"my_lambda"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
 &lt;span class="nx"&gt;function_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"my_lambda"&lt;/span&gt;
 &lt;span class="nx"&gt;role&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"arn:aws:lambda:us-east-1:account-id:resource-id"&lt;/span&gt;
 &lt;span class="nx"&gt;handler&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"exports.test"&lt;/span&gt;
 &lt;span class="nx"&gt;runtime&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"nodejs12.x"&lt;/span&gt;
 &lt;span class="nx"&gt;memory_size&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running &lt;code&gt;infracost breakdown&lt;/code&gt; yields a total cost of 0 dollars:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;infracost breakdown &lt;span class="nt"&gt;--path&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;

 Name Monthly Qty Unit Monthly Cost

 aws_lambda_function.my_lambda
 ├─ Requests Monthly cost depends on usage: &lt;span class="nv"&gt;$0&lt;/span&gt;.20 per 1M requests
 └─ Duration Monthly cost depends on usage: &lt;span class="nv"&gt;$0&lt;/span&gt;.0000166667 per GB-seconds

 OVERALL TOTAL &lt;span class="nv"&gt;$0&lt;/span&gt;.00
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That can’t be right unless no one uses our Lambda function, which is precisely what the tool assumes by default. We can fix this by providing an estimate via a usage file.&lt;/p&gt;

&lt;p&gt;We can create a sample usage file with this command:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;$ infracost breakdown --sync-usage-file --usage-file usage.yml --path .&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;We can now provide estimates by editing &lt;code&gt;usage.yml&lt;/code&gt;. The following example consists of 5 million requests with an average runtime of 300 ms:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;
&lt;span class="na"&gt;resource_usage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="na"&gt;aws_lambda_function.my_lambda&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="na"&gt;monthly_requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5000000&lt;/span&gt; 
 &lt;span class="na"&gt;request_duration_ms&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;300&lt;/span&gt; 

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We’ll tell infracost to use the usage file with &lt;code&gt;--usage-file&lt;/code&gt; to get a proper cost estimate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;infracost breakdown &lt;span class="nt"&gt;--path&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;--usage-file&lt;/span&gt; usage.yml

 Name Monthly Qty Unit Monthly Cost

 aws_lambda_function.my_lambda
 ├─ Requests 5 1M requests &lt;span class="nv"&gt;$1&lt;/span&gt;.00
 └─ Duration 1,500,000 GB-seconds &lt;span class="nv"&gt;$25&lt;/span&gt;.00

 OVERALL TOTAL &lt;span class="nv"&gt;$26&lt;/span&gt;.00
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s much better. Of course, this is accurate as long as our usage file is correct. If you’re unsure, you can &lt;a href="https://www.infracost.io/docs/features/usage_based_resources/#fetch-from-cloud-apis" rel="noopener noreferrer"&gt;integrate Infracost with the cloud provider&lt;/a&gt; and pull the utilization metrics from the source.&lt;/p&gt;
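&lt;p&gt;As a sanity check, we can reproduce the estimate by hand. This sketch plugs the usage-file numbers into the per-request and per-GB-second rates shown in the breakdown output above:&lt;/p&gt;

```python
# Reproduce Infracost's Lambda estimate from the usage file above.
requests_per_month = 5_000_000
avg_duration_s = 0.300            # request_duration_ms: 300
memory_gb = 1.0                   # memory_size = 1024 MB

price_per_million_requests = 0.20    # rate shown in the breakdown output
price_per_gb_second = 0.0000166667   # rate shown in the breakdown output

request_cost = requests_per_month / 1_000_000 * price_per_million_requests
gb_seconds = requests_per_month * avg_duration_s * memory_gb
duration_cost = gb_seconds * price_per_gb_second

total = request_cost + duration_cost
print(f"{gb_seconds:,.0f} GB-seconds, total ${total:.2f}")
```

&lt;p&gt;This prints 1,500,000 GB-seconds and a $26.00 total, matching the tool’s output.&lt;/p&gt;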

&lt;h2&gt;
  
  
  Git diff for cost changes
&lt;/h2&gt;

&lt;p&gt;Infracost can save results in JSON by providing the &lt;code&gt;--format json&lt;/code&gt; and &lt;code&gt;--out-file&lt;/code&gt; options. This gives us a file we can check in source control and use as a baseline.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;$ infracost breakdown --path . --format json --usage-file usage.yml --out-file baseline.json&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;We can now compare changes by running &lt;code&gt;infracost diff&lt;/code&gt;. Let’s see what happens if the Lambda execution time goes from 300 to 350 ms:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;infracost diff &lt;span class="nt"&gt;--path&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;--compare-to&lt;/span&gt; baseline.json &lt;span class="nt"&gt;--usage-file&lt;/span&gt; usage.yml

~ aws_lambda_function.my_lambda
 +&lt;span class="nv"&gt;$4&lt;/span&gt;.17 &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$26&lt;/span&gt;.00 → &lt;span class="nv"&gt;$30&lt;/span&gt;.17&lt;span class="o"&gt;)&lt;/span&gt;

 ~ Duration
 +&lt;span class="nv"&gt;$4&lt;/span&gt;.17 &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$25&lt;/span&gt;.00 → &lt;span class="nv"&gt;$29&lt;/span&gt;.17&lt;span class="o"&gt;)&lt;/span&gt;

Monthly cost change &lt;span class="k"&gt;for &lt;/span&gt;TomFern/infracost-demo/dev
Amount: +&lt;span class="nv"&gt;$4&lt;/span&gt;.17 &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$26&lt;/span&gt;.00 → &lt;span class="nv"&gt;$30&lt;/span&gt;.17&lt;span class="o"&gt;)&lt;/span&gt;
Percent: +16%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see, the impact is a 16% increase.&lt;/p&gt;
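&lt;p&gt;The 16% figure checks out arithmetically. A quick sketch, reusing the rates from the earlier breakdown:&lt;/p&gt;

```python
# Cost impact of raising the average Lambda duration from 300 ms to 350 ms.
price_per_gb_second = 0.0000166667   # rate from the breakdown output
requests_per_month = 5_000_000
request_cost = 1.00                  # 5M requests at $0.20 per 1M

old_total = request_cost + requests_per_month * 0.300 * price_per_gb_second
new_total = request_cost + requests_per_month * 0.350 * price_per_gb_second
change = new_total - old_total

print(f"+${change:.2f} ({change / old_total:.0%})")
```

&lt;p&gt;This prints +$4.17 (16%), the same difference &lt;code&gt;infracost diff&lt;/code&gt; reported.&lt;/p&gt;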

&lt;h2&gt;
  
  
  Integrating Infracost with CI/CD
&lt;/h2&gt;

&lt;p&gt;We’ve seen how this tool can help us estimate cloud costs. That’s valuable information, but what role does Infracost take in &lt;a href="https://semaphoreci.com/continuous-integration" rel="noopener noreferrer"&gt;continuous integration&lt;/a&gt;? To answer that, we must understand what &lt;code&gt;infracost comment&lt;/code&gt; does.&lt;/p&gt;

&lt;p&gt;The comment command takes a JSON file generated by &lt;code&gt;infracost diff&lt;/code&gt; and posts its contents directly to GitHub, Bitbucket, or GitLab. Thus, by running Infracost inside CI, we make relevant cost information available to everyone on the team.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fskt6b3xi8wd5u4g2f5ct.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fskt6b3xi8wd5u4g2f5ct.jpg" alt="An automated comment on GitHub with cost differences caused by the commit." width="800" height="270"&gt;&lt;/a&gt;&lt;br&gt;
Infracost comment on the cost difference in a GitHub commit.&lt;/p&gt;
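&lt;p&gt;A minimal sketch of this flow, based on the Infracost CLI documentation. The repository name, pull request number, and token are placeholders you would supply from your CI environment:&lt;/p&gt;

```shell
# Generate a diff against the baseline in machine-readable form
infracost diff --path . --compare-to baseline.json \
               --usage-file usage.yml \
               --format json --out-file /tmp/infracost.json

# Post the result as a pull request comment (GitHub shown;
# the bitbucket and gitlab subcommands work the same way)
infracost comment github --path /tmp/infracost.json \
                         --repo my-org/my-repo \
                         --pull-request 42 \
                         --github-token "$GITHUB_TOKEN" \
                         --behavior update
```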

&lt;p&gt;If you want to learn how to setup CI/CD with Infracost, check out this tutorial: &lt;a href="https://semaphoreci.com/blog/infracost#running-infracost-on-ci-cd" rel="noopener noreferrer"&gt;Running Infracost on CI/CD&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  Working with monorepos
&lt;/h3&gt;

&lt;p&gt;You will likely have separate Terraform files for each subproject if you work with a &lt;a href="https://semaphoreci.com/blog/what-is-monorepo" rel="noopener noreferrer"&gt;monorepo&lt;/a&gt;. In this case, you should add an &lt;a href="https://www.infracost.io/docs/features/config_file/" rel="noopener noreferrer"&gt;infracost config file&lt;/a&gt; at the project's root. This allows you to specify the project names and where Terraform and usage files are located. You can also set environment variables and other options.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.1&lt;/span&gt;

&lt;span class="na"&gt;projects&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dev&lt;/span&gt;
 &lt;span class="na"&gt;usage_file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dev/infracost-usage.yml&lt;/span&gt;
 &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="na"&gt;NODE_ENV&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dev&lt;/span&gt;

 &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;prod&lt;/span&gt;
 &lt;span class="na"&gt;usage_file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;prod/infracost-usage.yml&lt;/span&gt;
 &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="na"&gt;AWS_ACCESS_KEY_ID&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${PROD_AWS_ACCESS_KEY_ID}&lt;/span&gt;
 &lt;span class="na"&gt;AWS_SECRET_ACCESS_KEY&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${PROD_AWS_SECRET_ACCESS_KEY}&lt;/span&gt;
 &lt;span class="na"&gt;NODE_ENV&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When using a config file, replace the &lt;code&gt;--path&lt;/code&gt; argument with &lt;code&gt;--config-file&lt;/code&gt; in all your commands.&lt;/p&gt;

&lt;h3&gt;
  
  
  Establishing policies
&lt;/h3&gt;

&lt;p&gt;One more trick Infracost has up its sleeve is &lt;a href="https://www.infracost.io/blog/cloud-cost-policies/" rel="noopener noreferrer"&gt;enforcing policies&lt;/a&gt;. Policies are rules that evaluate the output of &lt;code&gt;infracost diff&lt;/code&gt; and stop the CI pipeline if a resource goes over budget. This feature allows managers and team leads to enforce limits. When the policy fails, the CI/CD pipeline stops with an error, preventing the infrastructure from being provisioned.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foym7r3u568roewaszxao.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foym7r3u568roewaszxao.jpg" alt="A pull request on GitHub with a warning that a policy has been broken." width="800" height="324"&gt;&lt;/a&gt;&lt;br&gt;
When a policy is in place, Infracost warns us if any limits are exceeded.&lt;/p&gt;

&lt;p&gt;Infracost implements policies using &lt;a href="https://www.openpolicyagent.org/docs/latest/policy-language/" rel="noopener noreferrer"&gt;Open Policy Agent&lt;/a&gt; (OPA), which uses the Rego language to encode policy rules.&lt;/p&gt;

&lt;p&gt;Rego has a ton of features, and it’s &lt;a href="https://www.openpolicyagent.org/docs/latest/policy-language/#the-basics" rel="noopener noreferrer"&gt;worth digging in to learn it thoroughly&lt;/a&gt;, but for our purposes, we only need to learn a few keywords:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;deny[out]&lt;/code&gt;: defines a new policy rule that fails if the &lt;code&gt;out&lt;/code&gt; object has &lt;code&gt;failed: true&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;msg&lt;/code&gt;: defines the error message shown when the policy fails.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;out&lt;/code&gt;: defines the logic that makes the policy pass or fail.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;input&lt;/code&gt;: references the contents of the JSON object generated with &lt;code&gt;infracost diff&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The following example shows a policy that fails when the total budget exceeds $1,000:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rego"&gt;&lt;code&gt;&lt;span class="c1"&gt;# policy.rego&lt;/span&gt;

&lt;span class="ow"&gt;package&lt;/span&gt; &lt;span class="n"&gt;infracost&lt;/span&gt;

&lt;span class="n"&gt;deny&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

 &lt;span class="c1"&gt;# define a variable&lt;/span&gt;
 &lt;span class="n"&gt;maxMonthlyCost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1000.0&lt;/span&gt;

 &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="s2"&gt;"Total monthly cost must be less than $%.2f (actual diff is $%.2f)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;maxMonthlyCost&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;to_number&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;totalMonthlyCost&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
 &lt;span class="p"&gt;)&lt;/span&gt;

 &lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
 &lt;span class="s2"&gt;"msg"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s2"&gt;"failed"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;to_number&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;totalMonthlyCost&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;maxMonthlyCost&lt;/span&gt;
 &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is another example, which fails if the cost difference is equal to or greater than $500:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rego"&gt;&lt;code&gt;&lt;span class="ow"&gt;package&lt;/span&gt; &lt;span class="n"&gt;infracost&lt;/span&gt;

&lt;span class="n"&gt;deny&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

 &lt;span class="c1"&gt;# maxDiff defines the threshold that you require the cost estimate to be below&lt;/span&gt;
 &lt;span class="n"&gt;maxDiff&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;500.0&lt;/span&gt;

 &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="s2"&gt;"Total monthly cost diff must be less than $%.2f (actual diff is $%.2f)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;maxDiff&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;to_number&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;diffTotalMonthlyCost&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
 &lt;span class="p"&gt;)&lt;/span&gt;

 &lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
 &lt;span class="s2"&gt;"msg"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s2"&gt;"failed"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;to_number&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;diffTotalMonthlyCost&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;maxDiff&lt;/span&gt;
 &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
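&lt;p&gt;Both policies boil down to threshold checks on two fields of the &lt;code&gt;infracost diff&lt;/code&gt; JSON output (&lt;code&gt;totalMonthlyCost&lt;/code&gt; and &lt;code&gt;diffTotalMonthlyCost&lt;/code&gt;). As a rough sketch of that logic outside Rego (this is not part of Infracost, and the sample figures below are made up), the same checks could be written like this:&lt;/p&gt;

```python
# Sketch of the threshold logic the two Rego policies above encode.
# The sample dict mimics a subset of the JSON emitted by
# `infracost diff --format json`; the dollar amounts are invented.

def evaluate_policies(diff_json, max_monthly_cost=1000.0, max_diff=500.0):
    """Return a list of failure messages; an empty list means all policies pass."""
    failures = []

    # Mirrors the first policy: total monthly cost must stay under the budget.
    total = float(diff_json["totalMonthlyCost"])
    if total >= max_monthly_cost:
        failures.append(
            f"Total monthly cost must be less than ${max_monthly_cost:.2f} "
            f"(actual is ${total:.2f})"
        )

    # Mirrors the second policy: the cost *difference* must stay under the cap.
    diff = float(diff_json["diffTotalMonthlyCost"])
    if diff >= max_diff:
        failures.append(
            f"Total monthly cost diff must be less than ${max_diff:.2f} "
            f"(actual diff is ${diff:.2f})"
        )
    return failures

sample = {"totalMonthlyCost": "1250.00", "diffTotalMonthlyCost": "320.00"}
print(evaluate_policies(sample))  # first check fails, second passes
```

&lt;p&gt;The real evaluation happens inside Infracost via OPA, but thinking of each &lt;code&gt;deny&lt;/code&gt; rule as an independent threshold check like the above makes the Rego easier to read.&lt;/p&gt;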



&lt;p&gt;You can experiment with these examples online in the &lt;a href="https://play.openpolicyagent.org/p/o1MLyC74CJ" rel="noopener noreferrer"&gt;OPA playground&lt;/a&gt;.&lt;br&gt;
To enforce a policy, add the &lt;code&gt;--policy-path&lt;/code&gt; option to any of the &lt;code&gt;infracost comment&lt;/code&gt; commands, like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/infracost/infracost/master/scripts/install.sh | sh
checkout
infracost diff &lt;span class="nt"&gt;--path&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;--usage-file&lt;/span&gt; usage.yml &lt;span class="nt"&gt;--compare-to&lt;/span&gt; baseline.json &lt;span class="nt"&gt;--format&lt;/span&gt; json &lt;span class="nt"&gt;--out-file&lt;/span&gt; /tmp/infracost-diff-commit.json
infracost comment github &lt;span class="nt"&gt;--path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/tmp/infracost-diff-commit.json &lt;span class="nt"&gt;--repo&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$SEMAPHORE_GIT_REPO_SLUG&lt;/span&gt; &lt;span class="nt"&gt;--commit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$SEMAPHORE_GIT_SHA&lt;/span&gt; &lt;span class="nt"&gt;--github-token&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$GITHUB_API_KEY&lt;/span&gt; &lt;span class="nt"&gt;--policy-path&lt;/span&gt; policy.rego &lt;span class="nt"&gt;--behavior&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;update
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The power to spin up resources instantly is a double-edged sword: a typo in a Terraform file can be a costly mistake. Staying proactive when managing our cloud infrastructure is essential to sticking to the budget and avoiding nasty surprises at the end of the month. If you’re already automating deployment with &lt;a href="https://semaphoreci.com/cicd" rel="noopener noreferrer"&gt;continuous deployment&lt;/a&gt; and managing services with Terraform, you may as well add Infracost to the mix to make more informed decisions and impose spending limits. Setting this up takes only a few minutes and can save thousands of dollars down the road.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>music</category>
      <category>technology</category>
    </item>
  </channel>
</rss>
