Forem: David Sola

Chatbot with Semantic Kernel - Part 6: AI Connectors 🔌

David Sola — Mon, 17 Feb 2025 23:07:43 +0000

One of Semantic Kernel's key features is the ability to easily swap between different AI providers. This allows us to compare different models and their performance to find the model that best suits our use case.

Semantic Kernel supports major AI providers including OpenAI, Google, Azure, Mistral, Meta, Hugging Face, and others. Although all these platforms provide Large Language Models, each model differs in its characteristics and capabilities:

Modality: Input and output formats (text, image, video, audio, etc.) differ between models and platforms.
Velocity: Some models are faster at generating responses to users.
Cost: Bigger and more powerful models (especially reasoning models) have a higher cost per token.
Structured output: This is the capability of a model to generate a predictable response following a defined JSON schema. This advanced feature is not enabled on all models.
Function calling: This is the ability of the model to invoke plugins (or tools) defined as native code. Although most major providers support function calling, it remains limited in the small language models ecosystem.

One of the supported platforms that I find particularly interesting because of its possibilities is Ollama.

Ollama

Ollama is an open-source tool for running Large Language Models locally. Although LLMs are complex and large models, some of them (especially Small Language Models) can run easily on a laptop. Running models locally is very useful for AI developers since it allows us to maintain everything offline, switch between open models effortlessly, and avoid unnecessary complexity and costs associated with cloud-based models.

Ollama can be installed on Windows, Linux, and macOS from its webpage. Once installed, we can explore their library of models to find a model that fits our purpose. Keep in mind that each model requires different hardware capacity to run locally. Although the reality is more complex, you can consider the size of the model (number of billions of parameters) to estimate its hardware needs. Additionally, each model in the library includes tags with useful information: function calling (tool) support, modality, etc. To run the model locally, simply open a terminal and run ollama run <model-name>:<flavour> (e.g., ollama run phi4 or ollama run llama3.2:1b).

Once the model is running, it can be accessed via terminal, api or via libraries/frameworks. In the next section, we will use models served from Ollama with Semantic Kernel.

Keep in mind that this blog only showcases the most basic usage of Ollama. It includes a much more complete list of features such as model customization, templated prompts, and more.

Using Ollama on Semantic Kernel

Although this section focuses on Ollama, most of its explanation can be easily adapted to any other AI Connector supported by Semantic Kernel. On this link you can find all the currently supported connectors for chat completion services.

First, we need to install the specific semantic-kernel package for Ollama:

pip install semantic-kernel[ollama]

Next, we define the settings for the connector. There are different ways of defining these settings; in this example, we use settings defined via environment variables. You can findhere all the defined settings in Semantic Kernel for the different connectors.

OLLAMA_CHAT_MODEL_ID        = "..."     # Completion Chat Model
OLLAMA_TEXT_MODEL_ID        = "..."     # Completion Text Model
OLLAMA_EMBEDDING_MODEL_ID   = "..."     # Embedding Model
OLLAMA_HOST                 = "..."     # Url of the Ollama server. If not defined it defaults to localhost

Finally, we inject the Ollama services into the Kernel.

# Import dependencies
from semantic_kernel.connectors.ai.ollama import (
    OllamaChatCompletion
)

...

# Inject service into Kernel
self.kernel.add_service(OllamaChatCompletion(service_id='chat_completion')) 

# Retrieve service from Kernel
self.chat_service = self.kernel.get_service(type=OllamaChatCompletion)  

# Get default settings for the service
self.chat_settings = self.kernel.get_prompt_execution_settings_from_service_id(service_id='chat_completion')  

# Define function tool behavior. Set to None if the model does not support function calling
if support_tool:
    self.chat_settings.function_choice_behavior = FunctionChoiceBehavior.Auto()
else:
    self.chat_settings.function_choice_behavior = FunctionChoiceBehavior.NoneInvoke()

Another scenario, particularly when we want to easily switch between providers to test them, is to have one single code base where changing between providers can be done easily. In that case, we can define an environment variable GLOBAL_LLM_SERVICE that specifies which provider we are going to use:

llm_service = os.environ['GLOBAL_LLM_SERVICE']

# Define services per AI connector
services = {  
    'AzureOpenAI': [  
        ('chat_completion', AzureChatCompletion),  
        ('audio_to_text_service', AzureAudioToText),  
        ('text_to_audio_service', AzureTextToAudio)  
    ],  
    'OpenAI': [  
        ('chat_completion', OpenAIChatCompletion),  
        ('audio_to_text_service', OpenAIAudioToText),  
        ('text_to_audio_service', OpenAITextToAudio)  
    ],  
    'Ollama': [  
        ('chat_completion', OllamaChatCompletion)  
    ]  
}

self.kernel = Kernel()

# Init services
for service_id, service_class in services.get(llm_service, []):  
    self.kernel.add_service(service_class(service_id=service_id))

# Set settings
self.chat_settings = self.kernel.get_prompt_execution_settings_from_service_id(service_id='chat_completion')
self.chat_settings.function_choice_behavior = (  
    FunctionChoiceBehavior.Auto() if support_tool else FunctionChoiceBehavior.NoneInvoke()
)

# Retrieve services
chat_completion = self.kernel.get_service(service_id='chat_completion')
if llm_service in ['AzureOpenAI', 'OpenAI']:
    audio_to_text_service = self.kernel.get_service(service_id='audio_to_text_service')
    text_to_audio_service = self.kernel.get_service(service_id='text_to_audio_service')

Summary

In this chapter, we have added several AI connectors to the chatbot so it can work with different models from different providers. Additionally, we have explored in more detail how to run models locally with Ollama.

Remember that all the code is already available on my GitHub repository 🐍 PyChatbot for Semantic Kernel.

Chatbot with Semantic Kernel - Part 5: Text-to-speech 📣

David Sola — Wed, 29 Jan 2025 11:55:43 +0000

In the last chapter, we added the first audio capability to our chatbot by allowing the user to interact with the model using their voice. In this chapter, we are going to add the opposite skill: giving a voice to our chatbot.

Text-to-speech

In recent times, models have vastly improved in creating audio based on text input. In some cases, model providers offer standalone models for Text-to-speech, like TTS from OpenAI. On the other hand, we have also the possibility of using more powerful models that support both multimodal input (text, image, video and audio) and output (text, image and voice). Some examples of these more powerful models are Gemini 2.0 flash from Google and GPT-4o-realtime from OpenAI.

The possibility of generating high quality audio thanks to these TTS models, combined with the potential of powerful text models (like GPT-4o), has enabled many use cases that were unimaginable just a few years ago. For example, in 2024, Google released NotebookLM, an application that generates podcasts based on sources uploaded by the user. If you are researching evaluation techniques for LLMs, you can upload materials such as papers or articles, and the application creates a podcast where two AI voices have a conversation summarizing and explaining your material.

Text-to-speech on Semantic Kernel

In November 2024, Microsoft added audio capabilities support to Semantic Kernel. For the Text-to-speech scenario, we will build the following workflow:

Introduce a text or audio input. You can check the previous article where we added Audio-to-text functionality.
Use a standard LLM to generate a response from the user's input.
Use the TTS model from OpenAI to convert the response into audio (WAVformat).
Reproduce the generated audio to the user.

Based on our previous chatbot, the two first steps are already accomplished. Let's now focus on converting the text response into an audio with the TTS model.

Generate audio

First of all, we need to inject a new service into our Kernel. In this case, we register an AzureTextToAudio service:

# Inject the service into the Kernel
self.kernel.add_service(AzureTextToAudio(
    service_id='text_to_audio_service'
))

# Get the service from the Kernel
self.text_to_audio_service:AzureTextToAudio = self.kernel.get_service(type=AzureTextToAudio)

Because the service is declared as an Azure service, it uses the following environment variables:

AZURE_OPENAI_TEXT_TO_AUDIO_DEPLOYMENT_NAME: the name of the model deployed in Azure OpenAI.
AZURE_OPENAI_API_KEY: the API key associated to the Azure OpenAI instance.
AZURE_OPENAI_ENDPOINT: the endpoint associated to the Azure OpenAI instance.

Similarly, Semantic Kernel has many AI connectors, like the OpenAITextToAudio service. In that case, the name of the variables would be:

OPENAI_AUDIO_TO_TEXT_MODEL_ID: the OpenAI audio to text model ID to use.
OPENAI_API_KEY: the API key associated to your organization.
OPENAI_ORG_ID: the unique identifier for your organization.

You can check all the settings used on Semantic Kernel on the official Github repository.

The TextToAudio service is quite simple to use. It has two important methods:

get_audio_contents: return a list of generated audio contents. Some models do not support generation of multiple audios from one single input, in that case the list will contain only one element.
get_audio_content: identical to previous method but always return the first element of the list.

Both methods have an optional argument OpenAITextToAudioExecutionSettings, to customize the behavior of the service. With the current version of Semantic Kernel, you can customize the speed of the playback, the voice used (with Alloy being the default one), and the output format. In this case, I have decided to use the echo voice in WAV format:

async def generate_audio(self, message: str) -> bytes:
    audio_settings = OpenAITextToAudioExecutionSettings(voice='echo', response_format="wav")
    audio_content = await self.text_to_audio_service.get_audio_content(message, audio_settings)
    return audio_content.data

The output generated by the method is a list of bytes containing the audio. Now we can easily use the output of the standard response from the LLM to generate the corresponding audio:

response = await assistant.generate_response(text)
add_message_chat('assistant', response)

if config['audio'] == 'enabled':
    audio = await assistant.generate_audio(response)

Reproducing the audio

Once we have the audio generated, we need some code to reproduce it on the user's computer. For that purpose, I have created a simple AudioPlayer class using pyaudio library:

import io
import wave
import pyaudio

class AudioPlayer:
    def play_wav_from_bytes(self, wav_bytes, chunk_size=1024):
        p = pyaudio.PyAudio()

        try:
            wav_io = io.BytesIO(wav_bytes)

            with wave.open(wav_io, 'rb') as wf:
                channels = wf.getnchannels()
                rate = wf.getframerate()

                stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                                channels=channels,
                                rate=rate,
                                output=True)

                data = wf.readframes(chunk_size)
                while len(data) > 0:
                    stream.write(data)
                    data = wf.readframes(chunk_size)

                stream.stop_stream()
                stream.close()

        finally:
            p.terminate()

Finally, we call the play_wav_from_bytes method to reproduce the audio generated by the model:

# Generate response with standard LLM
response = await assistant.generate_response(text)

# Add response to the user interface
add_message_chat('assistant', response)

if config['audio'] == 'enabled':
    # Generate audio from the text
    audio = await assistant.generate_audio(response)

    # Reproduce audio
    player = AudioPlayer()
    player.play_wav_from_bytes(audio)

Summary

In this chapter, we have provided a voice to our chatbot thanks to a Text-to-speech model. We have transformed our agent into a multimodal agent by supporting text and audio as input and output.

In the next chapter, we will integrate the chatbot with Ollama to enable the use of locally run models

Remember that all the code is already available on my GitHub repository 🐍 PyChatbot for Semantic Kernel.

Want to learn GenAI on 2025?

David Sola — Thu, 09 Jan 2025 21:27:09 +0000

8 tips to learn Generative AI in 2025

David Sola ・ Dec 20 '24

#ai #beginners #genai #learning

Chatbot with Semantic Kernel - Part 4: Speech-to-text with Whisper 👂

David Sola — Fri, 27 Dec 2024 15:02:06 +0000

On the previous chapters, we built a basic Librarian agent, enhanced with some specific skills via function calling, and a tool to inspect in real time the interactions of our agent with the plugins.

On this chapter, we are going to add some audio capabilities to our Librarian agent. Once we finish, our Librarian will start its multimodality journey, we will be able to communicate with it using our voice.

Whisper

Our goal is to make the Agent capable of listening to us. We will use the microphone of the computer to get back a response from the model. The process should work as if we had written the text.

In order to accomplish it, we will use an Automatic Speech Recognition (ASR) system, in our case, Whisper from OpenAI. Although this model uses a similar architecture to a Large Languange Model, it should not be defined as an LLM, as Yann LeCun states in this message. Whisper, or any ASR system, is able to transcript an audio input in multiple languages.

Whisper on Semantic Kernel

In November 2024, Microsoft added support to audio capabilities to Semantic Kernel. The workflow we will build is as follows:

Record the user's audio using the computer's microphone.
Use Whisper to convert the audio into text.
Provide the text as the agent's input.
Show the reply generated by the agent to the user.

Let's start by recording with the user's microphone on demand. On my chatbot, I have a button to start the recording. Once pressed, the recording starts and the user must click on it again to stop it. For that reason, I have created two methods: start_recording and stop_recording.

import os
import threading
import pyaudio
import wave
from typing import ClassVar

class AudioRecorder:
    FORMAT: ClassVar[int] = pyaudio.paInt16
    CHANNELS: ClassVar[int] = 1
    RATE: ClassVar[int] = 44100
    CHUNK: ClassVar[int] = 1024

    is_recording: bool = False
    output_filepath: str

    def start_recording(self):
        """Start the recording on a new thread to avoid blocking the UI"""
        if not self.is_recording:  
            self.is_recording = True  
            self.audio_thread = threading.Thread(target=self.record_audio)  
            self.audio_thread.start() 

    def stop_recording(self):
        """Stop the recording (if started)"""
        if self.is_recording:  
            self.is_recording = False  
            if self.audio_thread is not None:  
                self.audio_thread.join()

    def record_audio(self):
        """Record the audio in a output.wav file"""
        # Create output file path
        self.output_filepath = os.path.join(os.path.dirname(__file__), "output.wav")

        self.is_recording = True

        # Open the stream of audio
        audio = pyaudio.PyAudio()
        stream = audio.open(
            format=self.FORMAT,
            channels=self.CHANNELS,
            rate=self.RATE,
            input=True,
            frames_per_buffer=self.CHUNK,
        )
        frames = []

        # Read chunks while recording and append them to the list of frames
        while self.is_recording:
            data = stream.read(self.CHUNK)
            frames.append(data)

        # Stop and close the stream of audio
        stream.stop_stream()
        stream.close()

        # Store the audio as a WAV by joining all frames
        with wave.open(self.output_filepath, "wb") as wf:
            wf.setnchannels(self.CHANNELS)
            wf.setsampwidth(audio.get_sample_size(self.FORMAT))
            wf.setframerate(self.RATE)
            wf.writeframes(b"".join(frames))

        audio.terminate()

With that piece of code, we can record the user's voice as a wav file. Now, we will use Whisper to transcribe it, so we need to add the audio service AzureAudioToText to Semantic Kernel. Alternatively, you can use OpenAIAudioToText in case you want to connect directly with the OpenAI API.

self.kernel.add_service(AzureAudioToText(
    service_id='audio_service'
))

Once it is added to the kernel, it can be retrieved at any time.

self.audio_to_text_service = self.kernel.get_service(type=AzureAudioToText)

The usage of the audio service is quite straightfoward. First, we convert the audio file into an AudioContent. Then, we use the AudioContent to call the method get_text_content from the audio service:

async def transcript_audio(self, audio_file: str) -> str:
    # Conver the WAV file into AudioContent
    audio_content = AudioContent.from_audio_file(audio_file)

    # Use the audio service to trascript the AudioContent
    user_message = await self.audio_to_text_service.get_text_content(audio_content)

    # Return the message as text
    return user_message.text

The returned result from the method can be used then to be displayed on the chat interface, and to be ingested to the agent as any other user's input. You can checkout the other chapters of this series where I explain how to build the text-based chat.

async def transcript_audio_and_send_mesasge(self, audio_file: str) -> str:
    # Conver the WAV file into AudioContent
    audio_content = AudioContent.from_audio_file(audio_file)

    # Use the audio service to transcribe the AudioContent
    user_message = await self.audio_to_text_service.get_text_content(audio_content)

    # Add message to the history
    self.history.add_message(ChatMessageContent(role=AuthorRole.USER, content=user_message))

    # Invoke the agent with the updated history
    async for response in self.agent.invoke(self.history):
        # Add agent's reply to the history
        self.history.add_message(response)

        # Return the reply
        return str(response)

Summary

On this chapter, we have added the possibility to transform our audio into text using Whisper, and then ingest that text into the model to generate a response.

Remember that all the code is already available on my GitHub repository 🐍 PyChatbot for Semantic Kernel.

On the next chapter, we will add voice to our Librarian using a Text To Speech service.

8 tips to learn Generative AI in 2025

David Sola — Fri, 20 Dec 2024 12:20:36 +0000

Generative AI is everywhere and is evolving quicker than any other technology. If you want to learn GenAI in 2025, continue the reading :)

I started learning and working with Generative AI in Spring 2023, just after the launch of GPT-4 and the corresponding social boom it generated. At that time, I thought it was too late for me - experts would be everywhere, and there would be no room for me. However, I was clearly wrong.

During this year and a half, I have created my own recipe that has helped me to learn the topic from scratch and keep up with all the news and trends on the field.

1. Learn some Python 🐍

You don't need to be a Python expert to start your learning journey, but it will certainly allow you to quickly test what you are learning. There are many free courses on different formats you might find interesting. In my case, I went through this free course from exercism.org. Furthermore, try what you are learning about Python on your own. Jupyter Notebook is the perfect tool to experiment with your new acquired Python skills.

2. Understand how LLMs work 🔀

Large Language Models are at the core of any GenAI-based application. It is crucial to get the basics of how they are designed, created, trained, their underneath architecture, etc. If you understand how they work, you might understand their limitations. I think This one-hour introductory video from Andrej Karpathy is a masterpiece on the matter.

3. Prompt engineering 📝

Prompting is the based of building GenAI applications. It is the way you instruct the LLM to generate the content. Prompting is a very wide field, where you might learn new things every day. There are many free courses and websites that gather all about prompting, from basic topics to advanced ones like dynamic shots or prompt chaining. For example, webs like Prompt Engineering Guide or this one from Microsoft are quite interesting. Additionally, I found this 1-hour podcast about AI prompt engineering from Antrophic very stimulating: https://www.youtube.com/watch?v=T9aRN5JkmL8

4. Advanced topics ⬆️

Now that you know the basics it is time to explore more advanced topics like embeddings, RAG or function calling. These topics and techniques will open a bunch of new opportunities for your GenAI applications.

5. Use libraries 📚

Don't reinvent the wheel. It is important to understand the underlying concepts, but sometimes it is easier to just use a library that gives you a lot for free. Try those libraries which, in most cases, will simplify your journey in building GenAI applications. LangChain is the most famous one, but I think it is worth trying others, like Semantic Kernel, specially if you work in Microsoft ecosystem (Azure, .NET, etc).

6. Quality ✅

Quality is crucial in any application, and it is even more important on GenAI based application, to keep content created relevant and reduce hallucinations. Understand how you can create evaluation systems for your application. Also, concepts like LLM as a judge might be very interesting in some scenarios.

7. Stay updated 💡

The rapid pace of changes and improvements in GenAI is stunning. You might feel lost just after couple of weeks of disconnection. Stay updated with the latest trends and news through newsletters. I follow few newsletters that cover commercial and non-technical news, such as Artificial Ignorance and Why Try AI. I also find the Ahead of AI blog useful, as it delves into more technical and deep concepts. Additionally, the AI Explained YouTube channel is quite entertaining and informative.

8. Experiment 🧪

Work on your own ideas, build prototypes, and don't hesitate to discard them and try new features. Theory is important, but experience is crucial. Hands-on experiments is by far the best way to learn new concepts.

These 8 steps are a perfect summary of what has worked for me. I hope you have found this reading useful and interesting. Let me know in the comments how you have learned GenAI or any content, blog or video you have found useful on the topic.

You can also follow me on LinkedIn where I publish GenAI content as well.

Chatbot with Semantic Kernel - Part 3: Inspector & tokens 🔎

David Sola — Mon, 09 Dec 2024 18:18:04 +0000

On our previous chapters, we built a basic agent enhanced with specific skills via Plugins and Function calling.

On this third chapter, we will add a functionality to inspect and debug in real time the interactions between our agent and the plugins.

Why do we need an inspector?

In a basic agent-human interaction, the system receives a set of instructions and a history chat, and creates a reply accordingly. Although it might not be a mundane task, the number of variables are scoped. However, when we add skills to our agent in the form of plugins, the interactions are much more complex. We need to review descriptions, arguments, etc. In those scenarios, it is key to understand how our agent interacts with the different plugins in real time to be able to adjust the different plugins so the agent calls them when expected. That's the purpose of the Inspector we are going to build on this chapter.

Additionally, it is also important to identify the number of tokens our model consumes on each function call. That information is needed to estimate the cost of our agent, something that is critical on a business scenario when the usage of the agent could grow to hundreds or thousands of users. With these estimations, we can decide if the current solution is cost-effective, or if we need to improve our prompts, functions or just remove some functionalities.

Let's start by understanding what tokens are and how we can extract that usage from Semantic Kernel.

Tokens

A Large Language Model decomposes the text into tokens to analyze the semantics of the text and the connection between the different tokens. In a naive definition, tokens are how the models see the world. If you want to understand better how these models are created, decompose the text and create content, there is a wide amount of literature out there about it.

Model providers, such as OpenAI or Antrophic, make a difference between input and output tokens:

Input tokens (aka prompt tokens) are those sent to the model on each call. On a real scenario the system prompt, chat history and other data, such as function descriptions, are part of the input tokens.
Output tokens (aka completion tokens) are those generated by the model based on the input tokens. The cost of these tokens is usually ~4 times more than that of input tokens.

We can easily check in many online webs, like this tool from Hugging Face, how the text is converted into tokens and compared the differences between models. On each model, the decomposition is defined by the encoding used. For example, the gpt-4o family uses a o200k_base encoding, while the gpt3.5-turbo or text-embedding-ada-002 use cl100k-base.

Furthermore, it is possible to count the number of tokens directly in the code. In python, we can use the well-known library tiktoken from OpenAI.

import tiktoken

# Get encoder for a specific model
encoder = tiktoken.encoding_for_model("text-embedding-ada-002")

# Decompose text into tokens
tokens = encoder.encode('Some text here')

# Calculate number of tokens
print(f'Number of tokesn: {len(tokens)}')

In some scenarios, it might be useful to be able to count the number of tokens before actually doing a call to the agent. For example, if you need to calculate multiple embeddings, you can calculate the preferred batch size by counting the number of tokens per embedding and taking into account the model's limit. Other scenarios might be to keep the chat history below a threshold to control the cost, or calculate the optimal number of samples provided on a few-shot dynamic prompting.

Tokens on Semantic Kernel

On Semantic Kernel, we can easily get the tokens used on each agent call. Let's try to create some code to gather and track that information.

First, we create a class TokenUsage that we will use to collect the number of input (prompt) and output (completion) tokens per invokation:

class TokenUsage:
    def __init__(self, input_t, output_t):
        self.input_tokens = input_t
        self.output_tokens = output_t

The usage of tokens is part of the metadata of the messages from the ChatHistory:

for message in history.messages:
    usage =  TokenUsage(
        input_t=message.metadata['usage'].prompt_tokens,
        output_t=message.metadata['usage'].completion_tokens
    )

Alternatively, we can track the usage on each agent call but the response will only hold the last reply from the agent. You might need to inspect the ChatHistory that has been automatically updated with the call to invoke:

async for response in self.agent.invoke(self.history):
    self.history.add_message(response)
    usage = TokenUsage(
        input_t=response.metadata['usage'].prompt_tokens,
        output_t=response.metadata['usage'].completion_tokens
    )

Agent interactions

Our existing Librarian agent supports two types of interactions:

Non-function call: the model uses the chat history and the system prompt (or instructions) to reply directly to the user. These interactions are done when the user asks for something that is not directly related to any function. For example: Hello, how are you today?
Function call: the model invokes one or more functions to generate the response to the user. For example: Find some books about Harry Potter.

In Semantic Kernel, each of these types of interactions are mapped to a specific class type. For a simple user-agent interaction (non-function call) the message in the history is an instance of TextContent. For the user-agent-plugin interaction (function call) there are two different messages in the history: first, the FunctionCallContent that includes the function that has been called and the arguments; second, the FunctionResultContent that holds the result for the function call.

Now, to apply these concepts to our agent, we first create some classes to collect the information we want to show in the inspector of the chatbot for each type of call:

from abc import ABC

class AgentInvokation(ABC):
    role: str
    usage: TokenUsage

class AgentTextInvokation(AgentInvokation):
    text: str

    def __init__(self, role: str, text: str, usage: TokenUsage):
        self.role = role
        self.text = text
        self.usage = usage

class AgentFunctionInvokation(AgentInvokation):
    plugin_name: str
    function_name: str
    function_result: str
    function_arguments: str

    def __init__(self, role: str, plugin_name: str, function_name: str, arguments: str, usage: UsageRecord):
        self.role = role
        self.plugin_name = plugin_name
        self.function_name = function_name
        self.function_arguments = arguments
        self.usage = usage

    def add_invokation_result(self, result: str):
        self.function_result = result

Then, we can create a method that maps the current ChatHistory from Semantic Kernel into the class system we have created:

from agent.agent_record import AgentInvokation, AgentTextInvokation, AgentFunctionInvokation, TokenUsage
from semantic_kernel.contents.chat_message_content import ChatMessageContent

class LibirarianAssistant:
    def invokations(self, user_message: str) -> list[AgentInvokation]
        invokations = [
            TextInvokation(role='AuthorRole.SYSTEM', text=self.agent.instructions, usage=InvokationUsage(0,0))
        ]

        for message in self.history.messages:
            # Get role to differentiate between User and Agent
            role = message.role

            # Retrieve usage from metadata
            usage = self.__get_usage(message)

            for item in message.items:
                if isinstance(item, TextContent):
                    # For TextContent, we just get the text
                    self.records.append(AgentTextInvokation(role, item.text, usage))
                elif isinstance(item, FunctionCallContent):
                    # For FunctionCallContent we get the name of the plugin, the function, and its arguments
                    self.records.append(AgentFunctionInvokation(role, item.plugin_name, item.function_name, item.arguments, usage))
                elif isinstance(item, FunctionResultContent) and isinstance(records[-1], AgentFunctionInvokation):
                    # For FunctionResultContent, we update last record adding the function result
                    self.records[-1].add_invokation_result(item.result)

    def __get_usage(self, message: ChatMessageContent) -> TokenUsage:
        # If usage is not present, return 0
        if 'usage' in message.metadata:
            return TokenUsage(message.metadata['usage'].prompt_tokens, message.metadata['usage'].completion_tokens)
        else: 
            return TokenUsage(0, 0)

Now, we have all the pieces to present it to the user in the way we prefer. In my sample chatbot, I have decided to use two different tabs, one for the standard chatbot experience, and another one with the details about the function calls and tokens. On the latter, we separate the messages into four types: user messages, system prompt, agent text replies and agent function calls (or tools).

Summary

On this chapter, we have not added any functionality to the agent itself. However, we have enhanced our chatbot with a real time inspector, so it is easy to see how our agent interacts with the different plugins and estimate the usage.

Remember that all the code is already available on my GitHub repository 🐍 PyChatbot for Semantic Kernel.

On the next chapter, we will get back to the agent to include Voice capabilities, like speech recognition or text-to-speech.

Chatbot with Semantic Kernel - Part 2: Plugins 🧩

David Sola — Sun, 01 Dec 2024 21:04:23 +0000

On our previous chapter, we went through some of the basic concepts of Semantic Kernel, finishing with a working Agent that was able to respond to generic questions, but with a predefined tone and purpose using the instructions.

On this second chapter, we will add specific skills to our Librarian using Plugins.

What is a Plugin?

A Plugin is a set of functions exposed to the AI services. Plugins encapsulate functionalities, allowing the assistant to perform actions that are not part of its native behavior.

For example, with Plugins we could enable the assistant to fetch some data from an API or a Database. Additionally, the assistant could perform some actions on behalf of the user, tipically through APIs. Furthermore, the assistant would be enable to update some parts of the UI using a Plugin.

As I mentioned before, a Plugin is a composed by different functions. Each function is defined mainly by:

Description: the purpose of the function and when it should be invoked. It will help the model to decide when to call it as we will see in the section function calling.
Input variables: used to parametrize the function so it can be reusable.

Semantic Kernel supports different types of Plugins. In this post we will focus on two of them: Prompt Plugin and Native Plugin.

Prompt plugin

A Prompt Plugin is basically a specific prompt to be invoked under concrete circumstances. In a typical scenario, we might have a complex System Prompt, where we define the tone, purpose and general behavior of our agent. However, it is possible that we want the agent to perform some concrete actions where we need to define some specific restrictions and rules. For that case, we would try to avoid the System Prompt to grow to the infinite in order to reduce hallucinations and keep the model response relevant and controlled. That's a perfect case for a Prompt Plugin:

System Prompt: tone, purpose and general behavior.
Summarization Prompt: including rules and restrictions about how to do a summary. For example, it should not be longer than two paragraphs.

A Prompt Plugin is defined by two files:

config.json: configuration file including description, variables and execution settings:

{
    "schema": 1,
    "description": "Plugin description",
    "execution_settings": {
        "default": {
            "max_tokens": 200,
            "temperature": 1,
            "top_p": 0.0,
            "presence_penalty": 0.0,
            "frequency_penalty": 0.0
        }
    },
    "input_variables": [
        {
            "name": "parameter_1",
            "description": "Parameter description",
            "default": ""
        }
    ]
}

skprompt.txt: prompt content in plain text. Variables from the configuration file can be accessed using the syntax {{$parameter_1}}.

To add a Prompt Plugin into the Kernel we just need to specify the folder. For example, if we have the folder structure /plugins/plugin_name/skprompt.txt, the plugin is registered as follows:

self.kernel.add_plugin(parent_directory="./plugins", plugin_name="plugin_name")

Native plugin

A Native Plugin allows the model to invoke native code (python, C# or Java). A plugin is represented as a class, where any function can be defined as invokable from the Agent using annotations. The developer must provide some information to the model with the annotations: name, description and arguments.

To define a Native Plugin we must only create the class and add the corresponding annotations:

from datetime import datetime
from typing import Annotated
from semantic_kernel.functions.kernel_function_decorator import kernel_function

class MyFormatterPlugin():

    @kernel_function(name='format_current_date', description='Call to format current date to specific strftime format') # Define the function as invokable
    def formate_current_date(
        self,
        strftime_format: Annotated[str, 'Format, must follow strftime syntax'] # Describe the arguments
    ) -> Annotated[str, 'Current date on the specified format']: # Describe the return value
    return datetime.today().strftime(strftime_format)

To add a Native Plugin into the Kernel we need to create a new instance of the class:

self.kernel.add_plugin(MyFormatterPlugin(), plugin_name="my_formatter_plugin")

Function calling

Function calling, or planning, in Semantic Kernel is a way for the model to invoke a function registered in the Kernel.

For each user message, the model creates a plan to decide how to reply. First, it uses the chat history and the function's information to decide which function, if any, must be called. Once it has been invoked, it appends the result of the function to the history, and decides if it has completed the task from the user message or requires more steps. In case it is not finished, it starts again from the first step until it has completed the task, or it needs help from the user.

Thanks to this loop, the model can concatenate calls to different functions. For example, we might have a function that returns a user_session (including the id of the user) and another one that requires a current_user_id as argument. The model will make a plan where it calls the first function to retrieve the user session, parses the response and uses the user_id as argument for the second function.

In Semantic Kernel, we must tell the agent to use function calling. This is done by defining an execution settings with the function choice behavior as automatic:

# Create the settings
settings = AzureChatPromptExecutionSettings()

# Set the behavior as automatic
settings.function_choice_behavior = FunctionChoiceBehavior.Auto()

# Pass the settings to the agent
self.agent = ChatCompletionAgent(
    service_id='chat_completion',
    kernel=self.kernel,
    name='Assistant',
    instructions="The prompt",
    execution_settings=settings
)

It is important to emphasize that the more detailed the descriptions are, the more tokens are being used, so it is more costly. It is key to find a balance between good detailed descriptions and tokens used.

Plugins for our Librarian

Now that it is clear what a function is and its purpose, let's see how we can get the most out of it for our Librarian agent.

For learning purposes, we will define one Native Plugin and one Prompt Plugin:

Book repository plugin: it is a Native Plugin to retrive books from a repository.
Poem creator Plugin: it is a Prompt Plugin to create a poem from the first sentence of a book.

Book repository plugin

We use the Open library API to retrieve the books' information. The plugin returns the top 5 results for the search, including the title, author and the first sentence of the book.

Specifically, we use the following endpoint to retrieve the information: https://openlibrary.org/search.json?q={user-query}&fields=key,title,author_name,first_sentence&limit=5.

First, we define the BookModel that represents a book in our system:

class BookModel(TypedDict):
    author: str
    title: str
    first_sentence: str

And now, it is time for the function. We use a clear description of both the function and the argument. In this case, we use a complex object as response, but the model is able to use it later on further responses.

class BookRepositoryPlugin:
    @kernel_function(name='get_books_from_user_query', description='Get a list of books based ona  user query or search')
    async def get_books_from_user_query(
        self,
        user_query: Annotated[str, 'User query. No more than 5 words.'], # The model will extract the user_query from the user message. For example, if the user writes `Show me books about Harry Potter`. The model will call this function with the argument `user_query = Harry potter`.
    ) -> Annotated[list[BookModel], 'List of books']:    
        # Define the request based on user message
        url = 'https://openlibrary.org/search.json'  
        params = {  
            'q': user_query,  
            'fields': 'key,title,author_name,first_sentence',  
            'limit': 5  
        }  

        # Send the request
        response = requests.get(url, params=params)  
        response.raise_for_status()

        data = response.json()  
        books = []  

        # Parse the response into our BookModel
        for doc in data['docs']:  
            book = BookModel(  
                author=doc['author_name'][0],  
                title=doc['title'],  
                first_sentence='\n'.join(doc['first_sentence']) if 'first_sentence' in doc and doc['first_sentence'] else ""  
            )  
            books.append(book)  

        return books

Finally, we can add this plugin to the Kernel:

self.kernel.add_plugin(BookRepositoryPlugin(), plugin_name="BookRepositoryPlugin")

Poem creator plugin

We will define this plugin as a Prompt Plugin with some specific restrictions. This is how the prompt and its configuration look like:

/plugins/poem-plugin/poem-creator/config.json:

{
    "schema": 1,
    "description": "Rewrite a sentence from a book as a poem",
    "execution_settings": {
        "default": {
            "max_tokens": 1000,
            "temperature": 0.4,
            "top_p": 0.0,
            "presence_penalty": 0.0,
            "frequency_penalty": 0.0
        }
    },
    "input_variables": [
        {
            "name": "book_first_sentence",
            "description": "First sentence of the book",
            "default": ""
        }
    ]
}

/plugins/poem-plugin/poem-creator/skprompt.txt:

Rewrite as a poem the first sentence of a book <sentence>{{$book_first_sentence}}</sentence> following these restrictions:

- Response must be always in English.
- The poem must always have one stanza.

It is straightfoward to add the plugin to the Kernel:

self.kernel.add_plugin(parent_directory="./plugins", plugin_name="poem_plugin")

Good practices

Some suggestions based on the existing literature and my own experience:

Use python syntax to describe your function even in .NET or Java. Models are usually more skilled on python due to the trained data 🐍
Keep functions focused, specially the descriptions. One function, one purpose. Don't try to create one function that makes too many things, it will be counter productive 🎯
Simple arguments and low number of them. The simpler and fewer they are, the more reliable the call from the models to the functions will be 👇
If you have many functions, review the descriptions carefully to make sure there are no potential conflicts that might make the model get confused 🔎
Ask a model (via chatgpt or similar) feedback about the function descriptions. They are usually quite good to find improvements. By the way, this also applies to the development of prompts in general ❓
Test, test and test. Specially on business software cases, reliablity is key. Make sure the model is able to call the expected functions with the information you have provided to them via annotation 🧪

Summary

In this chapter, we have enhanced our librarian agent with some specific skills using Plugins and Semantic Kernel Planning.

Remember that all the code is already available on my GitHub repository 🐍 PyChatbot for Semantic Kernel.

In the next chapter, we will include some capabilities in the chat to inspect in real time how our model calls and interacts with our plugins by creating an Inspector.

Chatbot with Semantic Kernel - Part 1: Setup and first steps 👣

David Sola — Mon, 25 Nov 2024 09:32:45 +0000

Welcome to the very first post in a series of blogs about my journey building a chatbot with Semantic Kernel. In particular, I will work with the new experimental Agent framework.

This side project serves two main purposes. First, it's a learning opportunity to gain hands-on experience with Semantic Kernel, providing a place to test new features, models, and experiment with plugin development. Second, it serves as a rapid prototyping platform. With a ready-made User Interface integrated with Semantic Kernel, creating quick prototypes for new use cases becomes straightforward.

Before we dive into it, here are some important notes about this series.

These posts assume basic knowledge of Generative AI. We won't cover fundamental concepts like Large Language Models (LLMs) or embeddings 🎓
While this project uses Python, Semantic Kernel also supports .NET and Java. Feel free to experiment with your preferred language ⭐
The chat User Interface implementation details won't be covered. For those interested, I'm using the NiceGUI Python library 🗪
In order to make something practical, we will develop a Librarian chatbot. Throughout the different chapters, we will add new features, such as similarity search, abstract summarization, or book curation 📖
Initially, the chatbot is integrated with Azure OpenAI. Support for OpenAI and Hugging Face will be added in future posts 🤝
You can find a working version of the chatbot in my GitHub repository 🐍 PyChatbot for Semantic Kernel 👨‍💻

Let's begin this exciting journey together!

What is Semantic Kernel?

According to the official documentation:

Semantic Kernel is a lightweight, open-source development kit that lets you easily build AI agents and integrate the latest AI models into your C#, Python, or Java codebase. It serves as an efficient middleware that enables rapid delivery of enterprise-grade solutions.

In essence, Semantic Kernel is an SDK that simplifies AI agent development. The chatbot will be based on the new experimental Agent Framework, which is still in development phase.

Installation

To begin, install the semantic-kernel package using pip:

pip install semantic-kernel

For C# or Java implementations, you can refer to the official Semantic Kernel quickstart guide.

The Kernel

The Kernel is the core of Semantic Kernel. It is basically a Dependency Injection container that manages the services and plugins used for an AI application.

# Init kernel
kernel = Kernel()

In this chapter, we start by creating the most common AI service, a chat completion. This service type generates responses in conversational contexts, where the model not only use the last user message isolated, but within a context (the conversation history) so the response is coherent and relevant to the conversation.

# Add chat completion service
kernel.add_service(AzureChatCompletion(
    base_url='base_url' # For example, an Azure OpenAI instace url
    api_key='api_key', # The Api Key associated to the previous instace
    deplyoment_name='deployment_name' # A chat model like gpt-4o-mini
))

Alternatively, if the settings are not provided explicitly on the constructor, Semantic Kernel will try to load them from the environment based on predefined names. For example, Azure OpenAI related settings are always prefixed with AZURE_OPEN_AI (e.g: AZURE_OPENAI_BASE_URL, AZURE_OPENAI_API_KEY, AZURE_OPENAI_CHAT_DEPLOYMENT_NAME).

Once a service is added to the Kernel, it can be retrieved later by its type.

chat_service = kernel.get_service(type=ChatCompletionClientBase)

The agent

For this series of blogs, I will build a book assistant. Feel free to experiment with your preferred theme for your chatbot.

To start with it, we create a book_assistant.py file. In the constructor, we initialize the Kernel and the corresponding AI services.

from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion

class BookAssistant:
    def __init__(self):
        # Initalize the kernel and the AI Services
        self.kernel = Kernel()
        self.chat_service = AzureChatCompletion(
            service_id="chat_completion"
        )

        # Add AI Services to the kernel
        self.kernel.add_service(self.chat_service)

For this conversational agent, we'll be using the Agent Framework from Semantic Kernel. We define it as a ChatCompletionAgent providing the AI Services through the Kernel. Semantic Kernel automatically selects the available ChatCompletion services from the Kernel. In our case, it will leverage the AzureChatCompletion service to reply to the user questions.

from semantic_kernel.agents import ChatCompletionAgent

class BookAssistant:
    def __init__(self):
        ... # More code

        self.agent = ChatCompletionAgent(
            service_id='chat_completion',
            name='BookAssistant',
            kernel=self.kernel
        )

Agent Framework is experimental, the code used here might not be compatible with future versions of the library.

Chat history

The chat history tracks and maintains the record of messages throughout a chat session, enabling context preservation and continuity of conversation.

For now, we just initialize it on the __init__ assistant method:

from semantic_kernel.contents import ChatHistory

class BookAssistant:
    def __init__(self):
        ... # More code

        self.history = ChatHistory()

Calling the model

Once we have the ChatCompletionAgent and the ChatHistory initialized, we are ready to interact with the agent. We add a new async method call in our BookAssistant class. The method will:

Receive the last user_message as an argument.
Add it to the ChatHistory as a user message.
Invoke the agent with the invoke method passing the ChatHistory.
The ChatCompletionAgent uses the model to generate a reply to the last user_message guided by the context provided in the ChatHistory.
Add the response to the ChatHistory as an assistant message.
Return back the response to the caller so it's shown in the chat interface.

from semantic_kernel.contents.utils.author_role import AuthorRole

async def call(self, user_message: str) -> str:
    self.history.add_message(ChatMessageContent(role=AuthorRole.USER, content=user_message))

    async for response in self.agent.invoke(self.history):
        self.history.add_message(response)
        return str(response)

With this simple piece of code, we already have an easy way of having a conversation with the chatbot. However, the agent does not act as a book assistant yet, it is just a generic one. For example, if we ask what kind of things can do, it replies with a vague and generic response:

Let's see how can we customize the behavior of the agent to act closer to a book assistant.

Adding instructions

We use the System Prompt to instruct the agent who/what it should be, how it should behave and how it should respond. In previous version of Semantic Kernel, System Prompt was defined using the Persona. However, as we are using the new experimental Agent Framework, the System Prompt is now provided on the Agent initialization:

self.agent = ChatCompletionAgent(
    service_id='chat_completion',
    name='BookAssistant',
    kernel=self.kernel,
    instructions="""
        You are a knowledgeable book assistant who helps readers explore and understand literature. You provide thoughtful analysis of themes, characters, and writing styles while avoiding spoilers unless explicitly requested.

        Your responses are concise but insightful, and you're careful to ask clarifying questions when needed to better understand readers' preferences and needs. When uncertain about details, you openly acknowledge limitations and present literary interpretations as possibilities rather than absolutes.
    """
)

With these simple instructions, we have adjusted the agent's tone, specifying its purpose, and adding some remarks about how we expect it to act under some circumstances. If we now repeat the previous question, the agent replies with a more concrete and precise answer.

If you want to know more about how to define a good system prompt, there are many free courses and blogs about the topic. For example, you can check out the Microsoft Prompt engineering techniques documentation.

Summary

In this chapter, we have accomplished the first steps on the development of a chatbot using Semantic Kernel with the new experimental Agent Framework. We have gone through some basic concepts, and provide some "personality" to the agent.

Remember that all the code is already available on my GitHub repository 🐍 PyChatbot for Semantic Kernel.

In the next chapter, we will add specific Librarian skills to our Agent through Plugins.

Azure Event Grid series: CloudEvents Schema

David Sola — Mon, 21 Dec 2020 12:08:51 +0000

This is a series of blogs to talk and discuss about good practices and tips for Event Grid. Some of the topics that will be discussed can be applied not only to Event Grid but also to any other event/message based service.

TL;DR

CloudEvents is an open source project which goal is to provide a common way for describing event data. It is possible to work with Azure Event Grid and CloudEvents but some processes are different, for example the Webhook Subscription Validation.

Event Schemas

An Event Schema defines the structure of an event, the formatting and some specific behaviors, for example, the way an event subscription is validated. The structure of an event is represented by the different properties that are present on an event. For each property, it must define if it is required, the type of the property (string, timestamp, object, etc), the purpose of the property and its constraints (max length, uniqueness, etc).

Azure Event Grid supports three different schemas. The schema must be specified on both Event Grid Topic and Event Grid Subscriber:

Event Grid Schema: it is the default schema. It is defined by Microsoft and the specification can be found here.
Custom Schema: it enables users to define its own schema. It can be used to integrate an existing event-based system.
CloudEvents Schema: it is an open specification for describing event data.

NOTE: By now, both schemas (topic and subscriber) must match.

CloudEvents Schema v1.0

CloudEvents is an open specification for describing event data in a common way. It is hosted by the Cloud Native Computing Foundation (CNCF). It is important to highlight this last point since having such a large foundation supporting and working on CloudEvents ensures a long term life for the open source project.

Working with CloudEvents provides some benefits:

Portability: it makes easier to switch from one event service provider to other when both support CloudEvents.
Accessibility: providing common SDKs in different languages to ease the usage of CloudEvents.

Event Structure

CloudEvents provides a detailed specification of the event data. It can be found on here.

{
    "specversion" : "1.0",
    "type" : "com.github.pull.create",
    "source" : "https://github.com/cloudevents/spec/pull",
    "subject" : "123",
    "id" : "A234-1234-1234",
    "time" : "2018-04-05T17:31:00Z",
    "comexampleextension1" : "value",
    "comexampleothervalue" : 5,
    "datacontenttype" : "text/xml",
    "data" : "<much wow=\"xml\"/>"
}

Http Webhook

As mentioned in a previous post from this series, a Webhook subscription must prove that it knows how to handle events before receiving them. This proving step is usually known as subscription validation process. Each event schema might define its own validation process, for example, when using EventGrid Schema the webhook endpoint receives a POST request with a validation code on the payload, that code must be returned on a specific response field.

CloudEvents defines the validation process in detail. In this case, it sends an OPTIONS request with a Webhook-Request-Origin header. It expects a response with that header value on a Webhook-Allowed-Origin header. Additionally, it can define the notification rate limitation using WebHook-Allowed-Rate header.

Code sample

You can find in this repository a code sample of an Azure Function subscribed to a Event Grid Topic using CloudEvents v1.0 schema: https://github.com/DavidGSola/event-grid-good-practices/tree/main/cloud-events-schema/CloudEventSubscriber

Azure Event Grid series: Authenticate Webhook subscriptions

David Sola — Thu, 10 Dec 2020 13:07:57 +0000

TL;DR

Webhook subscriptions are a common way to receive events from Azure Event Grid. It is essential to secure the endpoint and make Azure Event Grid capable of authenticating request to the subscription. Using a secret as query parameter is basic solution to provide authentication and security the Webhook subscription.

Webhook subscription

According to Wikipedia, Webhooks are HTTP callbacks that are usually triggered by some event. On Azure Event Grid, Webhooks are a way to receive events. Those events are delivered as POST requests with the event on the payload.

As in many other event based services, it is necessary to validate and prove that the user defined Webhook is able to receive and process events. On Event Grid, that validation process is usually done in a synchronous way, it sends a special request to the endpoint and expects a specific response from the Webhook. When the validation finishes, the subscription is created and events are started to being delivered to it. This validation process might vary depending on the event schema used on Azure Event Grid. For example, Cloud Event Schema receives the validation request on a OPTIONS request instead of a POST.

Securing a Webhook

As we have discussed before, Webhooks are endpoints where you can receive events, in this case from Azure Event Grid. If this endpoint is public it is crucial to secure it in order to avoid maliciousrequests. On a secured endpoint, unauthenticated requests are discarded and not processed on the Webhook, therefore the question here is: how is it possible to authenticate the delivered requests from Event Grid?

A very common way to secure an endpoint on Event Grid is using secrets as query parameters. It is quite simple, it is just necessary to ask Event Grid to deliver an extra parameter on every event as a query parameter, that extra parameter is simply a password that both services (Event Grid Topic and Event Grid Subscription) know.

This work can be split in three steps:

Create a common secret or password: it is possible to use a script or ARM template to create and store a secret on a KeyVault. That secret can be retrieved later when creating the Webhook subscription or when proving the authentication of the delivered event.
Create Webhook subscription with secret: when creating the subscription via Portal or ARM template it is possible to specify query parameters, so every event will be delivered along with the query parameter. Because it is common to have secrets on those query parameters, they are handled in a special way on Event Grid. By default, those parameters are hidden when retrieving subscription information and from service operators, they are encrypted, not logged, etc. For example, Webhook endpoint on the Azure Portal (picture 1) hides the query parameters. On the other hand, CLI command (picture 2) shows the query parameters when using --include-full-endpoint-url parameter.
Validate secret on subscription endpoint: last but not least, it is necessary to validate the received secret. The expected secret can be easily read from the KeyVault from step 1 and passed as configuration parameter to the subscription endpoint. If working with .NET core, the validation can be done in a custom ActionFilterAttribute:

Picture 1: Event Grid subscription - Portal

Picture 2: Event Grid subscription - CLI

public class WebhookAuthenticationAttribute : ActionFilterAttribute
{
    private static string Secret = "my-secret";

    public override async Task OnActionExecutionAsync(ActionExecutingContext context ActionExecutionDelegate next)
    {
        var queryKey = context.HttpContext.Request.Query["key"];

        if (queryKey != Secret)
        {
            context.Result = new UnauthorizedObjectResult("Authentication failed. Please use a valid key.");
        }

        await base.OnActionExecutionAsync(context, next);
    }
}

[WebhookAuthentication]
public class MyWebhookController : ControllerBase
{
    [HttpPost]
    public IActionResult NewEvent()
    ...
}

Alternatives

Secrets as query parameters is not the only way to secure a Webhook subscription. Alternatively, it is possible to secure the endpoint with Azure AD.

Furthermore, other techniques like Claim check pattern can add an extra layer of security. With this pattern events payload are stored in an external storage and the delivered event includes a reference to its storage. In this case, it will be necessary to authenticate the request to the external storage in order to download the information and process the full event.

Azure Event Grid series: handling big events

David Sola — Mon, 30 Nov 2020 09:30:39 +0000

This is a series of blogs to talk and discuss about good practices and tips for Azure Event Grid. Some of the topics that will be discussed can be applied not only to Event Grid but also to any other event/message based service.

TL;DR

Claim Check Pattern is a widely used pattern to keep events and messages small in order to make them fit into the service size limits. The idea is to use an intermediate storage to save the event/message payload and send the event/message with the stored reference.

Lightweight events

According to the official Microsoft documentation, an event is a lightweight notification of a condition or a state change. Accordingly, Event Grid limits the event's size to 1 MB and are billed on 64 KB slices. Therefore, if we want to keep things cost efficient the idea is to keep events light and under the limit.

But, what does it happen when you need to send some piece of information larger than the service limit?

Claim Check Pattern

The idea behind this pattern is to keep the event light storing the event payload into an external data store and send the reference to the stored payload on the event. Then, the subscriber (the service that receives the event) can download the payload if it is necessary.

For a given event, the pattern can be applied following two different approaches:

Original Event:

{
    "id": "c4a9175a-2f0e-11eb-adc1-0242ac120002",
    "eventType": "NewBlogPost",
    "time": "2020-10-15T14:11:39.2329267Z",
    "data": {
        "title": "My new blog",
        "text": "Really big text ..."
    }
}

Store the whole event including metadata and payload. Event metadata is formed by all the common fields that are shared between different events in an eco-system, for example the eventType or the eventTime. The sent event will contain just a reference to the stored information.

{
    "reference": "https://my-external-data-store.com/mydata/c4a9175a-2f0e-11eb-adc1-0242ac120002"
}

Store only the event payload and keep the event with its metadata. The sent event will contain the metadata and a reference to the stored payload.

{
    "id": "c4a9175a-2f0e-11eb-adc1-0242ac120002",
    "eventType": "NewBlogPost",
    "time": "2020-10-15T14:11:39.2329267Z",
    "reference": "https://my-external-data-store.com/mydata/c4a9175a-2f0e-11eb-adc1-0242ac120002"
}

On one hand, first approach creates a lighter event, however it forces every subscriber to download the event from the external data store. On the other hand, second approach lets subscribers to decide whether they need to download the event or not, depending on the event metadata.

Imagine a case where you have two subscribers for the same event (a new blog entry has been created). First subscriber is in charge of having a blog entries counter, so it won't need to download the event payload from the external data store. On the other hand, second subscriber must create a summary for each blog entry, in that case it will need to download the event payload to fulfill its use case.

Removing the event

When using this pattern is important to remove the event from the storage once it has been consumed. If sender-subscriber is always a one to one relationship, we can keep things simple moving the removal logic into the subscriber, once it finishes processing the event it can remove safely the event from the storage.

However, when using a event driven architecture it is more than usual to have a one to many relationship, so the subscribers can't remove the event or will make impossible to process the event for other subscribers. The solution is to remove asynchronously the event data by an independent service. For example, if you are using Blob Storage to store the event data, this removal can be done easily with a delete policy rule.

Conditional Claim Check Pattern

It is a small improvement to the original idea. In this case, the pattern will be applied only to those heavy events that do not fit the limits. So, instead of storing every single event into the external data store, it is necessary to check first the event size and use the storage only for those large events that do not fit into the limits.

Main advantage of this approach is that it keeps things cost efficient. Cloud data stores are usually expensive so the cost can be cut down if we reduce the number of events that use the storage. However, some extra logic is necessary on both sender and subscriber. On one side, sender must check the event size before sending it. On the other side, subscriber must verify if the received event is a complete one or must be download.

Code samples

You can find in this repository a code sample using Azure Functions, Event Grid and Blob Storage to apply the Claim Check Pattern: https://github.com/DavidGSola/event-grid-good-practices/tree/main/claim-check

A real-time Event Grid viewer with serverless SignalR

David Sola — Wed, 17 Jun 2020 07:43:51 +0000

A step by step guide to build an Event Grid viewer using serverless SignalR and Azure Functions

In a microservice architecture we might have several decoupled microservices driven by events (aka event-based architecture). At some point, a developer might want to see the current status of the system, just a big picture of what is going on. In a classic monolithic system it might be easy since we only have one big service to look at. In an event-based architecture we can use an event viewer, a simple tool for printing the different events that are occurring in our system in live time.

In this post you will find a step by step guide to build a real-time event viewer for Azure Event Grid based on a serverless SignalR service.

The technology

First, let's take a look to the different services and tools we are going to use in this guide:

Azure Event Grid: a service for managing events. It takes care of routing the incoming events to the specific subscription based on filters.
Azure Function: a serverless computing service.
Azure SignalR: a service for enabling real-time web functionality to applications.

So, how do all these pieces fit together?

SignalR Client application (in this case an Angular based app) must connect to the SignalR service. It makes use of a negotiation API exposed in an Azure Function to get the information to connect to the SignalR service, the SignalR url and a corresponding token.
Our Azure Function will be triggered by new events in the system through Event Grid. Using Functions output binding it will push each event into the serverless SignalR service.
Client application will receive in real-time those events using WebSockets.

Now let's move on to explaining the whole process step by step in detail.

1. Creating the Azure Function App

To create the Azure Function App we will use the Azure Function Core tools.

npm i -g azure-functions-core-tools@3 --unsafe-perm true

In a fresh new folder we will create the Function App and the Function:

func init
func new -> select http-trigger -> choose a name

Now we have a fresh new Azure Function ready to be launched and tested:

func start
curl http://localhost:7071/api/CloudEventSubscription

2. Subscribing to Event Grid

Azure Event Grid supports two different event schemas:

Event Grid Schema
Cloud Event Schema v1.0

In this guide, we are going to use Cloud Event Schema v1.0. If you want to use Event Grid Schema you will need to make some minor changes to the validation subscription logic.

Azure Functions have a built-in trigger for Event Grid. However, it only works with Event Grid Schema. Cloud Events subscriptions must be done using a http trigger and doing the subscription validation manually.

To receive events in our Function we need to update the HttpTrigger input binding to allow OPTIONS and POST requests. With CloudEvents schema, OPTIONS verb is used for validating the subscription and POST for receiving the events.

[HttpTrigger(AuthorizationLevel.Function, "options", "post", Route = null)] HttpRequest req

Then add the logic to validate the subscription.

/*
Handle EventGrid subscription validation for CloudEventSchema v1.0
Validation request contains a key in the Webhook-Request-Origin
header, that key must be set in the Webhook-Allowed-Origin response
header to prove to Event Grid that this endpoint is capable of handling CloudEvents events.
*/
if (HttpMethods.IsOptions(req.Method))
{
    if(req.Headers.TryGetValue("Webhook-Request-Origin", out var headerValues))
    {
        var originValue = headerValues.FirstOrDefault();
        if(!string.IsNullOrEmpty(originValue))
        {
            req.HttpContext.Response.Headers.Add("Webhook-Allowed-Origin", originValue);
            return new OkResult();
        }

        return new BadRequestObjectResult("Missing 'Webhook-Request-Origin' header");
    }
}

The Function is ready to be subscribed to Event Grid, but we don't have the Event Grid resource created in Azure yet.

We are going to create the Azure resources directly from the console using Azure CLI. Let's start creating a new resource group.

az group create --name event-grid-viewer-serverless --location northeurope

Now we can create the Event Grid Topic in that resource. To do that we need to make use of an Azure CLI extension.

az extension add -n eventgrid
az eventgrid topic create --resource-group event-grid-viewer-serverless \
   --name topic --location northeurope --input-schema cloudeventschemav1_0

To create the subscription we need to expose the Azure Function using ngrok or any other similar tool.

az eventgrid event-subscription create \ 
   --source-resource-id /subscriptions/<your subscription id>/resourceGroups/event-grid-viewer-serverless/providers/Microsoft.EventGrid/topics/topic \
   --name serverless-signalr-function \
   --endpoint <your public Azure Function url>/api/CloudEventSubscription \ 
   --endpoint-type webhook \
   --event-delivery-schema cloudeventschemav1_0

Then we can test our fresh new subscription sending an event through Event Grid. To make the command easier we can create a sample CloudEvent event in a JSON file.

{
   "specversion":"1.0",
   "type":"com.serverless.event",
   "source":"mysource/",
   "subject":"123",
   "id":"A234-1234-1234",
   "time":"2020-06-14T12:00:00Z",
   "datacontenttype":"application/json",
   "data":"{\"eventProperty\":\"eventValue\"}"
}

The event can be sent using curl but we need to specify the Topic URL and Key that can be retrieved directly from the command line.

// Get the Topic URL
az eventgrid topic show --name topic \
   -g event-grid-viewer-serverless \
   --query "endpoint" --output tsv

// Get the Topic key
az eventgrid topic key list --name topic \
   -g event-grid-viewer-serverless \
   --query "key1" --output tsv

// Send the event
curl --request POST \
   --header "Content-Type: application/cloudevents+json; charset=utf-8" \
   --header "aeg-sas-key: <Topic Key>" \
   --data @event.json <Topic URL>

3. Enabling real-time

As explained at the beginning of the post, we will use SignalR for enabling real-time in our web application. Since we don't have a back-end but a serverless Azure Function for handling the SignalR connection we will use SignalR in serverless mode. We can create it easily with Azure CLI.

az signalr create --name serverless-signalr \
   --resource-group event-grid-viewer-serverless \
   --sku Free_F1 --service-mode Serverless \
   --location northeurope

Then we must install the Azure Functions Binding for SignalR. It will be used to ease the interaction between the Function and SignalR.

func extensions install -p Microsoft.Azure.WebJobs.Extensions.SignalRService -v 1.0.0

Client applications need some credentials to connect to the SignalR service. We need a new negotiate Function that, by making use of the Azure Functions Binding for SignalR, will return the credentials information to connect to SignalR.

[FunctionName("negotiate")]
public static SignalRConnectionInfo GetSignalRInfo(
    [HttpTrigger(AuthorizationLevel.Anonymous, "post")] HttpRequest req,
    [SignalRConnectionInfo(HubName = "hubName")] SignalRConnectionInfo connectionInfo)
{
    return connectionInfo;
}

By default, these bindings expect the SignalR ConnectionString to be set in a specific value on the application settings. To enable it locally we must add a new property to the local.settings.json file.

{
    "IsEncrypted": false,
    "Values": {
        "AzureWebJobsStorage": "UseDevelopmentStorage=true",
        "FUNCTIONS_WORKER_RUNTIME": "dotnet",
        "AzureSignalRConnectionString": "<Your SignalR Connection String>"
    }
}

Then, we want to push the incoming events from Event Grid to SignalR. First, we need to update the original Function to add an output binding for SignalR.

[SignalR(HubName = "hubName")] IAsyncCollector<SignalRMessage> signalRMessages

And add the logic to push the event to the SignalR service.

if(HttpMethods.IsPost(req.Method))
{
    string @event = await new StreamReader(req.Body).ReadToEndAsync();
    await signalRMessages.AddAsync(new SignalRMessage
    {
        Target = "newEvent",
        Arguments = new[] { @event }
    });
}

4. Client application

The last piece of the puzzle is the client application. First, it will negotiate the credentials with the Function, and then it will wait for new incoming messages from SignalR. In this tutorial we are going to use Angular (and Angular CLI) to create the client application.

ng new viewer-app

There is a npm package @aspnet/signalr that makes the use of SignalR in the client side very easy.

npm i @aspnet/signalr --save

This package handles for you the negotiation step and the connection to SignalR. We can add some simple logic to receive the events from a specific hub and log it into the console.

import * as SignalR from '@aspnet/signalr';

export class AppComponent {

  title = 'viewer-app';

  private hubConnection: SignalR.HubConnection;

  constructor() {
    // Create connection
    this.hubConnection = new SignalR.HubConnectionBuilder()
      .withUrl("http://localhost:7071/api")
      .build();

    // Start connection. This will call the negotiate endpoint
    this.hubConnection
      .start();

    // Handle incoming events for the specific target
    this.hubConnection.on("newEvent", (event) => {
      console.log(event)
    });
  }
}

To wrap up

In this tutorial we have created a simple but effective real-time event viewer for Event Grid using some cool stuff like serverless SignalR. This client application can be easily enhanced to show the events in the page rather than the console.

You can find a full example working with some minor UI enhancements on the following repo: https://github.com/DavidGSola/serverless-eventgrid-viewer