<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Foteini Savvidou</title>
    <description>The latest articles on Forem by Foteini Savvidou (@sfoteini).</description>
    <link>https://forem.com/sfoteini</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F464337%2Fb26dab5c-7eb8-4b7e-991c-3d9760c8e2b5.jpg</url>
      <title>Forem: Foteini Savvidou</title>
      <link>https://forem.com/sfoteini</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/sfoteini"/>
    <language>en</language>
    <item>
      <title>Building an educational game with AI tools and Azure Static Web Apps (Part 2)</title>
      <dc:creator>Foteini Savvidou</dc:creator>
      <pubDate>Wed, 08 Jan 2025 20:30:16 +0000</pubDate>
      <link>https://forem.com/sfoteini/building-an-educational-game-with-ai-tools-and-azure-static-web-apps-part-2-moh</link>
      <guid>https://forem.com/sfoteini/building-an-educational-game-with-ai-tools-and-azure-static-web-apps-part-2-moh</guid>
      <description>&lt;p&gt;In &lt;a href="https://dev.to/sfoteini/building-an-educational-game-with-ai-tools-and-azure-static-web-apps-part-1-4kl7"&gt;Part 1&lt;/a&gt;, we explored how Ren'Py, a visual novel engine built on Python, and AI tools like GitHub Copilot, Azure OpenAI Service, and Microsoft Designer can be used to create a prototype for an educational game. In this post, I will share how I:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Created a GitHub Actions workflow to automate the build and deployment process for the game.&lt;/li&gt;
&lt;li&gt;Used Azure Static Web Apps preview environments to review changes in the game before deploying them to production.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Curious to see the result? You can &lt;a href="https://nice-moss-005f18b03.4.azurestaticapps.net/" rel="noopener noreferrer"&gt;play the game online&lt;/a&gt; and find the source code at my &lt;a href="https://github.com/sfoteini/christmas-copilot-quest" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;Creating a CI/CD workflow&lt;/h2&gt;

&lt;p&gt;Manually building and deploying the game with each update can quickly become tedious. Fortunately, Ren'Py includes a CLI tool for automation, and Azure Static Web Apps integrates seamlessly with GitHub Actions, making it possible to automate the entire process.&lt;/p&gt;

&lt;p&gt;My goal was to create a simple workflow that would meet the following requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automatically build the web version of the game and push it to a separate branch (e.g., the &lt;code&gt;gh-pages&lt;/code&gt; branch) for builds triggered by the &lt;code&gt;main&lt;/code&gt; branch.&lt;/li&gt;
&lt;li&gt;Deploy the game to the production environment after a successful build from the &lt;code&gt;main&lt;/code&gt; branch.&lt;/li&gt;
&lt;li&gt;Provide the option to manually trigger builds and deploy to a preview or production environment from any branch.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Below, I outline the approach I took. While it is not necessarily best practice, it documents the process end to end, as this was one of my first experiences creating a complete CI/CD workflow. The diagram below provides a high-level overview of the workflow:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpyuea07emu1z4oovg8vx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpyuea07emu1z4oovg8vx.png" alt="Workflow diagram for automating the build and deployment of the game to Azure Static Web Apps using GitHub Actions" width="800" height="171"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Workflow diagram for automating the build and deployment of the game to Azure Static Web Apps using GitHub Actions.&lt;/em&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The source code for the game is stored in the &lt;code&gt;main&lt;/code&gt; branch of my GitHub repository.&lt;/li&gt;
&lt;li&gt;When a new commit is pushed to the &lt;code&gt;main&lt;/code&gt; branch, a GitHub Actions workflow is triggered. This workflow builds the web version of the game and deploys it to Azure Static Web Apps.&lt;/li&gt;
&lt;li&gt;Additionally, the workflow can be triggered manually to build and deploy the game to a preview environment, allowing testing before merging changes into the &lt;code&gt;main&lt;/code&gt; branch.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;What is Azure Static Web Apps?&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://azure.microsoft.com/products/app-service/static?wt.mc_id=AI-MVP-5004971" rel="noopener noreferrer"&gt;Azure Static Web Apps (SWA)&lt;/a&gt; is a cloud service designed to simplify the deployment and hosting of modern web applications. A static website consists of pre-rendered HTML, CSS, JavaScript, and media files that don't require server-side rendering. Azure SWA automates the process of building and deploying the web app by integrating directly with GitHub Actions. Whenever changes are pushed to a monitored branch or a pull request is opened, a new version of the website can be built and deployed.&lt;/p&gt;

&lt;p&gt;Azure SWA offers additional, powerful features like Azure Functions integration for APIs, database integration, and support for custom domains that go beyond the scope of this project. You can explore the available features in the &lt;a href="https://learn.microsoft.com/azure/static-web-apps?wt.mc_id=AI-MVP-5004971" rel="noopener noreferrer"&gt;Azure SWA documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For this project, the Ren'Py CLI generates a web version of the game, which is then deployed to Azure SWA using the &lt;code&gt;static-web-apps-deploy&lt;/code&gt; GitHub Action. Below is an example of the default parameters for this action:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deploy to Azure Static Web Apps&lt;/span&gt;
  &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Azure/static-web-apps-deploy@v1&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;azure_static_web_apps_api_token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.AZURE_STATIC_WEB_APPS_API_TOKEN }}&lt;/span&gt;
    &lt;span class="na"&gt;repo_token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.GITHUB_TOKEN }}&lt;/span&gt;
    &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;upload"&lt;/span&gt;
    &lt;span class="na"&gt;app_location&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/"&lt;/span&gt;
    &lt;span class="na"&gt;api_location&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
    &lt;span class="na"&gt;output_location&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;Setting up a preview environment in Azure SWA&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://learn.microsoft.com/azure/static-web-apps/preview-environments?wt.mc_id=AI-MVP-5004971" rel="noopener noreferrer"&gt;Preview environments&lt;/a&gt; are an essential feature of Azure Static Web Apps (SWA) that enable you to review changes to your application before pushing them to production. Azure SWA supports three types of preview environment deployments:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pull requests&lt;/strong&gt;: Each pull request deploys a preview version of your app to a temporary URL. Once the pull request is merged or closed, the temporary environment is removed. If you use the default GitHub Actions template provided by Azure SWA, this process is configured by default.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Branch&lt;/strong&gt;: Changes made to branches other than the production branch can deploy a preview version of the app to a stable URL that includes the branch name. To enable branch preview environments, you can set the &lt;code&gt;production_branch&lt;/code&gt; input parameter in the &lt;code&gt;static-web-apps-deploy&lt;/code&gt; GitHub Action.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Named environment&lt;/strong&gt;: Named environments allow you to create stable preview deployments with a custom name. These environments are ideal for staging or manual testing. To enable named environments, you need to set the &lt;code&gt;deployment_environment&lt;/code&gt; input parameter in the &lt;code&gt;static-web-apps-deploy&lt;/code&gt; GitHub Action to the environment's name.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For this project, I used the named environment feature as it is suitable for manual, controlled deployments. To implement this, I added the &lt;code&gt;deployment_environment&lt;/code&gt; input parameter to the deployment task. This parameter is set to the preview environment's name or left empty for deploying to production.&lt;/p&gt;
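&lt;p&gt;As a sketch (the &lt;code&gt;environment_name&lt;/code&gt; input is an illustrative name, not necessarily the one used in my workflow), the deploy step with this parameter might look like:&lt;/p&gt;

&lt;pre class="highlight yaml"&gt;&lt;code&gt;- name: Deploy to Azure Static Web Apps
  uses: Azure/static-web-apps-deploy@v1
  with:
    azure_static_web_apps_api_token: ${{ secrets.AZURE_STATIC_WEB_APPS_API_TOKEN }}
    action: "upload"
    app_location: "/"
    output_location: "/"
    # An empty string deploys to production; a name (e.g. "staging")
    # creates or updates a named preview environment.
    deployment_environment: ${{ inputs.environment_name }}
&lt;/code&gt;&lt;/pre&gt;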

&lt;h3&gt;Bringing it all together&lt;/h3&gt;

&lt;p&gt;The final GitHub Actions workflow that I created automates the game's build and deployment processes through the following jobs:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Build the game using Ren'Py CLI&lt;/strong&gt;: The web version of the game is generated using the Ren'Py CLI. The generated files are pushed to the &lt;code&gt;gh-pages&lt;/code&gt; branch. If the &lt;code&gt;gh-pages&lt;/code&gt; branch does not exist, it is created.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Deploy the game to Azure Static Web Apps&lt;/strong&gt;: The &lt;code&gt;gh-pages&lt;/code&gt; branch is deployed to Azure Static Web Apps using the Azure Static Web Apps GitHub Action described in the previous section.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The diagram below provides a high-level overview of the steps executed in the workflow.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc4qienj3rhg1zaocehaq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc4qienj3rhg1zaocehaq.png" alt="High-level overview of the steps for building the web version of the game with the Ren'Py CLI and deploying it to Azure Static Web Apps" width="800" height="312"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;High-level overview of the steps for building the web version of the game with the Ren'Py CLI and deploying it to Azure Static Web Apps.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;There are two possible triggers for the workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Automatic trigger on push to &lt;code&gt;main&lt;/code&gt;&lt;/strong&gt;: The workflow is triggered automatically when a new commit is pushed to the &lt;code&gt;main&lt;/code&gt; branch. The workflow builds the game and deploys it to the production environment.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Manual trigger&lt;/strong&gt;: The workflow can be triggered manually to build and deploy the game to either a preview or production environment from any branch. The input parameters required include the source branch (containing Ren'Py game files), target branch (for generated files), and environment name.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
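&lt;p&gt;The two triggers above can be sketched in the workflow file as follows (the input names are illustrative, not necessarily the ones in my workflow):&lt;/p&gt;

&lt;pre class="highlight yaml"&gt;&lt;code&gt;on:
  # Automatic trigger: build and deploy to production on every push to main
  push:
    branches:
      - main
  # Manual trigger: build from any branch and deploy to a preview
  # or production environment
  workflow_dispatch:
    inputs:
      source_branch:
        description: "Branch containing the Ren'Py game files"
        required: true
      target_branch:
        description: "Branch for the generated web files"
        required: true
      environment_name:
        description: "Preview environment name (leave empty for production)"
        required: false
&lt;/code&gt;&lt;/pre&gt;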

&lt;h2&gt;Summary&lt;/h2&gt;

&lt;p&gt;In this post, I described the steps I took to create a GitHub Actions workflow for automating the build and deployment of a Ren'Py game to Azure Static Web Apps. I also showed how preview environments can be used to test changes before they are deployed to production.&lt;/p&gt;

&lt;p&gt;If you're interested in learning more about Azure Static Web Apps or GitHub Actions, check out the following courses on Microsoft Learn:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/training/paths/azure-static-web-apps?wt.mc_id=AI-MVP-5004971" rel="noopener noreferrer"&gt;Azure Static Web Apps&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/training/paths/github-actions?wt.mc_id=AI-MVP-5004971" rel="noopener noreferrer"&gt;Automate your workflow with GitHub Actions&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>githubcopilot</category>
      <category>staticwebapps</category>
      <category>azure</category>
      <category>githubactions</category>
    </item>
    <item>
      <title>Building an educational game with AI tools and Azure Static Web Apps (Part 1)</title>
      <dc:creator>Foteini Savvidou</dc:creator>
      <pubDate>Wed, 08 Jan 2025 20:23:04 +0000</pubDate>
      <link>https://forem.com/sfoteini/building-an-educational-game-with-ai-tools-and-azure-static-web-apps-part-1-4kl7</link>
      <guid>https://forem.com/sfoteini/building-an-educational-game-with-ai-tools-and-azure-static-web-apps-part-1-4kl7</guid>
      <description>&lt;p&gt;Have you ever wondered how games can transform the way we learn? For me, the idea of blending creativity, technology, and a touch of fun has always been fascinating. Recently, I had the chance to explore this by building an educational visual novel game. Although I had no prior experience in game development, I was eager to try something new, so I took my first steps with Ren'Py, a visual novel engine built on Python. Using AI tools to speed up development and Azure Static Web Apps for deployment, I created a prototype for an interactive game that teaches you how to use GitHub Copilot to assist with coding tasks. Since it was December, I added a festive touch to the game to make the experience more fun.&lt;/p&gt;

&lt;p&gt;In this post, I'll share how I:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Built the game using the Ren'Py framework.&lt;/li&gt;
&lt;li&gt;Used AI tools like GitHub Copilot and Azure OpenAI Service to speed up development and generate the game's visual assets.&lt;/li&gt;
&lt;li&gt;Automated the build and deployment process using GitHub Actions and &lt;a href="https://azure.microsoft.com/products/app-service/static?wt.mc_id=AI-MVP-5004971" rel="noopener noreferrer"&gt;Azure Static Web Apps&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Curious to see the result? You can &lt;a href="https://nice-moss-005f18b03.4.azurestaticapps.net/" rel="noopener noreferrer"&gt;play the game online&lt;/a&gt; and find the source code at my &lt;a href="https://github.com/sfoteini/christmas-copilot-quest" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;How it all started&lt;/h2&gt;

&lt;p&gt;There are several educational games about cloud technologies, aimed at everyone from beginners to advanced users. For example, Microsoft offers &lt;a href="https://mtq.microsoft.com/" rel="noopener noreferrer"&gt;Microsoft Technical Quest&lt;/a&gt;, a card-based game where you build a reference architecture using Azure services. Other cloud providers, like AWS, also have &lt;a href="https://www.aboutamazon.com/news/aws/aws-game-training-cloud-skills" rel="noopener noreferrer"&gt;game-based training&lt;/a&gt; to help you learn how to build cloud solutions in a fun, gamified way.&lt;/p&gt;

&lt;p&gt;A few years ago, I played a game called &lt;a href="https://techcommunity.microsoft.com/blog/educatordeveloperblog/blast-off-with-azure-advocates-presenting-the-azure-space-mystery/2136640?wt.mc_id=AI-MVP-5004971" rel="noopener noreferrer"&gt;Azure Space Mystery&lt;/a&gt;, created by Microsoft Cloud Advocates for the International Day for Women and Girls in Science. It was a text-based game with rich graphics that included questions to move to the next level. The idea was to bring Microsoft Learn content closer to developers in a fun way.&lt;/p&gt;

&lt;p&gt;I wanted to try building something similar, even though I had no experience in game development. My idea was to create a text-based game that teaches a technical concept, provides short quizzes, and rewards players with achievements as they progress. Since GitHub Copilot has been getting so much attention lately, I decided to make it the main focus of the game. Because I started working on the game in December, I also added a festive theme.&lt;/p&gt;

&lt;p&gt;The result was &lt;strong&gt;Christmas Copilot Quest&lt;/strong&gt;, a game where players learn how to use GitHub Copilot in Visual Studio Code to help with coding tasks. Guided by GingerBot, Santa's friendly Copilot-powered assistant, players are introduced to the features of GitHub Copilot in an interactive way.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl5ekc58xckc8d8esgk8u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl5ekc58xckc8d8esgk8u.png" alt="Screenshots of the game showcasing the main menu, an example of dialogue, and the learning resources page" width="800" height="153"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Screenshots of the game showcasing the main menu, an example of dialogue, and the learning resources page.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;Building a text-based game&lt;/h2&gt;

&lt;h3&gt;The tech stack&lt;/h3&gt;

&lt;p&gt;There are many great tools for building text-based games, but I had a few key requirements in mind:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Support for writing non-linear stories with quizzes or choices that affect the flow of the game.&lt;/li&gt;
&lt;li&gt;The ability to customize the game's user interface.&lt;/li&gt;
&lt;li&gt;Flexibility to write custom components.&lt;/li&gt;
&lt;li&gt;Support for deploying the game as a web app.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Since Python is my go-to language, I was naturally drawn to &lt;a href="https://www.renpy.org/" rel="noopener noreferrer"&gt;Ren'Py&lt;/a&gt;, a visual novel engine built on Python. It met all my requirements, offering an easy-to-use scripting language for writing the story, defining quizzes, and customizing the UI. It also provides the flexibility to extend its built-in functionality using Python and export the game to desktop, mobile, and web platforms. Additionally, the availability of the Ren'Py Command Line Interface (CLI) allowed me to automate the process of building and deploying the game.&lt;/p&gt;

&lt;h3&gt;The game structure&lt;/h3&gt;

&lt;p&gt;The game is built around three core components:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Script&lt;/strong&gt;: This includes the game's story and quizzes. The narrative can take the form of monologues or dialogues between characters. The story is organized into sections identified by labels.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvkbbt0wni6gb013382d8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvkbbt0wni6gb013382d8.png" alt="Screenshots of the game showcasing an example of dialogue and a quiz for the player to answer" width="800" height="153"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Screenshots of the game showcasing an example of dialogue and a quiz for the player to answer.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Graphical User Interface (GUI)&lt;/strong&gt;: This covers the screens and menus displayed throughout the game as well as the visual elements used. The Ren'Py scripting language allows for both the customization of built-in screens (e.g., buttons and menus) and the creation of new ones, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A notification screen to display information to the player when an achievement is unlocked.&lt;/li&gt;
&lt;li&gt;A menu to show earned achievements and learning resources.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq2y83p44kv123ase5dzu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq2y83p44kv123ase5dzu.png" alt="Custom game screens: character selection, achievement unlocked notification, and earned achievements screen" width="800" height="153"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Custom game screens: character selection (left), achievement unlocked notification (center), and earned achievements screen (right).&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Custom Python code&lt;/strong&gt;: This provides additional functionality tailored to the game's needs. The latest version of Ren'Py supports scripting custom components using Python 3.9. Custom components in the game include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An achievement system, allowing players to earn rewards for completing specific tasks, with achievements being stored persistently.&lt;/li&gt;
&lt;li&gt;Character definitions and utility functions enabling players to select their preferred character and name.&lt;/li&gt;
&lt;li&gt;GUI utilities, such as image transformations (e.g., eye blinks) and custom fonts and text styles written in Python.&lt;/li&gt;
&lt;/ul&gt;
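&lt;p&gt;To illustrate the achievement system, here is a minimal Python sketch (the names are hypothetical, and a plain set stands in for Ren'Py's &lt;code&gt;persistent&lt;/code&gt; object, whose fields are automatically saved across play sessions):&lt;/p&gt;

&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical sketch of the achievement store; names are illustrative.
# In the actual game, the set would live on Ren'Py's `persistent` object
# so that unlocked achievements survive between play sessions.
unlocked_achievements = set()

def unlock(achievement_id: str) -&gt; bool:
    """Record an achievement; return True only if it was newly unlocked."""
    if achievement_id in unlocked_achievements:
        return False
    unlocked_achievements.add(achievement_id)
    return True
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The script can then show the notification screen only when &lt;code&gt;unlock&lt;/code&gt; returns &lt;code&gt;True&lt;/code&gt;, so repeated playthroughs of a section don't re-trigger the popup.&lt;/p&gt;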

&lt;p&gt;To keep the game maintainable and extensible, these components were kept separate. The script defines the story, while the logic is organized into Python modules, which are called stores in Ren'Py. For example, a Python function that determines the player's name based on their input can be called directly from the script as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;label introduction:
    felix "Ah, you must be the new coder Santa called for! What's your name?"

    $ player_input = renpy.input(
        _("(Type your name and press Enter, or press Enter to use the default name, [character_name].)")
    )
    $ player_name = character_utils.determine_player_name(player_input)

    player "I'm [player_name]."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
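&lt;p&gt;The &lt;code&gt;character_utils.determine_player_name&lt;/code&gt; helper called above is, in essence, a small Python function. A minimal sketch (the default name is a placeholder; my actual implementation may differ) could look like:&lt;/p&gt;

&lt;pre class="highlight python"&gt;&lt;code&gt;DEFAULT_NAME = "Alex"  # placeholder; the game uses the selected character's name

def determine_player_name(player_input: str, default_name: str = DEFAULT_NAME) -&gt; str:
    """Return the trimmed player input, or the default name if the input is empty."""
    name = player_input.strip()
    return name if name else default_name
&lt;/code&gt;&lt;/pre&gt;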



&lt;h2&gt;Integrating AI tools in development&lt;/h2&gt;

&lt;h3&gt;Using GitHub Copilot for coding assistance&lt;/h3&gt;

&lt;p&gt;I found GitHub Copilot very helpful for navigating and understanding Ren'Py, a framework I had never worked with before. Although Ren'Py is not a widely used framework, and likely has limited data available for training, GitHub Copilot correctly answered most of my questions and helped me understand how various Ren'Py components work. One example was its suggestion for implementing a character selection screen with image buttons.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fanbz80xz1jlbha7adqvt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fanbz80xz1jlbha7adqvt.png" alt="GitHub Copilot's suggested implementation for the character selection screen" width="628" height="541"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;GitHub Copilot's suggested implementation for the character selection screen.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The suggestion was quite accurate and provided a solid starting point, allowing me to quickly develop the screen. However, Copilot did not provide a correct implementation for adding a hover transition to the screen's buttons. This was somewhat expected, given that Ren'Py is not a widely used framework.&lt;/p&gt;

&lt;h3&gt;Using AI tools for image generation&lt;/h3&gt;

&lt;p&gt;I wanted to create a simple game quickly, so I didn't have the time to design all the graphics myself. I used AI-powered image generation tools to create the characters and background images used in the game. My key requirements were consistent style across all images and a festive theme.&lt;/p&gt;

&lt;p&gt;I started with the DALL-E 3 model, available through the Azure OpenAI Service. While the images were decent, I found it challenging to keep a consistent style across all images.&lt;/p&gt;

&lt;p&gt;Then, I switched to Microsoft Designer, an AI-powered cloud-based graphic design application by Microsoft. This gave me more consistent results across different requests, making it a better fit for my needs. For character creation, I used the avatar text-to-image feature, and for the backgrounds, the default text-to-image feature. After experimenting with several styles, I settled on a "low poly" aesthetic because it delivered both consistent results and a look that matched the game's playful, gamified nature. Here's an example of a prompt I used to generate a character image:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"Low-poly 3D portrait of a stylized woman with brown hair, wearing a blouse in christmas colors, featuring clean geometric shapes, flat colors, and soft lighting, in a minimalist futuristic style with white background."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The generated images were processed by applying filters to reduce noise, smooth the colors, remove the background, and highlight the edges of the polygons. In some cases, I combined two images to create the final character design and generated duplicates with closed eyes to produce the eye blinking effect in the game.&lt;/p&gt;

&lt;p&gt;Overall, I found these tools extremely helpful for quickly generating visual assets. Even though I created hundreds of versions of each character (okay, maybe I'm a bit of a perfectionist!), AI tools were invaluable for generating images that aligned with the game's theme.&lt;/p&gt;

&lt;h2&gt;Summary&lt;/h2&gt;

&lt;p&gt;In this post, we explored how Ren'Py, a visual novel engine built on Python, and AI tools such as GitHub Copilot, Azure OpenAI Service, and Microsoft Designer can be used to create a prototype for an educational game. But what's next? Now, we need to deploy the app. Fortunately, Ren'Py offers a CLI tool to automate game builds, and Azure Static Web Apps integrates seamlessly with GitHub Actions. In the next post, I'll show you how I set up a GitHub Actions workflow to automate the build and deployment process for the game.&lt;/p&gt;

&lt;p&gt;In the meantime, you can check out the following resources to learn more about GitHub Copilot and the DALL-E 3 model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/training/paths/copilot?wt.mc_id=AI-MVP-5004971" rel="noopener noreferrer"&gt;GitHub Copilot Fundamentals - Understand the AI pair programmer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/training/paths/accelerate-app-development-using-github-copilot?wt.mc_id=AI-MVP-5004971" rel="noopener noreferrer"&gt;Accelerate app development by using GitHub Copilot&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.github.com/copilot/example-prompts-for-github-copilot-chat" rel="noopener noreferrer"&gt;GitHub Copilot Chat Cookbook&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/training/modules/generate-images-azure-openai?wt.mc_id=AI-MVP-5004971" rel="noopener noreferrer"&gt;Generate images with Azure OpenAI Service&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>githubcopilot</category>
      <category>staticwebapps</category>
      <category>azure</category>
      <category>python</category>
    </item>
    <item>
      <title>Use HNSW index on Azure Cosmos DB for PostgreSQL for similarity search</title>
      <dc:creator>Foteini Savvidou</dc:creator>
      <pubDate>Thu, 14 Mar 2024 20:45:02 +0000</pubDate>
      <link>https://forem.com/sfoteini/use-hnsw-index-on-azure-cosmos-db-for-postgresql-for-similarity-search-5chk</link>
      <guid>https://forem.com/sfoteini/use-hnsw-index-on-azure-cosmos-db-for-postgresql-for-similarity-search-5chk</guid>
      <description>&lt;p&gt;In the &lt;a href="https://dev.to/sfoteini/use-ivfflat-index-on-azure-cosmos-db-for-postgresql-for-similarity-search-7nh"&gt;previous post&lt;/a&gt;, you explored the IVFFlat (Inverted File with Flat Compression) index for approximate nearest neighbor search on Azure Cosmos DB for PostgreSQL. You observed that the IVFFlat index provides accurate results with lower search times compared to exact nearest neighbor search.&lt;/p&gt;

&lt;p&gt;The pgvector extension provides another indexing algorithm for approximate nearest neighbor search called Hierarchical Navigable Small World (HNSW) graphs. HNSW is one of the most popular and best-performing indexes for vector similarity search. HNSW index support was introduced in pgvector 0.5.0.&lt;/p&gt;

&lt;p&gt;In this tutorial, you will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create an HNSW index in your Azure Cosmos DB for PostgreSQL table.&lt;/li&gt;
&lt;li&gt;Write SQL queries to detect similar images based on a text prompt or a reference image, utilizing the HNSW index.&lt;/li&gt;
&lt;li&gt;Investigate the execution plan of a similarity search query.&lt;/li&gt;
&lt;/ul&gt;
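&lt;p&gt;As a brief preview of what these steps look like with pgvector (the table and column names are illustrative; the tuning values shown are pgvector's defaults), creating and querying an HNSW index follows this pattern:&lt;/p&gt;

&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Create an HNSW index using the cosine distance operator class.
-- m = maximum connections per node, ef_construction = candidate list size at build time.
CREATE INDEX ON images USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- ef_search controls the candidate list size at query time
-- (higher values improve recall at the cost of speed).
SET hnsw.ef_search = 40;

-- Approximate nearest neighbor search: the 5 images closest to a query vector.
SELECT image_id
FROM images
ORDER BY embedding &lt;=&gt; '[0.1, 0.2, 0.3]'
LIMIT 5;
&lt;/code&gt;&lt;/pre&gt;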

&lt;h2&gt;Prerequisites&lt;/h2&gt;

&lt;p&gt;To proceed with this tutorial, ensure that you have the following prerequisites in place:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An Azure subscription - Create an &lt;a href="https://azure.microsoft.com/free/?WT.mc_id=AI-MVP-5004971" rel="noopener noreferrer"&gt;Azure free account&lt;/a&gt; or an &lt;a href="https://azure.microsoft.com/free/students/?WT.mc_id=AI-MVP-5004971" rel="noopener noreferrer"&gt;Azure for Students account&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Python 3.10, Visual Studio Code, Jupyter Notebook, and Jupyter Extension for Visual Studio Code.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Set up your working environment&lt;/h3&gt;

&lt;p&gt;In this guide, you'll learn how to query embeddings stored in an Azure Cosmos DB for PostgreSQL table to search for images similar to a search term or a reference image. The entire functional project is available in my &lt;a href="https://github.com/sfoteini/vector-search-azure-cosmos-db-postgresql" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;. If you want to follow along, fork the repository and clone it to have it available locally.&lt;/p&gt;

&lt;p&gt;Before running the Jupyter Notebook covered in this post, you should:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a &lt;a href="https://docs.python.org/3/library/venv.html" rel="noopener noreferrer"&gt;virtual environment&lt;/a&gt; and activate it.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Install the required Python packages using the following command:&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create vector embeddings for a collection of images by running the scripts found in the &lt;em&gt;data_processing&lt;/em&gt; directory.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Upload the images to your Azure Blob Storage container by executing the script found in the &lt;em&gt;data_upload&lt;/em&gt; directory.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;How the HNSW index works&lt;/h2&gt;

&lt;p&gt;The HNSW index is based on a multi-layered graph structure that is optimized for approximate nearest neighbor search. In this graph structure, the datapoints (also referred to as nodes or vertices) are connected to each other by edges, which make it possible to navigate through the graph by following these edges.&lt;/p&gt;

&lt;p&gt;The base layer of the multi-layer graph essentially represents the entire dataset, while the higher layers consist of fewer nodes, providing a simplified overview of the layers below. The higher layers contain longer links, allowing for longer jumps between nodes for faster search, while the lower layers contain shorter links, enabling more accurate search. Nearest neighbor search begins at the top layer, where the longest links are present. We then navigate to the nearest node and gradually move to lower layers until a local minimum is found. This process is illustrated in the following image:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fuse-hnsw-index-on-azure-cosmos-db-for-postgresql-for-similarity-search%2Fhnsw_search_process_hub0ae39f9f55fafbecdc38e78b851d2a1_397094_750x577_fit_q95_h2_lanczos_3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fuse-hnsw-index-on-azure-cosmos-db-for-postgresql-for-similarity-search%2Fhnsw_search_process_hub0ae39f9f55fafbecdc38e78b851d2a1_397094_750x577_fit_q95_h2_lanczos_3.webp" alt="The search process in an HNSW graph."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The search process in an HNSW graph.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The search process in an HNSW graph can be compared to the process of planning a trip between two cities. Much like how we start our journey with major roads and gradually transition to smaller ones as we approach our destination, the HNSW search process begins with longer links at the top layer and gradually moves to lower layers as we approach the desired data points.&lt;/p&gt;

&lt;p&gt;The HNSW algorithm is based on two fundamental techniques: the probability skip list and navigable small world (NSW) graphs. A detailed explanation of the process of constructing the index and searching through the graph is beyond the scope of this article. For further information, refer to the resources provided at the end of this article.&lt;/p&gt;

&lt;p&gt;Compared to the IVFFlat index, the HNSW index generally provides better query performance in terms of the tradeoff between recall and speed, but at the expense of higher build time and more memory usage. Additionally, it doesn't require a training step to build the index. This means you can create an HNSW index even before any data is inserted into the table, unlike the IVFFlat index, which needs to be rebuilt when data changes to accurately represent new cluster centroids.&lt;/p&gt;
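&lt;p&gt;The layer-by-layer descent described above can be condensed into a short Python sketch. This is a toy illustration of the greedy search idea only, not pgvector's implementation: the graphs, node ids, and distance function below are invented for the example.&lt;/p&gt;

```python
def greedy_search(layers, query, entry, dist):
    """Toy HNSW-style descent: greedily move toward the query within each
    layer's graph, then reuse the best node found as the entry point for
    the layer below. `layers` lists adjacency dicts, top layer first."""
    current = entry
    for graph in layers:
        while True:
            # Best node among the current node and its neighbors.
            best = min([current] + graph.get(current, []),
                       key=lambda node: dist(node, query))
            if best == current:
                break  # local minimum in this layer: descend
            current = best
    return current

# 1-D toy dataset where each node id is also its coordinate.
dist = lambda a, b: abs(a - b)
layers = [
    {0: [40], 40: [0, 80], 80: [40]},                              # top: long links
    {0: [20], 20: [0, 40], 40: [20, 60], 60: [40, 80], 80: [60]},  # base: short links
]
print(greedy_search(layers, query=63, entry=0, dist=dist))
```

&lt;p&gt;Starting from entry node 0, the top layer's long links jump quickly toward the query, and the base layer's short links then settle on the closest datapoint, mirroring the trip-planning analogy above.&lt;/p&gt;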

&lt;h2&gt;Create an HNSW index&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;The code for creating an HNSW index and inserting data into a PostgreSQL table can be found at &lt;a href="https://github.com/sfoteini/vector-search-azure-cosmos-db-postgresql/blob/main/data_upload/upload_data_to_postgresql_hnsw.py" rel="noopener noreferrer"&gt;data_upload/upload_data_to_postgresql_hnsw.py&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To create an HNSW index through the pgvector extension, three parameters need to be specified:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Distance&lt;/strong&gt;: The pgvector extension provides 3 methods for calculating the distance between vectors: Euclidean (L2), inner product, and cosine. These methods are identified by &lt;code&gt;vector_l2_ops&lt;/code&gt;, &lt;code&gt;vector_ip_ops&lt;/code&gt;, and &lt;code&gt;vector_cosine_ops&lt;/code&gt;, respectively. It is essential to select the same distance metric for both the creation and querying of the index.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;m&lt;/strong&gt;: The parameter &lt;code&gt;m&lt;/code&gt; specifies the maximum number of connections with neighboring datapoints per point per layer. Its default value is &lt;code&gt;16&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ef_construction&lt;/strong&gt;: The parameter &lt;code&gt;ef_construction&lt;/code&gt; defines the size of the list that holds the nearest neighbor candidates when building the index. The default value is &lt;code&gt;64&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To create an HNSW index in a PostgreSQL table, you can use the following statement:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;table_name&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;hnsw&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;vector_column_name&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;distance_method&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ef_construction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ef_construction&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
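&lt;p&gt;As a concrete illustration, a small helper can fill in this template. It uses the &lt;em&gt;paintings&lt;/em&gt; table and &lt;em&gt;vector&lt;/em&gt; column from this series' sample project together with pgvector's documented defaults; a minimal sketch, not production code.&lt;/p&gt;

```python
def hnsw_index_sql(table, column, opclass="vector_cosine_ops",
                   m=16, ef_construction=64):
    """Fill in pgvector's CREATE INDEX template for an HNSW index.
    The parameter defaults match pgvector's documented defaults."""
    return (f"CREATE INDEX ON {table} USING hnsw ({column} {opclass}) "
            f"WITH (m = {m}, ef_construction = {ef_construction});")

# The paintings table with its vector column, as used later in this post:
print(hnsw_index_sql("paintings", "vector"))
```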
&lt;h2&gt;Detect similar images using the pgvector HNSW index&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;The code for image similarity search with the pgvector extension can be found at &lt;a href="https://github.com/sfoteini/vector-search-azure-cosmos-db-postgresql/blob/main/vector_search_samples/image_search_hnsw_index.ipynb" rel="noopener noreferrer"&gt;vector_search_samples/image_search_hnsw_index.ipynb&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To search for similar images through the HNSW index of the pgvector extension, we can use SQL &lt;code&gt;SELECT&lt;/code&gt; statements and the built-in distance operators. The structure of a &lt;code&gt;SELECT&lt;/code&gt; statement was explained in the &lt;a href="https://dev.to/sfoteini/use-pgvector-for-searching-images-on-azure-cosmos-db-for-postgresql-2c30"&gt;Exact Nearest Neighbor Search&lt;/a&gt; blog post. For approximate nearest neighbor search, an additional parameter needs to be considered to use the HNSW index.&lt;/p&gt;
&lt;h3&gt;Approximate nearest neighbor search using the HNSW index&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;ef_search&lt;/code&gt; parameter specifies the size of the list that holds the nearest neighbor candidates during query execution. The default value is &lt;code&gt;40&lt;/code&gt;; higher values provide better recall at the cost of speed. The &lt;code&gt;ef_search&lt;/code&gt; parameter can be altered (for a single query or a session) using the following command:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;hnsw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ef_search&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
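&lt;p&gt;To build intuition for why a larger candidate list improves recall, here is a toy best-first search in plain Python where an &lt;code&gt;ef&lt;/code&gt;-sized candidate list plays the role of &lt;code&gt;hnsw.ef_search&lt;/code&gt;. The graph and distances are invented for the example; this sketches the idea only, not pgvector's actual algorithm.&lt;/p&gt;

```python
import heapq

def beam_search(graph, dist, query, entry, ef):
    """Toy best-first graph search keeping a candidate list of size `ef`.
    A larger ef explores more of the graph, trading speed for recall."""
    visited = {entry}
    frontier = [(dist(entry, query), entry)]  # min-heap ordered by distance
    found = [(dist(entry, query), entry)]     # best ef candidates so far
    while frontier:
        d, node = heapq.heappop(frontier)
        # Stop once the closest unexplored candidate cannot improve
        # the current ef-sized result list.
        if len(found) == ef and d - max(found)[0] > 0:
            break
        for nb in graph.get(node, []):
            if nb not in visited:
                visited.add(nb)
                nd = dist(nb, query)
                heapq.heappush(frontier, (nd, nb))
                found = heapq.nsmallest(ef, found + [(nd, nb)])
    return min(found)[1]

# 1-D toy graph (node ids are coordinates) with a local minimum at 70.
graph = {100: [70, 90], 70: [100, 10], 90: [100, 55], 10: [70], 55: [90]}
dist = lambda a, b: abs(a - b)
print(beam_search(graph, dist, query=50, entry=100, ef=1))
print(beam_search(graph, dist, query=50, entry=100, ef=2))
```

&lt;p&gt;With &lt;code&gt;ef=1&lt;/code&gt; the search gets stuck at node 70, while &lt;code&gt;ef=2&lt;/code&gt; keeps enough candidates alive to reach the true nearest node, 55.&lt;/p&gt;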

&lt;p&gt;To check whether PostgreSQL utilizes the index in a query, you can prefix the &lt;code&gt;SELECT&lt;/code&gt; statement with the &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; keywords. An example of a query plan that utilizes the HNSW index is provided below:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Limit (cost=160.60..163.02 rows=12 width=72) (actual time=1.283..1.406 rows=12 loops=1)
  -&amp;gt;  Index Scan using paintings_hnsw_vector_idx on paintings (cost=160.60..2416.67 rows=11206 width=72) (actual time=1.281..1.403 rows=12 loops=1)
        Order By: (vector &amp;lt;=&amp;gt; '[0.001363333, ..., -0.0010466448]'::vector)
Planning Time: 0.183 ms
Execution Time: 1.439 ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Also note that pgvector supports only ascending-order index scans, which means that the following query does not utilize the HNSW index:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;image_title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;artist_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'[0.003, …, 0.034]'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;cosine_similarity&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;paintings&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;cosine_similarity&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;One possible way to rewrite the &lt;code&gt;SELECT&lt;/code&gt; statement so that it uses the index is shown below:&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;image_title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;artist_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'[0.003, …, 0.034]'&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;cosine_distance&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;paintings&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;cosine_distance&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;br&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
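&lt;p&gt;The rewrite works because ordering by cosine distance ascending produces exactly the same ranking as ordering by cosine similarity (1 minus the distance) descending; only the ascending form matches pgvector's index scans. A quick sanity check of that equivalence in plain Python, with made-up result rows:&lt;/p&gt;

```python
# (title, cosine distance) pairs standing in for query results.
rows = [("a", 0.12), ("b", 0.05), ("c", 0.31), ("d", 0.47)]

# ORDER BY cosine_distance ASC LIMIT 3 (the index-friendly form) ...
by_distance = [t for t, d in sorted(rows, key=lambda r: r[1])][:3]
# ... versus ORDER BY 1 - cosine_distance DESC LIMIT 3 (not index-friendly).
by_similarity = [t for t, d in sorted(rows, key=lambda r: 1 - r[1], reverse=True)][:3]

print(by_distance, by_similarity)  # the two orderings agree
```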
&lt;h3&gt;Code sample: Image similarity search with HNSW index&lt;/h3&gt;

&lt;p&gt;In the &lt;a href="https://github.com/sfoteini/vector-search-azure-cosmos-db-postgresql/blob/main/vector_search_samples/image_search_hnsw_index.ipynb" rel="noopener noreferrer"&gt;Jupyter Notebook&lt;/a&gt; provided on my GitHub repository, you'll explore text-to-image and image-to-image search scenarios. You will use the same text prompts and reference images as in the Exact Nearest Neighbors search example, allowing for a comparison of the accuracy of the results.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fuse-hnsw-index-on-azure-cosmos-db-for-postgresql-for-similarity-search%2Fhnsw-search-results_hu5ff146a54ba415e848d96a5736e4a7a1_1078241_824x865_fit_q95_h2_lanczos_3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fuse-hnsw-index-on-azure-cosmos-db-for-postgresql-for-similarity-search%2Fhnsw-search-results_hu5ff146a54ba415e848d96a5736e4a7a1_1078241_824x865_fit_q95_h2_lanczos_3.webp" alt="Images retrieved by searching for paintings using the painting "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Images retrieved by searching for paintings using the painting "Still Life with Flowers" by Charles Ginner as a reference. The HNSW index retrieved all the paintings obtained with exact search.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Feel free to experiment with the notebook and modify the code to gain hands-on experience with the pgvector extension!&lt;/p&gt;

&lt;h2&gt;Next steps&lt;/h2&gt;

&lt;p&gt;The pgvector extension and PostgreSQL provide additional features that you can leverage to build AI-powered search applications. For example, you can combine vector search with conventional keyword-based search in a hybrid search system, which generally returns more relevant results than either method alone.&lt;/p&gt;

&lt;p&gt;If you want to learn more about the HNSW algorithm, check out these learning resources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Alexander Ponomarenko, Yury Malkov, Andrey Logvinov, Vladimir Krylov, &lt;a href="https://www.iiis.org/CDs2011/CD2011IDI/ICTA_2011/PapersPdf/CT175ON.pdf" rel="noopener noreferrer"&gt;Approximate Nearest Neighbor Search Small World Approach&lt;/a&gt; (2011)&lt;/li&gt;
&lt;li&gt;Yury Malkov, Alexander Ponomarenko, Andrey Logvinov, Vladimir Krylov, &lt;a href="https://www.sciencedirect.com/science/article/abs/pii/S0306437913001300" rel="noopener noreferrer"&gt;Approximate nearest neighbor algorithm based on navigable small world graphs&lt;/a&gt; (2014)&lt;/li&gt;
&lt;li&gt;Yury Malkov, Dmitry Yashunin, &lt;a href="https://dl.acm.org/doi/10.1109/TPAMI.2018.2889473" rel="noopener noreferrer"&gt;Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs&lt;/a&gt; (2020)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.youtube.com/playlist?list=PLIUOU7oqGTLhlWpTz4NnuT3FekouIVlqc" rel="noopener noreferrer"&gt;Vector Similarity Search and Faiss Course&lt;/a&gt; by James Briggs&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://towardsdatascience.com/similarity-search-part-4-hierarchical-navigable-small-world-hnsw-2aad4fe87d37" rel="noopener noreferrer"&gt;Similarity Search, Part 4: Hierarchical Navigable Small World (HNSW)&lt;/a&gt; by Vyacheslav Efimov – Towards Data Science&lt;/li&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/azure/cosmos-db/postgresql/howto-optimize-performance-pgvector?WT.mc_id=AI-MVP-5004971" rel="noopener noreferrer"&gt;How to optimize performance when using pgvector on Azure Cosmos DB for PostgreSQL – Microsoft Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/pgvector/pgvector" rel="noopener noreferrer"&gt;Official GitHub repository of the pgvector extension&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;👋 &lt;strong&gt;Hi, I am Foteini Savvidou!&lt;/strong&gt;&lt;br&gt;
An Electrical and Computer Engineer and Microsoft AI MVP (Most Valuable Professional) from Greece.&lt;/p&gt;

&lt;p&gt;🌈 &lt;a href="https://www.linkedin.com/in/foteini-savvidou" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | &lt;a href="https://sfoteini.github.io/" rel="noopener noreferrer"&gt;Blog &lt;/a&gt;| &lt;a href="https://github.com/sfoteini" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/p&gt;

</description>
      <category>azure</category>
      <category>cosmosdb</category>
      <category>postgres</category>
      <category>ai</category>
    </item>
    <item>
      <title>Use IVFFlat index on Azure Cosmos DB for PostgreSQL for similarity search</title>
      <dc:creator>Foteini Savvidou</dc:creator>
      <pubDate>Sun, 03 Mar 2024 11:45:39 +0000</pubDate>
      <link>https://forem.com/sfoteini/use-ivfflat-index-on-azure-cosmos-db-for-postgresql-for-similarity-search-7nh</link>
      <guid>https://forem.com/sfoteini/use-ivfflat-index-on-azure-cosmos-db-for-postgresql-for-similarity-search-7nh</guid>
      <description>&lt;p&gt;In the &lt;a href="https://dev.to/sfoteini/use-pgvector-for-searching-images-on-azure-cosmos-db-for-postgresql-2c30"&gt;previous post&lt;/a&gt;, you developed an image similarity search app using Jupyter Notebook. Utilizing the pgvector extension on Azure Cosmos DB for PostgreSQL, you were able to detect images that are semantically similar to a reference image or a text prompt.&lt;/p&gt;

&lt;p&gt;By default, pgvector performs exact nearest neighbor search, calculating the similarity between the query vector and every vector in the database. While this type of search provides perfect recall, it often leads to longer search times. To enhance efficiency for large datasets, you should create indexes to enable approximate nearest neighbor search, which trades off result quality for speed. Pgvector supports two types of approximate indexes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Inverted File with Flat Compression (IVFFlat) index&lt;/li&gt;
&lt;li&gt;Hierarchical Navigable Small World (HNSW) index&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this article, we will explore similarity search using an IVFFlat index. In the next post, we will work with the HNSW index, which is one of the best performing indexes for vector similarity search.&lt;/p&gt;

&lt;p&gt;In this tutorial, you will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create an IVFFlat index in your Azure Cosmos DB for PostgreSQL table.&lt;/li&gt;
&lt;li&gt;Write SQL queries to detect similar images based on a text prompt or a reference image, utilizing the IVFFlat index.&lt;/li&gt;
&lt;li&gt;Investigate the execution plan of a similarity search query.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Prerequisites&lt;/h2&gt;

&lt;p&gt;To proceed with this tutorial, ensure that you have the following prerequisites installed and configured:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An Azure subscription - Create an &lt;a href="https://azure.microsoft.com/free/?WT.mc_id=AI-MVP-5004971"&gt;Azure free account&lt;/a&gt; or an &lt;a href="https://azure.microsoft.com/free/students/?WT.mc_id=AI-MVP-5004971"&gt;Azure for Students account&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Python 3.10, Visual Studio Code, Jupyter Notebook, and Jupyter Extension for Visual Studio Code.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Set up your working environment&lt;/h3&gt;

&lt;p&gt;In this guide, you'll learn how to query embeddings stored in an Azure Cosmos DB for PostgreSQL table to search for images similar to a search term or a reference image. The entire functional project is available in the &lt;a href="https://github.com/sfoteini/vector-search-azure-cosmos-db-postgresql"&gt;GitHub repository&lt;/a&gt;. If you want to follow along, just fork the repository and clone it to have it locally available.&lt;/p&gt;

&lt;p&gt;Before running the Jupyter Notebook covered in this post, you should:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a &lt;a href="https://docs.python.org/3/library/venv.html"&gt;virtual environment&lt;/a&gt; and activate it.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Install the required Python packages using the following command:&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create vector embeddings for a collection of images by running the scripts found in the &lt;em&gt;data_processing&lt;/em&gt; directory.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Upload the images to your Azure Blob Storage container by executing the script found in the &lt;em&gt;data_upload&lt;/em&gt; directory.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;How the IVFFlat index works&lt;/h2&gt;

&lt;p&gt;The IVFFlat algorithm accelerates vector search by grouping the vectors in the dataset into clusters (also known as &lt;em&gt;Voronoi regions&lt;/em&gt; or &lt;em&gt;cells&lt;/em&gt;) and limiting the search scope to the few nearest clusters for each query rather than the entire dataset.&lt;/p&gt;

&lt;p&gt;Let’s gain an intuitive understanding of how IVFFlat works. Consider that we place our high-dimensional vectors in a two-dimensional vector space. We then apply k-means clustering to compute the cluster centroids. After identifying the centroids, we assign each vector in our dataset to the closest centroid based on proximity. This process results in the partition of the vector space into several non-intersecting regions (&lt;em&gt;Voronoi Diagram&lt;/em&gt;), as depicted in the following image:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--8CL4FJid--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://sfoteini.github.io/images/post/use-ivfflat-index-on-azure-cosmos-db-for-postgresql-for-similarity-search/ivfflat_creation_hub5092a18b03d371849b3bc13630ce472_692033_850x506_fit_q95_h2_lanczos_3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--8CL4FJid--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://sfoteini.github.io/images/post/use-ivfflat-index-on-azure-cosmos-db-for-postgresql-for-similarity-search/ivfflat_creation_hub5092a18b03d371849b3bc13630ce472_692033_850x506_fit_q95_h2_lanczos_3.webp" alt="The process of constructing the Voronoi diagram. In this scenario, we have four centroids, resulting in four Voronoi cells. Each vector is assigned to its closest centroid." width="800" height="476"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The process of constructing the Voronoi diagram. In this scenario, we have four centroids, resulting in four Voronoi cells. Each vector is assigned to its closest centroid.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Now, each vector falls within a region. In the context of similarity search, each region consists of vectors that are semantically similar. Since vectors within the same region are more likely to be similar to each other than those in different regions, IVFFlat makes the search process more efficient.&lt;/p&gt;

&lt;p&gt;Let’s consider the query vector, as shown in the following image. To find the nearest neighbors to this query vector using IVFFlat, we perform the following steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Calculate the distance between the query vector and each centroid.&lt;/li&gt;
&lt;li&gt;Identify the cell whose centroid is closest to the query vector and limit the search scope to that cell.&lt;/li&gt;
&lt;li&gt;Compute the distance between the query vector and every vector within the selected cell.&lt;/li&gt;
&lt;li&gt;Choose the vectors with the smallest distance as the nearest neighbors to the query vector.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jzNanDuE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://sfoteini.github.io/images/post/use-ivfflat-index-on-azure-cosmos-db-for-postgresql-for-similarity-search/ivfflat_search_hu2c8d195042bce3d407643f475f0efe1e_729960_850x715_fit_q95_h2_lanczos_3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jzNanDuE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://sfoteini.github.io/images/post/use-ivfflat-index-on-azure-cosmos-db-for-postgresql-for-similarity-search/ivfflat_search_hu2c8d195042bce3d407643f475f0efe1e_729960_850x715_fit_q95_h2_lanczos_3.webp" alt="When using the IVFFlat index, errors may occur when searching for nearest neighbors to a vector located at the edge of two regions in the vector space. In this scenario, by considering only one cell during the search, only the orange region is examined. Despite the query vector being close to a datapoint in the blue region, these vectors will not be compared." width="800" height="673"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;When using the IVFFlat index, errors may occur when searching for nearest neighbors to a vector located at the edge of two regions in the vector space. In this scenario, by considering only one cell during the search, only the orange region is examined. Despite the query vector being close to a datapoint in the blue region, these vectors will not be compared.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Although the IVFFlat algorithm accelerates the search process and provides good search quality, it can lead to errors. For example, the above image illustrates the scenario where the query vector resides at the edge of two cells. Despite the query vector being close to a datapoint in the blue region, this vector will not be considered a nearest neighbor candidate since the search scope is limited to the orange region.&lt;/p&gt;

&lt;p&gt;To address this issue and enhance search quality, we can expand the search scope by selecting several regions to search for nearest neighbor candidates. However, this approach comes with a trade-off: it increases search time.&lt;/p&gt;
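&lt;p&gt;The four search steps and the widened search scope described above can be condensed into a short Python sketch. This is a toy model with made-up 2-D points and precomputed cluster assignments, not pgvector's implementation; the &lt;code&gt;probes&lt;/code&gt; argument plays the role of pgvector's &lt;code&gt;probes&lt;/code&gt; parameter covered later in this post.&lt;/p&gt;

```python
import math

def ivfflat_search(vectors, centroids, assignments, query, k=3, probes=1):
    """Toy IVFFlat lookup: scan only the `probes` clusters whose centroids
    are closest to the query, then rank those candidates by exact distance."""
    nearest_cells = sorted(range(len(centroids)),
                           key=lambda c: math.dist(centroids[c], query))[:probes]
    candidates = [v for v, cell in zip(vectors, assignments) if cell in nearest_cells]
    return sorted(candidates, key=lambda v: math.dist(v, query))[:k]

vectors     = [(0, 0), (1, 1), (9, 9), (10, 10), (5, 4)]
centroids   = [(0.5, 0.5), (9.5, 9.5), (5.0, 4.0)]
assignments = [0, 0, 1, 1, 2]  # cluster id of each vector

# With one probe, only the cell around (5, 4) is scanned; raising
# `probes` widens the search scope at the cost of more comparisons.
print(ivfflat_search(vectors, centroids, assignments, query=(4, 3), probes=1))
print(ivfflat_search(vectors, centroids, assignments, query=(4, 3), probes=2))
```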

&lt;h2&gt;Create an IVFFlat index&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;The code for creating an IVFFlat index and inserting data into a PostgreSQL table can be found at &lt;a href="https://github.com/sfoteini/vector-search-azure-cosmos-db-postgresql/blob/main/data_upload/upload_data_to_postgresql_ivfflat.py"&gt;data_upload/upload_data_to_postgresql_ivfflat.py&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To create an IVFFlat index using pgvector, two parameters need to be specified:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Distance&lt;/strong&gt;: The pgvector extension provides 3 methods for calculating the distance between vectors: Euclidean (L2), inner product, and cosine. These methods are identified by &lt;code&gt;vector_l2_ops&lt;/code&gt;, &lt;code&gt;vector_ip_ops&lt;/code&gt;, and &lt;code&gt;vector_cosine_ops&lt;/code&gt;, respectively. It is essential to select the same distance metric for both the creation and querying of the index.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Number of clusters&lt;/strong&gt;: The &lt;code&gt;lists&lt;/code&gt; parameter specifies the number of clusters that will be created. Pgvector suggests that an appropriate number of &lt;code&gt;lists&lt;/code&gt; is &lt;code&gt;rows/1000&lt;/code&gt; for datasets with up to 1 million rows and &lt;code&gt;sqrt(rows)&lt;/code&gt; for larger datasets. It is also advisable to create at least 10 clusters.&lt;/li&gt;
&lt;/ol&gt;
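&lt;p&gt;The sizing guidance above can be captured in a small helper. Note that the two rules coincide at exactly one million rows (1,000,000 / 1000 = sqrt(1,000,000) = 1000), so taking the smaller of the two values reproduces the recommendation on both sides of the threshold. This is a sketch of the heuristic only, not part of pgvector's API.&lt;/p&gt;

```python
def suggested_lists(rows):
    """pgvector's rule of thumb for the IVFFlat `lists` parameter:
    rows/1000 for up to 1M rows, sqrt(rows) beyond that, and at least
    10 clusters. The two rules meet at 1M rows, so min() selects the
    applicable one on both sides of the threshold."""
    return max(10, round(min(rows / 1000, rows ** 0.5)))

print(suggested_lists(11_206))     # small dataset: rows/1000 -> 11
print(suggested_lists(4_000_000))  # large dataset: sqrt(rows) -> 2000
```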

&lt;p&gt;To create an IVFFlat index in a PostgreSQL table, you can use the following statement:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;table_name&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;ivfflat&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;vector_column_name&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;distance_method&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lists&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;number_of_clusters&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It is important to note that the index should be created after the table is populated with data, since the cluster centroids are computed from the rows that are already present.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If new vectors are added to the table, the index should be rebuilt to update the cluster centroids and accurately represent the new dataset.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;Detect similar images using the pgvector IVFFlat index&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;The code for image similarity search with the pgvector extension can be found at &lt;a href="https://github.com/sfoteini/vector-search-azure-cosmos-db-postgresql/blob/main/vector_search_samples/image_search_ivfflat_index.ipynb"&gt;vector_search_samples/image_search_ivfflat_index.ipynb&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To search for similar images through the IVFFlat index of the pgvector extension, we can use SQL &lt;code&gt;SELECT&lt;/code&gt; statements and the built-in distance operators. The structure of a &lt;code&gt;SELECT&lt;/code&gt; statement was explained in the &lt;a href="https://dev.to/sfoteini/use-pgvector-for-searching-images-on-azure-cosmos-db-for-postgresql-2c30"&gt;Exact Nearest Neighbor Search&lt;/a&gt; blog post. For approximate nearest neighbor search, some additional parameters need to be considered to use the IVFFlat index.&lt;/p&gt;

&lt;h3&gt;Approximate nearest neighbor search using the IVFFlat index&lt;/h3&gt;

&lt;p&gt;The number of regions to consider during search is determined by the &lt;code&gt;probes&lt;/code&gt; parameter. According to the pgvector documentation, the recommended value for the &lt;code&gt;probes&lt;/code&gt; parameter is &lt;code&gt;sqrt(lists)&lt;/code&gt;. The default value is &lt;code&gt;1&lt;/code&gt;. To specify the value of the &lt;code&gt;probes&lt;/code&gt; parameter, you can execute the following statement:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;ivfflat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;probes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;number_of_probes&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To specify the number of probes for a single query, use the &lt;code&gt;SET LOCAL&lt;/code&gt; command inside a transaction.&lt;/p&gt;

&lt;p&gt;It's important to note that PostgreSQL does not guarantee the use of an approximate index, as it may determine that a sequential scan could be more efficient for a query. To check whether PostgreSQL utilizes the index in a query, you can prefix the &lt;code&gt;SELECT&lt;/code&gt; statement with the &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; keywords.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="k"&gt;ANALYZE&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;image_title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;artist_name&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;paintings&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'[0.001363333, …, -0.0010466448]'&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An example of a query plan that utilizes the IVFFlat index is provided below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Limit  (cost=1537.64..1539.15 rows=12 width=72) (actual time=1.803..1.984 rows=12 loops=1)
  -&amp;gt;  Index Scan using paintings_ivfflat_vector_idx on paintings (cost=1537.64..2951.85 rows=11206 width=72) (actual time=1.801..1.981 rows=12 loops=1)
        Order By: (vector &amp;lt;=&amp;gt; '[0.001363333, ..., -0.0010466448]'::vector)
Planning Time: 0.135 ms
Execution Time: 2.014 ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Additionally, you may need to rewrite (or simplify) your queries in order to use an approximate index. For example, since pgvector only supports ascending-order index scans, the following query does not utilize the IVFFlat index:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;image_title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;artist_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'[0.003, …, 0.034]'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;cosine_similarity&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;paintings&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;cosine_similarity&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If we want to utilize the index, one possible way to rewrite the &lt;code&gt;SELECT&lt;/code&gt; statement is as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;image_title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;artist_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'[0.003, …, 0.034]'&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;cosine_distance&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;paintings&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;cosine_distance&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
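&lt;p&gt;Because the rewritten query returns the cosine distance rather than the similarity, the conversion can be performed client-side after the rows are fetched. A minimal sketch (the rows shown are illustrative values, not real query results):&lt;/p&gt;

```python
# Rows as returned by the rewritten query:
# (image_title, artist_name, cosine_distance)
rows = [
    ("Still Life with Flowers", "Charles Ginner", 0.12),
    ("Sunflowers", "Vincent van Gogh", 0.18),
]

# cosine_similarity = 1 - cosine_distance
results = [(title, artist, 1.0 - distance) for title, artist, distance in rows]
```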



&lt;h3&gt;
  
  
  Code sample: Image similarity search with IVFFlat index
&lt;/h3&gt;

&lt;p&gt;In the &lt;a href="https://github.com/sfoteini/vector-search-azure-cosmos-db-postgresql/blob/main/vector_search_samples/image_search_ivfflat_index.ipynb"&gt;Jupyter Notebook&lt;/a&gt; provided on my GitHub repository, you'll explore text-to-image and image-to-image search scenarios. You will use the same text prompts and reference images as in the Exact Nearest Neighbors search example, allowing for a comparison of the accuracy of the results.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--3-IkoGS---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://sfoteini.github.io/images/post/use-ivfflat-index-on-azure-cosmos-db-for-postgresql-for-similarity-search/ivfflat-search-results_hu95792e5d77c608b77f6cec1e54f3d8b6_1015819_833x849_fit_q95_h2_lanczos_3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--3-IkoGS---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://sfoteini.github.io/images/post/use-ivfflat-index-on-azure-cosmos-db-for-postgresql-for-similarity-search/ivfflat-search-results_hu95792e5d77c608b77f6cec1e54f3d8b6_1015819_833x849_fit_q95_h2_lanczos_3.webp" alt='Images retrieved by searching for paintings using the painting "Still Life with Flowers" by Charles Ginner as a reference. The IVFFlat index with probes=1 retrieved 9 out of the 12 paintings obtained with exact search.' width="800" height="815"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Images retrieved by searching for paintings using the painting "Still Life with Flowers" by Charles Ginner as a reference. The IVFFlat index with probes=1 retrieved 9 out of the 12 paintings obtained with exact search.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Feel free to experiment with the notebook and modify the code to gain hands-on experience with the pgvector extension!&lt;/p&gt;

&lt;h2&gt;
  
  
  Next steps
&lt;/h2&gt;

&lt;p&gt;In this post, you used the IVFFlat indexing algorithm of the pgvector extension to search for paintings that closely match a reference image or a text prompt. In the upcoming post, we will explore the workings of the HNSW index and use it for similarity searches.&lt;/p&gt;

&lt;p&gt;If you want to explore pgvector's features, check out these learning resources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/azure/cosmos-db/postgresql/howto-optimize-performance-pgvector?WT.mc_id=AI-MVP-5004971"&gt;How to optimize performance when using pgvector on Azure Cosmos DB for PostgreSQL – Microsoft Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/pgvector/pgvector"&gt;Official GitHub repository of the pgvector extension&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.youtube.com/playlist?list=PLIUOU7oqGTLhlWpTz4NnuT3FekouIVlqc"&gt;Vector Similarity Search and Faiss Course&lt;/a&gt; by James Briggs&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;👋 &lt;strong&gt;Hi, I am Foteini Savvidou!&lt;/strong&gt;&lt;br&gt;
An Electrical and Computer Engineer and Microsoft AI MVP (Most Valuable Professional) from Greece.&lt;/p&gt;

&lt;p&gt;🌈 &lt;a href="https://www.linkedin.com/in/foteini-savvidou"&gt;LinkedIn&lt;/a&gt; | &lt;a href="https://sfoteini.github.io/"&gt;Blog&lt;/a&gt; | &lt;a href="https://github.com/sfoteini"&gt;GitHub&lt;/a&gt;&lt;/p&gt;

</description>
      <category>azure</category>
      <category>postgres</category>
      <category>vectordatabase</category>
      <category>cosmosdb</category>
    </item>
    <item>
      <title>Use pgvector for searching images on Azure Cosmos DB for PostgreSQL</title>
      <dc:creator>Foteini Savvidou</dc:creator>
      <pubDate>Wed, 07 Feb 2024 18:28:48 +0000</pubDate>
      <link>https://forem.com/sfoteini/use-pgvector-for-searching-images-on-azure-cosmos-db-for-postgresql-2c30</link>
      <guid>https://forem.com/sfoteini/use-pgvector-for-searching-images-on-azure-cosmos-db-for-postgresql-2c30</guid>
      <description>&lt;p&gt;Welcome to the next part of the "Image similarity search with pgvector" learning series!&lt;/p&gt;

&lt;p&gt;In the previous articles, you used the multi-modal embeddings APIs of Azure AI Vision for generating embeddings for a collection of images of paintings and stored the embeddings in an Azure Cosmos DB for PostgreSQL table.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;If you have followed the previous posts, you should have successfully created a table in your Azure Cosmos DB for PostgreSQL cluster, populated it with data, and uploaded the images to a container in your Azure Storage account. Now, you are fully prepared to search for similar images utilizing the vector similarity search features of the pgvector extension.&lt;/p&gt;

&lt;p&gt;In this tutorial, you will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Develop an image similarity search app using Jupyter Notebook.&lt;/li&gt;
&lt;li&gt;Write SQL queries to detect similar images based on a text prompt or a reference image.&lt;/li&gt;
&lt;li&gt;Apply a simple metadata filtering method to narrow down search results.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;To proceed with this tutorial, ensure that you have the following prerequisites installed and configured:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An Azure subscription - Create an &lt;a href="https://azure.microsoft.com/free/?WT.mc_id=AI-MVP-5004971" rel="noopener noreferrer"&gt;Azure free account&lt;/a&gt; or an &lt;a href="https://azure.microsoft.com/free/students/?WT.mc_id=AI-MVP-5004971" rel="noopener noreferrer"&gt;Azure for Students account&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Python 3.10, Visual Studio Code, Jupyter Notebook, and Jupyter Extension for Visual Studio Code.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Set up your working environment
&lt;/h3&gt;

&lt;p&gt;In this guide, you'll learn how to query embeddings stored in an Azure Cosmos DB for PostgreSQL table to search for images similar to a search term or a reference image. The entire functional project is available in the &lt;a href="https://github.com/sfoteini/vector-search-azure-cosmos-db-postgresql" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;. If you're keen on trying it out, just fork the repository and clone it to have it locally available.&lt;/p&gt;

&lt;p&gt;Before running the Jupyter Notebook covered in this post, you should:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a &lt;a href="https://docs.python.org/3/library/venv.html" rel="noopener noreferrer"&gt;virtual environment&lt;/a&gt; and activate it.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Install the required Python packages using the following command:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create vector embeddings for a collection of images by running the scripts found in the &lt;em&gt;data_processing&lt;/em&gt; directory.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Upload the images to your Azure Blob Storage container, create a PostgreSQL table, and populate it with data by executing the scripts found in the &lt;em&gt;data_upload&lt;/em&gt; directory.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Detect similar images using the pgvector extension
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;The code for image similarity search with the pgvector extension can be found at &lt;a href="https://github.com/sfoteini/vector-search-azure-cosmos-db-postgresql/blob/main/vector_search_samples/image_search.ipynb" rel="noopener noreferrer"&gt;vector_search_samples/image_search.ipynb&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The image similarity search workflow that we will follow is summarized as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Use the Azure AI Vision Vectorize Image API or the Vectorize Text API to generate the vector embedding of a reference image or text prompt, respectively. It is crucial to query with the same embedding model that was used to generate the embeddings for the images in the dataset.&lt;/li&gt;
&lt;li&gt;Calculate similarity and retrieve the most similar vectors using SQL &lt;code&gt;SELECT&lt;/code&gt; statements and the built-in vector operators of the PostgreSQL database. Specifically, cosine similarity will be used as the similarity metric.&lt;/li&gt;
&lt;li&gt;Retrieve the raw data associated with each of the returned vectors, such as the image title and artist name.&lt;/li&gt;
&lt;li&gt;Download the images from the Azure Blob Storage container and display them using the matplotlib package.&lt;/li&gt;
&lt;/ol&gt;
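&lt;p&gt;The query-building part of this workflow (step 2) can be sketched as follows. The &lt;code&gt;build_search_query&lt;/code&gt; helper is hypothetical, written for illustration; the embedding itself would be passed as a bind parameter at execution time:&lt;/p&gt;

```python
def build_search_query(table: str, limit: int) -> str:
    """Compose a parameterized SELECT that orders rows by cosine
    distance (<=>) to the query vector; the vector is supplied as a
    bind parameter (%s) when the query is executed."""
    return (
        f"SELECT image_title, artist_name FROM {table} "
        f"ORDER BY vector <=> %s LIMIT {limit};"
    )

query = build_search_query("paintings", 12)
```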

&lt;p&gt;This workflow is illustrated in the following diagram:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fuse-pgvector-for-searching-images-on-azure-cosmos-db-for-postgresql%2Fvector-search-flow_hu9a9107796f4f3c86b6c2b16f829064d6_98384_1590x300_fit_q95_h2_lanczos_3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fuse-pgvector-for-searching-images-on-azure-cosmos-db-for-postgresql%2Fvector-search-flow_hu9a9107796f4f3c86b6c2b16f829064d6_98384_1590x300_fit_q95_h2_lanczos_3.webp" alt="Image similarity search workflow."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Nearest neighbor search using pgvector
&lt;/h3&gt;

&lt;p&gt;Given the vector embedding of the query, we can use SQL &lt;code&gt;SELECT&lt;/code&gt; statements to search for similar images. Let’s understand how a simple &lt;code&gt;SELECT&lt;/code&gt; statement works. Consider the following query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'[0.003, …, 0.034]'&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This query computes the cosine distance (&lt;code&gt;&amp;lt;=&amp;gt;&lt;/code&gt;) between the given vector (&lt;code&gt;[0.003, …, 0.034]&lt;/code&gt;) and the vectors stored in the table, sorts the results by the calculated distance, and returns the five most similar images (&lt;code&gt;LIMIT 5&lt;/code&gt;). Additionally, you can obtain the cosine similarity between the query vector and the retrieved vectors by modifying the &lt;code&gt;SELECT&lt;/code&gt; statement as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;image_title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'[0.003, …, 0.034]'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;cosine_similarity&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;cosine_similarity&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The pgvector extension provides three operators that can be used to calculate similarity:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operator&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;&amp;lt;-&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Euclidean distance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;&amp;lt;#&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Negative inner product&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;&amp;lt;=&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Cosine distance&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
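&lt;p&gt;For intuition, the three operators correspond to standard vector distance functions, which can be reproduced in plain Python (a sketch for illustration only; in practice the database computes these server-side):&lt;/p&gt;

```python
import math

def l2_distance(a, b):  # pgvector's <-> operator
    """Euclidean (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def negative_inner_product(a, b):  # pgvector's <#> operator
    """Negative dot product (negated so that smaller means closer)."""
    return -sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):  # pgvector's <=> operator
    """1 minus the cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Orthogonal unit vectors: cosine distance is 1, inner product is 0.
a, b = [1.0, 0.0], [0.0, 1.0]
```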

&lt;h3&gt;
  
  
  Code sample: Image similarity search
&lt;/h3&gt;

&lt;p&gt;In the &lt;a href="https://github.com/sfoteini/vector-search-azure-cosmos-db-postgresql/blob/main/vector_search_samples/image_search.ipynb" rel="noopener noreferrer"&gt;Jupyter Notebook&lt;/a&gt; provided on my GitHub repository, you'll explore the following scenarios:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Text-to-image search&lt;/strong&gt;: You will use a text prompt to search for and identify paintings that are semantically similar, relying solely on the vector embeddings without utilizing image metadata, such as the title or description of the painting.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Image-to-image search&lt;/strong&gt;: You will use a painting as a reference to search for similar ones by comparing the vector embedding of the reference image with those in the collection.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metadata filtering&lt;/strong&gt;: Filtering enables users to narrow down search results, such as searching for paintings by a specific artist. However, implementing accurate and fast metadata filtering in vector search systems is a challenging task. You can read the article &lt;a href="https://www.pinecone.io/learn/vector-search-filtering/" rel="noopener noreferrer"&gt;The Missing WHERE Clause in Vector Search&lt;/a&gt; on the Pinecone blog to learn about the two fundamental approaches for metadata filtering and understand the complexities involved in implementing such filters into vector search applications.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fuse-pgvector-for-searching-images-on-azure-cosmos-db-for-postgresql%2Fsearch-results-example_hu12f86062fccea6ea61227c970dbf49e4_1035308_837x846_fit_q95_h2_lanczos_3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fuse-pgvector-for-searching-images-on-azure-cosmos-db-for-postgresql%2Fsearch-results-example_hu12f86062fccea6ea61227c970dbf49e4_1035308_837x846_fit_q95_h2_lanczos_3.webp" alt="Twelve paintings depicting flowers by Vincent van Gogh. These artworks were retrieved through a search using the text prompt 'flowers by Vincent van Gogh'."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Images retrieved by searching for paintings using the text prompt "flowers by Vincent van Gogh".&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Feel free to experiment with the notebook and modify the code to gain hands-on experience with the pgvector extension!&lt;/p&gt;

&lt;h2&gt;
  
  
  Next steps
&lt;/h2&gt;

&lt;p&gt;In this post, you explored the basic vector similarity search features offered by the pgvector extension. This type of vector search is referred to as exact nearest neighbor search, as it computes the similarity between the query vector and every vector in the database. In the upcoming post, you will explore approximate nearest neighbor search, which trades off result quality for speed.&lt;/p&gt;

&lt;p&gt;If you want to explore pgvector's features, check out these learning resources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/azure/cosmos-db/postgresql/howto-use-pgvector?WT.mc_id=AI-MVP-5004971" rel="noopener noreferrer"&gt;How to use pgvector on Azure Cosmos DB for PostgreSQL – Microsoft Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/pgvector/pgvector" rel="noopener noreferrer"&gt;Official GitHub repository of the pgvector extension&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;👋 &lt;strong&gt;Hi, I am Foteini Savvidou!&lt;/strong&gt;&lt;br&gt;
An Electrical and Computer Engineer and Microsoft AI MVP (Most Valuable Professional) from Greece.&lt;/p&gt;

&lt;p&gt;🌈 &lt;a href="https://www.linkedin.com/in/foteini-savvidou" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | &lt;a href="https://sfoteini.github.io/" rel="noopener noreferrer"&gt;Blog&lt;/a&gt; | &lt;a href="https://github.com/sfoteini" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/p&gt;

</description>
      <category>azure</category>
      <category>cosmosdb</category>
      <category>vectordatabase</category>
      <category>ai</category>
    </item>
    <item>
      <title>Store embeddings in Azure Cosmos DB for PostgreSQL with pgvector</title>
      <dc:creator>Foteini Savvidou</dc:creator>
      <pubDate>Mon, 29 Jan 2024 19:44:19 +0000</pubDate>
      <link>https://forem.com/sfoteini/store-embeddings-in-azure-cosmos-db-for-postgresql-with-pgvector-2b1c</link>
      <guid>https://forem.com/sfoteini/store-embeddings-in-azure-cosmos-db-for-postgresql-with-pgvector-2b1c</guid>
      <description>&lt;p&gt;Welcome to the third part of the "Image similarity search with pgvector" learning series!&lt;/p&gt;

&lt;p&gt;In the previous articles, you learned how to describe vector embeddings and vector similarity search. You also used the multi-modal embeddings APIs of Azure AI Vision for generating embeddings for a collection of images of paintings.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In this learning series, we will create a search system that lets users provide a text description or a reference image to find similar paintings. We have already generated vector embeddings for the images in our dataset using the multi-modal embeddings API of Azure AI Vision. In this post, we will use Azure Blob Storage to store the images and Azure Cosmos DB for PostgreSQL to store our vector embeddings using the pgvector extension. In the next tutorials, we will perform a similarity search on our embeddings. &lt;/p&gt;

&lt;p&gt;The workflow is illustrated in the following image:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fstore-embeddings-in-azure-cosmos-db-for-postgresql-with-pgvector%2Fvector-search-flow_hud6a75ebcd189abb7fe953c451f5a6508_85062_1270x425_fit_q95_h2_lanczos_3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fstore-embeddings-in-azure-cosmos-db-for-postgresql-with-pgvector%2Fvector-search-flow_hud6a75ebcd189abb7fe953c451f5a6508_85062_1270x425_fit_q95_h2_lanczos_3.webp" alt="Image similarity search workflow."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this tutorial, you will learn how to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Upload images to an Azure Blob Storage container using the Python SDK.&lt;/li&gt;
&lt;li&gt;Activate the pgvector extension on Azure Cosmos DB for PostgreSQL.&lt;/li&gt;
&lt;li&gt;Store vector embeddings on an Azure Cosmos DB for PostgreSQL table.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;To proceed with this tutorial, ensure that you have the following prerequisites installed and configured:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An Azure subscription - Create an &lt;a href="https://azure.microsoft.com/free/?WT.mc_id=AI-MVP-5004971" rel="noopener noreferrer"&gt;Azure free account&lt;/a&gt; or an &lt;a href="https://azure.microsoft.com/free/students/?WT.mc_id=AI-MVP-5004971" rel="noopener noreferrer"&gt;Azure for Students account&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Python 3.10, Visual Studio Code, Jupyter Notebook, and Jupyter Extension for Visual Studio Code.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Set up your working environment
&lt;/h3&gt;

&lt;p&gt;In this guide, you'll learn how to upload a collection of paintings' images to an Azure Blob Storage container and insert vector embeddings into an Azure Cosmos DB for PostgreSQL table. The entire functional project is available in the &lt;a href="https://github.com/sfoteini/vector-search-azure-cosmos-db-postgresql" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;. If you're keen on trying it out, just fork the repository and clone it to have it locally available.&lt;/p&gt;

&lt;p&gt;Before running the scripts covered in this post, you should:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a &lt;a href="https://docs.python.org/3/library/venv.html" rel="noopener noreferrer"&gt;virtual environment&lt;/a&gt; and activate it.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Install the required Python packages using the following command:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create vector embeddings for a collection of images by running the scripts found in the &lt;em&gt;data_processing&lt;/em&gt; directory.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Upload images to Azure Blob Storage
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;The code for uploading images to an Azure Blob Storage container can be found at &lt;a href="https://github.com/sfoteini/vector-search-azure-cosmos-db-postgresql/blob/main/data_upload/upload_images_to_blob.py" rel="noopener noreferrer"&gt;data_upload/upload_images_to_blob.py&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Azure Blob Storage is a cloud storage service that is optimized for storing large amounts of unstructured data, such as images. It offers three types of resources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The storage account, which contains all your Azure Storage data objects. Every object stored in Azure Storage is identified by a unique address.&lt;/li&gt;
&lt;li&gt;Containers in the storage account, which are similar to directories in a file system.&lt;/li&gt;
&lt;li&gt;Blobs, which are organized in containers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The following diagram illustrates the relationship between these resources:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flearn.microsoft.com%2Fen-us%2Fazure%2Fstorage%2Fblobs%2Fmedia%2Fstorage-blobs-introduction%2Fblob1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flearn.microsoft.com%2Fen-us%2Fazure%2Fstorage%2Fblobs%2Fmedia%2Fstorage-blobs-introduction%2Fblob1.png" alt="Azure Blob Storage resources: storage account, container, and blobs."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Azure Blob Storage resources: storage account, container, and blobs. &lt;a href="https://learn.microsoft.com/azure/storage/blobs/storage-quickstart-blobs-python?tabs=connection-string%2Croles-azure-cli%2Csign-in-azure-cli&amp;amp;WT.mc_id=AI-MVP-5004971" rel="noopener noreferrer"&gt;Image source: Azure Blob Storage Object model – Microsoft Docs&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
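&lt;p&gt;Each blob's unique address follows directly from this hierarchy. A minimal sketch, assuming the default public-cloud endpoint suffix (&lt;code&gt;core.windows.net&lt;/code&gt;) and illustrative account, container, and blob names:&lt;/p&gt;

```python
def blob_url(account: str, container: str, blob: str) -> str:
    """Compose the address of a blob in the default Azure public cloud:
    https://<account>.blob.core.windows.net/<container>/<blob>"""
    return f"https://{account}.blob.core.windows.net/{container}/{blob}"

url = blob_url("myaccount", "paintings", "sunflowers.jpg")
```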

&lt;h3&gt;
  
  
  Create an Azure Storage account
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Open the Azure CLI.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Create an Azure Storage Account using the following command:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;az storage account create &lt;span class="nt"&gt;--name&lt;/span&gt; account-name &lt;span class="nt"&gt;--resource-group&lt;/span&gt; your-group-name &lt;span class="nt"&gt;--location&lt;/span&gt; your-location &lt;span class="nt"&gt;--sku&lt;/span&gt; Standard_LRS &lt;span class="nt"&gt;--allow-blob-public-access&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="nt"&gt;--min-tls-version&lt;/span&gt; TLS1_2
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Upload images to an Azure Blob Storage container
&lt;/h3&gt;

&lt;p&gt;The Azure Blob Storage client library for Python provides the following classes to manage blobs and containers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;BlobServiceClient&lt;/code&gt;: We will use the &lt;code&gt;BlobServiceClient&lt;/code&gt; class to interact with the Azure Storage account and create a container.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ContainerClient&lt;/code&gt;: We will use the &lt;code&gt;ContainerClient&lt;/code&gt; class to interact with our container and the blobs inside the container.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;BlobClient&lt;/code&gt;: We will use the &lt;code&gt;BlobClient&lt;/code&gt; class to upload a blob to our container.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The process of uploading our images to Azure Blob Storage can be summarized as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a new container to store our images.&lt;/li&gt;
&lt;li&gt;Retrieve the filenames of the images in the dataset.&lt;/li&gt;
&lt;li&gt;Upload the images to the container, utilizing multiple threads via the &lt;code&gt;ThreadPoolExecutor&lt;/code&gt; class. Additionally, use the &lt;code&gt;tqdm&lt;/code&gt; library to display progress bars that visualize the upload process.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;csv&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;azure.storage.blob&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BlobServiceClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ContainerClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ContentSettings&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;azure.core.exceptions&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ResourceExistsError&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;concurrent.futures&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ThreadPoolExecutor&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tqdm&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tqdm&lt;/span&gt;

&lt;span class="c1"&gt;# Constants
&lt;/span&gt;&lt;span class="n"&gt;MAX_WORKERS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;
&lt;span class="n"&gt;IMAGE_FILE_CSV_COLUMN_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Directories
&lt;/span&gt;&lt;span class="n"&gt;current_dir&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dirname&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;realpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__file__&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;parent_dir&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dirname&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current_dir&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Load environemt file
&lt;/span&gt;&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parent_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.env&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;override&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Azure Blob Storage credentials
&lt;/span&gt;&lt;span class="n"&gt;blob_account_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BLOB_ACCOUNT_NAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;blob_account_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BLOB_ACCOUNT_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;blob_endpoint_suffix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BLOB_ENDPOINT_SUFFIX&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;blob_connection_string&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DefaultEndpointsProtocol=https;AccountName=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;blob_account_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AccountKey=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;blob_account_key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;;EndpointSuffix=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;blob_endpoint_suffix&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;container_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CONTAINER_NAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Dataset's folder
&lt;/span&gt;&lt;span class="n"&gt;dataset_folder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parent_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dataset&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;dataset_filepath&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dataset_folder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dataset_embeddings.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Images' folder
&lt;/span&gt;&lt;span class="n"&gt;images_folder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parent_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;semart_dataset&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;images&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Content-Type for blobs
&lt;/span&gt;&lt;span class="n"&gt;content_settings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ContentSettings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image/jpeg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="c1"&gt;# Create Azure Blob Storage client
&lt;/span&gt;    &lt;span class="n"&gt;blob_service_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;BlobServiceClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_connection_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conn_str&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;blob_connection_string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Create a new container
&lt;/span&gt;    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;container_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blob_service_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_container&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;container_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;public_access&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;blob&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;ResourceExistsError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A container with name &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;container_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; already exists.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Find the URLs of the images in the dataset
&lt;/span&gt;    &lt;span class="n"&gt;images&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_image_filenames&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Number of images in the dataset: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Uploading images to container &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;container_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Upload images to blob storage
&lt;/span&gt;    &lt;span class="nf"&gt;upload_images&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;container_client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;container_client&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;load_image_filenames&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dataset_filepath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;csv_file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;csv_reader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;csv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DictReader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;csv_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;delimiter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;skipinitialspace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;image_filenames&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;IMAGE_FILE_CSV_COLUMN_NAME&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;csv_reader&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;image_filenames&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;upload_blob_from_local_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;image_filepath&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;container_client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ContainerClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;blob_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;basename&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_filepath&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;blob_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;container_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_blob_client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;blob&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;blob_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_filepath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;blob_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upload_blob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;overwrite&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content_settings&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;content_settings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Couldn&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t upload image &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;blob_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; to Azure Storage Account due to error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;upload_images&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;container_client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ContainerClient&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;ThreadPoolExecutor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_workers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MAX_WORKERS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nf"&gt;tqdm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;upload_blob_from_local_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                        &lt;span class="n"&gt;image_filepath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;images_folder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                        &lt;span class="n"&gt;container_client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;container_client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="p"&gt;),&lt;/span&gt;
                    &lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Store embeddings in Azure Cosmos DB for PostgreSQL
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;The code for inserting vector embeddings into an Azure Cosmos DB for PostgreSQL table can be found at &lt;a href="https://github.com/sfoteini/vector-search-azure-cosmos-db-postgresql/blob/main/data_upload/upload_data_to_postgresql.py" rel="noopener noreferrer"&gt;data_upload/upload_data_to_postgresql.py&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Create an Azure Cosmos DB for PostgreSQL cluster
&lt;/h3&gt;

&lt;p&gt;Let's use the Azure portal to create an Azure Cosmos DB for PostgreSQL cluster.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Search for &lt;em&gt;“Azure Cosmos DB for PostgreSQL”&lt;/em&gt; and then select &lt;strong&gt;+Create&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Fill out the information on the &lt;strong&gt;Basics&lt;/strong&gt; tab:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Subscription&lt;/strong&gt;: Select your subscription.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource group&lt;/strong&gt;: Select your resource group.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cluster name&lt;/strong&gt;: Enter a name for your Azure Cosmos DB for PostgreSQL cluster.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Location&lt;/strong&gt;: Choose your preferred region.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scale&lt;/strong&gt;: You can leave &lt;strong&gt;Scale&lt;/strong&gt; as its default value or select the optimal number of nodes as well as compute, memory, and storage configuration. &lt;strong&gt;Burstable, 1 vCores / 2 GiB RAM, 32 GiB storage&lt;/strong&gt; is sufficient for this demo.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PostgreSQL version&lt;/strong&gt;: Choose a PostgreSQL version such as 15.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database name&lt;/strong&gt;: You can leave the database name at its default value, &lt;em&gt;citus&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Administrator account&lt;/strong&gt;: The admin username must be &lt;em&gt;citus&lt;/em&gt;. Select a password that will be used for the &lt;em&gt;citus&lt;/em&gt; role to connect to the database.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fstore-embeddings-in-azure-cosmos-db-for-postgresql-with-pgvector%2Fpostgresql-creation_hu76b4eb24219e8c7e11bd7191e9097da5_45162_698x574_fit_q95_h2_lanczos_3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fstore-embeddings-in-azure-cosmos-db-for-postgresql-with-pgvector%2Fpostgresql-creation_hu76b4eb24219e8c7e11bd7191e9097da5_45162_698x574_fit_q95_h2_lanczos_3.webp" alt="Information for creating an Azure Cosmos DB for PostgreSQL cluster."&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;On the &lt;strong&gt;Networking&lt;/strong&gt; tab, select &lt;strong&gt;Allow public access from Azure services and resources within Azure to this cluster&lt;/strong&gt; and create your preferred firewall rule.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fstore-embeddings-in-azure-cosmos-db-for-postgresql-with-pgvector%2Fpostgresql-networking_hu5d6833bb9be42544cf14c495cb7ed0a3_47771_687x571_fit_q95_h2_lanczos_3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fstore-embeddings-in-azure-cosmos-db-for-postgresql-with-pgvector%2Fpostgresql-networking_hu5d6833bb9be42544cf14c495cb7ed0a3_47771_687x571_fit_q95_h2_lanczos_3.webp" alt="Networking settings for creating an Azure Cosmos DB for PostgreSQL cluster."&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Select &lt;strong&gt;Review + Create&lt;/strong&gt;. Once the deployment is complete, navigate to your resource.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Activate the pgvector extension
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://github.com/pgvector/pgvector" rel="noopener noreferrer"&gt;pgvector&lt;/a&gt; extension adds vector similarity search capabilities to your PostgreSQL database. To use the extension, you have to first create it in your database. You can install the extension, by connecting to your database and running the &lt;code&gt;CREATE EXTENSION&lt;/code&gt; command from the &lt;code&gt;psql&lt;/code&gt; command prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;SELECT CREATE_EXTENSION&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'vector'&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The pgvector extension introduces a data type called &lt;code&gt;VECTOR&lt;/code&gt; that can be used during the creation of a table to indicate that a column will hold vector embeddings. When creating the column, it's essential to specify the dimension of the vectors. In our scenario, Azure AI Vision generates 1024-dimensional vectors.&lt;/p&gt;
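&lt;p&gt;As a minimal sketch of such a table definition, the snippet below builds a &lt;code&gt;CREATE TABLE&lt;/code&gt; statement with a &lt;code&gt;VECTOR(1024)&lt;/code&gt; column next to the metadata columns used in this series. The table name &lt;code&gt;paintings&lt;/code&gt; and the &lt;code&gt;create_table&lt;/code&gt; helper are illustrative, not part of the original script.&lt;/p&gt;

```python
# Sketch of step 1: a table with a pgvector VECTOR column.
# The table name below is hypothetical; adjust it to your own schema.
TABLE_NAME = "paintings"  # hypothetical table name

# The VECTOR(1024) column matches the 1024-dimensional embeddings
# generated by the Azure AI Vision multi-modal embeddings API.
create_table_command = f"""
CREATE TABLE IF NOT EXISTS {TABLE_NAME} (
    image_file TEXT PRIMARY KEY,
    description TEXT,
    author TEXT,
    title TEXT,
    technique TEXT,
    type TEXT,
    timeframe TEXT,
    vector VECTOR(1024)
);
"""


def create_table(connection_string: str) -> None:
    """Connect with psycopg2 and run the DDL (requires a live cluster)."""
    import psycopg2

    with psycopg2.connect(connection_string) as conn:
        with conn.cursor() as cursor:
            cursor.execute(create_table_command)
```

&lt;p&gt;Running &lt;code&gt;create_table&lt;/code&gt; against the cluster's connection string would create the table once; the &lt;code&gt;IF NOT EXISTS&lt;/code&gt; clause makes the script safe to re-run.&lt;/p&gt;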

&lt;h3&gt;
  
  
  Insert data into a PostgreSQL table
&lt;/h3&gt;

&lt;p&gt;To insert data into an Azure Cosmos DB for PostgreSQL table, we will proceed as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a table to store the filenames of the images, their embeddings, and their associated metadata. All information is saved in a CSV file, as presented in &lt;a href="https://sfoteini.github.io/blog/generate-embeddings-with-azure-ai-vision-multi-modal-embeddings-api" rel="noopener noreferrer"&gt;Part 2&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Insert the data from the CSV file into the table using the PostgreSQL &lt;code&gt;COPY&lt;/code&gt; command.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pool&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;

&lt;span class="c1"&gt;# Constants
&lt;/span&gt;&lt;span class="n"&gt;IMAGE_FILE_COLUMN_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;DESCRIPTION_COLUMN_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;AUTHOR_COLUMN_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;author&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;TITLE_COLUMN_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;TECHNIQUE_COLUMN_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;technique&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;TYPE_COLUMN_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;TIMEFRAME_COLUMN_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timeframe&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;VECTOR_COLUMN_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Directories
&lt;/span&gt;&lt;span class="n"&gt;current_dir&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dirname&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;realpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__file__&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;parent_dir&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dirname&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current_dir&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Load environemt file
&lt;/span&gt;&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parent_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.env&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;override&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Azure CosmosDB for PostgreSQL credentials
&lt;/span&gt;&lt;span class="n"&gt;postgres_host&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;POSTGRES_HOST&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;postgres_database_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;POSTGRES_DB_NAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;postgres_user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;POSTGRES_USER&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;postgres_password&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;POSTGRES_PASSWORD&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;sslmode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;require&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;table_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;POSTGRES_TABLE_NAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;postgres_connection_string&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;host=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;postgres_host&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; user=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;postgres_user&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dbname=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;postgres_database_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;password=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;postgres_password&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; sslmode=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;sslmode&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Dataset's folder
&lt;/span&gt;&lt;span class="n"&gt;dataset_folder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parent_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dataset&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;dataset_filepath&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dataset_folder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dataset_embeddings.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;postgresql_pool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;SimpleConnectionPool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;postgres_connection_string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;postgresql_pool&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Connection pool created successfully&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Get a connection from the connection pool
&lt;/span&gt;    &lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;postgresql_pool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getconn&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Creating a table...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DROP TABLE IF EXISTS &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CREATE TABLE &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; (&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;IMAGE_FILE_COLUMN_NAME&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; TEXT PRIMARY KEY,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;DESCRIPTION_COLUMN_NAME&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; TEXT NOT NULL,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;AUTHOR_COLUMN_NAME&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; TEXT NOT NULL,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;TITLE_COLUMN_NAME&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; TEXT NOT NULL,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;TECHNIQUE_COLUMN_NAME&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; TEXT,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;TYPE_COLUMN_NAME&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; TEXT,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;TIMEFRAME_COLUMN_NAME&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; TEXT,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;VECTOR_COLUMN_NAME&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; VECTOR(1024) NOT NULL);&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Saving data to table...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dataset_filepath&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;csv_file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;copy_expert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;COPY &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; FROM STDIN WITH &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;(FORMAT csv, DELIMITER &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, HEADER MATCH);&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;csv_file&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Fetch all rows from table
&lt;/span&gt;    &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetchall&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Number of records in the table: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Close the connection
&lt;/span&gt;    &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Next steps
&lt;/h2&gt;

&lt;p&gt;In this post, you uploaded the paintings’ images into an Azure Blob Storage container, configured the Azure Cosmos DB for PostgreSQL database as a vector database using the pgvector extension, and inserted the data into a table. In the subsequent posts, you will leverage the pgvector extension to perform a similarity search on the embeddings.&lt;/p&gt;
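As a preview of the similarity search covered in upcoming posts, here is a minimal sketch of a pgvector query. The table and column names (`paintings`, `image_file`, `vector`) and the connection object are placeholders, not the exact names used in this series; `&lt;=&gt;` is pgvector's cosine distance operator.

```python
def build_similarity_query(table: str, vector_column: str) -> str:
    """Builds a query that ranks rows by pgvector cosine distance (<=>)."""
    return (
        f"SELECT image_file, {vector_column} <=> %s::vector AS distance "
        f"FROM {table} ORDER BY distance LIMIT %s;"
    )


def find_similar(conn, query_vector: list[float], top_k: int = 5):
    """Returns the top_k rows closest to query_vector.

    conn is assumed to be an open psycopg2 connection.
    """
    with conn.cursor() as cursor:
        cursor.execute(
            build_similarity_query("paintings", "vector"),  # hypothetical names
            (str(query_vector), top_k),
        )
        return cursor.fetchall()
```

pgvector accepts a vector as a bracketed string literal such as `[0.1, 0.2, ...]`, which is why the Python list is passed through `str()` and cast with `::vector`.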

&lt;p&gt;If you want to explore pgvector's features, check out these learning resources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/azure/cosmos-db/postgresql/howto-use-pgvector?WT.mc_id=AI-MVP-5004971" rel="noopener noreferrer"&gt;How to use pgvector on Azure Cosmos DB for PostgreSQL – Microsoft Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/pgvector/pgvector" rel="noopener noreferrer"&gt;Official GitHub repository of the pgvector extension&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;👋 &lt;strong&gt;Hi, I am Foteini Savvidou!&lt;/strong&gt;&lt;br&gt;
An Electrical and Computer Engineer and Microsoft AI MVP (Most Valuable Professional) from Greece.&lt;/p&gt;

&lt;p&gt;🌈 &lt;a href="https://www.linkedin.com/in/foteini-savvidou" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | &lt;a href="https://sfoteini.github.io/" rel="noopener noreferrer"&gt;Blog &lt;/a&gt;| &lt;a href="https://github.com/sfoteini" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/p&gt;

</description>
      <category>azure</category>
      <category>postgres</category>
      <category>ai</category>
      <category>python</category>
    </item>
    <item>
      <title>Generate embeddings with Azure AI Vision multi-modal embeddings API</title>
      <dc:creator>Foteini Savvidou</dc:creator>
      <pubDate>Mon, 22 Jan 2024 18:11:06 +0000</pubDate>
      <link>https://forem.com/sfoteini/generate-embeddings-with-azure-ai-vision-multi-modal-embeddings-api-3bp9</link>
      <guid>https://forem.com/sfoteini/generate-embeddings-with-azure-ai-vision-multi-modal-embeddings-api-3bp9</guid>
      <description>&lt;p&gt;Welcome to the second part of the “Image similarity search with pgvector” learning series! In the previous article, you learned how to describe vector embeddings and vector similarity search. You also used the multi-modal embeddings APIs of Azure AI Vision for generating embeddings for images and text and calculated the cosine similarity between two vectors.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In this learning series, we will create an application that enables users to search for paintings based on either a reference image or a text description. We will use the &lt;a href="https://researchdata.aston.ac.uk/id/eprint/380/"&gt;SemArt Dataset&lt;/a&gt;, which contains approximately 21k paintings gathered from the Web Gallery of Art. Each painting comes with various attributes, like a title, description, and the name of the artist.&lt;/p&gt;

&lt;p&gt;In this tutorial, you will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prepare the data for further processing.&lt;/li&gt;
&lt;li&gt;Generate vector embeddings for a collection of images of paintings using the Vectorize Image API of Azure AI Vision.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;To proceed with this tutorial, ensure that you have the following prerequisites:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An Azure subscription - Create an &lt;a href="https://azure.microsoft.com/free/?WT.mc_id=AI-MVP-5004971"&gt;Azure free account&lt;/a&gt; or an &lt;a href="https://azure.microsoft.com/free/students/?WT.mc_id=AI-MVP-5004971"&gt;Azure for Students account&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;An Azure AI Vision resource - For instructions on creating an Azure AI Vision resource, see &lt;a href="https://dev.to/sfoteini/use-the-azure-ai-vision-multi-modal-embeddings-api-for-image-retrieval-3p11"&gt;Part 1&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Python 3.10, Visual Studio Code, Jupyter Notebook, and Jupyter Extension for Visual Studio Code.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Set up your working environment
&lt;/h3&gt;

&lt;p&gt;This article walks you through generating embeddings for a collection of images using Azure AI Vision. The complete working project is available in the &lt;a href="https://github.com/sfoteini/vector-search-azure-cosmos-db-postgresql"&gt;GitHub repository&lt;/a&gt;. To follow along, fork the repository and clone it locally.&lt;/p&gt;

&lt;p&gt;Before running the scripts, you should:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Download the &lt;a href="https://researchdata.aston.ac.uk/id/eprint/380/"&gt;SemArt Dataset&lt;/a&gt; into the &lt;em&gt;semart_dataset&lt;/em&gt; directory.&lt;/li&gt;
&lt;li&gt;Create a &lt;a href="https://docs.python.org/3/library/venv.html"&gt;virtual environment&lt;/a&gt; and activate it.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Install the required Python packages using the following command:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ol&gt;
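Steps 2 and 3 above can be sketched as shell commands. This is a sketch assuming a Unix-like shell with Python 3 on the PATH; on Windows, activate with `.venv\Scripts\activate` instead.

```shell
# Create a virtual environment in the .venv directory
python3 -m venv .venv

# Activate it
. .venv/bin/activate  # or: source .venv/bin/activate

# Install the required packages if the requirements file is present
if [ -f requirements.txt ]; then
    pip install -r requirements.txt
fi
```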

&lt;h2&gt;
  
  
  Data preprocessing
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: The code for data preprocessing can be found at &lt;a href="https://github.com/sfoteini/vector-search-azure-cosmos-db-postgresql/blob/main/data_processing/data_preprocessing.ipynb"&gt;data_processing/data_preprocessing.ipynb&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For our application, we'll be working with a subset of the original dataset. Alongside the image files, we aim to retain associated metadata like the title, author's name, and description for each painting. To prepare the data for further processing and eliminate unnecessary information, we will take several steps as outlined in the Jupyter Notebook available on my GitHub repository:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Clean up the text descriptions by removing special characters to minimize errors related to character encoding.&lt;/li&gt;
&lt;li&gt;Clean up the names of the artists, addressing encoding issues for some artists' names.&lt;/li&gt;
&lt;li&gt;Exclude artists with fewer than 15 paintings from the dataset, along with other data we won't be using.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;After these steps, the final dataset will comprise 11,206 images of paintings.&lt;/p&gt;
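The third preprocessing step can be sketched with pandas. This is an illustrative sketch, not the notebook's exact code; the column names `author` and `image_file` are assumptions about the metadata layout.

```python
import pandas as pd


def exclude_rare_artists(df: pd.DataFrame, min_paintings: int = 15) -> pd.DataFrame:
    """Keeps only the rows whose artist appears at least min_paintings times."""
    # Per-row count of how many paintings that row's artist has in the dataset
    counts = df.groupby("author")["image_file"].transform("count")
    return df[counts >= min_paintings].reset_index(drop=True)
```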

&lt;h2&gt;
  
  
  Create vector embeddings with Azure AI Vision
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: The code for vector embeddings generation can be found at &lt;a href="https://github.com/sfoteini/vector-search-azure-cosmos-db-postgresql/blob/main/data_processing/generate_embeddings.py"&gt;data_processing/generate_embeddings.py&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To generate embeddings for the images, our process can be summarized as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Retrieve the filenames of the images in the dataset.&lt;/li&gt;
&lt;li&gt;Divide the data into batches, and for each batch, perform the following steps:

&lt;ol&gt;
&lt;li&gt;Compute the vector embedding for each image in the batch using the Vectorize Image API of Azure AI Vision.&lt;/li&gt;
&lt;li&gt;Save the vector embeddings of the images along with the filenames into a file.&lt;/li&gt;
&lt;/ol&gt;


&lt;/li&gt;
&lt;li&gt;Update the dataset by inserting the vector embedding of each image.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In the following sections, we will discuss specific segments of the code.&lt;/p&gt;

&lt;h3&gt;
  
  
  Compute embeddings for the images in the dataset
&lt;/h3&gt;

&lt;p&gt;As discussed in &lt;a href="https://dev.to/sfoteini/use-the-azure-ai-vision-multi-modal-embeddings-api-for-image-retrieval-3p11"&gt;Part 1&lt;/a&gt;, computing the vector embedding of an image involves sending a POST request to the Azure AI Vision &lt;code&gt;retrieval:vectorizeImage&lt;/code&gt; API. The binary image data (or a publicly available image URL) is included in the request body, and the response consists of a JSON object containing the vector embedding of the image. In Python, this can be achieved by utilizing the &lt;code&gt;requests&lt;/code&gt; library to send a POST request.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_image_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Generates a vector embedding for an image using Azure AI Vision 4.0
    (Vectorize Image API).

    :param image: The image filepath.
    :return: The vector embedding of the image.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/octet-stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Ocp-Apim-Subscription-Key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;vision_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vectorize_img_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;image_vector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;image_vector&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;An error occurred while processing &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error code: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;An error occurred while processing &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;compute_embeddings&lt;/code&gt; function computes the vector embeddings for all the images in our dataset. It uses a &lt;code&gt;ThreadPoolExecutor&lt;/code&gt; to generate the embeddings for each batch of images in parallel across multiple threads. The &lt;code&gt;tqdm&lt;/code&gt; library provides progress bars for visualizing the embedding generation process.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;compute_embeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_names&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Computes vector embeddings for the provided images and saves the embeddings
    alongside their corresponding image filenames in a CSV file.

    :param image_names: A list containing the filenames of the images.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;image_names_batches&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;image_names&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;:(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;BATCH_SIZE&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_names&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;BATCH_SIZE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;tqdm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_names_batches&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="n"&gt;desc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Computing embeddings&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;images&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;image_names_batches&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;ThreadPoolExecutor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_workers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MAX_WORKERS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nf"&gt;tqdm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                        &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;get_image_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                            &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;images_folder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                        &lt;span class="p"&gt;),&lt;/span&gt;
                        &lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="p"&gt;),&lt;/span&gt;
                    &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                    &lt;span class="n"&gt;desc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Processing batch &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;leave&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;valid_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;])]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="nf"&gt;save_data_to_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;valid_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the embeddings for all the images in a batch are computed, the data is saved into a CSV file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;save_data_to_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Appends a list of image filenames and their associated embeddings to
    a CSV file.

    :param data: The data to be appended to the CSV file.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embeddings_filepath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;newline&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;csv_file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;write&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;csv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;writer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;csv_file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;writerows&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Azure AI Vision API rate limits
&lt;/h3&gt;

&lt;p&gt;The Azure AI Vision API imposes rate limits on its usage. The free tier allows only 20 transactions per minute, while the standard tier allows up to 30 transactions per second, depending on the operation (&lt;a href="https://learn.microsoft.com/en-us/azure/ai-services/computer-vision/faq?WT.mc_id=AI-MVP-5004971"&gt;Source: Microsoft Docs&lt;/a&gt;). If you exceed the rate limit, the service responds with an HTTP &lt;code&gt;429&lt;/code&gt; (Too Many Requests) error.&lt;/p&gt;

&lt;p&gt;For our application, it is best to use the standard tier during embedding generation and to cap requests at roughly 10 per second, leaving comfortable headroom below the limit.&lt;/p&gt;
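&lt;p&gt;One simple way to respect this limit on the client side is to enforce a minimum interval between consecutive requests. The sketch below is illustrative (the &lt;code&gt;RequestThrottler&lt;/code&gt; class is not part of any Azure SDK):&lt;/p&gt;

```python
import time

class RequestThrottler:
    """Spaces out calls so a client stays under a requests-per-second cap."""

    def __init__(self, requests_per_second: float = 10.0):
        self.min_interval = 1.0 / requests_per_second
        self.last_request = 0.0

    def wait(self) -> None:
        # Sleep only if the previous request was too recent.
        remaining = self.min_interval - (time.monotonic() - self.last_request)
        if remaining > 0:
            time.sleep(remaining)
        self.last_request = time.monotonic()

throttler = RequestThrottler(requests_per_second=10)
# Call throttler.wait() before each Vectorize Image API request.
```

&lt;p&gt;Calling &lt;code&gt;wait()&lt;/code&gt; before each embedding request keeps the client at or below the chosen rate, regardless of how fast the surrounding loop runs.&lt;/p&gt;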

&lt;h3&gt;
  
  
  Generate the dataset
&lt;/h3&gt;

&lt;p&gt;After computing the vector embeddings for all images in the dataset, we update the dataset by attaching the vector embedding to each image's record. In the &lt;code&gt;generate_dataset&lt;/code&gt; function, the &lt;code&gt;merge&lt;/code&gt; method of &lt;code&gt;pandas.DataFrame&lt;/code&gt; combines the original dataset with the embeddings using a database-style join.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_dataset&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Adds the corresponding vector embedding to each row of the original dataset
    and saves the updated dataset as a CSV file.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;dataset_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dataset_filepath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sep&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;embeddings_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;embeddings_filepath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;names&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;IMAGE_FILE_CSV_COLUMN_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;EMBEDDINGS_CSV_COLUMN_NAME&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;final_dataset_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dataset_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;merge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;embeddings_df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;how&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inner&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;on&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;IMAGE_FILE_CSV_COLUMN_NAME&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;final_dataset_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;final_dataset_filepath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sep&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
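&lt;p&gt;As a quick illustration of the database-style join, the toy example below mirrors the &lt;code&gt;merge&lt;/code&gt; call above; the column names &lt;code&gt;image_file&lt;/code&gt; and &lt;code&gt;vector&lt;/code&gt; are illustrative stand-ins for the CSV column-name constants. Rows without a matching embedding are dropped by the inner join.&lt;/p&gt;

```python
import pandas as pd

# Original dataset: three images, one of which has no embedding yet.
dataset_df = pd.DataFrame(
    {"image_file": ["a.jpg", "b.jpg", "c.jpg"], "title": ["A", "B", "C"]}
)
# Embeddings computed for only two of the images.
embeddings_df = pd.DataFrame(
    {"image_file": ["a.jpg", "b.jpg"], "vector": ["[0.1, 0.2]", "[0.3, 0.4]"]}
)

# Inner join keeps only the images that have an embedding.
merged = dataset_df.merge(embeddings_df, how="inner", on="image_file")
print(merged["image_file"].tolist())  # ['a.jpg', 'b.jpg']
```

&lt;p&gt;The result has the columns of the original dataset plus the embedding column, with &lt;code&gt;c.jpg&lt;/code&gt; excluded because it had no embedding.&lt;/p&gt;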



&lt;h2&gt;
  
  
  Next steps
&lt;/h2&gt;

&lt;p&gt;In this post, we computed vector embeddings for a set of images featuring paintings using the Azure AI Vision Vectorize Image API. The code shared here serves as a reference, and you can customize it to suit your particular use case.&lt;/p&gt;

&lt;p&gt;Here are some additional learning resources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/azure/cognitive-services/computer-vision/concept-image-retrieval?WT.mc_id=AI-MVP-5004971"&gt;Azure AI Vision Multi-modal embeddings - Microsoft Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/en-us/azure/ai-services/computer-vision/how-to/image-retrieval?WT.mc_id=AI-MVP-5004971"&gt;Call the multi-modal embeddings APIs – Microsoft Docs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;👋 &lt;strong&gt;Hi, I am Foteini Savvidou!&lt;/strong&gt;&lt;br&gt;
An Electrical and Computer Engineer and Microsoft AI MVP (Most Valuable Professional) from Greece.&lt;/p&gt;

&lt;p&gt;🌈 &lt;a href="https://www.linkedin.com/in/foteini-savvidou"&gt;LinkedIn&lt;/a&gt; | &lt;a href="https://sfoteini.github.io/"&gt;Blog&lt;/a&gt; | &lt;a href="https://github.com/sfoteini"&gt;GitHub&lt;/a&gt;&lt;/p&gt;

</description>
      <category>azure</category>
      <category>ai</category>
      <category>tutorial</category>
      <category>embeddings</category>
    </item>
    <item>
      <title>Use the Azure AI Vision multi-modal embeddings API for image retrieval</title>
      <dc:creator>Foteini Savvidou</dc:creator>
      <pubDate>Mon, 22 Jan 2024 17:58:04 +0000</pubDate>
      <link>https://forem.com/sfoteini/use-the-azure-ai-vision-multi-modal-embeddings-api-for-image-retrieval-3p11</link>
      <guid>https://forem.com/sfoteini/use-the-azure-ai-vision-multi-modal-embeddings-api-for-image-retrieval-3p11</guid>
      <description>&lt;p&gt;Welcome to a new learning series about image similarity search with pgvector, an open-source vector similarity search extension for PostgreSQL databases.&lt;/p&gt;

&lt;p&gt;I find vector search an intriguing technology, and I’ve decided to explore it! Throughout this series, I will discuss the basic concepts of vector search, introduce you to the multi-modal embeddings API of Azure AI Vision, and guide you in building an image similarity search application using Azure Cosmos DB for PostgreSQL.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Conventional search systems rely on exact matches on properties like keywords, tags, or other metadata, lexical similarity, or the frequency of word occurrences to retrieve similar items. Recently, vector similarity search has transformed the search process. It leverages machine learning to capture the meaning of data, allowing you to find similar items based on their content. The key idea behind vector search involves converting unstructured data, such as text, images, videos, and audio, into high-dimensional vectors (also known as embeddings) and applying nearest neighbor algorithms to find similar data.&lt;/p&gt;

&lt;p&gt;In this tutorial, you learn how to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Describe vector embeddings and vector similarity search.&lt;/li&gt;
&lt;li&gt;Use the multi-modal embeddings API of Azure AI Vision for generating vectors for images and text.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;To proceed with this tutorial, ensure that you have the following prerequisites installed and configured:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;An Azure subscription - Create an &lt;a href="https://azure.microsoft.com/free/?WT.mc_id=AI-MVP-5004971" rel="noopener noreferrer"&gt;Azure free account&lt;/a&gt; or an &lt;a href="https://azure.microsoft.com/free/students/?WT.mc_id=AI-MVP-5004971" rel="noopener noreferrer"&gt;Azure for Students account&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Python 3.x, Visual Studio Code, Jupyter Notebook, and Jupyter Extension for Visual Studio Code.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Concepts
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Vector embeddings
&lt;/h3&gt;

&lt;p&gt;Comparing unstructured data is challenging, in contrast to numerical and structured data, which can be easily compared by performing mathematical operations. What if we could convert unstructured data, such as text and images, into a numerical representation? We could then calculate their similarity using standard mathematical methods.&lt;/p&gt;

&lt;p&gt;These numerical representations are called vector embeddings. An embedding is a high-dimensional and dense vector that summarizes the information contained in the original data. Vector embeddings can be computed using machine learning algorithms that capture the meaning of the data, recognize patterns, and identify similarities between the data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fuse-the-azure-ai-vision-multi-modal-embeddings-api-for-image-retrieval%2Fword-embeddings_huced6e72325e780413be6b89abd730b85_16841_392x334_fit_q95_h2_lanczos_3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fuse-the-azure-ai-vision-multi-modal-embeddings-api-for-image-retrieval%2Fword-embeddings_huced6e72325e780413be6b89abd730b85_16841_392x334_fit_q95_h2_lanczos_3.webp" alt="Visualization of word embeddings in a 2-dimensional vector space. Words that are semantically similar are located close together, while dissimilar words are placed farther apart."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Visualization of word embeddings in a 2-dimensional vector space. Words that are semantically similar are located close together, while dissimilar words are placed farther apart.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Vector similarity
&lt;/h3&gt;

&lt;p&gt;The numerical distance between two embeddings, or equivalently, their proximity in the vector space, represents their similarity. Vector similarity is commonly calculated using distance metrics such as Euclidean distance, inner product, or cosine distance.&lt;/p&gt;

&lt;p&gt;Cosine similarity is the metric used by Azure AI Vision. It measures the angle between two vectors and is unaffected by their magnitudes. Mathematically, cosine similarity is the cosine of the angle between two vectors, which equals the dot product of the vectors divided by the product of their magnitudes.&lt;/p&gt;
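&lt;p&gt;As a quick sanity check of the formula, here is a hand-worked example with two 2-dimensional vectors (unrelated to the 1024-dimensional embeddings produced by Azure AI Vision):&lt;/p&gt;

```python
from math import sqrt

v1 = [1.0, 0.0]
v2 = [0.8, 0.6]

# Dot product of the two vectors: 1.0 * 0.8 + 0.0 * 0.6 = 0.8
dot = sum(a * b for a, b in zip(v1, v2))
# Magnitudes: both vectors have length 1 (they are unit vectors).
norm1 = sqrt(sum(a * a for a in v1))
norm2 = sqrt(sum(b * b for b in v2))

cosine = dot / (norm1 * norm2)
print(cosine)  # about 0.8, i.e. an angle of roughly 37 degrees
```

&lt;p&gt;A value of 1 means the vectors point in the same direction, 0 means they are orthogonal, and values in between indicate partial similarity.&lt;/p&gt;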

&lt;p&gt;Vector similarity can be used in various industry applications, including recommender systems, fraud detection, text classification, and image recognition. For example, systems can use vector similarities between products to identify similar products and create recommendations based on a user's preferences.&lt;/p&gt;

&lt;h3&gt;
  
  
  Vector similarity search
&lt;/h3&gt;

&lt;p&gt;A vector search system works by comparing the vector embedding of a user’s query with a set of pre-stored vector embeddings to find a list of vectors that are the most similar to the query vector. The diagram below illustrates this workflow.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fuse-the-azure-ai-vision-multi-modal-embeddings-api-for-image-retrieval%2Fvector_search_flow_hu7f1fcfedeafde237e8a715b3613b8c91_181621_1706x803_fit_q95_h2_lanczos_3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fuse-the-azure-ai-vision-multi-modal-embeddings-api-for-image-retrieval%2Fvector_search_flow_hu7f1fcfedeafde237e8a715b3613b8c91_181621_1706x803_fit_q95_h2_lanczos_3.webp" alt="Overview of vector similarity search flow."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Overview of vector similarity search flow.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Vector embeddings are usually stored in a vector database, which is a specialized type of database that is optimized for storing and querying vectors with a large number of dimensions. You will learn more about vector databases in one of the following posts in this series.&lt;/p&gt;
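&lt;p&gt;For a small collection, the workflow in the diagram can be sketched as a brute-force nearest-neighbor scan: score every stored vector against the query and keep the best matches. The names and 2-dimensional vectors below are made up for illustration; a vector database replaces this linear scan with indexed, approximate search at scale.&lt;/p&gt;

```python
from math import sqrt

def cosine_similarity(v1, v2):
    dot = sum(a * b for a, b in zip(v1, v2))
    norms = sqrt(sum(a * a for a in v1)) * sqrt(sum(b * b for b in v2))
    return dot / norms

def top_k(query, stored, k=2):
    """Return the k stored items most similar to the query vector."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in stored.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]

stored = {
    "house": [0.9, 0.1],
    "boat": [0.1, 0.9],
    "cottage": [0.8, 0.3],
}
print(top_k([1.0, 0.0], stored, k=2))
```

&lt;p&gt;The query vector here points along the first axis, so the items whose vectors lean the same way rank highest.&lt;/p&gt;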

&lt;h2&gt;
  
  
  Create vector embeddings with Azure AI Vision
&lt;/h2&gt;

&lt;p&gt;Azure AI Vision provides two APIs for vectorizing image and text queries: the Vectorize Image API and the Vectorize Text API. This vectorization converts images and text into coordinates in a 1024-dimensional vector space, enabling users to search a collection of images using text and/or images without the need for metadata, such as image tags, labels, or captions.&lt;/p&gt;

&lt;p&gt;Let's learn how the multi-modal embeddings APIs work.&lt;/p&gt;

&lt;h3&gt;
  
  
  Create an Azure AI Vision resource
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Open the Azure CLI.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Create a resource group using the following command:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;az group create &lt;span class="nt"&gt;--name&lt;/span&gt; your-group-name &lt;span class="nt"&gt;--location&lt;/span&gt; your-location
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Create an Azure AI Vision resource in the resource group that you created, using the following command:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;az cognitiveservices account create &lt;span class="nt"&gt;--name&lt;/span&gt; ai-vision-resource-name &lt;span class="nt"&gt;--resource-group&lt;/span&gt; your-group-name &lt;span class="nt"&gt;--kind&lt;/span&gt; ComputerVision &lt;span class="nt"&gt;--sku&lt;/span&gt; S1 &lt;span class="nt"&gt;--location&lt;/span&gt; your-location &lt;span class="nt"&gt;--yes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: The multi-modal embeddings APIs are available in the following regions: East US, France Central, Korea Central, North Europe, Southeast Asia, West Europe, West US.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Before using the multi-modal embeddings APIs, you need to store the key and the endpoint of your Azure AI Vision resource in an environment (&lt;em&gt;.env&lt;/em&gt;) file.&lt;/p&gt;
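&lt;p&gt;For example, the &lt;em&gt;.env&lt;/em&gt; file might look like the following (placeholder values; the variable names &lt;code&gt;VISION_ENDPOINT&lt;/code&gt; and &lt;code&gt;VISION_KEY&lt;/code&gt; match those read with &lt;code&gt;os.getenv&lt;/code&gt; in the code below):&lt;/p&gt;

```shell
# .env - placeholder values, never commit real keys to source control
VISION_ENDPOINT="https://your-resource-name.cognitiveservices.azure.com/"
VISION_KEY="your-azure-ai-vision-key"
```

&lt;p&gt;You can find both values on the "Keys and Endpoint" page of your resource in the Azure portal.&lt;/p&gt;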

&lt;h3&gt;
  
  
  Use the Vectorize Image API
&lt;/h3&gt;

&lt;p&gt;Let's review the following example. Given the filename of an image, the &lt;code&gt;get_image_embedding&lt;/code&gt; function sends a POST request to the &lt;code&gt;retrieval:vectorizeImage&lt;/code&gt; API, with the binary image data in the request body. The API returns a JSON object containing the vector embedding of the image.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="c1"&gt;# Load environment variables
&lt;/span&gt;&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;endpoint&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;VISION_ENDPOINT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;computervision/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;VISION_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_image_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Vectorize Image API
&lt;/span&gt;    &lt;span class="n"&gt;version&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;?api-version=2023-02-01-preview&amp;amp;modelVersion=latest&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;vectorize_img_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;endpoint&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieval:vectorizeImage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;version&lt;/span&gt;

    &lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/octet-stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Ocp-Apim-Subscription-Key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vectorize_img_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;image_vector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;image_vector&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;An error occurred while processing &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. Error code: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;An error occurred while processing &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

&lt;span class="n"&gt;image_filename&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;images/image (1).jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;image_vector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_image_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_filename&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Info&lt;/strong&gt;: To vectorize a remote image, you would put the URL of the image in the request body.&lt;/p&gt;
&lt;/blockquote&gt;
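&lt;p&gt;Following the note above, a minimal sketch of building such a request: instead of binary data, the body is JSON containing the image URL, and the content type changes accordingly. The helper name is illustrative; the API version matches the one used earlier.&lt;/p&gt;

```python
import json

def build_remote_image_request(endpoint: str, key: str, image_url: str):
    """Assemble the URL, query parameters, headers, and JSON body
    for vectorizing a remote image."""
    url = endpoint + "retrieval:vectorizeImage"
    params = {"api-version": "2023-02-01-preview", "modelVersion": "latest"}
    headers = {
        "Content-type": "application/json",
        "Ocp-Apim-Subscription-Key": key,
    }
    body = json.dumps({"url": image_url})
    return url, params, headers, body

url, params, headers, body = build_remote_image_request(
    "https://example.cognitiveservices.azure.com/computervision/",
    "your-key",
    "https://example.com/painting.jpg",
)
# requests.post(url, params=params, data=body, headers=headers)
# would then return the embedding JSON, as in get_image_embedding.
```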

&lt;h3&gt;
  
  
  Use the Vectorize Text API
&lt;/h3&gt;

&lt;p&gt;Similar to the example above, the &lt;code&gt;get_text_embedding&lt;/code&gt; function sends a POST request to the &lt;code&gt;retrieval:vectorizeText&lt;/code&gt; API, with the text prompt included as JSON in the request body.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_text_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;# Image retrieval API
&lt;/span&gt;    &lt;span class="n"&gt;version&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;?api-version=2023-02-01-preview&amp;amp;modelVersion=latest&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;vectorize_txt_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;endpoint&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieval:vectorizeText&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;version&lt;/span&gt;

    &lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Content-type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Ocp-Apim-Subscription-Key&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vectorize_txt_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;text_vector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;vector&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;text_vector&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;An error occurred while processing the prompt &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;. Error code: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;An error occurred while processing the prompt &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

&lt;span class="n"&gt;text_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a blue house&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;text_vector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_text_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text_prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Calculate image similarity
&lt;/h3&gt;

&lt;p&gt;The following code calculates the cosine similarity between the vector of the image and the vector of the text prompt.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dot&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;numpy.linalg&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;norm&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_cosine_similarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vector1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vector2&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vector1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vector2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;norm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vector1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nf"&gt;norm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vector2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;similarity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_cosine_similarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_vector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text_vector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;similarity&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Next steps
&lt;/h2&gt;

&lt;p&gt;In this article, you’ve learned the basics of vector search and explored the multi-modal embeddings API of Azure AI Vision. In the next post, you will create vector embeddings for a collection of images of paintings.&lt;/p&gt;

&lt;p&gt;Here are some helpful learning resources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/azure/cognitive-services/computer-vision/concept-image-retrieval?WT.mc_id=AI-MVP-5004971" rel="noopener noreferrer"&gt;Azure AI Vision Multi-modal embeddings - Microsoft Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/en-us/training/modules/improve-search-results-vector-search/2-vector-search?WT.mc_id=AI-MVP-5004971" rel="noopener noreferrer"&gt;What is vector search? - Microsoft Learn&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/en-us/training/modules/improve-search-results-vector-search/4-understand-embedding?WT.mc_id=AI-MVP-5004971" rel="noopener noreferrer"&gt;Understand embedding - Microsoft Learn&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;👋 &lt;strong&gt;Hi, I am Foteini Savvidou!&lt;/strong&gt;&lt;br&gt;
An Electrical and Computer Engineer and Microsoft AI MVP (Most Valuable Professional) from Greece.&lt;/p&gt;

&lt;p&gt;🌈 &lt;a href="https://www.linkedin.com/in/foteini-savvidou" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | &lt;a href="https://sfoteini.github.io/" rel="noopener noreferrer"&gt;Blog&lt;/a&gt; | &lt;a href="https://github.com/sfoteini" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/p&gt;

</description>
      <category>azure</category>
      <category>ai</category>
      <category>tutorial</category>
      <category>embeddings</category>
    </item>
    <item>
      <title>Building a vector similarity search app with Azure AI Vision and PostgreSQL</title>
      <dc:creator>Foteini Savvidou</dc:creator>
      <pubDate>Sun, 05 Nov 2023 11:22:51 +0000</pubDate>
      <link>https://forem.com/sfoteini/building-a-vector-similarity-search-app-with-azure-ai-vision-and-postgresql-4fgf</link>
      <guid>https://forem.com/sfoteini/building-a-vector-similarity-search-app-with-azure-ai-vision-and-postgresql-4fgf</guid>
      <description>&lt;p&gt;Have you ever found yourself in a situation where you remember what happened in a movie but struggle to recall its name and can’t come up with the right words to search for it? Or, maybe you’ve had that moment when you have a picture of a product you wish to purchase but aren't sure how to find it online.&lt;/p&gt;

&lt;p&gt;Unlike traditional search systems that rely on mentions of keywords, tags, or other metadata, vector search leverages machine learning to capture the meaning of data, allowing you to search by what you mean. Consider the second scenario from the paragraph above as an example. In a vector search system, you can simply use the image of a product to find similar ones instead of struggling to come up with the proper search words. This method is also known as "reverse image search."&lt;/p&gt;

&lt;p&gt;Vector search works by transforming data, such as text, images, videos, and audio, into numerical representations called vector embeddings and then applying nearest neighbor algorithms to find similar data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;A few days ago, I had a wonderful time delivering a session about building a vector search system using Azure Cosmos DB for PostgreSQL and Azure AI Vision at the monthly Azure Cosmos DB User Group’s virtual show hosted by Jay Gordon and Microsoft Reactor. In this article, I will document the main points of my session and provide you with learning resources and code examples.&lt;/p&gt;

&lt;p&gt;You will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Understand the concept of vector embeddings and how a vector similarity search system works.&lt;/li&gt;
&lt;li&gt;Use the Azure AI Vision Image Retrieval APIs for converting images and text into a vector representation.&lt;/li&gt;
&lt;li&gt;Build a vector search system with Azure Cosmos DB for PostgreSQL and Azure AI Vision.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Before you start, follow these steps to set up your workspace:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sign up for either an &lt;a href="https://azure.microsoft.com/free/?WT.mc_id=AI-MVP-5004971"&gt;Azure free account&lt;/a&gt; or an &lt;a href="https://azure.microsoft.com/free/students/?WT.mc_id=AI-MVP-5004971"&gt;Azure for Students account&lt;/a&gt;. If you already have an active subscription, you can use it.&lt;/li&gt;
&lt;li&gt;Install Python 3.x, Visual Studio Code, Jupyter Notebook, and the Jupyter Extension for Visual Studio Code.&lt;/li&gt;
&lt;li&gt;Create a Cognitive Services resource in the Azure portal.&lt;/li&gt;
&lt;li&gt;Create an &lt;a href="https://learn.microsoft.com/azure/cosmos-db/postgresql/quickstart-create-portal?tabs=portal-search&amp;amp;WT.mc_id=AI-MVP-5004971"&gt;Azure Cosmos DB for PostgreSQL cluster&lt;/a&gt; and an &lt;a href="https://learn.microsoft.com/azure/storage/blobs/storage-quickstart-blobs-portal?WT.mc_id=AI-MVP-5004971"&gt;Azure Blob Storage container&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What are vector embeddings?
&lt;/h2&gt;

&lt;p&gt;In simple terms, vector embeddings are numerical representations of data, such as images, text, videos, and audio. These embeddings are dense, high-dimensional vectors, with each dimension containing information about the original content. By translating data into vectors, computers can capture the meaning of the data and understand the semantic similarity between two objects. We can quantify the semantic similarity of two objects by their proximity in a vector space.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpkufewhvd2nfht69ru5v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpkufewhvd2nfht69ru5v.png" alt="Vector embeddings and vector similarity" width="800" height="510"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can measure the semantic similarity by using a distance metric such as Euclidean distance, inner product, or cosine distance. In the following examples, we will use the cosine similarity, which is defined as the cosine of the angle between the two vectors.&lt;/p&gt;
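&lt;p&gt;To make these metrics concrete, here is a minimal NumPy sketch that computes all three for a pair of illustrative toy vectors (real embeddings have far more dimensions):&lt;/p&gt;

```python
import numpy as np

# Two illustrative vectors; b points in the same direction as a.
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])

# Euclidean distance: 0 means the vectors are identical.
euclidean = np.linalg.norm(a - b)

# Inner (dot) product: larger means more aligned.
inner = np.dot(a, b)

# Cosine similarity: 1 means the vectors point in the same direction.
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(euclidean, inner, cosine)
```

&lt;p&gt;Note that the cosine similarity of &lt;code&gt;a&lt;/code&gt; and &lt;code&gt;b&lt;/code&gt; is 1 even though their Euclidean distance is nonzero: cosine similarity compares direction only, which is why it pairs well with normalized embeddings.&lt;/p&gt;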

&lt;p&gt;There are numerous embedding models available, including OpenAI, Hugging Face, and Azure AI Vision. Azure AI Vision (formerly known as Azure Computer Vision) provides two Image Retrieval APIs for vectorizing image and text queries: the Vectorize Image API and the Vectorize Text API. This vectorization converts images and text into coordinates in a 1024-dimensional vector space, enabling users to search a collection of images using text and/or images without the need for metadata, such as image tags, labels, or captions.&lt;/p&gt;
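&lt;p&gt;As a rough sketch of how the Vectorize Text API can be called from Python (the endpoint, key, and &lt;code&gt;api-version&lt;/code&gt; below are placeholders that may differ from the current API; check the Azure AI Vision documentation), the request can be assembled like this:&lt;/p&gt;

```python
import json

# Placeholder values - replace with your own resource's endpoint and key.
ENDPOINT = "https://YOUR-RESOURCE.cognitiveservices.azure.com"
KEY = "YOUR-KEY"

def build_vectorize_text_request(endpoint, key, text):
    """Assemble the URL, headers, and JSON body for the Vectorize Text API.

    The api-version below was a preview version at the time of writing and
    may need updating.
    """
    url = f"{endpoint}/computervision/retrieval:vectorizeText?api-version=2023-02-01-preview"
    headers = {
        "Content-Type": "application/json",
        "Ocp-Apim-Subscription-Key": key,
    }
    body = json.dumps({"text": text})
    return url, headers, body

url, headers, body = build_vectorize_text_request(ENDPOINT, KEY, "a blue house")
# To send it: response = requests.post(url, headers=headers, data=body)
# The 1024-dimensional embedding is returned in response.json()["vector"].
```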

&lt;h2&gt;
  
  
  How does vector similarity search work?
&lt;/h2&gt;

&lt;p&gt;A vector search system works by comparing the vector embedding of a user’s query with a set of pre-stored vector embeddings to find a list of vectors that are the most similar to the query vector. The diagram below illustrates the workflow.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fihwt8fi4dz8v6724ntwi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fihwt8fi4dz8v6724ntwi.png" alt="Overview of vector similarity search flow" width="800" height="376"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Vector embeddings can be stored in a vector database, which is a specialized type of database optimized for storing and querying vectors with a large number of dimensions.&lt;/p&gt;
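&lt;p&gt;At its core, this query step is a nearest-neighbor search: rank the stored vectors by similarity to the query vector and keep the top results. Here is a brute-force, in-memory sketch with NumPy and toy two-dimensional vectors (a real system would delegate this to a vector database):&lt;/p&gt;

```python
import numpy as np

def top_k_similar(query, vectors, k=3):
    """Return the indices of the k stored vectors most similar to the query,
    ranked by cosine similarity (brute force, for illustration only)."""
    norms = np.linalg.norm(vectors, axis=1) * np.linalg.norm(query)
    similarities = vectors @ query / norms
    return np.argsort(similarities)[::-1][:k]

# Toy "database" of four stored embeddings.
stored = np.array([
    [1.0, 0.0],   # index 0: same direction as the query
    [0.0, 1.0],   # index 1: orthogonal to the query
    [0.9, 0.1],   # index 2: close to the query
    [-1.0, 0.0],  # index 3: opposite direction
])
query = np.array([1.0, 0.0])

print(top_k_similar(query, stored, k=2))  # → [0 2]
```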

&lt;p&gt;In the following video, you can learn more about vector embeddings and vector similarity search with Azure Cosmos DB for PostgreSQL and Azure AI Vision and walk through the Jupyter Notebooks that are available on my &lt;a href="https://github.com/sfoteini/image-vector-search-azure-postgresql"&gt;GitHub repository&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/EfUACR34yoo"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Python code samples
&lt;/h2&gt;

&lt;p&gt;In my &lt;a href="https://github.com/sfoteini/image-vector-search-azure-postgresql"&gt;GitHub repository&lt;/a&gt;, you can find some Jupyter Notebooks to help you gain hands-on experience with the concepts introduced in the above video. You may want to refer to the README of the project for instructions on how to set up your Azure resources.&lt;/p&gt;

&lt;h3&gt;
  
  
  Quickstart
&lt;/h3&gt;

&lt;p&gt;In the &lt;a href="https://github.com/sfoteini/image-vector-search-azure-postgresql/tree/main/quickstart"&gt;quickstart&lt;/a&gt;, you will explore the Image Retrieval APIs of Azure AI Vision and the basics of the pgvector extension. You will build a simple app to search a collection of images from a wide range of natural scenes. The images were taken from &lt;a href="https://www.kaggle.com/datasets/nitishabharathi/scene-classification"&gt;Kaggle&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example scenario: Vector search with Azure Cosmos DB for PostgreSQL, Azure Blob Storage and Azure AI Vision
&lt;/h3&gt;

&lt;p&gt;In this &lt;a href="https://github.com/sfoteini/image-vector-search-azure-postgresql/tree/main/vector-search-postgresql-blob-storage"&gt;extended scenario&lt;/a&gt;, you will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use Azure Blob Storage to store approximately 8.2k images of paintings that were taken from the &lt;a href="https://www.robots.ox.ac.uk/~vgg/data/paintings/"&gt;Paintings Dataset&lt;/a&gt; of the Visual Geometry Group - University of Oxford.&lt;/li&gt;
&lt;li&gt;Generate the vector embedding of each image in the container using the Vectorize Image API of Azure AI Vision.&lt;/li&gt;
&lt;li&gt;Store the vector embeddings along with a reference to the corresponding image file in a PostgreSQL table.&lt;/li&gt;
&lt;li&gt;Execute SQL queries to search for images that are most similar to a reference image or a text prompt.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;p&gt;If you’d like to dive deeper into this topic, here are some helpful resources.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/azure/cognitive-services/computer-vision/concept-image-retrieval?WT.mc_id=AI-MVP-5004971"&gt;Azure Computer Vision Image Retrieval – Microsoft Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/azure/cosmos-db/postgresql/howto-use-pgvector?WT.mc_id=AI-MVP-5004971"&gt;How to use pgvector on Azure Cosmos DB for PostgreSQL – Microsoft Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/pgvector/pgvector"&gt;Official GitHub repository of the pgvector extension&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>azure</category>
      <category>ai</category>
      <category>vectordatabase</category>
      <category>postgres</category>
    </item>
    <item>
      <title>Automate customer reviews processing with Form Recognizer and Azure OpenAI</title>
      <dc:creator>Foteini Savvidou</dc:creator>
      <pubDate>Mon, 10 Jul 2023 16:07:58 +0000</pubDate>
      <link>https://forem.com/sfoteini/automate-customer-reviews-processing-with-form-recognizer-and-azure-openai-5063</link>
      <guid>https://forem.com/sfoteini/automate-customer-reviews-processing-with-form-recognizer-and-azure-openai-5063</guid>
      <description>&lt;p&gt;Organizations of all types rely on automated document processing and data extraction to streamline their operations. This article outlines a potential solution for automating PDF form processing, utilizing Azure Form Recognizer for data extraction and Azure OpenAI for "intelligent" data enrichment.&lt;/p&gt;

&lt;p&gt;Data enrichment uses Artificial Intelligence (AI) to extract information, uncover patterns, and gain a deeper understanding of the data. This can be achieved through techniques such as key phrase extraction, named entity recognition, sentiment analysis, opinion mining, and custom models to identify essential information. To enrich the data, you can use the pretrained models of the Azure Cognitive Services for Language or train and deploy a custom model in Azure Machine Learning. In this post, I’ll demonstrate how to utilize the Davinci model of the Azure OpenAI Service to perform sentiment analysis and extract key phrases from customer service review forms.&lt;/p&gt;
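&lt;p&gt;To make the enrichment step concrete, here is a hedged Python sketch of such a completions request. The endpoint, deployment name, key, and &lt;code&gt;api-version&lt;/code&gt; are placeholders, and the prompt is a simplified variant of the one used in the Logic App later in this article:&lt;/p&gt;

```python
import json

# Placeholder values - use your own resource's endpoint, deployment name, and key.
ENDPOINT = "https://YOUR-RESOURCE.openai.azure.com"
DEPLOYMENT = "YOUR-DEPLOYMENT"
KEY = "YOUR-KEY"

def build_completion_request(review):
    """Assemble a request that asks the model for the sentiment and key
    phrases of a customer review, returned as JSON."""
    url = (
        f"{ENDPOINT}/openai/deployments/{DEPLOYMENT}/completions"
        "?api-version=2022-12-01"  # the API version may differ; check the docs
    )
    headers = {"Content-Type": "application/json", "api-key": KEY}
    prompt = (
        "Extract the sentiment (positive, negative, neutral, mixed) and the "
        "key phrases from the review below. Answer in JSON using the keys "
        "Sentiment and KeyPhrases.\n"
        f'Review: "{review}"'
    )
    body = json.dumps({"prompt": prompt, "temperature": 1, "max_tokens": 100})
    return url, headers, body

url, headers, body = build_completion_request("Great support, very fast response!")
# Send with: response = requests.post(url, headers=headers, data=body)
```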

&lt;p&gt;In the two-part series "Automate document processing with Form Recognizer and Logic Apps," you learned how to train custom models in Azure Form Recognizer for extracting key-value pairs from documents and build an end-to-end form processing solution using Form Recognizer, Logic Apps, Azure Cosmos DB, and Power BI.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://sfoteini.github.io/blog/automate-document-processing-with-form-recognizer-and-logic-apps-part-1/" rel="noopener noreferrer"&gt;Part 1: Create a custom template model in Azure Form Recognizer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://sfoteini.github.io/blog/automate-document-processing-with-form-recognizer-and-logic-apps-part-2/" rel="noopener noreferrer"&gt;Part 2: End-to-end document processing automation solution&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this article, we will extend this pipeline by incorporating the Azure OpenAI service to enrich the extracted data. You will learn how to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create an Azure OpenAI resource and deploy a model.&lt;/li&gt;
&lt;li&gt;Build a pipeline in Logic Apps to post a request to the Azure OpenAI endpoint and then receive and evaluate the response.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Access to the Azure OpenAI service is currently limited. You can request access to the service by filling out the &lt;a href="https://aka.ms/oaiapply" rel="noopener noreferrer"&gt;application form&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;The following architecture diagram illustrates the main components involved in the "intelligent" form processing solution that we are building and the information flow. In this post, we will focus on how to integrate the Azure OpenAI service into the pipeline that was built in the previous articles.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-customer-reviews-processing-with-form-recognizer-and-azure-openai%2Farchitecture_hu68eaf647221ca6aaba16f880c08546c1_57712_790x401_fit_q95_h2_box_3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-customer-reviews-processing-with-form-recognizer-and-azure-openai%2Farchitecture_hu68eaf647221ca6aaba16f880c08546c1_57712_790x401_fit_q95_h2_box_3.webp" alt="Architecture diagram that shows an architecture for automating form processing."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Dataflow
&lt;/h3&gt;

&lt;p&gt;The information flow corresponding to the above architecture diagram is described as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;PDF forms or scanned images are (manually or programmatically) uploaded to a container in Azure Storage Account.&lt;/li&gt;
&lt;li&gt;Whenever a form is uploaded to the specified container, an Event Grid event will trigger the Logic App to start processing the form.&lt;/li&gt;
&lt;li&gt;The Logic App sends the URL of the file to the Form Recognizer API and receives the response data.&lt;/li&gt;
&lt;li&gt;The Logic App posts a request to the Azure OpenAI service to analyze the sentiment and extract key phrases from the customer's review, and then receives the response data.&lt;/li&gt;
&lt;li&gt;The extracted data are saved into a NoSQL database in Azure Cosmos DB.&lt;/li&gt;
&lt;li&gt;Power BI is connected to Azure Cosmos DB to ingest the extracted data and provide dashboards.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Create an Azure OpenAI resource
&lt;/h2&gt;

&lt;p&gt;You will create an Azure OpenAI resource through the Azure portal.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Sign in to the Azure portal and search for &lt;strong&gt;OpenAI&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;On the Create Azure OpenAI page, provide the following information:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Subscription&lt;/strong&gt;: Your Azure subscription.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource group&lt;/strong&gt;: Select an existing resource group or create a new one.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Region&lt;/strong&gt;: Choose any available region, for example, &lt;strong&gt;West Europe&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Name&lt;/strong&gt;: Enter a unique name.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pricing tier&lt;/strong&gt;: &lt;strong&gt;Standard (S0)&lt;/strong&gt;.
&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-customer-reviews-processing-with-form-recognizer-and-azure-openai%2Fcreate-openai_hu9c72d32942d9399bae87bfb94160de88_35756_741x550_fit_q95_h2_box_3.webp" alt="Create an OpenAI resource."&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Select &lt;strong&gt;Review + Create&lt;/strong&gt; and then &lt;strong&gt;Create&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Once the deployment is complete, navigate to the &lt;a href="https://oai.azure.com/" rel="noopener noreferrer"&gt;Azure OpenAI Studio&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Deploy a model
&lt;/h2&gt;

&lt;p&gt;Before you can use the Azure OpenAI service to generate text, you need to deploy a model. There are several available &lt;a href="https://learn.microsoft.com/azure/cognitive-services/openai/concepts/models?WT.mc_id=AI-MVP-5004971" rel="noopener noreferrer"&gt;models&lt;/a&gt; in the Azure OpenAI Studio, each of which is tailored to a specific use case. In this post, you will deploy the &lt;code&gt;text-davinci-003&lt;/code&gt; model.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In the Azure OpenAI Studio, select &lt;strong&gt;Deployments&lt;/strong&gt; under &lt;strong&gt;Management&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Click on &lt;strong&gt;+ Create new deployment&lt;/strong&gt;, and deploy a new model with the following settings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model name&lt;/strong&gt;: &lt;code&gt;text-davinci-003&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deployment name&lt;/strong&gt;: Choose a memorable name for your deployment.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Once the model is deployed, you can test it in the &lt;strong&gt;Completions playground&lt;/strong&gt;.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-customer-reviews-processing-with-form-recognizer-and-azure-openai%2Fopenai-playground_hu6b54be682893c3615912aa7714aa9686_48772_1098x622_fit_q95_h2_box_3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-customer-reviews-processing-with-form-recognizer-and-azure-openai%2Fopenai-playground_hu6b54be682893c3615912aa7714aa9686_48772_1098x622_fit_q95_h2_box_3.webp" alt="The Completions playground of the Azure OpenAI Studio."&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In the &lt;strong&gt;Completions playground&lt;/strong&gt;, click &lt;strong&gt;View code&lt;/strong&gt;. In the &lt;strong&gt;Sample Code&lt;/strong&gt; window, save the endpoint of your deployed model. You will need this URL in the next step.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-customer-reviews-processing-with-form-recognizer-and-azure-openai%2Fopenai-endpoint_hu718247121b948338b593b880ed25474b_17320_549x201_fit_q95_h2_box_3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-customer-reviews-processing-with-form-recognizer-and-azure-openai%2Fopenai-endpoint_hu718247121b948338b593b880ed25474b_17320_549x201_fit_q95_h2_box_3.webp" alt="The endpoint of a deployed model in Azure OpenAI."&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Build the workflow
&lt;/h2&gt;

&lt;p&gt;In a previous article, you built the workflow below. This pipeline receives a form that is uploaded to an Azure Storage container, extracts the fields from the form, and saves the extracted data in Azure Cosmos DB.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-customer-reviews-processing-with-form-recognizer-and-azure-openai%2Fworkflow-1_hu12eb0bdeb5033ec1849efccfcc3bdf0a_17601_501x454_fit_q95_h2_box_3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-customer-reviews-processing-with-form-recognizer-and-azure-openai%2Fworkflow-1_hu12eb0bdeb5033ec1849efccfcc3bdf0a_17601_501x454_fit_q95_h2_box_3.webp" alt="Azure Logic App workflow that automates data extraction from forms using Form Recognizer."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You are going to extend this pipeline by adding an HTTP request action to send a request to the Azure OpenAI service. The workflow is illustrated in the following image.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-customer-reviews-processing-with-form-recognizer-and-azure-openai%2Fworkflow-2_hu9456e0aa5f18221c5874bc8df7cde606_106273_1118x458_fit_q95_h2_box.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-customer-reviews-processing-with-form-recognizer-and-azure-openai%2Fworkflow-2_hu9456e0aa5f18221c5874bc8df7cde606_106273_1118x458_fit_q95_h2_box.webp" alt="Azure Logic App workflow that automates data extraction from forms and enrichment using Form Recognizer and Azure OpenAI."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You will build the form processing workflow using the Logic App Designer.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;After the &lt;strong&gt;When a resource event occurs&lt;/strong&gt; trigger, add two &lt;strong&gt;Initialize variable&lt;/strong&gt; actions. Create two variables named &lt;code&gt;openai-api-key&lt;/code&gt; and &lt;code&gt;openai-url&lt;/code&gt; to store the Key and the URL of your Azure OpenAI resource, respectively.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-customer-reviews-processing-with-form-recognizer-and-azure-openai%2Fset-variables_hu01b5e3ca0e73ee11353abea9b6c41fe7_18804_565x435_fit_q95_h2_box_3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-customer-reviews-processing-with-form-recognizer-and-azure-openai%2Fset-variables_hu01b5e3ca0e73ee11353abea9b6c41fe7_18804_565x435_fit_q95_h2_box_3.webp" alt="Initialize variables to store the key and the URL of the Azure OpenAI resource."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The key of your Azure OpenAI service can be found by going to the &lt;strong&gt;Keys and Endpoint&lt;/strong&gt; page of your resource in the Azure portal.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Under the &lt;strong&gt;Parse Fields&lt;/strong&gt; block, select the plus (+) sign to add a new action. Select the &lt;strong&gt;HTTP&lt;/strong&gt; action and enter the following information:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Method&lt;/strong&gt;: &lt;strong&gt;POST&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;URI&lt;/strong&gt;: &lt;code&gt;openai-url&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Headers&lt;/strong&gt;: Content-Type: application/json, api-key: &lt;code&gt;openai-api-key&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Body&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ul&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"You must extract the following information from the review below:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;1. Sentiment (key: Sentiment) (possible values: positive, negative, neutral, mixed)&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;2. Opinion for the customer service in an array (key: KeyPhrases)&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;Answer the fields briefly and provide the results in JSON format using the keys above. For the second field, summarize the opinions if needed. If the review is empty, use &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;NA&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; for the first field and an empty array for the second field.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;Review: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;IF EXPRESSION&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"temperature"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"top_p"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"frequency_penalty"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"presence_penalty"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"max_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"best_of"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"stop"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;Replace &lt;code&gt;IF EXPRESSION&lt;/code&gt; with the following expression, which extracts the value associated with the field &lt;code&gt;Comments&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;  &lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;body&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Parse_Fields&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)?[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Comments&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;content&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nf"&gt;body&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Parse_Fields&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)?[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Comments&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]?[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;content&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-customer-reviews-processing-with-form-recognizer-and-azure-openai%2Fopenai-request_hu32e123502b158d5fdfb43b266b62490a_30742_482x484_fit_q95_h2_box_3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-customer-reviews-processing-with-form-recognizer-and-azure-openai%2Fopenai-request_hu32e123502b158d5fdfb43b266b62490a_30742_482x484_fit_q95_h2_box_3.webp" alt="Post a request to the OpenAI API."&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The output from the completions API will look as follows:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ID of your call"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"text_completion"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"created"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1680202256&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"text-davinci-003"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"choices"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"text generated by the OpenAI API"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"index"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"finish_reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"stop"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"logprobs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"usage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"completion_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;41&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"prompt_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;121&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"total_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;162&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
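&lt;p&gt;If you prefer to see the extraction in code, here is a minimal Python sketch of pulling the generated text out of such a response (the response below is the trimmed sample shown above, not output from a live API call):&lt;/p&gt;

```python
import json

# A trimmed-down completions response, shaped like the sample above.
response = json.loads("""
{
  "object": "text_completion",
  "choices": [
    {"text": "text generated by the OpenAI API", "index": 0,
     "finish_reason": "stop", "logprobs": null}
  ],
  "usage": {"completion_tokens": 41, "prompt_tokens": 121, "total_tokens": 162}
}
""")

# The generated text lives in choices[0].text.
generated_text = response["choices"][0]["text"]
print(generated_text)  # text generated by the OpenAI API
```

&lt;p&gt;The &lt;strong&gt;Parse JSON&lt;/strong&gt; actions in the next steps perform this same navigation inside the workflow.&lt;/p&gt;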

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Add a &lt;strong&gt;Parse JSON&lt;/strong&gt; action to extract the response of your OpenAI model. Specify the following details:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Content&lt;/strong&gt;: Select &lt;strong&gt;Add dynamic content&lt;/strong&gt; and find the block called &lt;code&gt;Body&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Schema&lt;/strong&gt;: To generate the schema, use a sample JSON response of the OpenAI API.
&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-customer-reviews-processing-with-form-recognizer-and-azure-openai%2Fopenai-response_hud981d345ea9b3fb8429354a8c8064745_17683_592x405_fit_q95_h2_box_3.webp" alt="Use a Parse JSON action to extract the response of the OpenAI API."&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Then, use a second &lt;strong&gt;Parse JSON&lt;/strong&gt; action to extract the values generated by the OpenAI API for the keys named &lt;code&gt;Sentiment&lt;/code&gt; and &lt;code&gt;KeyPhrases&lt;/code&gt;. Enter the following information:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Content&lt;/strong&gt;: Select &lt;strong&gt;Add dynamic content&lt;/strong&gt; and find the block called &lt;code&gt;text&lt;/code&gt; under &lt;strong&gt;Parse OpenAI response&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Schema&lt;/strong&gt;: To generate the schema, use a sample JSON generated by the OpenAI API or the following schema:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"KeyPhrases"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"items"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"array"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"Sentiment"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;
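&lt;p&gt;For reference, the sketch below shows a made-up model output that matches this schema (only the key names &lt;code&gt;Sentiment&lt;/code&gt; and &lt;code&gt;KeyPhrases&lt;/code&gt; come from the schema; the values are illustrative):&lt;/p&gt;

```python
import json

# A made-up model output matching the schema: a Sentiment string and a
# KeyPhrases array of strings. The values are illustrative only.
sample = '{"Sentiment": "Positive", "KeyPhrases": ["fast response", "friendly staff"]}'

data = json.loads(sample)
assert isinstance(data["Sentiment"], str)
assert all(isinstance(phrase, str) for phrase in data["KeyPhrases"])
print(data["Sentiment"], data["KeyPhrases"])  # Positive ['fast response', 'friendly staff']
```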

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-customer-reviews-processing-with-form-recognizer-and-azure-openai%2Fopenai-values_hu51484d59e926014ed6899069a9679d19_19346_621x443_fit_q95_h2_box_3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-customer-reviews-processing-with-form-recognizer-and-azure-openai%2Fopenai-values_hu51484d59e926014ed6899069a9679d19_19346_621x443_fit_q95_h2_box_3.webp" alt="Use a Parse JSON action to extract the values generated by the OpenAI API."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The Logic App automatically adds a &lt;strong&gt;For each&lt;/strong&gt; block around the &lt;strong&gt;Parse JSON&lt;/strong&gt; block.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Modify the existing &lt;strong&gt;Compose&lt;/strong&gt; action to include the information generated by Azure OpenAI. To extract the values associated with the fields &lt;code&gt;Sentiment&lt;/code&gt; and &lt;code&gt;KeyPhrases&lt;/code&gt;, select &lt;strong&gt;Add dynamic content&lt;/strong&gt; and choose the respective blocks.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-customer-reviews-processing-with-form-recognizer-and-azure-openai%2Fcompose_hua8985d1659ed5669887a695dcd94768b_21175_618x398_fit_q95_h2_box_3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-customer-reviews-processing-with-form-recognizer-and-azure-openai%2Fcompose_hua8985d1659ed5669887a695dcd94768b_21175_618x398_fit_q95_h2_box_3.webp" alt="Generate JSON output."&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Modify the &lt;strong&gt;Create or update document (V3)&lt;/strong&gt; action as needed.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-customer-reviews-processing-with-form-recognizer-and-azure-openai%2Fsave-data_hu214c80e5436c6ed96b879448ad3e8d99_24022_614x467_fit_q95_h2_box_3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-customer-reviews-processing-with-form-recognizer-and-azure-openai%2Fsave-data_hu214c80e5436c6ed96b879448ad3e8d99_24022_614x467_fit_q95_h2_box_3.webp" alt="Save the results in an Azure Cosmos DB database."&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Save the workflow and then click &lt;strong&gt;Run Trigger &amp;gt; Run&lt;/strong&gt;. Upload a file to your Azure Storage container to test the Logic App.&lt;/p&gt;

&lt;h2&gt;
  
  
  Visualize the data in Power BI
&lt;/h2&gt;

&lt;p&gt;You can use Power BI to visualize the results obtained from the form processing workflow. If you don’t have a Power BI subscription, you can use Power BI Desktop, which is a free application.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open Power BI Desktop and in the Home tab select &lt;strong&gt;Get data &amp;gt; More…&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Choose the &lt;strong&gt;Azure Cosmos DB v1&lt;/strong&gt; connection and click &lt;strong&gt;Connect&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;In the pop-up window, enter the URL of your Cosmos DB account and the IDs of your database and collection. Then, click &lt;strong&gt;OK&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Once Power BI is connected to your Cosmos DB account, you can see the stored data and transform it.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Below you can see a simple Power BI dashboard that I created to visualize the data generated by the Azure OpenAI service.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-customer-reviews-processing-with-form-recognizer-and-azure-openai%2Fpowerbi_hu7823d172265e8278b876d9617ef5aa68_60875_888x491_fit_q95_h2_box_3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-customer-reviews-processing-with-form-recognizer-and-azure-openai%2Fpowerbi_hu7823d172265e8278b876d9617ef5aa68_60875_888x491_fit_q95_h2_box_3.webp" alt="Power BI dashboard."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary and next steps
&lt;/h2&gt;

&lt;p&gt;In this article, you created an end-to-end automated form processing solution using Form Recognizer, Logic Apps, and Azure OpenAI. You can use this solution to create automated workflows for your specific scenarios. You can also extend this scenario by integrating Azure Cognitive Service for Language into your Logic App workflow.&lt;/p&gt;

&lt;p&gt;You can learn more about Azure OpenAI in the resources below:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/azure/cognitive-services/openai/overview?WT.mc_id=AI-MVP-5004971" rel="noopener noreferrer"&gt;What is Azure OpenAI Service? – Microsoft Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://techcommunity.microsoft.com/t5/ai-applied-ai-blog/revolutionize-your-enterprise-data-with-chatgpt-next-gen-apps-w/ba-p/3762087?WT.mc_id=AI-MVP-5004971" rel="noopener noreferrer"&gt;Revolutionize your Enterprise Data with ChatGPT: Next-gen Apps w/ Azure OpenAI and Cognitive Search&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;👋 &lt;strong&gt;Hi, I am Foteini Savvidou!&lt;/strong&gt;&lt;br&gt;
An Electrical and Computer Engineering student and Microsoft AI MVP (Most Valuable Professional) from Greece.&lt;/p&gt;

&lt;p&gt;🌈 &lt;a href="https://www.linkedin.com/in/foteini-savvidou" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | &lt;a href="https://sfoteini.github.io/" rel="noopener noreferrer"&gt;Blog&lt;/a&gt; | &lt;a href="https://github.com/sfoteini" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/p&gt;

</description>
      <category>azure</category>
      <category>ai</category>
      <category>openai</category>
      <category>cloud</category>
    </item>
    <item>
      <title>Automate document processing with Form Recognizer and Logic Apps (Part 2)</title>
      <dc:creator>Foteini Savvidou</dc:creator>
      <pubDate>Mon, 10 Jul 2023 13:51:50 +0000</pubDate>
      <link>https://forem.com/sfoteini/automate-document-processing-with-form-recognizer-and-logic-apps-part-2-13kl</link>
      <guid>https://forem.com/sfoteini/automate-document-processing-with-form-recognizer-and-logic-apps-part-2-13kl</guid>
      <description>&lt;p&gt;Processing of forms and documents is part of several scenarios both in business and in everyday life. Manual data extraction from documents, either in electronic or printed format, is time-consuming, costly, and error-prone.&lt;/p&gt;

&lt;p&gt;Azure Form Recognizer is an Applied AI Service that enables you to extract text, table data, key-value pairs, and layout information from forms and documents. In this two-part series, you will learn how to build an end-to-end document processing automation solution utilizing Azure Form Recognizer, Logic Apps, Azure Cosmos DB, and Power BI.&lt;/p&gt;

&lt;p&gt;In the first part, you trained two custom models for extracting key-value pairs from customer service review forms and composed them into a single model. In this article, you will build an automated form processing solution.&lt;/p&gt;

&lt;p&gt;You will learn how to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create a Logic App workflow that responds to Event Grid events.&lt;/li&gt;
&lt;li&gt;Integrate Form Recognizer into a Logic Apps workflow.&lt;/li&gt;
&lt;li&gt;Store the information extracted by the Form Recognizer model in Azure Cosmos DB.&lt;/li&gt;
&lt;li&gt;Use Power BI to visualize the insights from the analysis of the forms.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To build this solution, you will need an Azure subscription. If you don’t have one, you can sign up for an &lt;a href="https://azure.microsoft.com/free/?WT.mc_id=AI-MVP-5004971" rel="noopener noreferrer"&gt;Azure free account&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scenario details
&lt;/h2&gt;

&lt;p&gt;Consider that you work in a company that provides customer service to a variety of customers. Every day, customers leave reviews about their experiences with the customer service they received. You need to analyze these reviews to identify areas of improvement and to track customer satisfaction. These reviews are submitted as paper forms or PDFs and come in two different form types. To make this process easier, we will use a single service to analyze all the reviews.&lt;/p&gt;

&lt;p&gt;In the following image, you can see the workflow that we will build to automate the process of extracting and analyzing customer service reviews. In this post, we will focus on the second and third steps of the workflow.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-2%2Fform-processing-solution_hu0d2f9a5c9b70f23615414c7b18aa227f_197712_790x274_fit_q95_h2_box_3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-2%2Fform-processing-solution_hu0d2f9a5c9b70f23615414c7b18aa227f_197712_790x274_fit_q95_h2_box_3.webp" alt="Form processing automation solution" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;The following architecture diagram illustrates the main components involved in the automated form processing solution that we are building and the information flow. The system receives the forms (either in PDF or scanned image format), extracts the fields from the form and saves the extracted data in Azure Cosmos DB. Power BI is then used to visualize the insights from the data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-2%2Farchitecture_hucc69a0bc34bf688087dd8b7facbed9fd_48404_790x376_fit_q95_h2_box_3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-2%2Farchitecture_hucc69a0bc34bf688087dd8b7facbed9fd_48404_790x376_fit_q95_h2_box_3.webp" alt="Architecture diagram that shows an architecture for automating form processing." width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Dataflow
&lt;/h3&gt;

&lt;p&gt;The information flow corresponding to the above architecture diagram is described as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;PDF forms or scanned images are (manually or programmatically) uploaded to a container in an Azure Storage account.&lt;/li&gt;
&lt;li&gt;Whenever a form is uploaded to the specified container, an Event Grid event triggers the Logic App to start processing the form.&lt;/li&gt;
&lt;li&gt;The Logic App sends the URL of the file to the Form Recognizer API and receives the response data.&lt;/li&gt;
&lt;li&gt;The extracted key-value pairs are saved into a NoSQL database in Azure Cosmos DB.&lt;/li&gt;
&lt;li&gt;Power BI is connected to Azure Cosmos DB to ingest the extracted data and provide dashboards.&lt;/li&gt;
&lt;/ol&gt;
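&lt;p&gt;To make steps 2 and 3 concrete, the sketch below shows a trimmed-down &lt;code&gt;Microsoft.Storage.BlobCreated&lt;/code&gt; event (field names follow the Event Grid blob storage event schema; the account, container, and file names are placeholders) and the blob URL that the Logic App forwards to Form Recognizer:&lt;/p&gt;

```python
# A trimmed-down Microsoft.Storage.BlobCreated event. Field names follow the
# Event Grid blob storage event schema; all values are placeholders.
event = {
    "eventType": "Microsoft.Storage.BlobCreated",
    "subject": "/blobServices/default/containers/forms/blobs/review-001.pdf",
    "data": {
        "api": "PutBlob",
        "blobType": "BlockBlob",
        "contentType": "application/pdf",
        "url": "https://mystorageaccount.blob.core.windows.net/forms/review-001.pdf",
    },
}

# Step 3 of the dataflow: the Logic App sends this URL to the Form Recognizer API.
blob_url = event["data"]["url"]
print(blob_url)  # https://mystorageaccount.blob.core.windows.net/forms/review-001.pdf
```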

&lt;h2&gt;
  
  
  Components
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Set up Azure Blob Storage
&lt;/h3&gt;

&lt;p&gt;You will create a storage account and a container to upload the forms that will be processed by the Logic App.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Sign in to the Azure portal, search for &lt;strong&gt;Storage accounts&lt;/strong&gt; and then select &lt;strong&gt;Create&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Create a storage account with the following settings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Subscription&lt;/strong&gt;: Your Azure subscription.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource group&lt;/strong&gt;: Select an existing resource group (for example, the resource group that you created in &lt;a href="https://sfoteini.github.io/blog/automate-document-processing-with-form-recognizer-and-logic-apps-part-1/" rel="noopener noreferrer"&gt;Part 1&lt;/a&gt;) or create a new one.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage account name&lt;/strong&gt;: Enter a unique name.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Region&lt;/strong&gt;: Choose any available region, for example, &lt;strong&gt;West Europe&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance&lt;/strong&gt;: Standard.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Redundancy&lt;/strong&gt;: Locally-redundant storage (LRS).
&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-2%2Fcreate-storage-account_huc2dfea4d93aea28d1e8f329fe7c12ed6_37817_611x591_fit_q95_h2_box_3.webp" alt="Create a storage account." width="800" height="400"&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Select &lt;strong&gt;Review + Create&lt;/strong&gt;, then select &lt;strong&gt;Create&lt;/strong&gt; and wait for the deployment to complete. Once the deployment is complete, navigate to your storage account.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In the left pane, under &lt;strong&gt;Data storage&lt;/strong&gt;, select &lt;strong&gt;Containers&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create a new container. Select a name and set the public access level to &lt;strong&gt;Container&lt;/strong&gt;.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-2%2Fcreate-container_hu1ed618d0c66dc493fd9449091123cf31_10835_423x198_fit_q95_h2_box_3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-2%2Fcreate-container_hu1ed618d0c66dc493fd9449091123cf31_10835_423x198_fit_q95_h2_box_3.webp" alt="Create a container." width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Set up Event Grid
&lt;/h3&gt;

&lt;p&gt;Event Grid is an event broker that you can use to integrate applications using events. We will use an Event Grid trigger to run the Logic App whenever a file is uploaded to the Azure Storage container.&lt;/p&gt;

&lt;p&gt;To use Event Grid, you must first register the &lt;code&gt;Microsoft.EventGrid&lt;/code&gt; resource provider in your subscription.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-2%2Fregister-event-grid_hu30b74b4f8419408ca10bcc0dcb0ad87a_103926_1151x525_fit_q95_h2_box_3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-2%2Fregister-event-grid_hu30b74b4f8419408ca10bcc0dcb0ad87a_103926_1151x525_fit_q95_h2_box_3.webp" alt="Register Event Grid as a resource provider in your Azure subscription." width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Set up Azure Cosmos DB
&lt;/h3&gt;

&lt;p&gt;You need to create an Azure Cosmos DB account, database, and container to store the fields extracted from the review forms.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In the Azure portal, search for &lt;strong&gt;Azure Cosmos DB&lt;/strong&gt; and then click &lt;strong&gt;Create&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Create a new &lt;strong&gt;Azure Cosmos DB for NoSQL account&lt;/strong&gt; by selecting the corresponding card in the &lt;strong&gt;Which API best suits your workload?&lt;/strong&gt; window.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Create an Azure Cosmos DB account with the settings below:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Subscription&lt;/strong&gt;: Your Azure subscription.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource group&lt;/strong&gt;: Select an existing resource group (for example, the resource group that you created in &lt;a href="https://sfoteini.github.io/blog/automate-document-processing-with-form-recognizer-and-logic-apps-part-1/" rel="noopener noreferrer"&gt;Part 1&lt;/a&gt;) or create a new one.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Account name&lt;/strong&gt;: Enter a unique name.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Location&lt;/strong&gt;: Choose any available region, for example, &lt;strong&gt;West Europe&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Capacity mode&lt;/strong&gt;: Provisioned throughput.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Limit total account throughput&lt;/strong&gt;: selected.
&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-2%2Fcreate-cosmos-db_hu19bc4d84482b0dd0cebe030849f8616e_50129_702x571_fit_q95_h2_box_3.webp" alt="Create an Azure Cosmos DB account." width="800" height="400"&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Select &lt;strong&gt;Review + Create&lt;/strong&gt;, then select &lt;strong&gt;Create&lt;/strong&gt; and wait for the deployment to complete. Once the deployment is complete, navigate to your resource.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In the left pane, select &lt;strong&gt;Data Explorer&lt;/strong&gt;. Then, create a new database and a container.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-2%2Fcreate-database-container_hu8e4449d8cc90ca8fd12146f5ec46b704_93042_750x413_fit_q95_h2_box_3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-2%2Fcreate-database-container_hu8e4449d8cc90ca8fd12146f5ec46b704_93042_750x413_fit_q95_h2_box_3.webp" alt="Create a database and a container in Azure Cosmos DB." width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Create a Logic App
&lt;/h3&gt;

&lt;p&gt;The last component that we need to provision is a Logic App.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In the Azure portal, search for &lt;strong&gt;Logic apps&lt;/strong&gt; and then click &lt;strong&gt;+ Add&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Create a Logic App by specifying the following parameters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Subscription&lt;/strong&gt;: Your Azure subscription.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource group&lt;/strong&gt;: Select an existing resource group (for example, the resource group that you created in &lt;a href="https://sfoteini.github.io/blog/automate-document-processing-with-form-recognizer-and-logic-apps-part-1/" rel="noopener noreferrer"&gt;Part 1&lt;/a&gt;) or create a new one.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Logic App name&lt;/strong&gt;: Enter a unique name.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Region&lt;/strong&gt;: Choose any available region, for example, &lt;strong&gt;West Europe&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plan type&lt;/strong&gt;: Consumption.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zone redundancy&lt;/strong&gt;: Disabled.
&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-2%2Fcreate-logic-app_hu1bbd652d55667d3ec83bc727ae14f406_42567_492x513_fit_q95_h2_box_3.webp" alt="Create a Logic App." width="800" height="400"&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Select &lt;strong&gt;Review + Create&lt;/strong&gt; and then &lt;strong&gt;Create&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Once the deployment is complete, navigate to your Logic App resource. You are now ready to build the document processing workflow.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Build the workflow
&lt;/h2&gt;

&lt;p&gt;You will build the document processing workflow using the Logic App Designer, a graphical user interface that enables you to create workflows visually.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Once the Logic App resource is created, you will see the start page of the Logic App Designer. In the &lt;strong&gt;Start with a common trigger&lt;/strong&gt; window, select the &lt;strong&gt;When an Event Grid event occurs&lt;/strong&gt; block.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-2%2Fnew-workflow_hu9dc0bd29036db352209b93fd0d4b3025_152901_800x429_fit_q95_h2_box_3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-2%2Fnew-workflow_hu9dc0bd29036db352209b93fd0d4b3025_152901_800x429_fit_q95_h2_box_3.webp" alt="Create a workflow that executes when an Event Grid event occurs." width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you can’t find this block, create an empty workflow and then select the &lt;strong&gt;When a resource event occurs&lt;/strong&gt; trigger.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Sign in to your account. In the &lt;strong&gt;When a resource event occurs&lt;/strong&gt; block, specify the following details:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Subscription&lt;/strong&gt;: Select your subscription.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource Type&lt;/strong&gt;: &lt;code&gt;Microsoft.Storage.StorageAccounts&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource Name&lt;/strong&gt;: The name of your Storage account.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Event Type Item – 1&lt;/strong&gt;: &lt;code&gt;Microsoft.Storage.BlobCreated&lt;/code&gt;
&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-2%2Fevent-grid-trigger_huef1df4eaf2c9281838a6968d14110931_23650_618x415_fit_q95_h2_box_3.webp" alt="Setup the When a resource event occurs trigger." width="800" height="400"&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;In the Event Grid trigger block, you can optionally apply filters. For example, you can use the &lt;strong&gt;Prefix Filter&lt;/strong&gt; to subscribe to events from a specific container in the Storage account or use the &lt;strong&gt;Suffix Filter&lt;/strong&gt; to filter events based on the extension of the uploaded file, such as &lt;em&gt;.jpg&lt;/em&gt; or &lt;em&gt;.pdf&lt;/em&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
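
&lt;p&gt;For example, to process only PDF files uploaded to a container named &lt;em&gt;forms&lt;/em&gt; (a hypothetical name), you could configure the filters as follows; the prefix follows the subject format that Event Grid uses for blob events:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Prefix Filter: /blobServices/default/containers/forms/
Suffix Filter: .pdf
&lt;/code&gt;&lt;/pre&gt;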
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add a new step. In the &lt;strong&gt;Choose an operation&lt;/strong&gt; window, search for &lt;strong&gt;Parse JSON&lt;/strong&gt; and then select the corresponding action.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In the &lt;strong&gt;Parse JSON&lt;/strong&gt; block, enter the following parameters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Content&lt;/strong&gt;: &lt;code&gt;Body&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Schema&lt;/strong&gt;: Use the following schema.
&lt;/li&gt;
&lt;/ul&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
           &lt;/span&gt;&lt;span class="nl"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
               &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                   &lt;/span&gt;&lt;span class="nl"&gt;"api"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                       &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt;
                   &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
                   &lt;/span&gt;&lt;span class="nl"&gt;"blobType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                       &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt;
                   &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
                   &lt;/span&gt;&lt;span class="nl"&gt;"clientRequestId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                       &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt;
                   &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
                   &lt;/span&gt;&lt;span class="nl"&gt;"contentLength"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                       &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"integer"&lt;/span&gt;&lt;span class="w"&gt;
                   &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
                   &lt;/span&gt;&lt;span class="nl"&gt;"contentType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                       &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt;
                   &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
                   &lt;/span&gt;&lt;span class="nl"&gt;"eTag"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                       &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt;
                   &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
                   &lt;/span&gt;&lt;span class="nl"&gt;"requestId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                       &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt;
                   &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
                   &lt;/span&gt;&lt;span class="nl"&gt;"sequencer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                       &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt;
                   &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
                   &lt;/span&gt;&lt;span class="nl"&gt;"storageDiagnostics"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                       &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                           &lt;/span&gt;&lt;span class="nl"&gt;"batchId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                               &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt;
                           &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
                       &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
                       &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="w"&gt;
                   &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
                   &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                       &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt;
                   &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
               &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
               &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="w"&gt;
           &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
           &lt;/span&gt;&lt;span class="nl"&gt;"dataVersion"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
               &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt;
           &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
           &lt;/span&gt;&lt;span class="nl"&gt;"eventTime"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
               &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt;
           &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
           &lt;/span&gt;&lt;span class="nl"&gt;"eventType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
               &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt;
           &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
           &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
               &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt;
           &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
           &lt;/span&gt;&lt;span class="nl"&gt;"metadataVersion"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
               &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt;
           &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
           &lt;/span&gt;&lt;span class="nl"&gt;"subject"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
               &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt;
           &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
           &lt;/span&gt;&lt;span class="nl"&gt;"topic"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
               &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt;
           &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-2%2Fparse-json_hufba8241a3992c74aca2299d4cb465c5e_74813_857x469_fit_q95_h2_box_3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-2%2Fparse-json_hufba8241a3992c74aca2299d4cb465c5e_74813_857x469_fit_q95_h2_box_3.webp" alt="Configure the Parse JSON block." width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Find the &lt;strong&gt;Analyze Document for Prebuilt or Custom models (v3.0 API)&lt;/strong&gt; action in the Form Recognizer connector and add it to your workflow.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-2%2Fform-recognizer-api_hu9b76030f831d4fa703c20987a63c2a78_64829_667x451_fit_q95_h2_box_3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-2%2Fform-recognizer-api_hu9b76030f831d4fa703c20987a63c2a78_64829_667x451_fit_q95_h2_box_3.webp" alt="Add an action to analyze a form using Form Recognizer." width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Connect to Form Recognizer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Connection name&lt;/strong&gt;: Enter a name for the connection.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Endpoint URL&lt;/strong&gt;: Enter the &lt;strong&gt;Endpoint URL&lt;/strong&gt; of your Form Recognizer resource.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Account Key&lt;/strong&gt;: Enter the &lt;strong&gt;Key 1&lt;/strong&gt; of your Form Recognizer resource.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;You can find the &lt;strong&gt;Endpoint URL&lt;/strong&gt; and the &lt;strong&gt;Key&lt;/strong&gt; in your Form Recognizer resource, under &lt;strong&gt;Keys and Endpoint&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Then, specify the following parameters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model Identifier&lt;/strong&gt;: Enter the name of the model that will be used to analyze the forms (i.e., the name of the composed model that you created in &lt;a href="https://sfoteini.github.io/blog/automate-document-processing-with-form-recognizer-and-logic-apps-part-1/" rel="noopener noreferrer"&gt;Part 1&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Document/Image URL&lt;/strong&gt;: Click &lt;strong&gt;Add dynamic content&lt;/strong&gt; and select the &lt;code&gt;url&lt;/code&gt; output of the &lt;strong&gt;Parse JSON&lt;/strong&gt; action.
&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-2%2Fform-recognizer-action_hudd7d97b33d6bda57aad10deab78435c1_17213_616x315_fit_q95_h2_box_3.webp" alt="Configure the Form Recognizer action." width="800" height="400"&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
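
&lt;p&gt;To see how the pieces above fit together: the &lt;strong&gt;Parse JSON&lt;/strong&gt; schema matches the payload of a &lt;code&gt;Microsoft.Storage.BlobCreated&lt;/code&gt; event, which resembles the following sketch (all values are placeholders):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{
    "topic": "/subscriptions/{subscription-id}/resourceGroups/{resource-group}/providers/Microsoft.Storage/storageAccounts/{storage-account}",
    "subject": "/blobServices/default/containers/forms/blobs/form1.pdf",
    "eventType": "Microsoft.Storage.BlobCreated",
    "eventTime": "2022-11-01T12:00:00.000Z",
    "id": "00000000-0000-0000-0000-000000000000",
    "data": {
        "api": "PutBlob",
        "clientRequestId": "00000000-0000-0000-0000-000000000000",
        "requestId": "00000000-0000-0000-0000-000000000000",
        "eTag": "0x0000000000000000",
        "contentType": "application/pdf",
        "contentLength": 524288,
        "blobType": "BlockBlob",
        "url": "https://{storage-account}.blob.core.windows.net/forms/form1.pdf",
        "sequencer": "00000000000000000000000000000000000000000000000000",
        "storageDiagnostics": {
            "batchId": "00000000-0000-0000-0000-000000000000"
        }
    },
    "dataVersion": "",
    "metadataVersion": "1"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The &lt;code&gt;url&lt;/code&gt; property points to the uploaded blob and is the value you will pass to Form Recognizer.&lt;/p&gt;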

&lt;h3&gt;Save the extracted data&lt;/h3&gt;

&lt;p&gt;The JSON response of the &lt;strong&gt;Analyze Document for Prebuilt or Custom models (v3.0 API)&lt;/strong&gt; action includes a variety of information, such as the text extracted from the form, layout information, selection marks, and key-value pairs. We are interested in the fields and their associated values. In the following steps, you will build a pipeline that extracts the fields from the JSON response, processes their values, generates a JSON document, and saves it in Azure Cosmos DB.&lt;/p&gt;
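
&lt;p&gt;To make the following steps concrete, the &lt;code&gt;fields&lt;/code&gt; section of the response resembles the sketch below. The field names depend on the labels you defined when training your model, and the values shown here are illustrative:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;"fields": {
    "BillingPayment": {
        "type": "selectionMark",
        "valueSelectionMark": "selected",
        "content": ":selected:",
        "confidence": 0.99
    },
    "Comments": {
        "type": "string",
        "valueString": "Great service!",
        "content": "Great service!",
        "confidence": 0.98
    }
}
&lt;/code&gt;&lt;/pre&gt;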

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Add a &lt;strong&gt;Parse JSON&lt;/strong&gt; action and specify the following details:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Content&lt;/strong&gt;: Select &lt;strong&gt;Add dynamic content&lt;/strong&gt; and find the block called &lt;code&gt;fields&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Schema&lt;/strong&gt;: To generate the schema, use a sample JSON response of the Form Recognizer action. Click &lt;strong&gt;Use sample payload to generate schema&lt;/strong&gt; and paste the &lt;code&gt;"fields"&lt;/code&gt; section of the Form Recognizer response.
&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-2%2Fparse-fields_hu16ba4c538942a2c4a6a4db7ba40c485e_29196_566x443_fit_q95_h2_box_3.webp" alt="Use a Parse JSON action to extract key-value pairs." width="800" height="400"&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;The Logic App automatically adds a &lt;strong&gt;For each&lt;/strong&gt; block around the &lt;strong&gt;Parse JSON&lt;/strong&gt; block.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Add a &lt;strong&gt;Compose&lt;/strong&gt; action (inside the &lt;strong&gt;For each&lt;/strong&gt; block). We will save the extracted fields in the following format:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"BillingPayment"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"value1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Comments"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"value2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"OtherService"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"value3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"ProductServiceInformation"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"value4"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Recommend"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"value5"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Satisfaction"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"value6"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Suggestions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"value7"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"TechnicalSupport"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"value8"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"id"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;To extract the value associated with each field, use &lt;code&gt;if&lt;/code&gt; expressions. You can add an expression by selecting &lt;strong&gt;Add dynamic content &amp;gt; Expression&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For example, to extract the values associated with the fields &lt;code&gt;BillingPayment&lt;/code&gt;, &lt;code&gt;Comments&lt;/code&gt;, and &lt;code&gt;Recommend&lt;/code&gt;, you can use the following expressions:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// BillingPayment&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;and&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;body&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Parse_Fields&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)?[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;BillingPayment&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;content&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nf"&gt;equals&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;body&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Parse_Fields&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)?[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;BillingPayment&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]?[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;valueSelectionMark&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;selected&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;// Comments&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;body&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Parse_Fields&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)?[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Comments&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;content&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nf"&gt;body&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Parse_Fields&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)?[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Comments&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]?[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;content&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;NA&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;// Recommend&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;and&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;body&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Parse_Fields&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)?[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;RecommendVeryLikely&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;content&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nf"&gt;equals&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;body&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Parse_Fields&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)?[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;RecommendVeryLikely&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]?[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;valueSelectionMark&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;selected&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Very likely&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;and&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span 
class="nf"&gt;body&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Parse_Fields&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)?[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;RecommendLikely&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;content&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nf"&gt;equals&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;body&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Parse_Fields&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)?[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;RecommendLikely&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]?[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;valueSelectionMark&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;selected&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Likely&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;and&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;body&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Parse_Fields&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)?[&lt;/span&gt;&lt;span 
class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;RecommendNeutral&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;content&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nf"&gt;equals&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;body&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Parse_Fields&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)?[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;RecommendNeutral&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]?[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;valueSelectionMark&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;selected&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Neutral&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;and&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;body&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Parse_Fields&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)?[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;RecommendUnlikely&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;content&lt;/span&gt;&lt;span 
class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nf"&gt;equals&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;body&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Parse_Fields&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)?[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;RecommendUnlikely&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]?[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;valueSelectionMark&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;selected&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Unlikely&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;and&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;body&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Parse_Fields&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)?[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;RecommendVeryUnlikely&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;content&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nf"&gt;equals&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;body&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span 
class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Parse_Fields&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)?[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;RecommendVeryUnlikely&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]?[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;valueSelectionMark&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;selected&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Very unlikely&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;NA&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)))))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;In a similar way, you can extract the values corresponding to the remaining fields.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-2%2Fcompose-action_hu39e0f22b8e60e8fabadebe92712c7557_19921_554x426_fit_q95_h2_box_3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-2%2Fcompose-action_hu39e0f22b8e60e8fabadebe92712c7557_19921_554x426_fit_q95_h2_box_3.webp" alt="Generate JSON output." width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Click &lt;strong&gt;Add an action&lt;/strong&gt; and select the &lt;strong&gt;Create or update document (V3)&lt;/strong&gt; action of the Azure Cosmos DB connector. Then, configure the Azure Cosmos DB connection by specifying the values below:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Connection name&lt;/strong&gt;: Enter a name for the connection.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Authentication type&lt;/strong&gt;: Access Key.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Access Key to your Cosmos DB account&lt;/strong&gt;: Enter the primary (or secondary) key of your Azure Cosmos DB resource.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Account ID&lt;/strong&gt;: Enter the name of your Azure Cosmos DB account.
&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-2%2Fcosmos-db-connection_hue3080d022f122fb99859203f6161ef56_51637_602x490_fit_q95_h2_box_3.webp" alt="Configure the Azure Cosmos DB connection." width="800" height="400"&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;To find the primary key of your Cosmos DB account, navigate to your Cosmos DB resource and select &lt;strong&gt;Keys&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In the &lt;strong&gt;Create or update document (V3)&lt;/strong&gt; block, specify the ID of your database and collection. Then, under &lt;strong&gt;Document&lt;/strong&gt;, select &lt;strong&gt;Add dynamic content&lt;/strong&gt; and add the block named &lt;code&gt;Outputs&lt;/code&gt;.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-2%2Fsave-results-cosmos-db_hu9a13a8bb8e93893a1de323f4706fa41a_17749_586x382_fit_q95_h2_box_3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-2%2Fsave-results-cosmos-db_hu9a13a8bb8e93893a1de323f4706fa41a_17749_586x382_fit_q95_h2_box_3.webp" alt="Save the results in an Azure Cosmos DB database." width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
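&lt;p&gt;The nested &lt;code&gt;if()&lt;/code&gt; expression in the Compose action above is hard to read in workflow-definition form. Here is a plain-Python sketch of the same logic, assuming the &lt;code&gt;Parse_Fields&lt;/code&gt; output follows the Form Recognizer v3 field shape (objects with &lt;code&gt;content&lt;/code&gt; and &lt;code&gt;valueSelectionMark&lt;/code&gt; properties); only &lt;code&gt;RecommendUnlikely&lt;/code&gt; and &lt;code&gt;RecommendVeryUnlikely&lt;/code&gt; appear in the excerpt, so the other field names are illustrative.&lt;/p&gt;

```python
def selected_recommendation(fields: dict) -> str:
    """Return the label of the ticked selection mark, mirroring the nested
    if() expression in the Compose action; 'NA' if none is selected."""
    # Field names follow the custom model's labels; all but the last two
    # are assumed for illustration.
    options = [
        ("RecommendVeryLikely", "Very likely"),
        ("RecommendLikely", "Likely"),
        ("RecommendNeutral", "Neutral"),
        ("RecommendUnlikely", "Unlikely"),
        ("RecommendVeryUnlikely", "Very unlikely"),
    ]
    for name, label in options:
        field = fields.get(name) or {}
        # Mirrors contains(field, 'content') and
        # equals(field?['valueSelectionMark'], 'selected')
        if "content" in field and field.get("valueSelectionMark") == "selected":
            return label
    return "NA"

fields = {
    "RecommendUnlikely": {"content": ":unselected:", "valueSelectionMark": "unselected"},
    "RecommendVeryUnlikely": {"content": ":selected:", "valueSelectionMark": "selected"},
}
print(selected_recommendation(fields))  # Very unlikely
```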

&lt;p&gt;Save the workflow and then click &lt;strong&gt;Run Trigger &amp;gt; Run&lt;/strong&gt;. Upload a file to your Azure Storage container to test the Logic App.&lt;/p&gt;
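&lt;p&gt;The blob trigger fires for every file added to the container. If the container also receives unrelated files, a guard like the following (a hypothetical sketch, not part of the workflow above) keeps only the document types Form Recognizer accepts:&lt;/p&gt;

```python
def should_process(blob_name: str) -> bool:
    # Limit processing to file types the Form Recognizer service accepts.
    accepted = (".pdf", ".jpg", ".jpeg", ".png", ".bmp", ".tiff")
    return blob_name.lower().endswith(accepted)

print(should_process("review-042.PDF"))  # True
print(should_process("notes.txt"))       # False
```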

&lt;h2&gt;
  
  
  Connect Power BI to Azure Cosmos DB
&lt;/h2&gt;

&lt;p&gt;You can use Power BI to visualize the results obtained from the form processing workflow. If you don’t have a Power BI subscription, you can use Power BI Desktop, which is a free application.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open Power BI Desktop and in the &lt;strong&gt;Home&lt;/strong&gt; tab select &lt;strong&gt;Get data &amp;gt; More…&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Choose the &lt;strong&gt;Azure Cosmos DB v1&lt;/strong&gt; connection and click &lt;strong&gt;Connect&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;In the pop-up window, enter the URL of your Cosmos DB account and the IDs of your database and collection. Then, click &lt;strong&gt;OK&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Once Power BI is connected to your Cosmos DB account, you can see the stored data and transform it.&lt;/li&gt;
&lt;/ol&gt;
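&lt;p&gt;Step 3 asks for the account URL rather than just the account name. For an account named, say, &lt;code&gt;my-reviews-db&lt;/code&gt; (an illustrative name), the URL follows the standard Cosmos DB endpoint shape, sketched here:&lt;/p&gt;

```python
def cosmos_account_url(account_name: str) -> str:
    # Standard endpoint format for an Azure Cosmos DB account.
    return f"https://{account_name}.documents.azure.com:443/"

print(cosmos_account_url("my-reviews-db"))
# https://my-reviews-db.documents.azure.com:443/
```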

&lt;p&gt;Below you can see a simple Power BI dashboard that I created to visualize the extracted data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-2%2Fpower-bi-dashboard_hud4e64dcb0aba8e1d1e5189f560b55220_50118_985x549_fit_q95_h2_box_3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-2%2Fpower-bi-dashboard_hud4e64dcb0aba8e1d1e5189f560b55220_50118_985x549_fit_q95_h2_box_3.webp" alt="Power BI dashboard." width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary and next steps
&lt;/h2&gt;

&lt;p&gt;In this article, you created an end-to-end automated form processing solution using Form Recognizer, Logic Apps, Azure Cosmos DB, and Power BI. You can use this solution to create automated workflows for your specific scenarios. You can also extend it by adding AI capabilities (such as sentiment analysis, key-phrase extraction, and opinion mining) to your Logic App workflow.&lt;/p&gt;

&lt;p&gt;Here are some additional scenarios that leverage Azure Form Recognizer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/azure/applied-ai-services/form-recognizer/tutorial-azure-function?view=form-recog-3.0.0&amp;amp;WT.mc_id=AI-MVP-5004971" rel="noopener noreferrer"&gt;Use Azure Functions and Python to process stored documents – Microsoft Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/ai-builder/create-form-processing-model?toc=%2Fazure%2Fapplied-ai-services%2Fform-recognizer%2Ftoc.json&amp;amp;bc=%2Fazure%2Fapplied-ai-services%2Fform-recognizer%2Fbreadcrumb%2Ftoc.json&amp;amp;view=form-recog-3.0.0&amp;amp;WT.mc_id=AI-MVP-5004971" rel="noopener noreferrer"&gt;Create Form Recognizer workflows with AI Builder – Microsoft Docs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Clean-up
&lt;/h2&gt;

&lt;p&gt;If you have finished learning, you can delete the resource group from your Azure subscription:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In the Azure portal, select &lt;strong&gt;Resource groups&lt;/strong&gt; in the portal menu and then select the resource group that you have created.&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Delete resource group&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;👋 &lt;strong&gt;Hi, I am Foteini Savvidou!&lt;/strong&gt;&lt;br&gt;
An Electrical and Computer Engineering student and Microsoft AI MVP (Most Valuable Professional) from Greece.&lt;/p&gt;

&lt;p&gt;🌈 &lt;a href="https://www.linkedin.com/in/foteini-savvidou" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | &lt;a href="https://sfoteini.github.io/" rel="noopener noreferrer"&gt;Blog&lt;/a&gt; | &lt;a href="https://github.com/sfoteini" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/p&gt;

</description>
      <category>azure</category>
      <category>ai</category>
      <category>cloud</category>
      <category>logicapps</category>
    </item>
    <item>
      <title>Automate document processing with Form Recognizer and Logic Apps (Part 1)</title>
      <dc:creator>Foteini Savvidou</dc:creator>
      <pubDate>Sun, 09 Jul 2023 10:51:19 +0000</pubDate>
      <link>https://forem.com/sfoteini/automate-document-processing-with-form-recognizer-and-logic-apps-part-1-2ep5</link>
      <guid>https://forem.com/sfoteini/automate-document-processing-with-form-recognizer-and-logic-apps-part-1-2ep5</guid>
      <description>&lt;p&gt;Processing of forms and documents is part of several scenarios both in business and in everyday life. Manual data extraction from documents, either in electronic or printed format, is time-consuming, costly, and error-prone.&lt;/p&gt;

&lt;p&gt;Azure Form Recognizer is an Applied AI Service that enables you to extract text, table data, key-value pairs, and layout information from forms and documents. In this two-part series, you will learn how to build an end-to-end document processing automation solution using Azure Form Recognizer, Logic Apps, Azure Cosmos DB, and Power BI. In the first part, you will be introduced to Form Recognizer's features and train a custom model for extracting key-value pairs from customer service review forms.&lt;/p&gt;

&lt;p&gt;You will learn how to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Provision a Form Recognizer resource.&lt;/li&gt;
&lt;li&gt;Train a custom model in Form Recognizer Studio.&lt;/li&gt;
&lt;li&gt;Compose two custom template models together into a single model.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To build this solution, you will need an Azure subscription. If you don’t have one, you can sign up for an &lt;a href="https://azure.microsoft.com/free/?WT.mc_id=AI-MVP-5004971" rel="noopener noreferrer"&gt;Azure free account&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Azure Form Recognizer?
&lt;/h2&gt;

&lt;p&gt;Azure Form Recognizer is a cloud-based service that applies machine learning-based Optical Character Recognition (OCR) and document understanding technologies to extract data from forms and documents. It provides numerous pre-built models for analyzing common documents, such as invoices, receipts, and business cards, and you can also build custom models to analyze documents specific to your business.&lt;/p&gt;

&lt;p&gt;Study the following sketch note to learn more about Form Recognizer’s features.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fsfoteini%2Fsketchnotes%2Fblob%2Fmain%2Fform-recognizer.jpg%3Fraw%3Dtrue" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fsfoteini%2Fsketchnotes%2Fblob%2Fmain%2Fform-recognizer.jpg%3Fraw%3Dtrue" alt="Overview of Azure Form Recognizer"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Scenario details
&lt;/h2&gt;

&lt;p&gt;Consider that you work at a company that provides customer service to a variety of customers. Every day, customers leave reviews about their experiences with the customer service they received. You need to analyze these reviews to identify areas of improvement and to track customer satisfaction. The reviews are submitted as paper forms or PDFs and may come in two different form layouts. To simplify this process, we need a single service that can analyze all the reviews.&lt;/p&gt;

&lt;p&gt;In the following image, you can see the workflow that we will build to automate the process of extracting and analyzing customer service reviews. In this post, we will focus on the first step of the workflow: &lt;em&gt;Train a model in Azure Form Recognizer&lt;/em&gt;.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-1%2Fform-processing-solution_hu1d2af18fb0bb92904bf9fe4b1a883f59_194617_790x274_fit_q95_h2_box_3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-1%2Fform-processing-solution_hu1d2af18fb0bb92904bf9fe4b1a883f59_194617_790x274_fit_q95_h2_box_3.webp" alt="Form processing automation solution"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Create a Form Recognizer resource
&lt;/h2&gt;

&lt;p&gt;To use Form Recognizer, you need to create a &lt;strong&gt;Cognitive Services&lt;/strong&gt; or a &lt;strong&gt;Form Recognizer&lt;/strong&gt; resource. In this article, I’ll show you how to create a single-service resource in the Azure portal.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A &lt;strong&gt;Cognitive Services&lt;/strong&gt; resource is a multi-service resource that provides access to the entire collection of Azure Cognitive Services under a single endpoint and key.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ol&gt;
&lt;li&gt;Sign in to the Azure portal and select &lt;strong&gt;Create a resource&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Search for &lt;strong&gt;Form Recognizer&lt;/strong&gt; and then click &lt;strong&gt;Create&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Create a Form Recognizer resource with the following settings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Subscription&lt;/strong&gt;: Your Azure subscription.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource group&lt;/strong&gt;: Select an existing resource group or create a new one.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Region&lt;/strong&gt;: Choose any available region, for example, &lt;strong&gt;West Europe&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Name&lt;/strong&gt;: Enter a unique name; it becomes the custom subdomain of your resource endpoint.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pricing tier&lt;/strong&gt;: &lt;strong&gt;Free F0&lt;/strong&gt;.
&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-1%2Fcreate-form-recognizer_huafa9dc3954e3b6a76472c8114fa7a394_42335_663x589_fit_q95_h2_box_3.webp" alt="Create a Form Recognizer resource."&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Select &lt;strong&gt;Review + Create&lt;/strong&gt; and then select the &lt;strong&gt;Create&lt;/strong&gt; button and wait for the deployment to complete.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Navigate to the &lt;a href="https://formrecognizer.appliedai.azure.com/" rel="noopener noreferrer"&gt;Form Recognizer Studio&lt;/a&gt; and sign in.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
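&lt;p&gt;The resource name you pick in step 3 becomes the subdomain of the endpoint your applications will call. A minimal sketch of the resulting URL, assuming the default &lt;code&gt;cognitiveservices.azure.com&lt;/code&gt; domain and an illustrative resource name:&lt;/p&gt;

```python
def form_recognizer_endpoint(resource_name: str) -> str:
    # Single-service resources expose an endpoint under their custom subdomain.
    return f"https://{resource_name}.cognitiveservices.azure.com/"

print(form_recognizer_endpoint("my-form-recognizer"))
# https://my-form-recognizer.cognitiveservices.azure.com/
```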

&lt;blockquote&gt;
&lt;p&gt;If this is your first time using Form Recognizer Studio, configure your service resource by selecting your subscription, resource group, and the existing Form Recognizer resource.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Additional resources for custom projects
&lt;/h2&gt;

&lt;p&gt;To train a custom model, in addition to the Form Recognizer resource, you need to create an &lt;strong&gt;Azure Storage account&lt;/strong&gt; and a container to store your training dataset.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In the Azure Portal, search for &lt;strong&gt;Storage accounts&lt;/strong&gt; and then select &lt;strong&gt;Create&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Create a storage account with the following settings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Subscription&lt;/strong&gt;: Your Azure subscription.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource group&lt;/strong&gt;: Select the resource group that you created in the previous step.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage account name&lt;/strong&gt;: Enter a unique name.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Region&lt;/strong&gt;: Choose any available region, for example, &lt;strong&gt;West Europe&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance&lt;/strong&gt;: Standard.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Redundancy&lt;/strong&gt;: Locally-redundant storage (LRS).
&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-1%2Fcreate-storage-account_huc2dfea4d93aea28d1e8f329fe7c12ed6_37817_611x591_fit_q95_h2_box_3.webp" alt="Create a storage account."&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Select &lt;strong&gt;Review + Create&lt;/strong&gt; and then select the &lt;strong&gt;Create&lt;/strong&gt; button and wait for the deployment to complete. Once the deployment is complete, navigate to your storage account.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In the left pane, under &lt;strong&gt;Data storage&lt;/strong&gt;, select &lt;strong&gt;Containers&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create a new container. Select a name and set the public access level to &lt;strong&gt;Container&lt;/strong&gt;.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-1%2Fcreate-container_huf70d1cdcc2c37d849bccf87f96a8df87_10818_427x199_fit_q95_h2_box_3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-1%2Fcreate-container_huf70d1cdcc2c37d849bccf87f96a8df87_10818_427x199_fit_q95_h2_box_3.webp" alt="Create a container."&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In the left pane, under &lt;strong&gt;Settings&lt;/strong&gt;, select &lt;strong&gt;Resource sharing (CORS)&lt;/strong&gt;. Create a new CORS entry and set the &lt;strong&gt;Allowed origins&lt;/strong&gt; to &lt;code&gt;https://formrecognizer.appliedai.azure.com&lt;/code&gt;.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-1%2Fconfigure-cors_hud9edc57086068311aaa818e3c89b6ff5_15060_1077x169_fit_q95_h2_box_3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-1%2Fconfigure-cors_hud9edc57086068311aaa818e3c89b6ff5_15060_1077x169_fit_q95_h2_box_3.webp" alt="Configure Cross Origin Resource Sharing."&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
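&lt;p&gt;Conceptually, the CORS entry in step 6 tells Blob Storage which browser origins may access the container. A small Python sketch of the check the service performs on each cross-origin request (the rule shown is a simplification of the real CORS evaluation):&lt;/p&gt;

```python
# Origins allowed by the CORS rule configured on the training-data container.
CORS_ALLOWED_ORIGINS = {"https://formrecognizer.appliedai.azure.com"}

def origin_allowed(origin: str) -> bool:
    # A request from Form Recognizer Studio carries its Origin header;
    # Blob Storage honors it only if the CORS rule lists that origin.
    return origin in CORS_ALLOWED_ORIGINS

print(origin_allowed("https://formrecognizer.appliedai.azure.com"))  # True
print(origin_allowed("https://example.com"))                         # False
```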

&lt;p&gt;Now you can upload your sample documents set to the container that you created.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You can organize your data in folders and then specify the folder path in the custom project creation window in Form Recognizer Studio.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Create a custom project
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;In the Form Recognizer Studio, select the &lt;strong&gt;Custom extraction models&lt;/strong&gt; card.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Create a new project and specify the following details:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Project name&lt;/strong&gt;: Customer Service Reviews - Set 1&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configure service resource&lt;/strong&gt;: Select your subscription, resource group, and Form Recognizer resource. In the API version, you can choose the &lt;code&gt;2022-08-31 (General Availability)&lt;/code&gt; API.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Connect training data source&lt;/strong&gt;: Select your subscription, resource group, storage account, and blob container. If you’ve organized your data in folders, specify the folder path in the corresponding field.
&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-1%2Fcreate-custom-project_hu66cda144f9555b04f118d936d0ed1f7b_277269_1980x1280_fit_q95_h2_box.webp" alt="Create a custom extraction project."&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Select &lt;strong&gt;Create project&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Label data and train model
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;In the &lt;strong&gt;Label data&lt;/strong&gt; view, define the labels that you want to extract from your documents and their types (field, selection mark, signature, or table). For labels of type &lt;strong&gt;Field&lt;/strong&gt;, you can additionally specify their sub-type, such as string or number.&lt;/li&gt;
&lt;li&gt;Select a document from your dataset. Form Recognizer will run the document through the Layout API to extract text and layout information. &lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Then, you can start labeling your documents by selecting the text or selection mark in the document and choosing the appropriate label from the drop-down list.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;When labeling selection marks, select only the selection mark, without the surrounding text.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Once you have labeled at least five documents, select the &lt;strong&gt;Train&lt;/strong&gt; option.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-1%2Flabel-data_hude3e587f32c37dd8d1b1d992e740e9aa_264509_1153x553_fit_q95_h2_box_3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-1%2Flabel-data_hude3e587f32c37dd8d1b1d992e740e9aa_264509_1153x553_fit_q95_h2_box_3.webp" alt="Custom extraction model labeling data view."&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In the &lt;strong&gt;Train a new model&lt;/strong&gt; window, enter a &lt;strong&gt;Model ID&lt;/strong&gt; and set the &lt;strong&gt;Build Mode&lt;/strong&gt; to &lt;strong&gt;Template&lt;/strong&gt;.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-1%2Ftrain-model_hu0c4821228bf0f659b55800d152a56770_18789_718x370_fit_q95_h2_box_3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-1%2Ftrain-model_hu0c4821228bf0f659b55800d152a56770_18789_718x370_fit_q95_h2_box_3.webp" alt="Train a custom template model."&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Compose models
&lt;/h2&gt;

&lt;p&gt;Once you’ve trained a template model for each document layout variation, you can compose them together into a single model. Then, you can use the composed model in your applications.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-1%2Fcomposed-model_hu603c47474f7dbc0d231b197ce4743f4f_36068_1117x274_fit_q95_h2_box_3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsfoteini.github.io%2Fimages%2Fpost%2Fautomate-document-processing-with-form-recognizer-and-logic-apps-part-1%2Fcomposed-model_hu603c47474f7dbc0d231b197ce4743f4f_36068_1117x274_fit_q95_h2_box_3.webp" alt="Create a composed model."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary and next steps
&lt;/h2&gt;

&lt;p&gt;In this article, you learned how to train a custom template model in Azure Form Recognizer and create a composed model. In the &lt;a href="https://sfoteini.github.io/blog/automate-document-processing-with-form-recognizer-and-logic-apps-part-2/" rel="noopener noreferrer"&gt;next post&lt;/a&gt;, you will use the composed model in an Azure Logic Apps workflow to build an automated form processing solution.&lt;/p&gt;

&lt;p&gt;If you want to improve the accuracy of the template model that you’ve trained, you can check out the following resources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/azure/applied-ai-services/form-recognizer/concept-custom-label?view=form-recog-3.0.0&amp;amp;WT.mc_id=AI-MVP-5004971" rel="noopener noreferrer"&gt;Best practices: Generating Form Recognizer labeled dataset - Microsoft Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/azure/applied-ai-services/form-recognizer/concept-custom-label-tips?view=form-recog-3.0.0&amp;amp;WT.mc_id=AI-MVP-5004971" rel="noopener noreferrer"&gt;Tips for labeling custom model datasets - Microsoft Docs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Clean-up
&lt;/h2&gt;

&lt;p&gt;If you have finished learning, you can delete the resource group from your Azure subscription:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In the Azure portal, select &lt;strong&gt;Resource groups&lt;/strong&gt; in the portal menu and then select the resource group that you have created.&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Delete resource group&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;👋 &lt;strong&gt;Hi, I am Foteini Savvidou!&lt;/strong&gt;&lt;br&gt;
An Electrical and Computer Engineering student and Microsoft AI MVP (Most Valuable Professional) from Greece.&lt;/p&gt;

&lt;p&gt;🌈 &lt;a href="https://www.linkedin.com/in/foteini-savvidou" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | &lt;a href="https://sfoteini.github.io/" rel="noopener noreferrer"&gt;Blog&lt;/a&gt; | &lt;a href="https://github.com/sfoteini" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/p&gt;

</description>
      <category>azure</category>
      <category>ai</category>
      <category>logicapps</category>
      <category>cloud</category>
    </item>
  </channel>
</rss>
