<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Basil Ahamed</title>
    <description>The latest articles on Forem by Basil Ahamed (@basil_ahamed).</description>
    <link>https://forem.com/basil_ahamed</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1764243%2F4771a2b9-3594-480e-8c7c-f0b40011a015.jpg</url>
      <title>Forem: Basil Ahamed</title>
      <link>https://forem.com/basil_ahamed</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/basil_ahamed"/>
    <language>en</language>
    <item>
      <title>Prompt Orchestration Markup Language (POML): Future of Structured Prompt Engineering 2025</title>
      <dc:creator>Basil Ahamed</dc:creator>
      <pubDate>Wed, 20 Aug 2025 05:58:14 +0000</pubDate>
      <link>https://forem.com/basil_ahamed/prompt-orchestration-markup-language-poml-the-future-of-structured-prompt-engineering-4mf4</link>
      <guid>https://forem.com/basil_ahamed/prompt-orchestration-markup-language-poml-the-future-of-structured-prompt-engineering-4mf4</guid>
      <description>&lt;p&gt;&lt;strong&gt;Author&lt;/strong&gt;: Basil Ahamed &lt;br&gt;
&lt;strong&gt;Role&lt;/strong&gt;: Senior Software Engineer | Automation Specialist | Tech Educator &lt;br&gt;
&lt;strong&gt;Published on&lt;/strong&gt;: 20-08-2025 &lt;br&gt;
&lt;strong&gt;Tags&lt;/strong&gt;: #LLM #PromptEngineering #POML #AI #OpenSource #Microsoft&lt;/p&gt;




&lt;h2&gt;
  
  
  🚀 Introduction
&lt;/h2&gt;

&lt;p&gt;Prompt engineering has become a cornerstone of working with Large Language Models (LLMs). Yet, as the complexity of tasks grows, so do the challenges: messy formatting, brittle templates, and poor reusability. Enter &lt;strong&gt;POML (Prompt Orchestration Markup Language)&lt;/strong&gt;—an open-source initiative by Microsoft that brings structure, modularity, and clarity to prompt development.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;🧩 What is POML?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;POML&lt;/strong&gt; is a markup language designed to orchestrate prompts for LLMs using a clean, semantic, and extensible syntax. Inspired by HTML/XML, it allows developers to define roles, tasks, examples, data, and output formats in a readable and maintainable way.&lt;/p&gt;

&lt;h3&gt;
  
  
  🔧 Why POML?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Structured Prompting&lt;/strong&gt;: No more tangled strings—use semantic tags.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Modular Design&lt;/strong&gt;: Reuse components across prompts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Integration&lt;/strong&gt;: Embed tables, images, and documents.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Templating Engine&lt;/strong&gt;: Dynamic prompt generation with variables and logic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tooling Support&lt;/strong&gt;: VS Code extension, SDKs for Python/Node.js.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🛠️ Core Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. &lt;strong&gt;Semantic Tags&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;role&amp;gt;&lt;/span&gt;You are a helpful assistant.&lt;span class="nt"&gt;&amp;lt;/role&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;task&amp;gt;&lt;/span&gt;Summarize the following document.&lt;span class="nt"&gt;&amp;lt;/task&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;document&lt;/span&gt; &lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"report.pdf"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;output-format&amp;gt;&lt;/span&gt;Bullet points&lt;span class="nt"&gt;&amp;lt;/output-format&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
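
&lt;p&gt;Because the tags are plain XML-style markup, a prompt like the one above can also be assembled programmatically. The following is a minimal Python sketch using only the standard library. Wrapping the tags in a &lt;code&gt;poml&lt;/code&gt; root element and building the file this way are assumptions made for illustration, not the official SDK workflow.&lt;/p&gt;

```python
import xml.etree.ElementTree as ET

# Hypothetical root element that wraps the semantic tags (an assumption).
root = ET.Element("poml")
ET.SubElement(root, "role").text = "You are a helpful assistant."
ET.SubElement(root, "task").text = "Summarize the following document."
ET.SubElement(root, "document", {"src": "report.pdf"})
ET.SubElement(root, "output-format").text = "Bullet points"

# Serialize the tree to a string that could be saved as a prompt file.
poml_text = ET.tostring(root, encoding="unicode")
print(poml_text)
```

&lt;p&gt;Generating the markup from code keeps tag names consistent across prompts and avoids hand-written escaping mistakes.&lt;/p&gt;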



&lt;h3&gt;
  
  
  2. &lt;strong&gt;Templating Engine&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;let&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;"topic"&lt;/span&gt; &lt;span class="na"&gt;value=&lt;/span&gt;&lt;span class="s"&gt;"Photosynthesis"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;task&amp;gt;&lt;/span&gt;Explain {{ topic }} to a 10-year-old.&lt;span class="nt"&gt;&amp;lt;/task&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;{{ variable }}&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;for&lt;/code&gt; attributes for loops&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;if&lt;/code&gt; attributes for conditionals&lt;/li&gt;
&lt;/ul&gt;
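
&lt;p&gt;To make the variable substitution concrete, here is a minimal Python sketch of how a &lt;code&gt;{{ variable }}&lt;/code&gt; placeholder could be resolved. It is not POML's actual implementation; the &lt;code&gt;render&lt;/code&gt; helper is a hypothetical name used only for illustration.&lt;/p&gt;

```python
import re

def render(template, variables):
    """Replace each {{ name }} placeholder with its value from variables."""
    def replace(match):
        # group(1) is the variable name captured between the braces.
        return str(variables[match.group(1)])
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", replace, template)

prompt = render("Explain {{ topic }} to a 10-year-old.", {"topic": "Photosynthesis"})
print(prompt)
```

&lt;p&gt;A missing variable raises a &lt;code&gt;KeyError&lt;/code&gt; in this sketch, which surfaces template mistakes early instead of sending a half-filled prompt to the model.&lt;/p&gt;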

&lt;h3&gt;
  
  
  3. &lt;strong&gt;Styling Layer&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;stylesheet&amp;gt;&lt;/span&gt;
  {
    "p": {
      "syntax": "json"
    }
  }
&lt;span class="nt"&gt;&amp;lt;/stylesheet&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. &lt;strong&gt;Data Embedding&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;table&lt;/span&gt; &lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"data.csv"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;img&lt;/span&gt; &lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"diagram.png"&lt;/span&gt; &lt;span class="na"&gt;alt=&lt;/span&gt;&lt;span class="s"&gt;"Photosynthesis Diagram"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🧪 Tooling Ecosystem
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;VS Code Extension&lt;/strong&gt;: Syntax highlighting, live preview, diagnostics.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Python SDK&lt;/strong&gt;: Render and test prompts programmatically.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Node.js SDK&lt;/strong&gt;: Integrate with web apps and automation pipelines.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  📈 Use Cases
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Enterprise Prompt Management&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AI-Powered Chatbots&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Educational Content Generation&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Automated Report Summarization&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Multi-modal Prompting (text + image + data)&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🧠 Why It Matters
&lt;/h2&gt;

&lt;p&gt;POML is more than just a markup language—it's a paradigm shift in how we think about prompt engineering. It brings the rigor of software development to the art of prompt crafting, making it scalable, testable, and collaborative.&lt;/p&gt;




&lt;h2&gt;
  
  
  📚 Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/microsoft/poml" rel="noopener noreferrer"&gt;POML GitHub Repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;VS Code Extension&lt;/li&gt;
&lt;li&gt;Microsoft's Announcement Blog&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  ✍️ Final Thoughts
&lt;/h2&gt;

&lt;p&gt;As someone deeply involved in automation and AI, I see POML as a game-changer. It empowers developers to build smarter, cleaner, and more reliable LLM applications. Whether you're a solo developer or part of an enterprise team, it's time to give your prompts the structure they deserve. Let's discuss: connect with me on &lt;a href="https://www.linkedin.com/in/basil-ahamed/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;




</description>
      <category>promptengineering</category>
      <category>ai</category>
      <category>productivity</category>
      <category>automation</category>
    </item>
    <item>
      <title>Automate Your Web Tasks with a Browser AI Agent</title>
      <dc:creator>Basil Ahamed</dc:creator>
      <pubDate>Fri, 07 Feb 2025 11:12:41 +0000</pubDate>
      <link>https://forem.com/basil_ahamed/automate-your-web-tasks-with-a-browser-ai-agent-a61</link>
      <guid>https://forem.com/basil_ahamed/automate-your-web-tasks-with-a-browser-ai-agent-a61</guid>
      <description>&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;In today's fast-paced digital world, automation is key to efficiency. From placing orders on e-commerce platforms to job hunting, automating these repetitive tasks can save both time and effort. In this guide, we'll walk through creating a Browser AI Agent that can perform tasks like applying for jobs, filling out forms, and even automating purchases.&lt;/p&gt;

&lt;h3&gt;
  
  
  Overview of a Browser AI Agent
&lt;/h3&gt;

&lt;p&gt;A Browser AI Agent automates web-based operations such as browsing, form submissions, and data extraction without manual intervention. You don’t need extensive coding knowledge—just configure the agent and provide simple instructions to perform tasks automatically.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Install the Required Tools
&lt;/h3&gt;

&lt;p&gt;Before getting started, ensure that Python is installed on your system. Then, follow these steps:&lt;/p&gt;

&lt;h4&gt;
  
  
  1.1 Install Browser-Use
&lt;/h4&gt;

&lt;p&gt;This open-source tool connects AI models with the browser.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;browser-use
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  1.2 Install Playwright
&lt;/h4&gt;

&lt;p&gt;Playwright enables automation by allowing the AI to navigate and interact with websites.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;playwright
playwright &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  1.3 Install Web UI
&lt;/h4&gt;

&lt;p&gt;Web UI simplifies interaction with the browser.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/browser-use/web-ui.git
&lt;span class="nb"&gt;cd &lt;/span&gt;web-ui
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Set Up Python Environment
&lt;/h3&gt;

&lt;p&gt;Navigate to the Web UI folder and set up a virtual environment.&lt;/p&gt;

&lt;h4&gt;
  
  
  2.1 Install UV
&lt;/h4&gt;

&lt;p&gt;UV is used for managing the Python environment.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Windows&lt;/span&gt;
powershell &lt;span class="nt"&gt;-ExecutionPolicy&lt;/span&gt; ByPass &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"irm https://astral.sh/uv/install.ps1 | iex"&lt;/span&gt;

&lt;span class="c"&gt;# macOS/Linux&lt;/span&gt;
curl &lt;span class="nt"&gt;-LsSf&lt;/span&gt; https://astral.sh/uv/install.sh | sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  2.2 Activate Virtual Environment
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv venv &lt;span class="nt"&gt;--python&lt;/span&gt; 3.11
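.venv&lt;span class="se"&gt;\S&lt;/span&gt;cripts&lt;span class="se"&gt;\a&lt;/span&gt;ctivate  &lt;span class="c"&gt;# Windows&lt;/span&gt;
&lt;span class="nb"&gt;source&lt;/span&gt; .venv/bin/activate  &lt;span class="c"&gt;# macOS/Linux&lt;/span&gt;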
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  2.3 Install Dependencies
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, start the Web UI server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python webui.py &lt;span class="nt"&gt;--ip&lt;/span&gt; 127.0.0.1 &lt;span class="nt"&gt;--port&lt;/span&gt; 7788
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This launches a local server where you can configure your AI agent.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Configure the AI Model
&lt;/h3&gt;

&lt;p&gt;Choose an LLM provider such as OpenAI, Gemini, or DeepSeek. Obtain an API key and configure it within the agent’s settings, adjusting parameters like temperature for response randomness.&lt;/p&gt;
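
&lt;p&gt;To see why temperature controls randomness, consider the following Python sketch. It applies a temperature-scaled softmax to a few made-up scores; the scores and the helper name are illustrative assumptions, not values from any provider.&lt;/p&gt;

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]

# Hypothetical scores for three candidate tokens.
logits = [2.0, 1.0, 0.5]
for temperature in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

&lt;p&gt;Lower temperatures concentrate probability on the top-scoring token, which usually suits browser automation where repeatable actions matter more than variety.&lt;/p&gt;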

&lt;h3&gt;
  
  
  Step 4: Run Your First Task
&lt;/h3&gt;

&lt;p&gt;Let’s create a prompt to search Google for “Agentic AI” and return the first URL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;Prompt: &lt;span class="s2"&gt;"Go to google.com and search for 'Agentic AI'. Click the first result and return the URL."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the agent, and it will execute the task automatically, displaying the result in the terminal.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F63hrfcji9z9m78tvzokh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F63hrfcji9z9m78tvzokh.png" alt="Browser Agent" width="800" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5: Expand Your Automation
&lt;/h3&gt;

&lt;p&gt;Enhance your AI agent with more complex workflows, such as logging into websites, placing orders, or managing job applications.&lt;/p&gt;

&lt;h4&gt;
  
  
  Example:
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;Prompt: &lt;span class="s2"&gt;"Go to [e-commerce site], log in, search for a product, add it to the cart, and checkout."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;By setting up a Browser AI Agent, you can automate tedious tasks and streamline your workflow. Whether for job applications, online shopping, or data extraction, the possibilities are endless. Start automating today and boost your productivity!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Automate Google Search with Python Selenium</title>
      <dc:creator>Basil Ahamed</dc:creator>
      <pubDate>Mon, 20 Jan 2025 08:03:53 +0000</pubDate>
      <link>https://forem.com/basil_ahamed/automate-google-search-with-python-selenium-3ade</link>
      <guid>https://forem.com/basil_ahamed/automate-google-search-with-python-selenium-3ade</guid>
      <description>&lt;p&gt;&lt;strong&gt;&lt;em&gt;Introduction&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
In today’s digital age, automation is key to streamlining repetitive tasks. One common task that can benefit from automation is performing a Google Image search and extracting links from the search results. In this article, we’ll explore how to automate Google Image searches using Python and Selenium.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.geeksforgeeks.org/selenium-python-tricks" rel="noopener noreferrer"&gt;Selenium&lt;/a&gt; is a popular library for automating web browsers, and we’ll use it to build a Python script that performs &lt;a href="https://www.geeksforgeeks.org/text-searching-in-google-using-selenium-in-python" rel="noopener noreferrer"&gt;Google Image searches&lt;/a&gt; for a given query and extracts the links from the search results.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Prerequisites&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
Before we dive into the code, make sure you have the following prerequisites in place:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Python:&lt;/strong&gt; You’ll need Python installed on your system.&lt;br&gt;
&lt;strong&gt;Selenium:&lt;/strong&gt; Install the Selenium library using pip: &lt;code&gt;pip install selenium&lt;/code&gt;&lt;br&gt;
&lt;strong&gt;Chrome WebDriver:&lt;/strong&gt; Download the Chrome WebDriver for your Chrome browser version. Ensure that the WebDriver executable is in your system’s PATH or provide the path to it in the script.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Full-Code Implementation&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from selenium import webdriver
from selenium.webdriver.common.by import By

class GoogleImageSearch:
    def __init__(self):
        self.driver = webdriver.Chrome()  # Initialize Chrome WebDriver

    def fetch_links_by_search(self, search_query):
        # Navigate to Google Images
        self.driver.get('https://www.google.com/imghp?hl=en')

        # Find the search bar and input the search query
        search_box = self.driver.find_element(By.NAME, "q")
        search_box.send_keys(search_query)
        search_box.submit()

        # Set an implicit wait so the lookup below polls for up to 5 seconds while results load
        self.driver.implicitly_wait(5)

        # Find all &amp;lt;a&amp;gt; elements with href containing "/imgres" (image result links)
        links = self.driver.find_elements(By.XPATH, "//a[contains(@href, '/imgres')]")

        # Extract and print the links
        for link in links:
            href_value = link.get_attribute('href')
            print(href_value)

        # Close the WebDriver
        self.driver.quit()

# Example usage: 
if __name__ == "__main__":
    search_query = "tech" 
    google_image_search = GoogleImageSearch()
    google_image_search.fetch_links_by_search(search_query)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;em&gt;Running the Script&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
To use the script, change the search_query variable to your desired search term, and execute the script. It will open a Chrome browser, perform the Google Image search, and print the links to the console.&lt;/p&gt;
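
&lt;p&gt;The printed links point at Google's result pages rather than at the images themselves. The sketch below shows one way to pull the original image address out of such a link with the standard library. It assumes the link carries an &lt;code&gt;imgurl&lt;/code&gt; query parameter, which Google may change at any time; the helper name and the sample link are hypothetical.&lt;/p&gt;

```python
from urllib.parse import parse_qs, urlencode, urlparse

def extract_image_url(imgres_href):
    """Return the value of the imgurl query parameter, or None if it is absent."""
    query = parse_qs(urlparse(imgres_href).query)
    values = query.get("imgurl", [])
    return values[0] if values else None

# Build a made-up result link so the example runs without a browser.
sample = "https://www.google.com/imgres?" + urlencode({
    "imgurl": "https://example.com/cat.png",
    "imgrefurl": "https://example.com/",
})
print(extract_image_url(sample))
```

&lt;p&gt;Inside the script, the same helper could be called on each &lt;code&gt;href_value&lt;/code&gt; before printing.&lt;/p&gt;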

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Conclusion&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
Automating Google Image searches with Python and &lt;a href="https://www.guvi.com/blog/becoming-a-selenium-expert/" rel="noopener noreferrer"&gt;Selenium&lt;/a&gt; can save you time and effort when you need to extract links from search results. This article walked through the code and explained how it works, so you can build on it and adapt it to your own automation needs. You can also compare images in Python Selenium tests with the &lt;a href="https://dev.to/basil_ahamed/visual-regression-testing-with-selenium-and-visual-comparison-2k6c"&gt;visual-comparison&lt;/a&gt; module.&lt;/p&gt;

</description>
      <category>selenium</category>
      <category>python</category>
      <category>webautomation</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Visual Regression Testing with Selenium and Visual-Comparison</title>
      <dc:creator>Basil Ahamed</dc:creator>
      <pubDate>Thu, 11 Jul 2024 07:27:00 +0000</pubDate>
      <link>https://forem.com/basil_ahamed/visual-regression-testing-with-selenium-and-visual-comparison-2k6c</link>
      <guid>https://forem.com/basil_ahamed/visual-regression-testing-with-selenium-and-visual-comparison-2k6c</guid>
      <description>&lt;p&gt;Visual testing is crucial for ensuring that a web application’s appearance remains consistent and visually correct after updates or changes. This blog will guide you through using &lt;a href="https://www.guvi.com/blog/mastering-web-automation/" rel="noopener noreferrer"&gt;Selenium for browser automation&lt;/a&gt; and a custom image comparison utility for performing visual tests.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Introduction&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Visual testing helps detect unintended changes in the UI by comparing screenshots taken at different points in time. In this guide, we will use Selenium to automate web interactions and take screenshots, and then compare these screenshots using an image comparison utility known as visual-comparison.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Prerequisites&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Before we start, make sure you have the following installed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.x&lt;/li&gt;
&lt;li&gt;Selenium (pip install selenium)&lt;/li&gt;
&lt;li&gt;Visual-Comparison (pip install visual-comparison)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Setting Up the Environment&lt;/strong&gt;
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Install Selenium:&lt;br&gt;
&lt;code&gt;pip install selenium&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Install Visual-Comparison Package:&lt;br&gt;
&lt;code&gt;pip install visual-comparison&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Writing the Selenium Script&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Let’s write a Selenium script that logs into a sample website, takes a screenshot, and compares it with a baseline image.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Initialize WebDriver and Open the Webpage&lt;/strong&gt;&lt;br&gt;
First, initialize the WebDriver and navigate to the target webpage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from selenium import webdriver
from selenium.webdriver.common.by import By

# Initialize the WebDriver
driver = webdriver.Chrome()

# Open the target webpage
driver.get("https://www.saucedemo.com/v1/")
driver.maximize_window()
driver.implicitly_wait(5)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 2: Perform Login&lt;/strong&gt;&lt;br&gt;
Next, log into the website by filling in the username and password fields and clicking the login button. This example visually tests the dashboard page shown after login. You can modify the code to suit your requirements:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Login to the website 
username = driver.find_element(By.ID, "user-name")
username.send_keys("standard_user")

password = driver.find_element(By.ID, "password")
password.send_keys("secret_sauce")

# Click on the login button
login_button = driver.find_element(By.ID, "login-button")
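login_button.click()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 3: Take a Screenshot&lt;/strong&gt;&lt;br&gt;
After logging in, take a screenshot of the page and save it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Take a screenshot after login to visualize the changes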
actual_image_path = "actual.png"
driver.save_screenshot(actual_image_path)

# Close the browser
driver.quit()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 4: Compare Images&lt;/strong&gt;&lt;br&gt;
Use your custom image comparison utility to compare the baseline image (expected.png) with the newly taken screenshot (actual.png):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from visual_comparison.utils import ImageComparisonUtil

# Load the expected image and the actual screenshot
expected_image_path = "expected.png"
expected_image = ImageComparisonUtil.read_image(expected_image_path)
actual_image = ImageComparisonUtil.read_image(actual_image_path)

# Choose the path to save the comparison result
result_destination = "result.png"

# Compare the images and save the result
similarity_index = ImageComparisonUtil.compare_images(expected_image, actual_image, result_destination)
print("Similarity Index:", similarity_index)

# Asserting both images
match_result = ImageComparisonUtil.check_match(expected_image_path, actual_image_path)
assert match_result
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
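
&lt;p&gt;For intuition about what a similarity index measures, here is a deliberately simple Python sketch that scores two tiny grayscale images by the fraction of matching pixels. This is an assumption made for illustration only; it is not the algorithm the visual-comparison package uses, and the nested-list image format is hypothetical.&lt;/p&gt;

```python
def similarity_index(expected, actual):
    """Return the fraction of pixels that are identical in both images."""
    same_shape = len(expected) == len(actual) and all(
        len(row_e) == len(row_a) for row_e, row_a in zip(expected, actual)
    )
    if not same_shape:
        raise ValueError("images must have the same dimensions")
    total = sum(len(row) for row in expected)
    matching = sum(
        1
        for row_e, row_a in zip(expected, actual)
        for pixel_e, pixel_a in zip(row_e, row_a)
        if pixel_e == pixel_a
    )
    return matching / total

# Two 2x2 images that differ in exactly one pixel.
baseline = [[0, 0], [255, 255]]
changed = [[0, 0], [255, 0]]
print(similarity_index(baseline, baseline))
print(similarity_index(baseline, changed))
```

&lt;p&gt;A score of 1.0 means every pixel matches; anything lower signals a visual difference worth inspecting in the saved result image.&lt;/p&gt;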



&lt;h2&gt;
  
  
  &lt;strong&gt;Complete Script&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Here is the complete script combining all the steps:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"""
This python script compares the baseline image with the actual image.
After any source code modification, the visual changes are compared easily through this script.
"""
from selenium import webdriver
from selenium.webdriver.common.by import By
from visual_comparison.utils import ImageComparisonUtil

# Initialize the WebDriver
driver = webdriver.Chrome()

# Open the target webpage
driver.get("https://www.saucedemo.com/v1/")
driver.maximize_window()
driver.implicitly_wait(5)

# Login to the website 
username = driver.find_element(By.ID, "user-name")
username.send_keys("standard_user")

password = driver.find_element(By.ID, "password")
password.send_keys("secret_sauce")

# Click on the login button
login_button = driver.find_element(By.ID, "login-button")
login_button.click()

# Take a screenshot after login to visualize the changes
actual_image_path = "actual.png"
expected_image_path = "expected.png"
driver.save_screenshot(actual_image_path)

# Close the browser
driver.quit()

# Load the expected image and the actual screenshot
expected_image = ImageComparisonUtil.read_image(expected_image_path)
actual_image = ImageComparisonUtil.read_image(actual_image_path)

# Choose the path to save the comparison result
result_destination = "result.png"

# Compare the images and save the result
similarity_index = ImageComparisonUtil.compare_images(expected_image, actual_image, result_destination)
print("Similarity Index:", similarity_index)

# Asserting both images
match_result = ImageComparisonUtil.check_match(expected_image_path, actual_image_path)
assert match_result
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Output
Similarity Index: 1.0 (i.e., no visual changes)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note: Create a baseline (expected) image before running the script above. For a working example, see the &lt;a href="https://github.com/BASILAHAMED/visual-testing.git" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;.&lt;/p&gt;
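
&lt;p&gt;One way to create the baseline is to promote the first screenshot when no expected image exists yet. The Python sketch below assumes that approach; the &lt;code&gt;ensure_baseline&lt;/code&gt; helper is a hypothetical name, and the demo writes placeholder bytes to a temporary folder instead of a real screenshot.&lt;/p&gt;

```python
import shutil
import tempfile
from pathlib import Path

def ensure_baseline(actual_path, expected_path):
    """Copy the actual screenshot to the baseline path if no baseline exists.

    Returns True when a new baseline was created, False when one was already there.
    """
    expected = Path(expected_path)
    if expected.exists():
        return False
    shutil.copyfile(actual_path, expected)
    return True

# Demo in a temporary folder with placeholder bytes standing in for a screenshot.
with tempfile.TemporaryDirectory() as folder:
    actual = Path(folder) / "actual.png"
    expected = Path(folder) / "expected.png"
    actual.write_bytes(b"placeholder image bytes")
    first_run = ensure_baseline(actual, expected)
    second_run = ensure_baseline(actual, expected)
    print(first_run, second_run)
```

&lt;p&gt;Review a newly created baseline by eye before trusting it, since any defect captured in it becomes the expected result.&lt;/p&gt;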

&lt;h2&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This guide demonstrates how to perform visual testing using Selenium for &lt;a href="https://www.guvi.com/blog/mastering-web-automation/" rel="noopener noreferrer"&gt;web automation&lt;/a&gt; and the visual-comparison package to compare screenshots. By automating visual tests, you can catch unintended visual changes before they reach users and maintain a consistent user experience. For next steps, see these essential steps to &lt;a href="https://www.guvi.com/blog/becoming-a-selenium-expert/" rel="noopener noreferrer"&gt;master Selenium web automation&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>selenium</category>
      <category>python</category>
      <category>softwaredevelopment</category>
      <category>testing</category>
    </item>
  </channel>
</rss>
