Forem: Basil Ahamed

Prompt Orchestration Markup Language (POML): Future of Structured Prompt Engineering 2025

Basil Ahamed — Wed, 20 Aug 2025 05:58:14 +0000

Author: Basil Ahamed
Role: Senior Software Engineer | Automation Specialist | Tech Educator
Published on: 20-08-2025
Tags: #LLM #PromptEngineering #POML #AI #OpenSource #Microsoft

🚀 Introduction

Prompt engineering has become a cornerstone of working with Large Language Models (LLMs). Yet, as the complexity of tasks grows, so do the challenges: messy formatting, brittle templates, and poor reusability. Enter POML (Prompt Orchestration Markup Language)—an open-source initiative by Microsoft that brings structure, modularity, and clarity to prompt development.

🧩 What is POML?

POML is a markup language designed to orchestrate prompts for LLMs using a clean, semantic, and extensible syntax. Inspired by HTML/XML, it allows developers to define roles, tasks, examples, data, and output formats in a readable and maintainable way.

🔧 Why POML?

Structured Prompting: No more tangled strings—use semantic tags.
Modular Design: Reuse components across prompts.
Data Integration: Embed tables, images, and documents.
Templating Engine: Dynamic prompt generation with variables and logic.
Tooling Support: VS Code extension, SDKs for Python/Node.js.

🛠️ Core Features

1. Semantic Tags

<role>You are a helpful assistant.</role>
<task>Summarize the following document.</task>
<document src="report.pdf" />
<output-format>Bullet points</output-format>
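Because these tags are XML-like, a generic XML parser can inspect them. The sketch below is only an illustration built on Python's standard library; it does not use POML's own parser or SDK. The wrapping `<poml>` root element and the `list_components` helper are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# The snippet from above, wrapped in a single root element so that a
# generic XML parser accepts it (the wrapper is an assumption of this sketch).
PROMPT = """
<poml>
  <role>You are a helpful assistant.</role>
  <task>Summarize the following document.</task>
  <document src="report.pdf" />
  <output-format>Bullet points</output-format>
</poml>
"""


def list_components(markup: str) -> list[tuple[str, str]]:
    """Return (tag, content) pairs for each top-level component."""
    root = ET.fromstring(markup)
    components = []
    for element in root:
        # Use the text when present, otherwise fall back to the src attribute.
        content = (element.text or "").strip() or element.get("src", "")
        components.append((element.tag, content))
    return components


for tag, content in list_components(PROMPT):
    print(f"{tag}: {content}")
```

Each component stays individually addressable, which is what makes structured prompts easier to review and reuse than a single concatenated string.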

2. Templating Engine

<let name="topic" value="Photosynthesis" />
<task>Explain {{ topic }} to a 10-year-old.</task>

Supports:

{{ variable }}
for loops (written as a for attribute on an element)
if conditionals (written as an if attribute on an element)
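To make the `{{ variable }}` substitution concrete, here is a minimal Python sketch of the idea. It is not POML's templating engine, only a small stand-in; the `render` function and `PLACEHOLDER` pattern are hypothetical names introduced for illustration.

```python
import re

# Matches {{ name }} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, variables: dict[str, str]) -> str:
    """Replace each {{ name }} placeholder with its value from `variables`."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"No value provided for '{name}'")
        return str(variables[name])

    return PLACEHOLDER.sub(substitute, template)


print(render("Explain {{ topic }} to a 10-year-old.", {"topic": "Photosynthesis"}))
# prints: Explain Photosynthesis to a 10-year-old.
```

Raising an error on a missing variable, rather than leaving the placeholder in place, keeps a half-filled prompt from silently reaching the model.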

3. Styling Layer

<stylesheet>
  task {
    font-weight: bold;
    color: blue;
  }
</stylesheet>

4. Data Embedding

<table src="data.csv" />
<img src="diagram.png" alt="Photosynthesis Diagram" />
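Conceptually, embedding a table means turning file contents into text the model can read. The following sketch shows one way that could look in plain Python; it is an assumption about the general idea, not a description of how POML renders `<table>`. The sample data is a hypothetical stand-in for `data.csv`.

```python
import csv
import io


def csv_to_markdown(csv_text: str) -> str:
    """Render CSV text as a Markdown table suitable for pasting into a prompt."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    header, *body = rows
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


# Hypothetical stand-in for the contents of data.csv.
SAMPLE = "plant,light_hours\nfern,4\ncactus,10\n"
print(csv_to_markdown(SAMPLE))
```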

🧪 Tooling Ecosystem

VS Code Extension: Syntax highlighting, live preview, diagnostics.
Python SDK: Render and test prompts programmatically.
Node.js SDK: Integrate with web apps and automation pipelines.

📈 Use Cases

Enterprise Prompt Management
AI-Powered Chatbots
Educational Content Generation
Automated Report Summarization
Multi-modal Prompting (text + image + data)

🧠 Why It Matters

POML is more than just a markup language—it's a paradigm shift in how we think about prompt engineering. It brings the rigor of software development to the art of prompt crafting, making it scalable, testable, and collaborative.

📚 Resources

POML GitHub Repository
VS Code Extension
Microsoft's Announcement Blog

✍️ Final Thoughts

As someone deeply involved in automation and AI, I see POML as a game-changer. It empowers developers to build smarter, cleaner, and more reliable LLM applications. Whether you're a solo developer or part of an enterprise team, it's time to give your prompts the structure they deserve. Let's discuss: connect with me on LinkedIn.

Automate Your Web Tasks with a Browser AI Agent

Basil Ahamed — Fri, 07 Feb 2025 11:12:41 +0000

Introduction

In today's fast-paced digital world, automation is key to efficiency. From placing orders on e-commerce platforms to job hunting, automating these repetitive tasks can save both time and effort. In this guide, we'll walk through creating a Browser AI Agent that can perform tasks like applying for jobs, filling out forms, and even automating purchases.

Overview of a Browser AI Agent

A Browser AI Agent automates web-based operations such as browsing, form submissions, and data extraction without manual intervention. You don’t need extensive coding knowledge—just configure the agent and provide simple instructions to perform tasks automatically.

Step 1: Install the Required Tools

Before getting started, ensure that Python is installed on your system. Then, follow these steps:

1.1 Install Browser-Use

This open-source tool connects AI models with the browser.

pip install browser-use

1.2 Install Playwright

Playwright enables automation by allowing the AI to navigate and interact with websites.

pip install playwright
playwright install

1.3 Install Web UI

Web UI simplifies interaction with the browser.

git clone https://github.com/browser-use/web-ui.git
cd web-ui

Step 2: Set Up Python Environment

Navigate to the Web UI folder and set up a virtual environment.

2.1 Install UV

UV is used for managing the Python environment.

# Windows
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

2.2 Activate Virtual Environment

uv venv --python 3.11
.venv\Scripts\activate  # Windows
source .venv/bin/activate  # macOS/Linux

2.3 Install Dependencies

uv pip install -r requirements.txt

Now, start the Web UI server:

python webui.py --ip 127.0.0.1 --port 7788

This launches a local server where you can configure your AI agent.
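If you want to confirm from a script that the server is actually listening before opening the browser, a small standard-library check like the one below may help. The `is_server_up` helper is a hypothetical name; the host and port simply mirror the command above.

```python
import socket


def is_server_up(host: str = "127.0.0.1", port: int = 7788, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if is_server_up():
    print("Web UI is reachable at http://127.0.0.1:7788")
else:
    print("Nothing is listening on 127.0.0.1:7788 yet")
```

This only verifies that the port accepts connections; it does not check that the Web UI itself has finished loading.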

Step 3: Configure the AI Model

Choose an LLM provider such as OpenAI, Gemini, or DeepSeek. Obtain an API key and configure it within the agent’s settings, adjusting parameters like temperature for response randomness.
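As a rough sketch of what such a configuration amounts to, the snippet below gathers the provider, API key, and temperature into one dictionary. Everything here is an assumption for illustration: the `build_llm_settings` helper, the `<PROVIDER>_API_KEY` naming convention, and the 0.0 to 2.0 temperature range are not taken from the Web UI project.

```python
import os
from typing import Mapping, Optional


def build_llm_settings(
    provider: str = "openai",
    temperature: float = 0.2,
    env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Collect provider, API key, and temperature into one settings dict."""
    env = os.environ if env is None else env
    key_name = f"{provider.upper()}_API_KEY"  # e.g. OPENAI_API_KEY (assumed naming)
    api_key = env.get(key_name)
    if not api_key:
        raise RuntimeError(f"Set {key_name} before starting the agent")
    # Assumed range; check your provider's documentation for the real limits.
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature is expected to be between 0.0 and 2.0")
    return {"provider": provider, "api_key": api_key, "temperature": temperature}


# A placeholder key is passed explicitly so the example runs anywhere.
settings = build_llm_settings(env={"OPENAI_API_KEY": "sk-example"})
print(settings["provider"], settings["temperature"])
```

Lower temperatures generally make the agent's actions more repeatable, which tends to matter more for automation than for creative text.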

Step 4: Run Your First Task

Let’s create a prompt to search Google for “Agentic AI” and return the first URL:

Prompt: "Go to google.com and search for 'Agentic AI'. Click the first result and return the URL."

Run the agent, and it will execute the task automatically, displaying the result in the terminal.

Step 5: Expand Your Automation

Enhance your AI agent with more complex workflows, such as logging into websites, placing orders, or managing job applications.

Example:

Prompt: "Go to [e-commerce site], log in, search for a product, add it to the cart, and checkout."
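One way to keep longer workflows readable is to assemble the prompt from an ordered list of steps instead of writing one long sentence. This is just a suggestion, sketched with a hypothetical `build_task` helper; the agent itself only ever receives the resulting text.

```python
def build_task(site: str, steps: list[str]) -> str:
    """Join ordered steps into one numbered, plain-language task prompt."""
    if not steps:
        raise ValueError("At least one step is required")
    numbered = [
        f"{index}. {step.strip().rstrip('.')}."
        for index, step in enumerate(steps, start=1)
    ]
    return f"Go to {site} and do the following in order:\n" + "\n".join(numbered)


# example.com is a placeholder site for illustration.
print(build_task("example.com", ["Log in", "Search for a product", "Add it to the cart"]))
```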

Conclusion

By setting up a Browser AI Agent, you can automate tedious tasks and streamline your workflow. Whether for job applications, online shopping, or data extraction, the possibilities are endless. Start automating today and boost your productivity!

Automate Google Search with Python Selenium

Basil Ahamed — Mon, 20 Jan 2025 08:03:53 +0000

Introduction
In today’s digital age, automation is key to streamlining repetitive tasks. One common task that can benefit from automation is performing a Google Image search and extracting links from the search results. In this article, we’ll explore how to automate Google Image searches using Python and Selenium.

Selenium is a popular library for automating web browsers, and we’ll use it to build a Python script that performs Google Image searches for a given query and extracts the links from the search results.

Prerequisites
Before we dive into the code, make sure you have the following prerequisites in place:

Python: You’ll need Python installed on your system.
Selenium: Install the Selenium library using pip: pip install selenium
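Chrome WebDriver: Download the Chrome WebDriver for your Chrome browser version. Ensure that the WebDriver executable is in your system’s PATH or provide the path to it in the script. With Selenium 4.6 or later, the bundled Selenium Manager can locate or download a matching driver automatically, so this manual step is often unnecessary.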

Full-Code Implementation

from selenium import webdriver
from selenium.webdriver.common.by import By

class GoogleImageSearch:
    def __init__(self):
        self.driver = webdriver.Chrome()  # Initialize Chrome WebDriver

    def fetch_links_by_search(self, search_query):
        # Navigate to Google Images
        self.driver.get('https://www.google.com/imghp?hl=en')

        # Find the search bar and input the search query
        search_box = self.driver.find_element(By.NAME, "q")
        search_box.send_keys(search_query)
        search_box.submit()

        # Set an implicit wait: element lookups below retry for up to 5 seconds
        # before giving up (this call itself does not pause the script)
        self.driver.implicitly_wait(5)

        # Find all <a> elements with href containing "/imgres" (image result links)
        links = self.driver.find_elements(By.XPATH, "//a[contains(@href, '/imgres')]")

        # Extract and print the links
        for link in links:
            href_value = link.get_attribute('href')
            print(href_value)

        # Close the WebDriver
        self.driver.quit()

# Example usage: 
if __name__ == "__main__":
    search_query = "tech" 
    google_image_search = GoogleImageSearch()
    google_image_search.fetch_links_by_search(search_query)

Running the Script
To use the script, change the search_query variable to your desired search term, and execute the script. It will open a Chrome browser, perform the Google Image search, and print the links to the console.
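The printed links point to Google's result pages, not to the images themselves. Assuming each link carries the original image address in an `imgurl` query parameter (Google's markup can change, so treat this as an assumption to verify), a small helper like the hypothetical `extract_image_url` below could pull it out using only the standard library.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_image_url(imgres_link: str) -> Optional[str]:
    """Return the decoded `imgurl` query parameter of a link, or None if absent."""
    query = parse_qs(urlparse(imgres_link).query)
    values = query.get("imgurl")
    return values[0] if values else None


# Hypothetical link in the shape of the hrefs printed by the script.
sample = (
    "https://www.google.com/imgres"
    "?imgurl=https%3A%2F%2Fexample.com%2Fphoto.png"
    "&imgrefurl=https%3A%2F%2Fexample.com%2Fpage"
)
print(extract_image_url(sample))
# prints: https://example.com/photo.png
```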

Conclusion
Automating Google Image searches with Python and Selenium can save you time and effort when you need to extract links from search results. This article provided you with the code and explained its functionality. With this knowledge, you can build upon it and adapt it to your specific automation needs. You can also compare images in Python Selenium tests with the visual-comparison module.

Visual Regression Testing with Selenium and Visual-Comparison

Basil Ahamed — Thu, 11 Jul 2024 07:27:00 +0000

Visual testing is crucial for ensuring that a web application’s appearance remains consistent and visually correct after updates or changes. This blog will guide you through using Selenium for browser automation and a custom image comparison utility for performing visual tests.

Introduction

Visual testing helps detect unintended changes in the UI by comparing screenshots taken at different points in time. In this guide, we will use Selenium to automate web interactions and take screenshots, and then compare these screenshots using an image comparison utility known as visual-comparison.

Prerequisites

Before we start, make sure you have the following installed:

Python 3.x
Selenium (pip install selenium)
Visual-Comparison (pip install visual-comparison)

Setting Up the Environment

Install Selenium:
pip install selenium
Install Visual-Comparison Package:
pip install visual-comparison

Writing the Selenium Script

Let’s write a Selenium script that logs into a sample website, takes a screenshot, and compares it with a baseline image.

Step 1: Initialize WebDriver and Open the Webpage
First, initialize the WebDriver and navigate to the target webpage:

from selenium import webdriver
from selenium.webdriver.common.by import By

# Initialize the WebDriver
driver = webdriver.Chrome()

# Open the target webpage
driver.get("https://www.saucedemo.com/v1/")
driver.maximize_window()
driver.implicitly_wait(5)

Step 2: Perform Login
Next, log into the website by filling in the username and password fields and clicking the login button. This example visually tests the page shown after login; adapt the code to your own requirements:

# Login to the website 
username = driver.find_element(By.ID, "user-name")
username.send_keys("standard_user")

password = driver.find_element(By.ID, "password")
password.send_keys("secret_sauce")

# Click on the login button
login_button = driver.find_element(By.ID, "login-button")
login_button.click()

Step 3: Take a Screenshot
After logging in, take a screenshot of the page and save it:

# Take a screenshot after login to visualize the changes
actual_image_path = "actual.png"
driver.save_screenshot(actual_image_path)

# Close the browser
driver.quit()

Step 4: Compare Images
Use the visual-comparison utility to compare the baseline image (expected.png) with the newly taken screenshot (actual.png):

from visual_comparison.utils import ImageComparisonUtil

# Load the expected image and the actual screenshot
expected_image_path = "expected.png"
expected_image = ImageComparisonUtil.read_image(expected_image_path)
actual_image = ImageComparisonUtil.read_image(actual_image_path)

# Choose the path to save the comparison result
result_destination = "result.png"

# Compare the images and save the result
similarity_index = ImageComparisonUtil.compare_images(expected_image, actual_image, result_destination)
print("Similarity Index:", similarity_index)

# Asserting both images
match_result = ImageComparisonUtil.check_match(expected_image_path, actual_image_path)
assert match_result
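To build some intuition for what a similarity index expresses, here is a deliberately simple pure-Python sketch that scores two tiny grayscale "images" by the fraction of identical pixels. This is not the algorithm used by visual-comparison, which may compute its score differently; the `pixel_similarity` function is a hypothetical illustration only.

```python
def pixel_similarity(expected: list[list[int]], actual: list[list[int]]) -> float:
    """Return the fraction of positions whose pixel values are identical."""
    if len(expected) != len(actual) or any(
        len(row_e) != len(row_a) for row_e, row_a in zip(expected, actual)
    ):
        raise ValueError("Images must have the same dimensions")
    total = sum(len(row) for row in expected)
    if total == 0:
        raise ValueError("Images must contain at least one pixel")
    matching = sum(
        pixel_e == pixel_a
        for row_e, row_a in zip(expected, actual)
        for pixel_e, pixel_a in zip(row_e, row_a)
    )
    return matching / total


baseline = [[0, 0], [255, 255]]
changed = [[0, 0], [255, 0]]
print(pixel_similarity(baseline, baseline))  # prints: 1.0
print(pixel_similarity(baseline, changed))   # prints: 0.75
```

A score of 1.0 means no pixel differs; anything lower signals that at least part of the page rendered differently from the baseline.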

Complete Script

Here is the complete script combining all the steps:

"""
This python script compares the baseline image with the actual image.
After any source code modification, the visual changes are compared easily through this script.
"""
from selenium import webdriver
from selenium.webdriver.common.by import By
from visual_comparison.utils import ImageComparisonUtil

# Initialize the WebDriver
driver = webdriver.Chrome()

# Open the target webpage
driver.get("https://www.saucedemo.com/v1/")
driver.maximize_window()
driver.implicitly_wait(5)

# Login to the website 
username = driver.find_element(By.ID, "user-name")
username.send_keys("standard_user")

password = driver.find_element(By.ID, "password")
password.send_keys("secret_sauce")

# Click on the login button
login_button = driver.find_element(By.ID, "login-button")
login_button.click()

# Take a screenshot after login to visualize the changes
actual_image_path = "actual.png"
expected_image_path = "expected.png"
driver.save_screenshot(actual_image_path)

# Close the browser
driver.quit()

# Load the expected image and the actual screenshot
expected_image = ImageComparisonUtil.read_image(expected_image_path)
actual_image = ImageComparisonUtil.read_image(actual_image_path)

# Choose the path to save the comparison result
result_destination = "result.png"

# Compare the images and save the result
similarity_index = ImageComparisonUtil.compare_images(expected_image, actual_image, result_destination)
print("Similarity Index:", similarity_index)

# Asserting both images
match_result = ImageComparisonUtil.check_match(expected_image_path, actual_image_path)
assert match_result

Output
Similarity Index: 1.0 (i.e., no visual changes)

Note: Create a baseline (expected) image before running the script above. Refer to the GitHub repository.
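One possible way to create that baseline is to promote the first screenshot to `expected.png` when no baseline exists yet. The `ensure_baseline` helper below is a hypothetical sketch using only the standard library; it is not part of the visual-comparison package.

```python
import shutil
from pathlib import Path


def ensure_baseline(actual_path: str, expected_path: str) -> bool:
    """Copy the actual screenshot to the baseline path if no baseline exists.

    Returns True when a new baseline was created, False when one was already there.
    """
    expected = Path(expected_path)
    if expected.exists():
        return False
    # Create the target folder if needed, then copy the screenshot as the baseline.
    expected.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(actual_path, expected)
    return True
```

Calling `ensure_baseline("actual.png", "expected.png")` right after `driver.save_screenshot(...)` would create the baseline on the first run only. Review a freshly created baseline by eye before relying on it, since any defect in it will be treated as the expected appearance.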

Conclusion

This guide demonstrated how to perform visual testing using Selenium for web automation and the visual-comparison package to compare screenshots. By automating visual tests, you can catch unintended visual changes introduced by UI updates and maintain a consistent user experience. Also see the essential steps to master Selenium web automation.