Forem: Rahul @codingkite

📘 Prompt Engineering: Mastering the Art of Talking to AI

Rahul @codingkite — Sat, 19 Apr 2025 03:48:43 +0000

🧭 Topics Covered:

🗑️ GIGO: Garbage In, Garbage Out
✍️ What is Prompt Engineering?
🧠 How Prompts Influence LLM Output
🧰 Types & Styles of Prompts
🧪 Prompting Techniques (Zero-shot, Few-shot, CoT, etc.)
🧙‍♂️ Role, Persona & Contextual Prompting
🧩 Which Prompt Technique to Choose?
🔐 Prompt Templates & Security

🎬 Let’s Begin with a Story…

Imagine you walk into a pizza shop 🍕 and say,

“Give me food.”

The waiter looks puzzled.

“Uhh… what kind? Spicy? Veg? Size? Cheese?”

Now instead, you say,

“Can I get a large margherita pizza with extra cheese and jalapeños?”

Boom 💥—now you're served exactly what you want. AI models work the same way. The better you explain your needs (prompt), the better the result (output)!

And this is where Prompt Engineering comes in.

🗑️ GIGO – Garbage In, Garbage Out

Have you ever typed something into ChatGPT and got a weird or useless response?

Well, that’s GIGO at play:

Garbage Input = Garbage Output 💩

AI is like a mirror—it reflects what you give it. Messy or vague input leads to confusing results. So crafting the right input is everything.

✍️ What is Prompt Engineering?

Prompt Engineering is the skill of writing smart instructions (called prompts) to get smart results from an AI model 🤖.

Later in this blog we will learn about more technical stuff, like different prompt formats and techniques.

🤔 What is a Prompt?

A prompt is the initial instruction or input you give to the AI to perform a task.

But here's a catch…

If you ask AI to generate a prompt, and then feed that prompt back to the AI, the results are often not great 😬. Why?

Probabl because most LLMs (like GPT or Gemini) were trained on human-written content—not AI-generated ones.

🔑 Takeaway: Always prefer writing your own prompts over relying on AI-generated ones.

🧠 System Prompts

System prompts help set the initial context for the conversation.

As developers, we can’t control user queries, but we can control the system prompt to steer the AI’s tone, behavior, or role 🎛️.

“You are a helpful travel assistant.” – That’s a system prompt.

Also, keep in mind:

LLMs charge you based on both input and output tokens 💰.Checkout the pricing page of the model that you are using.
Tokens should not be considered the same as words
Repeating the same system prompt? It might be cached and priced differently

✨ Prompt Templates – Why Bother Using Them?

Imagine sending raw user input straight to the AI. That’s risky!

🔒The Problem: Prompt Injection Attacks

One of the biggest vulnerabilities in LLMs today is prompt injection — where users sneak in inputs that hijack or manipulate the AI’s behavior.
Think of it like someone whispering fake orders to your assistant while you’re not looking 😅

🛡️ The Fix: Prompt Templates

Prompt templates let you structure conversations into clear roles:

System – instructions from the developer
User – the actual user input
Assistant – the AI’s response

This layered approach (like OpenAI’s ChatML format) tells the model who is saying what, and where one speaker stops and another begins. That boundary is key 🔐

This makes it much harder for malicious input to confuse or trick the model.

🧱 Why This Matters

Prompt templates reduce ambiguity, helping LLMs interpret input more accurately
They separate trusted developer instructions from unpredictable user text
Over time, this structure can help fully prevent injection attacks

Even when you give a simple instruction to an LLM, behind the scenes, it’s wrapped in a structured template — marking your role, your intent, and your context.

📐 Prompt Formats (Styles)

Here are a few popular formats used in different LLMs:

🦙 Alpaca Prompt

### Instruction:
Do X

### Input:
With Y

### Response:
Result

  Instruction: For the given number by user perform arthematic operation
  Input: what is 2 + 2
  Response:
  ## the LLM will predict the next set of token and return 4.

🦙 LLaMA-2 Format ( used by LLaMA-2) :

      <s> 
        [INST] 
          <<SYS>>
            {{ system_prompt }}
          <</SYS>>
            {{ user_message_1 }}
        [/INST]
        {{ model_answer_1 }}
      </s>
      <s>
        [INST]
          {{ user_message_2 }} 
        [/INST] 
      </s>

🦙 LLaMA-3 Format ( used by LLaMA-3)

      <|begin_of_text|>
        <|start_header_id|>
          system
        <|end_header_id|>

      You are a helpful AI assistant for travel tips and recommendations

      <|eot_id|>
        <|start_header_id|>
          user
        <|end_header_id|>

      What can you help me with?

      <|eot_id|>
        <|start_header_id|>
          assistant
        <|end_header_id|>

💬 ChatML Format (used by OpenAI)

{ "role": "system", "content": "You are a helpful assistant" }
{ "role": "user", "content": "What is LRU cache?" }
{ "role": "assistant", "content": "LRU stands for..." }

🛠️ Prompting Techniques

Let’s explore the ways you can craft prompts:

1. 🕵️ Zero-Shot Prompting

Just ask the question without giving any examples.

“Write a cold email introducing our new app.”

AI uses its existing knowledge. Good for quick tasks. No examples needed.

import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

client = OpenAI()

api_response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role":"system", "content":"what is 5*45+34%3*2"}
    ]   
)

print("AI response -> ",api_response.choices[0].message.content)

2. ✌️ Few-Shot Prompting

Here, you give a few examples first, then ask for a new answer.

Helps improve accuracy when the task is nuanced or requires understanding a pattern.

import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

client = OpenAI()

system_prompt = '''
                    You are an AI assistant which helper user in solving mathematical question.
                    Any question other than the mathematical question should not be answered by you.

                    Example:
                    Input: 2+2
                    Output: 2+2 is 4 which is calculate by adding 2 + 2

                    Input: 3*0+5
                    Output: 3*0+5 is 5. As per rule we first multiply and then add. So 3*0 is 0 and 0+5 is 5 which is calculated by first multiplying 3 with 0 and then adding the result with 5

                    Input: why is sky blue?
                    Output: Is this maths query? I am an mathematic assistant and can help you in mathematics only.

                '''
api_response_1 = client.chat.completions.create(
    model="gpt-4o",
    max_tokens=200, ## adjust to control pricing by limiting the token count
    temperature=0.7, ## adjust temperature to add more creativity/randmoness to output
    messages=[
        {"role":"system","content":system_prompt},
        {"role":"system", "content":"what is 5*45+34%3*2"}
    ]   
)
print("AI response 1 -> ",api_response_1.choices[0].message.content)

api_response_2 = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role":"system","content":system_prompt},
        {"role":"system", "content":"what's the speed at which cheetah can run"}
    ]   
)
print("AI response 2 -> ",api_response_2.choices[0].message.content)

3. 🔗 Chain-of-Thought (CoT)

Here, we ask the model to explain step-by-step before giving the answer.
The model is encroughed to break down reasoning step by step before arriving at an answer.

“Let’s break it down: First we..., then we...”

This improves accuracy and makes AI reasoning more transparent 🧠

import os
import json
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

client = OpenAI()

api_response = None # Initialize response variable

system_prompt = '''
    You are an AI assistant who is expert in breaking down complex problems
    and then resolve the user query.

    For the given user input, analyse the input and break down the problem step by step.
    Atleast think 5-6 steps on how to resolve the problem before solving the problem.

    The steps are you get "input", you "analyse", you "think" several times and then return the output with explanation.
    Finally you validate the ouput before giving the final "output".

    Follow these step in sequence "analsyse","think","output","validate" and finally "output".

    Rules:
        1. Follow the strict JSON ouput as per Output schema.
        2. Always perform one step at a time and wait for next input
        3. Carefully analyze the user query.

    Output Fromat:
        {{step :"output",content:"string"}}

    Example:
    Input : what is 2+2?
    Output : {{step:"analyse",content:"The user is intrresetd in basic maths query and he is asking basic arthematice operation"}}   
    Output : {{step:"think",content:"To perform addition one must got from left to rigth and add all the operands"}}
    Output : {{step:"output",content:"4"}}
    Output : {{step:"validate",content:"Seems like 4 is correct as 2+2 adds up to 4"}}
    Output : {{step:"output",content:"2 + 2 = 4 and that is calculated by adding all numbers."}}

'''

api_response= None
messages=[
    {"role":"system", "content":system_prompt},
]

user_query = input("> ")
messages.append({"role":"user", "content":user_query},)

while True:
    api_response = client.chat.completions.create(
        model="gpt-4o", 
        response_format={"type":"json_object"},
        messages=messages
    )

    parsed_response = json.loads(api_response.choices[0].message.content)
    messages.append({"role":"assistant", "content":json.dumps(parsed_response)})

    if parsed_response.get('step') != 'output':
        print("each step -> ",parsed_response.get('content'))
        continue

    print(f'Parsed Respose : {parsed_response.get('content')}')
    break

4. 🔁 Self-Consistency Prompting

Run the same prompt multiple times. Pick the most common or logical answer.

Just like asking 5 friends and trusting the one most agree on!

5. 🧑‍🎓 Persona-Based Prompting

Give the AI a personality or an profession.

“You are a doctor giving tips to new parents.”

It shapes how the AI responds!

6. 🎭 Role-Playing Prompt

“You are an expert coding tutor for beginners.”

Let the AI act in character 🎬 and adapt to the role you've assigned.

As we go deeper, we’ll explore advanced prompt engineering strategies like:

1.🔍 Contextual Prompting

2.🖼️ Multimodal Prompting

These techniques go beyond just writing smart instructions — they need an orchestrator behind the scenes.

Think of the orchestrator as a conductor, managing how data flows into and out of the LLM for maximum accuracy and relevance.
These technique are used in apps that require deep context like chatbots, search assistants, etc.

To make this work, we’ll integrate:

Vector Databases – to provide semantic context and memory
Graph Databases – to model relationships between entities
PostgreSQL – for handling structured data
Tool / Function Calling – so the model can dynamically execute actions in real-time

We’ll learn how to stitch these together to build powerful, context-aware, multi-modal AI systems in upcoming blog.

🧪 How to Choose the Right Prompt Technique?

Here’s the secret sauce:

👉 Experiment. Track. Improve.

Observe how your app responds to real user queries
Mix and match techniques like:
- CoT + Role-Play + Persona 🤯
Use observability tools to capture and analyze bad vs. good outputs to tweak your prompt technique accordingly.
Keep refining over time

🎯 Final Thoughts

Prompt Engineering isn’t just about giving commands to AI.

It’s about speaking its language clearly and cleverly 💡

If you're building AI tools, learning how to write great prompts will make your results 10x better.

Great prompts = Great products 🚀

Generative AI Jargons You Should Know

Rahul @codingkite — Sun, 13 Apr 2025 10:29:22 +0000

🤖 Ever Wondered How ChatGPT or Gemini Works?

We all have used AI tools like ChatGPT or Gemini, but have you ever wondered how these tools are able to generate such accurate responses to our queries? 🤔

In this blog, we’ll get an overview of how AI models generate responses — and along the way, we’ll learn some jargon 🧠 that you often see floating around the internet 🌐.

🧩 AI Jargon You’ll Learn in This Blog

Let’s walk through the working of an AI model while exploring these terms:

🔤 LLM
🧠 GPT (Generative Pre-trained Transformer)
⚙️ Transformer
🧱 Tokens
📥 Encoder / Encoding
📍 Positional Encoding
📤 Decoder / Decoding
🧮 Vectors
🔗 Embedding
🧠 Semantic Meaning
👁️ Self Attention
🎯 SoftMax
🧠 Multi-Head Attention
🌡️ Temperature
📅 Knowledge Cutoff
✂️ Tokenization
📚 Vocab Size

🤔 What is AI?

AI is basically an algorithm trained on data. After training, it generates output based on learned weights in response to a user query.

⚙️ Transformer

An AI model is a mathematical structure that has learned patterns from data.
A Transformer is a type of deep learning architecture that’s especially good at understanding sequences like text or audio.
It was introduced by Google in 2017 in a paper called: 📄 Attention is All You Need

🤖 GPT

GPT stands for Generative Pre-trained Transformer.
It’s a pre-trained transformer that generates the next token based on the data it has seen.
🏋️ Training GPT is expensive and time-consuming, so it’s not retrained frequently.
Thus, GPT has a knowledge cutoff — it doesn’t know anything that happened after its last training.

🔍 Fun Fact: ChatGPT combines GPT with an agent system in the background.

🧱 Transformer Model Architecture

At first glance, this architecture might seem scary 😨, but it becomes simple when explained properly. Let’s break it down:

🧾 Input Query & Tokenization

Input: The query provided by the user.
AI models like LLMs don’t understand human languages directly — they understand numbers.
So, we convert the input into numbers — this process is called Tokenization.
- A token is a word or piece of a word.
- The token ID is the numeric representation.
Libraries like tiktoken (used by OpenAI) perform tokenization.

🔤 Vocabulary Size = Total number of unique tokens in the tokenizer's dictionary.

🧰 Tokenizer

A tokenizer is separate from the model.
It has a fixed vocabulary: a mapping of words (tokens) to numbers (token IDs).
The model doesn’t “know” words — it only understands token IDs.
During inference:
- If a new word appears, the tokenizer:
- Breaks it into known sub-tokens, or
- Uses an unknown token placeholder.
So, vocab size = number of unique tokens the model can recognize.

👉 Tokenizer visualizer: tiktokenizer.vercel.app

🧪 Tokenizer Code Example

from tiktoken._educational import *
import tiktoken

encoder = tiktoken.get_encoding("o200k_base")
print("encoder", encoder)
print("vocab size", encoder.n_vocab)

tokens = encoder.encode('Hello World, How are you?')
print("tokens", tokens)

decoded_text = encoder.decode(tokens)
print("decoded_text", decoded_text)

print(tiktoken.encoding_for_model('text-embedding-3-small'))

encoder_for_model = tiktoken.encoding_for_model("gpt-4o")
print("encoder_for_model", encoder_for_model)

# Train a BPE tokeniser on a small amount of text
enc = train_simple_encoding()
print("enc", enc)

# Visualise how the GPT-4 encoder encodes text
another_encoder = SimpleBytePairEncoding.from_tiktoken("cl100k_base")
print(another_encoder.encode("hello world aaaaaaaaaaaa"))

🧠 Embeddings (Vector & Positional)

🧮 Vector Embedding

A vector embedding is the numerical representation of tokens that captures semantic meaning.
Semantic meaning = meaning of the word in a specific context.
- "Reserve Bank" vs "Bank of a river" — same word, different meanings.

✨ Imagine "cat" as a point in space: [0.2, 1.3, -0.5, ...]

These embeddings can be stored in vector databases like:
- 🔍 Pinecone
- 🌌 Chroma DB
- 🧭 Qdrant

🧭 Visualize embeddings here: TensorFlow Projector

🧠 Semantic Example

King ➡️ Queen implies Man ➡️ ?
If "Queen" is 3 units down and "Man" is 4 units left of "King" in vector space — then the model can estimate the missing word using vector math ✨.

📍 Positional Embedding

Tokens alone don’t carry position info.
Two sentences with the same words but different order mean very different things!
- "The cat sat on the mat"
- "The mat sat on the cat"

So, we use positional embeddings to give the model context of order.

🛠️ These modify the original embedding to reflect word position in the sentence.

📥 Embedding Example with OpenAI

from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()
client = OpenAI()

text = "Eiffel tower is in Paris and is a famous landmark, it is 324 meters tall"

response = client.embeddings.create(
    input=text,
    model="text-embedding-ada-002"
)

print("vector embeddings", response.data[0].embedding)

🔄 Self Attention Mechanism

In simple terms: Tokens can talk to each other and update themselves! 🧠💬

This means when two tokens interact, they can update their vector embeddings based on the sentence and with respect to each other.
Example: "river bank"
- When these 2 tokens talk to each other, they update their embeddings based on the context.
- So even if the word "bank" appears in both "river bank" and "icici bank", and both have:
- the same token,
- the same original vector embedding,
- and the same positional embedding,
- the final vector embedding will differ because of how they interact with other tokens in the sentence.
Tokens update their embeddings based on all tokens in the sentence — not just one or two.
So what does self-attention do?
- It allows tokens to adjust their embeddings based on the other tokens present in the input. 🔁

🧠 Multi-Head Attention — Seeing Things in Many Ways

Think of it like observing the sentence from multiple perspectives at once! 👀🔍

It helps the model focus on different aspects/perspectives of the tokens simultaneously.
At its heart, attention is about weighing the importance of different input tokens when focusing on a specific one.
Multi-head attention runs multiple attention operations in parallel (called “heads”) and combines their results.
This allows the model to learn various types of relationships between tokens — all at the same time. 💡🧩

🔗 Feed Forward

It’s a neural network that processes each token individually after attention is done. 🛠️

The interaction cycle between Multi-Head Attention and Feed Forward is repeated many times to get a rich contextual result.
In GPT (and Transformers in general), after using attention to understand word relationships, the model sends the result through a Feed Forward Neural Network (FFN).

🧠 "Okay, I understand which words matter (thanks to attention)...

Now let me do some math on each word to extract more meaning!" ➗🔍

⚙️ How It Works

The Feed Forward block is just a small, regular neural network applied to each token individually.

Here's what happens:

Each token (word embedding) goes into a Linear layer (fully connected).
Then through a ReLU or GELU activation (for non-linearity).
Then again through another Linear layer.
The output replaces the old one and continues through the Transformer stack.

🧱 Formula:

FFN(token) = Linear -> Activation -> Linear

🧠 Why Is It Important?

Attention handles relationships between words.
Feed Forward handles processing each word by itself, like extracting deeper meanings and features. 🌱

🔁 Simple Analogy

Attention: "Hey 'cat', pay attention to 'mat' and 'sat'!" 🐱🧘🪑

Feed Forward: "Cool. Now let me upgrade 'cat' with better features based on that context." 🚀📈

🔄 Self Attention Mechanism

In simple terms: Tokens can talk to each other and update themselves! 🧠💬

This means when two tokens interact, they can update their vector embeddings based on the sentence and with respect to each other.
Example: "river bank"
- When these 2 tokens talk to each other, they update their embeddings based on the context.
- So even if the word "bank" appears in both "river bank" and "icici bank", and both have:
- the same token,
- the same original vector embedding,
- and the same positional embedding,
- the final vector embedding will differ because of how they interact with other tokens in the sentence.
Tokens update their embeddings based on all tokens in the sentence — not just one or two.
So what does self-attention do?
- It allows tokens to adjust their embeddings based on the other tokens present in the input. 🔁

🧠 Multi-Head Attention — Seeing Things in Many Ways

Think of it like observing the sentence from multiple perspectives at once! 👀🔍

It helps the model focus on different aspects/perspectives of the tokens simultaneously.
At its heart, attention is about weighing the importance of different input tokens when focusing on a specific one.
Multi-head attention runs multiple attention operations in parallel (called “heads”) and combines their results.
This allows the model to learn various types of relationships between tokens — all at the same time. 💡🧩

🔗 Feed Forward

It’s a neural network that processes each token individually after attention is done. 🛠️

The interaction cycle between Multi-Head Attention and Feed Forward is repeated many times to get a rich contextual result.
In GPT (and Transformers in general), after using attention to understand word relationships, the model sends the result through a Feed Forward Neural Network (FFN).

🧠 "Okay, I understand which words matter (thanks to attention)...

Now let me do some math on each word to extract more meaning!" ➗🔍

⚙️ How It Works

The Feed Forward block is just a small, regular neural network applied to each token individually.

Here's what happens:

Each token (word embedding) goes into a Linear layer (fully connected).
Then through a ReLU or GELU activation (for non-linearity).
Then again through another Linear layer.
The output replaces the old one and continues through the Transformer stack.

🧱 Formula:

FFN(token) = Linear -> Activation -> Linear

🧠 Why Is It Important?

Attention handles relationships between words.
Feed Forward handles processing each word by itself, like extracting deeper meanings and features. 🌱

🔁 Simple Analogy

Attention: "Hey 'cat', pay attention to 'mat' and 'sat'!" 🐱🧘🪑

Feed Forward: "Cool. Now let me upgrade 'cat' with better features based on that context." 🚀📈

🚀 Two Phases of a Model

🔧 Training Phase
🧠 Inference Phase (using phase)

🔧 Training Phase

Let’s break it down:

In the training phase, we match input and output.
We provide the input, the model gives us an output, we compare it with the actual output, calculate the loss, and then backpropagate it (backpropagation = वapas jao 😄).

Example Flow:

Input: <start> my name is piyush
Actual Output: <start> my name is piyush <end> I am good
Model Output: <start> my name is piyush xsfd@e 😅

Then:

We calculate the loss between the model’s output and the actual output.
We send it back through the model to adjust the weights.
We repeat this until the model starts giving the expected output. 🔁

✅ Goal: Allow the model to update its weights using the training data.

⚙️ This backpropagation and weight update process requires a lot of compute power, hence heavy GPU usage 💻🔥.

🧠 Inference Phase: Using the model

Time to use what we've trained! 😎

We provide an input to the model:
- token → vector embedding ➕ positional embedding → 🎯 multi-head attention

⚡ Technically, the model generates multiple possible outputs for a given input.

🧪 Example:

Input: <start> how are you?<end>
Raw Outputs: I, S, U
The linear step calculates probabilities for each token.
Example Output (with probabilities):
- I (98%) ✅
- S (5%)
- U (0.3%)

The Softmax step picks the one with the highest probability.

🎛️ Temperature Parameter:

Used by Softmax to control randomness.

Higher temperature → more randomness (might pick a less probable token).

Lower temperature → more deterministic (sticks to highest probability).

🪜 Steps Breakdown

STEP 1:
- Input: <start> how are you?<end>
- Probabilities: I (98%), S (5%), U (0.3%)
- Chosen: I
- ✅ Final Output: <start> how are you?<end> I
STEP 2:
- Input: <start> how are you?<end> I
- Probabilities: _am (80%), few (5%), good (4%)
- Chosen: _am
- ✅ Final Output: <start> how are you?<end> I_am
STEP 3:
- The process continues iteratively 🔁 (until it reaches the <end> token or finishes generating).

📚 Extra Info

📜 The Transformer model was introduced by Google in the research paper "Attention is All You Need".
🗣️ Originally created for Google Translate, the idea was to understand and translate semantic meaning using NLP.
🧠 This means it was built with language understanding in mind.

🔍 GPT (by OpenAI) was based on the transformer —

But instead of translation, it was built for next token prediction 🧩