Forem: NEBULA DATA

Mastering the Interface: Why Prompt Engineering is the New Software Syntax

NEBULA DATA — Tue, 05 May 2026 10:38:06 +0000

In the traditional software development lifecycle, we communicate with machines through rigid syntax—Python, Java, or C++. If a semicolon is missing, the system fails. However, with the rise of Large Language Models (LLMs), the barrier between human intent and machine execution has shifted.

We are no longer just writing code; we are engineering intent. This is the essence of Prompt Engineering.

What is Prompt Engineering?

At its surface, prompt engineering is the art of crafting inputs to get the desired output from an AI. But looking under the hood, it is a sophisticated form of Informed Heuristic Search.

When you send a prompt to a model like GPT-4, Claude, or Llama, you aren't just "asking a question." You are providing a set of constraints that narrow down the model’s vast probabilistic space. A well-engineered prompt acts as a guide, steering the model away from hallucinations and toward a high-probability, high-accuracy response.

The Mechanics: How It Works

To master prompting, one must understand three core technical pillars:

1. Tokenization & Attention

Models don't read words; they process tokens. Prompt engineering involves placing anchors (key terms) that the model’s attention mechanism can weigh heavily to maintain context.

2. Context Window Management

Every model has a limit on how much information it can hold at once. Efficient prompting maximizes the utility of this window, ensuring the most relevant data is prioritized.

3. Instruction Following & Few-Shot Learning

By providing a few examples within the prompt (few-shot), we shift the model from a general-purpose engine to a specialized tool for a specific task—without traditional fine-tuning.

From Theory to Practice: The Framework

A production-ready prompt isn't just a sentence—it’s a structured document.

C.R.E.D.O Framework

Context: What is the background?
Example: "You are a Senior Security Auditor."
Role: Define the persona.
Evidence/Data: Provide the specific text or logs to be analyzed.
Deliverable: Define the output format.
Example: "Output a JSON object with 'severity' and 'description' keys."
Objective: What is the ultimate goal?

From Theory to Code: A Practical Implementation

import requests

# Your NebulaAPI Key
api_key = "YOUR_NEBULA_API_KEY"
url = "https://api.nebula-data.ai/v1/chat/completions"

def get_architect_advice(technology_stack):
    # Defining the structured prompt
    system_prompt = "You are a Senior Solution Architect specialized in Cloud Infrastructure."
    user_prompt = f"""
    Analyze the following tech stack: {technology_stack}.

    Task: 
    1. Identify potential scalability bottlenecks.
    2. Suggest 2 managed services to optimize the architecture.

    Output Format:
    Return the response in valid JSON format with keys: 'bottlenecks' and 'recommendations'.
    """ 

    payload = { 
        "model": "gpt-4o",  # You can swap to "claude-3-sonnet" or "llama-3-70b"
        "messages": [ 
            {"role": "system", "content": system_prompt}, 
            {"role": "user", "content": user_prompt} 
        ], 
        "temperature": 0.2  # Lower temperature for deterministic output
    } 

    headers = { 
        "Authorization": f"Bearer {api_key}", 
        "Content-Type": "application/json" 
    } 

    response = requests.post(url, json=payload, headers=headers) 
    return response.json() 

# Example Usage
result = get_architect_advice("Python Flask, PostgreSQL, Single Region AWS")
print(result['choices'][0]['message']['content'])

Why This Matters for Prompt Engineering

Using a centralized aggregator like NebulaAPI makes prompt engineering more scientific:

Version Control: Store prompt templates and test them across 150+ models
Consistency: Standardized request/output formats
Efficiency: Benchmark prompts across multiple models with a single loop

The Challenge of a Fragmented AI Landscape

As a developer or architect, the biggest hurdle isn't just writing prompts—it’s portability.

A prompt that works perfectly on one model may fail on another. In production, being locked into a single provider is a major risk.

Streamlining the Workflow with NebulaAPI

NebulaAPI acts as a Unified LLM Aggregator:

Instant A/B Testing: Compare models simultaneously
Unified Infrastructure: Switch models by changing a single string
Enterprise Scalability: Simplifies RAG and agentic workflows

Conclusion

Prompt Engineering is evolving into a rigorous discipline of software architecture.

The era of “one model fits all” is over. Tools like NebulaAPI allow developers to focus less on infrastructure and more on building intelligent, resilient, and optimized AI systems.

Prompt Engineering, LLM Aggregator, RAG, NebulaAPI, Model Optimization

Why One Model Fails

NEBULA DATA — Wed, 29 Apr 2026 03:50:45 +0000

What Real AI Systems Actually Look Like)

There’s a common pattern in early AI projects.

You pick one model — usually the “best” one — wire it into your backend, ship a feature, and call it a day.

And at first, it worked.

Until it doesn’t.

The Illusion of “The Best Model”

When people say:

“Just use GPT-5”
or
“This model is the most powerful”

They’re not wrong.

They’re just incomplete.

Because in real-world systems, “best” depends on context:

Best for reasoning ≠ best for speed
Best for coding ≠ best for cost
Best for summarization ≠ best for conversation

And once your app hits production, these differences stop being theoretical — they become operational problems.

Where Things Start Breaking

Let’s say you build a simple AI endpoint:

import requests

def generate(prompt):
    response = requests.post(
        "https://api.provider.com/v1/chat/completions",
        json={
            "model": "gpt-5.5",
            "messages": [{"role": "user", "content": prompt}]
        }
    )
    return response.json()

Looks clean.

But here’s what happens over time:

1. Latency Spikes

Some requests take:

300ms
others take 5–10 seconds

Now your UX becomes inconsistent.

2. Cost Explodes

You start sending everything to a premium model:

simple queries
complex reasoning
trivial formatting

You’re overpaying for tasks that don’t need it.

3. Output Quality is Inconsistent

Even the same model:

hallucinates sometimes
misses edge cases
behaves unpredictably

4. Feature Limitations

Not all models support:

reasoning tokens
tool usage
streaming
structured outputs

Eventually you hit:

“We need another model for this…”

And that’s where things get messy.

The Realization: One Model ≠ One System

At some point, every serious AI product discovers this:

You’re not building a model integration.
You’re building a decision system.

Instead of:

input → model → output

You actually need:

input → decision → model → output

A Simple Example: Smarter Routing

Let’s upgrade the earlier code.

Instead of hardcoding one model, we route based on the task:

def select_model(prompt):
    if len(prompt) < 50:
        return "mistral/mistral-small"  # cheap & fast
    elif "analyze" in prompt or "why" in prompt:
        return "openai/gpt-5.5"  # better reasoning
    return "anthropic/claude-3-haiku"  # balanced

Then:

def generate(prompt):
    model = select_model(prompt)

    response = requests.post(
        "https://api.nebula-data.ai/v1/chat/completions",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}]
        }
    )
    return response.json()

Now your system is:

cheaper
faster
more adaptable

Reliability: The Hidden Problem

Even with routing, things can still fail:

rate limits
API downtime
unexpected responses

So production systems add fallbacks:

MODELS = [
    "openai/gpt-5.5",
    "anthropic/claude-3-opus",
    "mistral/mixtral"
]

def safe_generate(prompt):
    for model in MODELS:
        try:
            response = requests.post(
                "https://api.nebula-data.ai/v1/chat/completions",
                headers={"Authorization": "Bearer YOUR_API_KEY"},
                json={
                    "model": model,
                    "messages": [{"role": "user", "content": prompt}]
                },
                timeout=5
            )

            data = response.json()

            if "choices" in data:
                return data["choices"][0]["message"]["content"]

        except Exception:
            continue

    return "All models failed."

Now the system feels different:

It’s no longer fragile
It adapts

The Shift: From Model-Centric to System-Centric

This is the mental shift most people miss.

Early stage:

“Which model should we use?”

Later stage:

“How should we orchestrate multiple models?”

That’s a completely different problem.

It introduces new layers:

routing logic
fallback strategies
evaluation
cost optimization
capability matching

Why This Gets Complicated Fast

In theory, multi-model sounds simple.

In practice, you run into:

different API formats
different response structures
different capabilities
different pricing models

You end up writing glue code like:

def normalize(resp):
    if "choices" in resp:
        return resp["choices"][0]["message"]["content"]
    elif "output_text" in resp:
        return resp["output_text"]
    return ""

Multiply this across 5–10 providers…

Now you’re not building a product anymore —
you’re maintaining infrastructure.

A More Practical Approach

This is where a unified API layer starts to make sense.

Instead of:

wiring multiple providers
handling inconsistencies
managing fallbacks manually

You interact with one interface, and treat models as interchangeable components.

That doesn’t remove the need for logic —
but it makes the logic actually manageable.

Platforms that aggregate models behind a single API (like Nebula) essentially turn this:

multiple vendors → multiple SDKs → fragmented logic

into:

one API → many models → centralized control

What Real AI Systems Look Like

By the time a system matures, it usually has:

Routing → choose the right model
Fallbacks → handle failure
Evaluation → compare outputs
Optimization → balance cost vs quality

At that point, the question isn’t:

“Which model is best?”

It becomes:

“How do we use multiple models intelligently?”

Final Thought

One model works for demos.

But production systems live in the messy reality of:

trade-offs
variability
failure

And the only way to handle that is not by choosing better models…

…but by designing better systems.

If you build long enough in this space, you’ll notice:

“The real advantage doesn’t come from having access to a powerful model
It comes from having control over many of them.”

And once you reach that point,

You’re no longer just calling AI —
You’re orchestrating it.

If you want to try this approach yourself, platforms like Nebula provide access to hundreds of top-tier AI models through a single API:

👉 https://nebula-data.ai/

Specialized Chatbot using RAG — Part III

NEBULA DATA — Sat, 18 Apr 2026 10:56:07 +0000

Alright — this is where things finally get interesting. In Part II, we already prepared everything:

Our documents are processed
Chunks are created
Embeddings are stored inside ChromaDB

So technically… our chatbot already has knowledge. But here’s the problem: it still doesn’t know how to use it. Right now, our system is just a “smart storage”. Not a “smart chatbot”. In this part, we’re going to fix that.

From Storage → Intelligence

Let’s quickly recall how RAG actually works. Instead of answering directly, the system:

Converts the user question into an embedding
Searches the vector database
Retrieves relevant chunks
Sends them to the LLM as context
Generates an answer

This is what transforms a normal chatbot into a domain-specific assistant.

Step 1 — Converting the User Query into an Embedding

When a user asks something, we don’t send it directly to the model. We first convert it into a vector.

query = "How much is cash balances (Kas) for 2024?"
query_embedding = embed([query])[0]

Why? Because our database doesn’t understand raw text — it understands vectors.

Step 2 — Searching the Vector Database

Now we use that embedding to find the most relevant chunks.

results = collection.query(
    query_embeddings=[query_embedding],
    n_results=3
)

This will return the top 3 most relevant pieces of text.

Step 3 — Preparing Context

Combine those chunks into a single context to provide to the LLM.

context = "\n\n".join(results)
# or, if `results` is a list of dicts with a 'text' field:
# context = "\n\n".join([r['text'] for r in results])

Step 4 — Sending Context to the LLM

Pass the combined context plus the original user question to the LLM so it can generate an informed, domain-specific answer.

prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
response = llm.generate(prompt)

Step 5 — Generating the Answer

The LLM uses the retrieved context to produce a response grounded in your documents. This is the step that turns a plain vector store into an intelligent, domain-aware chatbot.

That’s the core RAG flow — convert query → retrieve relevant chunks → provide context → generate answer. In the next part we’ll look at optimizing retrieval quality, prompt engineering, and handling long contexts.

If you want to try it out yourself, check out Nebula Lab here: https://ai-nebula.com/.

Specialized chatbot using RAG

NEBULA DATA — Mon, 09 Mar 2026 08:25:01 +0000

Specialized Chatbot using RAG (Retrieval-Augmented Generation) — Part II

In the previous episode, we already discussed the concept of Retrieval-Augmented Generation (RAG) and prepared our project structure, requirements, and source data.

Now we are moving to the most important step of the RAG pipeline, which is ingesting our knowledge source into the vector database.

This process includes several stages:

Reading the PDF
Splitting the document into chunks
Creating embeddings
Storing them inside ChromaDB

Once this process is completed, our chatbot will have a searchable knowledge base.

Later, when a user asks a question, our system will retrieve the most relevant parts of the document and use them as context for the model.

Understanding the Ingestion Pipeline

Before jumping into the code, let's understand the overall workflow.

Our program will perform the following steps:

Load the PDF document
Extract text from the PDF
Split the text into smaller chunks
Convert each chunk into embeddings
Store the embeddings and text into ChromaDB

The result will be a vector database that represents the entire BCA Annual Report in semantic form.

Importing Required Libraries

First, we import all the libraries used in our program.

import os
import chromadb
from pypdf import PdfReader
from openai import OpenAI
from dotenv import load_dotenv

Explanation of Each Library

os
Used to interact with our system environment.

chromadb
Our vector database that will store embeddings.

pypdf
Used to read and extract text from PDF documents.

OpenAI client (Nebula API)
Used to generate embeddings.

dotenv
Used to securely load our API key from the .env file.

Loading Environment Variables

Next, we load the environment variables.

python load_dotenv()

This allows our program to read the NEBULA_API_KEY stored inside the .env file.

Example .env file:

NEBULA_API_KEY=your_api_key_here

This method is important because API keys should never be hardcoded directly inside the program.

Initializing Nebula API Client

Now we initialize the API client that will communicate with Nebula.

python client = OpenAI( api_key=os.getenv("NEBULA_API_KEY"), base_url="https://llm.ai-nebula.com/v1" )

Here we use an OpenAI-compatible client, but the request is actually sent to the Nebula API endpoint.

This allows us to use Nebula infrastructure while keeping the OpenAI SDK interface.

Initializing ChromaDB

Now we prepare our vector database.

`python
db_path = "./chroma_db"

chroma_client = chromadb.PersistentClient(path=db_path)

collection = chroma_client.get_or_create_collection(
name="bca_annual_report_2025"
)
`

Explanation

PersistentClient
Creates a persistent database stored locally.

db_path
The folder where our vector database will be stored.

Collection
Similar to a table in a traditional database.

In this case we create a collection called:

bca_annual_report_2025

This collection will contain:

document chunks
embeddings
document IDs

Step 1 — Reading the PDF

Now we create a function to read the entire PDF document.

`python
def read_pdf(path):
reader = PdfReader(path)
text = ""

for page in reader.pages:
    text += page.extract_text() + "\n"

return text

What This Function Does

Opens the PDF file
Iterates through every page
Extracts the text
Combines all text into a single string

Since the BCA Annual Report contains around 600 pages, this step may take a few seconds depending on your machine.

Step 2 — Splitting the Document into Chunks

Next we split the document into smaller pieces.

`python
def chunk_text(text, chunk_size=500, overlap=50):
words = text.split()
chunks = []

for i in range(0, len(words), chunk_size - overlap):
    chunk = " ".join(words[i:i+chunk_size])
    chunks.append(chunk)

return chunks

Why Do We Need Chunking?

Because:

Language models have context limits
Sending the entire document into the prompt would be extremely inefficient

Instead, we break the document into smaller segments.

In this code:

chunk_size = 500 words overlap = 50 words

The overlap helps preserve context continuity between chunks, which improves retrieval quality.

Example Structure

Chunk 1 : word 1 → word 500 Chunk 2 : word 450 → word 950 Chunk 3 : word 900 → word 1400

This technique helps prevent information loss between chunks.

Step 3 — Creating Embeddings

Now we convert each chunk into embeddings.

`python
def embed(texts):
response = client.embeddings.create(
model="text-embedding-3-large",
input=texts
)

return [e.embedding for e in response.data]

Embeddings are numerical vector representations of text.

Instead of storing raw text only, we convert each chunk into vectors so the database can perform semantic similarity search.

Example Concept

"bank revenue growth" → [0.021, -0.771, 0.144, ...]

Texts with similar meaning will have vectors close to each other in vector space.

This is what allows RAG systems to find relevant knowledge quickly.

Step 4 — Running the Ingestion Process

Now we run the entire ingestion pipeline.

`python
pdf_path = "source/20260212-BCA-AR-2025-ID.pdf"

if os.path.exists(pdf_path):

print("⏳ Reading PDF...")
pdf_text = read_pdf(pdf_path)

print("⏳ Creating chunks...")
chunks = chunk_text(pdf_text)

print(f"⏳ Creating embeddings for {len(chunks)} chunks...")
embeddings = embed(chunks)

collection.add(
    documents=chunks,
    embeddings=embeddings,
    ids=[str(i) for i in range(len(chunks))]
)

print(f"✅ Success! Database saved to: {db_path}")

else:
print(f"❌ File not found: {pdf_path}")
`

Step-by-Step Explanation

Step 1 — Check if the PDF exists

python os.path.exists(pdf_path)

This prevents errors if the file is missing.

Step 2 — Extract the text

python pdf_text = read_pdf(pdf_path)

The entire PDF is converted into raw text.

Step 3 — Create chunks

python chunks = chunk_text(pdf_text)

The document is split into smaller pieces.

For a 600-page report, this may generate hundreds or even thousands of chunks.

Step 4 — Generate embeddings

python embeddings = embed(chunks)

Each chunk is converted into a vector representation using the embedding model.

Step 5 — Store everything in ChromaDB

python collection.add( documents=chunks, embeddings=embeddings, ids=[str(i) for i in range(len(chunks))] )

We store three components:

documents → the text chunks
embeddings → vector representations
ids → unique identifiers

Now our vector database is ready.

After the Ingestion

Once the ingestion process is finished, our database will contain semantic vectors for every chunk of the BCA Annual Report.

This means our system can now perform:

Semantic Search instead of traditional Keyword Search.

What Happens Next?

Now that our knowledge base has been stored inside ChromaDB, the next step is building the retrieval pipeline.

In the next part we will implement:

Convert user question into embedding
Search the vector database
Retrieve the most relevant chunks
Send them as context to Nebula API
Generate a grounded response

This is where the actual RAG magic happens.

Nebula Lab

For those who want to build chatbots or other AI applications, you can check Nebula Lab here:

https://nebula-data.ai/

They offer more than just API access for multiple models, including various tools and features for AI development.

See you in the next part.

Specialized chatbot using rag (retrieval augmented generation) Part I

NEBULA DATA — Fri, 27 Feb 2026 07:37:45 +0000

In the previous episode, we successfully built an interactive chatbot that can respond to user questions and secure the API key itself. Now, we want to enhance it by implementing Retrieval-Augmented Generation (RAG).
RAG allows us to specialize our chatbot by enabling it to use our own documents as a knowledge source. These documents can belong to specific domains such as finance, law, science, mathematics, or any other specialized field. Instead of relying only on the model’s pre-trained knowledge, we provide it with domain-specific information so it can generate more accurate and relevant responses.
The first step in implementing RAG is preparing the source data. The source consists of the documents we want the chatbot to learn from, such as PDFs, text files, reports, or databases.
However, a language model cannot efficiently process large raw documents directly every time a user asks a question. Large documents may exceed the model’s context limit and would be inefficient to pass into the prompt repeatedly. Therefore, we need to preprocess these documents.
The preprocessing steps typically include:

Splitting the documents into smaller chunks Each document is divided into manageable pieces to fit within the model’s context window.
Generating embeddings for each chunk Using an embedding model (for example, from OpenAI), each text chunk is converted into a numerical vector representation. These vectors capture the semantic meaning of the text.
Storing embeddings in a vector database The generated embeddings are stored in a vector database such as Pinecone, Weaviate, Chroma, or FAISS. This database allows us to perform similarity searches efficiently. When a user submits a question: • The question is converted into an embedding. • The system searches the vector database to find the most relevant document chunks. • These retrieved chunks are then provided to the language model as additional context. • Finally, the model generates an answer grounded in the retrieved information. By applying RAG, our chatbot becomes more accurate, domain-specific, and capable of providing responses based on our own knowledge base rather than relying solely on general training data.

Now, for the source, I’m using Annual Report from BCA (Bank Central Asia):
This file consists 600 pages and mostly full of text. Now, we’re moving to the next step.
And make sure the structure of our app should be look like this:

Nebula -> Source (Annual Report from BCA), .env , nebula.py (our main program)
So, our program workflow should be like these:

Load PDF
Split into chunks
Create embeddings
Store in ChromaDB
Search relevant chunks
Send context + question to Nebula API

Preparing Requirements:
Before we can run our program, we need to install several library in order to support it:

chromadb
pip install langchain (Optional but better for chunking)
pypdf
sentence-transformers
tiktoken
python-dotenv (We have installed it before)
openai (We have installed it before) Do we need to set up Virtual Environment for this project? Well, the answer is depend of the scale of our app, if we want to create this chatbot to take way more source, way more capability etc, scared of different environment changes because of inconsistent library version of each update, the answer is YES. But for these type of app that I build to show you guys how to build simple RAG chatbot using Nebula API, might be not necessary.

Okay, now we need to install all the requirement of library inside our environment / local device, the command is very simple ‘pip install our package name’, so if I want to install chroma db, I just need to type ‘pip install chromadb’ on my CLI and hit Enter.

If you guys see, on my cli all the output says “Requirement already satisfied”, it happened because Im already installing all of those package on the past so the program will automatically tell me and skip the process.

But there’s several way to do installation,

One by one (like pip install chromadb)
By Listing (pip install chromadb langchain etc)
Whole list (put all the library list on the requirements.txt and on the CLI, just type pip install -r requirements.txt and hit enter) So anything you want to use, its good and its serve different purposes.

OKAY, on the next episode, we will try to ingest the pdf into our database (Chroma DB), it will be a bit difficult at first but I know you guys who read this article will master it easily.
And for you guys who wants to build chatbot or even other things with ai, you can check NEBULA LAB here:

https://nebula-data.ai/
There’s much more than just an API KEY FOR ALL MODEL, they also had a ton of other features such as marketing etc.

How to secure your api key?

NEBULA DATA — Fri, 20 Feb 2026 13:00:10 +0000

Hi guys, I'm back!
In the previous episode, we’re successfully create an interactive chatbot, but one critical thing remain, our api key is still inside our python file, which is if we’re publishing our code into public or moving it into bigger scale of development or even production stage, of course its unsafe at all, that’s why im telling you on the next episode, ill show you how to secure our private thing (in this case API key).
Now I'm going to show you how to secure your own API key (Nebula API)!

Environment variable

An environment variable is a dynamically named value that can affect the way running processes behave on a computer. Think of them as "global constants" for your operating system’s environment.
Instead of hard-coding a specific file path or a secret API key directly into a program's source code, you store that information in an environment variable. This allows the program to remain generic and adaptable to different computers or users.

Now, the first thing we need is to create a ‘.env’ file, which will contain our Nebula API key:

After we create .env file, now we need to put our Nebula API key inside this file, you can just simply move the variable that contain our key into that file, in this snippet:

from openai import OpenAI
client = OpenAI(
    nebula_api_key="sk-xxxxx",
    base_url="https://llm.ai-nebula.com/v1"
)

We’re going to move “ nebula_api_key="sk-xxxxx" ” inside the .env file (don’t forget to remove the double quote symbol in order to system can retrieve the Nebula_api_key as a value, not just a string):

How can our program retrieve the nebula api key?

Now, we need to do something so our program can read the nebula api key inside the .env files.
There’s why we need to use ‘dotenv’ and ‘os’ library, The os.getenv bridge: Simply having the .env file isn't enough, Python needs that "bridge" command to go grab the value from the system memory.

So, we need to import 2 more, so it will be like this:

The ‘load_dotenv()’ function is most important because it retrieves variables from our .env file.

Configuring the Client

This is where the "bridge" happens. In the original code, the API key was typed directly into the script, a big security risk! Now, we use os.getenv() to look for the key we saved in our .env file.

Original Code:

client = OpenAI(
    nebula_api_key="sk-xxx", # Hard-coded (Unsafe!)
    base_url="https://llm.ai-nebula.com/v1"
)

Secure Code:

client = OpenAI(
    nebula_api_key=os.getenv("NEBULA_API_KEY"), # Fetched from system memory
    base_url="https://llm.ai-nebula.com/v1"
)

Enhancing the Chat Loop

Finally, we can add a small confirmation message. While the logic of the while loop remains the same, having the API key stored externally makes the code much cleaner and ready for professional deployment.

print("Chatbot initialized. Type 'exit' to stop.")
while True:
    user_input = input("You: ")
# ... rest of your logic stays the same!

By making these changes, you have successfully:

Separated Configuration from Code: Your code now tells the computer what to do, while the .env file tells it which credentials to use.
Prevented Data Leaks: Even if you share your Python file with a classmate or push it to GitHub, your secret key stays safe on your own machine.

Now, all we need is to test it out if these changes will work or not?

As you guys can see, these changes is success and the program runs smoothly without any problem, and with these changes, we can continue developing our program into the next stage without worrying about our security.

And for you guys who want to build a chatbot or even other things with AI, you can check NEBULA LAB here:

Nebula Lab

There’s much more than just an API KEY FOR ALL MODELS. They also had tons of other features, such as marketing, etc.

MAKE THE CHATBOT MORE INTERACTIVE USING PYTHON ONLY USING NEBULA API

NEBULA DATA — Thu, 12 Feb 2026 04:02:11 +0000

For today’s episode, I'll continue working on our project to make rag based chatbot using Nebula Data’s api.
Okay, from the previous episode, we already set up the account, api and the billing. Now I want to make a simple chatbot, but using CLI (command line interface) only (maybe in the next article, I will make the interface).
Okay, so we already have the base of our code from the Nebula documentation. Like this:

from openai import OpenAI
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://llm.ai-nebula.com/v1"
)

response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)
print(response.choices[0].message.content)

This is for testing the api either they’re working or not, by simply saying “Hello!” to the AI through the nebula api.

Now, what I wanna do is to make this ai can answer my questions.
Why don’t I just change the string “Hello!” to my question?

Well, technically it can, but that’s not interactive, because if we just change that part, it won't be possible to keep changing the code again and again each time we need to ask something to the AI.

So I want to make a loop and create a variable that can contain our questions, so we don’t need to change it again, all we need is just to ask via the CLI.

Okay, so first things first,
We need to make a variable, let’s name it “user_input”, inside this variable is our question. And this process:

response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

I will wrap them into a loop function, specifically the “while loop”,

What is a while loop?

A while loop is a control structure in programming that repeatedly runs a block of code as long as a condition is true.
The condition is checked before each iteration, so if it’s false from the start, the loop won’t run at all.

Okay, now if I applied the loop function on purpose to keep the AI running, then how can I stop them? Well, all I need is just to make a “Break Condition” that can be applied inside the loop and put a condition that can make the loop stop.

So the simplest way to do that is to make a conditional statement (if statement) that contains “break function”. So the condition I wanna make as if the user types “quit” or “exit”, the program will activate the break condition and finally call the break function.

This is how you can make the loop function work on a chatbot (it’s the easiest method I can explain). Now, after the loop is True, it will take the user input and process it into the AI using NEBULA API, so for doing that, all we need is just to place the response code inside the while loop condition:

And the rest or the outer while loop function is just initialize the NEBULA API, URL, and the Print function, so the final version should be like this:

Okay, all we need now is to run this program,

And to close the program:

And just like that, we can make the chatbot more interactive using the NEBULA API!
But many people use APIs just like that. It's not safe to write the API key in a Python file like that, so in the next article, I will show you how to use .env and the benefits of using it.
And for you guys who want to build a chatbot or even other things with AI, you can check NEBULA LAB here:
Nebula Lab

There’s much more than just an API KEY FOR ALL MODELS, they also have tons of other features, such as marketing, etc.

High Performance, Low Cost: Building a Professional RAG Chatbot from Scratch

NEBULA DATA — Tue, 27 Jan 2026 08:33:08 +0000

Building a RAG Chatbot from Scratch: Part 1

Choosing the Right Engine

Hello everyone! Today, I’m kicking off a short series where I’ll be documenting my journey of building a specialized chatbot. Unlike a standard chatbot that provides general answers, I want this one to have a very specific "job": answering questions based on the 2024 Indonesian Government Financial Reports compiled by the Ministry of Finance.

You might be wondering: "What’s the difference between a regular chatbot and a RAG-based chatbot?" The primary difference lies in the information source and how the AI formulates its response.

Understanding the RAG Difference

In a standard AI setup, the process is quite linear:

As you can see, the user asks a question, and the AI responds based on the data it was trained on. However, for my project, I am adding a critical component that prevents the AI from needing to "guess" or rely on outdated training data.

By adding Stored Information (our 2024 Financial Report), the general-purpose AI becomes a Specialized AI. It will only provide answers relevant to the context found in that stored data. We will discuss what happens when a user asks something "out of context" in future articles, but today, my focus is on selecting the right AI model.

Selecting the Model: Why Nebula Lab?

When looking for a model, I felt overwhelmed by the different platforms—GPT, Claude, and Gemini all live in different ecosystems. I initially looked at OpenRouter, a popular API aggregator. However, after some research and a tip from a friend, I discovered Nebula Lab.

Nebula Lab (ai-nebula.com) is an API aggregator that offers not just LLMs, but also marketing tools. Here is why I decided to switch from OpenRouter to Nebula:

Cost-Effectiveness: Their prices are significantly lower. For example, GPT-5.2 is listed at $1.40 USD per 1M tokens. Compared to official OpenAI pricing, Nebula is genuinely more affordable.

No Platform Fees: Unlike some aggregators that charge a 5% platform fee, Nebula Lab doesn't tack on extra costs.
Model Variety: They host all the heavy hitters, including OpenAI, Google, and Anthropic.
Clean UI: The interface is simple and easy to navigate.

Clear Documentation: For a beginner, their documentation is straightforward and easy to implement.

Testing the API

Getting started was incredibly easy:

Navigate to the Model Center.
Select API Key on the left sidebar.
Generate your key.

To ensure everything was working, I tested the connection using two methods provided in their documentation:

1. Testing via CURL

I ran the following command in my terminal (Command Prompt/PowerShell):

curl https://llm.ai-nebula.com/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -d '{
        "model": "gpt-5.2",
        "messages": [{"role": "user", "content": "Hello!"}]
    }'

The response was instant and normal. The "GPT-5.2" model responded perfectly.

2. Testing via Python

I then used Python (version 3.13.2) for a more integrated test:

from openai import OpenAI

client = OpenAI(
    api_key="sk-xxxxxxxxxxxxxxxxxxx", # Replace with your actual key
    base_url="https://llm.ai-nebula.com/v1"
)

response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

print(response.choices[0].message.content)

Success! The code ran smoothly without a single hitch. I’m really impressed with Nebula Lab’s variety and ease of use.

What's Next?

In the next article, we’ll start building the actual chatbot and gradually begin injecting our financial data to transform this from a simple API call into a full-fledged RAG system.

If you want to try it out yourself, check out Nebula Lab here: https://openai-nebula.com/

Beyond Loss Curves: Interpreting the Transition from Instability to Structure

NEBULA DATA — Thu, 22 Jan 2026 06:29:16 +0000

In the previous section, I described an apparent transition during training: volatile representations early on, followed by a phase of rapid structural alignment, and finally a stabilization period where loss improvements slow but internal consistency increases. Assuming this pattern is not an artifact, the next question becomes more fundamental:
What changes in the learning dynamics cause this transition to occur?
Rather than treating training as a smooth, monotonic process, it may be more accurate to view it as a sequence of qualitatively different regimes.

A Possible Phase Transition in Learning

One way to interpret the observed behavior is as a phase transition in representation space. Early in training, parameter updates appear dominated by large gradients responding to easy-to-exploit statistical regularities. These updates reshape embeddings aggressively, but not coherently.
To quantify this intuition, I extended the training loop to explicitly track gradient magnitude and representation drift:

def flatten_params(tensor):
    return tensor.view(-1)

prev_embedding = None

for epoch in range(num_epochs):
    optimizer.zero_grad()

    outputs = model(inputs)
    loss = criterion(outputs, targets)

    loss.backward()

    grad_norm = model.encoder.weight.grad.norm().item()
    optimizer.step()

    with torch.no_grad():
        current_embedding = model.encoder.weight.clone()
        emb_norm = current_embedding.norm().item()

        if prev_embedding is not None:
            drift = torch.norm(
                flatten_params(current_embedding) -
                flatten_params(prev_embedding)
            ).item()
        else:
            drift = 0.0

    print(
        f"Epoch {epoch} | "
        f"Loss: {loss.item():.4f} | "
        f"GradNorm: {grad_norm:.2f} | "
        f"EmbNorm: {emb_norm:.2f} | "
        f"Drift: {drift:.4f}"
    )

    prev_embedding = current_embedding

What stood out was that:

• Gradient norms were largest early, even when loss reductions were modest
• Embedding drift was extreme during early epochs
• Drift dropped sharply after a certain point, even though gradients remained non-zero
This suggests that early gradients primarily drive exploration of parameter space rather than refinement of stable structure.

Optimization vs. Organization

This observation hints at an important distinction that loss alone does not capture:
• Optimization: minimizing task error
• Organization: imposing structure and consistency on internal representations
Early training is dominated by optimization pressure. Later training appears to emphasize organization, even when the task objective changes little.
To probe this more directly, I tracked pairwise cosine similarity between randomly sampled token embeddings across epochs:

import random
import torch.nn.functional as F

def sample_cosine_stats(embeddings, num_samples=100):
    indices = random.sample(range(embeddings.size(0)), num_samples)
    cosines = []
    for i in range(len(indices) - 1):
        a = embeddings[indices[i]]
        b = embeddings[indices[i + 1]]
        cosines.append(F.cosine_similarity(a, b, dim=0).item())
    return sum(cosines) / len(cosines)

with torch.no_grad():
    cosine_mean = sample_cosine_stats(model.encoder.weight)

print(f"Epoch {epoch} | Mean Pairwise Cosine: {cosine_mean:.4f}")

During early epochs, cosine statistics fluctuated wildly. Mid-training, they shifted rapidly toward consistent ranges, indicating emerging geometry. Late training showed minimal variance, even when loss had largely saturated.
This reinforces the idea that internal geometry continues evolving long after task performance stabilizes.

Why Representations Stabilize Even When Loss Does Not

One puzzling outcome was that representation similarity between epochs continued increasing even when loss improvement was negligible. This suggests that gradients were still shaping the model—but along dimensions invisible to the objective.
A plausible explanation is that, once the model enters a low-loss basin, gradient descent mostly moves parameters within an equivalence class of solutions. Outputs remain similar, but internal redundancy decreases and representations become smoother.
This can be approximated by tracking singular value decay of the embedding matrix:

with torch.no_grad():
    u, s, v = torch.linalg.svd(model.encoder.weight, full_matrices=False)
    spectral_energy = (s / s.sum()).cpu()

print(
    f"Epoch {epoch} | "
    f"Top-5 Singular Energy: {spectral_energy[:5].sum():.4f}"
)

Late training often showed:
• Increasing concentration of spectral energy
• Reduced effective rank
• More anisotropic embedding spaces
These changes align with emergent structure rather than improved prediction accuracy.

Rethinking Overfitting as a Developmental Stage

This lens reframes overfitting. Instead of a terminal failure mode, it may act as a developmental constraint that forces the model to commit to specific distinctions.
Empirically, models that were heavily regularized from the start showed:
• Lower early drift
• Weaker mid-training clustering
• Less stable late-stage representations
Conversely, temporarily allowing overfitting seemed to accelerate representational alignment—suggesting that structure must exist before it can be generalized.

Signals That May Matter More Than Loss

If loss is insufficient for understanding learning dynamics, other metrics may be more revealing:
• Representation drift between epochs
• Cosine similarity convergence
• Effective rank or spectral entropy
• Gradient-to-drift ratio
• Sensitivity of embeddings to noise perturbations
For example, injecting small Gaussian noise into inputs revealed that late-stage representations were significantly more invariant than early ones, even at identical loss levels.

A Lingering Uncertainty

Despite these findings, an uncomfortable possibility remains:
What if stabilization reflects numerical inertia rather than semantic abstraction?
Without transfer tasks or probing classifiers, it is difficult to disambiguate meaningful structure from converged parameterization. Representation stability is likely necessary for abstraction but not sufficient.

Reframing the Core Question

At this point, the question can be stated more precisely:
Is training best understood as minimizing loss, or as guiding representations through unstable exploratory regimes toward constrained, reusable geometries?

If the latter is even partially true, then training dynamics—not just end metrics—deserve closer attention.
I still hesitate to claim novelty. These ideas likely overlap with concepts such as neural collapse, spectral bias, or information bottlenecks. What feels distinct is seeing them not as isolated effects, but as successive phases of a single developmental process.
Which leaves the question open, but sharper:
If abstraction only emerges after instability, how many of our training practices are unintentionally optimized to prevent the very conditions that make abstraction possible?
For now, this remains less a conclusion than a suspicion but one increasingly supported by the behavior of the models themselves.

Rethinking Learning Dynamics in AI Models: An Early Theory from Experimentation

NEBULA DATA — Thu, 15 Jan 2026 03:39:37 +0000

Observing Representation Instability During Neural Network Training

While experimenting with neural network training behaviors, I started noticing a recurring pattern that does not seem to be explicitly discussed in most mainstream AI literature. This article is not meant to present a finalized solution, but rather a working theory that emerged during development.

I am sharing this to ask: does this interpretation make sense, or am I misunderstanding the dynamics at play?

Background Assumption

Most modern AI systems, particularly deep learning models, rely on gradient-based optimization. The underlying assumption is relatively straightforward:

Minimize loss → improve performance

However, during a series of experiments, I observed that loss minimization alone does not always correlate with meaningful representation learning, especially in early training phases.

This led me to hypothesize that:

AI models may pass through a “representation instability phase” where gradients optimize surface-level patterns before stable internal abstractions emerge.

I am not fully confident whether this is already well-known under a different name, or whether I am misinterpreting training noise as structure.

Initial Observation

While training a small transformer-like model on synthetic data, I logged intermediate layer activations and noticed something interesting:

Early epochs show highly volatile embeddings
Mid-training shows sudden clustering behavior
Late training stabilizes, even when loss improvement slows down

Here is a simplified snippet of the training loop I used:

for epoch in range(num_epochs):
    optimizer.zero_grad()

    outputs = model(inputs)
    loss = criterion(outputs, targets)

    loss.backward()
    optimizer.step()

    with torch.no_grad():
        embedding_norm = model.encoder.weight.norm().item()

    print(f"Epoch {epoch} | Loss: {loss.item():.4f} | Embedding Norm: {embedding_norm:.2f}")

What surprised me is that embedding norms and cosine similarities changed more drastically than loss values, especially early on.

Tentative Theory

My current theory (and this is where I’m unsure):

Gradient descent initially prioritizes optimization shortcuts rather than semantic structure, and only later converges toward representations that are robust and generalizable.

If this is true, then:

Early stopping might prevent meaningful abstraction
Some overfitting phases might actually be necessary
Regularization might delay, not prevent, representation collapse

But this raises questions:

Is this just an artifact of small datasets?
Is this already explained by concepts like loss landscape flatness or mode connectivity?
Am I confusing emergent structure with random alignment?

A Small Diagnostic Experiment

To test whether representations actually stabilize, I added a simple cosine similarity tracker:

`python
def cosine_similarity(a, b):
return torch.nn.functional.cosine_similarity(a, b, dim=0)

prev_embedding = None

for epoch in range(num_epochs):
# training step...

current_embedding = model.encoder.weight.clone().detach()

if prev_embedding is not None:
    similarity = cosine_similarity(
        prev_embedding.view(-1),
        current_embedding.view(-1)
    )
    print(f"Epoch {epoch} | Embedding Stability: {similarity.item():.4f}")

prev_embedding = current_embedding

The similarity score jumps erratically at first, then begins converging toward ~0.98–0.99, even when loss improvement becomes marginal.

This makes me wonder:

Is loss the wrong primary signal for understanding learning progress?

Open Questions for Discussion

I’m genuinely unsure whether this line of thinking is insightful or redundant, so I’d like to open this up:

Is this “instability → abstraction → stabilization” pattern formally recognized?
Could this be explained by information bottleneck theory, or is that a stretch?
Are there better metrics than loss for tracking learning quality?
Am I over-interpreting noise due to small-scale experiments?

Closing Thoughts

I feel like I’m observing something real, but I’m not convinced I’m explaining it correctly yet.

If this theory is flawed, I’d like to know where the reasoning breaks.
If it’s valid, I’d like to understand how others frame it more rigorously.

At this stage, I’m treating this less as a claim and more as a question:

What exactly is an AI model learning before it learns what we want it to learn?

How Machine Learning Works?

NEBULA DATA — Tue, 06 Jan 2026 09:23:20 +0000

Machine Learning (ML) is a specialized area of Artificial Intelligence that analyzes data to discover patterns and make predictions about future outcomes. It follows a structured workflow that includes data collection, preprocessing, model building, training, evaluation, visualization, and deployment.

Today, machine learning plays a vital role in industries such as healthcare, finance, marketing, education, and more.

This article explains machine learning fundamentals, its core concepts, data handling processes, and the ethical responsibilities involved.

What is Machine Learning?

Machine Learning is a field of Artificial Intelligence that enables systems to learn from data and improve performance without being explicitly programmed. It relies on algorithms and mathematical models to analyze large volumes of data, identify patterns, and make informed decisions.

Over time, machine learning has evolved from basic statistical methods to advanced techniques such as deep learning and neural networks. Its growth has been fueled by increased computational power and the availability of large datasets.

Today, ML supports technologies such as:

Natural Language Processing (NLP)
Computer Vision
Recommendation Systems
Autonomous Systems

Example: Predicting House Prices

A simple machine learning task is predicting house prices using features such as area, number of rooms, and location.

import pandas as pd

# Sample dataset
data = {
    "area": [800, 1200, 1500, 1800],
    "rooms": [2, 3, 4, 4],
    "price": [200000, 300000, 350000, 400000]
}

df = pd.DataFrame(data)
df

Key Concepts of Machine Learning

Machine learning is broadly classified into four categories:

1. Supervised Learning

Uses labeled data where the output is known. The model learns a mapping between input and output.

from sklearn.linear_model import LinearRegression

X = df[["area", "rooms"]]
y = df["price"]

model = LinearRegression()
model.fit(X, y)

2. Unsupervised Learning

Works with unlabeled data to identify patterns or clusters.

from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=2)
df["cluster"] = kmeans.fit_predict(X)
df

3. Semi-Supervised Learning

Combines a small amount of labeled data with a large amount of unlabeled data to improve learning efficiency.

# Conceptual example
# Libraries such as sklearn-semi-supervised can be used
# Commonly applied in text and image classification

4. Reinforcement Learning

An agent learns by interacting with an environment and receiving rewards or penalties.

# Conceptual example
# Commonly implemented using OpenAI Gym or Stable-Baselines

Core Machine Learning Components

Algorithms

Algorithms define how learning happens.

from sklearn.tree import DecisionTreeRegressor

tree_model = DecisionTreeRegressor()
tree_model.fit(X, y)

Models

Models store learned relationships and are used for prediction.

predicted_price = model.predict([[1600, 3]])
predicted_price

Training

Training adjusts model parameters to reduce prediction error.

# Training already performed using model.fit()

Testing

Testing evaluates model performance on unseen data.

from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2
)

model.fit(X_train, y_train)
predictions = model.predict(X_test)

mean_squared_error(y_test, predictions)

How Machine Learning Works (Step-by-Step)

1. Data Collection

Data can be gathered from APIs, databases, web scraping, or public datasets. Ethical data usage is essential.

import seaborn as sns

dataset = sns.load_dataset("tips")
dataset.head()

2. Data Preprocessing

This step cleans and prepares the data for modeling.

# Handling missing values
dataset.fillna(dataset.mean(numeric_only=True), inplace=True)

# Encoding categorical variables
dataset = pd.get_dummies(dataset, drop_first=True)

3. Model Training

The dataset is split into training and testing subsets, and a suitable algorithm is applied.

from sklearn.linear_model import LogisticRegression

X = dataset.drop("tip", axis=1)
y = dataset["tip"]

model = LogisticRegression(max_iter=1000)
model.fit(X, y)

4. Model Evaluation

Evaluation metrics help assess model performance.

from sklearn.metrics import accuracy_score

predictions = model.predict(X)
accuracy_score(y, predictions)

5. Model Deployment

The trained model is integrated into real-world applications.

import joblib

joblib.dump(model, "ml_model.pkl")

Visualization and Interpretation

Visualizations help understand model behavior and feature relationships.

import matplotlib.pyplot as plt

plt.scatter(df["area"], df["price"])
plt.xlabel("Area")
plt.ylabel("Price")
plt.title("Area vs House Price")
plt.show()

Challenges and Ethical Considerations in Machine Learning

Privacy Concerns

Sensitive data must be protected and anonymized.

# Removing personal identifiers
df.drop(columns=["user_id"], errors="ignore", inplace=True)

Bias in Data and Algorithms

Biased data can produce unfair predictions. Balanced datasets help reduce this risk.

# Checking class distribution
y.value_counts(normalize=True)

Interpretability and Transparency

Models should be explainable to build trust.

# Coefficients of linear regression
model.coef_

Societal Impact

Machine learning can both empower and disrupt society. Responsible deployment ensures fairness, transparency, and trust.

Conclusion

Machine learning simplifies complex decision-making across industries, but it must be used responsibly. This article explained:

Machine learning fundamentals
Core learning types
End-to-end ML workflow
Ethical challenges and best practices

By combining technical expertise with ethical responsibility, machine learning can drive sustainable innovation and positive societal impact.