Once you’ve installed Ollama and experimented with running models from the command line, the next logical step is to integrate these powerful AI capabilities into your Python applications. This guide will show you how to use Ollama with Python.
Setting Up
First, make sure Ollama is installed and running on your system.
You can check this other article, Getting Started with Ollama: Run LLMs on Your Computer, if you are not familiar with Ollama yet.
Required Ollama Models
Before running the Python examples in this guide, make sure you have the necessary model pulled. You can pull it with the Ollama CLI:
# Pull the model used in these examples
ollama pull llama3.2:1b
You only need to pull the model once. Check which models you already have with:
ollama list
Creating a Virtual Environment
It’s a good practice to use a virtual environment for your Python projects. This keeps your dependencies isolated and makes your project more portable:
# Create a virtual environment
python -m venv ollama-env
# Activate the virtual environment
# On Windows:
ollama-env\Scripts\activate
# On macOS/Linux:
source ollama-env/bin/activate
Installing Dependencies
Install the Ollama Python library:
pip install ollama
Creating a requirements.txt
For better project management, create a requirements.txt file:
pip freeze > requirements.txt
To install from this file in the future:
pip install -r requirements.txt
Basic Usage
Let’s start with a simple example using the Llama 3.2 1B model.
Create a file named generate.py with this content:
from ollama import generate
# Regular response
response = generate('llama3.2:1b', 'Why is the sky blue?')
print(response['response'])
This will output the model’s explanation of why the sky is blue as a complete response.
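If your Ollama server runs somewhere other than the default host, or you want to tweak generation settings, the library also exposes a Client class. Here is a minimal sketch, assuming the default port 11434 and using the options parameter for settings such as temperature:
from ollama import Client

# Point the client at a specific Ollama server (localhost:11434 is the default)
client = Client(host='http://localhost:11434')

response = client.generate(
    model='llama3.2:1b',
    prompt='Why is the sky blue?',
    options={'temperature': 0.2},  # lower temperature gives more deterministic output
)
print(response['response'])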
Streaming Responses
For a more interactive experience, you can get the response as it’s being generated.
Create a file named generate-stream.py with this content:
from ollama import generate
# Streaming response
print("Streaming response:")
for chunk in generate('llama3.2:1b', 'Why is the sky blue?', stream=True):
    print(chunk['response'], end='', flush=True)
print()  # New line at the end
This displays the response incrementally as it’s generated, creating a more interactive experience.
Why is for chunk in generate used?
When you use the streaming functionality with Ollama, the response isn’t returned all at once. Instead, it’s broken into small pieces (chunks) that arrive one at a time as they’re generated by the model.
The generate() function with stream=True returns an iterator in Python. This iterator yields new chunks of text as they become available from the model. The for loop processes these chunks one by one as they arrive:
- Each chunk contains a small piece of the response in chunk['response']
- The end='' parameter prevents adding newlines between chunks
- The flush=True ensures text displays immediately
This creates the effect of watching the AI “think” in real-time, similar to watching someone type.
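If you stream responses in several places, it can be convenient to wrap this loop in a small helper that prints the chunks as they arrive and returns the assembled text. The following is just a sketch; stream_answer is an illustrative name, not part of the Ollama library:
from ollama import generate

def stream_answer(model: str, prompt: str) -> str:
    # Print each chunk as it arrives and collect the full response
    full_text = ""
    for chunk in generate(model, prompt, stream=True):
        piece = chunk['response']
        print(piece, end='', flush=True)
        full_text += piece
    print()  # final newline
    return full_text

answer = stream_answer('llama3.2:1b', 'Why is the sky blue?')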
Using System Prompts
The system prompt allows you to set context and instructions for the model before the conversation starts. It’s a powerful way to define the model’s behavior.
Create a file named chat-system-role.py with this content:
from ollama import chat
# Define a system prompt
system_prompt = "You speak and sound like a pirate with short sentences."
# Chat with a system prompt
response = chat('llama3.2:1b',
    messages=[
        {'role': 'system', 'content': system_prompt},
        {'role': 'user', 'content': 'Tell me about your boat.'}
    ])
print(response.message.content)
The system prompt stays active throughout the conversation, influencing how the model responds to all user inputs.
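Note that the chat API itself is stateless, so "stays active" means you keep sending the system message with every request. Here is a small sketch of a follow-up turn; the second question is only an illustration:
from ollama import chat

system_prompt = "You speak and sound like a pirate with short sentences."

# First turn: system prompt plus the user's question
messages = [
    {'role': 'system', 'content': system_prompt},
    {'role': 'user', 'content': 'Tell me about your boat.'},
]
first = chat('llama3.2:1b', messages=messages)

# Second turn: resend the whole history, including the system message
messages += [
    {'role': 'assistant', 'content': first.message.content},
    {'role': 'user', 'content': 'And where are you sailing next?'},
]
second = chat('llama3.2:1b', messages=messages)
print(second.message.content)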
Conversational Context
Maintain a conversation with context using streaming for a more interactive experience.
Create a file named chat-history-stream.py with this content:
from ollama import chat
# Initialize an empty message history
messages = []
while True:
    user_input = input('Chat with history: ')
    if user_input.lower() == 'exit':
        break

    # Get streaming response while maintaining conversation history
    response_content = ""
    for chunk in chat(
        'llama3.2:1b',
        messages=messages + [
            {'role': 'system', 'content': 'You are a helpful assistant. You only give a short sentence per answer.'},
            {'role': 'user', 'content': user_input},
        ],
        stream=True
    ):
        if chunk.message:
            response_chunk = chunk.message.content
            print(response_chunk, end='', flush=True)
            response_content += response_chunk

    # Add the exchange to the conversation history
    messages += [
        {'role': 'user', 'content': user_input},
        {'role': 'assistant', 'content': response_content},
    ]
    print('\n')  # Add space after response
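One design note: the messages list grows with every exchange, and with it the prompt the model has to process. For long conversations you may want to keep only the most recent turns. A minimal sketch you could place inside the loop, right after updating messages (MAX_TURNS is an illustrative name, not part of the library):
MAX_TURNS = 10  # number of user/assistant pairs to keep

# Drop the oldest exchanges once the history grows past the limit
if len(messages) > MAX_TURNS * 2:
    messages = messages[-MAX_TURNS * 2:]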
Conclusion
The Ollama Python library makes it easy to integrate powerful language models into your Python applications. Whether you’re building a simple script or a complex application, the library’s straightforward API allows you to focus on creating value rather than managing the underlying AI infrastructure.
As you become more comfortable with the basics, explore more advanced features and consider how you can use these capabilities to solve real-world problems in your projects.
Resources
In this GitHub repository, you'll find working code examples: GitHub Repository