DEV Community

techtech

Posted on Apr 9

macOS AI Agent

#ai #agent #programming #python

Building a Gemini AI Assistant for macOS

I created the first AI-powered agent for macOS:

Meet the AI-Agent, a native macOS application that integrates with Google's Gemini AI to provide a seamless assistant experience. This project is open-source, and we encourage you to test it, modify it, and contribute to its development.

SchBenedikt / ai-agent

Testing macOS AI Agent with Google Gemini Live Web API

Gemini Assistant macOS App

A native macOS application that connects to Google's Gemini AI. The app automatically accesses your camera and microphone to provide a seamless AI assistant experience.

Features

Audio input through your microphone
Visual context through your camera
Text responses displayed in the app
Audio responses played through your speakers

Setup

Prerequisites

Python 3.8+
A Google Gemini API key

Installation

Install the required dependencies:

pip install google-generativeai opencv-python pyaudio pillow mss PyQt5 pynput python-dotenv pyinstaller

Set your Gemini API key as an environment variable (optional):
```
export GEMINI_API_KEY="your-api-key-here"
```
If not set as an environment variable, the app will ask for it on startup.

Building the macOS App

There are two ways to build the app:

Method 1: Using PyInstaller (Recommended)

PyInstaller creates a more reliable standalone application that better handles dependencies:

Make sure PyInstaller is installed:
```
pip install pyinstaller
```
Run the build process:
```
# First clean any previous builds
```
…

What is AI-Agent?

The Gemini Assistant is a macOS application designed to:

Capture audio input through your microphone.
Use your camera for visual context.
Provide AI-powered responses via text.

The app uses Google's Gemini AI for natural language understanding and response generation.

Features

Audio Input: Speak to the assistant using your microphone.
Visual Context: The app uses your camera to gather additional context.
Text Responses: Get responses displayed in the app.
Customizable: Modify the code to add new features or improve existing ones.

How It Works

The application is built using Python and integrates several libraries:

PyQt5: For the user interface.
OpenCV: For camera access and visual processing.
PyAudio: For capturing and playing audio.
Google Generative AI: For natural language processing.
Python-dotenv: For managing environment variables.
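To make the division of labor between these libraries a bit more concrete, here is a minimal sketch of one way two capture loops could feed a single outbound queue. It is not taken from the repository: `fake_microphone`, `fake_camera`, `pump`, and `run_once` are hypothetical names, and the fake sources stand in for PyAudio and OpenCV so that the example runs with the standard library alone.

```python
import asyncio
from typing import AsyncIterator, List, Tuple

Message = Tuple[str, bytes]  # (kind, payload), e.g. ("audio", b"...")


async def fake_microphone(chunks: int) -> AsyncIterator[bytes]:
    """Hypothetical stand-in for a PyAudio stream: yields small byte chunks."""
    for i in range(chunks):
        await asyncio.sleep(0)  # give the other capture loop a turn
        yield bytes([i]) * 4


async def fake_camera(frames: int) -> AsyncIterator[bytes]:
    """Hypothetical stand-in for OpenCV capture: yields one blob per frame."""
    for i in range(frames):
        await asyncio.sleep(0)
        yield b"frame-%d" % i


async def pump(kind: str, source: AsyncIterator[bytes], outbox: asyncio.Queue) -> None:
    """Tag every item from one source and put it on the shared queue."""
    async for payload in source:
        await outbox.put((kind, payload))


async def run_once(chunks: int = 3, frames: int = 2) -> List[Message]:
    """Run both capture loops concurrently and return what was queued."""
    outbox: asyncio.Queue = asyncio.Queue()
    await asyncio.gather(
        pump("audio", fake_microphone(chunks), outbox),
        pump("image", fake_camera(frames), outbox),
    )
    # A real app would forward each message to the model as it arrives;
    # this sketch only drains the queue so the result can be inspected.
    sent: List[Message] = []
    while not outbox.empty():
        sent.append(outbox.get_nowait())
    return sent
```

Calling `asyncio.run(run_once())` returns the tagged messages in the order they were queued. The point of the single queue is that whatever talks to Gemini only has to read from one place, no matter how many inputs are added later.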

The app stores your Google Gemini API key in a .env file, which keeps the key out of the source code. The file is plain text, so don't commit it to version control. If the file doesn't exist, the app will create one for you.
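The lookup order described in this post (environment variable, then the .env file, then asking the user and saving the answer) could be sketched roughly like this. This is a standard-library approximation rather than the app's actual code: the project relies on python-dotenv for reading the file, and `read_env_file` and `resolve_api_key` are hypothetical helper names used only for illustration.

```python
import os
from pathlib import Path
from typing import Callable, Dict


def read_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    if not path.exists():
        return values
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_api_key(env_path: Path, ask: Callable[[], str]) -> str:
    """Return the key from the environment, then the .env file, then `ask`."""
    key = os.environ.get("GEMINI_API_KEY") or read_env_file(env_path).get("GEMINI_API_KEY")
    if key:
        return key
    key = ask().strip()
    if not key:
        raise ValueError("No API key provided")
    # Append to (or create) the .env file so the next launch does not ask again.
    with env_path.open("a") as fh:
        fh.write(f"GEMINI_API_KEY={key}\n")
    return key
```

In a command-line context, `resolve_api_key(Path(".env"), lambda: input("Gemini API key: "))` would prompt only when neither the environment nor the file provides a key. A GUI app would pass a function that opens a dialog instead.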

Getting Started

Prerequisites

Python 3.8 or higher
A Google Gemini API key

Installation

Clone the repository:

   git clone https://github.com/SchBenedikt/ai-agent.git
   cd ai-agent

Install the required dependencies:

   pip install -r requirements.txt

Set your Gemini API key in the .env file:

   echo "GEMINI_API_KEY=your-api-key" > .env
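After the installation steps above, a quick import check can save a confusing crash at startup. The sketch below is an optional helper and not part of the repository. It maps the pip package names for the libraries listed under "How It Works" to their import names and reports the ones that cannot be found. `REQUIRED`, `is_importable`, and `missing_packages` are names I made up for this example.

```python
import importlib.util
from typing import Dict, List

# pip package name -> module name used in `import ...`
REQUIRED: Dict[str, str] = {
    "PyQt5": "PyQt5",
    "opencv-python": "cv2",
    "pyaudio": "pyaudio",
    "google-generativeai": "google.generativeai",
    "python-dotenv": "dotenv",
}


def is_importable(module: str) -> bool:
    """Return True if `module` can be located without fully importing it."""
    try:
        return importlib.util.find_spec(module) is not None
    except (ImportError, ValueError):
        # find_spec raises when a parent package (e.g. "google") is missing.
        return False


def missing_packages(required: Dict[str, str] = REQUIRED) -> List[str]:
    """List the pip names whose modules are not importable."""
    return [pkg for pkg, module in required.items() if not is_importable(module)]
```

Running `print(missing_packages())` in the project's environment should print an empty list once everything is installed. Anything it does print can be passed straight to `pip install`.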

Running the App

To run the app directly without building:

   python app.py

Building the App

You can build a standalone macOS application using PyInstaller:

   pyinstaller gemini.spec

The app will be created in the dist folder as Gemini Assistant.app.
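Once the build finishes, it can be worth inspecting the bundle before launching it. A macOS .app is a folder with `Contents/Info.plist` and an executable under `Contents/MacOS`, and a bundled app needs the `NSCameraUsageDescription` and `NSMicrophoneUsageDescription` keys in its Info.plist before macOS will grant camera and microphone access. The following sketch checks for those. `describe_bundle` is a hypothetical helper, and I am not assuming anything about what `gemini.spec` actually writes into the plist.

```python
import plistlib
from pathlib import Path


def describe_bundle(app_path: Path) -> dict:
    """Summarize a macOS .app bundle based on its Info.plist."""
    info_plist = app_path / "Contents" / "Info.plist"
    if not info_plist.is_file():
        raise FileNotFoundError(f"No Info.plist under {app_path}; did the build finish?")
    with info_plist.open("rb") as fh:
        info = plistlib.load(fh)
    executable = info.get("CFBundleExecutable", "")
    binary = app_path / "Contents" / "MacOS" / executable
    return {
        "name": info.get("CFBundleName"),
        "executable_exists": bool(executable) and binary.is_file(),
        # macOS shows these strings in its camera/microphone permission prompts.
        "camera_usage_text": "NSCameraUsageDescription" in info,
        "microphone_usage_text": "NSMicrophoneUsageDescription" in info,
    }
```

For this project the call would be `describe_bundle(Path("dist/Gemini Assistant.app"))`. If either usage key comes back `False`, the built app may be denied camera or microphone access even though `python app.py` works from the terminal.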

Contributing

We welcome contributions! Here are some ways you can help:

Test the App: Run the app and report any issues.
Improve the Code: Add new features or optimize existing ones.
Documentation: Help us improve the documentation.

Feedback

We'd love to hear your thoughts! Share your feedback, suggestions, or issues in the GitHub repository.

Conclusion

The Gemini Assistant is a powerful example of how AI can be integrated into everyday applications.

I hope you find this project useful and enjoyable!

Thanks for reading,
techtech
