DEV Community

techtech

macOS AI Agent

Building a Gemini AI Assistant for macOS


I built an AI-powered agent for macOS:

Meet the AI-Agent, a native macOS application that integrates with Google's Gemini AI to provide a seamless assistant experience. This project is open-source, and we encourage you to test it, modify it, and contribute to its development.

SchBenedikt / ai-agent (https://github.com/SchBenedikt/ai-agent)

Testing macOS AI Agent with Google Gemini Live Web API

Gemini Assistant macOS App

A native macOS application that connects to Google's Gemini AI. The app automatically accesses your camera and microphone to provide a seamless AI assistant experience.

Features

  • Audio input through your microphone
  • Visual context through your camera
  • Text responses displayed in the app
  • Audio responses played through your speakers
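The feature list does not say which audio format the app records. As a rough sketch, assuming 16 kHz, mono, 16-bit PCM (a common choice for speech input), the code below shows how a PyAudio-style chunk size translates into bytes and milliseconds, plus a simple RMS check that could skip silent chunks. The names `chunk_info`, `rms`, `is_silent` and the threshold value are hypothetical, not taken from the repository.

```python
from __future__ import annotations

import array
import math
from typing import Tuple

# Assumed capture format (hypothetical; check the app's actual PyAudio settings)
SAMPLE_RATE = 16_000   # samples per second
SAMPLE_WIDTH = 2       # bytes per sample (16-bit PCM)
CHANNELS = 1           # mono


def chunk_info(frames_per_buffer: int) -> Tuple[int, float]:
    """Return (bytes per chunk, chunk duration in milliseconds)."""
    n_bytes = frames_per_buffer * SAMPLE_WIDTH * CHANNELS
    duration_ms = 1000.0 * frames_per_buffer / SAMPLE_RATE
    return n_bytes, duration_ms


def rms(chunk: bytes) -> float:
    """Root-mean-square amplitude of a chunk of signed 16-bit samples."""
    samples = array.array("h")  # native byte order, little-endian on Macs
    samples.frombytes(chunk)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_silent(chunk: bytes, threshold: float = 500.0) -> bool:
    """Treat a chunk as silence if its RMS is below a hypothetical threshold."""
    return rms(chunk) < threshold
```

With these assumptions, `chunk_info(1024)` gives 2048 bytes per read, or 64 ms of audio per chunk.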

Setup

Prerequisites

  1. Python 3.8+
  2. A Google Gemini API key

Installation

  1. Install the required dependencies:

    pip install google-generativeai opencv-python pyaudio pillow mss PyQt5 pynput python-dotenv pyinstaller
    
  2. Set your Gemini API key as an environment variable (optional):

    export GEMINI_API_KEY="your-api-key-here"
    

    If not set as an environment variable, the app will ask for it on startup.
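The startup behaviour described above (environment variable first, otherwise ask) could look roughly like this. The function name `resolve_api_key` and the use of `getpass` for the prompt are assumptions for illustration; the repository may ask through a PyQt5 dialog instead. Passing the environment and the prompt function in as parameters keeps the logic testable.

```python
import getpass
import os
from typing import Callable, Mapping


def resolve_api_key(
    environ: Mapping[str, str] = os.environ,
    ask: Callable[[str], str] = getpass.getpass,
) -> str:
    """Return GEMINI_API_KEY from the environment, or prompt the user for it."""
    key = environ.get("GEMINI_API_KEY", "").strip()
    if not key:
        # Not set (or blank): ask the user; getpass does not echo the input
        key = ask("Enter your Gemini API key: ").strip()
    if not key:
        raise ValueError("A Gemini API key is required")
    return key
```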

Building the macOS App

There are two ways to build the app:

Method 1: Using PyInstaller (Recommended)

PyInstaller creates a more reliable standalone application that better handles dependencies:

  1. Make sure PyInstaller is installed:

    pip install pyinstaller
    
  2. Run the build process:

    # First clean any previous builds

What is AI-Agent?

The Gemini Assistant is a macOS application designed to:

  • Capture audio input through your microphone.
  • Use your camera for visual context.
  • Provide AI-powered responses via text.

The app leverages Google's Gemini AI for natural language understanding and response generation, making it a powerful tool for productivity and interaction.
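For readers curious how a prompt and a camera frame might reach Gemini, here is a hedged sketch using the `google-generativeai` package from the install command. It assumes a `GenerativeModel`-style object whose `generate_content` accepts a list of parts, with the image passed as a `{"mime_type": ..., "data": ...}` blob, and that the reply exposes `.text`. The helper name `ask_gemini` is hypothetical. Passing the model in lets the helper be tested without network access.

```python
from typing import Any, Optional


def ask_gemini(model: Any, prompt: str, jpeg_frame: Optional[bytes] = None) -> str:
    """Send a text prompt, optionally with one JPEG camera frame, and return the reply text."""
    parts: list = [prompt]
    if jpeg_frame is not None:
        # Inline image blob: raw JPEG bytes plus their MIME type
        parts.append({"mime_type": "image/jpeg", "data": jpeg_frame})
    response = model.generate_content(parts)
    return response.text
```

With the real SDK, `model` would come from something like `genai.configure(api_key=os.environ["GEMINI_API_KEY"])` followed by `genai.GenerativeModel("gemini-1.5-flash")`, after `import google.generativeai as genai`; the model name here is an assumption, not taken from the repository.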

Features

  • Audio Input: Speak to the assistant using your microphone.
  • Visual Context: The app uses your camera to gather additional context.
  • Text Responses: Get responses displayed in the app.
  • Customizable: Modify the code to add new features or improve existing ones.
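Sending every camera frame to the model would waste bandwidth and tokens. One possible approach, not confirmed by the post, is to forward at most one frame per time interval. The class `FrameThrottle` and its one-second default are hypothetical; the clock is injectable so the logic can be tested without a camera.

```python
import time
from typing import Callable, Optional


class FrameThrottle:
    """Allow at most one frame per `interval` seconds."""

    def __init__(self, interval: float = 1.0, clock: Callable[[], float] = time.monotonic):
        self.interval = interval
        self.clock = clock
        self._last: Optional[float] = None  # time of the last accepted frame

    def should_send(self) -> bool:
        now = self.clock()
        if self._last is None or now - self._last >= self.interval:
            self._last = now
            return True
        return False
```

In an OpenCV capture loop this would sit after `ok, frame = cap.read()` on a `cv2.VideoCapture(0)`, so only frames for which `should_send()` returns `True` get encoded and sent.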

How It Works

The application is built using Python and integrates several libraries:

  • PyQt5: For the user interface.
  • OpenCV: For camera access and visual processing.
  • PyAudio: For capturing and playing audio.
  • Google Generative AI: For natural language processing.
  • Python-dotenv: For managing environment variables.
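The post lists the libraries but not how they are wired together. A common shape for this kind of app, sketched here as an assumption rather than the repository's actual design, is a set of asyncio tasks: capture producers push items onto a shared queue and a sender task forwards them to the model. The producers `fake_mic` and `fake_camera` and the `sender` task are hypothetical stand-ins that use placeholder bytes, so the sketch runs without hardware or an API key.

```python
import asyncio
from typing import List, Tuple

Item = Tuple[str, bytes]  # (kind, payload), e.g. ("audio", pcm) or ("image", jpeg)


async def fake_mic(queue: asyncio.Queue, n: int) -> None:
    # Stand-in for PyAudio reads: pushes n silent 2048-byte chunks
    for _ in range(n):
        await queue.put(("audio", bytes(2048)))
        await asyncio.sleep(0)


async def fake_camera(queue: asyncio.Queue, n: int) -> None:
    # Stand-in for OpenCV frames: pushes n placeholder JPEG headers
    for _ in range(n):
        await queue.put(("image", b"\xff\xd8"))
        await asyncio.sleep(0)


async def sender(queue: asyncio.Queue, sent: List[Item]) -> None:
    # Stand-in for the Gemini connection: records what would be sent
    while True:
        item = await queue.get()
        sent.append(item)
        queue.task_done()


async def run_pipeline(n_audio: int, n_images: int) -> List[Item]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=10)  # bounded: applies backpressure
    sent: List[Item] = []
    consumer = asyncio.create_task(sender(queue, sent))
    await asyncio.gather(fake_mic(queue, n_audio), fake_camera(queue, n_images))
    await queue.join()  # wait until every queued item has been handled
    consumer.cancel()
    try:
        await consumer
    except asyncio.CancelledError:
        pass
    return sent
```

The bounded queue means capture pauses instead of piling up memory when sending falls behind; `asyncio.run(run_pipeline(3, 1))` returns the four items the sender would have forwarded.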

The app uses a .env file to keep your Google Gemini API key out of the source code. The file stores the key in plain text, so keep it out of version control. If the file doesn't exist, the app will create one for you.
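The post does not show how the `.env` file is read or created. The app lists python-dotenv, most likely for its `load_dotenv()` function; the stdlib-only sketch below is a hypothetical stand-in (`read_env_file` and `ensure_env_file` are illustrative names). It reads simple `KEY=value` lines and, when the file is missing, asks for the key and writes the file with owner-only permissions (`0o600`).

```python
import os
from pathlib import Path
from typing import Callable, Dict


def read_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=value lines, ignoring blank lines and # comments."""
    values: Dict[str, str] = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def ensure_env_file(path: Path, ask: Callable[[], str]) -> str:
    """Return GEMINI_API_KEY from the file, creating the file first if it is missing."""
    if not path.exists():
        key = ask().strip()
        # O_EXCL avoids overwriting; mode 0o600 = readable/writable by the owner only
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(f"GEMINI_API_KEY={key}\n")
    return read_env_file(path).get("GEMINI_API_KEY", "")
```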

Getting Started

Prerequisites

  1. Python 3.8+
  2. A Google Gemini API key

Installation

  1. Clone the repository:

    git clone https://github.com/SchBenedikt/ai-agent.git
    cd ai-agent

  2. Install the required dependencies:

    pip install -r requirements.txt

  3. Set your Gemini API key in the .env file:

    echo "GEMINI_API_KEY=your-api-key" > .env
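Before running the app, it can help to confirm the dependencies actually installed. This stdlib-only check is not part of the repository; it assumes the usual import names for the packages in the pip install command (for example `cv2` for opencv-python, `PIL` for pillow, `dotenv` for python-dotenv).

```python
import importlib.util
from typing import Iterable, List

# Import names for the packages in the pip install command (assumed mapping)
REQUIRED_MODULES = [
    "google.generativeai",  # google-generativeai
    "cv2",                  # opencv-python
    "pyaudio",              # pyaudio
    "PIL",                  # pillow
    "mss",                  # mss
    "PyQt5",                # PyQt5
    "pynput",               # pynput
    "dotenv",               # python-dotenv
]


def missing_modules(names: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the module names that cannot be found on the current Python path."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. "google") is itself missing
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    gaps = missing_modules()
    print("All dependencies found." if not gaps else "Missing: " + ", ".join(gaps))
```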

Running the App

To run the app directly without building:

python app.py

Building the App

You can build a standalone macOS application using PyInstaller:

pyinstaller gemini.spec

The app will be created in the dist folder as Gemini Assistant.app.
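On macOS, a bundled app that uses the camera or microphone needs `NSCameraUsageDescription` and `NSMicrophoneUsageDescription` entries in its `Info.plist`; without them, macOS will typically deny access or terminate the app on first use. The post does not show whether `gemini.spec` sets these (PyInstaller's `BUNDLE` step accepts an `info_plist` dictionary for this), so here is a small stdlib check you could run against the built bundle. The function name is hypothetical; the default path follows the `dist` location stated above.

```python
import plistlib
from pathlib import Path
from typing import List

REQUIRED_KEYS = ("NSCameraUsageDescription", "NSMicrophoneUsageDescription")


def missing_privacy_keys(app_path: Path = Path("dist/Gemini Assistant.app")) -> List[str]:
    """Return the camera/microphone usage keys absent from the bundle's Info.plist."""
    plist_path = app_path / "Contents" / "Info.plist"
    with plist_path.open("rb") as f:
        info = plistlib.load(f)
    # A key counts as missing if it is absent or has an empty description
    return [key for key in REQUIRED_KEYS if not info.get(key)]
```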

Contributing

We welcome contributions! Here are some ways you can help:

  • Test the App: Run the app and report any issues.
  • Improve the Code: Add new features or optimize existing ones.
  • Documentation: Help us improve the documentation.

Feedback

We'd love to hear your thoughts! Share your feedback, suggestions, or issues in the GitHub repository.

Conclusion

The Gemini Assistant is a powerful example of how AI can be integrated into everyday applications.

I hope you find this project as useful and enjoyable as I do!

Thanks for reading,
techtech
