Building a Gemini AI Assistant for macOS
I created a macOS AI-powered agent:
Meet AI-Agent, a native macOS application that integrates with Google's Gemini AI to provide a seamless assistant experience. This project is open-source, and we encourage you to test it, modify it, and contribute to its development.
SchBenedikt/ai-agent
Testing macOS AI Agent with Google Gemini Live Web API
Gemini Assistant macOS App
A native macOS application that connects to Google's Gemini AI. The app automatically accesses your camera and microphone to provide a seamless AI assistant experience.
Features
- Audio input through your microphone
- Visual context through your camera
- Text responses displayed in the app
- Audio responses played through your speakers
Setup
Prerequisites
- Python 3.8+
- A Google Gemini API key
Installation
- Install the required dependencies:
pip install google-generativeai opencv-python pyaudio pillow mss PyQt5 pynput python-dotenv pyinstaller
- Set your Gemini API key as an environment variable (optional):
export GEMINI_API_KEY="your-api-key-here"
If not set as an environment variable, the app will ask for it on startup.
Building the macOS App
There are two ways to build the app:
Method 1: Using PyInstaller (Recommended)
PyInstaller creates a more reliable standalone application that better handles dependencies:
- Make sure PyInstaller is installed:
pip install pyinstaller
- Run the build process:
…# First clean any previous builds
What Is AI-Agent?
AI-Agent, which builds as the Gemini Assistant app, is a macOS application designed to:
- Capture audio input through your microphone.
- Use your camera for visual context.
- Provide AI-powered responses via text.
The app uses Google's Gemini AI for natural language understanding and response generation.
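To make the Gemini integration concrete, here is a minimal sketch of a single multimodal request with the google-generativeai package: one text prompt plus one JPEG camera frame. The helper names `build_request_parts` and `ask_gemini` and the model name `gemini-1.5-flash` are assumptions for illustration, not taken from the repository, which may use Gemini's streaming Live API instead.

```python
from typing import Any, Dict, List, Union

# A content part is either plain text or an inline blob dict.
Part = Union[str, Dict[str, Any]]


def build_request_parts(prompt: str, jpeg_frame: bytes) -> List[Part]:
    """Combine the user's prompt and one camera frame into request parts.

    google-generativeai accepts inline image data as a dict with
    'mime_type' and 'data' keys.
    """
    return [prompt, {"mime_type": "image/jpeg", "data": jpeg_frame}]


def ask_gemini(api_key: str, prompt: str, jpeg_frame: bytes,
               model_name: str = "gemini-1.5-flash") -> str:
    """Hypothetical helper: send one multimodal request, return the text reply.

    The model name is an assumption. The package is imported lazily so the
    rest of this module can be used without it installed.
    """
    import google.generativeai as genai

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel(model_name)
    response = model.generate_content(build_request_parts(prompt, jpeg_frame))
    return response.text
```

With OpenCV, a captured frame can be turned into JPEG bytes with `cv2.imencode(".jpg", frame)[1].tobytes()` before passing it to `ask_gemini`.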
Features
- Audio Input: Speak to the assistant using your microphone.
- Visual Context: The app uses your camera to gather additional context.
- Text Responses: Get responses displayed in the app.
- Customizable: Modify the code to add new features or improve existing ones.
How It Works
The application is built using Python and integrates several libraries:
- PyQt5: For the user interface.
- OpenCV: For camera access and visual processing.
- PyAudio: For capturing and playing audio.
- Google Generative AI: For natural language processing.
- Python-dotenv: For managing environment variables.
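One plausible way these pieces fit together is a producer/consumer loop: capture threads push microphone chunks and camera frames onto a shared queue, and a single worker forwards them to Gemini. The sketch below uses only the standard library, with plain iterables standing in for PyAudio and OpenCV; `Chunk`, `capture`, `forward`, and `run_pipeline` are hypothetical names, and the repository's actual structure may differ.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass
class Chunk:
    kind: str    # "audio" or "video" in this sketch
    data: bytes  # raw PCM bytes or an encoded frame


def capture(kind: str, source: Iterable[bytes], out: queue.Queue) -> None:
    """Producer: push every item from one capture source onto the shared queue."""
    for data in source:
        out.put(Chunk(kind, data))
    out.put(None)  # sentinel: this source has finished


def forward(inbox: queue.Queue, n_sources: int, send: Callable[[Chunk], None]) -> None:
    """Consumer: hand chunks to `send` until every source has sent its sentinel."""
    finished = 0
    while finished < n_sources:
        item: Optional[Chunk] = inbox.get()
        if item is None:
            finished += 1
        else:
            send(item)


def run_pipeline(sources: Dict[str, Iterable[bytes]], send: Callable[[Chunk], None]) -> None:
    # Bounded queue: if `send` is slow, producers block instead of buffering forever.
    inbox: queue.Queue = queue.Queue(maxsize=64)
    threads = [
        threading.Thread(target=capture, args=(kind, src, inbox), daemon=True)
        for kind, src in sources.items()
    ]
    for t in threads:
        t.start()
    forward(inbox, len(threads), send)  # consumer runs on the calling thread
    for t in threads:
        t.join()
```

In a real app, the audio source could wrap PyAudio's `stream.read(...)` in a loop, the video source could yield JPEG-encoded OpenCV frames, and `send` would call the Gemini client. For a quick check, plain lists work: `run_pipeline({"audio": [b"a1", b"a2"], "video": [b"f1"]}, print)`.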
The app reads your Google Gemini API key from a .env file, which keeps the key out of the source code. If the file doesn't exist, the app will create one for you.
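The key lookup described above can be sketched as follows. This is a stdlib-only approximation, assuming the lookup order is: exported `GEMINI_API_KEY` variable first, then the .env file, then a prompt whose answer is saved to .env. The helper names `read_env_file` and `resolve_api_key` are hypothetical; the app itself uses python-dotenv, whose `load_dotenv()` likewise leaves an already-exported variable untouched by default.

```python
import os
from pathlib import Path
from typing import Callable, Dict, Optional

ENV_VAR = "GEMINI_API_KEY"


def read_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    if not path.exists():
        return values
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")  # drop optional quotes
    return values


def resolve_api_key(env_path: Path, prompt: Callable[[str], str] = input) -> str:
    """Hypothetical lookup order: exported variable, then .env, then ask the user."""
    key: Optional[str] = os.environ.get(ENV_VAR)
    if key:
        return key
    key = read_env_file(env_path).get(ENV_VAR)
    if key:
        return key
    key = prompt("Enter your Gemini API key: ").strip()
    # Append mode creates the .env file if it does not exist yet.
    with env_path.open("a") as fh:
        fh.write(f"{ENV_VAR}={key}\n")
    return key
```

Passing `prompt` as a parameter keeps the function testable; a GUI app could pass a function that shows a PyQt5 input dialog instead of `input`.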
Getting Started
Prerequisites
- Python 3.8 or higher
- A Google Gemini API key
Installation
- Clone the repository:
git clone https://github.com/SchBenedikt/ai-agent.git
cd ai-agent
- Install the required dependencies:
pip install -r requirements.txt
- Set your Gemini API key in the .env file:
echo "GEMINI_API_KEY=your-api-key" > .env
Running the App
To run the app directly without building:
python app.py
Building the App
You can build a standalone macOS application using PyInstaller:
pyinstaller gemini.spec
The app will be created in the dist folder as Gemini Assistant.app.
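Because the bundled app uses the camera and microphone, macOS expects its Info.plist to contain `NSCameraUsageDescription` and `NSMicrophoneUsageDescription`; without them, the system will not grant the app access. A quick stdlib check of the built bundle is sketched below. The helper name `missing_privacy_keys` is hypothetical, and this post does not show whether gemini.spec already sets these keys.

```python
import plistlib
from pathlib import Path
from typing import List

# Privacy keys macOS looks for before granting camera/microphone access.
REQUIRED_KEYS = ("NSCameraUsageDescription", "NSMicrophoneUsageDescription")


def missing_privacy_keys(app_path: Path) -> List[str]:
    """Return the camera/microphone usage keys absent (or empty) in an .app bundle."""
    info_plist = app_path / "Contents" / "Info.plist"
    with info_plist.open("rb") as fh:
        info = plistlib.load(fh)
    return [key for key in REQUIRED_KEYS if not info.get(key)]
```

After building, `missing_privacy_keys(Path("dist/Gemini Assistant.app"))` returning an empty list means both keys are present. If any are missing, PyInstaller's `BUNDLE` step in a spec file accepts an `info_plist` dictionary where they can be added.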
Contributing
We welcome contributions! Here are some ways you can help:
- Test the App: Run the app and report any issues.
- Improve the Code: Add new features or optimize existing ones.
- Documentation: Help us improve the documentation.
Feedback
We'd love to hear your thoughts! Share your feedback, suggestions, or issues in the GitHub repository.
Conclusion
The Gemini Assistant is a powerful example of how AI can be integrated into everyday applications.
I hope you find this project as useful and enjoyable as I do!
Thanks for reading,
techtech