<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Syed</title>
    <description>The latest articles on Forem by Syed (@abuvanth).</description>
    <link>https://forem.com/abuvanth</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1809102%2F78c21015-b77f-47b3-9b48-6f2388947614.jpeg</url>
      <title>Forem: Syed</title>
      <link>https://forem.com/abuvanth</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/abuvanth"/>
    <language>en</language>
    <item>
      <title>Building an Offline AI App That Lets You Chat With PDFs on Android</title>
      <dc:creator>Syed</dc:creator>
      <pubDate>Sat, 07 Mar 2026 05:09:12 +0000</pubDate>
      <link>https://forem.com/abuvanth/building-an-offline-ai-app-that-lets-you-chat-with-pdfs-on-android-5egj</link>
      <guid>https://forem.com/abuvanth/building-an-offline-ai-app-that-lets-you-chat-with-pdfs-on-android-5egj</guid>
      <description>&lt;p&gt;Most AI tools that analyze documents require uploading your PDFs to the cloud.&lt;/p&gt;

&lt;p&gt;That means your files are processed on remote servers, which can be a privacy concern—especially when dealing with personal or sensitive documents.&lt;/p&gt;

&lt;p&gt;I wanted a different approach.&lt;/p&gt;

&lt;p&gt;So I built EdgeDox, an Android app that lets you chat with PDFs using AI that runs completely on-device.&lt;/p&gt;

&lt;h2&gt;Why Offline AI?&lt;/h2&gt;

&lt;p&gt;Many users are uncomfortable uploading documents to external servers. By running the AI locally on the device, we get several advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Privacy-first – documents never leave the phone&lt;/li&gt;
&lt;li&gt;Offline functionality – works without internet&lt;/li&gt;
&lt;li&gt;Faster interaction – no network latency&lt;/li&gt;
&lt;li&gt;Better control over data&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;What EdgeDox Does&lt;/h2&gt;

&lt;p&gt;The app allows users to interact with their documents using natural language.&lt;/p&gt;

&lt;p&gt;You can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ask questions about a PDF&lt;/li&gt;
&lt;li&gt;Generate summaries&lt;/li&gt;
&lt;li&gt;Extract key information&lt;/li&gt;
&lt;li&gt;Quickly navigate large documents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything happens locally on the phone.&lt;/p&gt;

&lt;h2&gt;The Idea&lt;/h2&gt;

&lt;p&gt;I’ve always been interested in on-device AI and privacy-first applications. With recent improvements in lightweight AI models, it’s becoming possible to run useful AI workloads directly on mobile devices.&lt;/p&gt;

&lt;p&gt;EdgeDox is an experiment in pushing this idea further—making document AI tools work without relying on cloud infrastructure.&lt;/p&gt;

&lt;h2&gt;Current Status&lt;/h2&gt;

&lt;p&gt;The app is still early, but it recently got its first few subscribers, which is a big milestone for an indie developer.&lt;/p&gt;

&lt;p&gt;Now I'm focusing on improving:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;response quality&lt;/li&gt;
&lt;li&gt;document processing speed&lt;/li&gt;
&lt;li&gt;user experience&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Try It&lt;/h2&gt;

&lt;p&gt;If you're interested in offline AI or privacy-focused apps, you can check it out here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://play.google.com/store/apps/details?id=io.cyberfly.edgedox" rel="noopener noreferrer"&gt;https://play.google.com/store/apps/details?id=io.cyberfly.edgedox&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I’d love feedback from the developer community on how to improve it further.&lt;/p&gt;

</description>
      <category>android</category>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>This app is fully vibe-coded</title>
      <dc:creator>Syed</dc:creator>
      <pubDate>Thu, 05 Mar 2026 02:08:30 +0000</pubDate>
      <link>https://forem.com/abuvanth/this-app-is-full-vibe-coded-4ae5</link>
      <guid>https://forem.com/abuvanth/this-app-is-full-vibe-coded-4ae5</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/abuvanth" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1809102%2F78c21015-b77f-47b3-9b48-6f2388947614.jpeg" alt="abuvanth"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/abuvanth/building-an-offline-ai-app-that-lets-you-chat-with-pdfs-on-android-33e5" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;Building an Offline AI App That Lets You Chat With PDFs on Android&lt;/h2&gt;
      &lt;h3&gt;Syed ・ Mar 5&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#ai&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#android&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#privacy&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#showdev&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
    </item>
    <item>
      <title>Building an Offline AI App That Lets You Chat With PDFs on Android</title>
      <dc:creator>Syed</dc:creator>
      <pubDate>Thu, 05 Mar 2026 02:07:25 +0000</pubDate>
      <link>https://forem.com/abuvanth/building-an-offline-ai-app-that-lets-you-chat-with-pdfs-on-android-33e5</link>
      <guid>https://forem.com/abuvanth/building-an-offline-ai-app-that-lets-you-chat-with-pdfs-on-android-33e5</guid>
      <description>&lt;p&gt;Most AI tools that analyze documents require uploading your PDFs to the cloud.&lt;/p&gt;

&lt;p&gt;That means your files are processed on remote servers, which can be a privacy concern—especially when dealing with personal or sensitive documents.&lt;/p&gt;

&lt;p&gt;I wanted a different approach.&lt;/p&gt;

&lt;p&gt;So I built &lt;strong&gt;EdgeDox&lt;/strong&gt;, an Android app that lets you &lt;strong&gt;chat with PDFs using AI that runs completely on-device&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;Why Offline AI?&lt;/h2&gt;

&lt;p&gt;Many users are uncomfortable uploading documents to external servers. By running the AI locally on the device, we get several advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Privacy-first&lt;/strong&gt; – documents never leave the phone&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Offline functionality&lt;/strong&gt; – works without internet&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Faster interaction&lt;/strong&gt; – no network latency&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Better control over data&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;What EdgeDox Does&lt;/h2&gt;

&lt;p&gt;The app allows users to interact with their documents using natural language.&lt;/p&gt;

&lt;p&gt;You can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ask questions about a PDF&lt;/li&gt;
&lt;li&gt;Generate summaries&lt;/li&gt;
&lt;li&gt;Extract key information&lt;/li&gt;
&lt;li&gt;Quickly navigate large documents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything happens locally on the phone.&lt;/p&gt;

&lt;h2&gt;The Idea&lt;/h2&gt;

&lt;p&gt;I’ve always been interested in &lt;strong&gt;on-device AI and privacy-first applications&lt;/strong&gt;. With recent improvements in lightweight AI models, it’s becoming possible to run useful AI workloads directly on mobile devices.&lt;/p&gt;

&lt;p&gt;EdgeDox is an experiment in pushing this idea further—making document AI tools work &lt;strong&gt;without relying on cloud infrastructure&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;Current Status&lt;/h2&gt;

&lt;p&gt;The app is still early, but it recently got its &lt;strong&gt;first few subscribers&lt;/strong&gt;, which is a big milestone for an indie developer.&lt;/p&gt;

&lt;p&gt;Now I'm focusing on improving:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;response quality&lt;/li&gt;
&lt;li&gt;document processing speed&lt;/li&gt;
&lt;li&gt;user experience&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Try It&lt;/h2&gt;

&lt;p&gt;If you're interested in &lt;strong&gt;offline AI or privacy-focused apps&lt;/strong&gt;, you can check it out here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://play.google.com/store/apps/details?id=io.cyberfly.edgedox" rel="noopener noreferrer"&gt;https://play.google.com/store/apps/details?id=io.cyberfly.edgedox&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I’d love feedback from the developer community on how to improve it further.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>android</category>
      <category>privacy</category>
      <category>showdev</category>
    </item>
    <item>
      <title>How I ran LLM + RAG fully offline on Android using MNN</title>
      <dc:creator>Syed</dc:creator>
      <pubDate>Sat, 14 Feb 2026 18:02:27 +0000</pubDate>
      <link>https://forem.com/abuvanth/how-i-ran-llm-rag-fully-offline-on-android-using-mnn-3p8j</link>
      <guid>https://forem.com/abuvanth/how-i-ran-llm-rag-fully-offline-on-android-using-mnn-3p8j</guid>
      <description>&lt;h1&gt;
  
  
  Running LLM + RAG Fully Offline on Android Using MNN (No Cloud, No API)
&lt;/h1&gt;

&lt;p&gt;Most AI apps today depend completely on the cloud.&lt;/p&gt;

&lt;p&gt;Upload your document → send it to an API → wait for the response → pay per request.&lt;br&gt;
And if the internet is slow or unavailable? The AI stops working.&lt;/p&gt;

&lt;p&gt;I wanted to explore something different:&lt;br&gt;
&lt;strong&gt;Can we run a complete LLM + RAG pipeline fully offline on a mobile device?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After months of experimentation, optimization, and many failed attempts, I finally got a working offline document AI running entirely on-device. Here’s what I learned.&lt;/p&gt;




&lt;h1&gt;🎯 Goal&lt;/h1&gt;

&lt;p&gt;Build a document assistant that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Runs fully offline&lt;/li&gt;
&lt;li&gt;Uses no external API&lt;/li&gt;
&lt;li&gt;Keeps documents local&lt;/li&gt;
&lt;li&gt;Works on mid-range Android devices&lt;/li&gt;
&lt;li&gt;Provides usable response speed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Not just a demo — something actually practical.&lt;/p&gt;




&lt;h1&gt;🧠 Architecture Overview&lt;/h1&gt;

&lt;p&gt;The system is a fully local RAG pipeline running on-device:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pipeline:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User loads PDF/document&lt;/li&gt;
&lt;li&gt;Text extracted locally&lt;/li&gt;
&lt;li&gt;Converted into embeddings&lt;/li&gt;
&lt;li&gt;Stored in local vector index&lt;/li&gt;
&lt;li&gt;User asks question&lt;/li&gt;
&lt;li&gt;Relevant chunks retrieved&lt;/li&gt;
&lt;li&gt;Local LLM generates answer&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Everything happens inside the device. No cloud calls.&lt;/p&gt;
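
&lt;p&gt;To make the seven steps concrete, here is a minimal runnable sketch of the same pipeline. This is an illustration only, not the actual app code: the hash-based &lt;code&gt;embed&lt;/code&gt; function and the assembled prompt are toy stand-ins for the real MNN-backed embedding model and LLM.&lt;/p&gt;

```python
import hashlib
import math

def embed(text):
    """Toy stand-in for an on-device embedding model:
    hash each word into a fixed-size bag-of-words vector."""
    vec = [0.0] * 64
    for word in text.lower().split():
        idx = int(hashlib.md5(word.encode()).hexdigest(), 16) % 64
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    # Vectors are already unit-normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

class LocalVectorIndex:
    """Step 4: a tiny in-memory vector store."""
    def __init__(self):
        self.chunks = []  # (text, embedding) pairs

    def add(self, text):
        self.chunks.append((text, embed(text)))

    def retrieve(self, query, k=2):
        """Step 6: rank stored chunks by similarity to the query."""
        q = embed(query)
        scored = sorted(self.chunks, key=lambda c: cosine(c[1], q), reverse=True)
        return [text for text, _ in scored[:k]]

# Steps 1-4: load document text (PDF extraction omitted), chunk, embed, store.
document = [
    "EdgeDox runs a language model fully on the device.",
    "Documents never leave the phone, so data stays private.",
    "The vector index keeps memory usage low on mid-range hardware.",
]
index = LocalVectorIndex()
for chunk in document:
    index.add(chunk)

# Steps 5-7: take a question, retrieve context, hand the prompt to the local LLM.
question = "Where does the model run?"
context = index.retrieve(question)
prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question
# In the real app, `prompt` would go to the quantized LLM via MNN.
```

&lt;p&gt;The structure is the point here: every stage is a plain local function call, with no network client anywhere in the loop.&lt;/p&gt;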




&lt;h1&gt;⚙️ Tech Stack&lt;/h1&gt;

&lt;h3&gt;LLM (Quantized)&lt;/h3&gt;

&lt;p&gt;Smaller quantized models optimized for CPU inference.&lt;br&gt;
Main challenge: balancing size vs response quality.&lt;/p&gt;

&lt;h3&gt;Embeddings (On-device)&lt;/h3&gt;

&lt;p&gt;Multilingual embeddings generated locally and stored for retrieval.&lt;/p&gt;

&lt;h3&gt;Vector Storage&lt;/h3&gt;

&lt;p&gt;Lightweight local vector index for fast retrieval without heavy RAM usage.&lt;/p&gt;

&lt;h3&gt;MNN (Mobile Neural Network)&lt;/h3&gt;

&lt;p&gt;The biggest unlock.&lt;/p&gt;

&lt;p&gt;MNN provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Efficient CPU inference&lt;/li&gt;
&lt;li&gt;Mobile-optimized runtime&lt;/li&gt;
&lt;li&gt;Low memory overhead&lt;/li&gt;
&lt;li&gt;Faster load vs some other runtimes I tested&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For on-device AI, runtime efficiency matters more than raw model size.&lt;/p&gt;




&lt;h1&gt;🚧 Major Challenges&lt;/h1&gt;

&lt;h2&gt;1. Memory limits on mid-range phones&lt;/h2&gt;

&lt;p&gt;High-end phones are easy.&lt;br&gt;
The real challenge: 4–6 GB RAM devices.&lt;/p&gt;

&lt;p&gt;Solutions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Aggressive quantization&lt;/li&gt;
&lt;li&gt;Model size tuning&lt;/li&gt;
&lt;li&gt;Streaming token generation&lt;/li&gt;
&lt;li&gt;Careful memory release&lt;/li&gt;
&lt;/ul&gt;
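
&lt;p&gt;Of these, streaming token generation is the easiest to show in isolation. A hypothetical sketch (not the MNN API): the decoder yields one token at a time, so the UI renders incrementally and no full-response buffer has to be held while the model runs. The canned token list stands in for a real decode loop.&lt;/p&gt;

```python
from typing import Iterator

def generate_stream(prompt: str) -> Iterator[str]:
    """Hypothetical stand-in for a local LLM decode loop:
    yields tokens one by one instead of returning the full text."""
    canned = "The answer stays on your device.".split()
    for token in canned:
        # A real decoder would run one forward pass per step here.
        yield token

def run_ui(prompt: str) -> str:
    """Consume the stream; only one token is in flight at a time,
    so peak memory does not grow with the length of the generation loop."""
    shown = []
    for token in generate_stream(prompt):
        shown.append(token)  # in the app: append to the chat view
    return " ".join(shown)

print(run_ui("Where is my data processed?"))
# prints: The answer stays on your device.
```

&lt;p&gt;The same generator shape also makes "careful memory release" simpler: once the loop ends, nothing but the rendered text is still referenced.&lt;/p&gt;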

&lt;h2&gt;2. Model loading time&lt;/h2&gt;

&lt;p&gt;Large models = slow startup.&lt;/p&gt;

&lt;p&gt;Fix:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Preload strategy&lt;/li&gt;
&lt;li&gt;Lazy loading&lt;/li&gt;
&lt;li&gt;Smaller embedding models&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;3. Embedding speed&lt;/h2&gt;

&lt;p&gt;Generating embeddings locally was initially slow.&lt;/p&gt;

&lt;p&gt;Optimizations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Batch processing&lt;/li&gt;
&lt;li&gt;Lightweight embedding models&lt;/li&gt;
&lt;li&gt;Efficient tensor handling in MNN&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;4. Response usability&lt;/h2&gt;

&lt;p&gt;An offline LLM must still feel usable.&lt;/p&gt;

&lt;p&gt;Tradeoffs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Slightly slower than cloud&lt;/li&gt;
&lt;li&gt;But instant availability&lt;/li&gt;
&lt;li&gt;Zero latency from network&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;📊 Current Performance (mid-range Android)&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Fully offline end-to-end&lt;/li&gt;
&lt;li&gt;No internet required&lt;/li&gt;
&lt;li&gt;Works on mid-range devices&lt;/li&gt;
&lt;li&gt;Private document processing&lt;/li&gt;
&lt;li&gt;No API cost&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Still optimizing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;response speed&lt;/li&gt;
&lt;li&gt;model quality&lt;/li&gt;
&lt;li&gt;memory usage&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;🔐 Why Offline AI Matters&lt;/h1&gt;

&lt;p&gt;Cloud AI is powerful, but comes with trade-offs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Privacy concerns&lt;/li&gt;
&lt;li&gt;Recurring API cost&lt;/li&gt;
&lt;li&gt;Internet dependency&lt;/li&gt;
&lt;li&gt;Latency&lt;/li&gt;
&lt;li&gt;Not usable in low-connectivity regions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Offline AI flips this model.&lt;/p&gt;

&lt;p&gt;Use cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Students with limited internet&lt;/li&gt;
&lt;li&gt;Journalists handling sensitive docs&lt;/li&gt;
&lt;li&gt;Developers&lt;/li&gt;
&lt;li&gt;Researchers&lt;/li&gt;
&lt;li&gt;Privacy-first users&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;🧪 Lessons Learned&lt;/h1&gt;

&lt;p&gt;Biggest insight:&lt;br&gt;
&lt;strong&gt;Offline AI on mobile is no longer impractical.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;With the right optimization:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Quantized models&lt;/li&gt;
&lt;li&gt;Efficient runtime (like MNN)&lt;/li&gt;
&lt;li&gt;Lightweight RAG pipeline&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;…it becomes usable today.&lt;/p&gt;

&lt;p&gt;Not perfect yet, but very real.&lt;/p&gt;




&lt;h1&gt;🔮 What’s Next&lt;/h1&gt;

&lt;p&gt;Currently exploring:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Faster token generation&lt;/li&gt;
&lt;li&gt;Better small models&lt;/li&gt;
&lt;li&gt;Multi-document knowledge base&lt;/li&gt;
&lt;li&gt;Offline voice input&lt;/li&gt;
&lt;li&gt;Cross-platform support&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Long term: building a fully offline AI ecosystem.&lt;/p&gt;




&lt;h1&gt;🤝 Looking for Feedback&lt;/h1&gt;

&lt;p&gt;Curious if others here are experimenting with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;On-device LLMs&lt;/li&gt;
&lt;li&gt;Offline RAG&lt;/li&gt;
&lt;li&gt;Mobile AI inference&lt;/li&gt;
&lt;li&gt;MNN / llama.cpp / other runtimes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What models or runtimes are you using?&lt;br&gt;
Is there real demand for offline/private AI vs cloud?&lt;/p&gt;

&lt;p&gt;Would love to hear thoughts from the community.&lt;/p&gt;




&lt;h1&gt;📱 Demo App (for testing)&lt;/h1&gt;

&lt;p&gt;If anyone wants to try the current implementation:&lt;br&gt;
&lt;a href="https://play.google.com/store/apps/details?id=io.cyberfly.edgedox" rel="noopener noreferrer"&gt;https://play.google.com/store/apps/details?id=io.cyberfly.edgedox&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Mainly looking for technical feedback and ideas from other builders working on local AI.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>android</category>
      <category>llm</category>
      <category>rag</category>
    </item>
    <item>
      <title>Rust Docker Multi-Platform Optimization</title>
      <dc:creator>Syed</dc:creator>
      <pubDate>Fri, 07 Nov 2025 05:52:29 +0000</pubDate>
      <link>https://forem.com/abuvanth/rust-docker-multiplatform-optimization-2hcl</link>
      <guid>https://forem.com/abuvanth/rust-docker-multiplatform-optimization-2hcl</guid>
      <description>&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
        &lt;div class="c-embed__cover"&gt;
          &lt;a href="https://medium.com/@syedabuthahir/how-i-cut-my-rust-multi-platform-docker-build-time-from-2-hours-to-18-minutes-a0dbc1016f9d" class="c-link align-middle" rel="noopener noreferrer"&gt;
            &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fv2%2Fresize%3Afit%3A1024%2F1%2Ara8rW7VcVZhDC3Ae1cO_BQ.png" height="800" class="m-0" width="800"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://medium.com/@syedabuthahir/how-i-cut-my-rust-multi-platform-docker-build-time-from-2-hours-to-18-minutes-a0dbc1016f9d" rel="noopener noreferrer" class="c-link"&gt;
            Rust + Docker + GitHub Actions: My Multi-Arch Build Went from 2 Hours to 18 Minutes | by Syed Abuthahir | Nov, 2025 | Medium
          &lt;/a&gt;
        &lt;/h2&gt;
          &lt;p class="truncate-at-3"&gt;
            A simple CI/CD optimization that saved me hours every time I pushed code
          &lt;/p&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
            &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fv2%2F5d8de952517e8160e40ef9841c781cdc14a5db313057fa3c3de41c6f5b494b19" width="32" height="32"&gt;
          medium.com
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


</description>
    </item>
  </channel>
</rss>
