Forem: Saga

An open-source project that replicates the Deep Research feature of the $200/month ChatGPT Pro plan at zero cost.

Saga — Fri, 14 Feb 2025 08:34:37 +0000

Recently, a trending GitHub project called "deep-research" replicated ChatGPT's new Deep Research feature. In theory, it can work with any large AI model and be combined with web search services, so that the AI autonomously searches for information, digs deeper into a topic, and generates a research report. However, it currently runs only in the terminal, which raises the barrier to entry. Is there a more user-friendly alternative?

There is a pure front-end web page that visually displays the entire search process and responds very quickly. You can ask it about anything you're curious about. For instance, if you've heard rumors about GPT-4.5 and GPT-5, you can have it search the web for relevant information and then summarize a report for you.

The repository URL is: https://github.com/AnotiaWang/deep-research-web-ui

Usage is straightforward. Configure these two services on the web page:

An API key for the large model service. OpenAI-compatible services such as OpenRouter and DeepSeek are currently supported.
An API key for the web search service. Tavily is currently supported and allows 1,000 free searches per month; generate an API key at https://app.tavily.com/home.

The entire process runs locally in your browser, which helps keep your data secure.
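
The flow described above (search, dig deeper, then write a report) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `deepResearch`, `searchWeb`, and `askModel` are hypothetical names, and the two callbacks stand in for the web search service and the OpenAI-compatible model service configured on the page.

```javascript
// Hypothetical sketch of a deep-research loop; not taken from the project.
// searchWeb(query) -> Promise<string[]> : snippets from a web search service.
// askModel(prompt) -> Promise<string>   : text from an OpenAI-compatible model.
async function deepResearch(topic, { searchWeb, askModel, depth = 2 }) {
  const notes = [];
  let query = topic;

  for (let level = 0; level < depth; level++) {
    // 1. Search the web for the current query.
    const snippets = await searchWeb(query);
    notes.push({ query, snippets });

    // 2. Unless this is the last level, ask the model what to look up next.
    if (level < depth - 1) {
      query = await askModel(
        `Topic: ${topic}\nFindings:\n${snippets.join("\n")}\n` +
          "Suggest one follow-up search query."
      );
    }
  }

  // 3. Ask the model to turn all collected findings into a report.
  const findings = notes
    .map((n) => `## ${n.query}\n${n.snippets.join("\n")}`)
    .join("\n\n");
  const report = await askModel(
    `Write a research report on "${topic}" from these findings:\n\n${findings}`
  );
  return { report, notes };
}
```

Passing the two services in as callbacks keeps the loop independent of any particular provider, which matches the idea that any model and search service can be plugged in.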

Free DeepSeek playground: build a small app

Saga — Wed, 08 Jan 2025 12:42:55 +0000

https://deepseek.edgeone.app/

Free Developer Resources Mind Map | Tools, Services, and Platforms

Saga — Tue, 08 Oct 2024 07:03:49 +0000

Live Demo: https://free-for-dev.edgeone.app

This mind map of free developer resources is derived from the GitHub project: https://github.com/ripienaar/free-for-dev

The project showcases a large number of free cloud services, including SaaS, PaaS, and IaaS offerings, making it easy for businesses and independent developers to launch their products quickly at low or even no cost.

You can now use Chrome's native AI in the official version of Chrome.

Saga — Thu, 26 Sep 2024 12:15:43 +0000

Live Demo: https://chrome-ai.edgeone.app

Chrome's built-in AI initially required filling out an application form and could only be tried in the developer version of Chrome. Now, users can enable it in the official version in just a few steps.

After completing the configuration according to the instructions on the webpage, you can access the debug page. Here, you can quickly modify the code and experience the powerful capabilities of local AI.

Note: The Chrome API is still in the draft stage and may change significantly. This web page was developed against Chrome version 129 and is not compatible with the API of version 128.

Why does Chrome have local AI?

In the past, AI applications usually relied on server-side solutions, which raised privacy concerns for some users.
Some developers have tried to move AI models into the browser, but a model is typically around a thousand times the size of the median web page. Because these models are not shared across websites, each site has to download its own copy when the page loads, which is resource-intensive for users.

Chrome therefore integrates Gemini Nano into the browser, where it is designed to run locally on most modern desktops and laptops, and exposes it through web platform APIs. With Chrome's built-in AI capabilities, your website can execute AI-driven tasks without deploying or managing your own AI models.

Currently, a web page can call the large model locally, in a privacy-preserving way, for tasks such as Q&A and translation.

Benefits of Built-in AI for Web Developers

Simple Deployment: The browser distributes the models automatically, taking the device's capabilities into account and managing model updates. This means you are not responsible for downloading or updating large models over the network, and you do not have to handle storage eviction, runtime memory budgets, serving costs, and other challenges.
Access to Hardware Acceleration: The browser's AI runtime is optimized to make the most of the available hardware, whether that is a GPU, an NPU, or a fallback to the CPU. As a result, your application can get the best performance available on each device.

Benefits of Running AI on Device

Local Processing of Sensitive Data: On-device AI can strengthen your privacy protection. For instance, if you work with sensitive data, you can offer AI features to users with end-to-end encryption.
Responsive User Experience: In some cases, eliminating the round trip to the server means providing almost instantaneous results. On-device AI can be the difference between a viable feature and a suboptimal user experience.
Broader Access to AI: Users’ devices can take on part of the processing load in exchange for access to more features. For example, if you offer advanced AI functionality, you can preview it with on-device AI so that potential customers can see the advantages of your product without increasing your costs. This hybrid approach can also help you manage inference costs, especially in frequently used user flows.
Offline AI Usage: Your users can access AI features even without an internet connection. This means your website and web applications can function normally in offline or unstable network conditions.
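
The hybrid approach mentioned above could look roughly like this: try the on-device model first and fall back to a server only when local inference is unavailable or fails. `hybridInfer`, `runLocal`, and `runOnServer` are hypothetical names supplied by your own application, not part of any Chrome API.

```javascript
// Hypothetical sketch: prefer on-device inference, fall back to the server.
// runLocal(input)    -> Promise<string>, rejects if no local model is usable.
// runOnServer(input) -> Promise<string>, your own server-side inference call.
async function hybridInfer(input, { runLocal, runOnServer }) {
  try {
    const text = await runLocal(input);
    return { text, source: "device" }; // no server cost for this request
  } catch (err) {
    // Local AI is missing, still downloading, or failed: use the server.
    const text = await runOnServer(input);
    return { text, source: "server" };
  }
}
```

Returning the `source` alongside the text makes it easy to measure how many requests the device absorbed, which is the number that matters for inference costs.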

Browser Architecture and APIs

The built-in AI capabilities are primarily accessed through task APIs, such as a translation API or a summarization API. Task APIs are designed to run inference against the best model for the task.

In Chrome, these APIs are built to run inference against Gemini Nano, with fine-tuning or an expert model. Gemini Nano is designed to run locally on most modern devices and is best suited for language-related use cases such as summarization, rewriting, or classification.

Key Term: Fine-tuning is a dynamic approach to enhance a model's ability to perform specific tasks without the need to download a new model for each task.

Prompt API: Send any task expressed in natural language to the built-in large language model (Gemini Nano in Chrome).
Fine-tuning (LoRA) API: Improve the built-in LLM's performance on a task by adjusting the model's weights with low-rank adaptation (LoRA) fine-tuning.

What capabilities can be provided to users?

AI-enhanced content consumption: including summaries, translations, answering content-related questions, classification, and feature analysis.
AI-supported content creation: including writing assistance, proofreading, grammar correction, and rewriting.

Summary API:

Meeting notes summary for users who joined the meeting late or completely missed it.
Key points from support conversations, for customer relationship management.
Sentence or paragraph-sized summaries of multiple product reviews.
Key points of long articles to help readers determine if the article is relevant.
Summarizing questions in forums to help experts find the most relevant questions in their field of expertise.

Writing and rewriting API:

Writing based on initial ideas and optional background. For example, writing a formal email to a bank requesting a credit limit increase, with the background being that you are a long-term customer.
Optimizing existing content by adjusting the length or tone of the text. For example, rewriting a short email to make it sound more polite and formal.

Integrate free, near-real-time multilingual translation into your application with the Chrome AI API

Saga — Tue, 11 Jun 2024 07:26:51 +0000

Chrome has integrated AI capabilities into the latest version of Chrome Dev (version 127.0.6512.0 and above), available behind experimental flags.

Download the latest Chrome Dev: https://www.google.com/intl/en_us/chrome/dev/

Chrome Dev Configuration

Verify that the Chrome Dev version is 127.0.6512.0 or higher.
In the address bar, open chrome://flags/#optimization-guide-on-device-model and choose Enabled BypassPerfRequirement so the model can download smoothly.
In the address bar, open chrome://flags/#prompt-api-for-gemini-nano and select Enabled.
Wait for the model to finish downloading. You can check whether the download is complete at chrome://components/. If the download does not start automatically, click Check for update to force it; the download is about 1 GB. When you see Version: 2024.65.2205, the model is ready to use. Restart Chrome Dev.
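
The version check in the first step amounts to comparing dotted version strings part by part. A minimal sketch follows; `isVersionAtLeast` is a hypothetical helper, and the version string is assumed to be copied by hand from chrome://version, since this sketch does not read it from the browser.

```javascript
// Hypothetical helper: compare dotted version strings numerically, part by part.
function isVersionAtLeast(version, minimum) {
  const a = version.split(".").map(Number);
  const b = minimum.split(".").map(Number);
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    const x = a[i] ?? 0; // treat missing parts as 0
    const y = b[i] ?? 0;
    if (x !== y) return x > y; // the first differing part decides
  }
  return true; // all parts are equal
}

console.log(isVersionAtLeast("127.0.6512.0", "127.0.6512.0")); // true
console.log(isVersionAtLeast("127.0.6500.2", "127.0.6512.0")); // false
console.log(isVersionAtLeast("128.0.6545.0", "127.0.6512.0")); // true
```

A plain string comparison would not work here, because "127.0.999.0" sorts after "127.0.6512.0" as text even though it is the older build.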

API Capability Testing

Open the DevTools console with Cmd + Option + I (on macOS) and run await window.ai.canCreateTextSession();. If it returns "readily", the API is ready to use.
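
The same check can be wrapped in a small helper so that a page can degrade gracefully. This sketch assumes the experimental API shape of Chrome Dev 127 used in this article (`ai.canCreateTextSession()`), which may differ in other versions. The helper name `checkBuiltInAI` and the "unsupported" label are hypothetical, and any status other than "readily" is simply treated as not ready.

```javascript
// Sketch assuming the experimental Chrome 127 API shape (window.ai).
// The `ai` parameter defaults to the global object so it can be stubbed.
async function checkBuiltInAI(ai = globalThis.ai) {
  if (!ai || typeof ai.canCreateTextSession !== "function") {
    // "unsupported" is this sketch's own label, not a value Chrome returns.
    return { ready: false, status: "unsupported" };
  }
  const status = await ai.canCreateTextSession();
  // The article treats "readily" as the signal that the model can be used.
  return { ready: status === "readily", status };
}

// In the DevTools console:
// console.log(await checkBuiltInAI());
```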

Case 1: Rewriting the tone of the text

With just two lines of code, we can solve the wording problems that trouble many people, and do it quickly and privately.
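
The two lines in question look roughly like the body of the function below. This is a sketch that assumes the experimental Chrome 127 Prompt API (`ai.createTextSession()` and `session.prompt()`), which may have changed since. `rewriteTone` and the prompt wording are illustrative, not the original demo code.

```javascript
// Sketch assuming the experimental Chrome 127 Prompt API; names are illustrative.
async function rewriteTone(text, tone, ai = globalThis.ai) {
  const session = await ai.createTextSession();
  return session.prompt(
    `Rewrite the following text in a ${tone} tone. ` +
      `Reply with the rewritten text only.\n\n${text}`
  );
}

// In the DevTools console:
// await rewriteTone("send me the report now", "polite and formal");
```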

Case 2: Text translation

Text can be translated quickly and for free, which makes multilingual display easier for any application.
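
A translation call can be sketched along the same lines. As before, this assumes the experimental Chrome 127 Prompt API (`ai.createTextSession()`, `session.prompt()`); `translate`, `buildTranslationPrompt`, and the prompt wording are hypothetical, and the session is only destroyed if the implementation offers a `destroy()` method.

```javascript
// Sketch assuming the experimental Chrome 127 Prompt API; names are illustrative.
function buildTranslationPrompt(text, targetLanguage) {
  return (
    `Translate the following text into ${targetLanguage}. ` +
    `Reply with the translation only.\n\n${text}`
  );
}

async function translate(text, targetLanguage, ai = globalThis.ai) {
  const session = await ai.createTextSession();
  try {
    return await session.prompt(buildTranslationPrompt(text, targetLanguage));
  } finally {
    // Free the session if the implementation offers destroy().
    if (typeof session.destroy === "function") session.destroy();
  }
}

// In the DevTools console:
// await translate("Good morning, everyone.", "French");
```

Keeping the prompt in its own function makes it easy to adjust the wording without touching the session handling.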

Integration within the application

Our app, https://timmerse.com, is a customizable, immersive 3D world for work and entertainment. It lets you create a space for immersive connections between people. By combining video calls with custom 3D worlds and integrating AI NPCs, it makes gatherings at work and in everyday life more creative and enjoyable.

When playing videos in the OpenDay scene, we can translate the original English subtitles in real time and display them as bilingual subtitles, based on the user's Chrome language preference.
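
One way to sketch the bilingual subtitle step: take the browser's preferred language, skip translation when it is already English, and otherwise show the original line above its translation. `toBilingualSubtitle` and the `translate` callback are hypothetical; the callback stands in for whatever translation call the app uses, and this is not Timmerse's actual code.

```javascript
// Hypothetical sketch; `translate(text, languageName)` is supplied by the caller.
async function toBilingualSubtitle(english, locale, translate) {
  const languageCode = locale.split("-")[0]; // "fr-CA" -> "fr"
  if (languageCode === "en") return english; // nothing to translate

  // Turn the code into an English language name for the prompt, e.g. "French".
  const languageName = new Intl.DisplayNames(["en"], { type: "language" }).of(
    languageCode
  );
  const translated = await translate(english, languageName);
  return `${english}\n${translated}`; // original on top, translation below
}

// In the browser, the locale would come from navigator.language:
// await toBilingualSubtitle("Welcome to OpenDay", navigator.language, translate);
```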

Of course, an LLM is not just for translation. As on-device and multimodal models become widespread, they will change how people interact with devices in many ways and improve efficiency in life and work.