<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Ragie</title>
    <description>The latest articles on Forem by Ragie (@ragieai).</description>
    <link>https://forem.com/ragieai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F9705%2F86a62532-1e97-4264-ac2d-317af68e399f.png</url>
      <title>Forem: Ragie</title>
      <link>https://forem.com/ragieai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/ragieai"/>
    <language>en</language>
    <item>
      <title>How Ragie Outperformed the FinanceBench Test</title>
      <dc:creator>Ibrahim Salami</dc:creator>
      <pubDate>Wed, 23 Oct 2024 16:22:01 +0000</pubDate>
      <link>https://forem.com/ragieai/how-ragie-outperformed-the-financebench-test-cph</link>
      <guid>https://forem.com/ragieai/how-ragie-outperformed-the-financebench-test-cph</guid>
      <description>&lt;p&gt;In this article, we’ll walk you through how &lt;a href="https://ragie.ai/?utm_source=dev.to&amp;amp;utm_medium=organic-posts&amp;amp;utm_campaign=financebench"&gt;Ragie&lt;/a&gt; handled the ingestion of over 50,000+ pages in the &lt;a href="https://github.com/patronus-ai/financebench/tree/main/pdfs" rel="noopener noreferrer"&gt;FinanceBench dataset&lt;/a&gt; (360 PDF files, each roughly 150-250 pages long) in just 4 hours and outperformed the benchmarks in key areas like the Shared Store configuration, where we beat the benchmark by 42%.&lt;/p&gt;

&lt;p&gt;For those unfamiliar, FinanceBench is a rigorous benchmark designed to evaluate RAG systems using real-world financial documents, such as &lt;a href="https://www.investopedia.com/terms/1/10-k.asp" rel="noopener noreferrer"&gt;10-K filings&lt;/a&gt; and earnings reports from public companies. These documents are dense, often spanning hundreds of pages, and mix structured data like tables and charts with unstructured text, making them a challenge for RAG systems to ingest, retrieve from, and answer against accurately.&lt;/p&gt;

&lt;p&gt;In the FinanceBench test, RAG systems are tasked with answering real-world financial questions by retrieving relevant information from a dataset of 360 PDFs. The retrieved chunks are fed into a large language model (LLM) to generate the final answer. This test pushes RAG systems to their limits, requiring accurate retrieval across a vast dataset and precise generation from complex financial data.&lt;/p&gt;
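&lt;p&gt;The retrieve-then-generate loop described above can be sketched in a few lines. This is a minimal illustration, assuming hypothetical vector_search and llm_complete helpers, not Ragie’s or FinanceBench’s actual code:&lt;/p&gt;

```python
# Minimal retrieve-then-generate loop. `vector_search` and `llm_complete`
# are hypothetical stand-ins for a retriever and an LLM call.

def answer_question(question, vector_search, llm_complete, top_k=8):
    # 1. Retrieve the top_k chunks most relevant to the question.
    chunks = vector_search(question, top_k)
    # 2. Ground the prompt in the retrieved evidence.
    context = "\n\n".join(chunks)
    prompt = (
        "Answer using only the context below. If it is insufficient, "
        "refuse to answer.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    # 3. Generate the final answer from the grounded prompt.
    return llm_complete(prompt)
```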

&lt;h2&gt;
  
  
  &lt;strong&gt;The Complexity of Document Ingestion in FinanceBench&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Ingesting complex financial documents at scale is a critical challenge in the FinanceBench test. These filings contain crucial financial information, legal jargon, and multi-modal content, and they require advanced ingestion capabilities to ensure accurate retrieval.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Document Size and Format Complexity&lt;/strong&gt;: Financial datasets consist of structured tables and unstructured text, requiring a robust ingestion pipeline capable of parsing and processing both data types. &lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Handling Large Documents&lt;/strong&gt;: A single 10-K often exceeds 150 pages, so a RAG system must efficiently manage thousands of pages and ensure that ingestion speed does not compromise accuracy (a tough capability to build). &lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;How We Evaluated Ragie Using the FinanceBench Test&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The RAG system was tasked with answering 150 complex real-world financial questions. This rigorous evaluation process was pivotal in understanding how effectively Ragie could retrieve and generate answers compared to the gold answers set by human annotators. &lt;/p&gt;

&lt;p&gt;Each entry features a question (e.g., "Did AMD report customer concentration in FY22?"), the corresponding answer (e.g., “Yes, one customer accounted for 16% of consolidated net revenue”), and an evidence string that provides the necessary information to verify the accuracy of the answer, along with the relevant document's page number. &lt;/p&gt;
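&lt;p&gt;A single evaluation record therefore bundles the question, the gold answer, and its supporting evidence. The field names and page number below are illustrative, not the dataset’s actual schema:&lt;/p&gt;

```python
# Hypothetical shape of one FinanceBench evaluation record, based on the
# fields described above. Keys and the page value are illustrative only.
entry = {
    "question": "Did AMD report customer concentration in FY22?",
    "gold_answer": "Yes, one customer accounted for 16% of consolidated net revenue",
    "evidence": "One customer accounted for 16% of our consolidated net revenue...",
    "doc_name": "AMD_2022_10K",
    "evidence_page": 21,  # page within the source filing (illustrative)
}
```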

&lt;h4&gt;
  
  
  &lt;strong&gt;Grading Criteria:&lt;/strong&gt;
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Accuracy&lt;/strong&gt;: Matching the gold answers for correct responses.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Refusals&lt;/strong&gt;: Cases where the LLM avoided answering, reducing the likelihood of hallucinations.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Inaccurate Responses&lt;/strong&gt;: Instances where incorrect answers were generated.
&lt;/li&gt;
&lt;/ol&gt;
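&lt;p&gt;In code, this grading reduces to a three-way bucketing of each response. The helpers below are our sketch of that process, not the official harness:&lt;/p&gt;

```python
# Three-way grading sketch: accurate, refusal, or inaccurate. `matches` is a
# hypothetical answer-equivalence check (exact matching is too strict for
# free-form financial answers); "REFUSAL" is our placeholder label.

def grade(response, gold_answer, matches):
    if response == "REFUSAL":
        return "refusal"
    if matches(response, gold_answer):
        return "accurate"
    return "inaccurate"

def tally(labels):
    # Turn a list of per-question labels into percentage buckets.
    total = len(labels)
    return {label: 100 * labels.count(label) / total
            for label in ("accurate", "refusal", "inaccurate")}
```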

&lt;h2&gt;
  
  
  &lt;strong&gt;Ragie’s Performance vs. FinanceBench Benchmarks&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;We evaluated Ragie across two configurations:&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Single-Store Retrieval:&lt;/strong&gt; In this setup, the vector database contains chunks from a single document, and retrieval is limited to that document. Despite being simpler, this setup still presents challenges when dealing with large, complex financial filings. &lt;/p&gt;

&lt;p&gt;We matched the benchmark for Single Vector Store retrieval, achieving 51% accuracy using the setup below:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Top_k=32, No rerank&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjf6stbwmbolep2ovl1h2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjf6stbwmbolep2ovl1h2.png" width="800" height="121"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Shared Store Retrieval:&lt;/strong&gt; In this more complex setup, the vector database contains chunks from all 360 documents, requiring retrieval across the entire dataset. Ragie had a 27% accuracy compared to the benchmark of 19% for Shared Store retrieval, outperforming the benchmark by 42% using this setup:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Top_k=8, No rerank&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbbis5mk04tstd5fbto4p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbbis5mk04tstd5fbto4p.png" width="800" height="121"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Shared Store retrieval is the more challenging task: because retrieval happens across all 360 documents simultaneously, ensuring relevance and precision becomes significantly more difficult. The RAG system must manage content from many sources while maintaining high retrieval accuracy despite the much larger scope of data.&lt;/p&gt;
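&lt;p&gt;For clarity, the 42% figure is the relative improvement of Ragie’s 27% shared-store accuracy over the 19% benchmark:&lt;/p&gt;

```python
# The "outperformed by 42%" figure is the relative improvement of 27%
# accuracy over the 19% benchmark (an 8-percentage-point gain).
ragie_accuracy = 27
benchmark_accuracy = 19
relative_improvement = 100 * (ragie_accuracy - benchmark_accuracy) / benchmark_accuracy
print(round(relative_improvement))  # prints 42
```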

&lt;h3&gt;
  
  
  &lt;strong&gt;Key Insights:&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;In a second Single Store run with top_k=8, we ran two tests with rerank on and off: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Without rerank, the test was 50% correct, 32% refusals, and 18% incorrect answers.
&lt;/li&gt;
&lt;li&gt;With rerank on, the test was 50% correct, but refusals increased to 37%, and incorrect answers dropped to 13%.&lt;/li&gt;
&lt;li&gt;Conclusion: Reranking cut incorrect answers from 18% to 13%, trading hallucinations for refusals. &lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;There was no significant difference between GPT-4o and GPT-4 Turbo’s performance during this test.&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;
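&lt;p&gt;The rerank comparison above amounts to a five-percentage-point shift from incorrect answers into refusals (a restatement of the numbers reported above, not new data):&lt;/p&gt;

```python
# Restating the top_k=8 single-store runs: correct answers held at 50%,
# while reranking moved mass from "incorrect" into "refusal".
no_rerank   = {"correct": 50, "refusal": 32, "incorrect": 18}
with_rerank = {"correct": 50, "refusal": 37, "incorrect": 13}

incorrect_point_drop = no_rerank["incorrect"] - with_rerank["incorrect"]
refusal_point_rise = with_rerank["refusal"] - no_rerank["refusal"]
# Both shifts are 5 percentage points; accuracy is unchanged.
```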

&lt;h2&gt;
  
  
  &lt;strong&gt;Why Ragie Outperforms: The Technical Advantages&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Advanced Ingestion Process:&lt;/strong&gt; Ragie's advanced extraction in &lt;a href="https://docs.ragie.ai/reference/createdocument" rel="noopener noreferrer"&gt;&lt;code&gt;hi_res&lt;/code&gt;&lt;/a&gt; mode enables it to extract all the information from the PDFs using a multi-step extraction process described below:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Text Extraction&lt;/strong&gt;: First, we efficiently extract text from PDFs during ingestion to retain the core information.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tables and Figures&lt;/strong&gt;: For more complex elements like tables and images, we use advanced optical character recognition (OCR) techniques to extract structured data accurately.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM Vision Models&lt;/strong&gt;: Ragie also uses LLM vision models to generate descriptions for images, charts, and other non-text elements. This adds a semantic layer to the extraction process, making the ingested data richer and more contextually relevant.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Hybrid Search:&lt;/strong&gt; We use hybrid search by default, which gives you the power of semantic search (for understanding context) and keyword-based retrieval (for capturing exact terms). This dual approach balances precision and recall. For example, exact financial terms are weighted by the keyword component, which significantly improves the relevance of retrievals on a jargon-heavy dataset like FinanceBench.  &lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scalable Architecture:&lt;/strong&gt; While many RAG systems experience performance degradation as dataset size increases, Ragie’s architecture maintains high performance even with 50,000+ pages. Ragie also uses a &lt;a href="https://docs.ragie.ai/docs/summary-index" rel="noopener noreferrer"&gt;summary index&lt;/a&gt; for hierarchical and hybrid hierarchical search; this enhances chunk retrieval by processing chunks in layers and preserving context, so highly relevant chunks reach the generation step. &lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;
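&lt;p&gt;As a rough sketch of score fusion in hybrid search (the weights and helpers below are our illustrative assumptions, not Ragie’s actual implementation):&lt;/p&gt;

```python
# Generic hybrid-search sketch: blend a semantic similarity score with a
# keyword score so exact financial jargon is rewarded alongside meaning.
# The 0.5/0.5 weights are arbitrary illustrative choices.

def keyword_score(query, chunk):
    # Fraction of query terms appearing verbatim in the chunk.
    terms = query.lower().split()
    hits = sum(1 for t in terms if t in chunk.lower())
    return hits / len(terms)

def hybrid_score(query, chunk, semantic, w_sem=0.5, w_kw=0.5):
    # `semantic` is a hypothetical embedding-similarity function in [0, 1].
    return w_sem * semantic(query, chunk) + w_kw * keyword_score(query, chunk)
```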

&lt;h2&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Before making a Build vs Buy decision, developers must consider a range of performance metrics, including scalability, ingestion efficiency, and retrieval accuracy. In this rigorous test against FinanceBench, Ragie demonstrated its ability to handle large-scale, complex financial documents with exceptional speed and precision, outperforming the Shared Store accuracy benchmark by 42%.&lt;/p&gt;

&lt;p&gt;If you’d like to see how Ragie can handle your own large-scale or multi-modal documents, you can try &lt;a href="https://secure.ragie.ai/sign-up" rel="noopener noreferrer"&gt;Ragie’s Free Developer Plan.&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;Feel free to reach out to us at &lt;a href="mailto:support@ragie.ai"&gt;support@ragie.ai&lt;/a&gt; if you're interested in running the FinanceBench test yourself.&lt;/p&gt;

</description>
      <category>rag</category>
      <category>ai</category>
      <category>showdev</category>
      <category>llm</category>
    </item>
    <item>
      <title>How to Build Smarter AI Apps and Reduce Hallucinations with RAG</title>
      <dc:creator>Ibrahim Salami</dc:creator>
      <pubDate>Thu, 10 Oct 2024 21:20:15 +0000</pubDate>
      <link>https://forem.com/ragieai/how-to-build-smarter-ai-apps-and-reduce-hallucinations-with-rag-79i</link>
      <guid>https://forem.com/ragieai/how-to-build-smarter-ai-apps-and-reduce-hallucinations-with-rag-79i</guid>
      <description>&lt;p&gt;With the rise of AI-powered apps, developers are continuously looking for ways to enhance the accuracy and relevance of AI-generated content. One of the most effective methods for achieving this is through &lt;a href="https://blogs.nvidia.com/blog/what-is-retrieval-augmented-generation" rel="noopener noreferrer"&gt;Retrieval-Augmented Generation (RAG)&lt;/a&gt;, which combines the power of LLMs with real-time access to external data sources. RAG makes AI applications more reliable, intelligent, and context-aware. Additionally, RAG can mitigate &lt;a href="https://www.ibm.com/topics/ai-hallucinations" rel="noopener noreferrer"&gt;hallucination&lt;/a&gt;&lt;strong&gt;,&lt;/strong&gt; which is when AI models generate false or misleading information.&lt;/p&gt;

&lt;p&gt;In this blog, we’ll explore how developers can use RAG to build smarter AI apps and reduce hallucinations.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What Is Retrieval-Augmented Generation (RAG)?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;RAG is an advanced technique that enhances LLMs by allowing them to pull real-time, relevant information from external databases, knowledge bases, or other sources. Traditional LLMs rely solely on the data they were trained on, which can lead to inaccurate or outdated results, especially when faced with complex, domain-specific questions. RAG provides a retrieval mechanism that can tap into live data sources, enabling LLMs to generate more accurate and relevant responses.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Why Use RAG to Build Smarter AI Applications?&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;RAG has several key benefits that make it ideal for developers looking to build more intelligent AI applications:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Real-Time Data Access&lt;/strong&gt;: Traditional LLMs are limited by their training data, which can become outdated. RAG addresses this issue by retrieving real-time data from external sources, ensuring responses are up-to-date and accurate.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Improved Accuracy and Reliability&lt;/strong&gt;: While LLMs are proficient at generating text, they can sometimes fabricate information when solid factual information isn’t present in their training data. RAG ensures responses are grounded in real, curated data, making it ideal for tasks where correctness is critical, such as research, journalism, or technical documentation.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Context-Aware Responses&lt;/strong&gt;: RAG’s retrieval mechanism selects information relevant to the input query, ensuring that the responses are accurate and contextually aligned with the specific question or task at hand.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Fast and Efficient Retrieval&lt;/strong&gt;: RAG utilizes vector and other specialized databases to quickly retrieve information based on semantic similarity, ensuring that the right information is available to the LLM in fractions of a second.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Increased Response Precision&lt;/strong&gt;: The combination of RAG with LLMs results in answers that are not only more coherent but also more precise and informative, allowing AI to generate more comprehensive responses across text and multi-modal formats.&lt;/li&gt;
&lt;/ul&gt;
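&lt;p&gt;The semantic-similarity retrieval mentioned above can be illustrated with plain cosine similarity over toy embedding vectors. A real system would use a vector database, but the ranking idea is the same:&lt;/p&gt;

```python
import math

# Toy similarity search: rank stored chunks by cosine similarity between
# their embedding vectors and the query embedding. Vectors here are small
# hand-made lists purely for illustration.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest(query_vec, store, top_k=3):
    # `store` maps chunk text to its embedding vector.
    ranked = sorted(store, key=lambda text: cosine(query_vec, store[text]),
                    reverse=True)
    return ranked[:top_k]
```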

&lt;h3&gt;
  
  
  &lt;strong&gt;Avoiding AI Hallucinations with RAG&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;One of the biggest challenges developers face when using LLMs is dealing with &lt;strong&gt;hallucinations&lt;/strong&gt;. Hallucinations occur when AI systems generate content that is factually incorrect, irrelevant, or misleading, often because the model attempts to fill gaps in its knowledge. Hallucination is a common problem with LLMs, but RAG significantly reduces its occurrence by ensuring that responses are anchored in real, external data sources.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;How RAG Reduces Hallucinations:&lt;/strong&gt;
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Real-time Data Retrieval:&lt;/strong&gt; By accessing a continuously updated knowledge base, RAG allows models to generate responses based on current, factual data rather than outdated or incomplete training sets. This real-time retrieval ensures that AI-generated answers remain relevant and accurate.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Factual Consistency:&lt;/strong&gt; RAG encourages models to produce responses that are aligned with the factual data retrieved. Instead of relying on the model’s built-in knowledge, which might contain inaccuracies or contradictions, it conditions the generation process on accurate and structured information from external sources.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Improved Contextual Understanding:&lt;/strong&gt; One of the key benefits of RAG is its ability to retrieve information contextually relevant to the input query. It provides the AI with access to relevant and targeted data, enabling the model to generate more coherent and contextually appropriate responses. This helps avoid hallucinations where the AI might otherwise improvise.&lt;/li&gt;
&lt;/ol&gt;
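&lt;p&gt;The grounding idea behind these three points can be sketched as a prompt-construction step: condition the model on retrieved facts and give it an explicit refusal path. A minimal, hypothetical example:&lt;/p&gt;

```python
# Sketch of grounded generation: the prompt carries retrieved facts and an
# explicit refusal instruction, so the model is not forced to improvise.
# (Hypothetical helper, not a specific library's API.)

def grounded_prompt(question, retrieved_facts):
    facts = "\n".join(f"- {fact}" for fact in retrieved_facts)
    return (
        "Use only the facts below. If they do not answer the question, "
        "reply exactly: I don't know.\n\n"
        f"Facts:\n{facts}\n\nQuestion: {question}"
    )
```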

&lt;h3&gt;
  
  
  &lt;strong&gt;How Ragie Helps Developers Build Smarter Generative AI Apps&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Ragie is a fully managed RAG-as-a-service platform that simplifies the process of building smarter, RAG-powered AI applications. Developers can easily use Ragie APIs to index and retrieve multi-modal data (text, images, PDFs, etc.) to ensure factual accuracy and minimize hallucinations.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Key Features of Ragie that Help Reduce Hallucinations&lt;/strong&gt;
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Easily Sync Data:&lt;/strong&gt; Ragie allows developers to &lt;a href="https://www.ragie.ai/connectors" rel="noopener noreferrer"&gt;connect&lt;/a&gt; their AI systems to external data sources like &lt;a href="https://www.ragie.ai/connectors/google-drive" rel="noopener noreferrer"&gt;Google Drive&lt;/a&gt;, &lt;a href="https://www.ragie.ai/connectors/notion" rel="noopener noreferrer"&gt;Notion&lt;/a&gt;, and &lt;a href="https://www.ragie.ai/connectors/confluence" rel="noopener noreferrer"&gt;Confluence&lt;/a&gt;. This ensures that the AI system always has real-time access to up-to-date and relevant information, reducing the chances of generating outdated or inaccurate responses.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Summary Index:&lt;/strong&gt; Ragie’s advanced “&lt;a href="https://www.ragie.ai/blog/intoducing-ragie-fully-managed-rag-as-a-service" rel="noopener noreferrer"&gt;Summary Index&lt;/a&gt;” feature helps prevent document affinity problems, where the AI might disproportionately rely on a small subset of documents that have high semantic similarity when key facts may be distributed across many documents. It helps the AI retrieve the most relevant sections from multiple diverse documents.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Entity Extraction for Structured Data:&lt;/strong&gt; Ragie offers &lt;a href="https://www.ragie.ai/blog/intoducing-ragie-fully-managed-rag-as-a-service" rel="noopener noreferrer"&gt;entity extraction&lt;/a&gt; capabilities, allowing developers to retrieve structured data from unstructured sources like PDFs or scanned documents. This feature helps AI systems understand and contextualize the information better, reducing the chances of hallucinating incorrect information.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Advanced Chunking and Retrieval:&lt;/strong&gt; Ragie uses advanced &lt;a href="https://www.ragie.ai/blog/our-approach-to-table-chunking" rel="noopener noreferrer"&gt;chunking&lt;/a&gt; methods to break down large documents into manageable parts. This ensures that the AI retrieves only the most relevant chunks of information, providing a more focused and accurate response.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Scalable and Fast Pipelines:&lt;/strong&gt; With Ragie, developers don’t need to worry about building and maintaining complex data ingestion and retrieval pipelines. Ragie’s fully managed service is scalable, reliable, and highly performant, allowing developers to focus on delivering their AI products without compromises.&lt;/li&gt;
&lt;/ol&gt;
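&lt;p&gt;As a toy illustration of chunking (point 4 above), here is a simple overlapping word-window splitter. Ragie’s production chunking, especially for tables, is considerably more sophisticated:&lt;/p&gt;

```python
# Toy chunker: split a long document into overlapping word windows so each
# chunk stays small enough to retrieve precisely while overlap preserves
# context across boundaries. Sizes below are arbitrary illustrative defaults.

def chunk_words(text, chunk_size=200, overlap=50):
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, max(len(words) - overlap, 1), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
    return chunks
```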

&lt;h3&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;It is critical to ensure that AI-generated content is accurate. &lt;strong&gt;RAG&lt;/strong&gt; helps developers build smarter and more context-aware AI applications, significantly reducing the risk of hallucinations.&lt;/p&gt;

&lt;p&gt;Whether you’re building a &lt;a href="https://mearsheimer.ai/" rel="noopener noreferrer"&gt;chatbot&lt;/a&gt;, a knowledge base, an agent, or an enterprise-grade AI solution, Ragie’s fully managed RAG-as-a-Service platform provides the tools and infrastructure necessary to ensure your AI applications are smarter, faster, and, most importantly, accurate. Ragie’s SDKs are open source; please &lt;a href="https://github.com/ragieai" rel="noopener noreferrer"&gt;star us on GitHub&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://ragie.ai/?utm_source=dev.to&amp;amp;utm_medium=blogposts&amp;amp;utm_campaign=organic_content"&gt;&lt;strong&gt;Try Ragie for free —-&amp;gt;&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;


</description>
      <category>ai</category>
      <category>rag</category>
      <category>llm</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
