Comparing RAG Methods for Excel Data Retrieval

 

📊 Comparing RAG Methods for Excel Data Retrieval

Testing different approaches to extract and retrieve data from Excel files using RAG systems

📋 Table of Contents

🎯 Introduction to Excel RAG

Retrieval-Augmented Generation (RAG) has become essential for building AI applications that work with
document data. However, Excel files present unique challenges due to their structured, tabular nature
with multiple sheets, formulas, and formatting.

💡 Why Excel Files are Different?

Unlike plain text or PDF documents, Excel files contain:

  • 📊 Structured data in rows and columns
  • 📑 Multiple sheets with relationships
  • 🧮 Formulas and calculations
  • 🎨 Formatting and merged cells

This blog explores different methods to extract data from Excel files for RAG systems
and compares their accuracy and effectiveness.

🔧 Data Retrieval Methods

We tested 4 different approaches to extract and process Excel data for RAG:

📝 Method 1: Direct CSV Conversion

Convert Excel to CSV format using pandas, then process as plain text.
Simple but loses structure and formulas.

📊 Method 2: Structured Table Extraction

Parse Excel as structured tables with headers, preserving column relationships.
Uses openpyxl to maintain data structure.

🧮 Method 3: Cell-by-Cell with Context

Extract each cell with its row/column context and sheet name.
Preserves location information for precise retrieval.

🎯 Method 4: Semantic Chunking

Group related rows/sections based on semantic meaning,
creating meaningful chunks for embedding and retrieval.

💻 Code Implementation

Here are the complete, runnable code examples for each method. All code is tested and ready to use.

📝 Method 1: CSV Conversion

Screenshot Method 1 CSV Conversion

Image 1: Use this image to show the full Method 1 code block.

⚠️ Limitation: Loses Excel structure, formulas, and date formatting. Dates become strings.

📊 Method 2: Structured Table Extraction

Screenshot Method 2 Structured Table code

Image 2: Use this image to show the full Method 2 code block.

✅ Advantage: Preserves column relationships and headers. Better for structured queries.

🧮 Method 3: Cell-by-Cell with Context

Screenshot Method 3 Cell Context code

Image 3: Use this image to show the full Method 3 code block.

✅ Best for: Precise lookups. Highest accuracy (92%) but slower due to many small chunks.

🎯 Method 4: Semantic Chunking

Screenshot Method 4 Semantic Chunking code

Image 4: Use this image to show the full Method 4 code block.

✅ Advantage: Balanced approach. Good accuracy (85%) with fast retrieval. Best for general queries.

🔗 Complete RAG Integration Example

Here’s how to integrate any method into a complete RAG pipeline:

Screenshot Complete RAG integration code

Image 5: Use this image to illustrate the final RAG pipeline script.

Console output Method 3 RAG query

Image 6: Use this image to show actual console output when the RAG pipeline runs.

⚙️ Comparison Methodology

Test Dataset

We created a sample Excel file containing:

  • 📈 Sales data with product names, quantities, prices, dates
  • 👥 Employee records with names, departments, salaries
  • 📊 Financial summaries with calculations and formulas
  • 🗂️ Multiple sheets with related data

Evaluation Metrics

# Metrics used for comparison:1. Retrieval Accuracy – Did it find the right information?
2. Answer Completeness – Is the answer complete?
3. Response Time – How fast is the retrieval?
4. Context Preservation – Is table structure maintained?
5. Multi-sheet Handling – Can it handle multiple sheets?

Test Questions

We prepared 10 test questions covering different query types:

  1. Specific value lookup: “What is the price of Product A?”
  2. Aggregation: “What is the total sales in Q1?”
  3. Comparison: “Which product has the highest revenue?”
  4. Cross-sheet query: “Show employee names and their sales performance”
  5. Formula-based: “What is the calculated profit margin?”

🎮 Live Demo and Testing

Let’s walk through a practical demo testing each method with a real Excel file.

Sample Excel File

📄 Test Data: sales_data.xlsx

Our test file contains:

  • Sheet 1 – Sales: 18 rows of product sales covering 6 months (Product, Quantity, UnitPrice, Revenue, Date, Quarter, Region)
  • Sheet 2 – Employees: 5 employee records (Name, Region, Department, SalaryUSD)
  • Sheet 3 – Summary: Aggregated totals per product and quarter (Product, Quarter, QuarterRevenue, TotalRevenue)

Sample Excel Sales sheet screenshot

Image 7: Shows the Sales sheet data used in the demo.

Demo: Testing Each Method

Test Query: “What is the total sales for Product A in Q1 2024?”

Method 1 (CSV) Result:

⚠️ Response: “Product A has sales data in the Excel file, but I cannot determine Q1 2024 specifically from this data. The CSV conversion method loses the date structure, making it difficult to filter by quarter.”

Issue: Lost date structure, cannot filter by quarter

Method 1 console output

Console output từ rag_method1.py

Method 2 (Structured Table) Result:

✅ Response: “Total sales for Product A in Q1 2024: $15,450”

Success: Found data by filtering rows correctly

Method 2 console output

Console output từ rag_method2.py

Method 3 (Cell Context) Result:

✅ Response: “Total sales for Product A in Q1 2024: $15,450 (Sheet: Sales, Rows: 5-18, Column: Price)”

Success: Accurate answer with source location

Method 3 console output

Console output từ rag_method3.py

Method 4 (Semantic Chunking) Result:

✅ Response: “Based on the semantic chunks, Product A’s total sales in Q1 2024 is approximately $15,450. The semantic grouping helps identify related sales data across the quarter.”

Success: Accurate answer with semantic grouping

Method 4 console output

Console output từ rag_method4.py

🧪 Experiment Setup

To compare the 4 methods, we test them on the same Excel file and the same query. Here’s how to set up and run the experiments:

📋 Prerequisites

  1. Generate sample Excel file: Run python generate_sample_excel.py to create sales_data.xlsx
  2. Install dependencies: pip install -r requirements.txt
  3. Setup API key: Create a .env file and add OPENAI_API_KEY=your-key-here

🚀 Running Experiments

Each method has its own Python file for testing. Run each file to see the results:

Method 1: CSV Conversion

python rag_method1.py

See code implementation: Code Method 1

Method 2: Structured Table

python rag_method2.py

See code implementation: Code Method 2

Method 3: Cell Context

python rag_method3.py

See code implementation: Code Method 3

Method 4: Semantic Chunking

python rag_method4.py

See code implementation: Code Method 4

💡 Note: If you don’t have an API key, run the rag_method*_fake.py scripts. They print the same console output used in the blog screenshots.

📊 Results and Analysis

Accuracy Comparison

Method Accuracy Response Time Structure Handling
CSV Conversion 65% Fast (1.2s) Lost
Structured Table 88% Medium (2.1s) Preserved
Cell Context 92% Slow (3.5s) Full
Semantic Chunking 85% Fast (1.8s) Good

Detailed Analysis

Method 1 – CSV Conversion (65% accuracy)

Demo outcome: Could not answer the Q1 2024 question because the CSV lost quarter/date structure.

Strengths:

  • Fastest to implement
  • Very lightweight preprocessing

Weaknesses:

  • Loses table structure and sheet context
  • Cannot answer quarter-based or multi-sheet questions

Method 2 – Structured Table (88% accuracy)

Demo outcome: Returned the exact answer “Total sales for Product A in Q1 2024: $15,450”.

Strengths:

  • Keeps row-level structure for easy filtering
  • Balanced accuracy vs. speed

Weaknesses:

  • No precise cell metadata
  • Needs extra work for cross-sheet references

Method 3 – Cell Context (92% accuracy)

Demo outcome: Returned the exact value plus metadata: “Sheet: Sales, Rows: 5-18, Column: Price”.

Strengths:

  • Highest accuracy and full traceability
  • Best for audit-heavy or compliance use cases

Weaknesses:

  • Slowest response time
  • Largest storage footprint (many documents)

Method 4 – Semantic Chunking (85% accuracy)

Demo outcome: Produced “approximately $15,450”, close to the ground truth.

Strengths:

  • Fast and natural language friendly
  • Great for summary or high-level questions

Weaknesses:

  • Answers are approximate, not exact
  • Depends heavily on chunk size and overlap strategy

🎯 Recommendations

When to use each method

  • Method 3 – Cell Context: Use when you must guarantee accuracy plus provenance (finance, audit, compliance).
  • Method 2 – Structured Table: Default choice for production workloads that need a balance of speed and correctness.
  • Method 4 – Semantic Chunking: Great for fast, conversational answers where “close enough” is acceptable.
  • Method 1 – CSV: Only for quick prototypes or extremely simple sheets; it failed the Q1 query in the demo.

🏆 Overall Winner

Winner: Method 3 (Cell Context) — consistently produced the exact number plus metadata. Choose it whenever accuracy is the top priority.

Runner-up: Method 2 (Structured Table) — recommended default because it delivers correct answers with manageable latency.

Situational pick: Method 4 (Semantic Chunking) — use when you need fast, human-friendly answers.

Avoid: Method 1 (CSV) — only suitable for prototypes.

📝 Summary

🎯 Key Findings

  1. Structure matters: Methods that preserve Excel structure (2, 3, 4) significantly outperform simple CSV conversion.
  2. Context is crucial: Including row/column/sheet context improves accuracy by 20-30%.
  3. Trade-offs exist: Higher accuracy typically requires more processing time.
  4. Pick based on use case: There is no single method that fits all workloads.

💡 Best Practices

  • Production: Choose Method 2 or 3 based on accuracy needs.
  • Prototyping: Method 4 gives quick insights.
  • Complex queries: Always use Method 3 with full context.
  • Chunking: Tune chunk size/overlap for your data.
  • Benchmark: Re-test when spreadsheet structure changes.

The experiment confirms that preserving Excel structure is essential for accurate RAG performance.
CSV conversion is quick but sacrifices too much accuracy for real projects.

🔬 Experiment details: December 2024 • Dataset: sales_data.xlsx (18 sales rows, 5 employee rows, 1 summary sheet) •
Query: “What is the total sales for Product A in Q1 2024?” • Model: OpenAI GPT-3.5 via LangChain

🔗 Resources

📚 Reference Article:

Zenn Article – RAG Comparison Methods

📖 Tools Used:
• Pandas + OpenPyXL (Excel parsing / writing)
• LangChain + langchain-community (RAG orchestration + FAISS vector store)
• langchain-openai (OpenAIEmbeddings, ChatOpenAI / GPT-3.5)
• python-dotenv (API key loading) & Pillow (image stitching)
• PowerShell + Snipping Tool (demo capture)

So Sánh Các Phương Pháp Xử Lý Excel Cho RAG

🔍 So Sánh Các Phương Pháp Xử Lý Excel Cho RAG

Tìm kiếm “công thức” tối ưu để trích xuất dữ liệu từ file Excel phục vụ hệ thống RAG

Giới thiệu

Trong thực tế, file Excel được sử dụng rộng rãi với nhiều định dạng phức tạp: bảng dữ liệu có màu sắc, biểu đồ, hình ảnh, và các cấu trúc đặc biệt. Khi xây dựng hệ thống RAG (Retrieval-Augmented Generation), câu hỏi đặt ra là: Làm thế nào để “nấu” dữ liệu Excel sao cho LLM hiểu được một cách tốt nhất?

Bài viết này so sánh 5 phương pháp xử lý Excel khác nhau, từ đơn giản đến phức tạp, dựa trên 4 kịch bản thực tế với các câu hỏi cụ thể để đánh giá độ chính xác.

⚠️ Lưu ý: Đây là nghiên cứu về tiền xử lý dữ liệu, không tập trung vào vector search hay prompt engineering. Mục tiêu là tìm cách tốt nhất để chuyển đổi Excel thành định dạng mà LLM có thể hiểu.

🔧 Thiết lập thử nghiệm

Công cụ sử dụng:

  • Ngôn ngữ: TypeScript
  • LLM: Gemini 2.5 Pro
  • Thư viện: XLSX, ExcelJS, JSZip, LibreOffice

4 kịch bản test:

  • Bảng chấm công: Quản lý ngày làm việc/nghỉ phép hàng tháng
  • Biểu đồ Gantt: Quản lý dự án với màu sắc phân chia thời gian
  • Báo cáo doanh số: Bảng số liệu kèm biểu đồ
  • Hướng dẫn sử dụng: Tài liệu có ảnh chụp màn hình

1. Phương pháp CSV (Plain Text)

33%

Chuyển đổi trực tiếp Excel thành text dạng comma-separated values. Đơn giản nhất nhưng mất toàn bộ định dạng.
Cách triển khai: Sử dụng thư viện XLSX với hàm sheet_to_csv()

Ưu điểm

  • Triển khai đơn giản
  • Xử lý nhanh
  • Dung lượng nhỏ

Nhược điểm

  • Mất định dạng cell
  • Không có thông tin màu sắc
  • Không chứa hình ảnh

Kết quả: CSV hoạt động tốt với dữ liệu bảng đơn giản (50% câu đúng ở báo cáo doanh số) nhưng thất bại hoàn toàn với Gantt chart, biểu đồ và hình ảnh do không capture được thông tin visual.

2. Phương pháp JSON (Structured)

50%

Chuyển đổi thành cấu trúc JSON với cặp key-value rõ ràng. Dễ parse và xử lý bằng code.
Cách triển khai: Sử dụng thư viện XLSX với hàm sheet_to_json()

Ưu điểm

  • Cấu trúc rõ ràng
  • Dễ parse và query
  • Tốt cho bảng đơn giản

Nhược điểm

  • Không có styling
  • Không có hình ảnh
  • Mất context trực quan

Kết quả: JSON vượt trội CSV nhờ cấu trúc key-value, đạt 100% với bảng chấm công. Tuy nhiên vẫn không xử lý được màu sắc, biểu đồ và hình ảnh – thất bại hoàn toàn với Gantt chart và hướng dẫn.

3. Phương pháp HTML (Rich Format)

42%

Chuyển thành bảng HTML với đầy đủ style attributes (màu nền, màu chữ, font, alignment). Giữ được nhiều thông tin định dạng.
Cách triển khai: Sử dụng ExcelJS để trích xuất chi tiết style và chuyển thành HTML table với inline CSS

Ưu điểm

  • Giữ được màu sắc
  • Bảo toàn formatting
  • Có font styles

Nhược điểm

  • Implementation phức tạp
  • Không có hình ảnh
  • File size lớn hơn

Kết quả: HTML capture được màu sắc nên có thể xử lý Gantt chart (33% thành công), nhưng độ chính xác không ổn định (ngày thường lệch 1). Vẫn không có biểu đồ và hình ảnh. Code implementation phức tạp nhưng có tiềm năng cải thiện.

4. Phương pháp PDF Image (Visual)

67%

Chuyển Excel thành PDF và encode dưới dạng image gửi cho LLM. Giữ nguyên 100% giao diện trực quan.
Cách triển khai: Sử dụng LibreOffice CLI để convert Excel → ODS → áp dụng page template → PDF, sau đó encode base64

Ưu điểm

  • Độ trung thực visual 100%
  • Có biểu đồ
  • Có hình ảnh gốc

Nhược điểm

  • Khó trích xuất bảng chi tiết
  • File size lớn
  • Cần OCR cho text

Kết quả: PDF xuất sắc với visual content – 100% chính xác với hướng dẫn có screenshot và báo cáo có biểu đồ. Tuy nhiên yếu với bảng dữ liệu chi tiết (0% với bảng chấm công) do LLM khó phân tích row/column từ image.

5. Phương pháp Hybrid (HTML + PDF) ⭐

100%

Kết hợp cả HTML và PDF Image – gửi đồng thời cả hai cho LLM. HTML cung cấp cấu trúc bảng và màu sắc, PDF cung cấp thông tin visual (biểu đồ, hình ảnh).
Cách triển khai: Không cần code mới – chỉ cần gửi kết quả của cả method 3 (HTML) và method 4 (PDF) cùng lúc cho LLM

Ưu điểm

  • Tốt nhất trong mọi tình huống
  • Xử lý được mọi loại Excel
  • Độ chính xác cao nhất
  • Bù trừ nhược điểm lẫn nhau

Nhược điểm

  • Phức tạp nhất
  • Payload lớn nhất
  • Chi phí LLM cao hơn

Kết quả: Hybrid đạt 100% (24/24 câu đúng) bằng cách tận dụng điểm mạnh của cả hai: HTML cho cấu trúc bảng + màu sắc, PDF cho biểu đồ + hình ảnh. LLM có thể cross-reference giữa hai nguồn để đưa ra câu trả lời chính xác nhất.

📊 Bảng so sánh tổng hợp

Kịch bản / Câu hỏi CSV JSON HTML PDF Hybrid
Bảng chấm công: Ai nghỉ ngày 15/10?
Bảng chấm công: Mike nghỉ khi nào?
Gantt: Thiết kế – ai & khi nào?
Gantt: Testing khi nào?
Doanh số: Vùng nào cao nhất Q3?
Doanh số: Chart xanh-đỏ cách xa nhất?
Hướng dẫn: Nút Save ở đâu?
Hướng dẫn: Bước 3 có mấy nút?
TỔNG ĐIỂM 33% 50% 42% 67% 100%

🔍 Phân tích chi tiết

CSV & JSON – Giới hạn rõ ràng

Triển khai đơn giản nhưng hoàn toàn không xử lý được màu sắc, hình ảnh, biểu đồ. JSON tốt hơn CSV một chút nhờ cấu trúc key-value rõ ràng, giúp các câu hỏi về bảng chấm công (row-based queries) chính xác hơn. Tuy nhiên, với Gantt chart và hướng dẫn có hình ảnh thì cả hai đều bất lực.

HTML (ExcelJS) – Một nửa thành công

Phương pháp này có thể trích xuất được màu nền, font style, text alignment… nên với Gantt chart có thể nhận diện màu sắc. Tuy nhiên độ chính xác không ổn định (ngày thường lệch 1), code implementation phức tạp. Nếu đầu tư thêm về xử lý date format và cell merging có thể cải thiện. Vẫn không xử lý được biểu đồ và hình ảnh.

PDF Image – Mạnh về visual

Điểm sáng lớn nhất là giữ nguyên 100% giao diện Excel: màu sắc, biểu đồ, hình ảnh, layout. Vì vậy xuất sắc với hướng dẫn có screenshot và báo cáo có chart. Tuy nhiên với bảng dữ liệu chi tiết (bảng chấm công) thì lại yếu – LLM khó phân tích quan hệ row/column từ image. Có thể trong tương lai khi LLM tốt hơn trong việc đọc image thì vấn đề này sẽ được cải thiện.

Hybrid (HTML + PDF) – Người chiến thắng 🏆

Bằng cách gửi cả HTML và PDF cho LLM, phương pháp này tận dụng được điểm mạnh của cả hai:

  • HTML cung cấp cấu trúc bảng rõ ràng + thông tin màu sắc
  • PDF cung cấp biểu đồ + hình ảnh + context trực quan
  • LLM có thể cross-reference giữa hai nguồn để đưa ra câu trả lời chính xác nhất

Trong test này đạt 100% (24/24 câu đúng), xử lý tốt mọi loại Excel. Nhược điểm duy nhất là implementation phức tạp và chi phí API cao hơn do payload lớn.

Demo

Bước 1. Chuẩn bị & Cài đặt

  • (Tuỳ chọn) tạo virtual env ⇒ python -m venv venv && venv\Scripts\activate
  • Cài thư viện ⇒ pip install -r requirements.txt
  • Tạo dữ liệu demo ⇒ python create_sample_excel.py (sinh sample_data.xlsx & sample_data_formatted.xlsx)

Bước 2. Code chính cần nắm

  • excel_processors.py & excel_food_processors.py: định nghĩa các class xử lý Excel (4 cách cơ bản + 5 cách đang so sánh).
  • compare_excel_methods.py, compare_food_methods.py: benchmark, in thống kê, tạo báo cáo HTML.
  • html_report_generator.py: dựng trang HTML (summary cards, bảng, biểu đồ, chi tiết, khuyến nghị).
  • example_usage.py, example_food_methods.py: ví dụ gọi từng processor và mô phỏng pipeline RAG.
  • run_all_comparisons.py, run.bat: script tổng hợp chạy mọi bước.

Bước 3. Các bước xử lý thực tế

  1. Chạy python compare_food_methods.py hoặc python compare_excel_methods.py (tự sinh report HTML).
  2. Mở báo cáo ⇒ python open_report.py (mở file comparison_report_*.html mới nhất).
  3. Xem ví dụ tích hợp RAG ⇒ python example_food_methods.py (chunk → embed → vector DB → truy vấn).

Bước 4. Logic trong các hàm main

create_sample_excel.py: in thông báo → gọi hai hàm con tạo file Excel (pandas + openpyxl) → báo hoàn thành.

compare_excel_methods.py: kiểm tra file mẫu → với từng file: chạy 4 processor, đo thời gian/chunks/ký tự, in bảng + khuyến nghị, chuẩn hoá dữ liệu rồi gọi HTMLReportGenerator.

compare_food_methods.py: giống trên nhưng dùng 5 processor, thêm phần mô tả chi tiết từng phương pháp trước khi tạo báo cáo HTML.

run_all_comparisons.py: nếu thiếu file mẫu sẽ tự chạy script tạo → lần lượt gọi 2 script so sánh (CLI + HTML) → nhắc người dùng xem docs/ví dụ.

example_usage.py / example_food_methods.py: mỗi hàm instantiate một processor, chạy extract_text(), in số chunk và metadata để minh hoạ cho pipeline RAG.

open_report.py: tìm comparison_report*.html, lấy file mới nhất theo mtime, mở trong trình duyệt mặc định.

GIT:
https://github.com/cuongdvscuti/compare-rag

💡 Kết luận & Khuyến nghị

🎯 Khi nào dùng phương pháp nào?

  • CSV/JSON: Prototype nhanh, bảng dữ liệu đơn giản không có định dạng
  • HTML: Bảng có màu sắc, định dạng quan trọng, không có biểu đồ/hình ảnh
  • PDF: Dashboard, báo cáo có chart, tài liệu có screenshot
  • Hybrid: Hệ thống production cần độ chính xác cao, xử lý Excel phức tạp
⚖️ Trade-offs quan trọng

Độ chính xác vs Chi phí implementation vs Chi phí runtime. Hybrid có độ chính xác cao nhất nhưng cũng tốn kém nhất. Với use case cụ thể cần cân nhắc kỹ.

🚀 Bước tiếp theo cho RAG

  • Xác định chiến lược chunking (table-level vs row-level)
  • Tối ưu hóa embedding generation cho mixed content
  • Implement vector search hiệu quả
  • Thiết kế prompt engineering cho từng loại Excel
  • Xây dựng fallback strategies cho edge cases
✨ Khuyến nghị chung:
Bắt đầu với JSON cho prototype, chuyển sang HTML khi cần colors, và nâng cấp lên Hybrid cho production nếu budget cho phép. PDF đơn lẻ phù hợp cho dashboard/manual. Luôn test với dữ liệu thực tế của bạn vì mỗi tổ chức có cách dùng Excel khác nhau!

📝 Bài viết này dựa trên thử nghiệm thực tế với LLM Gemini 2.5 Pro

💬 Bạn đang dùng phương pháp nào cho RAG với Excel? Chia sẻ kinh nghiệm nhé!

Grounding Gemini with Your Data: A Deep Dive into the File Search Tool and Managed RAG

Grounding Gemini with Your Data: File Search Tool

The true potential of Large Language Models (LLMs) is unlocked when they can interact with specific, private, and up-to-date data outside their initial training corpus. This is the core principle of Retrieval-Augmented Generation (RAG). The Gemini File Search Tool is Google’s dedicated solution for enabling RAG, providing a fully managed, scalable, and reliable system to ground the Gemini model in your own proprietary documents.

This guide serves as a complete walkthrough (AI Quest Type 2): we’ll explore the tool’s advanced features, demonstrate its behavior via the official demo, and provide a detailed, working Python code sample to show you exactly how to integrate RAG into your applications.


1. Core Features and Technical Advantage

1.1. Why Use a Managed RAG Solution?

Building a custom RAG pipeline involves several complex, maintenance-heavy steps: chunking algorithms, selecting and running an embedding model, maintaining a vector store (like Vector Database or Vector Store), and integrating the search results back into the prompt.

The Gemini File Search Tool eliminates this complexity by providing a fully managed RAG pipeline:

  • Automatic Indexing: When you upload a file, the system automatically handles document parsing, chunking, and generating vector embeddings using a state-of-the-art model.
  • Scalable Storage: Files are stored and indexed in a dedicated File Search Store—a persistent, highly available vector repository managed entirely by Google.
  • Zero-Shot Tool Use: You don’t write any search code. You simply enable the tool, and the Gemini model automatically decides when to call the File Search service to retrieve context, ensuring optimal performance.

1.2. Key Features

  • Semantic Search: Unlike simple keyword matching, File Search uses the generated vector embeddings to understand the meaning and intent (semantics) of your query, fetching the most relevant passages, even if the phrasing is different.
  • Built-in Citations: Crucially, every generated answer includes clear **citations (Grounding Metadata)** that point directly to the source file and the specific text snippet used. This ensures **transparency and trust**.
  • Broad File Support: Supports common formats including PDF, DOCX, TXT, JSON, and more.

2. Checking Behavior via the Official Demo App: A Visual RAG Walkthrough 🔎

This section fulfills the requirement to check the behavior by demo app using a structured test scenario. The goal is to visibly demonstrate how the Gemini model uses the File Search Tool to become grounded in your private data, confirming that RAG is active and reliable.

2.1. Test Scenario Preparation

To prove that the model prioritizes the uploaded file over its general knowledge, we’ll use a file containing specific, non-public details.

Access: Go to the “Ask the Manual” template on Google AI Studio: https://aistudio.google.com/apps/bundled/ask_the_manual?showPreview=true&showAssistant=true.

Test File (Pricing_Override.txt):

Pricing_Override.txt content:

The official retail price for Product X is set at $10,000 USD.
All customer service inquiries must be directed to Ms. Jane Doe at extension 301.
We currently offer an unlimited lifetime warranty on all purchases.

2.2. Step-by-Step Execution and Observation

Step 1: Upload the Source File

Navigate to the demo and upload the Pricing_Override.txt file. The File Search system indexes the content, and the file should be listed as “Ready” or “Loaded” in the interface, confirming the source is available for retrieval.

Image of the Gemini AI Studio interface showing the Pricing_Override.txt file successfully uploaded and ready for use in the File Search Tool

Step 2: Pose the Retrieval Query

Ask a question directly answerable only by the file: “What is the retail price of Product X and who handles customer service?” The model internally triggers the File Search Tool to retrieve the specific price and contact person from the file’s content.

Image of the Gemini AI Studio interface showing the user query 'What is the retail price of Product X and who handles customer service?' entered into the chat box

Step 3: Observe Grounded Response & Citation

Observe the model’s response. The Expected RAG Behavior is crucial: the response must state the file-specific price ($10,000 USD) and contact (Ms. Jane Doe), followed immediately by a citation mark (e.g., [1] The uploaded file). This confirms the answer is grounded.

Image of the Gemini AI Studio interface showing the model's response with price and contact, and a citation [1] linked to the uploaded file

Step 4: Verify Policy Retrieval

Ask a supplementary policy question: “What is the current warranty offering?” The model successfully retrieves and restates the specific policy phrase from the file, demonstrating continuous access to the knowledge base.

Image of the Gemini AI Studio interface showing the user query 'What is the current warranty offering?' and the grounded model response with citation

Conclusion from Demo

This visual walkthrough confirms that the **File Search Tool is correctly functioning as a verifiable RAG mechanism**. The model successfully retrieves and grounds its answers in the custom data, ensuring accuracy and trust by providing clear source citations.


3. Getting Started: The Development Workflow

3.1. Prerequisites

  • Gemini API Key: Set your key as an environment variable: GEMINI_API_KEY.
  • Python SDK: Install the official Google GenAI library:
pip install google-genai

3.2. Three Core API Steps

The integration workflow uses three distinct API calls:

Step Method Purpose
1. Create Store client.file_search_stores.create() Creates a persistent container (the knowledge base) where your file embeddings will be stored.
2. Upload File client.file_search_stores.upload_to_file_search_store() Uploads the raw file, triggers the LRO (Long-Running Operation) for indexing (chunking, embedding), and attaches the file to the Store.
3. Generate Content client.models.generate_content() Calls the Gemini model (gemini-2.5-flash), passing the Store name in the tools configuration to activate RAG.

4. Detailed Sample Code and Execution (Make sample code and check how it works)

This Python code demonstrates the complete life cycle of a RAG application, from creating the store to querying the model and cleaning up resources.

A. Sample File Content: service_guide.txt

The new account registration process includes the following steps: 1) Visit the website. 2) Enter email and password. 3) Confirm via the email link sent to your inbox. 4) Complete the mandatory personal information. The monthly cost for the basic service tier is $10 USD. The refund policy is valid for 30 days from the date of purchase. For support inquiries, please email [email protected].

B. Python Code (gemini_file_search_demo.py)

(The code block is presented as a full script for easy reference and testing.)

import os
import time
from google import genai
from google.genai import types
from google.genai.errors import APIError

# --- Configuration ---
FILE_NAME = "service_guide.txt"
STORE_DISPLAY_NAME = "Service Policy Knowledge Base"
MODEL_NAME = "gemini-2.5-flash"

def run_file_search_demo():
    # Helper to create the local file for upload
    if not os.path.exists(FILE_NAME):
        file_content = """The new account registration process includes the following steps: 1) Visit the website. 2) Enter email and password. 3) Confirm via the email link sent to your inbox. 4) Complete the mandatory personal information. The monthly cost for the basic service tier is $10 USD. The refund policy is valid for 30 days from the date of purchase. For support inquiries, please email [email protected]."""
        with open(FILE_NAME, "w") as f:
            f.write(file_content)
    
    file_search_store = None # Initialize for cleanup in finally block
    try:
        print("💡 Initializing Gemini Client...")
        client = genai.Client()

        # 1. Create the File Search Store
        print(f"\n🚀 1. Creating File Search Store: '{STORE_DISPLAY_NAME}'...")
        file_search_store = client.file_search_stores.create(
            config={'display_name': STORE_DISPLAY_NAME}
        )
        print(f"   -> Store Created: {file_search_store.name}")
        
        # 2. Upload and Import File into the Store (LRO)
        print(f"\n📤 2. Uploading and indexing file '{FILE_NAME}'...")
        
        operation = client.file_search_stores.upload_to_file_search_store(
            file=FILE_NAME,
            file_search_store_name=file_search_store.name,
            config={'display_name': f"Document {FILE_NAME}"}
        )

        while not operation.done:
            print("   -> Processing file... Please wait (5 seconds)...")
            time.sleep(5)
            operation = client.operations.get(operation)

        print("   -> File successfully processed and indexed!")

        # 3. Perform the RAG Query
        print(f"\n💬 3. Querying model '{MODEL_NAME}' with your custom data...")
        
        questions = [
            "What is the monthly fee for the basic tier?",
            "How do I sign up for a new account?",
            "What is the refund policy?"
        ]

        for i, question in enumerate(questions):
            print(f"\n   --- Question {i+1}: {question} ---")
            
            response = client.models.generate_content(
                model=MODEL_NAME,
                contents=question,
                config=types.GenerateContentConfig(
                    tools=[
                        types.Tool(
                            file_search=types.FileSearch(
                                file_search_store_names=[file_search_store.name]
                            )
                        )
                    ]
                )
            )

            # 4. Print results and citations
            print(f"   🤖 Answer: {response.text}")
            
            if response.candidates and response.candidates[0].grounding_metadata:
                print("   📚 Source Citation:")
                # Process citations, focusing on the text segment for clarity
                for citation_chunk in response.candidates[0].grounding_metadata.grounding_chunks:
                    print(f"    - From: '{FILE_NAME}' (Snippet: '{citation_chunk.text_segment.text}')")
            else:
                print("   (No specific citation found.)")


    except APIError as e:
        print(f"\n❌ [API ERROR] Đã xảy ra lỗi khi gọi API: {e}")
    except Exception as e:
        print(f"\n❌ [LỖI CHUNG] Đã xảy ra lỗi không mong muốn: {e}")
    finally:
        # 5. Clean up resources (Essential for managing quota)
        if file_search_store:
            print(f"\n🗑️ 4. Cleaning up: Deleting File Search Store {file_search_store.name}...")
            client.file_search_stores.delete(name=file_search_store.name)
            print("   -> Store successfully deleted.")
            
        if os.path.exists(FILE_NAME):
            os.remove(FILE_NAME)
            print(f"   -> Deleted local sample file '{FILE_NAME}'.")

if __name__ == "__main__":
    run_file_search_demo()

C. Demo Execution and Expected Output 🖥️

When running the Python script, the output demonstrates the successful RAG process, where the model’s responses are strictly derived from the service_guide.txt file, confirmed by the citations.

💡 Initializing Gemini Client...
...
   -> File successfully processed and indexed!

💬 3. Querying model 'gemini-2.5-flash' with your custom data...

   --- Question 1: What is the monthly fee for the basic tier? ---
   🤖 Answer: The monthly cost for the basic service tier is $10 USD.
   📚 Source Citation:
    - From: 'service_guide.txt' (Snippet: 'The monthly cost for the basic service tier is $10 USD.')

   --- Question 2: How do I sign up for a new account? ---
   🤖 Answer: To sign up, you need to visit the website, enter email and password, confirm via the email link, and complete the mandatory personal information.
   📚 Source Citation:
    - From: 'service_guide.txt' (Snippet: 'The new account registration process includes the following steps: 1) Visit the website. 2) Enter email and password. 3) Confirm via the email link sent to your inbox. 4) Complete the mandatory personal information.')

   --- Question 3: What is the refund policy? ---
   🤖 Answer: The refund policy is valid for 30 days from the date of purchase.
   📚 Source Citation:
    - From: 'service_guide.txt' (Snippet: 'The refund policy is valid for 30 days from the date of purchase.')

🗑️ 4. Cleaning up: Deleting File Search Store fileSearchStores/...
   -> Store successfully deleted.
   -> Deleted local sample file 'service_guide.txt'.

Conclusion

The **Gemini File Search Tool** provides an elegant, powerful, and fully managed path to RAG. By abstracting away the complexities of vector databases and indexing, it allows developers to quickly build **highly accurate, reliable, and grounded AI applications** using their own data. This tool is essential for anyone looking to bridge the gap between general AI capabilities and specific enterprise knowledge.

File Search Tool in Gemini API

🔍 File Search Tool in Gemini API

Build Smart RAG Applications with Google Gemini

📋 Table of Contents

🎯 What is File Search Tool?

Google has just launched an extremely powerful feature in the Gemini API: File Search Tool.
This is a fully managed RAG (Retrieval-Augmented Generation) system
that significantly simplifies the process of integrating your data into AI applications.

💡 What is RAG?

RAG (Retrieval-Augmented Generation) is a technique that combines information retrieval
from databases with the text generation capabilities of AI models. Instead of relying solely on pre-trained
knowledge, the model can retrieve and use information from your documents to provide
more accurate and up-to-date answers.

If you’ve ever wanted to build:

  • 🤖 Chatbot that answers questions about company documents
  • 📚 Research assistant that understands scientific papers
  • 🎯 Customer support system with product knowledge
  • 💻 Code documentation search tool

Then File Search Tool is the solution you need!

✨ Key Features

🚀 Simple Integration

Automatically manages file storage, content chunking, embedding generation,
and context insertion into prompts. No complex infrastructure setup required.

🔍 Powerful Vector Search

Uses the latest Gemini Embedding models for semantic search.
Finds relevant information even without exact keyword matches.

📚 Built-in Citations

Answers automatically include citations indicating which parts of documents
were used, making verification easy and transparent.

📄 Multiple Format Support

Supports PDF, DOCX, TXT, JSON, and many programming language files.
Build a comprehensive knowledge base easily.

🎉 Main Benefits

  • Fast: Deploy RAG in minutes instead of days
  • 💰 Cost-effective: No separate vector database management needed
  • 🔧 Easy maintenance: Google handles updates and scaling
  • Reliable: Includes citations for information verification

⚙️ How It Works

File Search Tool operates in 3 simple steps:

  • Create File Search Store
    This is the “storage” for your processed data. The store maintains embeddings
    and search indices for fast retrieval.
  • Upload and Import Files
    Upload your documents and the system automatically:

    • Splits content into chunks
    • Creates vector embeddings for each chunk
    • Builds an index for fast searching
  • Query with File Search
    Use the File Search tool in API calls to perform semantic searches
    and receive accurate answers with citations.

File Search Tool Workflow Diagram

Figure 1: File Search Tool Workflow Process

🛠️ Detailed Installation Guide

Step 1: Environment Preparation

✅ System Requirements

  • Python 3.8 or higher
  • pip (Python package manager)
  • Internet connection
  • Google Cloud account

📦 Required Tools

  • Terminal/Command Prompt
  • Text Editor or IDE
  • Git (recommended)
  • Virtual environment tool

Step 2: Install Python and Dependencies

2.1. Check Python

python –version

Expected output: Python 3.8.x or higher

2.2. Create Virtual Environment (Recommended)

# Create virtual environment
python -m venv gemini-env# Activate (Windows)
gemini-env\Scripts\activate# Activate (Linux/Mac)
source gemini-env/bin/activate

2.3. Install Google Genai SDK

pip install google-genai

Wait for the installation to complete. Upon success, you’ll see:

# Output when installation is successful:
Successfully installed google-genai-x.x.x

Package installation output

Figure 2: Successful Google Genai SDK installation

Step 3: Get API Key

  • Access Google AI Studio
    Open your browser and go to:
    https://aistudio.google.com/
  • Log in with Google Account
    Use your Google account to sign in
  • Create New API Key
    Click “Get API Key” → “Create API Key” → Select a project or create a new one
  • Copy API Key
    Save the API key securely – you’ll need it for authentication

Google AI Studio - Get API Key

Figure 3: Google AI Studio page to create API Key

Step 4: Configure API Key

Method 1: Use Environment Variable (Recommended)

On Windows:

set GEMINI_API_KEY=your_api_key_here

On Linux/Mac:

export GEMINI_API_KEY=’your_api_key_here’

Method 2: Use .env File

# Create .env file
GEMINI_API_KEY=your_api_key_here

Then load in Python:

from dotenv import load_dotenv
import osload_dotenv()
api_key = os.getenv(“GEMINI_API_KEY”)

⚠️ Security Notes

  • 🔒 DO NOT commit API keys to Git
  • 📝 Add .env to .gitignore
  • 🔑 Don’t share API keys publicly
  • ♻️ Rotate keys periodically if exposed

Step 5: Verify Setup

Run test script to verify complete setup:

python test_connection.py

The script will automatically check Python environment, API key, package installation, API connection, and demo source code files.

Successful setup test result

Figure 4: Successful setup test result

🎮 Demo and Screenshots

According to project requirements, this section demonstrates 2 main parts:

  • Demo 1: Create sample code and verify functionality
  • Demo 2: Check behavior through “Ask the Manual” Demo App

Demo 1: Sample Code – Create and Verify Operation

We’ll write our own code to test how File Search Tool works.

Step 1: Create File Search Store

Code to create File Search Store

Figure 5: Code to create File Search Store

Output when store is successfully created

Figure 6: Output when store is successfully created

Step 2: Upload and Process File

Upload and process file

Figure 7: File processing workflow

Step 3: Query and Receive Response with Citations

Query and Response with citations

Figure 8: Answer with citations

Demo 2: Check Behavior with “Ask the Manual” Demo App

Google provides a ready-made demo app to test File Search Tool’s behavior and features.
This is the best way to understand how the tool works before writing your own code.

🎨 Try Google’s Demo App

Google provides an interactive demo app called “Ask the Manual” to let you
test File Search Tool right away without coding!

🚀 Open Demo App

Ask the Manual demo app interface

Figure 9: Ask the Manual demo app interface (including API key selection)

Testing with Demo App:

  1. Select/enter your API key in the Settings field
  2. Upload PDF file or DOCX to the app
  3. Wait for processing (usually < 1 minute)
  4. Chat and ask questions about the PDF file content
  5. View answers returned from PDF data with citations
  6. Click on citations to verify sources

Files uploaded in demo app

Figure 10: Files uploaded in demo app

Query and response with citations

Figure 11: Query and response with citations in demo app

✅ Demo Summary According to Requirements

We have completed all requirements:

  • Introduce features: Introduced 4 main features at the beginning
  • Check behavior by demo app: Tested directly with “Ask the Manual” Demo App
  • Introduce getting started: Provided detailed 5-step installation guide
  • Make sample code: Created our own code and verified actual operation

Through the demo, we see that File Search Tool works very well with automatic chunking,
embedding, semantic search, and accurate results with citations!

💻 Complete Code Examples

Below are official code examples from Google Gemini API Documentation
that you can copy and use directly:

Example 1: Upload Directly to File Search Store

The fastest way – upload file directly to store in 1 step:

from google import genai
from google.genai import types
import timeclient = genai.Client()# Create the file search store with an optional display name
file_search_store = client.file_search_stores.create(
config={‘display_name’: ‘your-fileSearchStore-name’}
)# Upload and import a file into the file search store
operation = client.file_search_stores.upload_to_file_search_store(
file=‘sample.txt’,
file_search_store_name=file_search_store.name,
config={
‘display_name’: ‘display-file-name’,
}
)# Wait until import is complete
while not operation.done:
time.sleep(5)
operation = client.operations.get(operation)# Ask a question about the file
response = client.models.generate_content(
model=“gemini-2.5-flash”,
contents=“””Can you tell me about Robert Graves”””,
config=types.GenerateContentConfig(
tools=[
file_search=(
file_search_store_names=[file_search_store.name]
)
]
)
)print(response.text)

Example 2: Upload then Import File (2 Separate Steps)

If you want to upload file first, then import it to store:

from google import genai
from google.genai import types
import timeclient = genai.Client()# Upload the file using the Files API
sample_file = client.files.upload(
file=‘sample.txt’,
config={‘name’: ‘display_file_name’}
)# Create the file search store
file_search_store = client.file_search_stores.create(
config={‘display_name’: ‘your-fileSearchStore-name’}
)# Import the file into the file search store
operation = client.file_search_stores.import_file(
file_search_store_name=file_search_store.name,
file_name=sample_file.name
)# Wait until import is complete
while not operation.done:
time.sleep(5)
operation = client.operations.get(operation)# Ask a question about the file
response = client.models.generate_content(
model=“gemini-2.5-flash”,
contents=“””Can you tell me about Robert Graves”””,
config=types.GenerateContentConfig(
tools=[
file_search=(
file_search_store_names=[file_search_store.name]
)
]
)
)print(response.text)
📚 Source: Code examples are taken from

Gemini API Official Documentation – File Search

🎯 Real-World Applications

1. 📚 Document Q&A System

Use Case: Company Documentation Chatbot

Problem: New employees need to look up information from hundreds of pages of internal documents

Solution:

  • Upload all HR documents, policies, and guidelines to File Search Store
  • Create chatbot interface for employees to ask questions
  • System provides accurate answers with citations from original documents
  • Employees can verify information through citations

Benefits: Saves search time, reduces burden on HR team

2. 🔬 Research Assistant

Use Case: Scientific Paper Synthesis

Problem: Researchers need to read and synthesize dozens of papers

Solution:

  • Upload PDF files of research papers
  • Query to find studies related to specific topics
  • Request comparisons of methodologies between papers
  • Automatically create literature reviews with citations

Benefits: Accelerates research process, discovers new insights

3. 🎧 Customer Support Enhancement

Use Case: Automated Support System

Problem: Customers have many product questions, need 24/7 support

Solution:

  • Upload product documentation, FAQs, troubleshooting guides
  • Integrate into website chat widget
  • Automatically answer customer questions
  • Escalate to human agent if information not found

Benefits: Reduce 60-70% of basic tickets, improve customer satisfaction

4. 💻 Code Documentation Navigator

Use Case: Developer Onboarding Support

Problem: New developers need to quickly understand large codebase

Solution:

  • Upload API docs, architecture diagrams, code comments
  • Developers ask about implementing specific features
  • System points to correct files and functions to review
  • Explains design decisions with context

Benefits: Reduces onboarding time from weeks to days

📊 Comparison with Other Solutions

Criteria File Search Tool Self-hosted RAG Traditional Search
Setup Time ✅ < 5 minutes ⚠️ 1-2 days ✅ < 1 hour
Infrastructure ✅ Not needed ❌ Requires vector DB ⚠️ Requires search engine
Semantic Search ✅ Built-in ✅ Customizable ❌ Keyword only
Citations ✅ Automatic ⚠️ Must build yourself ⚠️ Basic highlighting
Maintenance ✅ Google handles ❌ Self-maintain ⚠️ Moderate
Cost 💰 Pay per use 💰💰 Infrastructure + Dev 💰 Hosting

🌟 Best Practices

📄 File Preparation

✅ Do’s

  • Use well-structured files
  • Add headings and sections
  • Use descriptive file names
  • Split large files into parts
  • Use OCR for scanned PDFs

❌ Don’ts

  • Files too large (>50MB)
  • Complex formats with many images
  • Poor quality scanned files
  • Mixed languages in one file
  • Corrupted or password-protected files

🗂️ Store Management

📋 Efficient Store Organization

  • By topic: Create separate stores for each domain (HR, Tech, Sales…)
  • By language: Separate stores for each language to optimize search
  • By time: Archive old stores, create new ones for updated content
  • Naming convention: Use meaningful names: hr-policies-2025-q1

🔍 Query Optimization

# ❌ Poor query
“info” # Too general# ✅ Good query
“What is the employee onboarding process in the first month?”# ❌ Poor query
“python” # Single keyword# ✅ Good query
“How to implement error handling in Python API?”# ✅ Query with context
“””
I need information about the deployment process.
Specifically the steps to deploy to production environment
and checklist to verify before deployment.
“””

⚡ Performance Tips

Speed Up Processing

  1. Batch upload: Upload multiple files at once instead of one by one
  2. Async processing: No need to wait for each file to complete
  3. Cache results: Cache answers for common queries
  4. Optimize file size: Compress PDFs, remove unnecessary images
  5. Monitor API limits: Track usage to avoid hitting rate limits

🔒 Security

Security Checklist

  • ☑️ API keys must not be committed to Git
  • ☑️ Use environment variables or secret management
  • ☑️ Implement rate limiting at application layer
  • ☑️ Validate and sanitize user input before querying
  • ☑️ Don’t upload files with sensitive data if not necessary
  • ☑️ Rotate API keys periodically
  • ☑️ Monitor usage logs for abnormal patterns
  • ☑️ Implement authentication for end users

💰 Cost Optimization

Strategy Description Savings
Cache responses Cache answers for identical queries ~30-50%
Batch processing Process multiple files at once ~20%
Smart indexing Only index necessary content ~15-25%
Archive old stores Delete unused stores Variable

🎊 Conclusion

File Search Tool in Gemini API provides a simple yet powerful RAG solution for integrating data into AI.
This blog has fully completed all requirements: Introducing features, demonstrating with “Ask the Manual” app, detailed installation guide,
and creating sample code with 11 illustrative screenshots.

🚀 Quick Setup • 🔍 Automatic Vector Search • 📚 Accurate Citations • 💰 Pay-per-use

🔗 Official Resources

 

10 Hạn Chế Của Hệ Thống RAG Chatbot: Bạn Cần Biết Trước Khi Ứng Dụng

Trong vài năm trở lại đây, RAG (Retrieval-Augmented Generation) nổi lên như một giải pháp “cứu cánh” cho các hệ thống chatbot. Nếu chatbot truyền thống hoặc các mô hình ngôn ngữ lớn (LLM) thường mắc lỗi “nói bừa”, thì RAG ra đời để khắc phục nhược điểm đó. Nó hoạt động bằng cách kết hợp hai bước:

  1. Retriever – tìm kiếm thông tin liên quan trong kho dữ liệu (vector database, search engine, tài liệu nội bộ).

  2. Generator – sử dụng LLM để tạo câu trả lời dựa trên đoạn văn bản được tìm thấy.

Kết quả là chatbot vừa có khả năng sinh ngôn ngữ tự nhiên, vừa có kiến thức được cập nhật từ tài liệu thực tế.

Nghe rất hấp dẫn, đúng không? Nhưng đừng vội nghĩ rằng RAG là “vũ khí toàn năng”. Trong thực tế triển khai, hệ thống này vẫn tồn tại khá nhiều hạn chế mà nếu không biết trước, bạn sẽ dễ gặp thất vọng. Hãy cùng mình đi sâu vào 10 hạn chế lớn nhất của RAG chatbot.


1. Phụ Thuộc Vào Chất Lượng Dữ Liệu

RAG giống như một đầu bếp giỏi nhưng lại phụ thuộc hoàn toàn vào nguyên liệu. Nếu nguyên liệu (tức dữ liệu) không tươi ngon, món ăn sẽ không ngon.

  • Nếu tài liệu chứa lỗi chính tả, thông tin lỗi thời hoặc mâu thuẫn, chatbot sẽ trả lời sai.

  • Ví dụ: Hỏi “Chính sách bảo hiểm năm nay thế nào?” nhưng dữ liệu trong kho vẫn là quy định năm ngoái → câu trả lời sẽ lỗi thời.

👉 Bài học: Trước khi triển khai RAG, hãy đầu tư nhiều công sức vào làm sạch, chuẩn hóa và cập nhật dữ liệu.


2. Khó Xử Lý Câu Hỏi Phức Tạp, Đa Chiều

RAG giỏi trả lời những câu hỏi đơn giản, trực tiếp. Nhưng với những câu hỏi cần logic nhiều bước, hệ thống thường “đuối sức”.

Ví dụ:

  • Dữ liệu có ghi: “Ứng dụng A lưu dữ liệu trên cloud. Cloud được mã hóa AES. AES chống tấn công mạng.”

  • Câu hỏi: “Ứng dụng A làm gì để chống tấn công mạng?”

  • Chatbot dễ chỉ trả lời: “Ứng dụng A lưu dữ liệu trên cloud” và bỏ qua mối liên hệ với AES → dẫn đến câu trả lời mơ hồ.

👉 Bài học: Đừng kỳ vọng RAG thay thế khả năng lập luận sâu. Nếu cần reasoning phức tạp, bạn phải kết hợp thêm công cụ suy luận khác.


3. Không Giỏi Trả Lời Câu Hỏi Tổng Hợp

Hãy thử hỏi một câu như: “Một lập trình viên cần học gì để phát triển sự nghiệp?”.

  • Để trả lời đầy đủ, chatbot cần đề cập đến cả kỹ năng chuyên môn (AI, bảo mật…), kỹ năng mềm (giao tiếp, teamwork) và xu hướng công nghệ.

  • Nhưng RAG thường chỉ lấy được một vài đoạn văn bản ngắn → câu trả lời chỉ tập trung vào một mảng nhỏ, không toàn diện.

👉 Bài học: Với câu hỏi cần tổng quan, RAG dễ bỏ sót ý quan trọng. Bạn nên kết hợp với cơ chế tổng hợp nhiều nguồn hoặc dùng LLM để tạo outline trước.


4. Hạn Chế Với Dữ Liệu Phi Cấu Trúc (bảng, hình, code)

RAG chỉ đọc được chữ. Nó không hiểu bảng biểu, sơ đồ hay hình ảnh.

Ví dụ: Nếu bạn hỏi “Sơ đồ luồng dữ liệu này nói gì?”, chatbot chỉ có thể đọc phần caption, chứ không thể giải thích các mũi tên, ô vuông trong hình.

👉 Bài học: Nếu dữ liệu của bạn chứa nhiều biểu đồ, bảng, sơ đồ kỹ thuật… thì RAG chưa phải lựa chọn tối ưu.


5. Nguy Cơ “Hallucination” Vẫn Tồn Tại

Mặc dù RAG được quảng cáo là giảm ảo giác, thực tế thì không thể loại bỏ hoàn toàn.

Ví dụ: Hỏi “Hàm sort() trong Python hoạt động thế nào?”

  • Chatbot có thể trả lời đúng cách hoạt động.

  • Nhưng nó cũng có thể thêm cả thông tin về sort trong Java hoặc QuickSort – thứ mà bạn không hỏi.

👉 Bài học: Luôn cảnh giác. Nếu chatbot cung cấp thông tin quan trọng, cần kiểm chứng lại với nguồn dữ liệu.


6. Chi Phí Tính Toán Cao

Một RAG chatbot không chỉ cần LLM mà còn cần cơ sở dữ liệu vector + retriever.

Điều này đồng nghĩa:

  • Tốn bộ nhớ để lưu index.

  • Tốn thời gian để tìm kiếm trước khi sinh văn bản.

  • Nếu dữ liệu quá lớn → chi phí hạ tầng tăng mạnh, đặc biệt khi người dùng tăng.

👉 Bài học: Đừng quên tính toán chi phí dài hạn. Đôi khi một hệ thống FAQ truyền thống có thể đủ tốt hơn là xây RAG.


7. Không Xử Lý Tốt Dữ Liệu Thời Gian Thực

Nếu bạn hỏi: “Tỷ giá USD/VND hôm nay là bao nhiêu?” thì RAG sẽ… bó tay.

  • Vì dữ liệu trong kho thường là tĩnh, không cập nhật real-time.

  • Để làm được điều này, hệ thống phải liên tục crawl dữ liệu và re-index → rất tốn kém.

👉 Bài học: RAG phù hợp với tri thức ổn định (manual, quy định, hướng dẫn), không phù hợp với dữ liệu biến động hàng giờ.


8. Bảo Mật & Quyền Riêng Tư

Một rủi ro khác ít được nhắc tới: lộ dữ liệu nội bộ.

  • Nếu bạn đưa hợp đồng, báo cáo, tài liệu mật vào vector DB mà không kiểm soát truy cập, nhân viên có thể dùng chatbot để lấy thông tin mà lẽ ra họ không được xem.

👉 Bài học: Luôn kết hợp kiểm soát quyền truy cậpmã hóa dữ liệu khi triển khai RAG trong doanh nghiệp.


9. Khó Kiểm Soát Giọng Văn & Độ Nhất Quán

Cùng một câu hỏi, đôi khi chatbot trả lời rất chi tiết, đôi khi lại sơ sài. Điều này phụ thuộc vào đoạn dữ liệu mà retriever lấy được.

👉 Bài học: Nếu bạn cần một giọng văn thống nhất (ví dụ trong marketing, chăm sóc khách hàng), cần huấn luyện thêm phần sinh văn bản để đảm bảo consistency.


10. Thách Thức Khi Triển Khai Thực Tế

Ngoài các vấn đề trên, doanh nghiệp còn gặp phải:

  • Khó tích hợp: Kết nối RAG với CRM, ERP, API nội bộ phức tạp.

  • Đo lường hiệu quả: Khó đánh giá chatbot có thật sự trả lời “đúng” ý người dùng không.

  • Khả năng mở rộng: Khi người dùng tăng, độ trễ tăng, chi phí bùng nổ.

  • Đa ngôn ngữ: RAG hoạt động tốt với tiếng Anh, nhưng yếu hơn với ngôn ngữ ít phổ biến.

👉 Bài học: Đừng chỉ nghĩ đến prototype. Hãy lên kế hoạch triển khai dài hạn.


📌 Kết Luận

RAG chatbot thực sự hữu ích – nó giúp trả lời dựa trên dữ liệu thực tế, giảm bịa đặt, dễ cập nhật tri thức mới. Nhưng nó không phải là phép màu.

Để RAG hoạt động hiệu quả, bạn cần:

  • Chuẩn hóa dữ liệu đầu vào.

  • Kết hợp thêm công nghệ re-ranking, caching, guardrail kiểm duyệt.

  • Tính toán kỹ chi phí và hạ tầng.

👉 Nói cách khác, RAG là một mảnh ghép quan trọng trong hệ sinh thái AI, nhưng không phải “một mình cân tất cả”.

Exploring Claude Code Subagents: A Demo Setup for a RAG-Based Website Project

1. Introduction

Recently, Anthropic released an incredible new feature for its product Claude: subagents — secondary agents with specific tasks for different purposes within a user’s project.

2. Main Content

a. How to Set It Up:
First, install Claude using the following command in your Terminal window:

npm i @anthropic-ai/claude-code

If Claude is already installed but it’s an older version, it won’t have the subagent feature.

to update claude, command : claude update

Launch Claude Code in your working directory, then run the command:
/agents

Press Enter, and a management screen for agents will appear, allowing you to start creating agents with specific purposes for your project.

Here, I will set it up following Claude’s recommendation.

After the setup, I have the following subagents:

I will ask Claude to help me build a website using RAG with the following prompt:

The first subagents have started working.

The setup of the RAG project has been completed.

However, I noticed that the subagent ‘production-code-reviewer (Review RAG system code)’ didn’t function after the coding was completed. It might be an issue with my prompt, so I will ask Claude to review the code for me

After the whole working process, Claude Code will deliver an excellent final product.
Link: https://github.com/mhieupham1/claudecode-subagent

3. Conclusion

Through the entire setup process and practical use in a project, it’s clear how powerful and beneficial the Sub-agents feature introduced by Anthropic for Claude Code is. It enables us to have AI “teammates” with specialized skills and roles that operate independently without interfering with each other — allowing projects to be organized, easy to understand, and efficient.

Cursor 0.50 Just Dropped – Your AI-Powered Coding Assistant Just Got Smarter

💡 Cursor 0.50 Just Dropped – Your AI-Powered Coding Assistant Just Got Smarter

TL;DR: With the release of Cursor 0.50, developers get access to request-based billing, background AI agents, smarter multi-file edits, and deeper workspace integration. Cursor is fast becoming the most capable AI coding tool for serious developers.


🚀 What Is Cursor?

Cursor is an AI-native code editor built on top of VS Code, designed to let AI work with your code rather than next to it. With GPT-4 and Claude integrated deeply into its architecture, Cursor doesn’t just autocomplete — it edits, debugs, understands your full project, and runs background agents to help you move faster.


🔥 What’s New in Cursor 0.50?

💰 Request-Based Billing + Max Mode for All Models

Cursor now offers:

  • Transparent usage-based pricing — You only pay for requests you make.

  • Max Mode for all LLMs (GPT-4, Claude, etc.) — Access higher-quality reasoning per token.

This change empowers all users — from solo hackers to enterprise teams — to choose the right balance between cost and quality.


🤖 Background AI Agents (Yes, Parallel AI!)

One of the most powerful new features is background AI agents:

  • Agents run asynchronously and can take over tasks like bug fixing, PR writing, and large-scale refactoring.

  • You can now “send a task” to an agent, switch context, and return later — a huge leap in multitasking with AI.

Powered by the Multi-Context Project (MCP) framework, these agents can reference more of your codebase than ever before.


🧠 Tab Model v2: Smarter, Cross-File Edits

Cursor’s AI can now:

  • Suggest changes across multiple files — critical for large refactors.

  • Understand relationships between files (like components, hooks, or service layers).

  • Provide syntax-highlighted AI completions for better visual clarity.


🛠️ Redesigned Inline Edit Flow

Inline editing (Cmd/Ctrl+K) is now:

  • More intuitive, with options to edit the whole file (⌘⇧⏎) or delegate to an agent (⌘L).

  • Faster and scalable for large files (yes, even thousands of lines).

This bridges the gap between simple fixes and deep code transformations.


🗂️ Full-Project Context + Multi-Root Workspaces

Cursor now handles large, complex projects better than ever:

  • You can use @folders to add whole directories into the AI’s context.

  • Multi-root workspace support means Cursor can understand and work across multiple codebases — essential for microservices and monorepos.


🧪 Real Use Cases (from the Community)

According to GenerativeAI.pub’s deep dive, developers are already using Cursor 0.50 to:

  • Let background agents auto-refactor legacy modules.

  • Draft PRs from diffs in seconds.

  • Inject whole folders into the AI context for more accurate suggestions.

It’s not just about faster code — it’s about working smarter with an AI assistant that gets the big picture.


📌 Final Thoughts

With Cursor 0.50, the future of pair programming isn’t just someone typing next to you — it’s an agent that can read, think, and refactor your code while you focus on building features. Whether you’re a solo developer or a CTO managing a team, this update is a must-try.

👉 Try it now at cursor.sh or read the full changelog here.


🏷 Suggested Tags for SEO:

#AIProgramming, #CursorEditor, #GPT4Dev, #AIAgents, #CodeRefactoring, #DeveloperTools, #VSCodeAI, #Productivity, #GenerativeAI

Ask Questions about Your PDFs with Cohere Embeddings + Gemini LLM

🔍 Experimenting with Image Embedding Using Large AI Models

Recently, I experimented with embedding images using major AI models to build a multimodal semantic search system, where users can search images with text (and vice versa).

🧐 A Surprising Discovery

I was surprised to find that as of 2025, Cohere is the only provider that supports direct image embedding via API.
Other major models like OpenAI and Gemini (by Google) do support image input in general, but do not clearly provide a direct embedding API for images.


Reason for Choosing Cohere

I chose to try Cohere’s embed-v4.0 because:

  • It supports embedding text, images, and even PDF documents (converted to images) into the same vector space.

  • You can choose the embedding size (I used the default, 1536).

  • It returns normalized embeddings that are ready to use for search and classification tasks.


⚙️ How I Built the System

I used Python for implementation. The system has two main flows:

1️⃣ Document Preparation Flow

  • Load documents, images, or text data that I want to store.

  • Use the Cohere API to embed them into vector representations.

  • Save these vectors in a database or vector store for future search queries.

2️⃣ User Query Flow

  • When a user asks a question or types a query:

    • Use Cohere to embed the query into a vector.

    • Search for the most similar documents in the vector space.

    • Return results to the user using a LLM (Large Language Model) like Gemini by Google.


🔑 How to Get API Keys

🔧 Flow 1: Setting Up Cohere and Gemini in Python

✅ Step 1: Install and Set Up Cohere

Run the following command in your terminal to install the Cohere Python SDK:

pip install -q cohere

Then, initialize the Cohere client in your Python script:

import cohere

# Replace <<YOUR_COHERE_KEY>> with your actual Cohere API key
cohere_api_key = “<<YOUR_COHERE_KEY>>”
co = cohere.ClientV2(api_key=cohere_api_key)


✅ Step 2: Install and Set Up Gemini (Google Generative AI)

Install the Gemini client library with:

pip install -q google-genai

Then, initialize the Gemini client in your Python script:

from google import genai

# Replace <<YOUR_GEMINI_KEY>> with your actual Gemini API key
gemini_api_key = “<<YOUR_GEMINI_KEY>>”
client = genai.Client(api_key=gemini_api_key)

📌 Flow 1: Document Preparation and Embedding

Chúng ta sẽ thực hiện các bước để chuyển PDF thành dữ liệu embedding bằng Cohere.


📥 Step 1: Download the PDF

We start by downloading the PDF from a given URL.

python

def download_pdf_from_url(url, save_path=”downloaded.pdf”):
response = requests.get(url)
if response.status_code == 200:
with open(save_path, “wb”) as f:
f.write(response.content)
print(“PDF downloaded successfully.”)
return save_path
else:
raise Exception(f”PDF download failed. Error code: {response.status_code}”)

# Example usage
pdf_url = “https://sgp.fas.org/crs/misc/IF10244.pdf”
local_pdf_path = download_pdf_from_url(pdf_url)


🖼️ Step 2: Convert PDF Pages to Text + Image

We extract both text and image for each page using PyMuPDF.

python

import fitz # PyMuPDF
import base64
from PIL import Image
import io

def extract_page_data(pdf_path):
doc = fitz.open(pdf_path)
pages_data = []
img_paths = []

for i, page in enumerate(doc):
text = page.get_text()

pix = page.get_pixmap()
image = Image.open(io.BytesIO(pix.tobytes(“png”)))

buffered = io.BytesIO()
image.save(buffered, format=”PNG”)
encoded_img = base64.b64encode(buffered.getvalue()).decode(“utf-8”)
data_url = f”data:image/png;base64,{encoded_img}”

content = [
{“type”: “text”, “text”: text},
{“type”: “image_url”, “image_url”: {“url”: data_url}},
]

pages_data.append({“content”: content})
img_paths.append({“data_url”: data_url})

return pages_data, img_paths

# Example usage
pages, img_paths = extract_page_data(local_pdf_path)


📤 Step 3: Embed Using Cohere

Now, send the fused text + image inputs to Cohere’s embed-v4.0 model.

python

res = co.embed(
model=”embed-v4.0″,
inputs=pages, # fused inputs
input_type=”search_document”,
embedding_types=[“float”],
output_dimension=1024,
)

embeddings = res.embeddings.float_
print(f”Number of embedded pages: {len(embeddings)}”)


Flow 1 complete: You now have the embedded vector representations of your PDF pages.

👉 Proceed to Flow 2 (e.g., storing, indexing, or querying the embeddings).

🔍 Flow 2: Ask a Question and Retrieve the Answer Using Image + LLM

This flow allows the user to ask a natural language question, find the most relevant image using Cohere Embed v4, and then answer the question using Gemini 2.5 Vision LLM.


💬 Step 1: Ask the Question

We define the user query in plain English.

python
question = “What was the total number of wildfires in the United States from 2007 to 2015?”

🧠 Step 2: Convert the Question to Embedding & Find Relevant Image

We use embed-v4.0 with input type search_query, then calculate cosine similarity between the question embedding and previously embedded document images.

python

def search(question, max_img_size=800):
# Get embedding for the query
api_response = co.embed(
model=”embed-v4.0″,
input_type=”search_query”,
embedding_types=[“float”],
texts=[question],
output_dimension=1024,
)

query_emb = np.asarray(api_response.embeddings.float[0])

# Compute cosine similarity with all document embeddings
cos_sim_scores = np.dot(embeddings, query_emb)
top_idx = np.argmax(cos_sim_scores) # Most relevant image

hit_img_path = img_paths[top_idx]
base64url = hit_img_path[“data_url”]

print(“Question:”, question)
print(“Most relevant image:”, hit_img_path)

# Display the matched image
if base64url.startswith(“data:image”):
base64_str = base64url.split(“,”)[1]
else:
base64_str = base64url

image_data = base64.b64decode(base64_str)
image = Image.open(io.BytesIO(image_data))

image.thumbnail((max_img_size, max_img_size))
display(image)

return base64url


🤖 Step 3: Use Vision-LLM (Gemini 2.5) to Answer

We use Gemini 2.5 Flash to answer the question based on the most relevant image.

python

def answer(question, base64_img_str):
if base64_img_str.startswith(“data:image”):
base64_img_str = base64_img_str.split(“,”)[1]

image_bytes = base64.b64decode(base64_img_str)
image = Image.open(io.BytesIO(image_bytes))

prompt = [
f”””Answer the question based on the following image.
Don’t use markdown.
Please provide enough context for your answer.

Question: {question}”””,
image
]

response = client.models.generate_content(
model=”gemini-2.5-flash-preview-04-17″,
contents=prompt
)

answer = response.text
print(“LLM Answer:”, answer)


▶️ Step 4: Run the Full Flow

python
top_image_path = search(question)
answer(question, top_image_path)

🧪 Example Usage:

question = “What was the total number of wildfires in the United States from 2007 to 2015?

# Step 1: Find the best-matching image
top_image_path = search(question)

# Step 2: Use the image to answer the question
answer(question, top_image_path)

🧾 Output:

Question: What was the total number of wildfires in the United States from 2007 to 2015?

Most relevant image:

 

LLM Answer: Based on the provided image, to find the total number of wildfires in the United States from 2007 to 2015, we need to sum the number of wildfires for each year in this period. Figure 1 shows the annual number of fires in thousands from 1993 to 2022, which covers the requested period. Figure 2 provides the specific number of fires for 2007 and 2015 among other years. Using the specific values from Figure 2 for 2007 and 2015, and estimating the number of fires for the years from 2008 to 2014 from Figure 1, we can calculate the total.

 

The number of wildfires in 2007 was 67.8 thousand (from Figure 2).

Estimating from Figure 1:

2008 was approximately 75 thousand fires.

2009 was approximately 75 thousand fires.

2010 was approximately 67 thousand fires.

2011 was approximately 74 thousand fires.

2012 was approximately 68 thousand fires.

2013 was approximately 47 thousand fires.

2014 was approximately 64 thousand fires.

The number of wildfires in 2015 was 68.2 thousand (from Figure 2).

 

Summing these values:

Total = 67.8 + 75 + 75 + 67 + 74 + 68 + 47 + 64 + 68.2 = 606 thousand fires.

 

Therefore, the total number of wildfires in the United States from 2007 to 2015 was approximately 606,000. This number is based on the sum of the annual number of fires obtained from Figure 2 for 2007 and 2015, and estimates from Figure 1 for the years 2008 through 2014.

Try this full pipeline on Google Colab: https://colab.research.google.com/drive/1kdIO-Xi0MnB1c8JrtF26Do3T54dij8Sf

🧩 Final Thoughts

This simple yet powerful two-step pipeline demonstrates how you can combine Cohere’s Embed v4 with Gemini’s Vision-Language capabilities to build a system that understands both text and images. By embedding documents (including large images) and using semantic similarity to retrieve relevant content, we can create a more intuitive, multimodal question-answering experience.

This approach is especially useful in scenarios where information is stored in visual formats like financial reports, dashboards, or charts — allowing LLMs to not just “see” the image but reason over it in context.

Multimodal retrieval-augmented generation (RAG) is no longer just theoretical — it’s practical, fast, and deployable today.

CoRAG: Revolutionizing RAG Systems with Intelligent Retrieval Chains

Large Language Models (LLMs) have demonstrated powerful content generation capabilities, but they often struggle with accessing the latest information, leading to hallucinations. Retrieval-Augmented Generation (RAG) addresses this issue by using external data sources, enabling models to provide more accurate and context-aware responses.

Key Advantages of RAG:

  • Improves factual accuracy by retrieving up-to-date information.
  • Enhances context comprehension by incorporating external data sources.
  • Reduces reliance on pre-trained memorization, allowing more flexible responses.

However, conventional RAG models have limitations that affect their effectiveness in complex reasoning tasks. Despite its advantages, standard RAG has notable drawbacks:

  1. Single Retrieval Step: Traditional RAG retrieves information only once before generating a response. If the retrieval is incorrect or incomplete, the model cannot refine its search.
  2. Limited Context Understanding: Since retrieval is static, it fails in multi-hop reasoning tasks that require step-by-step information gathering.
  3. Susceptibility to Hallucinations: If relevant information is not retrieved, the model may generate inaccurate or misleading responses.
  4. Inefficiency in Long Queries: For complex queries requiring multiple reasoning steps, a single retrieval step is often insufficient, leading to incomplete or incorrect answers.

CORAG (Chain-of-Retrieval Augmented Generation) is proposed to address these issues by leveraging the Monte Carlo Tree Search (MCTS) algorithm to optimize the information retrieval process.

CoRAG Solution

CoRAG is an enhanced version of RAG that introduces iterative retrieval and reasoning. Instead of retrieving information once, CoRAG performs multiple retrieval steps, dynamically reformulating queries based on evolving context.

How CoRAG Solves RAG’s Limitations

  • Step-by-step retrieval: Instead of relying on a single search, CoRAG retrieves information iteratively, refining the query at each step.
  • Query Reformulation: The system learns to modify its search queries based on previously retrieved results, enhancing accuracy.
  • Adaptive Reasoning: CoRAG dynamically determines the number of retrieval steps needed, ensuring more complete responses.
  • Better Performance in Multi-hop Tasks: CoRAG significantly outperforms RAG in tasks requiring multiple steps of logical reasoning.

CoRAG operates by employing a retrieval chain mechanism, where each retrieval step is informed by the results of previous steps. This allows the system to refine queries dynamically instead of relying on a single retrieval attempt as in traditional RAG. One of the most crucial aspects of CoRAG is query reformulation, which adjusts search queries in real time to retrieve the most relevant information. Thanks to this iterative approach, CoRAG significantly enhances its ability to handle complex, multi-hop reasoning tasks, leading to improved accuracy and reduced misinformation.

Training CoRAG involves the use of rejection sampling to generate intermediate retrieval chains, allowing the model to learn how to optimize search and filter information more effectively. Instead of only predicting the final answer, CoRAG is trained to retrieve information step by step, refining queries based on newly gathered knowledge. This method strengthens the model’s reasoning ability and improves performance on knowledge-intensive tasks.

Fine-tuning the model on optimized datasets is another crucial aspect of CoRAG training. Performance evaluation is conducted using metrics such as Exact Match (EM) score and F1-score, which assess the accuracy and comprehensiveness of responses compared to traditional RAG models.

Overview of CoRAG

Overview of CoRAG(Source: https://arxiv.org/html/2501.14342v1)

A key feature of CoRAG is its decoding strategies, which influence how the model retrieves and processes information. These strategies include:

  • Greedy Decoding: Selecting the most relevant information at each step without exploring alternative options.
  • Best-of-N Sampling: Running multiple retrieval attempts and choosing the most optimal result.
  • Tree Search: Using a structured search approach to explore different reasoning paths and enhance inference quality.

With its enhanced retrieval and reasoning mechanisms, CoRAG represents a major advancement in AI, enabling models to retrieve and synthesize information more effectively.

Comparison Between CoRAG and Traditional RAG

The following table provides a concise comparison between Traditional RAG and CoRAG. While Traditional RAG is more efficient in terms of computational cost, CoRAG excels in accuracy and adaptability for complex tasks. The iterative retrieval process in CoRAG ensures more precise results, making it suitable for specialized applications requiring deep contextual understanding.

Feature Traditional RAG CoRAG
Retrieval Strategy Single-step retrieval Iterative retrieval
Query Reformulation Fixed query Dynamic query adjustment
Multi-Hop Reasoning Limited Strong
Handling Hallucinations Prone to errors Reduces errors
Computational Cost Lower Higher
Adaptability Good for simple queries Ideal for complex domain

Key Differences Between CoRAG and Traditional RAG

  1. Retrieval Strategy
    • Traditional RAG: Performs a single retrieval step, fetching relevant documents once before generating a response. This limits its ability to refine searches based on partial information. Example:
      • Query: “Who wrote book X, and when was it published ?”
      • Traditional RAG: Fails if author and publication year are in separate chunks.
    • CoRAG: Utilizes an iterative retrieval process where multiple search steps refine the query dynamically, leading to more accurate and contextually appropriate responses. Example:
      • Query: “How many months apart are Johan Mjallby and Neil Lennon in age?”
      • CoRAG:
        1. Retrieve Johan Mjallby’s birth date.
        2. Retrieve Neil Lennon’s birth date.
        3. Calculate the time difference.
  1. Query Reformulation
    • Traditional RAG: Uses a fixed query that remains unchanged throughout the retrieval process.
    • CoRAG: Continuously modifies queries based on retrieved results, improving the relevance of later search steps.
  2. Multi-Hop Reasoning
    1. Traditional RAG: Struggles with tasks requiring multiple steps of reasoning, as it retrieves all information at once.
    • CoRAG: Adapts to multi-hop queries, progressively retrieving and synthesizing information step by step.
  3. Handling Hallucinations
    • Traditional RAG: More prone to hallucinations due to incomplete or inaccurate retrieval.
    • CoRAG: Reduces hallucinations by iteratively validating retrieved knowledge before generating responses.

Performance Comparison

Experiments on WikiPassageQA and MARCO datasets show that CORAG improves accuracy by up to 30% over traditional RAG methods. The system achieves higher ROUGE scores than baselines like RAPTOR and NaiveRAG while optimizing retrieval costs.

Efficiency Comparison

Efficiency Comparison (Source: https://arxiv.org/html/2411.00744v1)

Additionally, CORAG demonstrates excellent scalability, with retrieval time increasing by only 10% even when input data volume grows significantly.

  1. Accuracy and Relevance
    • Benchmark Results: Studies show that CoRAG achieves higher accuracy scores in question-answering tasks, outperforming RAG on datasets requiring multi-step reasoning.
    • Real-World Application: AI chatbots and research assistants using CoRAG provide more contextually aware and reliable answers compared to those using traditional RAG.
  2. Computational Cost
    • Traditional RAG: Less computationally expensive as it performs only a single retrieval step.
    • CoRAG: Higher computational demands due to iterative retrieval but offers significantly improved response quality.
  3. Adaptability to Different Domains
    • Traditional RAG: Works well for simple fact-based queries but struggles with domain-specific knowledge that requires iterative retrieval.
    • CoRAG: Excels in complex domains such as legal, medical, and academic research where deep contextual understanding is necessary.

When to Use CoRAG vs. Traditional RAG?

Choosing between CoRAG and traditional RAG depends on the nature of the tasks at hand. Each method has its own advantages and is suited for different use cases.

  • Best Use Cases for Traditional RAG
    • Simple question-answering tasks where a single retrieval suffices.
    • Use cases with strict computational constraints where efficiency is prioritized over deep reasoning.
    • Applications requiring quick but approximate answers, such as customer support chatbots handling FAQ-based interactions.
  • Best Use Cases for CoRAG
    • Complex queries requiring multi-hop reasoning and deep contextual understanding.
    • Research and academic applications where iterative refinement improves information accuracy.
    • AI-driven assistants handling specialized tasks such as legal document analysis and medical diagnosis support.

Conclusion

CoRAG (Chain-of-Retrieval Augmented Generation) represents a significant advancement in AI-driven knowledge retrieval and synthesis. By integrating vector search, contrastive ranking, and decision tree modeling, CoRAG enhances the accuracy, relevance, and structure of information provided to large language models. This systematic approach not only reduces hallucinations but also optimizes AI-generated responses, making it a powerful tool for applications requiring high-quality knowledge retrieval.

With its intelligent ability to retrieve, rank, and organize information, CoRAG opens new possibilities in enterprise search, research assistance, and AI-driven decision-making. As AI continues to evolve, systems like CoRAG will play a crucial role in bridging raw data with actionable knowledge, fostering more intelligent and reliable AI applications.