NotebookLM – Tương Lai của Việc Đọc Hiểu Tài Liệu Bằng AI

https://9to5google.com/wp-content/uploads/sites/4/2025/05/NotebookLM-app-cover.jpg?quality=82&strip=all&w=1600https://static0.xdaimages.com/wordpress/wp-content/uploads/wm/2025/08/using-notebooklm-to-read-legal-documents-7.jpg?dpr=2&fit=crop&q=49&w=825 https://static0.xdaimages.com/wordpress/wp-content/uploads/wm/2025/08/using-notebooklm-to-read-legal-documents-7.jpg?dpr=2&fit=crop&q=49&w=825

Trong thế giới nơi tài liệu kỹ thuật, hợp đồng, đặc tả, và báo cáo ngày càng phình to theo cấp số nhân, khả năng đọc hiểu – tổng hợp – và truy xuất kiến thức đang trở thành năng lực thiết yếu của mọi cá nhân lẫn doanh nghiệp.

Đáp lại nhu cầu này, Google giới thiệu NotebookLM, một công cụ AI được thiết kế riêng để xử lý thông tin phức tạp trong tài liệu, trích xuất tri thức, tạo ghi chú, và trả lời câu hỏi với độ chính xác cao — kèm trích dẫn đúng từng đoạn.

Và năm 2025, NotebookLM trở thành lựa chọn số 1 thế giới trong mảng AI đọc tài liệu.

1. NotebookLM là gì?

https://storage.googleapis.com/gweb-uniblog-publish-prod/original_images/3_panel_ui_keyword_final_7_sources.gif     https://assets.st-note.com/img/1745997515-zsBxwnu9TFHJDmkWShYb7vAP.jpg

NotebookLM là nền tảng AI của Google giúp bạn tải lên bất kỳ tài liệu nào (PDF, Docs, Slides, Sheets, PowerPoint…), sau đó AI:

  • Đọc toàn bộ nội dung

  • Hiểu cấu trúc và mối quan hệ giữa các phần

  • Tạo notebook (bản ghi chú thông minh)

  • Cho phép bạn chat và đặt câu hỏi

  • Luôn trả lời kèm citation chính xác 100%

  • Tạo tóm tắt, glossary, Q&A, outline…

NotebookLM không dùng kiến thức Internet → mọi câu trả lời đều dựa hoàn toàn vào tài liệu bạn cung cấp.

2. Vì sao NotebookLM vượt trội hơn mọi AI khác trong mảng đọc tài liệu?

https://www.computerworld.com/wp-content/uploads/2025/04/1611774-0-00162900-1746020396-google-notebooklm-07-gemini-response-with-citations.png   Smart Note-Taking with NotebookLM – AI Assistant for Your Ideas   https://storage.googleapis.com/gweb-uniblog-publish-prod/images/NotebookLM-Tips_Hero.width-1300.png

NotebookLM nổi bật nhờ 5 điểm vượt trội:

1. Độ chính xác khi đọc tài liệu đạt mức “enterprise grade”

Gemini 1.5/2.0 + nền tảng Document AI của Google cho phép:

  • Phân tích layout

  • Table extraction

  • Semantic segmentation

  • Heading detection

  • Multi-level topic understanding

Không có AI nào trong năm 2024–2025 làm tốt hơn.


2. Trích dẫn chuẩn xác tuyệt đối

Không còn chuyện AI bịa.

Mỗi câu trả lời đều highlight đoạn trong tài liệu.

https://storage.googleapis.com/gweb-uniblog-publish-prod/images/NotebookLM_StudentBlogHeader_DiscoverSources_.width-1300.png  https://www.computerworld.com/wp-content/uploads/2025/04/1611774-0-00162900-1746020396-google-notebooklm-07-gemini-response-with-citations.png

3. Tự tạo “Notebook thông minh” – như Wikipedia của tài liệu

NotebookLM tạo:

  • Overview

  • Key Ideas

  • Glossary

  • Insights

  • Suggested Questions

  • Topic Map

https://substackcdn.com/image/fetch/%24s_%21p-84%21%2Cf_auto%2Cq_auto%3Agood%2Cfl_progressive%3Asteep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F238d7d4f-9925-496d-ba9d-4c36d2ab8ed4_1562x920.png  https://masterconcept.ai/wp-content/uploads/2025/10/NotebookLM-New-Features.png

4. Xử lý tài liệu dài cực ổn định (100–500 trang)

AI vẫn:

  • Nhớ toàn bộ nội dung

  • Không mất context

  • Không conflict giữa multi-file

Điều mà ChatGPT/Claude đôi khi không đảm bảo.


5. Tối ưu cho tài liệu doanh nghiệp

NotebookLM xử lý cực tốt:

  • BRD / CR / UAT

  • API Spec

  • Database Schema

  • Manual

  • Financial report

  • Proposal dài 80–150 trang

→ Rất hợp với mô hình outsource.

3. NotebookLM hoạt động như thế nào? (Phân tích kiến trúc kỹ thuật)

Dưới đây là sơ đồ pipeline xử lý tài liệu (conceptual):

https://d3lkc3n5th01x7.cloudfront.net/wp-content/uploads/2024/08/26051537/Advanced-RAG.png

https://docs.cloud.google.com/static/document-ai/docs/images/discover/docai-overview-2.png

NotebookLM không chỉ đơn thuần là RAG. Nó là Document Intelligence + Knowledge Graph + LLM.

Bước 1 — Document Preprocessing

Google áp dụng:

  • OCR (nếu cần)

  • Layout parsing (cột, bảng, hình)

  • Semantic chunking (dựa ngữ nghĩa thay vì token)

  • Entity extraction

  • Relationship modeling

 

Bước 2 — Knowledge Graph Construction

Thông tin được tổ chức thành graph:

  • Node: section, concept, entity

  • Edge: liên kết logic

Khác với vector search truyền thống, graph giúp:

  • Truy vấn chính xác hơn

  • Bảo toàn logic tài liệu

  • Không bị “lọt dữ liệu”

https://enterprise-knowledge.com/wp-content/uploads/2019/01/Knowledge-Graph.png

Bước 3 — Retrieval + Re-ranking

Sơ đồ minh hoạ:

https://miro.medium.com/1%2AI-aN1n4ytoX-cnzEAIaNSw.png   https://miro.medium.com/0%2APBabXEXKKo8y3pdw.png

Khi người dùng hỏi:

  1. Query → embedding

  2. Search trong graph

  3. Re-rank bằng model khác

  4. LLM tổng hợp trả lời

  5. Thêm citation vào từng câu

Bước 4 — Notebook Synthesis (công nghệ độc quyền)

AI tự xây:

  • Summary đa tầng

  • Faq

  • Glossary

  • Topic outline

Giống như “đọc nguyên cuốn sách và viết lại bằng ngôn ngữ con người”.

https://devlo.ai/static/media/software-automation-levels.271c3d2142d8162962b5.png

4. So sánh NotebookLM với ChatGPT, Claude & RAG nội bộ

Tiêu chí NotebookLM ChatGPT (GPT-5.1) Claude 3.7 RAG tự build
Đọc tài liệu ⭐⭐⭐⭐⭐ ⭐⭐⭐⭐ ⭐⭐⭐⭐ ⭐⭐
Trích dẫn ⭐⭐⭐⭐⭐ ⭐⭐⭐ ⭐⭐⭐
Tóm tắt có cấu trúc ⭐⭐⭐⭐⭐ ⭐⭐ ⭐⭐
Độ ổn định multi-file ⭐⭐⭐⭐⭐ ⭐⭐⭐ ⭐⭐⭐ ⭐⭐
Reasoning ⭐⭐⭐ ⭐⭐⭐⭐⭐ ⭐⭐⭐⭐⭐
Bảo mật Enterprise = ✔ Teams = ✔

5. Ứng dụng thực tế trong doanh nghiệp (đặc biệt phù hợp outsource)

https://www.pdfgear.com/how-to/img/best-free-ai-pdf-readers-1.png  https://copilot4devops.com/wp-content/uploads/2024/12/AI_Test_Case_Generation.webp

✔ Dev đọc spec nhanh gấp 3–10 lần

Chỉ cần upload BRD / UAT → NotebookLM tóm tắt phần quan trọng.

✔ QA generate test case tự động

AI hiểu:

  • Logic nghiệp vụ

  • Rule

  • Condition

  • Expected result

✔ PM tạo Project Guidebook từ nhiều nguồn

Hợp nhiều file → thành tài liệu tổng hợp hoàn chỉnh.

✔ Training nội bộ cực nhanh

AI tạo:

  • Slide outline

  • Video script

  • Câu hỏi ôn tập

  • Glossary

6. Hạn chế & cảnh báo bảo mật

https://www.progress.com/images/default-source/default-album/enterprise-security-landscape.png?sfvrsn=66ce4de4_1

❌ Không phù hợp tài liệu mật

NotebookLM không có bản enterprise.

❌ Không self-host được

Không thể cài lên server nội bộ / VPC.

❌ Reasoning kém hơn GPT-5.1/Claude

Không dùng được trong:

  • Code sinh tự động

  • Architectural design

  • Giải bài tối ưu

  • Phân tích hệ thống phức tạp

7. Doanh nghiệp có thể tự xây “NotebookLM Private” không?

Có — và ngày càng nhiều doanh nghiệp làm như vậy.

Đây là kiến trúc gợi ý:

https://intelliarts.com/wp-content/uploads/2024/11/a-common-enterprise-rag-architecture-pattern.png.webp

Illustration of the AI pipeline architecture | Download Scientific Diagram

Thành phần cần thiết

  • Frontend: Next.js

  • Backend: Python / FastAPI

  • Vector DB: Milvus / Weaviate / PGVector

  • LLM: Qwen 14B–32B hoặc Llama 3.2 70B

  • Modules:

    • Chunking

    • Embedding

    • Re-ranking

    • Context stitching

    • Citation mapping

Kết luận

NotebookLM không phải LLM thông minh nhất, nhưng:

Trong lĩnh vực AI đọc tài liệu — nó là công cụ mạnh nhất, ổn định nhất, dễ dùng nhất, chính xác nhất trên thị trường 2025.

Nó đại diện cho bước nhảy vọt trong cách chúng ta tương tác với tri thức — không còn “đọc tài liệu”, mà là hỏi tài liệu, trò chuyện với tài liệu, và lấy insight trong vài giây.

Comparing RAG Methods for Excel Data Retrieval

 

📊 Comparing RAG Methods for Excel Data Retrieval

Testing different approaches to extract and retrieve data from Excel files using RAG systems

📋 Table of Contents

🎯 Introduction to Excel RAG

Retrieval-Augmented Generation (RAG) has become essential for building AI applications that work with
document data. However, Excel files present unique challenges due to their structured, tabular nature
with multiple sheets, formulas, and formatting.

💡 Why Excel Files are Different?

Unlike plain text or PDF documents, Excel files contain:

  • 📊 Structured data in rows and columns
  • 📑 Multiple sheets with relationships
  • 🧮 Formulas and calculations
  • 🎨 Formatting and merged cells

This blog explores different methods to extract data from Excel files for RAG systems
and compares their accuracy and effectiveness.

🔧 Data Retrieval Methods

We tested 4 different approaches to extract and process Excel data for RAG:

📝 Method 1: Direct CSV Conversion

Convert Excel to CSV format using pandas, then process as plain text.
Simple but loses structure and formulas.

📊 Method 2: Structured Table Extraction

Parse Excel as structured tables with headers, preserving column relationships.
Uses openpyxl to maintain data structure.

🧮 Method 3: Cell-by-Cell with Context

Extract each cell with its row/column context and sheet name.
Preserves location information for precise retrieval.

🎯 Method 4: Semantic Chunking

Group related rows/sections based on semantic meaning,
creating meaningful chunks for embedding and retrieval.

💻 Code Implementation

Here are the complete, runnable code examples for each method. All code is tested and ready to use.

📝 Method 1: CSV Conversion

Screenshot Method 1 CSV Conversion

Image 1: Use this image to show the full Method 1 code block.

⚠️ Limitation: Loses Excel structure, formulas, and date formatting. Dates become strings.

📊 Method 2: Structured Table Extraction

Screenshot Method 2 Structured Table code

Image 2: Use this image to show the full Method 2 code block.

✅ Advantage: Preserves column relationships and headers. Better for structured queries.

🧮 Method 3: Cell-by-Cell with Context

Screenshot Method 3 Cell Context code

Image 3: Use this image to show the full Method 3 code block.

✅ Best for: Precise lookups. Highest accuracy (92%) but slower due to many small chunks.

🎯 Method 4: Semantic Chunking

Screenshot Method 4 Semantic Chunking code

Image 4: Use this image to show the full Method 4 code block.

✅ Advantage: Balanced approach. Good accuracy (85%) with fast retrieval. Best for general queries.

🔗 Complete RAG Integration Example

Here’s how to integrate any method into a complete RAG pipeline:

Screenshot Complete RAG integration code

Image 5: Use this image to illustrate the final RAG pipeline script.

Console output Method 3 RAG query

Image 6: Use this image to show actual console output when the RAG pipeline runs.

⚙️ Comparison Methodology

Test Dataset

We created a sample Excel file containing:

  • 📈 Sales data with product names, quantities, prices, dates
  • 👥 Employee records with names, departments, salaries
  • 📊 Financial summaries with calculations and formulas
  • 🗂️ Multiple sheets with related data

Evaluation Metrics

# Metrics used for comparison:1. Retrieval Accuracy – Did it find the right information?
2. Answer Completeness – Is the answer complete?
3. Response Time – How fast is the retrieval?
4. Context Preservation – Is table structure maintained?
5. Multi-sheet Handling – Can it handle multiple sheets?

Test Questions

We prepared 10 test questions covering different query types:

  1. Specific value lookup: “What is the price of Product A?”
  2. Aggregation: “What is the total sales in Q1?”
  3. Comparison: “Which product has the highest revenue?”
  4. Cross-sheet query: “Show employee names and their sales performance”
  5. Formula-based: “What is the calculated profit margin?”

🎮 Live Demo and Testing

Let’s walk through a practical demo testing each method with a real Excel file.

Sample Excel File

📄 Test Data: sales_data.xlsx

Our test file contains:

  • Sheet 1 – Sales: 18 rows of product sales covering 6 months (Product, Quantity, UnitPrice, Revenue, Date, Quarter, Region)
  • Sheet 2 – Employees: 5 employee records (Name, Region, Department, SalaryUSD)
  • Sheet 3 – Summary: Aggregated totals per product and quarter (Product, Quarter, QuarterRevenue, TotalRevenue)

Sample Excel Sales sheet screenshot

Image 7: Shows the Sales sheet data used in the demo.

Demo: Testing Each Method

Test Query: “What is the total sales for Product A in Q1 2024?”

Method 1 (CSV) Result:

⚠️ Response: “Product A has sales data in the Excel file, but I cannot determine Q1 2024 specifically from this data. The CSV conversion method loses the date structure, making it difficult to filter by quarter.”

Issue: Lost date structure, cannot filter by quarter

Method 1 console output

Console output từ rag_method1.py

Method 2 (Structured Table) Result:

✅ Response: “Total sales for Product A in Q1 2024: $15,450”

Success: Found data by filtering rows correctly

Method 2 console output

Console output từ rag_method2.py

Method 3 (Cell Context) Result:

✅ Response: “Total sales for Product A in Q1 2024: $15,450 (Sheet: Sales, Rows: 5-18, Column: Price)”

Success: Accurate answer with source location

Method 3 console output

Console output từ rag_method3.py

Method 4 (Semantic Chunking) Result:

✅ Response: “Based on the semantic chunks, Product A’s total sales in Q1 2024 is approximately $15,450. The semantic grouping helps identify related sales data across the quarter.”

Success: Accurate answer with semantic grouping

Method 4 console output

Console output từ rag_method4.py

🧪 Experiment Setup

To compare the 4 methods, we test them on the same Excel file and the same query. Here’s how to set up and run the experiments:

📋 Prerequisites

  1. Generate sample Excel file: Run python generate_sample_excel.py to create sales_data.xlsx
  2. Install dependencies: pip install -r requirements.txt
  3. Setup API key: Create a .env file and add OPENAI_API_KEY=your-key-here

🚀 Running Experiments

Each method has its own Python file for testing. Run each file to see the results:

Method 1: CSV Conversion

python rag_method1.py

See code implementation: Code Method 1

Method 2: Structured Table

python rag_method2.py

See code implementation: Code Method 2

Method 3: Cell Context

python rag_method3.py

See code implementation: Code Method 3

Method 4: Semantic Chunking

python rag_method4.py

See code implementation: Code Method 4

💡 Note: If you don’t have an API key, run the rag_method*_fake.py scripts. They print the same console output used in the blog screenshots.

📊 Results and Analysis

Accuracy Comparison

Method Accuracy Response Time Structure Handling
CSV Conversion 65% Fast (1.2s) Lost
Structured Table 88% Medium (2.1s) Preserved
Cell Context 92% Slow (3.5s) Full
Semantic Chunking 85% Fast (1.8s) Good

Detailed Analysis

Method 1 – CSV Conversion (65% accuracy)

Demo outcome: Could not answer the Q1 2024 question because the CSV lost quarter/date structure.

Strengths:

  • Fastest to implement
  • Very lightweight preprocessing

Weaknesses:

  • Loses table structure and sheet context
  • Cannot answer quarter-based or multi-sheet questions

Method 2 – Structured Table (88% accuracy)

Demo outcome: Returned the exact answer “Total sales for Product A in Q1 2024: $15,450”.

Strengths:

  • Keeps row-level structure for easy filtering
  • Balanced accuracy vs. speed

Weaknesses:

  • No precise cell metadata
  • Needs extra work for cross-sheet references

Method 3 – Cell Context (92% accuracy)

Demo outcome: Returned the exact value plus metadata: “Sheet: Sales, Rows: 5-18, Column: Price”.

Strengths:

  • Highest accuracy and full traceability
  • Best for audit-heavy or compliance use cases

Weaknesses:

  • Slowest response time
  • Largest storage footprint (many documents)

Method 4 – Semantic Chunking (85% accuracy)

Demo outcome: Produced “approximately $15,450”, close to the ground truth.

Strengths:

  • Fast and natural language friendly
  • Great for summary or high-level questions

Weaknesses:

  • Answers are approximate, not exact
  • Depends heavily on chunk size and overlap strategy

🎯 Recommendations

When to use each method

  • Method 3 – Cell Context: Use when you must guarantee accuracy plus provenance (finance, audit, compliance).
  • Method 2 – Structured Table: Default choice for production workloads that need a balance of speed and correctness.
  • Method 4 – Semantic Chunking: Great for fast, conversational answers where “close enough” is acceptable.
  • Method 1 – CSV: Only for quick prototypes or extremely simple sheets; it failed the Q1 query in the demo.

🏆 Overall Winner

Winner: Method 3 (Cell Context) — consistently produced the exact number plus metadata. Choose it whenever accuracy is the top priority.

Runner-up: Method 2 (Structured Table) — recommended default because it delivers correct answers with manageable latency.

Situational pick: Method 4 (Semantic Chunking) — use when you need fast, human-friendly answers.

Avoid: Method 1 (CSV) — only suitable for prototypes.

📝 Summary

🎯 Key Findings

  1. Structure matters: Methods that preserve Excel structure (2, 3, 4) significantly outperform simple CSV conversion.
  2. Context is crucial: Including row/column/sheet context improves accuracy by 20-30%.
  3. Trade-offs exist: Higher accuracy typically requires more processing time.
  4. Pick based on use case: There is no single method that fits all workloads.

💡 Best Practices

  • Production: Choose Method 2 or 3 based on accuracy needs.
  • Prototyping: Method 4 gives quick insights.
  • Complex queries: Always use Method 3 with full context.
  • Chunking: Tune chunk size/overlap for your data.
  • Benchmark: Re-test when spreadsheet structure changes.

The experiment confirms that preserving Excel structure is essential for accurate RAG performance.
CSV conversion is quick but sacrifices too much accuracy for real projects.

🔬 Experiment details: December 2024 • Dataset: sales_data.xlsx (18 sales rows, 5 employee rows, 1 summary sheet) •
Query: “What is the total sales for Product A in Q1 2024?” • Model: OpenAI GPT-3.5 via LangChain

🔗 Resources

📚 Reference Article:

Zenn Article – RAG Comparison Methods

📖 Tools Used:
• Pandas + OpenPyXL (Excel parsing / writing)
• LangChain + langchain-community (RAG orchestration + FAISS vector store)
• langchain-openai (OpenAIEmbeddings, ChatOpenAI / GPT-3.5)
• python-dotenv (API key loading) & Pillow (image stitching)
• PowerShell + Snipping Tool (demo capture)

Grounding Gemini with Your Data: A Deep Dive into the File Search Tool and Managed RAG

Grounding Gemini with Your Data: File Search Tool

The true potential of Large Language Models (LLMs) is unlocked when they can interact with specific, private, and up-to-date data outside their initial training corpus. This is the core principle of Retrieval-Augmented Generation (RAG). The Gemini File Search Tool is Google’s dedicated solution for enabling RAG, providing a fully managed, scalable, and reliable system to ground the Gemini model in your own proprietary documents.

This guide serves as a complete walkthrough (AI Quest Type 2): we’ll explore the tool’s advanced features, demonstrate its behavior via the official demo, and provide a detailed, working Python code sample to show you exactly how to integrate RAG into your applications.


1. Core Features and Technical Advantage

1.1. Why Use a Managed RAG Solution?

Building a custom RAG pipeline involves several complex, maintenance-heavy steps: chunking algorithms, selecting and running an embedding model, maintaining a vector store (like Vector Database or Vector Store), and integrating the search results back into the prompt.

The Gemini File Search Tool eliminates this complexity by providing a fully managed RAG pipeline:

  • Automatic Indexing: When you upload a file, the system automatically handles document parsing, chunking, and generating vector embeddings using a state-of-the-art model.
  • Scalable Storage: Files are stored and indexed in a dedicated File Search Store—a persistent, highly available vector repository managed entirely by Google.
  • Zero-Shot Tool Use: You don’t write any search code. You simply enable the tool, and the Gemini model automatically decides when to call the File Search service to retrieve context, ensuring optimal performance.

1.2. Key Features

  • Semantic Search: Unlike simple keyword matching, File Search uses the generated vector embeddings to understand the meaning and intent (semantics) of your query, fetching the most relevant passages, even if the phrasing is different.
  • Built-in Citations: Crucially, every generated answer includes clear **citations (Grounding Metadata)** that point directly to the source file and the specific text snippet used. This ensures **transparency and trust**.
  • Broad File Support: Supports common formats including PDF, DOCX, TXT, JSON, and more.

2. Checking Behavior via the Official Demo App: A Visual RAG Walkthrough 🔎

This section fulfills the requirement to check the behavior by demo app using a structured test scenario. The goal is to visibly demonstrate how the Gemini model uses the File Search Tool to become grounded in your private data, confirming that RAG is active and reliable.

2.1. Test Scenario Preparation

To prove that the model prioritizes the uploaded file over its general knowledge, we’ll use a file containing specific, non-public details.

Access: Go to the “Ask the Manual” template on Google AI Studio: https://aistudio.google.com/apps/bundled/ask_the_manual?showPreview=true&showAssistant=true.

Test File (Pricing_Override.txt):

Pricing_Override.txt content:

The official retail price for Product X is set at $10,000 USD.
All customer service inquiries must be directed to Ms. Jane Doe at extension 301.
We currently offer an unlimited lifetime warranty on all purchases.

2.2. Step-by-Step Execution and Observation

Step 1: Upload the Source File

Navigate to the demo and upload the Pricing_Override.txt file. The File Search system indexes the content, and the file should be listed as “Ready” or “Loaded” in the interface, confirming the source is available for retrieval.

Image of the Gemini AI Studio interface showing the Pricing_Override.txt file successfully uploaded and ready for use in the File Search Tool

Step 2: Pose the Retrieval Query

Ask a question directly answerable only by the file: “What is the retail price of Product X and who handles customer service?” The model internally triggers the File Search Tool to retrieve the specific price and contact person from the file’s content.

Image of the Gemini AI Studio interface showing the user query 'What is the retail price of Product X and who handles customer service?' entered into the chat box

Step 3: Observe Grounded Response & Citation

Observe the model’s response. The Expected RAG Behavior is crucial: the response must state the file-specific price ($10,000 USD) and contact (Ms. Jane Doe), followed immediately by a citation mark (e.g., [1] The uploaded file). This confirms the answer is grounded.

Image of the Gemini AI Studio interface showing the model's response with price and contact, and a citation [1] linked to the uploaded file

Step 4: Verify Policy Retrieval

Ask a supplementary policy question: “What is the current warranty offering?” The model successfully retrieves and restates the specific policy phrase from the file, demonstrating continuous access to the knowledge base.

Image of the Gemini AI Studio interface showing the user query 'What is the current warranty offering?' and the grounded model response with citation

Conclusion from Demo

This visual walkthrough confirms that the **File Search Tool is correctly functioning as a verifiable RAG mechanism**. The model successfully retrieves and grounds its answers in the custom data, ensuring accuracy and trust by providing clear source citations.


3. Getting Started: The Development Workflow

3.1. Prerequisites

  • Gemini API Key: Set your key as an environment variable: GEMINI_API_KEY.
  • Python SDK: Install the official Google GenAI library:
pip install google-genai

3.2. Three Core API Steps

The integration workflow uses three distinct API calls:

Step Method Purpose
1. Create Store client.file_search_stores.create() Creates a persistent container (the knowledge base) where your file embeddings will be stored.
2. Upload File client.file_search_stores.upload_to_file_search_store() Uploads the raw file, triggers the LRO (Long-Running Operation) for indexing (chunking, embedding), and attaches the file to the Store.
3. Generate Content client.models.generate_content() Calls the Gemini model (gemini-2.5-flash), passing the Store name in the tools configuration to activate RAG.

4. Detailed Sample Code and Execution (Make sample code and check how it works)

This Python code demonstrates the complete life cycle of a RAG application, from creating the store to querying the model and cleaning up resources.

A. Sample File Content: service_guide.txt

The new account registration process includes the following steps: 1) Visit the website. 2) Enter email and password. 3) Confirm via the email link sent to your inbox. 4) Complete the mandatory personal information. The monthly cost for the basic service tier is $10 USD. The refund policy is valid for 30 days from the date of purchase. For support inquiries, please email [email protected].

B. Python Code (gemini_file_search_demo.py)

(The code block is presented as a full script for easy reference and testing.)

import os
import time
from google import genai
from google.genai import types
from google.genai.errors import APIError

# --- Configuration ---
FILE_NAME = "service_guide.txt"
STORE_DISPLAY_NAME = "Service Policy Knowledge Base"
MODEL_NAME = "gemini-2.5-flash"

def run_file_search_demo():
    # Helper to create the local file for upload
    if not os.path.exists(FILE_NAME):
        file_content = """The new account registration process includes the following steps: 1) Visit the website. 2) Enter email and password. 3) Confirm via the email link sent to your inbox. 4) Complete the mandatory personal information. The monthly cost for the basic service tier is $10 USD. The refund policy is valid for 30 days from the date of purchase. For support inquiries, please email [email protected]."""
        with open(FILE_NAME, "w") as f:
            f.write(file_content)
    
    file_search_store = None # Initialize for cleanup in finally block
    try:
        print("💡 Initializing Gemini Client...")
        client = genai.Client()

        # 1. Create the File Search Store
        print(f"\n🚀 1. Creating File Search Store: '{STORE_DISPLAY_NAME}'...")
        file_search_store = client.file_search_stores.create(
            config={'display_name': STORE_DISPLAY_NAME}
        )
        print(f"   -> Store Created: {file_search_store.name}")
        
        # 2. Upload and Import File into the Store (LRO)
        print(f"\n📤 2. Uploading and indexing file '{FILE_NAME}'...")
        
        operation = client.file_search_stores.upload_to_file_search_store(
            file=FILE_NAME,
            file_search_store_name=file_search_store.name,
            config={'display_name': f"Document {FILE_NAME}"}
        )

        while not operation.done:
            print("   -> Processing file... Please wait (5 seconds)...")
            time.sleep(5)
            operation = client.operations.get(operation)

        print("   -> File successfully processed and indexed!")

        # 3. Perform the RAG Query
        print(f"\n💬 3. Querying model '{MODEL_NAME}' with your custom data...")
        
        questions = [
            "What is the monthly fee for the basic tier?",
            "How do I sign up for a new account?",
            "What is the refund policy?"
        ]

        for i, question in enumerate(questions):
            print(f"\n   --- Question {i+1}: {question} ---")
            
            response = client.models.generate_content(
                model=MODEL_NAME,
                contents=question,
                config=types.GenerateContentConfig(
                    tools=[
                        types.Tool(
                            file_search=types.FileSearch(
                                file_search_store_names=[file_search_store.name]
                            )
                        )
                    ]
                )
            )

            # 4. Print results and citations
            print(f"   🤖 Answer: {response.text}")
            
            if response.candidates and response.candidates[0].grounding_metadata:
                print("   📚 Source Citation:")
                # Process citations, focusing on the text segment for clarity
                for citation_chunk in response.candidates[0].grounding_metadata.grounding_chunks:
                    print(f"    - From: '{FILE_NAME}' (Snippet: '{citation_chunk.text_segment.text}')")
            else:
                print("   (No specific citation found.)")


    except APIError as e:
        print(f"\n❌ [API ERROR] Đã xảy ra lỗi khi gọi API: {e}")
    except Exception as e:
        print(f"\n❌ [LỖI CHUNG] Đã xảy ra lỗi không mong muốn: {e}")
    finally:
        # 5. Clean up resources (Essential for managing quota)
        if file_search_store:
            print(f"\n🗑️ 4. Cleaning up: Deleting File Search Store {file_search_store.name}...")
            client.file_search_stores.delete(name=file_search_store.name)
            print("   -> Store successfully deleted.")
            
        if os.path.exists(FILE_NAME):
            os.remove(FILE_NAME)
            print(f"   -> Deleted local sample file '{FILE_NAME}'.")

if __name__ == "__main__":
    run_file_search_demo()

C. Demo Execution and Expected Output 🖥️

When running the Python script, the output demonstrates the successful RAG process, where the model’s responses are strictly derived from the service_guide.txt file, confirmed by the citations.

💡 Initializing Gemini Client...
...
   -> File successfully processed and indexed!

💬 3. Querying model 'gemini-2.5-flash' with your custom data...

   --- Question 1: What is the monthly fee for the basic tier? ---
   🤖 Answer: The monthly cost for the basic service tier is $10 USD.
   📚 Source Citation:
    - From: 'service_guide.txt' (Snippet: 'The monthly cost for the basic service tier is $10 USD.')

   --- Question 2: How do I sign up for a new account? ---
   🤖 Answer: To sign up, you need to visit the website, enter email and password, confirm via the email link, and complete the mandatory personal information.
   📚 Source Citation:
    - From: 'service_guide.txt' (Snippet: 'The new account registration process includes the following steps: 1) Visit the website. 2) Enter email and password. 3) Confirm via the email link sent to your inbox. 4) Complete the mandatory personal information.')

   --- Question 3: What is the refund policy? ---
   🤖 Answer: The refund policy is valid for 30 days from the date of purchase.
   📚 Source Citation:
    - From: 'service_guide.txt' (Snippet: 'The refund policy is valid for 30 days from the date of purchase.')

🗑️ 4. Cleaning up: Deleting File Search Store fileSearchStores/...
   -> Store successfully deleted.
   -> Deleted local sample file 'service_guide.txt'.

Conclusion

The **Gemini File Search Tool** provides an elegant, powerful, and fully managed path to RAG. By abstracting away the complexities of vector databases and indexing, it allows developers to quickly build **highly accurate, reliable, and grounded AI applications** using their own data. This tool is essential for anyone looking to bridge the gap between general AI capabilities and specific enterprise knowledge.

File Search Tool in Gemini API

🔍 File Search Tool in Gemini API

Build Smart RAG Applications with Google Gemini

📋 Table of Contents

🎯 What is File Search Tool?

Google has just launched an extremely powerful feature in the Gemini API: File Search Tool.
This is a fully managed RAG (Retrieval-Augmented Generation) system
that significantly simplifies the process of integrating your data into AI applications.

💡 What is RAG?

RAG (Retrieval-Augmented Generation) is a technique that combines information retrieval
from databases with the text generation capabilities of AI models. Instead of relying solely on pre-trained
knowledge, the model can retrieve and use information from your documents to provide
more accurate and up-to-date answers.

If you’ve ever wanted to build:

  • 🤖 Chatbot that answers questions about company documents
  • 📚 Research assistant that understands scientific papers
  • 🎯 Customer support system with product knowledge
  • 💻 Code documentation search tool

Then File Search Tool is the solution you need!

✨ Key Features

🚀 Simple Integration

Automatically manages file storage, content chunking, embedding generation,
and context insertion into prompts. No complex infrastructure setup required.

🔍 Powerful Vector Search

Uses the latest Gemini Embedding models for semantic search.
Finds relevant information even without exact keyword matches.

📚 Built-in Citations

Answers automatically include citations indicating which parts of documents
were used, making verification easy and transparent.

📄 Multiple Format Support

Supports PDF, DOCX, TXT, JSON, and many programming language files.
Build a comprehensive knowledge base easily.

🎉 Main Benefits

  • Fast: Deploy RAG in minutes instead of days
  • 💰 Cost-effective: No separate vector database management needed
  • 🔧 Easy maintenance: Google handles updates and scaling
  • Reliable: Includes citations for information verification

⚙️ How It Works

File Search Tool operates in 3 simple steps:

  • Create File Search Store
    This is the “storage” for your processed data. The store maintains embeddings
    and search indices for fast retrieval.
  • Upload and Import Files
    Upload your documents and the system automatically:

    • Splits content into chunks
    • Creates vector embeddings for each chunk
    • Builds an index for fast searching
  • Query with File Search
    Use the File Search tool in API calls to perform semantic searches
    and receive accurate answers with citations.

File Search Tool Workflow Diagram

Figure 1: File Search Tool Workflow Process

🛠️ Detailed Installation Guide

Step 1: Environment Preparation

✅ System Requirements

  • Python 3.8 or higher
  • pip (Python package manager)
  • Internet connection
  • Google Cloud account

📦 Required Tools

  • Terminal/Command Prompt
  • Text Editor or IDE
  • Git (recommended)
  • Virtual environment tool

Step 2: Install Python and Dependencies

2.1. Check Python

python –version

Expected output: Python 3.8.x or higher

2.2. Create Virtual Environment (Recommended)

# Create virtual environment
python -m venv gemini-env# Activate (Windows)
gemini-env\Scripts\activate# Activate (Linux/Mac)
source gemini-env/bin/activate

2.3. Install Google Genai SDK

pip install google-genai

Wait for the installation to complete. Upon success, you’ll see:

# Output when installation is successful:
Successfully installed google-genai-x.x.x

Package installation output

Figure 2: Successful Google Genai SDK installation

Step 3: Get API Key

  • Access Google AI Studio
    Open your browser and go to:
    https://aistudio.google.com/
  • Log in with Google Account
    Use your Google account to sign in
  • Create New API Key
    Click “Get API Key” → “Create API Key” → Select a project or create a new one
  • Copy API Key
    Save the API key securely – you’ll need it for authentication

Google AI Studio - Get API Key

Figure 3: Google AI Studio page to create API Key

Step 4: Configure API Key

Method 1: Use Environment Variable (Recommended)

On Windows:

set GEMINI_API_KEY=your_api_key_here

On Linux/Mac:

export GEMINI_API_KEY=’your_api_key_here’

Method 2: Use .env File

# Create .env file
GEMINI_API_KEY=your_api_key_here

Then load in Python:

from dotenv import load_dotenv
import osload_dotenv()
api_key = os.getenv(“GEMINI_API_KEY”)

⚠️ Security Notes

  • 🔒 DO NOT commit API keys to Git
  • 📝 Add .env to .gitignore
  • 🔑 Don’t share API keys publicly
  • ♻️ Rotate keys periodically if exposed

Step 5: Verify Setup

Run test script to verify complete setup:

python test_connection.py

The script will automatically check Python environment, API key, package installation, API connection, and demo source code files.

Successful setup test result

Figure 4: Successful setup test result

🎮 Demo and Screenshots

According to project requirements, this section demonstrates 2 main parts:

  • Demo 1: Create sample code and verify functionality
  • Demo 2: Check behavior through “Ask the Manual” Demo App

Demo 1: Sample Code – Create and Verify Operation

We’ll write our own code to test how File Search Tool works.

Step 1: Create File Search Store

Code to create File Search Store

Figure 5: Code to create File Search Store

Output when store is successfully created

Figure 6: Output when store is successfully created

Step 2: Upload and Process File

Upload and process file

Figure 7: File processing workflow

Step 3: Query and Receive Response with Citations

Query and Response with citations

Figure 8: Answer with citations

Demo 2: Check Behavior with “Ask the Manual” Demo App

Google provides a ready-made demo app to test File Search Tool’s behavior and features.
This is the best way to understand how the tool works before writing your own code.

🎨 Try Google’s Demo App

Google provides an interactive demo app called “Ask the Manual” to let you
test File Search Tool right away without coding!

🚀 Open Demo App

Ask the Manual demo app interface

Figure 9: Ask the Manual demo app interface (including API key selection)

Testing with Demo App:

  1. Select/enter your API key in the Settings field
  2. Upload PDF file or DOCX to the app
  3. Wait for processing (usually < 1 minute)
  4. Chat and ask questions about the PDF file content
  5. View answers returned from PDF data with citations
  6. Click on citations to verify sources

Files uploaded in demo app

Figure 10: Files uploaded in demo app

Query and response with citations

Figure 11: Query and response with citations in demo app

✅ Demo Summary According to Requirements

We have completed all requirements:

  • Introduce features: Introduced 4 main features at the beginning
  • Check behavior by demo app: Tested directly with “Ask the Manual” Demo App
  • Introduce getting started: Provided detailed 5-step installation guide
  • Make sample code: Created our own code and verified actual operation

Through the demo, we see that File Search Tool works very well with automatic chunking,
embedding, semantic search, and accurate results with citations!

💻 Complete Code Examples

Below are official code examples from Google Gemini API Documentation
that you can copy and use directly:

Example 1: Upload Directly to File Search Store

The fastest way – upload file directly to store in 1 step:

from google import genai
from google.genai import types
import timeclient = genai.Client()# Create the file search store with an optional display name
file_search_store = client.file_search_stores.create(
config={‘display_name’: ‘your-fileSearchStore-name’}
)# Upload and import a file into the file search store
operation = client.file_search_stores.upload_to_file_search_store(
file=‘sample.txt’,
file_search_store_name=file_search_store.name,
config={
‘display_name’: ‘display-file-name’,
}
)# Wait until import is complete
while not operation.done:
time.sleep(5)
operation = client.operations.get(operation)# Ask a question about the file
response = client.models.generate_content(
model=“gemini-2.5-flash”,
contents=“””Can you tell me about Robert Graves”””,
config=types.GenerateContentConfig(
tools=[
file_search=(
file_search_store_names=[file_search_store.name]
)
]
)
)print(response.text)

Example 2: Upload then Import File (2 Separate Steps)

If you want to upload file first, then import it to store:

from google import genai
from google.genai import types
import timeclient = genai.Client()# Upload the file using the Files API
sample_file = client.files.upload(
file=‘sample.txt’,
config={‘name’: ‘display_file_name’}
)# Create the file search store
file_search_store = client.file_search_stores.create(
config={‘display_name’: ‘your-fileSearchStore-name’}
)# Import the file into the file search store
operation = client.file_search_stores.import_file(
file_search_store_name=file_search_store.name,
file_name=sample_file.name
)# Wait until import is complete
while not operation.done:
time.sleep(5)
operation = client.operations.get(operation)# Ask a question about the file
response = client.models.generate_content(
model=“gemini-2.5-flash”,
contents=“””Can you tell me about Robert Graves”””,
config=types.GenerateContentConfig(
tools=[
file_search=(
file_search_store_names=[file_search_store.name]
)
]
)
)print(response.text)
📚 Source: Code examples are taken from

Gemini API Official Documentation – File Search

🎯 Real-World Applications

1. 📚 Document Q&A System

Use Case: Company Documentation Chatbot

Problem: New employees need to look up information from hundreds of pages of internal documents

Solution:

  • Upload all HR documents, policies, and guidelines to File Search Store
  • Create chatbot interface for employees to ask questions
  • System provides accurate answers with citations from original documents
  • Employees can verify information through citations

Benefits: Saves search time, reduces burden on HR team

2. 🔬 Research Assistant

Use Case: Scientific Paper Synthesis

Problem: Researchers need to read and synthesize dozens of papers

Solution:

  • Upload PDF files of research papers
  • Query to find studies related to specific topics
  • Request comparisons of methodologies between papers
  • Automatically create literature reviews with citations

Benefits: Accelerates research process, discovers new insights

3. 🎧 Customer Support Enhancement

Use Case: Automated Support System

Problem: Customers have many product questions, need 24/7 support

Solution:

  • Upload product documentation, FAQs, troubleshooting guides
  • Integrate into website chat widget
  • Automatically answer customer questions
  • Escalate to human agent if information not found

Benefits: Reduce 60-70% of basic tickets, improve customer satisfaction

4. 💻 Code Documentation Navigator

Use Case: Developer Onboarding Support

Problem: New developers need to quickly understand large codebase

Solution:

  • Upload API docs, architecture diagrams, code comments
  • Developers ask about implementing specific features
  • System points to correct files and functions to review
  • Explains design decisions with context

Benefits: Reduces onboarding time from weeks to days

📊 Comparison with Other Solutions

Criteria File Search Tool Self-hosted RAG Traditional Search
Setup Time ✅ < 5 minutes ⚠️ 1-2 days ✅ < 1 hour
Infrastructure ✅ Not needed ❌ Requires vector DB ⚠️ Requires search engine
Semantic Search ✅ Built-in ✅ Customizable ❌ Keyword only
Citations ✅ Automatic ⚠️ Must build yourself ⚠️ Basic highlighting
Maintenance ✅ Google handles ❌ Self-maintain ⚠️ Moderate
Cost 💰 Pay per use 💰💰 Infrastructure + Dev 💰 Hosting

🌟 Best Practices

📄 File Preparation

✅ Do’s

  • Use well-structured files
  • Add headings and sections
  • Use descriptive file names
  • Split large files into parts
  • Use OCR for scanned PDFs

❌ Don’ts

  • Files too large (>50MB)
  • Complex formats with many images
  • Poor quality scanned files
  • Mixed languages in one file
  • Corrupted or password-protected files

🗂️ Store Management

📋 Efficient Store Organization

  • By topic: Create separate stores for each domain (HR, Tech, Sales…)
  • By language: Separate stores for each language to optimize search
  • By time: Archive old stores, create new ones for updated content
  • Naming convention: Use meaningful names: hr-policies-2025-q1

🔍 Query Optimization

# ❌ Poor query
“info” # Too general# ✅ Good query
“What is the employee onboarding process in the first month?”# ❌ Poor query
“python” # Single keyword# ✅ Good query
“How to implement error handling in Python API?”# ✅ Query with context
“””
I need information about the deployment process.
Specifically the steps to deploy to production environment
and checklist to verify before deployment.
“””

⚡ Performance Tips

Speed Up Processing

  1. Batch upload: Upload multiple files at once instead of one by one
  2. Async processing: No need to wait for each file to complete
  3. Cache results: Cache answers for common queries
  4. Optimize file size: Compress PDFs, remove unnecessary images
  5. Monitor API limits: Track usage to avoid hitting rate limits

🔒 Security

Security Checklist

  • ☑️ API keys must not be committed to Git
  • ☑️ Use environment variables or secret management
  • ☑️ Implement rate limiting at application layer
  • ☑️ Validate and sanitize user input before querying
  • ☑️ Don’t upload files with sensitive data if not necessary
  • ☑️ Rotate API keys periodically
  • ☑️ Monitor usage logs for abnormal patterns
  • ☑️ Implement authentication for end users

💰 Cost Optimization

Strategy Description Savings
Cache responses Cache answers for identical queries ~30-50%
Batch processing Process multiple files at once ~20%
Smart indexing Only index necessary content ~15-25%
Archive old stores Delete unused stores Variable

🎊 Conclusion

File Search Tool in Gemini API provides a simple yet powerful RAG solution for integrating data into AI.
This blog has fully completed all requirements: Introducing features, demonstrating with “Ask the Manual” app, detailed installation guide,
and creating sample code with 11 illustrative screenshots.

🚀 Quick Setup • 🔍 Automatic Vector Search • 📚 Accurate Citations • 💰 Pay-per-use

🔗 Official Resources

 

🧠 Codex CLI vs Claude Code vs Gemini CLI

1) Codex CLI — Tóm tắt khả năng & các nâng cấp chính

Codex CLI là agent chạy ngay trong terminal, đóng vai
trò “pair programmer” biết lập kế hoạch, dùng công cụ và tự kiểm tra đầu
ra theo từng bước. Bản nâng cấp 2025 tập trung vào khả năng cộng tác
thời gian thực, theo dõi tiến độ, và kiểm soát quyền truy cập an toàn —
giúp bạn chuyển từ các yêu cầu nhỏ tương tác nhanh đến nhiệm vụ dài hơi
(refactor, thêm tính năng, viết test) mà không rời môi trường làm việc.

Khả năng cốt lõi

  • Agentic coding trong terminal: ra lệnh, nhận kế
    hoạch, xem log/diff, và áp dụng thay đổi trực tiếp ở thư mục làm việc;
    phù hợp cả phiên ngắn (prompt–sửa–chạy) lẫn nhiệm vụ nhiều bước.
  • Hiểu và điều hướng codebase: đọc tập tin liên quan,
    đề xuất chỉnh sửa/viết mới, chạy lệnh build/test để xác thực; có thể
    duy trì ngữ cảnh dài hơn nhờ cơ chế nén hội thoại.
  • Tận dụng mô hình tối ưu cho coding: hỗ trợ dùng
    GPT-5-Codex cho tác vụ cục bộ trong CLI (tùy chọn), cho chất lượng mã
    và khả năng điều khiển tốt hơn.
  • Tích hợp an toàn theo quyền: làm việc ở các mức cấp
    quyền khác nhau (chỉ đọc/duyệt thủ công, tự động trong workspace, hoặc
    toàn quyền có mạng) để cân bằng tốc độ và kiểm soát rủi ro.

Các nâng cấp nổi bật (2025)

  • Đính kèm & chia sẻ hình ảnh ngay trong CLI: gửi
    screenshot/wireframe/diagram để tạo ngữ cảnh UI chung, từ đó agent bám
    sát ý đồ thiết kế hơn.
  • Theo dõi tiến độ bằng to-do list: CLI hiển thị các
    bước việc, trạng thái hoàn thành, và cho phép tiếp tục/điều chỉnh khi
    tác vụ phức tạp.
  • Công cụ tích hợp tốt hơn: thêm web search
    MCP (Model Context Protocol) để kết nối hệ thống bên ngoài với độ
    chính xác sử dụng công cụ cao hơn.
  • Terminal UI mới: hiển thị lệnh công cụ và
    diff rõ ràng, dễ theo dõi; giúp bạn duyệt và chấp thuận thay
    đổi nhanh.
  • Ba chế độ phê duyệt đơn giản: Read-only (duyệt thủ
    công), Auto (toàn quyền trong workspace, cần duyệt khi ra ngoài), Full
    access (đọc file bất kỳ & chạy lệnh có mạng); kèm cơ chế nén hội thoại
    để giữ phiên làm việc dài.
  • Khả dụng & cài đặt nhanh: gói CLI phát hành dạng
    open-source; cài qua npm và dùng chung tài khoản
    ChatGPT/Codex để đồng bộ trải nghiệm giữa máy cục bộ, IDE và cloud.

Ý nghĩa thực tiễn

  • Cho phiên ngắn: phản hồi nhanh, sinh/ghi mã, xem diff
    và hợp nhất từng phần một — rất hợp xây dựng nguyên mẫu, sửa lỗi, viết
    test.
  • Cho nhiệm vụ dài hơi: theo dõi to-do, dùng công cụ
    đúng lúc (search/MCP), duy trì ngữ cảnh nhiều giờ; giảm tải việc lặp
    thủ công và rủi ro “lạc ngữ cảnh”.
  • Cho đội ngũ coi trọng an toàn: mặc định sandbox vô
    hiệu mạng; mọi thao tác “nhạy cảm” đều có cơ chế xin phép, log minh
    bạch, và có thể giới hạn miền mạng tin cậy khi cần.

2) Gemini CLI — kết nối & ngữ cảnh dài

Gemini CLI đưa mô hình Gemini vào terminal với thế mạnh nổi bật là
khả năng gom ngữ cảnh lớn
khả năng “kéo tri thức ngoài” (web/search, MCP) khi cần. Cách
làm việc phù hợp là vừa viết mã vừa tổng hợp tài liệu, quy chuẩn, ví dụ
và snippet từ nhiều nguồn ngay trong một phiên.

Khả năng & trải nghiệm chính

  • Tổng hợp đa nguồn: đọc nhiều tệp
    README/changelog/guide cùng lúc, rút ý và hợp nhất thành checklist
    hoặc mã khởi tạo.
  • Grounding khi thiếu ngữ cảnh: có thể tra cứu rồi
    “điền chỗ trống” (thư viện, API mẫu, quy ước thiết kế) để tiếp tục
    triển khai.
  • Tích hợp công cụ qua MCP/tiện ích: mở rộng tác vụ từ
    terminal (chạy lệnh, xử lý tệp, thao tác hệ thống) trong cùng một
    luồng hội thoại.
  • Thích hợp giai đoạn khởi tạo: bootstrap dự án, dựng
    khung cấu trúc, tạo script cài đặt & cấu hình linter/test nhanh.

Điểm mạnh

  • Gom và “tiêu hoá” tài liệu rất tốt, hữu ích khi yêu cầu dính nhiều quy
    chuẩn/tiêu chí.
  • Tiện ích terminal đa dạng; có thể chuyển từ thảo luận sang thực thi
    lệnh liền mạch.
  • Phù hợp các bài toán phải vừa tra cứu vừa phát triển (setup,
    tích hợp nhiều dịch vụ, tạo sample end-to-end).

Điểm cần lưu ý

  • Đầu ra dễ dài; nên yêu cầu rút gọn hoặc
    chỉ ghi thay đổi tối thiểu để tránh mã/cấu hình thừa.
  • Ở bài toán nhiều ràng buộc (ví dụ: vật lý/va chạm trong game), logic
    đôi khi thiếu ổn định — nên kèm test nhỏ để “neo” hành vi mong muốn.
  • Prompt càng dài càng dễ tăng độ trễ; chia nhỏ mục tiêu giúp cải thiện
    tốc độ và độ chính xác.

Khi nào nên dùng / không nên dùng

  • Nên dùng: khởi tạo dự án, hợp nhất guideline, tạo
    khung CI/CD, viết script cài đặt; tích hợp SDK/API mới có nhiều tài
    liệu rải rác.
  • Không lý tưởng: tác vụ yêu cầu logic thời gian thực
    nhạy cảm (gameplay/physics), hoặc tối ưu UI/animation vi mô cần tinh
    chỉnh thủ công.

3) Claude Code — độ sâu & tái cấu trúc

Claude Code thiên về hiểu dự án
giữ tính nhất quán trên codebase lớn. Công cụ này làm tốt các
việc như điều hướng toàn repo, chuẩn hoá kiến trúc, viết module theo
convention, chạy test và thậm chí đề xuất PR hoàn chỉnh với mô tả rõ
ràng.

Khả năng & trải nghiệm chính

  • Refactor quy mô lớn: phát hiện trùng lặp, tách
    mô-đun, chuẩn hoá naming/foldering, giải thích tác động kiến trúc.
  • Review có lý do: output thường kèm chú thích “vì sao”
    và “cách kiểm chứng”, thuận tiện cho code review theo nhóm.
  • Giữ trạng thái & luồng làm việc: có thể theo dõi đề
    xuất qua nhiều bước (quét, đổi tên, cập nhật test, cập nhật tài liệu).
  • UI/animation có tổ chức: ở bài front-end đòi hỏi
    chuyển cảnh hoặc nhiều trạng thái, cách tổ chức logic thường gọn gàng,
    ít “giật cục”.

Điểm mạnh

  • Rất phù hợp với kế hoạch tái cấu trúc/chuẩn hoá đa mô-đun
    hoặc khi cần củng cố ranh giới giữa các layer.
  • Đầu ra dễ đọc, có chú thích; thuận lợi cho duy trì lâu dài và
    onboarding thành viên mới.
  • Hỗ trợ quy trình nhóm: có thể đề xuất commit/PR với mô tả chi tiết,
    checklist kiểm thử và hướng dẫn rollout.

Điểm cần lưu ý

  • Tốc độ không phải thế mạnh; cần cân nhắc khi deadline gấp hoặc chỉ sửa
    1–2 file nhỏ.
  • Để đạt “đúng gu” kiến trúc, nên mô tả convention (naming, foldering,
    state, test strategy) ngay từ đầu.
  • Với việc rất nhỏ, chi phí thời gian có thể lớn hơn lợi ích so với các
    công cụ hướng tốc độ.

Khi nào nên dùng / không nên dùng

  • Nên dùng: refactor lớn, nâng cấp framework, tách
    mô-đun, chuẩn hoá API, dọn nợ kỹ thuật, viết/hoàn thiện test.
  • Không lý tưởng: thử nghiệm nhanh/POC siêu nhỏ, tinh
    chỉnh UI/copywriting vi mô cần phản hồi tức thì.

4) Bảng so sánh chính

Tiêu chí Codex CLI Gemini CLI Claude Code
Model nền OpenAI Codex (tối ưu coding) Gemini 2.5 Pro Claude Sonnet 4
Context window ~128K tokens ~1M tokens ~200K tokens (xấp xỉ)
Truy cập FS & Shell
Tính năng khác biệt Tốc độ phản hồi nhanh, vòng lặp ngắn Kéo tri thức ngoài, ngữ cảnh dài Quét codebase, gợi ý PR, chuẩn hoá
Phù hợp nhất cho Prototype, sửa lỗi, tác vụ cục bộ Quy trình “viết mã + tra cứu” Dự án nhiều mô-đun, refactor/maintain
Tốc độ/độ trễ Nhanh nhất Trung bình Chậm hơn
UI/Animation Thiên chức năng Khá tốt, phụ thuộc prompt Mượt & có tổ chức
Xử lý lỗi Cần can thiệp tay ở logic phức tạp Ổn nếu prompt rõ Phát hiện & sửa tốt, kèm giải thích

5) Demo 2 tác vụ cụ thể

Task 1 — Platformer 2D phong cách Super Mario

Prompt: “Tạo một trò chơi platformer 2D cơ bản theo phong cách Super
Mario. Trò chơi nên có bố cục đơn giản dựa trên các ô vuông với Mario
đứng trên các khối đất, nền trời với những đám mây, khối hình dấu hỏi
phía trên và một đường ống màu xanh lá cây gần đó. Bao gồm các cơ chế cơ
bản như di chuyển trái/phải và nhảy bằng các phím mũi tên trên bàn phím.
Mô phỏng trọng lực và va chạm với các nền tảng. Sử dụng đồ họa theo
phong cách pixel-art với các tài nguyên cục bộ được nhúng hoặc tham
chiếu.”

Codex CLI

Gemini CLI

Claude Code

Task 2 — Đồng hồ động theo chủ đề thời tiết

Prompt: “Thiết kế và phát triển một bảng điều khiển đồng hồ động theo
chủ đề thời tiết với giao diện trực quan phong phú chỉ bằng HTML, CSS và
JavaScript. Mục tiêu chính là tạo ra một giao diện đồng hồ thời gian
thực, không chỉ hiển thị thời gian hiện tại mà còn tự động điều chỉnh
theo thời gian trong ngày. Triển khai bốn hiệu ứng chuyển tiếp nền động
thể hiện bình minh, trưa, hoàng hôn và đêm, mỗi hiệu ứng có màu sắc và
các yếu tố động riêng biệt như mây trôi, sao lấp lánh, hoặc mặt trời/mặt
trăng mọc/lặn, và cung cấp tùy chọn chuyển đổi giữa định dạng thời gian
12 giờ và 24 giờ. Để tăng thêm tính tương tác, hãy thêm một phần hiển
thị câu trích dẫn động lực hoặc năng suất theo từng giờ.”

Codex CLI

Gemini CLI

Claude Code

6) Ưu & Nhược điểm thực tế

6.1 Codex CLI

Ưu điểm

  • Tốc độ phản hồi rất nhanh; phù hợp vòng lặp “chia nhỏ — chạy thử — sửa
    — lặp”.
  • Trải nghiệm terminal gọn gàng: xem diff → áp dụng, chạy test/format
    ngay trong CLI.
  • Ổn định ở tác vụ nhỏ/vừa; giữ mạch công việc tốt khi bạn dẫn dắt bằng
    checklist/to-do.

Nhược điểm

  • UI/animation phức tạp (parallax, canvas, webGL) thường cần chỉnh tay
    thêm; thiên về chức năng.
  • Logic nhiều tầng, đa mô-đun: đôi lúc bỏ sót ràng buộc; cần test bao
    phủ để duy trì chất lượng.
  • Tài liệu hoá sinh tự động thường ngắn; cần yêu cầu bổ sung “why/how”.

6.2 Gemini CLI

Ưu điểm

  • Ngữ cảnh rất lớn: đọc nhiều tệp/README/changelog cùng lúc, tổng hợp
    nguồn nhanh.
  • Kéo tri thức ngoài (web/search) khi thiếu snippet/tiêu chuẩn, rồi hợp
    nhất vào triển khai.
  • Hữu ích khi khởi tạo dự án mới cần nhiều guideline & tài liệu tham
    chiếu.

Nhược điểm

  • Đầu ra thường dài; cần rút gọn để tránh code/CSS dư hoặc cấu trúc rườm
    rà.
  • Logic chưa ổn định ở bài toán nhiều ràng buộc (ví dụ game với va
    chạm/trọng lực).
  • Độ trễ trung bình; prompt càng dài càng tốn thời gian suy nghĩ.

6.3 Claude Code

Ưu điểm

  • Hiểu dự án tốt, nổi bật ở refactor, gom code trùng, đặt tên có chủ
    đích, output có chú thích.
  • UI/animation mượt, trạng thái rõ; phù hợp demo front-end đòi hỏi
    chuyển cảnh tinh tế.
  • Phù hợp quy trình nhóm: có thể sinh commit/PR có mô tả, tài liệu hoá
    bài bản.

Nhược điểm

  • Tốc độ chậm hơn; không phù hợp khi cần xử lý “siêu nhanh”.
  • Phụ thuộc prompt chi tiết để đạt kiến trúc “đúng gu”.
  • Với tác vụ rất nhỏ (1–2 file), chi phí thời gian đôi khi lớn hơn lợi
    ích so với Codex.

7) Chọn công cụ nào theo nhu cầu

Muốn tốc độ & vòng lặp ngắn

Chọn Codex. Giao tác vụ nhỏ-vừa, kiểm diff theo
bước; tận dụng test/format tự động để “khoanh vùng lỗi” nhanh.

Muốn kéo ngữ cảnh ngoài & tìm kiếm

Chọn Gemini. Gom README, guideline, link web → hợp
nhất checklist & script; hữu ích khi khởi tạo dự án nhiều ràng buộc.

Muốn refactor & quản lý codebase lớn

Chọn Claude. Giao nhiệm vụ tổ chức lại cấu trúc,
sinh PR có mô tả; yêu cầu giải thích kiến trúc & tác động.

Codex CLI vs Gemini CLI vs Claude Code

1. Codex CLI – Capabilities and New Features

According to OpenAI’s official announcement (“Introducing upgrades to Codex”), Codex CLI has been rebuilt on top of GPT-5-Codex, turning it into an agentic programming assistant — a developer AI that can autonomously plan, reason, and execute tasks across coding environments.

🌟 Core Abilities

  • Handles both small and large tasks: From writing a single function to refactoring entire projects.
  • Cross-platform integration: Works seamlessly across terminal (CLI), IDE (extension), and cloud environments.
  • Task reasoning and autonomy: Can track progress, decompose goals, and manage multi-step operations independently.
  • Secure by design: Runs in a sandbox with explicit permission requests for risky operations.

📈 Performance Highlights

  • Uses 93.7% fewer reasoning tokens for simple tasks, but invests 2× more computation on complex ones.
  • Successfully ran over 7 hours autonomously on long software tasks during testing.
  • Produces more precise code reviews than older Codex versions.

🟢 In short: Codex CLI 2025 is not just a code generator — it’s an intelligent coding agent capable of reasoning, multitasking, and working securely across terminal, IDE, and cloud environments.

2.Codex CLI vs Gemini CLI vs Claude Code: The New Era of AI in the Terminal

The command line has quietly become the next frontier for artificial intelligence.
While graphical AI tools dominate headlines, the real evolution is unfolding inside the terminal — where AI coding assistants now operate directly beside you, as part of your shell workflow.

Three major players define this new space: Codex CLI, Gemini CLI, and Claude Code.
Each represents a different philosophy of how AI should collaborate with developers — from speed and connectivity to reasoning depth. Let’s break down what makes each contender unique, and where they shine.


🧩 Codex CLI — OpenAI’s Code-Focused Terminal Companion

Codex CLI acts as a conversational layer over your terminal.
It listens to natural language commands, interprets your intent, and translates it into executable code or shell operations.
Now powered by OpenAI’s Codex5-Medium, it builds on the strengths of the o4-mini generation while adding adaptive reasoning and a larger 256K-token context window.

Once installed, Codex CLI integrates seamlessly with your local filesystem.
You can type:

“Create a Python script that fetches GitHub issues and logs them daily,”
and watch it instantly scaffold the files, import the right modules, and generate functional code.

Codex CLI supports multiple languages — Python, JavaScript, Go, Rust, and more — and is particularly strong at rapid prototyping and bug fixing.
Its defining trait is speed: responses feel immediate, making it perfect for fast iteration cycles.

Best for: developers who want quick, high-quality code generation and real-time debugging without leaving the terminal.


🌤️ Gemini CLI — Google’s Adaptive Terminal Intelligence

Gemini CLI embodies Google’s broader vision for connected AI development — blending reasoning, utility, and live data access.
Built on Gemini 2.5 Pro, this CLI isn’t just a coding bot — it’s a true multitool for developers and power users alike.

Beyond writing code, Gemini CLI can run shell commands, retrieve live web data, or interface with Google Cloud services.
It’s ideal for workflows that merge coding with external context — for example:

  • fetching live API responses,

  • monitoring real-time metrics,

  • or updating deployment configurations on-the-fly.

Tight integration with VS Code, Google Cloud SDK, and Workspace tools turns Gemini CLI into a full-spectrum AI companion rather than a mere code generator.

Best for: developers seeking a versatile assistant that combines coding intelligence with live, connected utility inside the terminal.


🧠 Claude Code — Anthropic’s Deep Code Reasoner

If Codex is about speed, and Gemini is about connectivity, Claude Code represents depth.
Built on Claude Sonnet 4.5, Anthropic’s upgraded reasoning model, Claude Code is designed to operate as a true engineering collaborator.

It excels at understanding, refactoring, and maintaining large-scale codebases.
Claude Code can read entire repositories, preserve logic across files, and even generate complete pull requests with human-like commit messages.
Its upgraded 250K-token context window allows it to track dependencies, explain architectural patterns, and ensure code consistency over time.

Claude’s replies are more analytical — often including explanations, design alternatives, and justifications for each change.
It trades a bit of speed for a lot more insight and reliability.

Best for: professional engineers or teams managing complex, multi-file projects that demand reasoning, consistency, and full-codebase awareness.

3.Codex CLI vs Gemini CLI vs Claude Code: Hands-on With Two Real Projects

While benchmarks and specs are useful, nothing beats actually putting AI coding agents to work.
To see how they perform on real, practical front-end tasks, I tested three leading terminal assistants — Codex CLI (Codex5-Medium), Gemini CLI (Gemini 2.5 Pro), and Claude Code (Sonnet 4.5) — by asking each to build two classic web projects using only HTML, CSS, and JavaScript.

  • 🎮 Project 1: Snake Game — canvas-based, pixel-style, smooth movement, responsive.

  • Project 2: Todo App — CRUD features, inline editing, filters, localStorage, dark theme, accessibility + keyboard support.

🎮 Task 1 — Snake Game

Goal

Create a playable 2D Snake Game using HTML, CSS, and JavaScript.
Display a grid-based canvas with a moving snake that grows when it eats food.
The snake should move continuously and respond to arrow-key inputs.
The game ends when the snake hits the wall or itself.
Include a score counter and a restart button with pixel-style graphics and responsive design.

Prompt

Create a playable 2D Snake Game using HTML, CSS, and JavaScript.

  The game should display a grid-based canvas with a moving snake that grows when it eats

  food.

  The snake should move continuously and respond to keyboard arrow keys for direction

  changes.

  The game ends when the snake hits the wall or itself.

  Show a score counter and a restart button.

  Use smooth movement, pixel-style graphics, and responsive design for different screen sizes

Observations

Codex CLI — Generated the basic canvas scaffold in seconds. Game loop, input, and scoring worked out of the box, but it required minor tuning for smoother turning and anti-reverse logic.

Gemini CLI — Delivered well-structured, commented code and used requestAnimationFrame properly. Gameplay worked fine, though the UI looked plain — more functional than fun.

Claude Code — Produced modular, production-ready code with solid collision handling, restart logic, and a polished HUD. Slightly slower response but the most complete result overall.

✅ Task 2 — Todo App

Goal

Build a complete, user-friendly Todo List App using only HTML, CSS, and JavaScript (no frameworks).
Features: add/edit/delete tasks, mark complete/incomplete, filter All / Active / Completed, clear completed, persist via localStorage, live counter, dark responsive UI, and full keyboard accessibility (Enter/Space/Delete).
Deliverables: index.html, style.css, app.js — clean, modular, commented, semantic HTML + ARIA.

Prompt

Develop a complete and user-friendly Todo List App using only HTML, CSS, and JavaScript (no frameworks). The app should include the following functionality and design requirements:

    1. Input field and ‘Add’ button to create new tasks.
    2. Ability to mark tasks as complete/incomplete via checkboxes.
    3. Inline editing of tasks by double-clicking — pressing Enter saves changes and Esc cancels.
    4. Delete buttons to remove tasks individually.
    5. Filter controls for All, Active, and Completed tasks.
    6. A ‘Clear Completed’ button to remove all completed tasks at once.
    7. Automatic saving and loading of todos using localStorage.
    8. A live counter showing the number of active (incomplete) tasks.
    9. A modern, responsive dark theme UI using CSS variables, rounded corners, and hover effects.
    10. Keyboard accessibility — Enter to add, Space to toggle, Delete to remove tasks.
      Ensure the project is well structured with three separate files:
    • index.html
    • style.css
    • app.js
      Code should be clean, modular, and commented, with semantic HTML and appropriate ARIA attributes for accessibility.

Observations

Codex CLI — Created a functional 3-file structure with working CRUD, filters, and persistence. Fast, but accessibility and keyboard flows needed manual reminders.

Gemini CLI — Balanced logic and UI nicely. Used CSS variables for a simple dark theme and implemented localStorage properly.
Performance was impressive — Gemini was the fastest overall, but its default design felt utilitarian, almost as if it “just wanted to get the job done.”
Gemini focuses on correctness and functionality rather than visual finesse.

Claude Code — Implemented inline editing, keyboard shortcuts, ARIA live counters, and semantic roles perfectly. The result was polished, responsive, and highly maintainable.

4.Codex CLI vs Gemini CLI vs Claude Code — Real-World Comparison

When testing AI coding assistants, speed isn’t everything — clarity, structure, and the quality of generated code all matter. To see how today’s top command-line tools compare, I ran the same set of projects across Claude Code, Gemini CLI, and Codex CLI, including a 2D Snake Game and a Todo List App.
Here’s how they performed.


Claude Code: Polished and Reliable

Claude Code consistently produced the most professional and complete results.
Its generated code came with clear structure, organized logic, and well-commented sections.
In the Snake Game test, Claude built the best-looking user interface, with a balanced layout, responsive design, and smooth movement logic.
Error handling was handled cleanly, and the overall experience felt refined — something you could hand over to a production team with confidence.
Although it wasn’t the fastest, Claude made up for it with code quality, structure, and ease of prompt engineering.
If your workflow values polish, maintainability, and readability, Claude Code is the most dependable choice.


Gemini CLI: Fastest but Basic

Gemini CLI clearly took the top spot for speed.
It executed quickly, generated files almost instantly, and made iteration cycles shorter.
However, the output itself felt minimal and unrefined — both the UI and the underlying logic were quite basic compared to Claude or Codex.
In the Snake Game task, Gemini produced a playable result but lacked visual polish and consistent structure.
Documentation and comments were also limited.
In short, Gemini is great for rapid prototyping or testing ideas quickly, but not for projects where you need beautiful UI, advanced logic, or long-term maintainability.


Codex CLI: Flexible but Slower

Codex CLI offered good flexibility and handled diverse prompts reasonably well.
It could generate functional UIs with decent styling, somewhere between Gemini’s simplicity and Claude’s refinement.
However, its main drawback was speed — responses were slower, and sometimes additional manual intervention was needed to correct or complete the code.
Codex is still a solid option when you need to tweak results manually or explore multiple implementation approaches, but it doesn’t match Claude’s polish or Gemini’s speed.


Overall Impression

After testing multiple projects, the overall ranking became clear:

  • Gemini CLI is the fastest but produces simple and unpolished code.

  • Claude Code delivers the most reliable, structured, and visually refined results.

  • Codex CLI sits in between — flexible but slower and less cohesive.

Each tool has its strengths. Gemini is ideal for quick builds, Codex for experimentation, and Claude Code for professional, trust-ready outputs.

In short:

Gemini wins on speed. Claude wins on quality. Codex stands in between — flexible but slower.