Comparing Excel Processing Methods for RAG

Posted on November 21, 2025 (updated November 24, 2025) by Cuong Dinh


In search of the best “recipe” for extracting data from Excel files for a RAG system

Introduction

In practice, Excel files are used everywhere and come in many complex layouts: color-coded tables, charts, images, and other special structures. When building a RAG (Retrieval-Augmented Generation) system, the question is: how should Excel data be “cooked” so that the LLM understands it as well as possible?

This post compares 5 methods for processing Excel, from simple to complex, across 4 real-world scenarios, each with specific questions used to measure accuracy.

⚠️ Note: This is a study of data preprocessing; it does not cover vector search or prompt engineering. The goal is to find the best way to convert Excel into a format an LLM can understand.

🔧 Test setup

Tools used:

Language: TypeScript
LLM: Gemini 2.5 Pro
Libraries and tools: XLSX, ExcelJS, JSZip, LibreOffice

4 test scenarios:

Attendance sheet: tracking monthly working days and leave
Gantt chart: project management with color-coded time spans
Sales report: data table with accompanying charts
User guide: document containing screenshots

1. CSV method (Plain Text)

Overall score: 33%

Converts the Excel sheet directly into comma-separated text. This is the simplest approach, but all formatting is lost.

Implementation: the XLSX library's sheet_to_csv() function

Pros

✓ Simple to implement
✓ Fast to process
✓ Small output size

Cons

✗ Cell formatting is lost
✗ No color information
✗ No images

Result: CSV works well on simple tabular data (50% of questions correct on the sales report) but fails completely on the Gantt chart, charts, and images because it captures no visual information.
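To make the information loss concrete, here is a minimal, dependency-free sketch of what a CSV export keeps. It is a stand-in for illustration, not the XLSX library's sheet_to_csv() itself; the `Cell` type, the helper names, and the sample rows are assumptions.

```typescript
// Dependency-free stand-in for a CSV export (not the XLSX library itself).
type Cell = string | number | null;

// Quote a field when it contains a comma, a double quote, or a line break.
function escapeCsvField(value: Cell): string {
  const text = value === null ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Serialize a grid of cell values to CSV text, one line per row.
function toCsv(rows: Cell[][]): string {
  return rows.map((row) => row.map(escapeCsvField).join(",")).join("\n");
}

// Hypothetical sample rows (not taken from the test files).
const salesRows: Cell[][] = [
  ["Region", "Q3 Sales"],
  ["North", 1200],
  ["South, East", 950],
];

const csvText = toCsv(salesRows);
console.log(csvText);
```

Only cell values survive: a fill color, a merged cell, or an embedded chart has no place in this output, which is consistent with the failures on the Gantt chart and the user guide.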

2. JSON method (Structured)

Overall score: 50%

Converts the sheet into a JSON structure with explicit key-value pairs. Easy to parse and process in code.

Implementation: the XLSX library's sheet_to_json() function

Pros

✓ Clear structure
✓ Easy to parse and query
✓ Good for simple tables

Cons

✗ No styling
✗ No images
✗ Visual context is lost

Result: JSON outperforms CSV thanks to its key-value structure, reaching 100% on the attendance sheet. It still cannot handle colors, charts, or images, and fails completely on the Gantt chart and the user guide.
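The key-value advantage can be sketched without any library: treat the first row as headers and turn every later row into a record. The helper `toRecords` and the attendance data below are hypothetical and only mimic the shape of such an export.

```typescript
type CellValue = string | number | null;
type RowRecord = { [key: string]: CellValue };

// Use the first row as keys and map each remaining row to an object.
function toRecords(rows: CellValue[][]): RowRecord[] {
  if (rows.length === 0) return [];
  const [header, ...body] = rows;
  const keys = header.map((h, i) => (h === null ? `col${i + 1}` : String(h)));
  return body.map((row) => {
    const record: RowRecord = {};
    keys.forEach((key, i) => {
      record[key] = row[i] ?? null;
    });
    return record;
  });
}

// Hypothetical attendance rows (not the article's test data).
const attendanceRows: CellValue[][] = [
  ["Name", "2025-10-14", "2025-10-15"],
  ["Mike", "work", "leave"],
  ["Anna", "work", "work"],
];

const attendanceRecords = toRecords(attendanceRows);

// A row-based question becomes a direct key lookup.
const offOn15 = attendanceRecords
  .filter((r) => r["2025-10-15"] === "leave")
  .map((r) => r["Name"]);

console.log(JSON.stringify(attendanceRecords, null, 2));
console.log(offOn15);
```

Because each value is labeled with its column header, the model does not have to count commas to work out which date a cell belongs to.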

3. HTML method (Rich Format)

Overall score: 42%

Converts the sheet into an HTML table with full style attributes (background color, text color, font, alignment). Much of the formatting information is preserved.

Implementation: ExcelJS extracts the detailed cell styles, which are then rendered as an HTML table with inline CSS

Pros

✓ Keeps colors
✓ Preserves formatting
✓ Includes font styles

Cons

✗ Complex implementation
✗ No images
✗ Larger output size

Result: Because HTML captures colors, it can handle the Gantt chart (33% success), but accuracy is unstable (dates are often off by one). Charts and images are still missing. The implementation is complex but has room for improvement.
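A minimal sketch of the rendering half of this method is shown below. It assumes the styles have already been extracted (for example with ExcelJS) into a hypothetical `StyledCell` shape; the extraction step itself is omitted, and all names here are illustrative.

```typescript
// Hypothetical shape for a cell whose style was already extracted.
interface StyledCell {
  text: string;
  bgColor?: string; // CSS color such as "#4caf50"
  bold?: boolean;
}

// Escape characters that are special in HTML text and attributes.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render one cell as <td>, carrying its style as inline CSS.
function cellToHtml(cell: StyledCell): string {
  const styles: string[] = [];
  if (cell.bgColor) styles.push(`background-color:${escapeHtml(cell.bgColor)}`);
  if (cell.bold) styles.push("font-weight:bold");
  const styleAttr = styles.length > 0 ? ` style="${styles.join(";")}"` : "";
  return `<td${styleAttr}>${escapeHtml(cell.text)}</td>`;
}

// Render a grid of styled cells as a single HTML table.
function toHtmlTable(rows: StyledCell[][]): string {
  const body = rows
    .map((row) => `<tr>${row.map(cellToHtml).join("")}</tr>`)
    .join("");
  return `<table>${body}</table>`;
}

// Hypothetical Gantt fragment: a colored cell marks the day a task runs.
const ganttRows: StyledCell[][] = [
  [{ text: "Task", bold: true }, { text: "Mon", bold: true }, { text: "Tue", bold: true }],
  [{ text: "Design" }, { text: "", bgColor: "#4caf50" }, { text: "" }],
];

const ganttHtml = toHtmlTable(ganttRows);
console.log(ganttHtml);
```

In a Gantt sheet the colored cells are often empty, so the fill color is the only signal that a task runs on a given day. That is why a format that carries color can answer questions CSV and JSON cannot.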

4. PDF Image method (Visual)

Overall score: 67%

Converts the Excel file to PDF, encodes it, and sends it to the LLM as an image. The visual appearance is preserved 100%.

Implementation: the LibreOffice CLI converts Excel → ODS → applies a page template → PDF, and the result is then base64-encoded

Pros

✓ 100% visual fidelity
✓ Includes charts
✓ Includes the original images

Cons

✗ Hard to extract detailed table data
✗ Large file size
✗ Text requires OCR

Result: PDF excels on visual content, with 100% accuracy on the user guide with screenshots and the report with charts. It is weak on detailed data tables (0% on the attendance sheet) because the LLM struggles to work out rows and columns from an image.
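The conversion step can be sketched with LibreOffice's headless mode. This is a simplified sketch under stated assumptions: it converts the workbook straight to PDF and skips the intermediate ODS and page-template steps described above, it assumes the `soffice` binary is on PATH, and the function names are illustrative.

```typescript
import { execFileSync } from "node:child_process";
import { readFileSync } from "node:fs";
import { basename, extname, join } from "node:path";

// Build the argument list for a headless LibreOffice conversion to PDF.
function buildConvertArgs(inputPath: string, outDir: string): string[] {
  return ["--headless", "--convert-to", "pdf", "--outdir", outDir, inputPath];
}

// LibreOffice writes "<input name without extension>.pdf" into outDir.
function pdfPathFor(inputPath: string, outDir: string): string {
  return join(outDir, `${basename(inputPath, extname(inputPath))}.pdf`);
}

// Convert a workbook and return the resulting PDF as a base64 string.
// Assumption: `soffice` is installed and reachable on PATH.
function excelToPdfBase64(inputPath: string, outDir: string): string {
  execFileSync("soffice", buildConvertArgs(inputPath, outDir));
  return readFileSync(pdfPathFor(inputPath, outDir)).toString("base64");
}
```

Calling `excelToPdfBase64("report.xlsx", "out")` would produce the payload to attach to the LLM request. Page size and scaling strongly affect how readable wide sheets are, which is presumably why the original pipeline applies a page template first.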

5. Hybrid method (HTML + PDF) ⭐

Overall score: 100%

Combines HTML and PDF Image by sending both to the LLM at the same time. HTML provides the table structure and colors; PDF provides the visual information (charts, images).

Implementation: no new code is needed; simply send the outputs of method 3 (HTML) and method 4 (PDF) to the LLM together

Pros

✓ Best in every scenario
✓ Handles every type of Excel file
✓ Highest accuracy
✓ Each format compensates for the other's weaknesses

Cons

✗ Most complex
✗ Largest payload
✗ Higher LLM cost

Result: Hybrid reaches 100% (24/24 questions correct) by combining the strengths of both: HTML for table structure and colors, PDF for charts and images. The LLM can cross-reference the two sources to give the most accurate answer.
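Combining the two outputs amounts to building one multi-part request. The sketch below uses a hypothetical `PromptPart` shape for illustration; real LLM SDKs define their own request types, so the field names are assumptions rather than any vendor's API.

```typescript
// Hypothetical part shapes; real LLM SDKs define their own request types.
type PromptPart =
  | { kind: "text"; text: string }
  | { kind: "file"; mimeType: string; dataBase64: string };

// Put the HTML (structure + colors) and the PDF (visuals) in one request.
function buildHybridParts(
  question: string,
  html: string,
  pdfBase64: string
): PromptPart[] {
  return [
    { kind: "text", text: "Table structure and cell colors as HTML:\n" + html },
    { kind: "file", mimeType: "application/pdf", dataBase64: pdfBase64 },
    { kind: "text", text: "Question: " + question },
  ];
}

// Placeholder inputs; in practice these come from the HTML and PDF steps.
const parts = buildHybridParts(
  "When is Testing scheduled?",
  "<table><tr><td>Testing</td></tr></table>",
  "JVBERi0="
);

console.log(parts.map((p) => p.kind).join(","));
```

The payload grows by roughly the sum of both representations, which is the cost side of the trade-off listed above.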

📊 Summary comparison table

Scenario / Question	CSV	JSON	HTML	PDF	Hybrid
Attendance: Who was off on 15/10?	✗	✓	✓	✗	✓
Attendance: When was Mike off?	△	✓	✓	✗	✓
Gantt: Design – who & when?	✗	✗	△	✓	✓
Gantt: When is Testing?	✗	✗	△	△	✓
Sales: Which region was highest in Q3?	✓	✓	✗	✓	✓
Sales: Where are the blue and red chart series furthest apart?	✗	✗	✗	✓	✓
User guide: Where is the Save button?	✗	✗	✗	✓	✓
User guide: How many buttons are in step 3?	✗	✗	✗	✓	✓
TOTAL SCORE	33%	50%	42%	67%	100%

🔍 Detailed analysis

CSV & JSON – Clear limits

Both are simple to implement but cannot handle colors, images, or charts at all. JSON is slightly better than CSV thanks to its explicit key-value structure, which makes row-based queries such as the attendance questions more accurate. With the Gantt chart and the illustrated user guide, however, both are helpless.

HTML (ExcelJS) – Half a success

This method can extract background color, font style, text alignment, and so on, so it can recognize the colors in a Gantt chart. However, accuracy is unstable (dates are often off by one) and the implementation is complex. Further work on date formats and merged cells could improve it. It still cannot handle charts or images.

PDF Image – Strong on visuals

Its biggest strength is that it preserves 100% of the Excel appearance: colors, charts, images, layout. That makes it excellent for the user guide with screenshots and the report with charts. It is weak on detailed data tables (the attendance sheet), because the LLM struggles to work out row/column relationships from an image. This may improve as LLMs get better at reading images.

Hybrid (HTML + PDF) – The winner 🏆

By sending both HTML and PDF to the LLM, this method uses the strengths of both:

HTML provides a clear table structure plus color information
PDF provides charts, images, and visual context
The LLM can cross-reference the two sources to give the most accurate answer

In this test it reached 100% (24/24 questions correct) and handled every type of Excel file well. Its only drawbacks are the complex implementation and the higher API cost caused by the larger payload.

Demo

Step 1. Preparation & setup

(Optional) create a virtual env ⇒ python -m venv venv && venv\Scripts\activate (Windows)
Install dependencies ⇒ pip install -r requirements.txt
Generate demo data ⇒ python create_sample_excel.py (creates sample_data.xlsx & sample_data_formatted.xlsx)

Step 2. Key code to know

excel_processors.py & excel_food_processors.py: define the Excel processor classes (4 basic methods + the 5 methods being compared).
compare_excel_methods.py, compare_food_methods.py: run the benchmark, print statistics, and generate the HTML report.
html_report_generator.py: builds the HTML page (summary cards, tables, charts, details, recommendations).
example_usage.py, example_food_methods.py: examples that call each processor and simulate a RAG pipeline.
run_all_comparisons.py, run.bat: wrapper scripts that run every step.

Step 3. Running the comparison

Run python compare_food_methods.py or python compare_excel_methods.py (each generates an HTML report automatically).
Open the report ⇒ python open_report.py (opens the most recent comparison_report_*.html file).
See the RAG integration example ⇒ python example_food_methods.py (chunk → embed → vector DB → query).

Step 4. Logic of the `main` functions

create_sample_excel.py: prints a message → calls two helper functions that create the Excel files (pandas + openpyxl) → reports completion.

compare_excel_methods.py: checks for the sample files → for each file: runs 4 processors, measures time/chunks/characters, prints a table plus recommendations, normalizes the data, then calls HTMLReportGenerator.

compare_food_methods.py: same as above but with 5 processors, plus a detailed description of each method before the HTML report is generated.

run_all_comparisons.py: if the sample files are missing, runs the generator script → calls the 2 comparison scripts in turn (CLI + HTML) → reminds the user to look at the docs/examples.

example_usage.py / example_food_methods.py: each function instantiates a processor, runs extract_text(), and prints the chunk count and metadata to illustrate the RAG pipeline.

open_report.py: finds comparison_report*.html, picks the newest file by mtime, and opens it in the default browser.
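The selection logic of open_report.py is small enough to sketch. The repository script is Python; the version below is only a TypeScript rendition of the same idea (filter by name pattern, take the newest by modification time), with a hypothetical `FileInfo` shape standing in for a directory listing.

```typescript
// Hypothetical stand-in for one entry of a directory listing.
interface FileInfo {
  name: string;
  mtimeMs: number; // last-modified time in milliseconds
}

// Return the name of the newest comparison_report*.html file, if any.
function pickLatestReport(files: FileInfo[]): string | undefined {
  const reports = files.filter((f) =>
    /^comparison_report.*\.html$/.test(f.name)
  );
  if (reports.length === 0) return undefined;
  return reports.reduce((latest, f) =>
    f.mtimeMs > latest.mtimeMs ? f : latest
  ).name;
}

// Illustrative listing with made-up timestamps.
const listing: FileInfo[] = [
  { name: "comparison_report_a.html", mtimeMs: 1000 },
  { name: "notes.txt", mtimeMs: 9000 },
  { name: "comparison_report_b.html", mtimeMs: 5000 },
];

const latestReport = pickLatestReport(listing);
console.log(latestReport);
```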

Git repository:
https://github.com/cuongdvscuti/compare-rag

💡 Conclusion & Recommendations

🎯 When to use which method?

CSV/JSON: quick prototypes, simple data tables without formatting
HTML: tables where colors and formatting matter, with no charts or images
PDF: dashboards, reports with charts, documents with screenshots
Hybrid: production systems that need high accuracy on complex Excel files

⚖️ Key trade-offs

Accuracy vs implementation cost vs runtime cost. Hybrid has the highest accuracy but is also the most expensive. Weigh these carefully for your specific use case.

🚀 Next steps for RAG

Decide on a chunking strategy (table-level vs row-level)
Optimize embedding generation for mixed content
Implement efficient vector search
Design prompts for each type of Excel file
Build fallback strategies for edge cases

✨ General recommendation:
Start with JSON for a prototype, move to HTML when you need colors, and upgrade to Hybrid for production if the budget allows. PDF on its own suits dashboards and manuals. Always test with your own real data, because every organization uses Excel differently!

📝 This post is based on hands-on experiments with the Gemini 2.5 Pro LLM

💬 Which method are you using for RAG with Excel? Share your experience!

Context Engineering for AI Agents: A Summary of Anthropic's Article

Posted on October 14, 2025 by Cuong Dinh


A summary of Anthropic's article on the art of managing context in AI development

🎯 What is Context Engineering?

Context Engineering is the set of strategies for curating and maintaining the optimal set of tokens (information) while an AI agent is running.

It covers managing the entire context state, including:

System prompts
Tools
Model Context Protocol (MCP)
External data
Message history
Any other information in the context window

💡 In essence: Context Engineering is the art and science of curating which information goes into the limited context window from the agent's constantly evolving universe of information.

🔄 Context Engineering vs Prompt Engineering

📝 Prompt Engineering

Focus: how the instructions are written
Scope: optimizing system prompts
Use case: single, one-shot tasks
Nature: discrete, static

Example: “Summarize this text in 3 points, focusing on the financial figures”

🧠 Context Engineering

Focus: what the model sees in its context window
Scope: the entire information state
Use case: multi-turn, long-horizon tasks
Nature: iterative, dynamic, continuous

Example: deciding whether the agent should see the whole document, the last 3 sections, or a prepared summary

🎭 Analogy: prompt engineering is “telling someone what to do”, while context engineering is “deciding which resources to give them”.

⚡ Why does Context Engineering matter more?

When AI agents carry out complex tasks over many loop iterations, they generate more and more data. That information has to be refined cyclically. Context engineering happens every time we decide what to pass to the model; it is an iterative process, not a one-off.

⚠️ Things to watch for when building AI Agents

1. 🎯 The “Goldilocks Zone” for system prompts

System prompts need to sit in the “just right” zone between two extremes:

❌ Too rigid: hardcoded, complex if-else logic → brittle agents that are hard to maintain

❌ Too vague: generic guidance that assumes shared context → no concrete signal

✅ Sweet spot: specific enough to guide behavior, yet flexible enough to give the model strong heuristics

2. 🧹 “Context Rot”: accuracy degradation

As the context window grows, model accuracy drops:

Attention limits: like people, LLMs cannot keep track of everything when overloaded. More tokens ≠ more accuracy
Context rot: the longer the context, the lower the retrieval accuracy. Adding 100 pages of logs can bury the one detail that matters
Transformer architecture: attention creates n² pairwise relationships between tokens (10K tokens = 100M relationships, 100K tokens = 10B relationships)
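The n² figures quoted above are plain arithmetic and easy to check. The helper name below is illustrative.

```typescript
// Number of token-to-token pairs that attention relates for n tokens: n * n.
function pairwiseRelations(tokenCount: number): number {
  return tokenCount * tokenCount;
}

for (const n of [10000, 100000]) {
  const relations = pairwiseRelations(n).toLocaleString("en-US");
  console.log(`${n} tokens -> ${relations} relations`);
}
```

A 10x longer context therefore means 100x more pairwise relationships, which is the quadratic growth behind the numbers above.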

💡 Mitigation: implement pagination, range selection, filtering, and truncation with sensible defaults
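As one possible shape for such a tool, the sketch below paginates a result list with a default page size so that a caller who passes no options still gets a bounded response. The names and the default of 50 items are assumptions, not values from the article.

```typescript
interface PageRequest {
  page?: number; // 1-based page index, defaults to 1
  pageSize?: number; // items per page, defaults to 50
}

interface Page<T> {
  items: T[];
  page: number;
  totalPages: number;
  hasMore: boolean;
}

// Return one bounded page of results; out-of-range pages are clamped.
function paginate<T>(all: T[], request: PageRequest = {}): Page<T> {
  const pageSize = Math.max(1, request.pageSize ?? 50);
  const totalPages = Math.max(1, Math.ceil(all.length / pageSize));
  const page = Math.min(Math.max(1, request.page ?? 1), totalPages);
  const start = (page - 1) * pageSize;
  return {
    items: all.slice(start, start + pageSize),
    page,
    totalPages,
    hasMore: page < totalPages,
  };
}

// Illustrative data: 120 log lines.
const logLines: string[] = [];
for (let i = 1; i <= 120; i++) logLines.push(`log line ${i}`);

const firstPage = paginate(logLines);
const lastPage = paginate(logLines, { page: 3 });
console.log(firstPage.items.length, firstPage.hasMore, lastPage.items.length);
```

Returning `hasMore` and `totalPages` lets the agent decide whether to ask for the next page instead of receiving everything up front.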

3. 🔧 Managing tools effectively

Keep tools distinct: do not create 2 tools that do the same job (e.g. both fetch news)
Describe them clearly: write tool descriptions as if briefing a new employee; be explicit and avoid ambiguity
Token-efficient: cap tool responses (e.g. Claude Code limits them to 25,000 tokens by default)
Good error handling: error messages must be specific and actionable, not cryptic error codes
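The last two points can be combined in one small sketch: cap a tool response at a token budget and, when it is cut, say so in a message the agent can act on. The 4-characters-per-token estimate is a rough assumption, and the function names are illustrative; only the 25,000-token default comes from the text above.

```typescript
const DEFAULT_MAX_TOKENS = 25000; // the default limit cited above
const CHARS_PER_TOKEN = 4; // rough assumption, not a real tokenizer

// Approximate the token count from the character count.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Return the text unchanged if it fits, else cut it and explain what to do.
function capToolResponse(
  text: string,
  maxTokens: number = DEFAULT_MAX_TOKENS
): string {
  if (estimateTokens(text) <= maxTokens) return text;
  const kept = text.slice(0, maxTokens * CHARS_PER_TOKEN);
  return (
    kept +
    `\n[Truncated: response exceeded ~${maxTokens} tokens. ` +
    "Narrow the query or request a specific range.]"
  );
}

const longText = new Array(101).join("x"); // 100 characters
console.log(capToolResponse(longText, 10));
```

The trailing note matters as much as the cut itself: it tells the agent why the data is incomplete and what to try next.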

4. 📊 Just-in-Time Context Retrieval

Instead of loading all the data up front, fetch it dynamically when it is needed:

Avoids overloading the context window
Reduces token costs
Prevents context poisoning (noisy information)
Mirrors how people rely on external indexing systems
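A minimal sketch of the idea: keep only lightweight references in the initial context and load a document body when the task actually needs it. The `DocRef` and `JustInTimeStore` names and the in-memory data are hypothetical.

```typescript
// Lightweight reference kept in context instead of the full document.
interface DocRef {
  id: string;
  title: string;
}

class JustInTimeStore {
  loadCount = 0;
  private bodies: { [id: string]: string };

  constructor(bodies: { [id: string]: string }) {
    this.bodies = bodies;
  }

  // Fetch a full body only when the agent asks for it.
  load(ref: DocRef): string {
    const body = this.bodies[ref.id];
    if (body === undefined) {
      throw new Error(`Unknown document id "${ref.id}". Use an id from the index.`);
    }
    this.loadCount += 1;
    return body;
  }
}

// Illustrative index and contents.
const docIndex: DocRef[] = [
  { id: "faq", title: "FAQ" },
  { id: "logs", title: "Server logs" },
];
const store = new JustInTimeStore({
  faq: "Q: How do I reset my password? A: Use the account page.",
  logs: "A very long log file...",
});

// The initial context holds titles only; bodies are fetched on demand.
const initialContext = docIndex.map((r) => `${r.id}: ${r.title}`).join("\n");
const faqBody = store.load(docIndex[0]);
console.log(initialContext);
console.log(store.loadCount);
```

Here the long log file never enters the context unless a later step asks for it.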

5. 🎨 Three strategies for long-horizon tasks

📦 Compaction

Summarize older context and keep the important information

📝 Structured Note-Taking

The agent writes its own structured notes about what it has done

🤖 Multi-Agent Architecture

Spawn small sub-agents for narrow tasks and have them return concise results
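Compaction, the first of the three, can be sketched as follows: replace everything except the most recent turns with a single summary message. The `summarize` callback is a placeholder; a real system would typically ask a model to write the summary. All names are illustrative.

```typescript
interface Message {
  role: "user" | "assistant" | "tool";
  content: string;
}

// Keep the last `keepRecent` messages and fold the rest into one summary.
function compactHistory(
  history: Message[],
  keepRecent: number,
  summarize: (older: Message[]) => string
): Message[] {
  if (history.length <= keepRecent) return history;
  const cut = history.length - keepRecent;
  const summary: Message = {
    role: "assistant",
    content: `Summary of earlier turns: ${summarize(history.slice(0, cut))}`,
  };
  return [summary, ...history.slice(cut)];
}

// Placeholder summarizer: counts messages instead of calling a model.
const countingSummarizer = (older: Message[]) =>
  `${older.length} earlier messages condensed`;

const history: Message[] = [
  { role: "user", content: "Find the failing test" },
  { role: "tool", content: "...500 lines of test output..." },
  { role: "assistant", content: "The login test fails" },
  { role: "user", content: "Fix it" },
  { role: "assistant", content: "Updated the selector" },
];

const compacted = compactHistory(history, 2, countingSummarizer);
console.log(compacted.map((m) => m.content));
```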

6. 🎯 Prioritizing context by importance

🔴 High Priority (always in context): the current task, recent tool results, critical instructions

🟡 Medium Priority (when space allows): examples, past decisions

⚪ Low Priority (on-demand): full file contents, extended documentation
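One way to apply these tiers, sketched under stated assumptions: always include high-priority items, add medium-priority items while they fit a token budget, and leave low-priority items out to be fetched on demand. The item shape, token counts, and budget are made up for illustration.

```typescript
type Priority = "high" | "medium" | "low";

interface ContextItem {
  label: string;
  priority: Priority;
  tokens: number;
}

// Select items for the context window according to the three tiers.
function selectContext(items: ContextItem[], budget: number): ContextItem[] {
  const selected: ContextItem[] = [];
  let used = 0;
  // High priority: always included.
  for (const item of items) {
    if (item.priority === "high") {
      selected.push(item);
      used += item.tokens;
    }
  }
  // Medium priority: included only while it fits the remaining budget.
  for (const item of items) {
    if (item.priority === "medium" && used + item.tokens <= budget) {
      selected.push(item);
      used += item.tokens;
    }
  }
  // Low priority: never preloaded; fetched on demand instead.
  return selected;
}

const candidates: ContextItem[] = [
  { label: "current task", priority: "high", tokens: 500 },
  { label: "recent tool result", priority: "high", tokens: 1500 },
  { label: "examples", priority: "medium", tokens: 1000 },
  { label: "past decisions", priority: "medium", tokens: 2000 },
  { label: "full file contents", priority: "low", tokens: 8000 },
];

const chosen = selectContext(candidates, 3500).map((i) => i.label);
console.log(chosen);
```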

7. 📈 Monitoring and Iteration

Track continuously:

Token usage per turn
Tool call frequency
Context window utilization
Performance at different context lengths
Recall vs precision when trimming context

💡 Process: start simple → test → identify failures → add specific guidance → remove redundancy → repeat

💡 Conclusion

Context engineering is a key skill for building effective AI agents. Where prompt engineering focuses on “how the instructions are written”, context engineering is concerned with the “whole information environment” the agent operates in.

Success does not come from finding the perfect wording; it comes from optimizing the context configuration so that the desired behavior is produced consistently.

🎯 Core principle: find the smallest set of high-signal tokens that maximizes the chance of the desired outcome. Every unnecessary word, every redundant tool description, and every piece of stale data lowers agent performance.

Revolutionizing Test Automation with Playwright Agents

Posted on October 14, 2025 by Cuong Dinh


How AI-Powered Agents are Transforming E2E Testing

📅 October 2025
⏱️ 5 min read
🏷️ Testing, AI, Automation

Imagine this: You describe what you want to test, and AI generates comprehensive test plans, writes the actual test code, and even fixes failing tests automatically. This isn’t science fiction—it’s Playwright Agents, and it’s available today.

Playwright has introduced three powerful AI agents that work together to revolutionize how we approach test automation: the Planner, Generator, and Healer. Let’s dive deep into how these agents are changing the game.

What Are Playwright Agents?

Playwright Agents are AI-powered tools that automate the entire test creation and maintenance lifecycle. They can work independently or sequentially in an agentic loop, producing comprehensive test coverage for your product without the traditional manual overhead.

🎯 Planner Agent

The Planner is your AI test strategist. It explores your application and produces detailed, human-readable test plans in Markdown format.

How It Works:

Input: A clear request (e.g., “Generate a plan for guest checkout”), a seed test that sets up your environment, and optionally a Product Requirements Document
Process: Runs the seed test to understand your app’s structure, analyzes user flows, and identifies test scenarios
Output: Structured Markdown test plans saved in specs/ directory with detailed steps and expected results

💡 Pro Tip: The Planner uses your seed test as context, so it understands your custom fixtures, authentication flows, and project setup automatically!

Example Output:

# TodoMVC Application - Basic Operations Test Plan

## Test Scenarios

### 1. Adding New Todos
#### 1.1 Add Valid Todo
**Steps:**
1. Click in the "What needs to be done?" input field
2. Type "Buy groceries"
3. Press Enter key

**Expected Results:**
- Todo appears in the list with unchecked checkbox
- Counter shows "1 item left"
- Input field is cleared and ready for next entry

⚡ Generator Agent

The Generator transforms your human-readable test plans into executable Playwright test code, verifying selectors and assertions in real-time.

Key Features:

Live Verification: Checks selectors against your actual app while generating code
Smart Assertions: Uses Playwright’s catalog of assertions for robust validation
Context Aware: Inherits setup from seed tests and maintains consistency
Best Practices: Generates code following Playwright conventions and modern patterns

Generated Test Example:

// spec: specs/basic-operations.md
// seed: tests/seed.spec.ts
import { test, expect } from '../fixtures';

test.describe('Adding New Todos', () => {
  test('Add Valid Todo', async ({ page }) => {
    // Type and submit todo
    const todoInput = page.getByRole('textbox', { 
      name: 'What needs to be done?' 
    });
    await todoInput.fill('Buy groceries');
    await todoInput.press('Enter');
    
    // Verify todo appears
    await expect(page.getByText('Buy groceries')).toBeVisible();
    await expect(page.getByText('1 item left')).toBeVisible();
    await expect(todoInput).toHaveValue('');
  });
});

🔧 Healer Agent

The Healer is your automated maintenance engineer. When tests fail, it diagnoses issues and applies fixes automatically.

Healing Process:

Step 1: Replays the failing test steps to understand the failure context
Step 2: Inspects the current UI to locate equivalent elements or alternative flows
Step 3: Suggests patches like locator updates, wait adjustments, or data corrections
Step 4: Re-runs the test until it passes or determines the functionality is actually broken

🎯 Smart Decisions: If the Healer can’t fix a test after multiple attempts, it marks the test as skipped and flags it as a potential real bug in your application!
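The control flow of this loop can be sketched in a few lines. This is a hypothetical illustration of the replay, patch, and re-run cycle described above, not Playwright's implementation; `runTest`, `applyPatch`, and `maxAttempts` are made-up names, and a real agent would work asynchronously against a browser.

```typescript
type HealOutcome = "passed" | "skipped";

// Re-run the test, patching between attempts; give up after maxAttempts patches.
function healTest(
  runTest: () => boolean,
  applyPatch: (attempt: number) => void,
  maxAttempts: number = 3
): HealOutcome {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (runTest()) return "passed";
    applyPatch(attempt); // e.g. update a locator or adjust a wait
  }
  // Final check after the last patch; otherwise skip and flag for review.
  return runTest() ? "passed" : "skipped";
}

// Simulated test that passes once the selector matches the current UI.
let selector = "#old-button";
const candidateSelectors = ["#guess", "#new-button"];
const outcome = healTest(
  () => selector === "#new-button",
  (attempt) => {
    selector = candidateSelectors[attempt - 1] ?? selector;
  }
);

// A test that never passes ends up skipped, signalling a possible real bug.
const brokenOutcome = healTest(() => false, () => {}, 2);
console.log(outcome, brokenOutcome);
```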

Common Fixes Applied:

Updating selectors when UI structure changes
Adding appropriate waits for dynamic content
Adjusting test data to match new requirements
Handling new dialog boxes or pop-ups

🤖 Working with Claude Code

Playwright Agents integrate seamlessly with Claude Code, enabling natural language test automation directly from your terminal.

Setup Process:

# Initialize Playwright Agents for Claude Code
npx playwright init-agents --loop=claude

# This generates agent definitions optimized for Claude Code
# under .github/ directory with MCP tools and instructions

1. Initialize: Run the init command to generate agent definitions
2. Plan: Ask Claude Code to use the Planner: “Use 🎭 planner to create a test plan for user registration”
3. Generate: Command the Generator: “Use 🎭 generator to create tests from specs/registration.md”
4. Heal: Let the Healer fix issues: “Use 🎭 healer to fix all failing tests”

Benefits with Claude Code:

Natural Language Control: Command agents using simple English instructions
Context Awareness: Claude Code understands your project structure and requirements
Iterative Refinement: Easily adjust and improve tests through conversation
Automatic Updates: Regenerate agents when Playwright updates to get latest features

The Complete Workflow

Here’s how the three agents work together to create comprehensive test coverage:

1. 🎯 Planner explores your app
   └─> Produces: specs/user-flows.md

2. ⚡ Generator reads the plan
   └─> Produces: tests/user-registration.spec.ts
               tests/user-login.spec.ts
               tests/checkout.spec.ts

3. Run tests: npx playwright test
   └─> Some tests fail due to UI changes

4. 🔧 Healer analyzes failures
   └─> Updates selectors automatically
   └─> Tests now pass ✅

Why This Matters

Traditional E2E testing requires significant manual effort:

Writing detailed test plans takes hours
Converting plans to code is tedious and error-prone
Maintaining tests as UI changes is a constant battle
New team members need extensive training

Playwright Agents eliminate these pain points by:

✅ Generating plans in minutes instead of hours
✅ Producing production-ready test code automatically
✅ Self-healing tests that adapt to UI changes
✅ Making test automation accessible to everyone

Demo source on GitHub: https://github.com/cuongdvscuti/agent-playwright

Ready to Transform Your Testing?

Playwright Agents represent a fundamental shift in how we approach test automation. By combining AI with Playwright’s powerful testing capabilities, you can achieve comprehensive test coverage with a fraction of the traditional effort.

Whether you’re starting a new project or maintaining an existing test suite, Playwright Agents can help you move faster, catch more bugs, and spend less time on maintenance.


OpenAI AgentKit vs Dify

Posted on October 9, 2025 (updated October 14, 2025) by Cuong Dinh


A Comprehensive Technical Comparison of Two Leading AI Agent Development Platforms

Last Updated: October 2025 | DevDay 2025 Analysis

Executive Summary: OpenAI AgentKit and Dify represent two distinct approaches to AI agent development. AgentKit, announced at OpenAI’s DevDay 2025, offers a comprehensive, proprietary toolkit designed to streamline agent creation within the OpenAI ecosystem. Dify, an open-source platform, provides extensive flexibility with multi-provider LLM support and full infrastructure control. This guide examines both platforms in depth to help you make an informed decision.

🚀 Platform Overview

OpenAI AgentKit

Launched October 2025 at DevDay, AgentKit is OpenAI’s complete toolkit for building production-ready AI agents with minimal friction.

Proprietary platform by OpenAI
Cloud-based deployment
Deep OpenAI ecosystem integration
Enterprise-grade security built-in
Visual drag-and-drop builder
Rapid prototyping (agents in hours, not months)

Dify

Open-source LLMOps platform with 180,000+ developers, supporting comprehensive AI application development with full control.

100% open-source platform
Self-hosted or cloud deployment
Multi-provider LLM support (GPT, Claude, Llama, etc.)
Complete data sovereignty
Extensive RAG capabilities
Active community of 180,000+ developers

🎯 OpenAI AgentKit – Core Features

🎨 Agent Builder

A visual canvas for creating and versioning multi-agent workflows using drag-and-drop functionality. Developers can design complex agent interactions without extensive coding.

Visual workflow designer
Version control for agent workflows
Multi-agent orchestration
Real-time collaboration
70% faster iteration cycles reported

💬 ChatKit

Embeddable, customizable chat interfaces that can be integrated directly into your applications with your own branding and workflows.

White-label chat interfaces
Custom branding options
Pre-built UI components
Seamless product integration
Mobile-responsive design

🔌 Connector Registry

Centralized admin dashboard for managing secure connections between agents and both internal tools and third-party systems.

Pre-built connectors: Dropbox, Google Drive, SharePoint, Teams
Secure data access management
Admin control panel
Third-party MCP server support
Enterprise-grade security controls

📊 Evaluation & Optimization

Comprehensive tools for measuring and improving agent performance with automated testing and optimization.

Datasets for component testing
End-to-end trace grading
Automated prompt optimization
Third-party model evaluation support
Custom grading criteria

🔒 Security & Guardrails

Built-in security layers protecting against data leakage, jailbreaks, and unintended behaviors.

PII leak detection and prevention
Jailbreak protection
Content filtering
OpenAI’s standard security measures
Compliance-ready infrastructure

⚡ Performance

Optimized for rapid development and deployment with impressive benchmarks demonstrated at DevDay 2025.

Live demo: 2 agents built in <8 minutes
Hours to deploy vs months traditionally
Built on Responses API
Integration with GPT-5 Codex
Dynamic thinking time adjustment

🎯 Real-World Success Story

Ramp (Fintech): Built a complete procurement agent in just a few hours instead of months using AgentKit. Their teams reported a 70% reduction in iteration cycles, launching agents in two sprints rather than two quarters. Agent Builder enabled seamless collaboration between product, legal, and engineering teams on the same visual canvas.

🛠️ Dify – Core Features

🎯 Visual Workflow Builder

Intuitive canvas for building and testing AI workflows with comprehensive model support and visual orchestration.

Drag-and-drop workflow design
Support for 100+ LLM models
Real-time debugging with node inspection
Variable tracking during execution
Instant step-by-step testing

🧠 Comprehensive Model Support

Seamless integration with hundreds of proprietary and open-source LLMs from multiple providers.

OpenAI: GPT-3.5, GPT-4, GPT-5
Anthropic: Claude models
Open-source: Llama3, Mistral, Qwen
Self-hosted model support
Any OpenAI API-compatible model

📚 RAG Pipeline

Extensive Retrieval-Augmented Generation capabilities covering the entire document lifecycle.

Document ingestion from multiple formats
PDF, PPT, Word extraction
Vector database integration
Advanced retrieval strategies
Metadata-based filtering for security

🤖 Agent Node System

Flexible agent architecture with customizable strategies for autonomous decision-making within workflows.

Plug-in “Agent Strategies”
Autonomous task handling
Custom tool integration
Multi-agent collaboration
Dynamic workflow adaptation

🎛️ Prompt Engineering IDE

Intuitive interface for crafting, testing, and comparing prompts across different models.

Visual prompt editor
Model performance comparison
A/B testing capabilities
Text-to-speech integration
Template management

📊 Observability & Operations

Full visibility into AI application performance with comprehensive logging and monitoring.

Complete execution logs
Cost tracking per execution
Conversation auditing
Performance metrics dashboard
Version control for workflows

🏢 Enterprise Features

Production-ready infrastructure with enterprise-grade security and scalability.

Self-hosted deployment options
AWS Marketplace integration
Custom branding and white-labeling
SSO and access control
Multi-tenant architecture

🌐 Open Source Advantage

Community-driven development with transparent roadmap and extensive customization options.

180,000+ developer community
34,800+ GitHub stars
Regular feature updates
Community plugins and extensions
Full code access and customization

🎯 Real-World Success Story

Volvo Cars: Uses Dify for rapid AI validation and deployment, enabling teams to quickly design and deploy complex NLP pipelines. This approach significantly improved assessment product quality while reducing both cost and time to market. Dify’s democratized AI development allows even non-technical team members to contribute to AI initiatives.

⚖️ Detailed Comparison

Feature / Aspect	OpenAI AgentKit	Dify
Launch Date	October 2025 (DevDay 2025)	May 2023 (Established platform)
Source Model	Proprietary, closed-source	100% open-source (GitHub)
Ecosystem	OpenAI-exclusive (GPT models)	Multi-provider (100+ LLMs from dozens of providers)
Deployment Options	Cloud-based on OpenAI platform only	Self-hosted, cloud, or hybrid deployment
Data Sovereignty	Managed by OpenAI infrastructure	Full control – host anywhere, complete data ownership
Model Support	OpenAI models (GPT-3.5, GPT-4, GPT-5, Codex)	GPT, Claude, Llama3, Mistral, Qwen, self-hosted models, any OpenAI-compatible API
Visual Builder	✓ Agent Builder (drag-and-drop, currently in beta)	✓ Visual workflow canvas (production-ready)
RAG Capabilities	Limited documentation available	Extensive: document ingestion, retrieval, PDF/PPT/Word extraction, vector databases, metadata filtering
Chat Interface	ChatKit (embeddable, customizable)	Built-in chat UI with full customization
Connectors	Connector Registry (Dropbox, Drive, SharePoint, Teams, MCP servers) – Limited beta	Extensive integration options, custom API connections, community plugins
Evaluation Tools	Datasets, trace grading, automated prompt optimization, custom graders	Full observability, debugging tools, version control, execution logs
Security Features	PII detection, jailbreak protection, OpenAI security standards, guardrails	Self-managed security, SSO, access control, custom security policies
Community Size	New (launched Oct 2025), growing adoption	180,000+ developers, 59,000+ end users, 34,800+ GitHub stars
Pricing Model	Included with standard API pricing, enterprise features for some components	Free tier, Professional ($59/month), Team ($159/month), Enterprise (custom)
Development Speed	Hours to build agents (demo showed <8 minutes for 2 agents)	Rapid prototyping, established workflow templates
Customization	Within OpenAI ecosystem constraints	Unlimited – full code access, custom modifications possible
Learning Curve	Low – designed for ease of use	Low to medium – extensive documentation and community support
Best For	OpenAI-committed teams, rapid prototyping, enterprise users wanting managed solution	Multi-provider needs, data sovereignty requirements, open-source advocates, full customization
Production Readiness	ChatKit & Evals: generally available; Agent Builder: beta; Connector Registry: limited beta	Fully production-ready, battle-tested by 180,000+ developers
API Integration	Built on OpenAI Responses API	RESTful API, webhook support, extensive integration options

✅ Pros & Cons Analysis

OpenAI AgentKit

Advantages

Rapid Development: Build functional agents in hours rather than months with visual tools
Seamless Integration: Deep integration with OpenAI ecosystem and GPT models
Enterprise Security: Built-in guardrails, PII protection, and OpenAI security standards
Managed Infrastructure: No DevOps burden, fully managed by OpenAI
Cutting-Edge Models: Immediate access to latest GPT models and features
Live Demo Success: Proven capability (2 agents in <8 minutes)
Unified Toolkit: All necessary tools in one platform
Evaluation Tools: Comprehensive testing and optimization features

Limitations

Vendor Lock-in: Exclusively tied to OpenAI ecosystem
Limited Model Choice: Cannot use Claude, Llama, or other non-OpenAI models
New Platform: Just launched (Oct 2025), limited production track record
Beta Features: Key components still in beta (Agent Builder, Connector Registry)
No Data Sovereignty: Data managed by OpenAI, not self-hostable
Closed Source: Cannot inspect or modify underlying code
Pricing Uncertainty: Costs tied to OpenAI API pricing model
Limited Customization: Constrained by platform design decisions

Dify

Advantages

Open Source Freedom: Full code access, unlimited customization, no vendor lock-in
Multi-Provider Support: Use any LLM – GPT, Claude, Llama, Mistral, or self-hosted models
Data Sovereignty: Complete control over data, self-hosting options
Extensive RAG: Comprehensive document processing and retrieval capabilities
Large Community: 180,000+ developers, active development, extensive resources
Production Proven: Battle-tested since 2023, used by major companies like Volvo
Flexible Deployment: Cloud, self-hosted, or hybrid options
Cost Control: Use cheaper models or self-hosted options, transparent pricing
No Vendor Dependencies: Switch providers or models without platform changes

Limitations

DevOps Responsibility: Self-hosting requires infrastructure management
Learning Curve: More complex than managed solutions for beginners
No Native OpenAI Features: Latest OpenAI-specific features may lag
Security Setup: Must configure own security measures for self-hosted
Community Support: Relies on community vs dedicated support team
Integration Effort: May require more work to integrate custom tools
Scalability Management: Need to handle scaling for high-traffic scenarios

💡 Use Cases & Applications

OpenAI AgentKit – Ideal Use Cases

🏢 Enterprise Rapid Prototyping

Large organizations already invested in OpenAI wanting to quickly deploy AI agents across multiple departments without heavy technical overhead.

🚀 Startup MVPs

Startups needing to build and iterate on AI-powered products rapidly with minimal infrastructure investment and maximum speed to market.

💼 Business Process Automation

Companies automating internal workflows like procurement, customer support, or data analysis using OpenAI’s latest models.

🔬 Research & Development

Teams exploring cutting-edge AI capabilities with OpenAI’s latest models and wanting managed infrastructure for experiments.

Dify – Ideal Use Cases

🏦 Regulated Industries

Banking, healthcare, or government organizations requiring full data sovereignty, self-hosting, and complete audit trails.

🌐 Multi-Model Applications

Projects needing to leverage multiple LLM providers for cost optimization, feature diversity, or redundancy.

🛠️ Custom AI Solutions

Development teams building highly customized AI applications requiring deep integration with existing systems and workflows.

📚 Knowledge Management

Organizations building comprehensive RAG systems with complex document processing, vector search, and metadata filtering needs.

🎓 Educational & Research

Academic institutions and researchers needing transparent, customizable AI systems with full control over model selection and data.

🌍 Global Operations

International companies needing to deploy AI across multiple regions with varying data residency requirements.

💰 Pricing Comparison

OpenAI AgentKit Pricing

Model: Included with standard OpenAI API pricing. You pay for:

API calls to GPT models (token-based pricing)
Standard OpenAI usage fees apply
Enterprise features may have additional costs
Connector Registry requires Global Admin Console (available for Enterprise/Edu)

Advantage: No separate platform fee, but tied to OpenAI’s pricing

Consideration: Costs can scale significantly with high usage; no control over rate changes

Dify Pricing

Sandbox (Free):

200 OpenAI calls included
Core features access
Ideal for testing and small projects

Professional ($59/month):

For independent developers & small teams
Production AI applications
Increased resources and team collaboration

Team ($159/month):

Medium-sized teams
Higher throughput requirements
Advanced collaboration features

Enterprise (Custom):

Custom deployment options
Dedicated support
SLA guarantees
On-premise or private cloud hosting

Self-Hosted (Free):

Deploy on your own infrastructure at no platform cost
Only pay for your chosen LLM provider (can use cheaper options)
Complete cost control

🎯 Decision Framework: Which Platform Should You Choose?

Choose OpenAI AgentKit If:

You’re already heavily invested in the OpenAI ecosystem
You want the fastest possible time-to-market with minimal setup
Your use case doesn’t require data to stay on-premise
You prefer managed infrastructure over self-hosting
You need the latest GPT models immediately upon release
Your team lacks DevOps resources for infrastructure management
Budget allows for OpenAI’s premium pricing model
You value tight integration over flexibility
Compliance allows cloud-based AI processing
You’re comfortable with platform limitations for ease of use

Choose Dify If:

You need to use multiple LLM providers or specific models
Data sovereignty and privacy are critical requirements
You want complete control over your AI infrastructure
Your organization requires self-hosted solutions
Cost optimization through model flexibility is important
You have DevOps capability for self-hosting
You need extensive RAG and document processing capabilities
Open-source transparency is a requirement
You want to avoid vendor lock-in
Your use case requires deep customization
You’re in a regulated industry (banking, healthcare, government)
You prefer community-driven development

🔮 Future Outlook & Roadmap

OpenAI AgentKit Roadmap

OpenAI plans to add standalone Workflows API and agent deployment options to ChatGPT. Expect rapid iteration and new features as the platform matures beyond beta stage.

Dify Development

Active open-source development with regular releases. Community-driven feature requests and transparent roadmap on GitHub. Continuous improvements to RAG, workflows, and integrations.

Market Competition

Both platforms face competition from LangChain, n8n, Zapier Central, and others. The AI agent space is rapidly evolving with new players entering regularly.

Convergence Trends

Expect features to converge over time as both platforms mature. Visual builders, multi-agent orchestration, and evaluation tools are becoming industry standards.

🎓 Final Recommendation

For most organizations: The choice depends on your priorities. If you value speed, simplicity, and are committed to OpenAI, AgentKit offers the fastest path to production agents. If you need flexibility, data control, and multi-provider support, Dify provides superior long-term value despite requiring more initial setup.

Hybrid Approach: Some organizations use AgentKit for rapid prototyping and Dify for production deployments where data sovereignty and model flexibility matter. This combines the speed of AgentKit with the control of Dify.

Last Updated: October 2025 | Based on OpenAI DevDay 2025 announcements

Sources: Official OpenAI documentation, Dify GitHub repository, TechCrunch, VentureBeat, Medium technical analyses

This comparison is for informational purposes. Features and pricing subject to change. Always consult official documentation for the most current information.

Building Intelligent AI Agents with OpenAI: From Raw API to Official Agents SDK

Introduction

Artificial Intelligence agents are revolutionizing how we interact with technology. Unlike traditional chatbots that simply respond to queries, AI agents can understand context, make decisions, and use tools to accomplish complex tasks autonomously. This project demonstrates how to build progressively sophisticated AI agents using both the OpenAI API and the official OpenAI Agents SDK.

Whether you’re a beginner exploring AI development or an experienced developer looking to integrate intelligent agents into your applications, this sample project provides practical, hands-on examples comparing two approaches: custom implementation using raw OpenAI API and using the official Agents SDK.

What is an AI Agent?

An AI agent is an autonomous system powered by a language model that can:

Understand natural language instructions
Make intelligent decisions about which tools to use
Execute functions to interact with external systems
Reason about results and provide meaningful responses
Collaborate with other agents to solve complex problems

Think of it as giving your AI assistant a toolbox. Instead of just talking, it can now check the weather, perform calculations, search databases, and much more.

Project Overview

The OpenAI AgentKit Sample Project demonstrates six levels of AI agent sophistication across two implementation approaches:

OpenAI API Approach (Custom Implementation)

1. Basic Agent

A foundational implementation showing how to set up OpenAI’s Chat Completions API.

What you’ll learn:

Setting up the OpenAI client
Configuring system and user messages
Managing model parameters (temperature, tokens)
Handling API responses

2. Agent with Tools

Introduces function calling where the agent decides when and how to use specific tools.

Available Tools:

Weather Tool: Retrieves current weather information
Calculator Tool: Performs mathematical operations
Time Tool: Gets current date and time across timezones

3. Advanced Agent

Production-ready example with sophisticated features including detailed logging, error handling, and multiple complex tools.

Enhanced Capabilities:

Wikipedia search integration
Sentiment analysis
Timezone-aware time retrieval
Comprehensive error handling
Performance statistics and logging

OpenAI Agents SDK Approach (Official Framework)

4. SDK Basic Agent

Simple agent using the official OpenAI Agents SDK with automatic agent loop and simplified API.

Key Features:

Uses Agent and run from @openai/agents
Automatic conversation management
Clean, minimal code

5. SDK Agent with Tools

Agent with tools using proper SDK conventions and automatic schema generation.

Tools:

Weather lookup with Zod validation
Mathematical calculations
Time zone support

Key Features:

Tools defined with tool() helper
Zod-powered parameter validation
Automatic schema generation from TypeScript types

6. SDK Multi-Agent System

Sophisticated multi-agent system with specialized agents and handoffs.

Agents:

WeatherExpert: Handles weather queries
MathExpert: Performs calculations
KnowledgeExpert: Searches knowledge base
Coordinator: Routes requests to specialists

Technology Stack

OpenAI API: GPT-4o-mini model for intelligent responses
@openai/agents: Official OpenAI Agents SDK
Zod: Runtime type validation and schema generation
Node.js: Runtime environment (22+ required for SDK)
Express.js: Web server framework
dotenv: Environment variable management

Getting Started

Prerequisites

Node.js 22 or higher (required for OpenAI Agents SDK)
OpenAI API key (get one at https://platform.openai.com/api-keys)

Installation

1. Clone or download the project

cd openai-agentkit-sample

2. Install dependencies

npm install

This will install:

openai – Raw OpenAI API client
@openai/agents – Official Agents SDK
zod – Schema validation
Other dependencies

3. Configure environment variables

cp .env.example .env

Edit .env and add your OpenAI API key:

OPENAI_API_KEY=sk-your-actual-api-key-here

Running the Examples

Start the web server:

npm start

Open http://localhost:3000 in your browser

Run OpenAI API examples:

npm run example:basic      # Basic agent
npm run example:tools      # Agent with tools
npm run example:advanced   # Advanced agent

Run OpenAI Agents SDK examples:

npm run example:sdk-basic  # SDK basic agent
npm run example:sdk-tools  # SDK with tools
npm run example:sdk-multi  # Multi-agent system

Comparing the Two Approaches

OpenAI API (Custom Implementation)

Pros:

Full control over every aspect
Deep understanding of agent mechanics
Maximum flexibility
No framework constraints

Cons:

More code to write and maintain
Manual agent loop implementation
Manual tool schema definition
More error-prone

Example – Tool Definition (Raw API):

const weatherTool = {
  type: 'function',
  function: {
    name: 'get_weather',
    description: 'Get the current weather in a given location',
    parameters: {
      type: 'object',
      properties: {
        location: {
          type: 'string',
          description: 'The city and country',
        },
        unit: {
          type: 'string',
          enum: ['celsius', 'fahrenheit'],
        },
      },
      required: ['location'],
    },
  },
};
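// Manual tool execution: map the tool name requested by the model to local code
function executeFunction(functionName, args) {
  switch (functionName) {
    case 'get_weather':
      return getWeather(args.location, args.unit);
    // ... more cases
    default:
      // Surface unknown tool names instead of silently returning undefined
      throw new Error(`Unknown function: ${functionName}`);
  }
}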

OpenAI Agents SDK (Official Framework)

Pros:

Less code, faster development
Automatic agent loop
Automatic schema generation from Zod
Built-in handoffs for multi-agent systems
Production-ready patterns
Type-safe with TypeScript

Cons:

Less control over internals
Framework learning curve
Tied to SDK conventions
Node.js 22+ requirement

Example – Tool Definition (Agents SDK):

import { Agent, tool } from '@openai/agents';
import { z } from 'zod';
const getWeatherTool = tool({
  name: 'get_weather',
  description: 'Get the current weather for a given location',
  parameters: z.object({
    location: z.string().describe('The city and country'),
    unit: z.enum(['celsius', 'fahrenheit']).optional().default('celsius'),
  }),
  async execute({ location, unit }) {
    // Tool implementation (placeholder values for illustration)
    return JSON.stringify({ temperature: 22, condition: 'Sunny' });
  },
});
// Automatic execution - no switch statement needed!
const agent = new Agent({
  name: 'WeatherAgent',
  tools: [getWeatherTool],
});

Key Concepts

Function Calling / Tool Usage

Both approaches support function calling, where the AI model can “call” functions you define:

Define tool: Describe function, parameters, and purpose
Model decides: Model automatically decides when to use tools
Execute tool: Your code executes the function
Return result: Send result back to model
Final response: Model uses result to create answer
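
The five steps above can be sketched as a small loop. This is a minimal, self-contained illustration rather than the project's actual code: `fakeModel` stands in for a real Chat Completions call, and the `ToolCall`, `ModelTurn`, `ToolFn`, and `runLoop` names are hypothetical and not part of any OpenAI library.

```typescript
// Hypothetical shapes for illustration; a real API response carries more fields.
type ToolCall = { name: string; arguments: string }; // arguments arrive as a JSON string
type ModelTurn =
  | { kind: 'tool_call'; call: ToolCall }
  | { kind: 'final'; text: string };
type ToolFn = (args: Record<string, unknown>) => string;

// Step 1: define tools. The registry maps a tool name to its implementation.
const getWeather: ToolFn = (args) =>
  JSON.stringify({ location: String(args.location), temperature: 22, condition: 'Sunny' });
const tools = new Map<string, ToolFn>([['get_weather', getWeather]]);

// Step 2: the model decides. This stub requests the weather tool once,
// then answers using the tool result it finds in the history.
function fakeModel(history: string[]): ModelTurn {
  const toolResult = history.find((m) => m.startsWith('tool:'));
  if (!toolResult) {
    return {
      kind: 'tool_call',
      call: { name: 'get_weather', arguments: JSON.stringify({ location: 'Hanoi, Vietnam' }) },
    };
  }
  const data = JSON.parse(toolResult.slice('tool:'.length)) as {
    temperature: number;
    condition: string;
  };
  return { kind: 'final', text: `It is ${data.temperature}°C and ${data.condition}.` };
}

function runLoop(userMessage: string, maxTurns = 5): string {
  const history: string[] = [`user:${userMessage}`];
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = fakeModel(history);
    if (reply.kind === 'final') return reply.text; // Step 5: final response
    const fn = tools.get(reply.call.name);
    // Step 3: execute the tool, or report an unknown tool back to the model.
    const result = fn
      ? fn(JSON.parse(reply.call.arguments) as Record<string, unknown>)
      : JSON.stringify({ error: `Unknown tool: ${reply.call.name}` });
    history.push(`tool:${result}`); // Step 4: return the result to the model
  }
  throw new Error('Agent loop did not finish within maxTurns');
}

console.log(runLoop("What's the weather in Hanoi?"));
```

The `maxTurns` guard is a design choice worth keeping in a real loop as well, since a model that keeps requesting tools would otherwise never terminate.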

OpenAI Agents SDK Advantages

The Agents SDK provides several powerful features:

Automatic Schema Generation:

// SDK automatically generates JSON schema from Zod!
z.object({
  city: z.string(),
  unit: z.enum(['celsius', 'fahrenheit']).optional(),
})

Agent Handoffs:

const coordinator = new Agent({
  name: 'Coordinator',
  handoffs: [weatherAgent, mathAgent, knowledgeAgent],
});
// Coordinator can automatically route to specialists

Built-in Agent Loop:

// SDK handles the entire conversation loop
const result = await run(agent, "What's the weather in Hanoi?");
console.log(result.finalOutput);

Practical Use Cases

Customer Service Automation

Answer questions using knowledge bases
Check order status
Process refunds
Escalate to human agents
Route to specialized agents

Personal Assistant Applications

Schedule management
Email drafting
Research and information gathering
Task automation
Multi-task coordination

Data Analysis Tools

Query databases
Generate reports
Perform calculations
Visualize insights
Collaborate across data sources

Best Practices

1. Clear Tool Descriptions

Make function descriptions detailed and specific:

Good:
description: 'Get the current weather including temperature, conditions, and humidity for a specific city and country'
Bad:
description: 'Get weather'

2. Use Zod for Validation (SDK)

parameters: z.object({
  email: z.string().email(),
  age: z.number().min(0).max(120),
  role: z.enum(['admin', 'user', 'guest']),
})

3. Error Handling

Always implement comprehensive error handling:

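async execute({ city }) {
  try {
    const result = await weatherAPI.get(city);
    return JSON.stringify(result);
  } catch (error) {
    // A thrown value is not guaranteed to be an Error, so narrow before reading .message
    const message = error instanceof Error ? error.message : String(error);
    // Return the failure as data so the model can explain it or retry
    return JSON.stringify({ error: message });
  }
}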

4. Tool Modularity

Create small, focused tools rather than monolithic ones:

// Good - specific tools
const getWeatherTool = tool({...});
const getForecastTool = tool({...});
// Bad - one giant tool
const weatherAndForecastAndHistoryTool = tool({...});

Multi-Agent Patterns

The Agents SDK excels at multi-agent workflows:

Specialist Pattern

const weatherExpert = new Agent({
  name: 'WeatherExpert',
  tools: [getWeatherTool],
});
const mathExpert = new Agent({
  name: 'MathExpert',
  tools: [calculateTool],
});
const coordinator = new Agent({
  name: 'Coordinator',
  handoffs: [weatherExpert, mathExpert],
});

Hierarchical Delegation

Coordinator receives user request
Analyzes which specialist is needed
Hands off to appropriate agent
Aggregates results
Returns unified response
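
The delegation flow above can be illustrated without any SDK. In the sketch below, the `Specialist` type and the `canHandle` keyword checks are hypothetical stand-ins for the routing decision that the model makes in the real Agents SDK; the point is only to show the receive, select, hand off, aggregate, respond sequence.

```typescript
// Hypothetical specialist shape: `canHandle` stands in for the model's routing decision.
type Specialist = {
  name: string;
  canHandle: (request: string) => boolean;
  handle: (request: string) => string;
};

const weatherExpert: Specialist = {
  name: 'WeatherExpert',
  canHandle: (r) => /weather|temperature/i.test(r),
  handle: () => 'Sunny, 22°C', // placeholder data
};

const mathExpert: Specialist = {
  name: 'MathExpert',
  canHandle: (r) => /\d+\s*[-+*\/]\s*\d+/.test(r),
  handle: (r) => {
    const m = r.match(/(\d+)\s*([-+*\/])\s*(\d+)/);
    if (!m) return 'No expression found';
    const a = Number(m[1]);
    const op = m[2] ?? '';
    const b = Number(m[3]);
    const ops: Record<string, number> = { '+': a + b, '-': a - b, '*': a * b, '/': a / b };
    return String(ops[op]);
  },
};

// Coordinator: receive the request, pick specialists, hand off, aggregate, respond.
function coordinate(request: string, specialists: Specialist[]): string {
  const chosen = specialists.filter((s) => s.canHandle(request));
  if (chosen.length === 0) return 'Coordinator: no specialist available for this request.';
  return chosen.map((s) => `${s.name}: ${s.handle(request)}`).join(' | ');
}

console.log(
  coordinate('What is the weather in Hanoi, and what is 12 * 7?', [weatherExpert, mathExpert]),
);
```

Keyword matching is brittle and is used here only to keep the example runnable offline; with the SDK, the coordinator's instructions and the specialists' descriptions guide the routing instead.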

API Endpoints

The project includes a web server with both approaches:

Raw API:

POST /api/chat/basic – Basic chat completion
POST /api/chat/with-tools – Manual tool handling

Agents SDK:

POST /api/chat/agents-sdk – SDK-powered agent with tools

When to Use Which Approach?

Use OpenAI API (Custom Implementation) When:

You need full control and customization
You want to learn how agents work at a low level
You are implementing highly custom logic
You are working within an existing codebase
Framework constraints are a concern

Use OpenAI Agents SDK When:

You are building production applications quickly
You need multi-agent workflows
You want type-safe tool definitions
You prefer less boilerplate code
Following best practices matters
Team collaboration is important

Performance Considerations

Model Selection: GPT-4o-mini offers great balance of capability and cost
Caching: Consider caching frequent queries
Async Operations: Use Promise.all() for parallel tool execution
Response Streaming: Implement for better UX
Rate Limiting: Monitor and manage API rate limits
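
Two of the points above, caching and parallel execution with Promise.all(), can be combined in a few lines. The sketch below is an assumption-laden illustration: `AsyncTool`, `withCache`, `slowWeather`, and `lookupAll` are hypothetical names, and the tool is simulated with a timer instead of a real API call.

```typescript
// Hypothetical tool signature: takes a string input and resolves to a string result.
type AsyncTool = (input: string) => Promise<string>;

// Wrap a tool so repeated calls with the same input reuse the earlier promise.
function withCache(tool: AsyncTool): AsyncTool {
  const cache = new Map<string, Promise<string>>();
  return (input) => {
    const hit = cache.get(input);
    if (hit) return hit;
    const pending = tool(input);
    cache.set(input, pending);
    return pending;
  };
}

let weatherCalls = 0; // counts how often the underlying (simulated) tool runs

// Simulated slow tool; a real one would call an external API.
const slowWeather: AsyncTool = async (city) => {
  weatherCalls++;
  await new Promise<void>((resolve) => setTimeout(resolve, 50));
  return `${city}: 22°C`;
};

const cachedWeather = withCache(slowWeather);

// Run independent tool calls concurrently instead of one after another.
async function lookupAll(cities: string[]): Promise<string[]> {
  return Promise.all(cities.map((city) => cachedWeather(city)));
}

lookupAll(['Hanoi', 'Tokyo', 'Hanoi']).then((results) => {
  console.log(results, `underlying calls: ${weatherCalls}`);
});
```

Caching the promise rather than the resolved value means concurrent duplicate requests share one call. The trade-off is that a rejected promise stays cached too, so a production version would likely evict failed entries and add an expiry time.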

Troubleshooting

Issue: “Invalid API Key”

Verify .env file contains correct API key
Check key is active in OpenAI dashboard
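
A quick local check at startup can catch the most common causes before any request is sent. The helper below is a hypothetical sketch, not part of the project: it only inspects the value loaded from .env, and the `sk-` prefix test is an assumption based on the placeholder shown in the setup section. It cannot tell whether a key is active.

```typescript
// Hypothetical helper; it validates the shape of the configured key only.
type KeyCheck = { ok: true } | { ok: false; reason: string };

function checkApiKey(env: Record<string, string | undefined>): KeyCheck {
  const key = env['OPENAI_API_KEY']?.trim();
  if (!key) {
    return { ok: false, reason: 'OPENAI_API_KEY is missing or empty' };
  }
  if (key === 'sk-your-actual-api-key-here') {
    return { ok: false, reason: 'OPENAI_API_KEY still holds the placeholder value' };
  }
  if (!key.startsWith('sk-')) {
    // Assumption: keys start with "sk-", as in the .env example
    return { ok: false, reason: 'OPENAI_API_KEY does not start with "sk-"' };
  }
  return { ok: true };
}

console.log(checkApiKey({ OPENAI_API_KEY: 'sk-your-actual-api-key-here' }));
```

In a Node.js app this would typically be called as `checkApiKey(process.env)` after dotenv has loaded the file.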

Issue: Tools Not Being Called

Ensure tool descriptions are clear and specific
Try more explicit user prompts
Check parameter schemas are correctly formatted

Issue: “Unsupported tool type”

Use tool() helper with Agents SDK
Ensure Zod schemas are properly defined
Check you’re importing from @openai/agents

Resources

Comparison Table

Feature	Raw OpenAI API	Agents SDK
Code Lines	~200 for basic agent with tools	~50 for same functionality
Schema Definition	Manual JSON	Automatic from Zod
Agent Loop	Manual implementation	Built-in
Type Safety	Limited	Full TypeScript support
Multi-Agent	Manual implementation	Built-in handoffs
Learning Curve	Steep	Moderate
Flexibility	Maximum	High
Production Ready	Requires work	Out-of-the-box
Node.js Requirement	18+	22+

Conclusion

This project demonstrates two powerful approaches to building AI agents:

Raw OpenAI API: Provides deep understanding and maximum control. Perfect for learning and custom implementations.
OpenAI Agents SDK: Offers productivity, type safety, and production-ready patterns. Ideal for building real applications quickly.

Both approaches have their place. Start with the SDK for production work, but understanding the raw API approach gives you insights into how agents actually work.

Next Steps

Experiment: Run all six examples
Compare: Notice the differences in code complexity
Customize: Create your own tools
Integrate: Connect real APIs
Deploy: Move to production with proper error handling
Scale: Implement multi-agent systems for complex tasks

Contributing

Contributions, suggestions, and improvements are welcome! Feel free to:

Report issues
Submit pull requests
Share your custom tools
Suggest new examples

Demo

GitHub: https://github.com/cuongdvscuti/openai-agentkit-scuti

License

MIT License – Feel free to use this project for learning, development, or commercial purposes.

Ready to build your own AI agents?
Clone the repository, follow the setup instructions, and start with whichever approach fits your needs. The future of intelligent automation is in your hands!

OpenAI DevDay 2025: Revolutionizing AI Application Development

Posted on October 9, 2025 (updated October 13, 2025) by Cuong Dinh

OpenAI DevDay 2025: New Breakthroughs in the World of AI

📅 Event: October 6, 2025 in San Francisco

OpenAI DevDay 2025 brought a series of notable AI announcements, with more than 1,500 developers attending in person and tens of thousands watching online. CEO Sam Altman announced a set of new features that change how we build and deploy AI applications.

800M+

Weekly ChatGPT users

4M+

Developers

6B+

Tokens per minute through the API

🎯 I. New Features and Services

1. ChatGPT Apps SDK – Interactive Apps Inside ChatGPT

Apps in ChatGPT: Users can chat with an app directly inside the ChatGPT interface, without switching tabs or opening another application
Apps SDK: A new developer toolkit built on the Model Context Protocol (MCP), an open standard, that lets developers build interactive apps that run inside ChatGPT
Launch partners: Coursera, Canva, Zillow, Figma, Spotify, Expedia, Booking.com
Key feature: ChatGPT automatically suggests relevant apps during a conversation; for example, when you talk about planning a trip, it suggests Expedia
Monetization: A new commerce protocol that enables payment directly inside ChatGPT is coming soon

2. AgentKit – A Professional Toolkit for Building AI Agents

Agent Builder: A visual drag-and-drop interface for designing AI agent workflows without complex code
ChatKit: An embeddable chat interface for your app or website, with support for streaming responses, thread management, and displaying the model’s reasoning process
Connector Registry: A central dashboard for managing data connections to Dropbox, Google Drive, SharePoint, and Microsoft Teams
Guardrails: An open-source safety layer that protects agents from unintended behavior; it can mask PII and detect jailbreaks
Enhanced Evals: Advanced evaluation tooling with datasets, trace grading, automated prompt optimization, and support for third-party models
Live demo: At the event, an OpenAI engineer built a complete AI agent in just 8 minutes

3. GPT-5 Pro – The Most Capable Model in the API

Reasoning ability: PhD-level performance in scientific domains, with deep reasoning for complex tasks
High accuracy: Especially suited to finance, legal, and healthcare, fields that demand high accuracy
Reasoning effort: 4 reasoning levels (minimal, low, medium, high) to balance speed against quality
Context window: 272,000 input tokens, 128,000 output tokens
Multimodal: Accepts text and image input, produces text output

4. Codex – The AI Coding Agent Officially Launches

GPT-5 Codex Model: A version of GPT-5 trained specifically for coding and agentic workflows
Slack integration: Developers can assign tasks or ask questions directly from Slack channels
Codex SDK: Enables automation of code review, refactoring, and automated testing
Notable statistics:
- Message volume has grown 10x since the August 2025 launch
- More than 40 trillion tokens processed
- Inside OpenAI: 70% more pull requests per week

5. Sora 2 – Video Generation in the API

Advanced controls: Specify length, aspect ratio, and resolution
Synchronized audio: Generates video with full sound, including ambient audio and effects synchronized with the visuals
Video remix: Edit and remix previously generated videos
Pricing:
- Sora-2: $1.00 for a 10-second standard-resolution video
- Sora-2-pro: $5.00 for a 10-second high-resolution video

6. Mini Models – Lower Cost

Model	Function	Savings
gpt-realtime-mini	Real-time voice interaction	70% cheaper than the large model
gpt-image-1-mini	Image generation	80% cheaper than the large model

7. Competitive GPT-5 Pricing

Model	Input	Output
GPT-5	$1.25/1M tokens	$10/1M tokens
Claude Opus 4.1 (for comparison)	$15/1M tokens	$75/1M tokens
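
To make the per-million-token prices concrete, the sketch below computes the cost of a single request and the relative savings. The prices are copied from the table above; the workload (10,000 input tokens and 2,000 output tokens) and the helper names `requestCost` and `savingsPercent` are hypothetical, chosen only for illustration.

```typescript
// Prices in USD per 1M tokens, copied from the pricing table.
type Pricing = { inputPerM: number; outputPerM: number };

const gpt5: Pricing = { inputPerM: 1.25, outputPerM: 10 };
const claudeOpus41: Pricing = { inputPerM: 15, outputPerM: 75 };

const TOKENS_PER_MILLION = 1_000_000;

// Cost of one request given its token counts.
function requestCost(p: Pricing, inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / TOKENS_PER_MILLION) * p.inputPerM +
    (outputTokens / TOKENS_PER_MILLION) * p.outputPerM
  );
}

// Percentage by which `cheaper` undercuts `baseline`.
function savingsPercent(cheaper: number, baseline: number): number {
  return (1 - cheaper / baseline) * 100;
}

// Hypothetical workload: 10,000 input tokens and 2,000 output tokens per request.
const gpt5Cost = requestCost(gpt5, 10_000, 2_000);
const opusCost = requestCost(claudeOpus41, 10_000, 2_000);

console.log(`GPT-5: $${gpt5Cost.toFixed(4)}, Claude Opus 4.1: $${opusCost.toFixed(4)}`);
console.log(`Input savings: ${savingsPercent(gpt5.inputPerM, claudeOpus41.inputPerM).toFixed(1)}%`);
console.log(`Output savings: ${savingsPercent(gpt5.outputPerM, claudeOpus41.outputPerM).toFixed(1)}%`);
```

For this workload the request costs about $0.0325 on GPT-5 and $0.30 on Claude Opus 4.1; the per-token savings work out to roughly 91.7% on input and 86.7% on output.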

✨ II. Notable Highlights

🎯 Easier Than Ever

Democratizing software development: Sam Altman told the story of an 89-year-old Japanese man who taught himself to program with ChatGPT and went on to build 11 iPhone apps for older adults. It illustrates the vision that “anyone with an idea can build an app for themselves”.

⚡ Unprecedented Development Speed

“Software used to take months or years to build. Now you can see it created in minutes with AI. You don’t need a big team. You just need a good idea, and you can make it real faster than ever.” – Sam Altman

🔒 Enterprise Security and Governance

Content Shield: OpenAI offers copyright protection for enterprises
Global Admin Console: Manage domains, SSO, and multiple API organizations
Guardrails: Protect sensitive data and block malicious behavior

🤝 Strategic Partnerships

AMD Partnership: OpenAI announced a strategic partnership with AMD to deploy 6 gigawatts of AMD Instinct GPUs over the coming years, with a warrant for up to 160 million AMD shares.

🌟 III. Impact and Significance

1. For Developers

Shorter development time: From months down to minutes with tools such as AgentKit and Codex
Lower cost: GPT-5 input is 50% cheaper than GPT-4o, and the mini models save 70-80%
Broad distribution: Immediate reach to 800 million ChatGPT users through the Apps SDK
Less developer lock-in: MCP is an open standard, which makes it easier to move between platforms

2. For Businesses

Higher productivity: AI agents can automate complex processes, from customer support to sales operations
Smaller headcount: A small team can do the work of a large one with AI, reducing staffing costs
A level playing field: Startups can compete with incumbents thanks to low costs and accessible tools
Governance and security: Connector Registry and Guardrails centralize data management and support compliance

3. For End Users

Seamless experience: No switching between apps; everything happens in a single ChatGPT interface
Deep personalization: AI agents can learn and adapt to individual needs
Easy content creation: Sora 2 generates high-quality video from a text description alone
Learning and growth: The Coursera integration enables personalized learning inside ChatGPT

4. Industry Impact

AI price war: With GPT-5 priced well below Claude Opus 4.1 (about 92% cheaper on input and about 87% cheaper on output, per the pricing table above), OpenAI is putting price pressure on the whole industry.

Platform Play: ChatGPT is no longer just a chatbot; it is becoming a platform, much like Apple’s App Store. This could change how AI applications are distributed.

Democratization of AI: With visual tools such as Agent Builder, people who cannot code can also create sophisticated AI agents, which significantly widens the community of AI builders.

From Answers to Actions: ChatGPT is moving from answering questions to carrying out actions, marking a new step in AI development.

5. Future Trends

AI as an Operating System: ChatGPT is moving toward becoming an AI operating system, a hub for apps, agents, and users
Agentic AI: Beyond answering questions, AI can now take on and complete complex tasks end to end
Multimodal Everything: Text, image, audio, and video integrated in a single platform
Device Ecosystem: With Jony Ive on board and the acquisition of io ($6.4B), OpenAI is working toward its own AI device

🚀 Conclusion

OpenAI DevDay 2025 was more than a product launch; it was a statement about the future of software development. With the Apps SDK, AgentKit, GPT-5 Pro, and Sora 2, OpenAI is building a comprehensive AI ecosystem that spans the developer platform through to the end-user experience.

Key message: “Anyone with a good idea can make it real faster than ever”. This is not just a marketing slogan but a vision of a world in which AI democratizes software creation.

With 800 million users, 4 million developers, and 6 billion tokens processed per minute, OpenAI is not only leading the AI race but also reshaping how we interact with technology.

References:

OpenAI DevDay Official: openai.com/devday/
Sam Altman Keynote Livestream
OpenAI Blog and Documentation
CNBC, TechCrunch Coverage

Agentic Web: Weaving the Next Web with AI Agents

Posted on September 20, 2025 by Cuong Dinh

The paper “Agentic Web: Weaving the Next Web with AI Agents” was published on arXiv in July 2025 by a group of authors researching artificial intelligence and the Web.

The paper’s main goals are to:

Define the concept of the Agentic Web: a new generation of the Web in which AI agents are not just question-answering tools but can act autonomously, coordinate, and carry out multi-step tasks on behalf of people.
Propose a three-dimensional theoretical framework (intelligence, interaction, economics) for analyzing and guiding the Web’s development in the AI agent era.
Survey current technology trends, from large language models (LLMs) and multi-agent systems to new protocols (MCP, A2A), and discuss the technical, economic, ethical, and legal challenges.
Shape a vision for the future of the Web, from an information space to an “agent” space where agents automatically negotiate, coordinate, and interact to serve human needs.

Notably, the paper is not purely theoretical; it ties into real-world developments:

The emergence of AI agent frameworks (AutoGPT, LangChain, CrewAI, etc.)
Standardized protocols under development (such as the Model Context Protocol)
Large companies (OpenAI, Anthropic, Google, Meta) all experimenting with agent ecosystems.

In other words, the paper is both conceptual (definitions, an analytical framework) and visionary, laying groundwork for researching and deploying a next-generation, agent-based Web.

Motivation & Definition

The authors begin by looking back at how the Web has evolved: from the PC Web (static pages, search), to the mobile Web (UGC, recommender systems), and now toward a new era, the Agentic Web, in which autonomous, goal-directed AI agents carry out multi-step tasks and coordinate with one another to serve users.
Definition: The Agentic Web is a distributed, interactive ecosystem in which software agents (typically powered by large language models) act as intermediaries that can plan, coordinate, and execute goal-directed tasks set by users. The Web becomes more dynamic: agents interact with each other, rather than only users consuming content.

The Three-Dimensional Conceptual Framework

The authors propose a three-dimensional model for understanding and designing the Agentic Web:

Intelligence: cognitive abilities such as reasoning, planning, and learning; combining learned knowledge with real-time data; and interacting with tools and APIs.
Interaction: how agents interact with each other, with users, and with services, covering communication formats, machine-to-machine interfaces, long-running conversation management, and the division of work among agents.
Economics: how value is exchanged among users, systems, services, and agents; new business models; and an “agent attention economy” in which services compete to be invoked by agents, with new metrics replacing traditional ones such as clicks and views.

Technical & Architectural Shifts

From simple queries plus search to agent-initiated intelligent retrieval, where agents access information and tools according to user intent.
From personalized recommender systems to planning and coordination among multiple agents to execute complex tasks.
From single agents to multi-agent systems, which require communication protocols, standardized APIs, and ethics in agent coordination.
System architecture: the paper discusses agent discovery (finding agents with suitable capabilities), orchestration (coordinating agents), and communication protocols such as MCP (Model Context Protocol) and A2A (Agent-to-Agent).

Applications, Risks, Governance & Open Problems

Applications: automated transactional bookings (for example, flights and travel itineraries), deep research, enterprise knowledge assistants, and agents acting as intermediaries between users and services.
Risks: security, privacy, bias, agents acting outside their intended purpose, human oversight of and interaction with agents, alignment (AI goals versus user goals), and trust between agents.
Open problems: dynamic learning and adaptation, multi-agent systems that coordinate well and reliably, the human-agent interface, systemic risk at large scale, and socioeconomic impact.

Nhận định & Ý kiến

Dưới đây là quan điểm của mình về bài báo—những điểm mạnh, những khó khăn, và liệu nó có thực sự khả thi & đáng quan tâm.

Các điểm mạnh

Khái niệm rõ ràng, kịp thời: Xu hướng AI agents đang phát triển rất nhanh, nhiều sản phẩm thực tế đã bắt đầu dùng agent tự chủ hơn, vì vậy bài báo nắm bắt rất đúng xu hướng. Việc định nghĩa “Agentic Web” giúp tạo khung để bàn luận chuyên sâu.
Phân tích đa chiều: Ba chiều trí tuệ, tương tác, kinh tế là cách tiếp cận toàn diện — không chỉ về công nghệ mà cả về kinh tế, mô hình kinh doanh, xã hội. Điều này giúp tránh việc chỉ tập trung vào “agent làm gì” mà bỏ qua “ai trả tiền”, “ai chịu trách nhiệm”, “liệu người dùng có tin tưởng” v.v.
Đề xuất kiến trúc & protocol thực tế: Việc nhắc đến MCP, A2A, cần chuẩn hóa interfaces… là những điều cần thiết nếu Agentic Web muốn được triển khai quy mô rộng. Những ví dụ về ứng dụng thực tế giúp minh họa rõ các lợi ích.
Đánh giá rủi ro & vấn đề mở: Không lờ đi các thách thức — như alignment, bảo mật, tin cậy, trách nhiệm — điều này cho thấy tác giả có quan sát sâu sắc, không chỉ hô hào lý tưởng.

Limitations / Points to Consider

Very large and complex infrastructure requirements: For the Agentic Web to work well, it needs standardized protocols, APIs, and services, agent directory management, trust between agents, identity mechanisms, and security. In many places today, Web infrastructure and services are still not standardized, so real-world deployment may face many barriers.
Ethical, legal, and accountability issues: When an agent acts on a user’s behalf (for example, booking tickets, making payments, or interacting with other services) and something goes wrong, who is responsible? Who protects the user’s interests? Many questions remain insufficiently answered, especially across different jurisdictions.
Unclear costs and economics: The “agent attention economy” model is appealing, but who will bear the costs of development, operation, and maintenance? Which services benefit? There is a risk that small agents and small developers will be crowded out by large corporations with far greater resources.
User acceptance: Do users really want to delegate that much authority to agents? There are tasks users want to control in detail. Trusting an AI agent completely, or trusting the results it returns without checking them, is a major barrier.

Is the Agentic Web Feasible?

I think so, but not at broad scale in the short term. The Agentic Web will develop gradually, piece by piece:

Small multi-step automation tasks (booking, scheduling, information lookup) will be handed to agents first.
Large services that demand high reliability and ethical standards (for example healthcare and legal) will move more slowly because the risks are high.
It requires cooperation among technologists, lawmakers, businesses, and users to build governance frameworks, technical standards, and user protections.

Impact If Realized Well

More efficient use of the Web: users save time and effort and can hand repetitive work to agents.
Changes to tech companies’ business models: who owns the agent registry, who gets chosen or recommended by agents, who gets paid when an agent “invokes” a service, and so on.
It could increase inequality if only large organizations with the resources to deploy strong agents can win; small services could be pushed out of agents’ “attention” if they cannot compete.

Conclusion

The paper is an important contribution that clarifies a new direction for the Web in the AI era. It has both theoretical value (conceptual framework, analysis) and practical guidance (applications, risks). I think the development of the Agentic Web is only a matter of time if the related technologies and frameworks (LLMs, multi-agent systems, standard protocols, security, law) keep advancing strongly.

Claude Code Spec Workflow: A Practical Guide to Spec-Driven Development

Posted on September 11, 2025 by Cuong Dinh

Claude Code Spec Workflow: A Practical Guide to Spec-Driven Development

Introduction

In modern software development, bringing AI into the coding process is becoming an unavoidable trend. However, “vibe coding”, that is, writing code from inspiration and ad hoc prompts, often produces inconsistent and poorly structured results. Claude Code Spec Workflow was created to make AI-assisted software development more systematic and professional.

1. What Is Spec-Driven Development?

Definition

Spec-Driven Development (SDD) is a software development method in which the specification becomes the center of the entire engineering process, from planning and design to implementation, testing, and documentation. SDD emphasizes writing clear, structured specifications before implementation begins.

Core Principle

Instead of “vibe coding”, where you describe a goal and get back a block of code that looks right but often does not work correctly, SDD treats coding agents like serious pair programmers. They excel at pattern recognition but still need clear, unambiguous instructions.

The 4-Phase Process

SDD works through 4 phases with explicit checkpoints:

1. Specify: Create a contract for how the code should behave; it becomes the source of truth for tools and AI agents.

2. Plan: The AI agent analyzes the spec and produces a detailed plan covering architecture, constraints, and approach.

3. Tasks: Break the spec and plan down into concrete, reviewable units of work, each solving one specific piece of the puzzle.

4. Implement: The AI agent works through the tasks one by one, and the developer reviews focused changes instead of code dumps of thousands of lines.

Advantages of SDD

Less guesswork: A clear spec reduces surprises and helps ensure code quality
Easy to change direction: Just update the spec, regenerate the plan, and let the AI agent handle the rest
Suited to complex projects: Especially useful for greenfield projects and for feature work in existing systems
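
The idea of phases separated by checkpoints can be sketched as a tiny state machine. This is only an illustration of the concept described above; the `Phase` type and `advance` function are hypothetical and are not how any particular tool implements it.

```typescript
type Phase = "specify" | "plan" | "tasks" | "implement" | "done";

// Which phase follows which; "done" is terminal.
const NEXT_PHASE: Record<Phase, Phase> = {
  specify: "plan",
  plan: "tasks",
  tasks: "implement",
  implement: "done",
  done: "done",
};

// A checkpoint: the workflow only advances once a human has approved the
// artifact produced by the current phase.
function advance(current: Phase, approved: boolean): Phase {
  return approved ? NEXT_PHASE[current] : current;
}

const start: Phase = "specify";
const afterRejection = advance(start, false);         // spec rejected: stay in "specify"
const afterApproval = advance(afterRejection, true);  // spec approved: move on to "plan"
```

The point of the checkpoint is that a rejected artifact keeps the workflow in place, so the spec is revised before any plan or code is generated from it.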

2. Claude Code Spec Workflow – Overview

About the Tool

Claude Code Spec Workflow is an automation toolkit developed by Pimzino that provides a structured spec-driven development process for Claude Code. It turns feature ideas into complete implementations through the flow Requirements → Design → Tasks → Implementation.

Key Features

🎯 Structured Development Process

Requirements Generation: Creates user stories and acceptance criteria in EARS format (WHEN/IF/THEN statements)
Design Creation: Produces the technical architecture and design, with Mermaid diagrams for visualization
Task Breakdown: Splits the design into atomic coding tasks focused on test-driven development
Systematic Implementation: Executes tasks systematically, validating against the requirements
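
As an illustration of what an EARS-style acceptance criterion looks like, here is a small formatter. It assumes the common EARS wording “WHEN/IF <condition> THEN the <system> SHALL <response>”; the `EarsRequirement` type and `formatEars` function are hypothetical, and the tool’s own templates may phrase things differently.

```typescript
// Hypothetical helper for illustration; not part of claude-code-spec-workflow.
interface EarsRequirement {
  keyword: "WHEN" | "IF"; // WHEN for triggering events, IF for preconditions
  condition: string;
  system: string;
  response: string;
}

function formatEars(req: EarsRequirement): string {
  return `${req.keyword} ${req.condition} THEN the ${req.system} SHALL ${req.response}`;
}

const loginCriterion = formatEars({
  keyword: "WHEN",
  condition: "the user submits a valid login form",
  system: "system",
  response: "create a session and redirect to the dashboard",
});
```

Writing criteria in this fixed shape makes each one individually testable, which is what lets later tasks reference specific requirements.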

🛠 7 Main Slash Commands

/spec-create <name> <description> – Create a new specification for a feature
/spec-requirements – Generate the requirements document
/spec-design – Create the design document
/spec-tasks – Generate implementation tasks
/spec-execute <task-number> – Execute a specific task
/spec-status – Check the status of the current specification
/spec-list – List all specifications

🏗 Automatic Project Structure

After setup, the tool automatically creates:

📁 .claude/ directory: Contains commands, templates, specs, and config files
📝 7 slash commands: To run the complete workflow
📋 Document templates: Ensure consistent formatting
⚙️ Configuration files: For workflow automation
📖 CLAUDE.md: With comprehensive workflow instructions

✨ Advanced Features

Triple optimization commands: get-steering-context, get-spec-context, and get-template-context
Smart document handling: Bug documents use direct reading, templates use bulk loading
Session-based caching: Intelligent file change detection and cache invalidation
Real-time web dashboard: Monitor specs, tasks, and progress with live updates
Bug workflow system: Complete bug reporting and resolution tracking
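
The caching item above can be pictured with a small sketch. It assumes that change detection compares a file’s modification time with the one seen at the last read; the tool’s real mechanism is not described in this post, so the `SessionFileCache` class and its injected `getMtime` and `readFile` callbacks are purely hypothetical.

```typescript
// Hypothetical sketch: `getMtime` and `readFile` are injected so the example
// stays self-contained; a real implementation would likely use node:fs.
class SessionFileCache {
  private readonly entries = new Map<string, { mtime: number; content: string }>();
  private readonly getMtime: (path: string) => number;
  private readonly readFile: (path: string) => string;

  constructor(getMtime: (path: string) => number, readFile: (path: string) => string) {
    this.getMtime = getMtime;
    this.readFile = readFile;
  }

  get(path: string): string {
    const mtime = this.getMtime(path);
    const cached = this.entries.get(path);
    if (cached !== undefined && cached.mtime === mtime) {
      return cached.content; // unchanged since the last read: cache hit
    }
    const content = this.readFile(path); // new or modified: reload and re-cache
    this.entries.set(path, { mtime, content });
    return content;
  }
}

// Simulated file for demonstration.
const fakeFile = { mtime: 1, content: "v1" };
let readCount = 0;
const specCache = new SessionFileCache(
  () => fakeFile.mtime,
  () => {
    readCount += 1;
    return fakeFile.content;
  }
);

const firstRead = specCache.get("requirements.md");  // miss: reads the file
const secondRead = specCache.get("requirements.md"); // hit: no read
fakeFile.mtime = 2;
fakeFile.content = "v2";
const thirdRead = specCache.get("requirements.md");  // invalidated: reads again
```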

3. Installation and Usage Guide

System Requirements

Node.js: 16.0.0 or higher
Claude Code: Installed and configured
Any project directory

Installing Claude Code (Prerequisite)

bash

# Install Claude Code
npm install -g @anthropic-ai/claude-code

# Verify installation
claude doctor

# Navigate to your project
cd your-awesome-project

# Start Claude Code (first time login required)
claude

Installing Claude Code Spec Workflow

Method 1: Quick Install (Recommended)

bash

# Install in the current directory
npx @pimzino/claude-code-spec-workflow

# Install in a specific directory
npx @pimzino/claude-code-spec-workflow --project /path/to/project

# Force overwrite existing files
npx @pimzino/claude-code-spec-workflow --force

# Skip confirmation prompts
npx @pimzino/claude-code-spec-workflow --yes

# Test setup
npx @pimzino/claude-code-spec-workflow test

Method 2: Global Installation

bash

# Install globally
npm install -g @pimzino/claude-code-spec-workflow

# Use anywhere
claude-spec-setup

Method 3: Development Dependency

bash

# Install as dev dependency
npm install --save-dev @pimzino/claude-code-spec-workflow

# Run via package.json script
npx claude-spec-setup

Generated Structure

your-project/
├── .claude/
│   ├── commands/
│   │   ├── spec-create.md
│   │   ├── spec-requirements.md  
│   │   ├── spec-design.md
│   │   ├── spec-tasks.md
│   │   ├── spec-execute.md
│   │   ├── spec-status.md
│   │   └── spec-list.md
│   ├── templates/
│   │   ├── requirements-template.md
│   │   ├── design-template.md
│   │   └── tasks-template.md
│   ├── specs/
│   │   └── (your specs will be created here)
│   └── spec-config.json
└── CLAUDE.md (created/updated)
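
One way to sanity-check a setup is to compare the project against the tree above. The sketch below is an assumption-laden illustration: the path list is copied from the tree, while `findMissingPaths` and the injected `exists` callback are hypothetical helpers, not something the tool provides.

```typescript
// Paths copied from the tree above, relative to the project root.
const EXPECTED_PATHS: string[] = [
  ".claude/commands/spec-create.md",
  ".claude/commands/spec-requirements.md",
  ".claude/commands/spec-design.md",
  ".claude/commands/spec-tasks.md",
  ".claude/commands/spec-execute.md",
  ".claude/commands/spec-status.md",
  ".claude/commands/spec-list.md",
  ".claude/templates/requirements-template.md",
  ".claude/templates/design-template.md",
  ".claude/templates/tasks-template.md",
  ".claude/spec-config.json",
  "CLAUDE.md",
];

// `exists` is injected so the check does not depend on a real file system.
function findMissingPaths(exists: (relativePath: string) => boolean): string[] {
  return EXPECTED_PATHS.filter((relativePath) => !exists(relativePath));
}

// Demonstration with a simulated project that lacks the design template.
const presentPaths = new Set(
  EXPECTED_PATHS.filter((p) => p !== ".claude/templates/design-template.md")
);
const missingPaths = findMissingPaths((p) => presentPaths.has(p));
```

In a Node.js script, the callback could be something like `(p) => fs.existsSync(path.join(projectRoot, p))`, where `projectRoot` is whatever directory the tool was installed into.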

4. Detailed Usage Guide

Basic Workflow

Step 1: Start Claude Code

bash

cd my-awesome-project
claude

Step 2: Create a New Specification

bash

# Inside the Claude Code terminal
/spec-create user-dashboard "User profile management system"

Step 3: Generate Requirements

bash

/spec-requirements

Output: User stories in EARS format (WHEN/IF/THEN statements), ensuring comprehensive requirement coverage.

Step 4: Create the Design Document

bash

/spec-design

Output: Technical architecture with Mermaid diagrams, covering planned components, interfaces, and data models.

Step 5: Generate Implementation Tasks

bash

/spec-tasks

Output: Atomic coding tasks focused on test-driven development, each referencing specific requirements.

Step 6: Execute Tasks
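
Because each task references specific requirements, it becomes possible to check that no requirement was left without a task. The sketch below illustrates that idea only; the `SpecTask` shape, the requirement ID format, and `findUncoveredRequirements` are assumptions for the example, not the tool’s actual file format.

```typescript
// Hypothetical task shape; the real tasks document is Markdown, not JSON.
interface SpecTask {
  id: number;
  title: string;
  requirementIds: string[];
}

// Returns the requirement IDs that no task references.
function findUncoveredRequirements(allRequirementIds: string[], tasks: SpecTask[]): string[] {
  const covered = new Set<string>();
  for (const task of tasks) {
    for (const id of task.requirementIds) {
      covered.add(id);
    }
  }
  return allRequirementIds.filter((id) => !covered.has(id));
}

const specTasks: SpecTask[] = [
  { id: 1, title: "Create User model with tests", requirementIds: ["1.1"] },
  { id: 2, title: "Add profile update endpoint", requirementIds: ["1.2", "2.1"] },
];
const uncovered = findUncoveredRequirements(["1.1", "1.2", "2.1", "2.2"], specTasks);
```

Here requirement 2.2 has no task, which is the kind of gap worth catching before implementation starts.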

bash

/spec-execute 1

Executes tasks systematically, validating against the requirements to ensure quality and consistency.

Step 7: Monitor Progress

bash

# Check current status
/spec-status

# List all specifications  
/spec-list

Web Dashboard (Advanced Feature)

bash

# Basic dashboard
npx -p @pimzino/claude-code-spec-workflow claude-spec-dashboard

# Dashboard with tunnel (share externally) 
npx -p @pimzino/claude-code-spec-workflow claude-spec-dashboard --tunnel

# Full tunnel configuration
npx -p @pimzino/claude-code-spec-workflow claude-spec-dashboard \
  --tunnel \
  --tunnel-password mySecret123 \
  --tunnel-provider cloudflare \
  --port 3000 \
  --open

5. Hands-On Example: Building an Asteroids Game

Project Description

A developer used claude-code-spec-workflow to build a simple 2D game in which the player controls a spaceship to avoid falling asteroids. The score is based on survival time.

Workflow

bash

# 1. Create the specification
/spec-create asteroids-game "A simple 2D action game where the player controls a spaceship to avoid falling asteroids. The score is based on survival time."

# 2. Generate three spec documents:
#    - Requirements document
#    - Design document
#    - Task list

# 3. Implement following the SDD methodology

Result

The developer successfully built a complete asteroid-avoidance game using:

LLM: Claude Sonnet 4
Frontend: HTML, CSS, JavaScript
Development Tools: Claude Code, claude-code-spec-workflow

Techniques Used

In the video demonstration, the developer used:

/spec-status to check the workflow status
/spec-list to view all specifications
/spec-create to create specs with detailed content
Finally, a gameplay video of the completed game
6. Troubleshooting and Best Practices

Common Issues

❓ Command Not Found After NPX

bash

# Make sure you're using correct package name
npx @pimzino/claude-code-spec-workflow

❓ Setup Fails with Permission Errors

bash

# Try with different directory permissions
npx @pimzino/claude-code-spec-workflow --project ~/my-project

❓ Claude Code Not Detected

bash

# Install Claude Code first
npm install -g @anthropic-ai/claude-code

# Show verbose output
DEBUG=* npx @pimzino/claude-code-spec-workflow

# Check package version
npx @pimzino/claude-code-spec-workflow --version

Best Practices

1. Project Setup

bash

# Setup multiple projects efficiently
for dir in project1 project2 project3; do
  npx @pimzino/claude-code-spec-workflow --project "$dir" --yes
done

2. Testing Setup

bash

# Test the setup in a temporary directory
npx @pimzino/claude-code-spec-workflow test

3. Workflow Efficiency

Auto-detects project type: Node.js, Python, Java, etc.
Beautiful CLI: With progress indicators
Validation: Claude Code installation check
Safety: Preserves existing CLAUDE.md content

7. Comparison with Other Tools

Claude Code Spec Workflow vs GitHub Spec-Kit

GitHub Spec-Kit: GitHub’s official SDD toolkit, with support for multiple AI agents (GitHub Copilot, Claude Code, Gemini CLI)
Claude Code Spec Workflow: Built specifically for Claude Code, with an automated workflow and a dashboard

Advantages of Claude Code Spec Workflow

Easy to install: One-command setup
Highly automated: 7 built-in slash commands
Integrated dashboard: Real-time monitoring
TypeScript implementation: Comprehensive error handling

8. Future and Development

The SDD Trend

Spec-driven development is becoming popular in the developer community as a way to build software with more structure and fewer “vibes”. It is especially well suited to greenfield projects and mid-to-large-sized features.

Limitations

UI-heavy work: A non-visual spec is of limited use for UI work
Small features: Writing a full-blown spec can be overkill for small features or bug fixes
Overengineering risk: It can lead to solutions that are more complex than necessary

Future of SDD

“Specs are the new code”: Sean Grove of the OpenAI team argues that 80-90% of a programmer’s work is structured communication, and that specs are the best way to communicate about software functionality.

9. Conclusion

Claude Code Spec Workflow represents an important step toward integrating AI into the software development process in a systematic way. The tool is not just automation but also a methodology that helps developers:

Key Benefits

Structure the process: From vibe coding to systematic development
Improve code quality: Through a spec-driven approach
Improve collaboration: Between developers and AI agents
Reduce project risk: With clear specifications and validation

When to Use It

Greenfield projects: Starting from zero with a clear vision
Feature development: In existing complex systems
Team collaboration: When a consistent development approach is needed
Quality-focused projects: When code quality is the priority

Recommendations

Claude Code Spec Workflow is an excellent tool for developers who want to apply the SDD methodology with Claude Code. Remember, though, that the tool is only a means; the methodology and mindset matter most.

Start with small projects, get familiar with the workflow, and gradually expand to more complex ones. Spec-driven development is not a silver bullet, but it is a powerful approach in the modern developer’s arsenal.

Demo:

Using the Claude CLI with claude-code-spec-workflow to test creating a workflow for a user-authentication feature.

Result:
Code and a spec were generated for the user-authentication feature.

----------------------------------------

Using the Claude CLI with claude-code-spec-workflow to test building a simple game in HTML.

Result:


Cursor 0.50 Just Dropped – Your AI-Powered Coding Assistant Just Got Smarter

Posted on May 27, 2025 (updated May 28, 2025) by Cuong Dinh

💡 Cursor 0.50 Just Dropped – Your AI-Powered Coding Assistant Just Got Smarter

TL;DR: With the release of Cursor 0.50, developers get access to request-based billing, background AI agents, smarter multi-file edits, and deeper workspace integration. Cursor is fast becoming the most capable AI coding tool for serious developers.

🚀 What Is Cursor?

Cursor is an AI-native code editor built on top of VS Code, designed to let AI work with your code rather than next to it. With GPT-4 and Claude integrated deeply into its architecture, Cursor doesn’t just autocomplete — it edits, debugs, understands your full project, and runs background agents to help you move faster.

🔥 What’s New in Cursor 0.50?

💰 Request-Based Billing + Max Mode for All Models

Cursor now offers:

Transparent usage-based pricing — You only pay for requests you make.
Max Mode for all LLMs (GPT-4, Claude, etc.) — Access higher-quality reasoning per token.

This change empowers all users — from solo hackers to enterprise teams — to choose the right balance between cost and quality.

🤖 Background AI Agents (Yes, Parallel AI!)

One of the most powerful new features is background AI agents:

Agents run asynchronously and can take over tasks like bug fixing, PR writing, and large-scale refactoring.
You can now “send a task” to an agent, switch context, and return later — a huge leap in multitasking with AI.

With Model Context Protocol (MCP) support, these agents can reference more of your codebase and tooling than before.

🧠 Tab Model v2: Smarter, Cross-File Edits

Cursor’s AI can now:

Suggest changes across multiple files — critical for large refactors.
Understand relationships between files (like components, hooks, or service layers).
Provide syntax-highlighted AI completions for better visual clarity.

🛠️ Redesigned Inline Edit Flow

Inline editing (Cmd/Ctrl+K) is now:

More intuitive, with options to edit the whole file (⌘⇧⏎) or delegate to an agent (⌘L).
Faster and scalable for large files (yes, even thousands of lines).

This bridges the gap between simple fixes and deep code transformations.

🗂️ Full-Project Context + Multi-Root Workspaces

Cursor now handles large, complex projects better than ever:

You can use @folders to add whole directories into the AI’s context.
Multi-root workspace support means Cursor can understand and work across multiple codebases — essential for microservices and monorepos.

🧪 Real Use Cases (from the Community)

According to GenerativeAI.pub’s deep dive, developers are already using Cursor 0.50 to:

Let background agents auto-refactor legacy modules.
Draft PRs from diffs in seconds.
Inject whole folders into the AI context for more accurate suggestions.

It’s not just about faster code — it’s about working smarter with an AI assistant that gets the big picture.

📌 Final Thoughts

With Cursor 0.50, the future of pair programming isn’t just someone typing next to you — it’s an agent that can read, think, and refactor your code while you focus on building features. Whether you’re a solo developer or a CTO managing a team, this update is a must-try.

👉 Try it now at cursor.sh or read the full changelog here.


Introduction to Mastra AI and Basic Installation Guide

Posted on April 28, 2025 by Cuong Dinh

Introduction to Mastra AI and Basic Installation Guide

As AI development booms, demand for open-source frameworks for building LLM-powered applications is growing quickly. Mastra is a TypeScript framework that helps developers build AI agents and workflows and serve them through an API. This article provides an overview of Mastra and a basic installation guide to get started.

What is Mastra AI?

According to the official documentation (mastra.ai), Mastra is an open-source TypeScript agent framework that provides the primitives needed to build AI applications and features.

Mastra is designed for:

Building AI agents that have memory and can call tools.
Chaining LLM calls in deterministic workflows.
Working with different LLM providers (such as OpenAI, Anthropic, or Google Gemini) or a local model.
Serving agents through REST endpoints and a client SDK.

Mastra aims to become a rapid “launchpad” for AI teams, suitable for both research (R&D) and production-grade systems.

Key Components of Mastra

Agents: LLM-driven components that can use tools and keep memory.
Workflows: Graph-based definitions of multi-step LLM processes.
RAG: Utilities for retrieval-augmented generation over your own data.
Evals: Automated checks for evaluating LLM outputs.
Development server: Exposes your agents as REST endpoints for local testing.

Basic Installation Guide for Mastra

To install Mastra, you can refer to the detailed guide here:
👉 Mastra Installation Guide

Summary of the basic steps:

1. System Requirements

Node.js v20.0 or higher
Access to a supported large language model (LLM)

To run Mastra, you need access to an LLM. Typically, you’ll want to get an API key from an LLM provider such as OpenAI, Anthropic, or Google Gemini. You can also run Mastra with a local LLM using Ollama.

2. Create a New Project

We recommend starting a new Mastra project using create-mastra, which will scaffold your project. To create a project, run:

npx create-mastra@latest

On installation, you’ll be guided through the following prompts:

After the prompts, create-mastra will:

Set up your project directory with TypeScript
Install dependencies
Configure your selected components and LLM provider
Configure the MCP server in your IDE (if selected) for instant access to docs, examples, and help while you code

MCP Note: If you’re using a different IDE, you can install the MCP server manually by following the instructions in the MCP server docs. Also note that there are additional steps for Cursor and Windsurf to activate the MCP server.

3. Set Up your API Key

Add the API key for your configured LLM provider in your .env file.

OPENAI_API_KEY=<your-openai-key>

Non-Interactive mode:

You can now specify the project name as either a positional argument or with the -p, --project-name option. This works consistently in both the Mastra CLI (mastra create) and create-mastra package. If both are provided, the argument takes precedence over the option.
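
The precedence rule described above (positional argument over `-p` / `--project-name`) can be sketched as follows. This is a hypothetical re-implementation for illustration only, not Mastra’s actual argument parser, and `resolveProjectName` is an invented name.

```typescript
// Hypothetical illustration of the documented precedence; not Mastra's code.
function resolveProjectName(argv: string[]): string | undefined {
  let positional: string | undefined;
  let option: string | undefined;

  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === undefined) continue;
    if (arg === "-p" || arg === "--project-name") {
      option = argv[i + 1]; // the value follows the flag
      i++;                  // skip the value on the next iteration
    } else if (!arg.startsWith("-") && positional === undefined) {
      positional = arg;     // first bare word is the positional project name
    }
  }

  // Documented rule: if both are given, the positional argument wins.
  return positional ?? option;
}

const fromPositional = resolveProjectName(["my-app"]);
const fromOption = resolveProjectName(["--project-name", "other-app"]);
const fromBoth = resolveProjectName(["my-app", "-p", "other-app"]);
```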

4. Start the Mastra Server

Mastra provides commands to serve your agents via REST endpoints.

Development Server

Run the following command to start the Mastra server:

npm run dev

If you have the mastra CLI installed, run:

mastra dev

This command creates REST API endpoints for your agents.

Test the Endpoint

You can test the agent’s endpoint using curl or fetch:

curl -X POST http://localhost:4111/api/agents/weatherAgent/generate \
  -H "Content-Type: application/json" \
  -d '{"messages": ["What is the weather in London?"]}'
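
For the fetch variant, a minimal sketch might look like the following. It assumes the same things as the curl command: a dev server on localhost:4111 and an agent named weatherAgent. The `buildGenerateRequest` and `askAgent` helpers are hypothetical names, and the shape of the JSON response is not specified here, so it is returned as `unknown`.

```typescript
// Builds the same request as the curl command above (URL, headers, JSON body).
function buildGenerateRequest(agentId: string, question: string) {
  return {
    url: `http://localhost:4111/api/agents/${agentId}/generate`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ messages: [question] }),
    },
  };
}

// Sends the request; requires the Mastra dev server to be running.
async function askAgent(agentId: string, question: string): Promise<unknown> {
  const { url, init } = buildGenerateRequest(agentId, question);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}

const weatherRequest = buildGenerateRequest("weatherAgent", "What is the weather in London?");
```

With the server running, `askAgent("weatherAgent", "What is the weather in London?")` would send the request and resolve with the parsed response.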

Use Mastra on the Client

To use Mastra in your frontend applications, you can use our type-safe client SDK to interact with your Mastra REST APIs.

See the Mastra Client SDK documentation for detailed usage instructions.

Run from the command line

If you’d like to directly call agents from the command line, you can create a script to get an agent and call it:

Then, run the script to test that everything is set up correctly:

npx tsx src/index.ts

This should output the agent’s response to your console.
