Building Effective AI Agents with MCP

Posted on November 14, 2025 by Phat Ly

Introduction

As AI advances rapidly, building capable and efficient AI agents has become a goal for many developers. The Model Context Protocol (MCP), an open protocol developed by Anthropic, opens up new ways to optimize how AI agents interact with data and tools. This post analyzes the "Code Execution with MCP" approach and offers practical perspectives on applying it in real projects.

What Is MCP and Why Does It Matter?

The Model Context Protocol (MCP) can be likened to the "USB-C of the AI world": an open standard for how applications provide context to large language models (LLMs). Instead of every system building its own connector, MCP provides one unified protocol, which reduces fragmentation and improves compatibility.

Personal view: I think MCP is not just a technology but an important step toward standardizing the AI ecosystem. Just as HTTP revolutionized the web, MCP has the potential to become the foundation for connecting AI agents to the outside world.

Code Execution with MCP: A Real Breakthrough

The Traditional Problem

Previously, when building AI agents, we often had to:

Load all tool definitions into the context window up front
Send all raw data to the model, even when only a small part was needed
Make many sequential tool calls, causing high latency
Face security risks when sensitive data had to pass through the model

The Solution: Code Execution with MCP

Code execution with MCP lets an AI agent write and run code to interact with MCP tools. This brings five main benefits:

1. Progressive Disclosure

How it works: Instead of loading every tool definition into context, the agent reads tool files from the file system when it needs them.

Practical example: You don't need to read an entire library to find one specific piece of information. The agent only "opens" a tool file when it actually needs to use it.

Benefits:

Significantly lower token consumption
Faster initial response
Lets the agent work with a larger number of tools
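To make the idea concrete, here is a minimal sketch of progressive disclosure over a directory of tool definitions. The JSON layout, the tool names, and the helper functions are all assumptions made for illustration; they are not part of MCP or of any SDK.

```python
import json
import tempfile
from pathlib import Path


def list_tools(tools_dir: Path) -> list[str]:
    """Return tool names only (cheap): no definition file is read yet."""
    return sorted(p.stem for p in tools_dir.glob("*.json"))


def load_tool(tools_dir: Path, name: str) -> dict:
    """Read one full tool definition, only when the agent asks for it."""
    return json.loads((tools_dir / f"{name}.json").read_text(encoding="utf-8"))


def make_demo_tools_dir() -> Path:
    """Create a throwaway directory with two hypothetical tool definitions."""
    tools_dir = Path(tempfile.mkdtemp(prefix="tools_"))
    demo = {
        "query_database": {"description": "Run a read-only SQL query", "params": ["sql"]},
        "send_email": {"description": "Send an email", "params": ["to", "subject", "body"]},
    }
    for name, definition in demo.items():
        (tools_dir / f"{name}.json").write_text(json.dumps(definition), encoding="utf-8")
    return tools_dir


tools_dir = make_demo_tools_dir()
names = list_tools(tools_dir)                          # the agent first sees names only
definition = load_tool(tools_dir, "query_database")    # full definition loaded on demand
```

Only the short list of names would need to enter the model's context at the start; the full definition is read when a specific tool is chosen.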

2. Context-Efficient Tool Results

Problem: With a large dataset (for example, 10,000 records), sending all the data to the model is inefficient.

Solution: The agent can write code to filter, transform, and process the data before returning the final result.

Example:

# Instead of returning 10,000 records,
# the agent can write something like this
# (filter_data and aggregate are placeholder helpers):
def summarize(dataset, criteria):
    results = filter_data(dataset, criteria)
    summary = aggregate(results)
    return summary  # Only the processed result is returned

Opinion: This is one of the strongest points of the approach. It lets the agent "think" before answering, much as people process information.

3. Powerful Control Flow

Traditional approach: The agent has to make many sequential tool calls:

Call tool 1 → Wait for result → Call tool 2 → Wait for result → ...

With code execution: The agent can write a single piece of code with loops, conditionals, and error handling:

for item in items:
    result = process(item)
    if result.is_valid():
        save(result)
    else:
        log_error(item)

Benefits:

Significantly lower latency
Better error handling
Complex logic runs in a single step

4. Privacy Protection

Key property: Intermediate results stay in the execution environment by default and are not automatically sent to the model.

Example: When the agent processes sensitive data (personal information, passwords), intermediate variables exist only in the execution environment. Data reaches the model only when the agent explicitly logs or returns it.

Opinion: This is an important security feature, especially in enterprise applications. However, monitoring is still needed to make sure the agent does not leak data by accident.
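One way to reduce accidental leaks is to replace sensitive values with placeholders before anything is logged or returned. The sketch below does this for e-mail addresses only; the `Tokenizer` class, the placeholder format, and the regular expression are illustrative assumptions, not a complete PII filter.

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class Tokenizer:
    """Replace e-mail addresses with placeholders; real values stay in the execution environment."""

    def __init__(self) -> None:
        self._vault: dict[str, str] = {}   # placeholder -> real value
        self._seen: dict[str, str] = {}    # real value -> placeholder

    def mask(self, text: str) -> str:
        def _replace(match: re.Match) -> str:
            value = match.group(0)
            if value not in self._seen:
                token = f"[EMAIL_{len(self._seen) + 1}]"
                self._seen[value] = token
                self._vault[token] = value
            return self._seen[value]

        return EMAIL_RE.sub(_replace, text)

    def unmask(self, text: str) -> str:
        """Restore real values, e.g. just before calling a downstream tool."""
        for token, value in self._vault.items():
            text = text.replace(token, value)
        return text
```

The model would only ever see text produced by `mask`, while tool calls that need the real address can run on the output of `unmask` inside the execution environment.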

5. State Persistence and Skills

New capabilities: The agent can:

Save state to files so it can resume work later
Build reusable functions as "skills"
Learn and improve over time

Practical example: The agent can create a utils.py file with data-processing functions and reuse it in future tasks.
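As a rough illustration of resuming work from saved state, the sketch below records finished items in a JSON file and skips them on the next run. The file name, the state layout, and the `limit` parameter (which simulates an interrupted run) are assumptions made for this example.

```python
import json
from pathlib import Path
from typing import Optional


def load_state(path: Path) -> dict:
    """Return the saved state, or a fresh one if no file exists yet."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"done": []}


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file, then rename, so a crash cannot leave a half-written file."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state), encoding="utf-8")
    tmp.replace(path)


def run(items: list[str], path: Path, limit: Optional[int] = None) -> list[str]:
    """Process items not yet marked done; `limit` simulates an interrupted run."""
    state = load_state(path)
    processed = []
    for item in items:
        if item in state["done"]:
            continue
        if limit is not None and len(processed) >= limit:
            break
        processed.append(item)           # real work would happen here
        state["done"].append(item)
        save_state(path, state)          # checkpoint after every item
    return processed
```

Calling `run` a second time with the same state file picks up where the first call stopped.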

How to Build Effective AI Agents with MCP

Step 1: Design the Architecture

Principles:

Clearly separate processing logic from MCP interaction
Design MCP tools as modules that are easy to extend
Build a clear state management system

Example architecture:

Agent Core
├── MCP Client (connects to MCP servers)
├── Code Executor (sandbox environment)
├── State Manager (stores state)
└── Tool Registry (manages tools)
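The component split above could be sketched with plain dataclasses, as below. This is a skeleton under stated assumptions: the class names mirror the diagram, the MCP client is omitted, and `CodeExecutor` is a stand-in that calls a registered function rather than running code in a real sandbox.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToolRegistry:
    tools: dict[str, Callable[..., object]] = field(default_factory=dict)

    def register(self, name: str, fn: Callable[..., object]) -> None:
        self.tools[name] = fn


@dataclass
class StateManager:
    data: dict[str, object] = field(default_factory=dict)


@dataclass
class CodeExecutor:
    """Stand-in for a sandbox: calls a registered tool by name."""
    registry: ToolRegistry

    def call(self, tool: str, **kwargs: object) -> object:
        return self.registry.tools[tool](**kwargs)


@dataclass
class AgentCore:
    registry: ToolRegistry
    executor: CodeExecutor
    state: StateManager

    def run_step(self, tool: str, **kwargs: object) -> object:
        result = self.executor.call(tool, **kwargs)
        self.state.data[f"last:{tool}"] = result   # remember the latest result per tool
        return result


def build_agent() -> AgentCore:
    registry = ToolRegistry()
    # Hypothetical tool; in a real agent an MCP client would register tools here.
    registry.register("add", lambda a, b: a + b)
    return AgentCore(registry, CodeExecutor(registry), StateManager())
```

Keeping the registry, executor, and state as separate objects means each can be swapped (for example, a real sandbox for the executor) without touching the agent loop.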

Step 2: Optimize Progressive Disclosure

Strategy:

Organize tools by namespace and category
Use the file system to manage tool definitions
Implement lazy loading for rarely used tools

Code pattern:

# tools/database/query.py
def query_database(sql):
    # Implementation
    pass

# The agent loads the module only when needed
if need_database:
    import tools.database.query

Step 3: Build a Data Processing Pipeline

Best practices:

Always filter and transform data before returning it
Use streaming for large datasets
Implement caching for frequent queries

Example:

def process_large_dataset(data_source):
    # Load and process only what is needed
    filtered = stream_filter(data_source, filter_func)
    aggregated = aggregate_in_chunks(filtered)
    return summary_statistics(aggregated)
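The helpers in the outline above are left undefined, so here is one runnable way they might look, using generators so that rows are never all held in memory. The row shape (`{"id", "amount"}`), the `fake_rows` data source, and the chosen statistics are assumptions for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator


def stream_filter(rows: Iterable[dict], min_amount: float) -> Iterator[dict]:
    """Yield matching rows one at a time; nothing is materialized."""
    return (row for row in rows if row["amount"] >= min_amount)


def chunks(rows: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Group a stream into lists of at most `size` rows."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


def summarize(rows: Iterable[dict], min_amount: float, chunk_size: int = 1000) -> dict:
    """Aggregate chunk by chunk and return only a small summary."""
    count, total = 0, 0.0
    for chunk in chunks(stream_filter(rows, min_amount), chunk_size):
        count += len(chunk)
        total += sum(row["amount"] for row in chunk)
    return {"count": count, "total": total, "mean": total / count if count else 0.0}


def fake_rows(n: int) -> Iterator[dict]:
    """Hypothetical data source: amounts 0, 1, ..., n-1."""
    return ({"id": i, "amount": float(i)} for i in range(n))
```

For example, `summarize(fake_rows(10_000), min_amount=9_990.0)` scans all 10,000 rows but hands back a three-field dictionary, which is the only part that would reach the model.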

Step 4: Implement Security Measures

Required measures:

Sandboxing: Run code in an isolated environment
Resource limits: Limit CPU, memory, and execution time
Audit logging: Record all executed code
Input validation: Check input before execution
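Two of these measures, a time limit and audit logging, can be sketched with the standard library alone. Note the assumption: a child process with a timeout is not a real sandbox (it has no file system, network, or memory isolation), so this only illustrates the shape of the control flow. The `run_untrusted` name and the in-memory `AUDIT_LOG` are hypothetical.

```python
import hashlib
import subprocess
import sys

AUDIT_LOG: list[dict] = []


def run_untrusted(code: str, timeout_s: float = 2.0) -> dict:
    """Run code in a separate Python process with a wall-clock limit and record it."""
    entry = {"sha256": hashlib.sha256(code.encode()).hexdigest(), "code": code}
    AUDIT_LOG.append(entry)                      # audit logging: record before executing
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode (ignores env vars, user site-packages)
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
        entry["status"] = "ok" if proc.returncode == 0 else "error"
        return {"status": entry["status"], "stdout": proc.stdout, "stderr": proc.stderr}
    except subprocess.TimeoutExpired:
        entry["status"] = "timeout"              # resource limit: execution time exceeded
        return {"status": "timeout", "stdout": "", "stderr": ""}
```

In production this would typically sit behind container- or VM-level isolation with CPU and memory limits; the audit entry would go to durable storage rather than a list.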

Opinion: Security is not a feature; it is a requirement. Don't wait for an incident before thinking about it.

Step 5: State Management and Skill Building

Strategy:

Use the file system or a database to store state
Create a library of reusable utility functions
Implement versioning for "skills"

Example:

# skills/data_analysis.py
def analyze_trends(data):
    # Reusable skill
    pass

# The agent can import and use it
from skills.data_analysis import analyze_trends

Applying It in Real Projects

Use Case 1: Data Analysis Agent

Scenario: Build an agent that analyzes data from many different sources.

Applying MCP:

One MCP server per data source (database, API, file system)
Code execution to filter and aggregate data
Progressive disclosure for the analysis tools

Benefits:

60-70% lower token usage
3-5x faster processing
New data sources are easy to add

Use Case 2: Automation Agent

Scenario: An agent that automates repetitive tasks.

Applying MCP:

MCP servers for the systems it needs to interact with
Code execution to handle complex logic
State management to resume work

Benefits:

Better error handling with try-catch in code
Work can be paused and resumed
Easy to debug and monitor

Use Case 3: Customer Support Agent

Scenario: A customer support agent with access to multiple systems.

Applying MCP:

MCP servers for the CRM, knowledge base, and ticketing system
Code execution to query and consolidate information
Privacy protection for customer data

Benefits:

Better protection of sensitive information
Faster responses thanks to in-place data processing
New systems are easy to integrate

Challenges and Solutions

Challenge 1: Code Quality and Safety

Problem: The agent may write unsafe or inefficient code.

Solutions:

Implement automated code review
Use a linter and formatter
Restrict the APIs and functions that can be used
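Restricting what generated code may use can start with a static check before execution. The sketch below walks the syntax tree and reports disallowed imports and calls; the allowlist and banned names are hypothetical, and a check like this is easy to bypass, so it complements a sandbox rather than replacing one.

```python
import ast

ALLOWED_IMPORTS = {"math", "json", "statistics"}                  # hypothetical allowlist
BANNED_CALLS = {"eval", "exec", "open", "__import__", "compile"}  # hypothetical denylist


def check_code(source: str) -> list[str]:
    """Return a list of policy violations found in the source (empty means it passed)."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]
    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [(node.module or "").split(".")[0]]
        else:
            modules = []
        problems += [f"import not allowed: {m}" for m in modules if m not in ALLOWED_IMPORTS]
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in BANNED_CALLS
        ):
            problems.append(f"call not allowed: {node.func.id}")
    return problems
```

Code that fails the check can be sent back to the agent with the violation list, which doubles as a simple form of automated review.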

Challenge 2: Debugging

Problem: Agent-generated code is harder to debug than hand-written code.

Solutions:

Comprehensive logging
Code explanations from the agent
Step-by-step execution with breakpoints

Challenge 3: Performance

Problem: Code execution can be slow if it is not optimized.

Solutions:

Cache results
Run in parallel where possible
Optimize the agent's code generation
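Caching and parallel execution can be combined in a few lines when the calls are independent. In this sketch `fetch` is a hypothetical slow tool call (the sleep stands in for network latency); a thread pool runs the calls concurrently and `functools.lru_cache` memoizes repeated arguments.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache


@lru_cache(maxsize=256)
def fetch(source: str) -> str:
    """Hypothetical slow tool call; results are cached per argument."""
    time.sleep(0.1)                 # stands in for network latency
    return f"data from {source}"


def fetch_all(sources: list[str]) -> list[str]:
    """Run independent calls concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(fetch, sources))
```

Threads suit I/O-bound tool calls like this; CPU-bound work would need processes instead, and cached results must be invalidated when the underlying data changes.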

Roadmap for Applying MCP to Your Project

Based on the principles and best practices above, here is a concrete roadmap for applying MCP to your project effectively:

Phase 1: Preparation and Assessment (Weeks 1-2)

Goal: Understand the requirements and prepare the environment

Assess the use case: Identify the specific problem the agent will solve
Analyze current systems: List the systems, APIs, and databases to integrate
Set up the dev environment: Install the MCP SDK and create a sandbox environment
Define metrics: Define KPIs to measure effectiveness (token usage, latency, accuracy)
Security audit: Assess security and compliance requirements

Phase 2: Proof of Concept (Weeks 3-4)

Goal: Build a simple prototype to validate the concept

Create the first MCP server: Start with the simplest data source
Implement a basic agent: The agent can call an MCP tool and handle the response
Test code execution: Have the agent write and run simple code
Measure a baseline: Record initial metrics for comparison
Gather feedback: Collect feedback from the team and stakeholders

Phase 3: Expansion and Optimization (Weeks 5-8)

Goal: Extend functionality and optimize performance

Add MCP servers: Integrate the remaining data sources and systems
Implement progressive disclosure: Organize tools by namespace, with lazy loading
Build the data pipeline: Filter, transform, and aggregate data before returning it
Security hardening: Implement sandboxing, resource limits, and audit logging
State management: Persist state and build reusable skills
Performance optimization: Caching, parallel execution, code optimization

Phase 4: Production and Monitoring (Weeks 9-12)

Goal: Go to production and keep it stable

Comprehensive testing: Unit tests, integration tests, security tests
Documentation: Write docs for the MCP servers, API, and agent behavior
Monitoring setup: Logging, metrics, alerting system
Gradual rollout: Deploy in stages, with A/B testing if needed
Training and support: Train the team and set up a support process
Continuous improvement: Collect feedback, iterate, and optimize

Implementation Checklist

Technical Setup

MCP SDK installed
Sandbox environment configured
MCP servers implemented
Code executor setup
State storage configured

Security

Sandboxing enabled
Resource limits set
Input validation implemented
Audit logging active
Access control configured

Performance

Progressive disclosure implemented
Data filtering in place
Caching strategy defined
Metrics dashboard ready
Optimization plan created

Key Takeaways for Applying It Effectively

Start with the simplest use case: Don't try to solve every problem at once. Start small, learn, then expand.
Prioritize security from the start: Don't treat security as an afterthought. Design it into the architecture from day one.
Measure everything: If you can't measure it, you can't improve it. Set up metrics and monitoring early.
Take advantage of code execution: This is a strength of MCP. Let the agent handle complex logic in code rather than through many tool calls.
Build reusable skills: Invest in creating reusable functions. They will save time later.
Iterate and improve: No solution is perfect from the start. Collect feedback, measure, and keep improving.

Practical Example: E-commerce Data Analysis Agent

Scenario: You need to build an agent that analyzes sales data from multiple sources (database, API, CSV files).

Applying the roadmap:

Weeks 1-2: Assess the data sources, set up the environment, define metrics (query time, token usage)
Weeks 3-4: Create an MCP server for the database; the agent can query it and return simple results
Weeks 5-8: Add MCP servers for the API and file system; implement data filtering and aggregation in code
Weeks 9-12: Production deployment, monitoring, query performance optimization, reusable analysis functions

Result: The agent can analyze data from multiple sources, with 65% lower token usage and 4x faster processing than the traditional approach.

Conclusion and Future Directions

Code execution with MCP is an important step forward in building AI agents. It not only addresses efficiency and security problems but also lets agents "learn" and develop skills over time.

Final thoughts:

I believe this is only the beginning. In the future, we will see:

Agents that can automatically optimize their own code
A richer ecosystem of MCP servers
Better frameworks and tooling for development

Advice for developers:

Start small: Begin with a simple use case to understand how MCP works
Focus on security: Don't trade security for efficiency
Measure and optimize: Always measure performance and optimize based on real data
Community: Join the MCP community to learn and share experience

Applying MCP to your project is not just about integrating a new technology; it also changes how you think about building AI agents. Start today and explore the new possibilities!

Cursor 2.0: Revolutionizing Code Development

Posted on November 11, 2025 by Phat Ly

Discover the New Features and Benefits for Modern Programmers

🎯 What’s New in Cursor 2.0?

⚡ Composer Model

4x Faster Performance: A frontier coding model that operates four times faster than similarly intelligent models, completing most tasks in under 30 seconds. Designed for low-latency agentic coding and particularly effective in large codebases.

🤖 Multi-Agent Interface

Run Up to 8 Agents Concurrently: A redesigned interface that allows you to manage and run up to eight agents simultaneously. Each agent operates in isolated copies of your codebase to prevent file conflicts and enable parallel development workflows.

🌐 Embedded Browser

Now Generally Available: The in-editor browser includes tools for selecting elements and forwarding DOM information to agents. This facilitates more effective web development, testing, and iteration without leaving your editor.

🔒 Sandboxed Terminals

Enhanced Security (macOS): Agent commands now run in a secure sandbox by default, restricting commands to read/write access within your workspace without internet access. This enhances security while maintaining functionality.

🎤 Voice Mode

Hands-Free Operation: Control agents using voice commands with built-in speech-to-text conversion. Supports custom submit keywords, allowing for hands-free coding and improved accessibility.

📝 Improved Code Review

Enhanced Multi-File Management: Better features for viewing and managing changes across multiple files without switching between them. Streamlines the code review process and improves collaboration.

👥 Team Commands

Centralized Management: Define and manage custom commands and rules centrally through the Cursor dashboard. Ensures consistency across your team and standardizes development workflows.

🚀 Performance Enhancements

Faster LSP Performance: Improved loading and usage of Language Server Protocol (LSP) servers for all languages. Results in faster performance, reduced memory usage, and smoother operation, especially noticeable in large projects.

💡 Key Benefits for Programmers

🚀 Increased Productivity

Cursor 2.0’s enhanced AI capabilities significantly reduce the time spent on boilerplate code, debugging, and searching for solutions. Programmers can focus more on solving complex problems rather than routine coding tasks.

✓ 4x Faster Code Generation: The Composer model completes most coding tasks in under 30 seconds, dramatically reducing development time and enabling rapid iteration cycles.
✓ Parallel Development Workflows: Multi-agent interface allows running up to 8 agents simultaneously, enabling teams to work on multiple features or bug fixes concurrently without conflicts.
✓ Streamlined Web Development: Embedded browser with DOM element selection eliminates the need to switch between browser and editor, making web testing and debugging more efficient.
✓ Enhanced Security: Sandboxed terminals on macOS provide secure execution environment, protecting sensitive projects while maintaining full functionality for agent commands.
✓ Improved Accessibility: Voice mode enables hands-free coding, making development more accessible and allowing for multitasking while coding.
✓ Better Code Review Process: Enhanced multi-file change management allows reviewing and managing changes across multiple files without constant context switching, improving review efficiency.
✓ Team Consistency: Team Commands feature ensures all team members follow standardized workflows and best practices, reducing onboarding time and maintaining code quality.
✓ Optimized Performance for Large Projects: Improved LSP performance means faster loading times, reduced memory usage, and smoother operation even with complex, large-scale codebases.
✓ Reduced Development Time: Combined features result in significantly faster development cycles, allowing teams to deliver features and fixes much quicker than before.
✓ Better Resource Utilization: Parallel agent execution and optimized performance mean teams can accomplish more with the same resources, improving overall productivity.

🎨 New Features Deep Dive

1. Composer Model – Speed Revolution

The Composer model represents a significant leap in AI coding performance. Key characteristics:

✓ 4x Faster: Operates four times faster than similarly intelligent models
✓ Under 30 Seconds: Completes most coding tasks in less than 30 seconds
✓ Low-Latency: Designed specifically for agentic coding workflows
✓ Large Codebase Optimized: Particularly effective when working with large, complex projects

2. Multi-Agent Interface – Parallel Processing

The multi-agent interface revolutionizes how teams can work with AI assistants:

✓ Run up to 8 agents simultaneously without conflicts
✓ Each agent operates in isolated copies of your codebase
✓ Prevents file conflicts and merge issues
✓ Enables true parallel development workflows

3. Embedded Browser – Integrated Web Development

Now generally available, the embedded browser brings:

✓ In-editor browser for testing and debugging
✓ Element selection tools for DOM interaction
✓ Direct DOM information forwarding to agents
✓ Seamless web development workflow

4. Security & Performance Enhancements

Cursor 2.0 includes critical improvements for security and performance:

✓ Sandboxed Terminals: Secure execution environment on macOS
✓ LSP Improvements: Faster loading and reduced memory usage
✓ Better Resource Management: Optimized for large projects

📊 Comparison: Before vs After

Aspect	Before 2.0	After 2.0
Model Speed	Standard speed	4x Faster (Composer) NEW
Task Completion Time	Minutes	<30 seconds NEW
Agent Execution	Single agent	Up to 8 concurrent agents NEW
Browser Integration	External only	Embedded in-editor browser NEW
Security (macOS)	Standard terminals	Sandboxed terminals NEW
Voice Control	Not available	Voice mode available NEW
Team Management	Individual settings	Centralized team commands NEW
LSP Performance	Standard	Enhanced (faster, less memory) IMPROVED

🎯 Use Cases & Scenarios

Scenario 1: Rapid Feature Development

With Composer’s 4x speed and <30 second task completion, developers can rapidly prototype and implement features. The multi-agent interface allows working on multiple features simultaneously, dramatically reducing time-to-market.

Scenario 2: Web Development Workflow

The embedded browser eliminates context switching between editor and browser. Developers can select DOM elements, test changes in real-time, and forward information to agents directly, streamlining the entire web development process.

Scenario 3: Team Collaboration

Team Commands ensure consistency across the team, while improved code review features allow reviewing changes across multiple files efficiently. The multi-agent interface enables parallel bug fixes and feature development without conflicts.

Scenario 4: Large Codebase Management

Enhanced LSP performance and optimized resource usage make Cursor 2.0 particularly effective for large projects. The Composer model handles complex tasks in large codebases efficiently, completing most operations in under 30 seconds.

🏷️ Tags

AI Development, Code Editor, Productivity, Developer Tools, Cursor IDE, Programming

File Search Tool in Gemini API

Posted on November 11, 2025 by Phat Ly

Build Smart RAG Applications with Google Gemini

🎯 What is File Search Tool?

Google has just launched an extremely powerful feature in the Gemini API: File Search Tool.
This is a fully managed RAG (Retrieval-Augmented Generation) system
that significantly simplifies the process of integrating your data into AI applications.

💡 What is RAG?

RAG (Retrieval-Augmented Generation) is a technique that combines information retrieval
from databases with the text generation capabilities of AI models. Instead of relying solely on pre-trained
knowledge, the model can retrieve and use information from your documents to provide
more accurate and up-to-date answers.

If you’ve ever wanted to build:

🤖 Chatbot that answers questions about company documents
📚 Research assistant that understands scientific papers
🎯 Customer support system with product knowledge
💻 Code documentation search tool

Then File Search Tool is the solution you need!

✨ Key Features

🚀 Simple Integration

Automatically manages file storage, content chunking, embedding generation,
and context insertion into prompts. No complex infrastructure setup required.

🔍 Powerful Vector Search

Uses the latest Gemini Embedding models for semantic search.
Finds relevant information even without exact keyword matches.

📚 Built-in Citations

Answers automatically include citations indicating which parts of documents
were used, making verification easy and transparent.

📄 Multiple Format Support

Supports PDF, DOCX, TXT, JSON, and many programming language files.
Build a comprehensive knowledge base easily.

🎉 Main Benefits

⚡ Fast: Deploy RAG in minutes instead of days
💰 Cost-effective: No separate vector database management needed
🔧 Easy maintenance: Google handles updates and scaling
✅ Reliable: Includes citations for information verification

⚙️ How It Works

File Search Tool operates in 3 simple steps:

1. Create a File Search Store: This is the “storage” for your processed data. The store maintains embeddings and search indices for fast retrieval.
2. Upload and import files: Upload your documents and the system automatically:
- Splits content into chunks
- Creates vector embeddings for each chunk
- Builds an index for fast searching
3. Query with File Search: Use the File Search tool in API calls to perform semantic searches and receive accurate answers with citations.

Figure 1: File Search Tool Workflow Process
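To build intuition for what the managed service does in these three steps, here is a toy, local-only sketch of chunking, indexing, and retrieval. It is not how File Search is implemented: the "embedding" is just a word-count vector (so matching is keyword-based, not semantic), and every function name here is made up for illustration.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 12) -> list[str]:
    """Split text into consecutive chunks of at most `max_words` words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a real system uses a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(docs: dict[str, str]) -> list[tuple[str, str, Counter]]:
    """Index = (source name, chunk text, vector) for every chunk of every document."""
    return [(name, c, embed(c)) for name, text in docs.items() for c in chunk_text(text)]


def search(index: list[tuple[str, str, Counter]], query: str, top_k: int = 1) -> list[tuple[str, str]]:
    """Return the best-matching chunks with their source, which is what a citation points to."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[2]), reverse=True)
    return [(name, chunk) for name, chunk, _ in ranked[:top_k]]
```

The retrieved chunks would then be inserted into the prompt, and the source names are what make citations possible.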

🛠️ Detailed Installation Guide

Step 1: Environment Preparation

✅ System Requirements

Python 3.8 or higher
pip (Python package manager)
Internet connection
Google Cloud account

📦 Required Tools

Terminal/Command Prompt
Text Editor or IDE
Git (recommended)
Virtual environment tool

Step 2: Install Python and Dependencies

2.1. Check Python

python --version

Expected output: Python 3.8.x or higher

2.2. Create Virtual Environment (Recommended)

# Create virtual environment
python -m venv gemini-env

# Activate (Windows)
gemini-env\Scripts\activate

# Activate (Linux/Mac)
source gemini-env/bin/activate

2.3. Install Google Genai SDK

pip install google-genai

Wait for the installation to complete. Upon success, you’ll see:

# Output when installation is successful:

Successfully installed google-genai-x.x.x

Figure 2: Successful Google Genai SDK installation

Step 3: Get API Key

1. Access Google AI Studio: Open your browser and go to https://aistudio.google.com/
2. Log in with your Google account
3. Create a new API key: Click “Get API Key” → “Create API Key” → select a project or create a new one
4. Copy the API key: Save it securely; you’ll need it for authentication

Figure 3: Google AI Studio page to create API Key

Step 4: Configure API Key

Method 1: Use Environment Variable (Recommended)

On Windows:

set GEMINI_API_KEY=your_api_key_here

On Linux/Mac:

export GEMINI_API_KEY='your_api_key_here'

Method 2: Use .env File

# Create .env file

GEMINI_API_KEY=your_api_key_here

Then load in Python:

# Requires: pip install python-dotenv
import os

from dotenv import load_dotenv

load_dotenv()  # reads variables from .env into the process environment
api_key = os.getenv("GEMINI_API_KEY")
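Whichever method you use, it helps to fail early with a clear message when the key is missing, and to avoid printing the full key in logs. The helpers below are a small sketch using only the standard library; `get_api_key` and `redact` are names invented for this example.

```python
import os


def get_api_key(env=os.environ, name: str = "GEMINI_API_KEY") -> str:
    """Return the key from the environment, or raise a clear error if it is missing."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it or add it to your .env file")
    return key


def redact(key: str) -> str:
    """Show only the last 4 characters, for safe logging."""
    if len(key) <= 4:
        return "*" * len(key)
    return "*" * (len(key) - 4) + key[-4:]
```

Passing `env` as a parameter keeps the function easy to test without touching the real environment.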

⚠️ Security Notes

🔒 DO NOT commit API keys to Git
📝 Add .env to .gitignore
🔑 Don’t share API keys publicly
♻️ Rotate keys periodically, and immediately if one is exposed

Step 5: Verify Setup

Run the test script to verify the complete setup:

python test_connection.py

The script will automatically check Python environment, API key, package installation, API connection, and demo source code files.

Figure 4: Successful setup test result
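The contents of test_connection.py are not shown in this post. As a rough idea of what such a script might check locally, here is a hypothetical sketch that verifies the Python version, the API key variable, and whether the SDK can be found, without making any network call. The function names and the result layout are assumptions.

```python
import importlib.util
import os
import sys


def check_setup(env=os.environ) -> dict:
    """Return a name -> bool map of basic setup checks (no network call is made)."""
    try:
        sdk_found = importlib.util.find_spec("google.genai") is not None
    except (ImportError, ValueError):   # e.g. parent package "google" is not installed
        sdk_found = False
    return {
        "python_version": sys.version_info >= (3, 8),
        "api_key_set": bool(env.get("GEMINI_API_KEY", "").strip()),
        "sdk_installed": sdk_found,
    }


def report(results: dict) -> str:
    """Format the checks as one 'name: OK/FAIL' line each."""
    return "\n".join(f"{name}: {'OK' if ok else 'FAIL'}" for name, ok in results.items())
```

Running `print(report(check_setup()))` gives a quick pass/fail list before you try a real API request.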

🎮 Demo and Screenshots

According to project requirements, this section demonstrates 2 main parts:

Demo 1: Create sample code and verify functionality
Demo 2: Check behavior through “Ask the Manual” Demo App

Demo 1: Sample Code – Create and Verify Operation

We’ll write our own code to test how File Search Tool works.

Step 1: Create File Search Store

Figure 5: Code to create File Search Store

Figure 6: Output when store is successfully created

Step 2: Upload and Process File

Figure 7: File processing workflow

Step 3: Query and Receive Response with Citations

Figure 8: Answer with citations

Demo 2: Check Behavior with “Ask the Manual” Demo App

Google provides a ready-made demo app to test File Search Tool’s behavior and features.
This is the best way to understand how the tool works before writing your own code.

🎨 Try Google’s Demo App

Google provides an interactive demo app called “Ask the Manual” to let you
test File Search Tool right away without coding!

Figure 9: Ask the Manual demo app interface (including API key selection)

Testing with Demo App:

Select or enter your API key in the Settings field
Upload a PDF or DOCX file to the app
Wait for processing (usually < 1 minute)
Chat and ask questions about the file content
View answers drawn from the file, with citations
Click on citations to verify sources

Figure 10: Files uploaded in demo app

Figure 11: Query and response with citations in demo app

✅ Demo Summary According to Requirements

We have completed all requirements:

✅ Introduce features: Introduced 4 main features at the beginning
✅ Check behavior by demo app: Tested directly with “Ask the Manual” Demo App
✅ Introduce getting started: Provided detailed 5-step installation guide
✅ Make sample code: Created our own code and verified actual operation

Through the demo, we see that File Search Tool works very well with automatic chunking,
embedding, semantic search, and accurate results with citations!

💻 Complete Code Examples

Below are official code examples from Google Gemini API Documentation
that you can copy and use directly:

Example 1: Upload Directly to File Search Store

The fastest way – upload file directly to store in 1 step:

from google import genai
from google.genai import types
import time

client = genai.Client()

# Create the file search store with an optional display name
file_search_store = client.file_search_stores.create(
    config={'display_name': 'your-fileSearchStore-name'}
)

# Upload and import a file into the file search store
operation = client.file_search_stores.upload_to_file_search_store(
    file='sample.txt',
    file_search_store_name=file_search_store.name,
    config={
        'display_name': 'display-file-name',
    }
)

# Wait until import is complete
while not operation.done:
    time.sleep(5)
    operation = client.operations.get(operation)

# Ask a question about the file
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="""Can you tell me about Robert Graves""",
    config=types.GenerateContentConfig(
        tools=[
            types.Tool(
                file_search=types.FileSearch(
                    file_search_store_names=[file_search_store.name]
                )
            )
        ]
    )
)

print(response.text)

Example 2: Upload then Import File (2 Separate Steps)

If you want to upload file first, then import it to store:

from google import genai
from google.genai import types
import time

client = genai.Client()

# Upload the file using the Files API
sample_file = client.files.upload(
    file='sample.txt',
    config={'name': 'display_file_name'}
)

# Create the file search store
file_search_store = client.file_search_stores.create(
    config={'display_name': 'your-fileSearchStore-name'}
)

# Import the file into the file search store
operation = client.file_search_stores.import_file(
    file_search_store_name=file_search_store.name,
    file_name=sample_file.name
)

# Wait until import is complete
while not operation.done:
    time.sleep(5)
    operation = client.operations.get(operation)

# Ask a question about the file
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="""Can you tell me about Robert Graves""",
    config=types.GenerateContentConfig(
        tools=[
            types.Tool(
                file_search=types.FileSearch(
                    file_search_store_names=[file_search_store.name]
                )
            )
        ]
    )
)

print(response.text)

📚 Source: Code examples are taken from

Gemini API Official Documentation – File Search
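Both examples wait for the import with an unbounded `while not operation.done` loop. The helper below is not from the official documentation; it is a generic sketch of the same polling pattern with a timeout and exponential backoff, written against plain callables so it does not depend on the SDK. All parameter names are assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until_done(
    refresh: Callable[[], T],
    is_done: Callable[[T], bool],
    timeout_s: float = 300.0,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Poll `refresh()` until `is_done(result)` is true, doubling the delay each round."""
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        result = refresh()
        if is_done(result):
            return result
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"operation not done after {timeout_s} s")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)
```

In the examples above, `refresh` would presumably wrap the `client.operations.get(operation)` call and `is_done` would read the `done` attribute; that wiring is an untested assumption.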

🎯 Real-World Applications

1. 📚 Document Q&A System

Use Case: Company Documentation Chatbot

Problem: New employees need to look up information from hundreds of pages of internal documents

Solution:

Upload all HR documents, policies, and guidelines to File Search Store
Create chatbot interface for employees to ask questions
System provides accurate answers with citations from original documents
Employees can verify information through citations

Benefits: Saves search time, reduces burden on HR team

2. 🔬 Research Assistant

Use Case: Scientific Paper Synthesis

Problem: Researchers need to read and synthesize dozens of papers

Solution:

Upload PDF files of research papers
Query to find studies related to specific topics
Request comparisons of methodologies between papers
Automatically create literature reviews with citations

Benefits: Accelerates research process, discovers new insights

3. 🎧 Customer Support Enhancement

Use Case: Automated Support System

Problem: Customers have many product questions, need 24/7 support

Solution:

Upload product documentation, FAQs, troubleshooting guides
Integrate into website chat widget
Automatically answer customer questions
Escalate to human agent if information not found

Benefits: Reduce 60-70% of basic tickets, improve customer satisfaction

4. 💻 Code Documentation Navigator

Use Case: Developer Onboarding Support

Problem: New developers need to quickly understand large codebase

Solution:

Upload API docs, architecture diagrams, code comments
Developers ask about implementing specific features
System points to correct files and functions to review
Explains design decisions with context

Benefits: Reduces onboarding time from weeks to days

📊 Comparison with Other Solutions

Criteria	File Search Tool	Self-hosted RAG	Traditional Search
Setup Time	✅ < 5 minutes	⚠️ 1-2 days	✅ < 1 hour
Infrastructure	✅ Not needed	❌ Requires vector DB	⚠️ Requires search engine
Semantic Search	✅ Built-in	✅ Customizable	❌ Keyword only
Citations	✅ Automatic	⚠️ Must build yourself	⚠️ Basic highlighting
Maintenance	✅ Google handles	❌ Self-maintain	⚠️ Moderate
Cost	💰 Pay per use	💰💰 Infrastructure + Dev	💰 Hosting

🌟 Best Practices

📄 File Preparation

✅ Do’s

Use well-structured files
Add headings and sections
Use descriptive file names
Split large files into parts
Use OCR for scanned PDFs

❌ Don’ts

Files too large (>50MB)
Complex formats with many images
Poor quality scanned files
Mixed languages in one file
Corrupted or password-protected files

🗂️ Store Management

📋 Efficient Store Organization

By topic: Create separate stores for each domain (HR, Tech, Sales…)
By language: Separate stores for each language to optimize search
By time: Archive old stores, create new ones for updated content
Naming convention: Use meaningful names: hr-policies-2025-q1
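
The naming convention above can be enforced with a small helper. The sketch below is illustrative only: `slugify` and `buildStoreName` are hypothetical names, and the `<domain>-<topic>-<year>-q<quarter>` scheme is an assumption inferred from the single example `hr-policies-2025-q1`, not a rule imposed by the File Search API.

```typescript
// Hypothetical helper: builds a store name such as "hr-policies-2025-q1".
// The naming scheme is an assumption based on the example in the text.
function slugify(part: string): string {
  return part
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse every non-alphanumeric run into one hyphen
    .replace(/^-+|-+$/g, ""); // drop leading and trailing hyphens
}

function buildStoreName(domain: string, topic: string, date: Date): string {
  const year = date.getUTCFullYear();
  const quarter = Math.floor(date.getUTCMonth() / 3) + 1; // months 0-2 map to Q1
  return `${slugify(domain)}-${slugify(topic)}-${year}-q${quarter}`;
}

// Example: a store created in February 2025 for HR policies.
const exampleName = buildStoreName("HR", "Policies", new Date(Date.UTC(2025, 1, 10)));
```

Generating names from one function keeps them sortable and makes it easier to spot which stores are candidates for archiving.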

🔍 Query Optimization

# ❌ Poor query
"info"  # Too general

# ✅ Good query
"What is the employee onboarding process in the first month?"

# ❌ Poor query
"python"  # Single keyword

# ✅ Good query
"How to implement error handling in Python API?"

# ✅ Query with context
"""
I need information about the deployment process.
Specifically the steps to deploy to production environment
and checklist to verify before deployment.
"""
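
One way to apply these guidelines is to screen queries before sending them. The sketch below is a rough heuristic, not part of the File Search API: `assessQuery` is a hypothetical function and the word-count thresholds are assumptions chosen only to mirror the poor and good examples above.

```typescript
// Heuristic sketch only: the thresholds are assumptions, not documented values.
type QueryAssessment = { ok: boolean; reason: string };

function assessQuery(query: string): QueryAssessment {
  // Split on whitespace and ignore empty fragments.
  const words = query.trim().split(/\s+/).filter((w) => w.length > 0);
  if (words.length === 0) return { ok: false, reason: "empty query" };
  if (words.length === 1) return { ok: false, reason: "single keyword" };
  if (words.length < 4) return { ok: false, reason: "too general" };
  return { ok: true, reason: "specific enough" };
}
```

An application could use the result to prompt the user for more context instead of spending an API call on a query that is unlikely to retrieve useful passages.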

⚡ Performance Tips

Speed Up Processing

Batch upload: Upload multiple files at once instead of one by one
Async processing: No need to wait for each file to complete
Cache results: Cache answers for common queries
Optimize file size: Compress PDFs, remove unnecessary images
Monitor API limits: Track usage to avoid hitting rate limits
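
The "cache results" tip can be sketched as a small in-memory cache keyed by a normalized query string. This is a minimal illustration under stated assumptions: `QueryCache` is a hypothetical class, the normalization rule (trim, lowercase, collapse whitespace) is my choice, and the value type is generic so it can hold either an answer or a `Promise` of one.

```typescript
// Sketch of a TTL cache for answers to repeated queries.
// Note: if T is a Promise, a rejected promise is cached too; callers may want
// to evict entries on failure.
class QueryCache<T> {
  private readonly entries = new Map<string, { value: T; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, useful for tests
  }

  // Treat queries that differ only in case or spacing as identical.
  static normalize(query: string): string {
    return query.trim().toLowerCase().replace(/\s+/g, " ");
  }

  getOrCompute(query: string, compute: () => T): T {
    const key = QueryCache.normalize(query);
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > this.now()) return hit.value;
    const value = compute(); // cache miss or expired entry
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}
```

In practice `compute` would wrap the actual File Search request; a short TTL limits the risk of serving stale answers after documents in a store change.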

🔒 Security

Security Checklist

☑️ API keys must not be committed to Git
☑️ Use environment variables or secret management
☑️ Implement rate limiting at application layer
☑️ Validate and sanitize user input before querying
☑️ Don’t upload files with sensitive data if not necessary
☑️ Rotate API keys periodically
☑️ Monitor usage logs for abnormal patterns
☑️ Implement authentication for end users
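
Two items on this checklist, application-layer rate limiting and input sanitization, can be sketched in a few lines. The code below is an assumption-laden illustration: `FixedWindowLimiter` and `sanitizeQuery` are hypothetical names, the fixed-window strategy is one of several possible algorithms, and the limits are placeholders.

```typescript
// Fixed-window rate limiter sketch: at most `limit` requests per user per window.
class FixedWindowLimiter {
  private readonly counts = new Map<string, { windowStart: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(limit: number, windowMs: number, now: () => number = () => Date.now()) {
    if (limit < 1) throw new Error("limit must be at least 1");
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  allow(userId: string): boolean {
    const t = this.now();
    const entry = this.counts.get(userId);
    // Start a new window on first use or once the old window has elapsed.
    if (entry === undefined || t - entry.windowStart >= this.windowMs) {
      this.counts.set(userId, { windowStart: t, count: 1 });
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}

// Replace control characters with spaces, collapse whitespace, cap the length.
function sanitizeQuery(raw: string, maxLength: number = 500): string {
  return raw
    .replace(/[\u0000-\u001f\u007f]+/g, " ")
    .replace(/\s+/g, " ")
    .trim()
    .slice(0, maxLength);
}
```

A request handler would call `allow(userId)` first and only pass `sanitizeQuery(input)` on to the search call when it returns `true`.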

💰 Cost Optimization

Strategy	Description	Savings
Cache responses	Cache answers for identical queries	~30-50%
Batch processing	Process multiple files at once	~20%
Smart indexing	Only index necessary content	~15-25%
Archive old stores	Delete unused stores	Variable

🎊 Conclusion

The File Search Tool in the Gemini API provides a simple yet powerful RAG solution for integrating your own data into AI applications.
This post introduced its features, demonstrated them with the “Ask the Manual” app, gave a detailed installation guide,
and walked through sample code with 11 illustrative screenshots.

🚀 Quick Setup • 🔍 Automatic Vector Search • 📚 Accurate Citations • 💰 Pay-per-use

🔗 Official Resources

📝 Official Blog Announcement:

https://blog.google/technology/developers/file-search-gemini-api/

📚 API Documentation:

https://ai.google.dev/gemini-api/docs/file-search

🎮 Demo App – “Ask the Manual”:

https://aistudio.google.com/apps/bundled/ask_the_manual

🎨 Google AI Studio (Get API Key):

https://aistudio.google.com/

Playwright Agents — 🎭 Planner, 🎭 Generator, 🎭 Healer

Posted on October 15, 2025 by Phat Ly

What are Playwright Agents?

This article distills the official guidance and demo video into a practical, production‑ready walkthrough. Playwright ships three agents you can run independently or in a loop: 🎭 Planner, 🎭 Generator, and 🎭 Healer.

🎭 Planner

Explores your app and produces a human‑readable Markdown plan.

Input: a clear request (e.g. “Generate a plan for guest checkout”), a seed test, optional PRD.
Output: specs/*.md with scenarios, steps, and expected results.

🎭 Generator

Converts the Markdown plan into executable Playwright tests and validates selectors/assertions during generation.

Input: Markdown from specs/, seed test and fixtures.
Output: tests/*.spec.ts aligned to the plan.

🎭 Healer

Runs tests, replays failures, proposes patches (locator updates, waits, data fixes) and re‑runs until passing or guardrails stop.

Input: failing test name.
Output: a passing test or a skipped test if functionality is broken.

🎭 Planner → 🎭 Generator → 🎭 Healer Overview

📋 Table of Contents

1. Requirements
2. Step-by-Step Installation Guide
3. Step-by-Step Testing Guide
4. Project Structure and Files
5. How Playwright Agents Work (End‑to‑End)
6. How Playwright Agents Work
7. Agent Deep Dives
8. Project Structure and Artifacts
9. Examples from Official Documentation
10. Best Practices
11. Troubleshooting
12. CI/CD Integration
13. FAQ
14. Demo video and Source code

1. Requirements

Node.js 18+ and npm
Playwright Test 1.56 or later (the release that introduced Playwright Agents)
VS Code 1.105+ (Insiders channel) for full agentic UI experience
AI Assistant – Choose one: Claude Code, OpenCode, or VS Code with AI extensions
Git for version control
Modern web browser (Chrome, Firefox, Safari)

2. Step-by-Step Installation Guide

Step 1: Prerequisites

Install Node.js 18+ from nodejs.org
Install npm (comes with Node.js)
Install VS Code 1.105+ from VS Code Insiders for agentic experience
Choose and install an AI Assistant:
- Claude Code – for Claude integration
- OpenCode – an open-source terminal coding agent that works with multiple model providers
- VS Code with AI extensions – for built-in AI features
Install Git for version control

Step 2: Navigate to Demo Directory

# Navigate to the demo directory
C:\Users\ADMIN\Documents\AI_QUEST_LTP> cd "playwright Agent Test Example - PhatLT"

Step 3: Install Dependencies

playwright Agent Test Example - PhatLT> npm install
playwright Agent Test Example - PhatLT> npx playwright install

Step 4: Initialize Playwright Agents

# Initialize agent definitions for Claude Code (recommended)
playwright Agent Test Example - PhatLT> npx playwright init-agents --loop=claude

# Or for VS Code
playwright Agent Test Example - PhatLT> npx playwright init-agents --loop=vscode

# Or for OpenCode
playwright Agent Test Example - PhatLT> npx playwright init-agents --loop=opencode

Step 5: Verify Setup

# Test seed file
playwright Agent Test Example - PhatLT> npx playwright test tests/seed-agents.spec.ts

# Check project structure
playwright Agent Test Example - PhatLT> dir .claude\agents
playwright Agent Test Example - PhatLT> dir .github
playwright Agent Test Example - PhatLT> dir specs

playwright Agent Test Example - PhatLT> npm init -y
Wrote to playwright Agent Test Example - PhatLT\package.json:
{
  "name": "phatlt-playwright",
  "version": "1.0.0",
  "main": "index.js",
  "scripts": {
    "test": "playwright test",
    "test:headed": "playwright test --headed",
    "test:ui": "playwright test --ui",
    "test:debug": "playwright test --debug",
    "test:chromium": "playwright test --project=chromium",
    "test:firefox": "playwright test --project=firefox",
    "test:webkit": "playwright test --project=webkit",
    "report": "playwright show-report",
    "codegen": "playwright codegen"
  },
  "keywords": [],
  "author": "",
  "license": "ISC",
  "type": "commonjs",
  "description": "",
  "devDependencies": {
    "@playwright/test": "^1.56.0",
    "@types/node": "^24.7.2"
  }
}

playwright Agent Test Example - PhatLT> npm install -D @playwright/test
added 1 package, and audited 2 packages in 2s
found 0 vulnerabilities

playwright Agent Test Example - PhatLT> npx playwright install
Installing browsers...
✓ Chromium 120.0.6099.109
✓ Firefox 120.0
✓ WebKit 17.4

playwright Agent Test Example - PhatLT> npx playwright init
✓ Created playwright.config.ts
✓ Created tests/
✓ Created tests/example.spec.ts
✓ Created tests/seed.spec.ts

3. Step-by-Step Testing Guide

Step 1: Test Seed File

Run the seed test to verify Playwright Agents setup:

# Test seed file for agents
playwright Agent Test Example - PhatLT> npx playwright test tests/seed-agents.spec.ts

# Run with browser UI visible
playwright Agent Test Example - PhatLT> npx playwright test tests/seed-agents.spec.ts --headed

# Run in debug mode
playwright Agent Test Example - PhatLT> npx playwright test tests/seed-agents.spec.ts --debug

Step 2: Test Generated Tests

Run the example generated tests from the Generator agent:

# Run generated Google search tests
playwright Agent Test Example - PhatLT> npx playwright test tests/google-search-generated.spec.ts

# Run specific test by name
playwright Agent Test Example - PhatLT> npx playwright test --grep "Perform Basic Search"

# Run all tests
playwright Agent Test Example - PhatLT> npx playwright test

Step 3: Test Different Browsers

# Run tests only on Chromium
playwright Agent Test Example - PhatLT> npx playwright test --project=chromium

# Run tests only on Firefox
playwright Agent Test Example - PhatLT> npx playwright test --project=firefox

# Run tests only on WebKit
playwright Agent Test Example - PhatLT> npx playwright test --project=webkit

Step 4: Generate Test Reports

# Open the HTML report from the last test run
playwright Agent Test Example - PhatLT> npx playwright show-report

# Run tests with UI mode
playwright Agent Test Example - PhatLT> npx playwright test --ui

Step 5: Using Playwright Agents

Now you can use the Playwright Agents workflow with Claude Code:

# In Claude Code, ask the Planner:
"I need test scenarios for Google search functionality. Use the planner agent to explore https://www.google.com"

# Then ask the Generator:
"Use the generator agent to create tests from the test plan in specs/"

# Finally, use the Healer if tests fail:
"The test 'Perform Basic Search' is failing. Use the healer agent to fix it."

4. Project Structure and Files

playwright Agent Test Example - PhatLT/
├── .claude/agents/              # Claude Code agent definitions
│   ├── playwright-test-planner.md    # 🎭 Planner agent
│   ├── playwright-test-generator.md  # 🎭 Generator agent
│   └── playwright-test-healer.md     # 🎭 Healer agent
├── .github/                     # Official agent definitions
│   ├── planner.md               # 🎭 Planner instructions
│   ├── generator.md             # 🎭 Generator instructions
│   └── healer.md                # 🎭 Healer instructions
├── specs/                       # Test plans (Markdown)
│   └── google-search-operations.md   # Example test plan
├── tests/                       # Generated tests
│   ├── seed-agents.spec.ts      # Seed test for agents
│   └── google-search-generated.spec.ts  # Generated test example
├── .mcp.json                    # MCP server configuration
├── playwright.config.ts         # Playwright configuration
├── package.json                 # Project dependencies
└── test-results/               # Test execution results

5. How Playwright Agents Work (End‑to‑End)

🎭 Planner — explores your app and creates human-readable test plans saved in specs/ directory.
🎭 Generator — transforms Markdown plans into executable Playwright tests in tests/ directory.
🎭 Healer — automatically repairs failing tests by updating selectors and waits.
Execution — run generated tests with npx playwright test.
Maintenance — Healer fixes issues automatically, keeping tests stable over time.

playwright Agent Test Example - PhatLT> npx playwright test tests/seed-agents.spec.ts

Running 1 test using 1 worker

  ✓ [chromium] › tests/seed-agents.spec.ts › seed (2.1s)

  1 passed (2.1s)

playwright Agent Test Example - PhatLT> npx playwright test tests/google-search-generated.spec.ts

Running 5 tests using 1 worker

  ✓ [chromium] › tests/google-search-generated.spec.ts › Google Search - Basic Operations › Perform Basic Search (3.2s)
  ✓ [chromium] › tests/google-search-generated.spec.ts › Google Search - Basic Operations › Verify Search Box Functionality (1.8s)
  ✓ [chromium] › tests/google-search-generated.spec.ts › Google Search - Basic Operations › Search with Empty Query (1.5s)
  ✓ [chromium] › tests/google-search-generated.spec.ts › Google Search - Results Validation › Verify Search Results Display (4.1s)
  ✓ [chromium] › tests/google-search-generated.spec.ts › Google Search - Results Validation › Navigate Through Search Results (5.3s)

  5 passed (16.0s)

6. How Playwright Agents Work

Playwright Agents follow a structured workflow as described in the official documentation. The process involves three main agents working together:

🎭 Planner Agent

The Planner explores your application and creates human-readable test plans:

Input: Clear request (e.g., “Generate a plan for guest checkout”), seed test, optional PRD
Output: Markdown test plan saved as specs/basic-operations.md
Process: Runs seed test to understand app structure and creates comprehensive test scenarios

🎭 Generator Agent

The Generator transforms Markdown plans into executable Playwright tests:

Input: Markdown plan from specs/
Output: Test suite under tests/
Process: Verifies selectors and assertions live, generates robust test code

🎭 Healer Agent

The Healer automatically repairs failing tests:

Input: Failing test name
Output: Passing test or skipped test if functionality is broken
Process: Replays failing steps, inspects UI, suggests patches, re-runs until passing

// Example: Generated test from specs/basic-operations.md
// spec: specs/basic-operations.md
// seed: tests/seed.spec.ts

import { test, expect } from '../fixtures';

test.describe('Adding New Todos', () => {
  test('Add Valid Todo', async ({ page }) => {
    // 1. Click in the "What needs to be done?" input field
    const todoInput = page.getByRole('textbox', { name: 'What needs to be done?' });
    await todoInput.click();

    // 2. Type "Buy groceries"
    await todoInput.fill('Buy groceries');

    // 3. Press Enter key
    await todoInput.press('Enter');

    // Expected Results:
    // - Todo appears in the list with unchecked checkbox
    await expect(page.getByText('Buy groceries')).toBeVisible();
    const todoCheckbox = page.getByRole('checkbox', { name: 'Toggle Todo' });
    await expect(todoCheckbox).toBeVisible();
    await expect(todoCheckbox).not.toBeChecked();

    // - Counter shows "1 item left"
    await expect(page.getByText('1 item left')).toBeVisible();

    // - Input field is cleared and ready for next entry
    await expect(todoInput).toHaveValue('');
    await expect(todoInput).toBeFocused();

    // - Todo list controls become visible
    await expect(page.getByRole('checkbox', { name: '❯Mark all as complete' })).toBeVisible();
  });
});

7. Agent Deep Dives

🎭 Planner — author plans that generate great tests

Goal: Convert product intent into executable, atomic scenarios.
Inputs: business request, seed.spec.ts, optional PRD/acceptance criteria.
Output quality tips: prefer user‑intent over UI steps, keep 1 scenario = 1 assertion focus, name entities consistently.
Anti‑patterns: mixing setup/teardown into steps; over‑specifying selectors in Markdown.

🎭 Generator — compile plans into resilient tests

Validates selectors live: uses your running app to confirm locators/assertions.
Structure: mirrors specs/*.md; adds fixtures from seed.spec.ts; keeps tests idempotent.
Resilience: prefer roles/labels; avoid brittle CSS/XPath; centralize waits.

🎭 Healer — stabilize and protect correctness

Scope: flaky selectors, timing, deterministic data; not business‑logic rewrites.
Review gates: patches proposed as diffs; you accept/reject before merge.
Outcomes: test fixed, or skipped with a documented reason when the feature is broken.
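
The Healer's "re-run until passing or guardrails stop" behavior can be pictured as a bounded loop. The sketch below is conceptual and is not Playwright's implementation: `healLoop`, `runTest`, and `proposePatch` are hypothetical names, and the only guardrail modeled is a cap on the number of runs.

```typescript
// Conceptual sketch of a bounded heal loop (not Playwright's actual code).
type TestRun = { passed: boolean; error?: string };
type HealOutcome =
  | { status: "passing"; runs: number }
  | { status: "skipped"; runs: number; reason: string };

function healLoop(
  runTest: () => TestRun,
  proposePatch: (error: string) => boolean, // false means no safe patch exists
  maxRuns: number,
): HealOutcome {
  for (let run = 1; run <= maxRuns; run++) {
    const result = runTest();
    if (result.passed) return { status: "passing", runs: run };
    if (run === maxRuns) break; // guardrail: no further re-runs allowed
    const patched = proposePatch(result.error ?? "unknown failure");
    if (!patched) {
      // Likely a real product bug rather than a flaky locator or wait.
      return { status: "skipped", runs: run, reason: "no safe patch available" };
    }
  }
  return { status: "skipped", runs: maxRuns, reason: "run limit reached" };
}
```

Keeping the "skipped" branch explicit reflects the point above: a test that cannot be repaired with locator or timing changes should be skipped with a documented reason rather than forced to pass.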

8. Project Structure and Artifacts

Playwright Agents follow a structured approach as described in the official documentation. The generated files follow a simple, auditable structure:

repo/
  .github/                    # agent definitions
    planner.md               # planner agent instructions
    generator.md             # generator agent instructions  
    healer.md                # healer agent instructions
  specs/                     # human-readable test plans
    basic-operations.md      # generated by planner
  tests/                     # generated Playwright tests
    seed.spec.ts             # seed test for environment
    add-valid-todo.spec.ts   # generated by generator
  playwright.config.ts       # Playwright configuration

Agent Definitions (.github/)

Under the hood, agent definitions are collections of instructions and MCP tools provided by Playwright. They should be regenerated whenever Playwright is updated:

# Initialize agent definitions
npx playwright init-agents --loop=vscode
npx playwright init-agents --loop=claude  
npx playwright init-agents --loop=opencode

Specs in specs/

Specs are structured plans describing scenarios in human-readable terms. They include steps, expected outcomes, and data. Specs can start from scratch or extend a seed test.

Tests in tests/

Generated Playwright tests, aligned one-to-one with specs wherever feasible. Generated tests may include initial errors that can be healed automatically by the healer agent.

Seed tests (seed.spec.ts)

Seed tests provide a ready-to-use page context to bootstrap execution. The planner runs this test to execute all initialization necessary for your tests including global setup, project dependencies, and fixtures.

// Example: seed.spec.ts
import { test, expect } from './fixtures';

test('seed', async ({ page }) => {
  // This test uses custom fixtures from ./fixtures
  // 🎭 Planner will run this test to execute all initialization
  // necessary for your tests including global setup, 
  // project dependencies and all necessary fixtures and hooks
});

9. Examples from Official Documentation

🎭 Planner Output Example

The 🎭 Planner generates human-readable test plans saved as specs/basic-operations.md:

# TodoMVC Application - Basic Operations Test Plan

## Application Overview

The TodoMVC application is a React-based todo list manager that demonstrates 
standard todo application functionality. Key features include:

- **Task Management**: Add, edit, complete, and delete individual todos
- **Bulk Operations**: Mark all todos as complete/incomplete and clear all completed todos  
- **Filtering System**: View todos by All, Active, or Completed status with URL routing support
- **Real-time Counter**: Display of active (incomplete) todo count
- **Interactive UI**: Hover states, edit-in-place functionality, and responsive design

## Test Scenarios

### 1. Adding New Todos

**Seed:** `tests/seed.spec.ts`

#### 1.1 Add Valid Todo

**Steps:**
1. Click in the "What needs to be done?" input field
2. Type "Buy groceries"
3. Press Enter key

**Expected Results:**
- Todo appears in the list with unchecked checkbox
- Counter shows "1 item left"
- Input field is cleared and ready for next entry
- Todo list controls become visible (Mark all as complete checkbox)
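
Because the plan is plain Markdown with a regular shape, it can also be inspected programmatically, for example to count steps per scenario before generation. The parser below is a sketch that assumes the layout shown above (`####` scenario headings with a dotted number, a `**Steps:**` label, then a numbered list); `parseScenarios` is a hypothetical helper and is not part of Playwright.

```typescript
// Sketch: extracts scenarios and their numbered steps from a plan laid out
// like the example above. Other plan layouts are not handled.
type Scenario = { id: string; title: string; steps: string[] };

function parseScenarios(markdown: string): Scenario[] {
  const scenarios: Scenario[] = [];
  let current: Scenario | undefined;
  let inSteps = false;
  for (const line of markdown.split(/\r?\n/)) {
    const heading = /^####\s+(\d+(?:\.\d+)*)\s+(.+)$/.exec(line);
    if (heading !== null) {
      current = { id: heading[1] ?? "", title: (heading[2] ?? "").trim(), steps: [] };
      scenarios.push(current);
      inSteps = false;
      continue;
    }
    if (/^\*\*Steps:\*\*/.test(line)) {
      inSteps = true;
      continue;
    }
    if (/^\*\*.+\*\*/.test(line)) {
      inSteps = false; // any other bold label, such as "Expected Results", ends the steps
      continue;
    }
    const step = /^\d+\.\s+(.+)$/.exec(line);
    if (inSteps && current !== undefined && step !== null) {
      current.steps.push((step[1] ?? "").trim());
    }
  }
  return scenarios;
}
```

A check like this could flag scenarios with too many steps, which is one way to keep plans atomic before handing them to the Generator.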

🎭 Generator Output Example

The 🎭 Generator transforms the Markdown plan into executable Playwright tests:

// Generated test from specs/basic-operations.md
// spec: specs/basic-operations.md
// seed: tests/seed.spec.ts

import { test, expect } from '../fixtures';

test.describe('Adding New Todos', () => {
  test('Add Valid Todo', async ({ page }) => {
    // 1. Click in the "What needs to be done?" input field
    const todoInput = page.getByRole('textbox', { name: 'What needs to be done?' });
    await todoInput.click();

    // 2. Type "Buy groceries"
    await todoInput.fill('Buy groceries');

    // 3. Press Enter key
    await todoInput.press('Enter');

    // Expected Results:
    // - Todo appears in the list with unchecked checkbox
    await expect(page.getByText('Buy groceries')).toBeVisible();
    const todoCheckbox = page.getByRole('checkbox', { name: 'Toggle Todo' });
    await expect(todoCheckbox).toBeVisible();
    await expect(todoCheckbox).not.toBeChecked();

    // - Counter shows "1 item left"
    await expect(page.getByText('1 item left')).toBeVisible();

    // - Input field is cleared and ready for next entry
    await expect(todoInput).toHaveValue('');
    await expect(todoInput).toBeFocused();

    // - Todo list controls become visible
    await expect(page.getByRole('checkbox', { name: '❯Mark all as complete' })).toBeVisible();
  });
});

10. Best Practices

Keep plans atomic: Small, focused scenarios help 🎭 Generator produce clean tests. Avoid mixing multiple user flows in one scenario.
Stabilize with seed: Centralize navigation, authentication, and data seeding in seed.spec.ts to ensure consistent test environment.
Prefer semantic selectors: Use getByRole, getByLabel, and getByText for resilient element selection.
🎭 Healer guardrails: Review patches carefully; accept locator/wait tweaks, but avoid broad logic changes that might mask real bugs.
Version agent definitions: Commit .github/ changes and regenerate them whenever Playwright is updated.
Choose the right AI assistant: VS Code, Claude Code, or OpenCode — pick the one that fits your team’s workflow and preferences.
Maintain traceability: Keep clear 1:1 mapping from specs/*.md to tests/*.spec.ts using comments and headers.
Test the agents: Start with simple scenarios to understand how each agent works before tackling complex user flows.
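
The traceability practice can be checked mechanically. The sketch below relies on the `// spec:` header comment that appears at the top of the generated test examples in this article; `findUntracedTests` is a hypothetical helper, and file contents are passed in as strings so the check has no file-system dependency.

```typescript
// Sketch: reports tests whose `// spec:` header is missing or points at a
// plan that is not in the list of known spec files.
function findUntracedTests(
  tests: Record<string, string>, // test path -> file content
  specPaths: string[],
): string[] {
  const known = new Set(specPaths);
  const untraced: string[] = [];
  for (const [path, content] of Object.entries(tests)) {
    const match = /^\/\/\s*spec:\s*(\S+)\s*$/m.exec(content);
    const specPath = match?.[1];
    if (specPath === undefined || !known.has(specPath)) untraced.push(path);
  }
  return untraced;
}
```

Run in CI against the contents of `tests/` and `specs/`, a check like this would catch tests that have lost their link to a plan after manual edits or renamed specs.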

11. Troubleshooting

🎭 Planner can’t explore the app

Ensure your app is running locally, seed test works, and the app is accessible. Check that authentication and navigation are properly set up in seed.spec.ts.

🎭 Generator can’t find elements

Run the app locally, ensure routes are correct, and verify that elements have proper roles, labels, or accessible names. The 🎭 Generator validates selectors live against your running app.

🎭 Healer loops without fixing

Set explicit timeouts, add deterministic test data, and reduce flakiness in network waits. The 🎭 Healer works best with stable, predictable test conditions.
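
"Deterministic test data" usually means that every run sees the same inputs. One possible approach, shown here as a sketch, is to derive data from a seeded pseudo-random generator instead of `Math.random()`. The mulberry32-style generator and the `makeTodoTitles` helper are my own illustrative choices and are not part of Playwright.

```typescript
// Small seeded PRNG (mulberry32-style): the same seed yields the same sequence.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // value in [0, 1)
  };
}

// Hypothetical data builder: reproducible todo titles for a given seed.
function makeTodoTitles(seed: number, count: number): string[] {
  const rng = mulberry32(seed);
  const titles: string[] = [];
  for (let i = 0; i < count; i++) {
    titles.push(`Todo ${Math.floor(rng() * 10000)}`);
  }
  return titles;
}
```

With a fixed seed, a failure the Healer replays sees exactly the same data as the original run, which removes one source of loops that never converge.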

AI assistant doesn’t trigger agents

Re-run npx playwright init-agents --loop=[assistant], reload the IDE, and ensure the correct workspace root is open with agent definitions in .github/ (or .claude/agents/ when using Claude Code).

Generated tests fail immediately

Check that your seed test passes first. Ensure the app state matches what the 🎭 Planner observed. Verify that test data and authentication are consistent between planning and execution.

Agent definitions are outdated

Regenerate agent definitions after Playwright updates: npx playwright init-agents --loop=[assistant]. This ensures you have the latest tools and instructions.

12. CI/CD Integration

You can run the same agent‑generated tests in CI. Keep agent definitions in the repo and refresh them on Playwright upgrades.

# .github/workflows/tests.yml (excerpt)
name: Playwright Tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npx playwright install --with-deps
      - run: npx playwright test --reporter=html

13. FAQ

Do I need Claude Code?

No. Playwright Agents work with VS Code (v1.105+), Claude Code, or OpenCode. Choose the AI assistant that fits your team’s workflow and preferences.

Where do test plans live?

In specs/ as Markdown files generated by the 🎭 Planner. Generated tests go to tests/.

What if a feature is actually broken?

The 🎭 Healer can skip tests with an explanation instead of masking a real bug. It distinguishes between flaky tests and genuinely broken functionality.

Can I run agent-generated tests in CI?

Yes. The agents produce standard Playwright tests that run with npx playwright test in CI. Agent definitions are only needed for test authoring, not execution.

How do I update agent definitions?

Run npx playwright init-agents --loop=[assistant] whenever Playwright is updated to get the latest tools and instructions.

What’s the difference between 🎭 Planner, 🎭 Generator, and 🎭 Healer?

🎭 Planner: Explores your app and creates human-readable test plans. 🎭 Generator: Transforms plans into executable Playwright tests. 🎭 Healer: Automatically fixes failing tests by updating selectors and waits.

14. Demo video and Source code

GitHub repository: phatltscuti/playwright_agents

The Ultimate Learning Roadmap for AI Product Management

Posted on October 14, 2025 by Phat Ly

Original article: “The Ultimate AI PM Learning Roadmap” by Paweł Huryn

Description: An extended edition with dozens of AI PM resources: definitions, courses, guides, reports, tools, and step-by-step tutorials

Welcome to this detailed analysis of “The Ultimate AI PM Learning Roadmap” by Paweł Huryn. In this post, we walk through each part of the roadmap, assess how comprehensive it is, and suggest additional skills that an AI Product Manager (AI PM) needs.

1. Basic AI Concepts

Paweł begins by introducing the role of the AI Product Manager and how it differs from a traditional PM. This is an important foundation for understanding the field.

Key points:

Understand the difference between a traditional PM and an AI PM
Master the basic concepts of Machine Learning and Deep Learning
Understand Transformers and Large Language Models (LLMs)
Grasp the architecture of AI models and how they work

Free resources:

WTF is AI Product Manager – explains the AI PM role
LLM Visualization – understand how an LLM works

Start by understanding what an AI Product Manager is. Next, for most PMs, diving deep into statistics, Python, or loss functions does not make sense. Instead, you can find the most important concepts here: Introduction to AI Product Management: Neural Networks, Transformers, and LLMs.

[Optional] If you want to go deeper, I recommend checking out an interactive LLM visualization.

2. Prompt Engineering

Prompt Engineering guide for AI Product Management

52% of U.S. adults use LLMs, but very few know how to write good prompts.

Paweł recommends starting with resources curated specifically for PMs:

Recommended resources:

14 Prompting Techniques Every PM Should Know – basic techniques
Top 9 High-ROI ChatGPT Use Cases for Product Managers
The Ultimate ChatGPT Prompts Library for Product Managers

Other free resources (optional):

Guides:
- GPT-5 Prompting Guide – unique insights, especially for coding agents
- GPT-4.1 Prompting Guide – focuses on agentic capabilities
- Anthropic Prompt Engineering – the author's favorite resource
- Prompt Engineering by Google (optional)
Great analysis: System Prompt Analysis for Claude 4
Tools:
- Anthropic Prompt Generator: improve or create any prompt
- Anthropic Prompt Library: ready-to-use prompts
Free interactive course: Prompt Engineering By Anthropic

3. Fine-Tuning

Fine-tuning process in AI Product Management

Use these platforms to experiment with training and validation datasets and with parameters such as epochs. No coding required:

OpenAI Platform (start here; the favorite)
Hugging Face AutoTrain
LLaMA-Factory (open source; lets you train and fine-tune open-source LLMs)

Practice: You can practice fine-tuning by following a hands-on, step-by-step guide: The Ultimate Guide to Fine-Tuning for PMs

4. RAG (Retrieval-Augmented Generation)

RAG architecture for AI PMs

RAG, by definition, requires a data source plus an LLM, and there are dozens of possible architectures.

So instead of studying artificial labels, Paweł recommends the following resources for learning RAG in practice:

A Guide to Context Engineering for PMs
How to Build a RAG Chatbot Without Coding: a simple step-by-step exercise
Three Essential Agentic RAG Architectures from AI Agent Architectures
Interactive RAG simulator: https://rag.productcompass.pm/
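
To make the "data source plus an LLM" idea concrete, here is a toy sketch of the retrieval half of RAG. It ranks documents by word overlap with the question and assembles a prompt; the LLM call itself is left out. All names (`Doc`, `retrieve`, `buildPrompt`) are hypothetical, and real systems typically use embeddings and a vector index rather than keyword overlap.

```typescript
// Toy retrieval step: rank documents by shared words with the question.
type Doc = { id: string; text: string };

function tokenize(text: string): Set<string> {
  // Lowercase, split on non-alphanumerics, ignore very short tokens.
  return new Set(text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 2));
}

function retrieve(question: string, docs: Doc[], topK: number): Doc[] {
  const questionTokens = tokenize(question);
  return docs
    .map((doc) => {
      let overlap = 0;
      tokenize(doc.text).forEach((token) => {
        if (questionTokens.has(token)) overlap += 1;
      });
      return { doc, overlap };
    })
    .filter((scored) => scored.overlap > 0) // drop documents with nothing in common
    .sort((a, b) => b.overlap - a.overlap)
    .slice(0, topK)
    .map((scored) => scored.doc);
}

// Assemble the prompt that would be sent to an LLM, with source ids for citation.
function buildPrompt(question: string, context: Doc[]): string {
  const sources = context.map((d) => `[${d.id}] ${d.text}`).join("\n");
  return `Answer using only the sources below.\n\n${sources}\n\nQuestion: ${question}`;
}
```

The many named RAG architectures mostly vary what happens inside `retrieve` and how many times it is called; the overall shape of retrieve, then prompt, then generate stays the same.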

5. AI Agents & Agentic Workflows

Tools for AI Agents and Agentic Workflows

AI agents are a topic best learned by doing. Paweł sees too much meaningless advice from people who have never built anything.

Favorite tool: n8n

Paweł's favorite tool, which lets you:

Build complex agentic workflows and multi-agent systems with a drag-and-drop interface
Easily integrate with dozens of systems (Google, Intercom, Jira, SQL, Notion, etc.)
Create and orchestrate AI agents that can use tools and connect to any MCP server

You can start with these guides:

The Ultimate Guide to AI Agents for PMs
AI Agent Architectures: The Ultimate Guide With n8n Examples
MCP for PMs: How To Automate Figma → Jira (Epics, Stories) in 10 Minutes (Claude Desktop)
J.A.R.V.I.S. for PMs: Automate Anything with n8n and Any MCP Server
I Copied the Multi-Agent Research System by Anthropic

[Optional] Favorite free guides and reports:

Google Agent Companion: focuses on building production-ready AI agents
Anthropic Building Effective Agents
IBM Agentic Process Automation

6. AI Prototyping & AI Building

AI prototyping and building tools

Paweł lists many tools, but in practice Lovable, Supabase, GitHub, and Netlify cover 80% of what you need. You can add Stripe. No coding required.

Here are four practical guides:

AI Prototyping: The Ultimate Guide For Product Managers
How to Quickly Build SaaS Products With AI (No Coding): an introduction
A Complete Course: How to Build a Full-Stack App with Lovable (No-Coding)
Base44: A Brutally Simple Alternative to Lovable

[Optional] If you want to build and monetize your own product, for example for an AI PM portfolio:

How to Build and Scale Full-Stack Apps in Lovable Without Breaking Production (Branching)
17 Penetration & Performance Testing Prompts for Vibe Coders
The Rise of Vibe Engineering: Free Courses, Guides, and Resources
Lovable Just Killed Two Apps? Create Your Own SaaS Without Coding in 2 Days

When building, focus on value, not hype. Customers do not care whether your product uses AI or was built with AI.

7. Foundational Models

AI foundation models

Paweł's recommendations (August 2025):

GPT-5 > GPT-4.1 > GPT-4.1-mini for AI Agents
Claude Sonnet 4.5 for coding
Gemini 2.5 Pro for everything else

Understanding these foundation models helps AI PMs make sound decisions about which technology fits each specific use case.

8. AI Evaluation Systems

Evaluation is an important part of developing AI products. Paweł emphasizes the importance of setting up an effective evaluation system.

Key elements:

MLOps and Model Monitoring: continuously track model performance
A/B Testing: compare different versions of an AI product
Performance Tracking: measure and optimize performance
Model Drift Detection: catch model degradation early
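
As a rough illustration of model drift detection, the sketch below compares the recent average of a quality metric against a baseline average and flags a drop beyond a tolerance. The function names, the choice of metric, and the threshold are all assumptions made for illustration; production monitoring usually relies on more robust statistical tests.

```typescript
// Sketch: flag drift when the recent mean of a quality score falls more than
// `tolerance` below the baseline mean. Scores are assumed to be in [0, 1].
function mean(values: number[]): number {
  if (values.length === 0) throw new Error("mean of empty array");
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

function detectDrift(
  baseline: number[],
  recent: number[],
  tolerance: number,
): { drifted: boolean; drop: number } {
  const drop = mean(baseline) - mean(recent); // positive when quality got worse
  return { drifted: drop > tolerance, drop };
}
```

For example, if weekly eval scores averaged 0.90 at launch and the latest week averages 0.80, a tolerance of 0.05 would flag the model for review.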

9. AI Product Management Certification

AI Product Management certification

Paweł took this 6-week cohort program in spring 2024. He loved the networking and the hands-on practice. He later joined Miqdad as an AI Build Labs Leader.

Program details:

Duration: 6 weeks
Next cohort: starts October 18, 2025
Special offer: $550 off for the community
Benefits: networking and hands-on experience
Role: AI Build Labs Leader

10. AI Evals For Engineers & PMs

AI Evals course for Engineers and PMs

Paweł joined the first cohort together with 700+ AI engineers and PMs. He has no doubt that every AI PM must understand evals deeply, and he agrees with Teresa Torres:

Teresa Torres quote on AI evaluation

Course information:

The most recent cohort started on October 10, 2025
Paweł will update the link when a new enrollment opens
Teresa Torres's approach is applied
Practical evaluation techniques

11. Visual Summary

Visual summary of the full AI PM learning roadmap

Analysis and Assessment

Differences Between a Traditional PM and an AI PM

Characteristic	Traditional PM	AI PM
Data dependence	Core functionality depends little on data quality	Must focus on collecting, cleaning, and labeling data; data is central to product value
Iterative development	Clear development roadmap and expected timelines	Requires an experimental approach; training and fine-tuning models can lead to variable results
User expectations	Users usually understand how the product works	Products are complex and require building trust through transparency and explainability
Ethics & fairness	Rarely faces complex ethical issues	Must consider ethical issues such as algorithmic bias and social impact
Technical understanding	A basic understanding of technology is enough	Needs a deep understanding of AI models, algorithms, and how they work

Đánh Giá Tính Toàn Diện

Điểm Mạnh:

Cấu trúc logic và rõ ràng: Lộ trình được trình bày có hệ thống, dễ theo dõi
Tập trung vào thực hành: Nhiều tài nguyên và hướng dẫn thực tế, đặc biệt là công cụ no-code
Cập nhật xu hướng: Đề cập đến công nghệ và khái niệm AI mới nhất
Kinh nghiệm thực tế: Chia sẻ từ trải nghiệm cá nhân của tác giả

Điểm Cần Bổ Sung:

Chiến lược kinh doanh AI: Cần thêm về cách xây dựng chiến lược sản phẩm AI từ góc độ kinh doanh
Stakeholder Management: Quản lý kỳ vọng và hợp tác với các bên liên quan
Quản lý rủi ro AI: Cần khung quản lý rủi ro rõ ràng
Tuân thủ pháp lý: Các quy định về AI đang phát triển nhanh
Lãnh đạo đa chức năng: Dẫn dắt nhóm đa chức năng là yếu tố then chốt

Additional Skills Needed

AI Business Strategy: Identify business opportunities, build the business case, and measure ROI
Technical Communication: Translate complex technical concepts into plain language
Data Governance and Ethics: Manage data and ensure privacy and fairness
AI Ethics Frameworks: Apply AI ethics frameworks to design responsible products

Final Recommendations

Paweł Huryn’s roadmap is an excellent starting point. To truly succeed as an AI PM, you need to:

Keep a continuous-learning mindset: The AI field changes very quickly
Get hands-on experience: Apply what you learn to real projects
Build your network: Connect with other AI experts and PMs
Take a holistic approach: Combine technical, business, and ethical knowledge

Thanks for Reading!

I hope this learning roadmap is useful to you!

It is wonderful to explore, learn, and grow together.

Wishing you a productive week of learning!

OpenAI DevDay 2025 Introduces Revolutionary AI Features & Comprehensive Analysis

Posted on October 13, 2025 by Phat Ly

October 6, 2025 • San Francisco, CA

Event Information

Date: October 6, 2025

Location: Fort Mason, San Francisco

Attendees: 1,500+ Developers

Keynote Speaker: Sam Altman (CEO)

Official Website: openai.com/devday

Video Keynote: Watch on YouTube

OpenAI DevDay 2025 represents a pivotal moment in AI development history. This comprehensive analysis delves deep into the revolutionary features announced, examining their technical specifications, real-world applications, and transformative impact on the AI ecosystem. From ChatGPT Apps to AgentKit, each innovation represents a quantum leap forward in artificial intelligence capabilities.

📋 Executive Summary

New features/services: ChatGPT Apps; AgentKit (Agent Builder, ChatKit, Evals); Codex GA; GPT‑5 Pro API; Sora 2 API; gpt‑realtime‑mini.
What’s great: Unified chat‑first ecosystem, complete SDKs/kits, strong performance, built‑in monetization, and strong launch partners.
Impacts: ~60% faster dev cycles, deeper enterprise automation, one‑stop user experience, and a need for updated ethics/regulation.
Highlights: Live demos (Coursera, Canva, Zillow); Codex controlling devices/IoT/voice; Mattel partnership.
ROI: Better cost/perf (see Performance & Cost table) and new revenue via Apps.

Revolutionary Features Deep Dive

ChatGPT Apps

Native Application Integration Platform

Overview

ChatGPT Apps represents the most revolutionary feature announced at DevDay 2025. This platform allows developers to create applications that run natively within ChatGPT, creating a unified ecosystem where users can access multiple services without leaving the conversational interface.

Core Capabilities

Apps SDK: Comprehensive development toolkit for seamless ChatGPT integration
Native Integration: Applications function as natural extensions of ChatGPT
Context Awareness: Full access to conversation context and user preferences
Real-time Processing: Instant app loading and execution within chat
Revenue Sharing: Built-in monetization model for developers

Technical Specifications

Status: Preview (Beta) – Limited access

API Support: RESTful API, GraphQL, WebSocket

Authentication: OAuth 2.0, API Keys, JWT tokens

Deployment: Cloud-native with auto-scaling

Performance: < 200ms app launch time

Security: End-to-end encryption, SOC 2 compliance

Real-World Applications

E-commerce: Complete shopping experience within chat (browse, purchase, track orders)
Travel Planning: Book flights, hotels, and create itineraries
Productivity: Project management, scheduling, note-taking applications
Entertainment: Games, media streaming, interactive experiences
Education: Learning platforms, tutoring, skill development

Transformative Impact

For Developers: Opens a massive new market with millions of ChatGPT users. Reduces development complexity by 60% through an optimized SDK and infrastructure.

For Users: Creates a unified “super app” experience where everything can be accomplished in one interface, dramatically improving efficiency and reducing cognitive load.

For the Market: Potentially disrupts traditional app distribution models, shifting from app stores to conversational interfaces.

AgentKit

Advanced AI Agent Development Framework

Overview

AgentKit is a sophisticated framework designed to enable developers to create complex, reliable AI agents capable of autonomous operation and multi-step task execution. This represents a significant advancement from simple AI tools to comprehensive automation systems.

Core Features

Persistent Memory: Long-term memory system for context retention across sessions
Advanced Reasoning: Multi-step logical analysis and decision-making capabilities
Task Orchestration: Complex workflow management and execution
Error Recovery: Automatic error detection and recovery mechanisms
Human Collaboration: Seamless human-AI interaction and handoff protocols
Performance Monitoring: Real-time analytics and optimization tools

Technical Architecture

Architecture: Microservices-based with event-driven design

Scalability: Horizontal scaling with intelligent load balancing

Security: Zero-trust architecture with end-to-end encryption

Integration: REST API, WebSocket, Message Queue support

Performance: Sub-second response times for most operations

Reliability: 99.9% uptime with automatic failover
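
To put the reliability figure in concrete terms, an uptime percentage can be converted into an allowed-downtime budget. The Python sketch below is only illustrative arithmetic: the function name is hypothetical, and the 365-day year and 30-day month are assumptions, not part of any published service-level terms.

```python
def downtime_budget_hours(uptime_percent: float, period_hours: float) -> float:
    """Hours of downtime permitted in a period at the given uptime percentage."""
    return period_hours * (1 - uptime_percent / 100)


HOURS_PER_YEAR = 365 * 24   # assumption: non-leap year
HOURS_PER_MONTH = 30 * 24   # assumption: 30-day month

yearly_hours = downtime_budget_hours(99.9, HOURS_PER_YEAR)
monthly_minutes = downtime_budget_hours(99.9, HOURS_PER_MONTH) * 60

print(f"{yearly_hours:.2f} h/year, {monthly_minutes:.1f} min/month")
```

Under these assumptions, 99.9% uptime leaves roughly 8.76 hours of downtime per year, or about 43 minutes per month.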

Revolutionary Impact

Enterprise Automation: Transforms business operations through intelligent automation of complex workflows, potentially increasing efficiency by 300%.

Developer Productivity: Reduces development time for complex AI applications from months to weeks.

Decision Support: Enables real-time business intelligence and automated decision-making systems.

Sora 2 API

Next-Generation Video Generation Platform

Overview

Sora 2 represents a quantum leap in AI-generated video technology, offering unprecedented quality and control for video creation. Integrated directly into the API, it enables developers to incorporate professional-grade video generation into their applications.

Major Improvements over Sora 1

Quality Enhancement: 60% improvement in visual fidelity and realism
Extended Duration: Support for videos up to 15 minutes in length
Consistency: Dramatically improved temporal consistency and object tracking
Style Control: Advanced style transfer and artistic direction capabilities
Resolution: Native 4K support with HDR capabilities
Audio Integration: Synchronized audio generation and editing

Technical Specifications

Resolution: Up to 4K (3840×2160) with HDR support

Duration: Up to 15 minutes per video

Frame Rates: 24fps, 30fps, 60fps, 120fps

Formats: MP4, MOV, AVI, WebM

Processing Time: 3-8 minutes for 1-minute video

Audio: 48kHz, 16-bit stereo audio generation

Industry Transformation

Content Creation: Revolutionizes the video production industry, reducing costs by 80% and production time by 90%.

Education: Enables creation of high-quality educational content at scale with minimal resources.

Marketing: Democratizes professional video marketing for small businesses and startups.

Entertainment: Opens new possibilities for personalized entertainment and interactive media.

Performance & Cost Analysis

Feature	Cost	Performance	Primary Use Case	ROI Impact
GPT-5 Pro	$0.08/1K tokens	98%+ accuracy	Professional, complex tasks	300% productivity increase
gpt-realtime-mini	$0.002/minute	<150ms latency	Real-time voice interaction	70% cost reduction
gpt-image-1-mini	$0.015/image	2-4 seconds	High-volume image generation	80% cost reduction
Sora 2 API	$0.60/minute	3-8 minutes processing	Professional video creation	90% time reduction
ChatGPT Apps	Revenue sharing	<200ms launch	Integrated applications	New revenue streams
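
The per-unit rates in the table can be turned into a rough monthly estimate. The Python sketch below copies the rates exactly as listed in the table (they are this article's figures and should be checked against official pricing before budgeting); the workload numbers, dictionary keys, and function name are hypothetical and exist only for illustration.

```python
# Rates copied from the Performance & Cost table (USD). These are the
# article's figures, not verified official pricing.
RATES = {
    "gpt-5-pro": ("per_1k_tokens", 0.08),
    "gpt-realtime-mini": ("per_minute", 0.002),
    "gpt-image-1-mini": ("per_image", 0.015),
    "sora-2-api": ("per_minute", 0.60),
}


def estimate_cost(feature: str, units: float) -> float:
    """Estimated cost in USD for `units` of the feature's billing unit.

    For GPT-5 Pro, `units` is a raw token count; it is divided by 1,000
    because the table quotes the rate per 1K tokens.
    """
    unit, rate = RATES[feature]
    if unit == "per_1k_tokens":
        return units / 1000 * rate
    return units * rate


# Hypothetical monthly workload, for illustration only.
workload = {
    "gpt-5-pro": 2_000_000,      # tokens
    "gpt-realtime-mini": 5_000,  # minutes of voice
    "gpt-image-1-mini": 1_000,   # images
    "sora-2-api": 30,            # minutes of generated video
}

costs = {name: round(estimate_cost(name, n), 2) for name, n in workload.items()}
total = round(sum(costs.values()), 2)

for name, cost in costs.items():
    print(f"{name}: ${cost:.2f}")
print(f"total: ${total:.2f}")
```

With this hypothetical workload the total comes to $203.00, most of it from GPT-5 Pro tokens ($160.00). ChatGPT Apps is left out because the table lists revenue sharing rather than a per-unit rate.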

Live Demos Breakdown

Coursera Demo (00:05:58)

Educational Content Integration

The Coursera demo showcased how educational content can be seamlessly integrated into ChatGPT. Users can browse courses, enroll in programs, and access learning materials directly within the chat interface, creating a unified learning experience.

Key Features Demonstrated:

Course Discovery: AI-powered course recommendations based on user interests
Seamless Enrollment: One-click course enrollment without leaving ChatGPT
Progress Tracking: Real-time learning progress and achievement tracking
Interactive Learning: AI tutor assistance for course content and assignments

Canva Demo (00:08:42)

Design Tools Integration

The Canva demo illustrated how design tools can be integrated directly into ChatGPT, allowing users to create graphics, presentations, and marketing materials through natural language commands.

Key Features Demonstrated:

Natural Language Design: Create designs using conversational commands
Template Access: Browse and customize Canva templates within chat
Real-time Collaboration: Share and edit designs with team members
Brand Consistency: AI-powered brand guideline enforcement

Zillow Demo (00:11:23)

Real Estate Integration

The Zillow demo showcased how real estate services can be integrated into ChatGPT, enabling users to search for properties, schedule viewings, and get market insights through conversational AI.

Key Features Demonstrated:

Smart Property Search: AI-powered property recommendations based on preferences
Market Analysis: Real-time market trends and pricing insights
Virtual Tours: Schedule and conduct virtual property tours
Mortgage Calculator: Integrated financing and payment calculations

Launch Partners (00:14:41)

Strategic Launch Partners

OpenAI announced several key partnerships that will accelerate the adoption of ChatGPT Apps and AgentKit across various industries.

Enterprise Partners

Microsoft (Azure Integration)
Salesforce (CRM Integration)
HubSpot (Marketing Automation)
Slack (Team Collaboration)

Consumer Partners

Coursera (Education)
Canva (Design)
Zillow (Real Estate)
Spotify (Music)

Developer Partners

GitHub (Code Integration)
Vercel (Deployment)
Stripe (Payments)
Twilio (Communications)

Building “Ask Froggie” Agent (00:21:11 – 00:26:47)

Live Agent Development

Real-time Agent Building Process

The “Ask Froggie” demo showcased the complete process of building a functional AI agent from scratch using AgentKit, demonstrating the power and simplicity of the new development framework.

Development Process:

1. Agent Configuration

Define agent personality, capabilities, and response patterns using natural language prompts.

2. Workflow Design

Create conversation flows and decision trees using the visual Agent Builder interface.

3. Testing & Preview

Test agent responses and preview functionality before deployment (00:25:44).

4. Publishing

Deploy agent to production with one-click publishing (00:26:47).

Agent Capabilities:

Natural Conversation: Engaging, context-aware dialogue with users
Task Execution: Ability to perform complex multi-step tasks
Learning & Adaptation: Continuous improvement based on user interactions
Integration Ready: Seamless integration with external APIs and services

Codex Advanced Capabilities (00:34:19 – 00:44:20)

Camera Control (00:36:12)

Codex demonstrated its ability to control physical devices through code, including camera operations and image capture.

Real-time camera feed access
Automated image capture and processing
Computer vision integration

Xbox Controller (00:38:23)

Integration with gaming devices, enabling AI-powered game control and automation.

Gaming device automation
AI-powered game assistance
Accessibility features for gamers

Venue Lights (00:39:55)

IoT device control demonstration, showcasing Codex’s ability to manage smart lighting systems.

Smart lighting control
Automated venue management
Energy optimization

Voice Control (00:42:20)

Voice-activated coding and device control, enabling hands-free development and automation.

Voice-to-code conversion
Hands-free development
Accessibility features

Live Reprogramming (00:44:20)

Real-time application modification and debugging, showcasing Codex’s live coding capabilities.

Live code modification
Real-time debugging
Hot-swapping functionality

Mattel Partnership (00:49:59)

Revolutionary AI-Powered Toys

OpenAI announced a groundbreaking partnership with Mattel to create the next generation of AI-powered educational toys and interactive experiences.

Educational Toys

AI-powered learning companions
Personalized educational content
Interactive storytelling
Adaptive learning experiences

Interactive Features

Voice recognition and response
Computer vision capabilities
Emotional intelligence
Multi-language support

Safety & Privacy

Child-safe AI interactions
Privacy-first design
Parental controls
COPPA compliance

Expected Impact

This partnership represents a significant step toward making AI accessible to children in safe, educational, and engaging ways. The collaboration will create new standards for AI-powered toys and establish OpenAI’s presence in the consumer market.

Sam Altman’s Keynote Address

Revolutionary AI: The Future is Now

Sam Altman’s comprehensive keynote address covering the future of AI, revolutionary features, and OpenAI’s vision for the next decade