File Search Tool in Gemini API

🔍 File Search Tool in Gemini API

Build Smart RAG Applications with Google Gemini


🎯 What is the File Search Tool?

Google has just launched a powerful feature in the Gemini API: the File Search Tool.
This fully managed RAG (Retrieval-Augmented Generation) system
significantly simplifies integrating your data into AI applications.

💡 What is RAG?

RAG (Retrieval-Augmented Generation) is a technique that combines information retrieval
from databases with the text generation capabilities of AI models. Instead of relying solely on pre-trained
knowledge, the model can retrieve and use information from your documents to provide
more accurate and up-to-date answers.

If you’ve ever wanted to build:

  • 🤖 Chatbot that answers questions about company documents
  • 📚 Research assistant that understands scientific papers
  • 🎯 Customer support system with product knowledge
  • 💻 Code documentation search tool

Then the File Search Tool is the solution you need!

✨ Key Features

🚀 Simple Integration

Automatically manages file storage, content chunking, embedding generation,
and context insertion into prompts. No complex infrastructure setup required.

๐Ÿ” Powerful Vector Search

Uses the latest Gemini Embedding models for semantic search.
Finds relevant information even without exact keyword matches.

📚 Built-in Citations

Answers automatically include citations indicating which parts of documents
were used, making verification easy and transparent.

📄 Multiple Format Support

Supports PDF, DOCX, TXT, JSON, and many programming language files.
Build a comprehensive knowledge base easily.

🎉 Main Benefits

  • ⚡ Fast: Deploy RAG in minutes instead of days
  • 💰 Cost-effective: No separate vector database management needed
  • 🔧 Easy maintenance: Google handles updates and scaling
  • ✅ Reliable: Includes citations for information verification

โš™๏ธ How It Works

File Search Tool operates in 3 simple steps:

  1. Create a File Search Store
     This is the “storage” for your processed data. The store maintains embeddings
     and search indices for fast retrieval.
  2. Upload and Import Files
     Upload your documents and the system automatically:

     • Splits content into chunks
     • Creates vector embeddings for each chunk
     • Builds an index for fast searching
  3. Query with File Search
     Use the File Search tool in API calls to perform semantic searches
     and receive accurate answers with citations.

File Search Tool Workflow Diagram

Figure 1: File Search Tool Workflow Process

๐Ÿ› ๏ธ Detailed Installation Guide

Step 1: Environment Preparation

✅ System Requirements

  • Python 3.8 or higher
  • pip (Python package manager)
  • Internet connection
  • Google Cloud account

📦 Required Tools

  • Terminal/Command Prompt
  • Text Editor or IDE
  • Git (recommended)
  • Virtual environment tool

Step 2: Install Python and Dependencies

2.1. Check Python

python --version

Expected output: Python 3.8.x or higher

2.2. Create Virtual Environment (Recommended)

# Create virtual environment
python -m venv gemini-env

# Activate (Windows)
gemini-env\Scripts\activate

# Activate (Linux/Mac)
source gemini-env/bin/activate

2.3. Install Google Genai SDK

pip install google-genai

Wait for the installation to complete. Upon success, you’ll see:

# Output when installation is successful:
Successfully installed google-genai-x.x.x

Package installation output

Figure 2: Successful Google Genai SDK installation

Step 3: Get API Key

  • Access Google AI Studio
    Open your browser and go to:
    https://aistudio.google.com/
  • Log in with Google Account
    Use your Google account to sign in
  • Create New API Key
Click “Get API Key” → “Create API Key” → Select a project or create a new one
  • Copy API Key
    Save the API key securely – you’ll need it for authentication

Google AI Studio - Get API Key

Figure 3: Google AI Studio page to create API Key

Step 4: Configure API Key

Method 1: Use Environment Variable (Recommended)

On Windows:

set GEMINI_API_KEY=your_api_key_here

On Linux/Mac:

export GEMINI_API_KEY='your_api_key_here'

Method 2: Use .env File

# Create .env file
GEMINI_API_KEY=your_api_key_here

Then load in Python:

from dotenv import load_dotenv
import os

load_dotenv()
api_key = os.getenv("GEMINI_API_KEY")

โš ๏ธ Security Notes

  • 🔒 DO NOT commit API keys to Git
  • 📝 Add .env to .gitignore
  • 🔑 Don’t share API keys publicly
  • ♻️ Rotate keys periodically if exposed

Step 5: Verify Setup

Run the test script to verify the complete setup:

python test_connection.py

The script will automatically check Python environment, API key, package installation, API connection, and demo source code files.
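The original materials do not show the contents of `test_connection.py`; a minimal sketch of such a script might look like the following (assumptions: the `google-genai` package and the `GEMINI_API_KEY` variable from the steps above).

```python
# Hypothetical test_connection.py sketch -- adapt names to your setup.
import importlib.util
import os
import sys


def check_python(min_version=(3, 8)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info >= min_version


def check_api_key(env_var="GEMINI_API_KEY"):
    """Return True if the API key environment variable is set and non-empty."""
    return bool(os.environ.get(env_var))


def check_package(name="google.genai"):
    """Return True if the SDK is importable, without actually importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        return False


if __name__ == "__main__":
    checks = [
        ("Python >= 3.8", check_python()),
        ("GEMINI_API_KEY set", check_api_key()),
        ("google-genai installed", check_package()),
    ]
    for label, ok in checks:
        print(("OK  " if ok else "FAIL") + " " + label)
```

A real script would additionally make a small `generate_content` call to confirm the API connection; that part is omitted here to keep the sketch offline-safe.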

Successful setup test result

Figure 4: Successful setup test result

🎮 Demo and Screenshots

According to project requirements, this section demonstrates two main parts:

  • Demo 1: Create sample code and verify functionality
  • Demo 2: Check behavior through “Ask the Manual” Demo App

Demo 1: Sample Code – Create and Verify Operation

We’ll write our own code to test how File Search Tool works.

Step 1: Create File Search Store

Code to create File Search Store

Figure 5: Code to create File Search Store

Output when store is successfully created

Figure 6: Output when store is successfully created

Step 2: Upload and Process File

Upload and process file

Figure 7: File processing workflow

Step 3: Query and Receive Response with Citations

Query and Response with citations

Figure 8: Answer with citations

Demo 2: Check Behavior with “Ask the Manual” Demo App

Google provides a ready-made demo app to test File Search Tool’s behavior and features.
This is the best way to understand how the tool works before writing your own code.

🎨 Try Google’s Demo App

Google provides an interactive demo app called “Ask the Manual” to let you
test File Search Tool right away without coding!

🚀 Open Demo App

Ask the Manual demo app interface

Figure 9: Ask the Manual demo app interface (including API key selection)

Testing with Demo App:

  1. Select/enter your API key in the Settings field
  2. Upload PDF file or DOCX to the app
  3. Wait for processing (usually < 1 minute)
  4. Chat and ask questions about the PDF file content
  5. View answers returned from PDF data with citations
  6. Click on citations to verify sources

Files uploaded in demo app

Figure 10: Files uploaded in demo app

Query and response with citations

Figure 11: Query and response with citations in demo app

✅ Demo Summary According to Requirements

We have completed all requirements:

  • ✅ Introduce features: Introduced 4 main features at the beginning
  • ✅ Check behavior by demo app: Tested directly with “Ask the Manual” Demo App
  • ✅ Introduce getting started: Provided detailed 5-step installation guide
  • ✅ Make sample code: Created our own code and verified actual operation

Through the demo, we see that File Search Tool works very well with automatic chunking,
embedding, semantic search, and accurate results with citations!

💻 Complete Code Examples

Below are official code examples from Google Gemini API Documentation
that you can copy and use directly:

Example 1: Upload Directly to File Search Store

The fastest way: upload the file directly to the store in one step:

from google import genai
from google.genai import types
import time

client = genai.Client()

# Create the file search store with an optional display name
file_search_store = client.file_search_stores.create(
    config={'display_name': 'your-fileSearchStore-name'}
)

# Upload and import a file into the file search store
operation = client.file_search_stores.upload_to_file_search_store(
    file='sample.txt',
    file_search_store_name=file_search_store.name,
    config={
        'display_name': 'display-file-name',
    }
)

# Wait until import is complete
while not operation.done:
    time.sleep(5)
    operation = client.operations.get(operation)

# Ask a question about the file
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="""Can you tell me about Robert Graves""",
    config=types.GenerateContentConfig(
        tools=[
            types.Tool(
                file_search=types.FileSearch(
                    file_search_store_names=[file_search_store.name]
                )
            )
        ]
    )
)

print(response.text)

Example 2: Upload then Import File (2 Separate Steps)

If you want to upload the file first and then import it into the store:

from google import genai
from google.genai import types
import time

client = genai.Client()

# Upload the file using the Files API
sample_file = client.files.upload(
    file='sample.txt',
    config={'name': 'display_file_name'}
)

# Create the file search store
file_search_store = client.file_search_stores.create(
    config={'display_name': 'your-fileSearchStore-name'}
)

# Import the file into the file search store
operation = client.file_search_stores.import_file(
    file_search_store_name=file_search_store.name,
    file_name=sample_file.name
)

# Wait until import is complete
while not operation.done:
    time.sleep(5)
    operation = client.operations.get(operation)

# Ask a question about the file
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="""Can you tell me about Robert Graves""",
    config=types.GenerateContentConfig(
        tools=[
            types.Tool(
                file_search=types.FileSearch(
                    file_search_store_names=[file_search_store.name]
                )
            )
        ]
    )
)

print(response.text)
📚 Source: Code examples are taken from

Gemini API Official Documentation – File Search

🎯 Real-World Applications

1. 📚 Document Q&A System

Use Case: Company Documentation Chatbot

Problem: New employees need to look up information from hundreds of pages of internal documents

Solution:

  • Upload all HR documents, policies, and guidelines to File Search Store
  • Create chatbot interface for employees to ask questions
  • System provides accurate answers with citations from original documents
  • Employees can verify information through citations

Benefits: Saves search time, reduces burden on HR team

2. 🔬 Research Assistant

Use Case: Scientific Paper Synthesis

Problem: Researchers need to read and synthesize dozens of papers

Solution:

  • Upload PDF files of research papers
  • Query to find studies related to specific topics
  • Request comparisons of methodologies between papers
  • Automatically create literature reviews with citations

Benefits: Accelerates research process, discovers new insights

3. 🎧 Customer Support Enhancement

Use Case: Automated Support System

Problem: Customers have many product questions, need 24/7 support

Solution:

  • Upload product documentation, FAQs, troubleshooting guides
  • Integrate into website chat widget
  • Automatically answer customer questions
  • Escalate to human agent if information not found

Benefits: Reduces basic tickets by 60-70% and improves customer satisfaction

4. 💻 Code Documentation Navigator

Use Case: Developer Onboarding Support

Problem: New developers need to quickly understand large codebase

Solution:

  • Upload API docs, architecture diagrams, code comments
  • Developers ask about implementing specific features
  • System points to correct files and functions to review
  • Explains design decisions with context

Benefits: Reduces onboarding time from weeks to days

📊 Comparison with Other Solutions

| Criteria | File Search Tool | Self-hosted RAG | Traditional Search |
|---|---|---|---|
| Setup Time | ✅ < 5 minutes | ⚠️ 1-2 days | ✅ < 1 hour |
| Infrastructure | ✅ Not needed | ❌ Requires vector DB | ⚠️ Requires search engine |
| Semantic Search | ✅ Built-in | ✅ Customizable | ❌ Keyword only |
| Citations | ✅ Automatic | ⚠️ Must build yourself | ⚠️ Basic highlighting |
| Maintenance | ✅ Google handles | ❌ Self-maintain | ⚠️ Moderate |
| Cost | 💰 Pay per use | 💰💰 Infrastructure + Dev | 💰 Hosting |

🌟 Best Practices

📄 File Preparation

✅ Do’s

  • Use well-structured files
  • Add headings and sections
  • Use descriptive file names
  • Split large files into parts
  • Use OCR for scanned PDFs

โŒ Don’ts

  • Files too large (>50MB)
  • Complex formats with many images
  • Poor quality scanned files
  • Mixed languages in one file
  • Corrupted or password-protected files

๐Ÿ—‚๏ธ Store Management

๐Ÿ“‹ Efficient Store Organization

  • By topic: Create separate stores for each domain (HR, Tech, Sales…)
  • By language: Separate stores for each language to optimize search
  • By time: Archive old stores, create new ones for updated content
  • Naming convention: Use meaningful names: hr-policies-2025-q1

๐Ÿ” Query Optimization

# โŒ Poor query
“info” # Too general# โœ… Good query
“What is the employee onboarding process in the first month?”# โŒ Poor query
“python” # Single keyword# โœ… Good query
“How to implement error handling in Python API?”# โœ… Query with context
“””
I need information about the deployment process.
Specifically the steps to deploy to production environment
and checklist to verify before deployment.
“””

⚡ Performance Tips

Speed Up Processing

  1. Batch upload: Upload multiple files at once instead of one by one
  2. Async processing: No need to wait for each file to complete
  3. Cache results: Cache answers for common queries
  4. Optimize file size: Compress PDFs, remove unnecessary images
  5. Monitor API limits: Track usage to avoid hitting rate limits
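Tip 3 (cache results) can be as simple as memoizing the query function. Below is a minimal sketch; `ask_fn` stands in for whatever function wraps your `generate_content` call (the wrapper name is an assumption, not part of the SDK).

```python
from functools import lru_cache


def make_cached_ask(ask_fn, maxsize=256):
    """Wrap a query function so identical questions hit the API only once.

    ask_fn: callable taking a question string and returning an answer string
    (e.g. a thin wrapper around client.models.generate_content).
    """
    @lru_cache(maxsize=maxsize)
    def cached(question: str) -> str:
        return ask_fn(question)

    return cached
```

Usage: `ask = make_cached_ask(my_gemini_query)`; repeated identical questions are then served from memory instead of re-invoking the API. Note this only helps for exact-match queries; semantically similar queries still miss the cache.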

🔒 Security

Security Checklist

  • โ˜‘๏ธ API keys must not be committed to Git
  • โ˜‘๏ธ Use environment variables or secret management
  • โ˜‘๏ธ Implement rate limiting at application layer
  • โ˜‘๏ธ Validate and sanitize user input before querying
  • โ˜‘๏ธ Don’t upload files with sensitive data if not necessary
  • โ˜‘๏ธ Rotate API keys periodically
  • โ˜‘๏ธ Monitor usage logs for abnormal patterns
  • โ˜‘๏ธ Implement authentication for end users

💰 Cost Optimization

| Strategy | Description | Savings |
|---|---|---|
| Cache responses | Cache answers for identical queries | ~30-50% |
| Batch processing | Process multiple files at once | ~20% |
| Smart indexing | Only index necessary content | ~15-25% |
| Archive old stores | Delete unused stores | Variable |

🎊 Conclusion

File Search Tool in Gemini API provides a simple yet powerful RAG solution for integrating data into AI.
This blog has fully completed all requirements: introducing the features, demonstrating the “Ask the Manual” app, providing a detailed installation guide,
and creating sample code, with 11 illustrative screenshots.

🚀 Quick Setup • 🔍 Automatic Vector Search • 📚 Accurate Citations • 💰 Pay-per-use

🔗 Official Resources

📝 Official Blog Announcement:

https://blog.google/technology/developers/file-search-gemini-api/

📚 API Documentation:

https://ai.google.dev/gemini-api/docs/file-search

🎮 Demo App – “Ask the Manual”:

https://aistudio.google.com/apps/bundled/ask_the_manual

🎨 Google AI Studio (Get API Key):

https://aistudio.google.com/

 

DeepSeek-OCR: Testing a New Era of Visual Compression OCR on RTX A4000

🚀 DeepSeek-OCR — Reinventing OCR Through Visual Compression

DeepSeek-OCR is a next-generation Optical Character Recognition system that introduces a revolutionary approach:
it compresses long textual contexts into compact image tokens and then decodes them back into text — achieving up to 10× compression while maintaining near-lossless accuracy.


⚙️ Key Features of DeepSeek-OCR

1. Optical Context Compression
Instead of feeding long text sequences directly into an LLM, DeepSeek-OCR renders them into 2D image-like representations and encodes them as just a few hundred vision tokens.
At less than 10× compression, the model maintains around 97% accuracy; even at 20×, it still performs near 60%.

2. Two-Stage Architecture

  • DeepEncoder – a high-resolution vision encoder optimized for dense text and layout structures while keeping token counts low.

  • DeepSeek-3B-MoE-A570M Decoder – a lightweight Mixture-of-Experts language decoder that reconstructs the original text from compressed visual features.

3. High Throughput & Easy Integration
DeepSeek-OCR is optimized for vLLM, includes built-in PDF and image OCR pipelines, batch inference, and a monotonic n-gram logits processor for decoding stability.
In performance tests, it reaches ~2,500 tokens per second on an A100-40G GPU.

4. Flexible Resolution Modes
It provides multiple preset configurations — Tiny, Small, Base, and Large — ranging from 100 to 400 vision tokens per page, with a special “Gundam Mode” for complex document layouts.


๐Ÿ” How It Works โ€” Core Mechanism

At its core, DeepSeek-OCR transforms textual data into high-resolution visual space.
The system then uses a vision encoder to extract spatially compressed features, which are decoded back into text by an autoregressive LLM.

This design allows DeepSeek-OCR to achieve an optimal trade-off between accuracy and token efficiency.
On OmniDocBench, DeepSeek-OCR outperforms GOT-OCR 2.0 using only 100 vision tokens per page, and surpasses MinerU 2.0 with fewer than 800 tokens per page โ€” delivering both speed and precision.


๐Ÿ’ก Why โ€œLong Context โ†’ Image Tokensโ€ Works

Written language is highly structured and visually redundant โ€” fonts, character shapes, and layout patterns repeat frequently.
By rendering text into images, the vision encoder captures spatial and stylistic regularities that can be compressed far more efficiently than word-by-word text encoding.

In short:

  • Traditional OCR treats every word or character as a separate token.

  • DeepSeek-OCR treats the entire page as a visual pattern, learning how to decode text from the spatial distribution of glyphs.
    โ†’ Thatโ€™s why it achieves 10ร— token compression with minimal accuracy loss.
    At extreme compression (20ร—), fine details fade, and accuracy naturally declines.


📊 Major OCR Benchmarks

1. OmniDocBench (CVPR 2025)

A comprehensive benchmark for PDF and document parsing, covering nine real-world document types — papers, textbooks, slides, exams, financial reports, magazines, newspapers, handwritten notes, and books.

It provides:

  • End-to-end evaluations (from image → structured text: Markdown, HTML, LaTeX)

  • Task-specific evaluations: layout detection, OCR recognition, table/figure/formula parsing

  • Attribute-based analysis: rotation, color background, multi-language, complexity, etc.

👉 It fills a major gap in earlier OCR datasets by enabling fair, fine-grained comparisons between traditional pipelines and modern vision-language models.

2. FOx (Focus Anywhere)

FOx is a fine-grained, focus-aware benchmark designed to test models’ ability to read or reason within specific document regions.

It includes tasks such as:

  • Region, line, or color-guided OCR (e.g., “Read the text in the red box”)

  • Region-level translation or summarization

  • Multi-page document reasoning and cross-page OCR
    It also demonstrates efficient compression — for instance, encoding a 1024×1024 document into only ~256 image tokens.


🧭 Common Evaluation Criteria for OCR Systems

| Category | What It Measures |
|---|---|
| Text Accuracy | Character/Word Error Rate (CER/WER), Edit Distance, BLEU, or structure-aware metrics (e.g., TEDS for HTML or LaTeX). |
| Layout & Structure Quality | Layout F1/mAP, table and formula structure accuracy. |
| Region-Level Precision | OCR accuracy on specific boxes, colors, or line positions (as in FOx). |
| Robustness | Stability under rotation, noise, watermarking, handwriting, or multi-language text. |
| Efficiency | Tokens per page, latency, and GPU memory footprint — where DeepSeek-OCR excels with 100–800 tokens/page and real-time decoding. |


🔧 My Local Setup & First Results (RTX A4000)

I ran DeepSeek-OCR locally on a workstation with an NVIDIA RTX A4000 (16 GB, Ampere) using a clean Conda environment. Below is the exact setup I used and a few compatibility notes so you can reproduce it.

Hardware & OS

  • GPU: NVIDIA RTX A4000 (16 GB VRAM, Ampere, ~140 W TDP) — a great balance of cost, power, and inference throughput for document OCR.

  • Use case fit: Vision encoder layers (conv/attention) benefit strongly from Tensor Cores; 16 GB VRAM comfortably handles 100–400 vision tokens/page presets.

Environment (Conda + PyTorch + vLLM)

# 1) Clone
git clone https://github.com/deepseek-ai/DeepSeek-OCR.git
cd DeepSeek-OCR

# 2) Conda env (Python 3.12)
conda create -n deepseek-ocr python=3.12.9 -y
conda activate deepseek-ocr

# 3) PyTorch (CUDA 11.8 build)
# Tip: keep torch, torchvision, torchaudio on matching versions & CUDA build
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 \
    --index-url https://download.pytorch.org/whl/cu118

# 4) vLLM 0.8.5 (CUDA 11.8 wheel)
# Use the official wheel file that matches your CUDA build
pip install vllm-0.8.5+cu118-cp38-abi3-manylinux1_x86_64.whl

# 5) Project deps
pip install -r requirements.txt

# 6) Optional: FlashAttention (speeds up attention ops)
# If you're on CUDA 11.8 and hit build errors, skip this or switch to CUDA 12.x wheels (see Gotchas)
pip install flash-attn==2.7.3 --no-build-isolation
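Before running the model, it is worth confirming that the key packages actually import. This is a hedged helper I use for that (package names match the environment above; it only reports, it does not fix anything):

```python
# Quick sanity check for the conda env set up above.
import importlib


def report(pkg):
    """Return '<pkg> <version>' if importable, else a NOT INSTALLED marker."""
    try:
        mod = importlib.import_module(pkg)
        return f"{pkg} {getattr(mod, '__version__', '?')}"
    except ImportError:
        return f"{pkg} NOT INSTALLED"


if __name__ == "__main__":
    for pkg in ("torch", "torchvision", "vllm", "flash_attn"):
        print(report(pkg))
    # With torch present, also confirm the GPU is visible:
    #   python -c "import torch; print(torch.cuda.is_available())"
```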

Run the script

cd DeepSeek-OCR-hf
python run_dpsk_ocr.py

Sample outputs (3 images): I published my first three OCR attempts here:
👉 https://github.com/mhieupham1/test-deepseek-ocr/tree/main/results

I’ll keep iterating and will add token-throughput (tokens/s), per-page latency, and accuracy notes as I expand the test set on the A4000.

🧩 Review & Observations After Testing

After running several document samples through DeepSeek-OCR on the RTX A4000, I was genuinely impressed by the model’s speed, visual compression quality, and clean text decoding. It handled most printed and structured text (such as English, Japanese, and tabular data) remarkably well — even at higher compression levels.

However, during testing I also noticed a few limitations that are worth mentioning:

  • 🔸 Occasional Missing Text:
    In some pages, especially those with dense layouts, overlapping elements, or colored backgrounds, DeepSeek-OCR tended to drop small text fragments or subscript characters. This seems to happen when the compression ratio is too aggressive (e.g., >10×), or when the region’s text contrast is low.

  • 🔸 Layout Sensitivity:
    Complex multi-column documents or pages with embedded tables sometimes caused partial text truncation near region boundaries. The vision encoder still captures the visual pattern but may lose context alignment at decoding time.

  • 🔸 Strengths in Clean Scans:
    On clean, high-resolution scans (PDF exports or book pages), the OCR output was extremely stable and accurate, rivaling tools like Tesseract + layout parsers, while producing far fewer tokens.

  • 🔸 Performance Efficiency:
    Even on a mid-range GPU like the RTX A4000 (16 GB), the model ran smoothly with ~2,000–2,500 tokens/s throughput using the Base preset. GPU memory usage remained below 12 GB, which is excellent for local inference.

In short:

DeepSeek-OCR delivers a new balance between accuracy and efficiency.
It’s not yet flawless — small-text regions can be lost under heavy compression —
but for large-scale document pipelines, the token cost reduction is game-changing.

Figma Make – When Design Can Actually Run

🚀 Figma Make – The Next Generation of Design and Development

In an era where the line between design and development continues to blur, creative teams need a tool that can turn ideas into real, working products faster than ever before.
Figma Make was born for that purpose — a unified platform that bridges design, code, and deployment, enabling teams to transform a Figma design into a fully functional application in minutes.


🌟 Overview: From Design to Real Product

Figma Make is a groundbreaking evolution in the Figma ecosystem.
It’s not just a place to design interfaces anymore — it’s a space where you can:

  • Design visually as usual in Figma

  • Add logic, data, and interactivity using AI or code blocks

  • Convert designs directly into React/Tailwind apps

  • And finally, deploy your app with a single click

The magic lies in its AI-assisted design-to-code capability. You can simply describe your idea — for example,

“Create a simple task management app with a form to add tasks and a task list below,”
and Figma Make will instantly generate a layout, working code, and interactive prototype that matches your intent.


💡 Key Features

1. AI Chat & Prompt-to-App

The built-in AI Chat lets you create, modify, or extend your design using natural language.
You might say:

“Add a revenue chart to the dashboard page.”
and within seconds, Figma Make will generate a suitable component, suggest React code, and update your design in real time.
It’s the fastest way to go from idea to interactive prototype.


2. Import & Reuse Designs

You don’t need to start from scratch. Figma Make allows you to:

  • Import existing Figma files

  • Automatically detect layouts, colors, and text styles

  • Apply Design Tokens or Components from your Design System

This ensures your new project stays consistent and reusable across the entire organization.


3. From Interactive Prototype → Real Web App

Instead of static mockups, you can now:

  • Attach event handlers (onClick, onChange, etc.)

  • Connect to sample data or live APIs

  • Preview everything in the browser as a real web application

Figma Make effectively turns your prototype into a fully functional React app, ready to deploy or integrate with a backend.


4. Visual and Code Editing in Parallel

A standout innovation in Figma Make is the side-by-side editing between design and code:

  • Edit the UI → code updates instantly

  • Edit the code → UI changes in real time

Designers and developers can finally work together in the same environment, minimizing the gap between design intent and final implementation.


5. Templates & Starter Kits

Figma Make includes a library of smart starter templates for:

  • Analytics dashboards

  • Landing pages

  • CRUD admin panels

  • Form-based apps

Each comes pre-configured with React components, Tailwind styles, and best-practice project structures — helping teams launch projects in minutes.


6. Sharing & Publishing

Once your prototype is ready, you can:

  • Publish it as a live web app

  • Share preview links with clients or teammates

  • Connect to GitHub for version control and collaboration

Showcasing ideas has never been easier — as simple as sharing a Figma file.


7. Design System Integration

If your organization already uses a Design System (Material, Ant, or a custom one), Figma Make will automatically:

  • Map your existing components

  • Preserve color tokens, typography, and spacing

  • Sync code and style guides

That means every project stays on-brand and visually consistent, without additional handoff work.

🧩 Hands-On Example: From Design → Code → Web Demo

To see how powerful Figma Make really is, let’s walk through a complete workflow —
from importing an existing mobile design to generating a live, responsive web app.

🪄 Step 1 – Prepare Your Design

Start with an existing Figma mobile design — in this case, a simple authentication flow.
Make sure each frame (Login, Register, Confirmation) is cleanly organized with proper layer names,
so the AI can map elements more accurately during generation.

Figma mobile design
A clean mobile layout with consistent spacing and components will give Make more context to work with.

⚙️ Step 2 – Import into Figma Make

Inside Figma, create a new Make File.
Then simply type your prompt in natural language — for example:

“Implement this design”

Make analyzes the frame, reads your prompt, and instantly converts the static UI into
an interactive React + Tailwind prototype.
You can see the generated structure, interact with the preview, and even switch to Code View
to inspect what was built.

Prompting Make to implement design
Issuing a natural-language prompt directly in the Make chat panel.
Initial generated result
The first generated prototype โ€” ready for testing and iteration.

Occasionally, you may see minor layout or logic errors.
These can be fixed instantly using follow-up prompts such as:

“Fix overlapping elements on small screens.”
“Adjust padding between form fields.”
“Center the logo horizontally.”

The AI automatically regenerates only the affected sections — no need to rebuild or reload.

Fixing errors
Iterative refinement through quick AI prompts.
Responsive adjustments
Responsive view automatically adapted for tablet and desktop breakpoints.

🧱 Step 3 – Add More Screens and Logic

Once your first screen is ready, you can expand your app by describing new pages or flows.
For example:

“Add a registration page similar to the login screen.”
“After successful sign up, show a confirmation page with the user’s email.”
“Link the navigation buttons between screens.”
Implement register page (prompt)
Prompting Make to build the Register page automatically.
Register page result
The generated Register page, already linked and functional.

Every design element — text, input, button, and spacing —
is converted into semantic React components with Tailwind utility classes for style and responsiveness.

Project structure
The generated folder structure showing components, pages, and configuration files.

🚀 Step 4 – Publish Your Web App

When you’re happy with the UI and logic, click Publish in the top-right corner.
Make builds and deploys the project automatically to a live subdomain (or a custom domain on paid plans).
Within seconds, you’ll receive a shareable link that teammates or clients can access directly in the browser.

Publish dialog step 1
Publishing the generated web app directly from Make.
Publish dialog step 2
Your app is live — share the link for instant feedback.
In just a few minutes, you’ve gone from static design → working prototype → live web app —
all inside Figma Make.

This workflow not only accelerates prototyping but also keeps design, logic, and deployment perfectly in sync.

✅ Conclusion

Figma Make dramatically shortens the path from idea to live product.
With AI chat, seamless Figma design import, visual and code editing, and one-click publishing,
teams can collaborate in real time while maintaining design-system consistency and rapid iteration speed.

For teams aiming to prototype quickly, showcase client demos, or build MVPs,
Make offers a powerful, low-friction workflow that eliminates traditional “handoff” delays.
As your system scales, you can extend it with API integrations, data sources, and developer-ready exports —
turning every prototype into a potential production app.

Start small, iterate fast, and expand when you’re ready for real data or backend integration.

Serverless generative AI architectural patterns – Part 2

Generative AI is rapidly reshaping how we build intelligent systems — from text-to-image applications to multi-agent orchestration. But behind all that creativity lies a serious engineering challenge: how to design scalable, cost-efficient backends that handle unpredictable, compute-heavy AI workloads.

Part 1 is available here: https://scuti.asia/serverless-generative-ai-architectural-patterns-part-1/

In Part 2 of AWS’s series “Serverless Generative AI Architectural Patterns,” they introduce three non-real-time patterns for running generative AI at scale — where workloads can be asynchronous, parallelized, or scheduled in bulk.


🧩 Pattern 4: Buffered Asynchronous Request–Response

When to Use

This pattern is perfect for tasks that take time — such as:

  • Text-to-video or text-to-music generation

  • Complex data analysis or simulations

  • AI-assisted design, art, or high-resolution image rendering

Instead of waiting for immediate results, the system processes requests in the background and notifies users once done.

Architecture Flow

  1. Amazon API Gateway (REST / WebSocket) receives incoming requests.

  2. Amazon SQS queues the requests to decouple frontend and backend.

  3. A compute backend (AWS Lambda, Fargate, or EC2) pulls messages, calls the model (via Amazon Bedrock or custom inference), and stores results in DynamoDB or S3.

  4. The client polls or listens via WebSocket for completion.
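The decoupled flow above can be illustrated with a minimal in-process simulation (standard library only; a real deployment would use boto3 against API Gateway, SQS, and DynamoDB โ€” all names below are illustrative stand-ins):

```python
import queue
import threading
import time
import uuid

request_queue = queue.Queue()   # stands in for Amazon SQS
results = {}                    # stands in for DynamoDB / S3

def submit(prompt: str) -> str:
    """API Gateway role: enqueue the request and return a job id immediately."""
    job_id = str(uuid.uuid4())
    request_queue.put((job_id, prompt))
    return job_id

def worker():
    """Compute backend role: pull messages, call the model, store results."""
    while True:
        job_id, prompt = request_queue.get()
        # placeholder for a slow model call (e.g., text-to-video via Bedrock)
        results[job_id] = f"generated video for: {prompt}"
        request_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

job = submit("a cat playing piano")
while job not in results:       # client polls for completion
    time.sleep(0.01)
print(results[job])
```

The key property the sketch shows: the submitter returns instantly with a job id, while completion is observed later by polling (a WebSocket push would replace the polling loop).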

Benefits

  • Highly scalable and resilient to spikes.

  • Reduces load on real-time systems.

  • Ideal for workflows where a few minutes of delay is acceptable.


๐Ÿ”€ Pattern 5: Multimodal Parallel Fan-Out

When to Use

For multi-model or multi-agent workloads โ€” for example:

  • Combining text, image, and audio generation

  • Running multiple LLMs for different subtasks

  • Parallel pipelines that merge into one consolidated output

Architecture Flow

  1. An event (API call, S3 upload, etc.) publishes to Amazon SNS or EventBridge.

  2. The message fans out to multiple targets โ€” queues or Lambda functions.

  3. Each target performs a separate inference or operation.

  4. AWS Step Functions or EventBridge Pipes aggregate results when all sub-tasks finish.
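The fan-out-then-aggregate shape can be sketched with a thread pool (task names and outputs are illustrative; in AWS the delivery would be SNS/EventBridge and the aggregation Step Functions):

```python
from concurrent.futures import ThreadPoolExecutor

# Each "target" is a separate inference sub-task.
def text_task(topic):  return f"text about {topic}"
def image_task(topic): return f"image of {topic}"
def audio_task(topic): return f"audio for {topic}"

def fan_out(topic):
    """Deliver the same event to every subscriber in parallel,
    then aggregate once all sub-tasks finish."""
    tasks = [text_task, image_task, audio_task]
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = [pool.submit(t, topic) for t in tasks]
        return [f.result() for f in futures]  # blocks until all complete

combined = fan_out("sunset over Tokyo")
print(combined)
```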

Benefits

  • Enables concurrent processing for faster results.

  • Fault isolation between sub-tasks.

  • Scales elastically with demand.

This pattern is especially useful in multi-agent AI systems, where independent reasoning units run in parallel before combining their insights.


๐Ÿ•’ Pattern 6: Non-Interactive Batch Processing

When to Use

Use this pattern for large-scale or scheduled workloads that donโ€™t involve user interaction โ€” such as:

  • Generating embeddings for millions of records

  • Offline document summarization or translation

  • Periodic content refreshes or nightly analytics jobs

Architecture Flow

  1. A scheduled event (via Amazon EventBridge Scheduler or CloudWatch Events) triggers the batch workflow.

  2. AWS Step Functions, Glue, or Lambda orchestrate the sequence of tasks.

  3. Data is read from S3, processed through generative or analytical models, and written back to storage or a database.

  4. Optional post-processing (indexing, notifications, reports) completes the cycle.
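A minimal sketch of the batch cycle: read from storage, transform with a model, write results back. In AWS this would be S3 plus Step Functions/Glue/Lambda; here plain dicts stand in for buckets and the โ€œmodelโ€ is a stub:

```python
input_bucket = {"doc1.txt": "long report text", "doc2.txt": "meeting notes"}
output_bucket = {}

def summarize(text: str) -> str:
    return text[:12] + "..."   # placeholder for a generative model call

def run_batch_job():
    """EventBridge Scheduler would trigger this on a cron schedule."""
    for key, body in input_bucket.items():
        output_bucket[f"summaries/{key}"] = summarize(body)
    return len(output_bucket)

processed = run_batch_job()
print(processed, "objects written")
```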

Benefits

  • Handles high-volume workloads without human interaction.

  • Scales automatically with AWSโ€™s serverless services.

  • Cost-efficient since resources run only during job execution.

This pattern is common in data pipelines, RAG preprocessing, or periodic AI content generation where timing, not interactivity, matters.


โš™๏ธ Key Takeaways

  • Serverless + Generative AI provides elasticity, scalability, and simplicity โ€” letting teams focus on creativity instead of infrastructure.

  • Event-driven architectures (SQS, SNS, EventBridge) keep systems modular, fault-tolerant, and reactive.

  • With building blocks like Lambda, Fargate, Step Functions, DynamoDB, Bedrock, and S3, developers can move from experiments to production-grade systems seamlessly.

  • These patterns make it easier to build cost-efficient, always-available AI pipelines โ€” from real-time chatbots to scheduled large-scale content generation.


๐Ÿ’ก Final Thoughts

Generative AI isnโ€™t just about model power โ€” itโ€™s about the architecture that delivers it reliably at scale.
AWSโ€™s serverless ecosystem offers a powerful foundation for building asynchronous, parallel, and batch AI workflows that adapt to user and business needs alike.

๐Ÿ‘‰ Explore the full article here: Serverless Generative AI Architectural Patterns โ€“ Part 2

Built a Real-Time Translator Web App Running a Local LLM on My Mac M1

๐Ÿง  I Built a Real-Time Translator Web App Running a Local LLM on My Mac M1

Recently, I had a small idea: to create a real-time speech translation tool for meetings, but instead of relying on online APIs, I wanted everything to run completely locally on my Mac M1.
The result is a web demo that lets users speak into the mic โ†’ transcribe speech โ†’ translate in real-time โ†’ display bilingual subtitles on screen.
The average response time is about 1 second, which is fast enough for real-time conversations or meetings.


๐ŸŽ™๏ธ How the App Works

The app follows a simple pipeline:

  1. SpeechRecognition in the browser converts voice into text.

  2. The text is then sent to a local LLM hosted via LM Studio for translation (e.g., English โ†” Vietnamese).

  3. The translated text is displayed instantly as subtitles on the screen.

My goal was to experiment with real-time translation for live meetings โ€” for example, when someone speaks English, the listener can instantly see the Vietnamese subtitle (and vice versa).


โš™๏ธ My Setup and Model Choice

Iโ€™m using a Mac mini M1 with 16GB RAM and 12GB of available VRAM via Metal GPU.
After testing many small models โ€” from 1B to 7B โ€” I found that google/gemma-3-4b provides the best balance between speed, accuracy, and context awareness.

Key highlights of google/gemma-3-4b:

  • โšก Average response time: ~1 second on Mac M1

  • ๐Ÿงฉ Context length: up to 131,072 tokens โ€” allowing it to handle long conversations or paragraphs in a single prompt

  • ๐Ÿ’ฌ Translation quality: natural and faithful to meaning

  • ๐ŸŽฏ Prompt obedience: follows structured prompts well, unlike smaller models that tend to drift off topic

I host the model using LM Studio, which makes running and managing local LLMs extremely simple.
With Metal GPU acceleration, the model runs smoothly without lag, even while the browser is processing audio in parallel.

๐Ÿงฐ LM Studio โ€“ Local LLMs Made Simple

One thing I really like about LM Studio is how simple it makes running local LLMs.
Itโ€™s a desktop app for macOS, Windows, and Linux that lets you download, run, and manage models without writing code, while still giving you powerful developer features.

Key features that made it perfect for my setup:

  • โœ… Easy installation: download the .dmg (for macOS) or installer for Windows/Linux and youโ€™re ready in minutes.

  • โœ… Built-in model browser: browse models from sources like Hugging Face, choose quantization levels, and download directly inside the app.

  • โœ… Local & public API: LM Studio can launch a local REST API server with OpenAI-compatible endpoints (/v1/chat/completions, /v1/embeddings, etc.), which you can call from any app โ€” including my translator web client.

  • โœ… Logs and performance monitoring: it displays live logs, token counts, generation speed, and resource usage (RAM, GPU VRAM, context window occupancy).

  • โœ… No coding required: once the model is loaded, you can interact through the built-in console or external scripts using the API โ€” perfect for prototyping.

  • โœ… Ideal for local prototyping: for quick experiments like mine, LM Studio removes all setup friction โ€” no Docker, no backend framework โ€” just plug in your model and start testing.

Thanks to LM Studio, setting up the local LLM was nearly effortless.


๐ŸŒ About SpeechRecognition โ€“ Itโ€™s Still Cloud-Based

At first, I thought the SpeechRecognition API in browsers could work offline.
But in reality, it doesnโ€™t:

On browsers like Chrome, SpeechRecognition (or webkitSpeechRecognition) sends the recorded audio to Googleโ€™s servers for processing.
As a result:

  • It canโ€™t work offline

  • It depends on an internet connection

  • You donโ€™t have control over the recognition engine

This means that while the translation part of my app runs entirely local, the speech recognition part still relies on an external service.

๐Ÿงช Real-World Test

To test the pipeline, I read a short passage from a fairy tale aloud.
The results were surprisingly good:

  • Subtitles appeared clearly, preserving the storytelling tone and rhythm of the original text.

  • No missing words as long as I spoke clearly and maintained a steady pace.

  • When I intentionally spoke too fast or slurred words, the system still kept up โ€” but occasionally missed punctuation or merged phrases, something that could be improved with punctuation post-processing or a small buffering delay before sending text to the LLM.

Tips for smoother results:

  • Maintain a steady speaking rhythm, pausing naturally every 5โ€“10 words.

  • Add punctuation normalization before rendering (or enable auto-punctuation when using Whisper).

  • Process short chunks (~2โ€“3 seconds) and merge them for low latency and better context retention.
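The chunk-and-merge tip can be sketched as a small buffer that releases text either when enough words accumulate or when a sentence ends (the thresholds here are illustrative, not tuned values):

```python
class SubtitleBuffer:
    """Accumulate recognized fragments and flush them in LLM-friendly chunks."""
    def __init__(self, max_words: int = 10):
        self.max_words = max_words
        self.parts = []

    def feed(self, fragment: str):
        """Add a fragment; return a chunk to translate, or None to keep buffering."""
        self.parts.append(fragment.strip())
        text = " ".join(self.parts)
        if len(text.split()) >= self.max_words or text.endswith((".", "?", "!")):
            self.parts = []
            return text
        return None

buf = SubtitleBuffer(max_words=5)
out = [buf.feed(p) for p in ["hello", "everyone,", "welcome to", "the meeting."]]
print(out)
```

Flushing on sentence-final punctuation gives the LLM complete phrases to translate, which noticeably improves output quality over word-by-word streaming.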

๐Ÿงฉ Some Demo Screenshots

๐Ÿ“ท Image 1 โ€“ Web Interface:
User speaks into the microphone; subtitles appear in real time below, showing both the original and translated text.

๐Ÿ“ท Image 2 โ€“ LM Studio:
google/gemma-3-4b running locally on Metal GPU inside LM Studio, showing logs and average response time.


๐Ÿ”ญ Final Thoughts

This project is still a small experiment, but Iโ€™m truly impressed that a 4B parameter model running locally can handle real-time translation this well โ€” especially with a 131K token context window, which allows it to keep track of long, coherent discussions.
With Whisper integrated locally, I believe itโ€™s possible to build a fully offline real-time translation tool โ€” useful for meetings, presentations, or any situation where data privacy matters.


โœณ๏ธ In short:
If youโ€™re looking for a small yet smart model that runs smoothly on a Mac M1 without a discrete GPU, I highly recommend trying google/gemma-3-4b with LM Studio.
Sometimes, a small but well-behaved model โ€” with a huge context window โ€” is all you need to unlock big ideas ๐Ÿš€

So sรกnh D-ID API vร  HeyGen API โ€“ Giแบฃi phรกp tแบกo Avatar AI cho doanh nghiแป‡p

Trong bแป‘i cแบฃnh AI-generated video bรนng nแป•, D-ID vร  HeyGen ฤ‘ang dแบซn ฤ‘แบงu vแป cรดng cแปฅ tแบกo avatar แบฃo biแบฟt nรณi, phแปฅc vแปฅ ฤ‘ร o tแบกo, marketing vร  chฤƒm sรณc khรกch hร ng. Cแบฃ hai ฤ‘แปu cung cแบฅp API giรบp tรญch hแปฃp trแปฑc tiแบฟp vร o sแบฃn phแบฉm, website hoแบทc hแป‡ thแป‘ng nแป™i bแป™.

Tแป•ng quan hai nแปn tแบฃng

D-ID: Tแบญp trung vร o avatar tฦฐฦกng tรกc thแปi gian thแปฑc

  • Talks API: tแบกo video tแปซ แบฃnh + vฤƒn bแบฃn/รขm thanh.
  • Realtime/Streaming: avatar hแป™i thoแบกi thแปi gian thแปฑc (WebRTC).
  • Knowledge/Agent: tรญch hแปฃp nguแป“n tri thแปฉc (RAG) ฤ‘แปƒ trแบฃ lแปi theo dแปฏ liแป‡u riรชng.
  • แปจng dแปฅng: trแปฃ lรฝ แบฃo, hฦฐแป›ng dแบซn tรญch hแปฃp trong app, ฤ‘ร o tแบกo nแป™i bแป™.

HeyGen: Mแบกnh vแป video marketing & localization

  • API tแบกo video: tแปซ แบฃnh hoแบทc avatar cรณ sแบตn.
  • Streaming Avatar API: hแป™i thoแบกi trแปฑc tiแบฟp.
  • Dแป‹ch & lip-sync ฤ‘a ngรดn ngแปฏ: phรน hแปฃp hรณa video cho nhiแปu thแป‹ trฦฐแปng.
  • แปจng dแปฅng: video quแบฃng cรกo, hฦฐแป›ng dแบซn sแบฃn phแบฉm, ฤ‘ร o tแบกo ฤ‘a ngรดn ngแปฏ.

Bแบฃng so sรกnh nhanh

Tiรชu chรญ D-ID API HeyGen API
Mแปฅc tiรชu chรญnh Avatar AI tฦฐฦกng tรกc real-time, gแบฏn tri thแปฉc nแป™i bแป™ Video AI cho marketing, ฤ‘ร o tแบกo, localization
Streaming/Realtime Cรณ (WebRTC/Realtime) Cรณ (Interactive/Streaming)
ฤa ngรดn ngแปฏ & lip-sync Tแป‘t, tแบญp trung hแป™i thoแบกi Rแบฅt mแบกnh, tแป‘i ฦฐu dแป‹ch & lแป“ng tiแบฟng
Tรนy chแป‰nh avatar Upload แบฃnh tแปฑ do, ฤ‘iแปu khiแปƒn cแบฃm xรบc cฦก bแบฃn Kho avatar mแบซu ฤ‘a dแบกng, dแป… chแปn nhanh
Knowledge Base / Agent Cรณ, hแป— trแปฃ RAG/agent Khรดng phแบฃi trแปng tรขm
Tร i liแป‡u & SDK ฤแบงy ฤ‘แปง; phแบงn streaming cแบงn hiแปƒu WebRTC ฤแบงy ฤ‘แปง; cรณ template/workflow cho marketer
Chi phรญ Theo usage; thฦฐแปng cแบงn contact ฤ‘แปƒ quote chi tiแบฟt Minh bแบกch theo credit (Free/Pro/Scale)
Phรน hแปฃp nhแบฅt Chatbot video, trแปฃ lรฝ แบฃo nแป™i bแป™ Marketing, ฤ‘ร o tแบกo, nแป™i dung ฤ‘a ngรดn ngแปฏ

ฦฏu โ€“ nhฦฐแปฃc ฤ‘iแปƒm

D-ID API

ฦฏu ฤ‘iแปƒm:

  • Realtime avatar แป•n ฤ‘แป‹nh, phรน hแปฃp chatbot/hแป— trแปฃ trแปฑc tiแบฟp.
  • Tรญch hแปฃp tri thแปฉc nแป™i bแป™ (RAG) tแบกo โ€œnhรขn viรชn แบฃoโ€.
  • Cรก nhรขn hรณa tแปซ แบฃnh ngฦฐแปi thแบญt.

Nhฦฐแปฃc ฤ‘iแปƒm:

  • Thiแบฟt lแบญp streaming ฤ‘รฒi hแปi hiแปƒu WebRTC (SDP/ICE).
  • Khรดng chuyรชn sรขu vร o dแป‹ch/lip-sync hร ng loแบกt nhฦฐ HeyGen.
  • Thรดng tin giรก cรณ thแปƒ kรฉm minh bแบกch hฦกn (tรนy gรณi/doanh nghiแป‡p).

HeyGen API

ฦฏu ฤ‘iแปƒm:

  • Rแบฅt mแบกnh vแป dแป‹ch & lip-sync ฤ‘a ngรดn ngแปฏ, nhiแปu template.
  • Dแป… dรนng, nhanh tแบกo MVP; gรณi Free/Pro/Scale rรต rร ng.
  • Phรน hแปฃp sแบฃn xuแบฅt video marketing/ฤ‘ร o tแบกo sแป‘ lฦฐแปฃng lแป›n.

Nhฦฐแปฃc ฤ‘iแปƒm:

  • Khรดng hแป— trแปฃ agent/tri thแปฉc nแป™i bแป™ native.
  • Chi phรญ cรณ thแปƒ tฤƒng nhanh vแป›i video dร i/khแป‘i lฦฐแปฃng lแป›n.
  • Tรนy biแบฟn avatar theo dแปฏ liแป‡u ngฦฐแปi dรนng kรฉm linh hoแบกt hฦกn.

Gแปฃi รฝ lแปฑa chแปn theo mแปฅc tiรชu

  • Avatar hแป™i thoแบกi trแปฑc tiแบฟp (support, tฦฐ vแบฅn, onboarding): ฦฐu tiรชn D-ID API.
  • Dแป‹ch video/lip-sync ฤ‘a ngรดn ngแปฏ, sแบฃn xuแบฅt nแป™i dung marketing: ฦฐu tiรชn HeyGen API.
  • Nhรขn viรชn แบฃo dรนng dแปฏ liแป‡u riรชng (RAG/agent): D-ID API.
  • ฤร o tแบกo nแป™i bแป™ ฤ‘a ngรดn ngแปฏ & xuแบฅt bแบฃn hร ng loแบกt: HeyGen API.
  • Giแบฃi phรกp kแบฟt hแปฃp: D-ID cho realtime chat; HeyGen cho video ฤ‘ร o tแบกo/marketing.

Khuyแบฟn nghแป‹ triแปƒn khai kแปน thuแบญt

  1. Xรกc ฤ‘แป‹nh luแป“ng chรญnh: realtime (WebRTC) hay batch (render video).
  2. Quy hoแบกch chi phรญ: ฦฐแป›c tรญnh ฤ‘แป™ dร i video, sแป‘ ngรดn ngแปฏ, lฦฐu lฦฐแปฃng concurrent.
  3. Kiแบฟn trรบc tรญch hแปฃp: tรกch microservice render/video queue; bแบญt CDN cho file xuแบฅt.
  4. Bแบฃo mแบญt & quyแปn riรชng tฦฐ: mรฃ hรณa dแปฏ liแป‡u, kiแปƒm soรกt API key/secret, nhแบญt kรฝ truy cแบญp.
  5. ฤo lฦฐแปng chแบฅt lฦฐแปฃng: ฤ‘แบทt KPI cho lip-sync, ฤ‘แป™ trแป… realtime, tแป‰ lแป‡ render thร nh cรดng.

Fine-Tuning GPT-OSS-20B on Google Colab Using Unsloth and LoRA

1. Introduction

In todayโ€™s rapidly advancing field of AI, the use of AI models โ€” or more specifically, running them on personal computers โ€” has become more common than ever.
However, some AI models have become increasingly difficult to use because they are enormous, often containing billions of parameters.
This makes it nearly impossible for low-end computers to run them effectively for work or projects.

Therefore, in this article, we will explore Google Colab together with Unslothโ€™s fine-tuning tool, combined with LoRA, to fine-tune and use gpt-oss-20b according to our own needs.


2. Main Content

a. What is Unsloth?

  • Unsloth is a modern Python library designed to speed up and optimize the fine-tuning of large language models (LLMs) such as LLaMA, Mistral, Mixtral, and others.
    It makes model training and fine-tuning extremely fast, memory-efficient, and easy โ€” even on limited hardware like a single GPU or consumer-grade machines.

b. What is Colab?

  • Colab is a hosted Jupyter Notebook service that requires no setup and provides free access to computing resources, including GPUs and TPUs.
    It is particularly well-suited for machine learning, data science, and education purposes.

c. What is LoRA?

  • Low-Rank Adaptation (LoRA) is a technique for quickly adapting machine learning models to new contexts.
    LoRA helps make large and complex models more suitable for specific tasks. It works by adding lightweight layers to the original model rather than modifying the entire architecture.
    This allows developers to quickly expand and specialize machine learning models for various applications.

3. Using Colab to Train gpt-oss-20b

– Installing the Libraries

!pip install --upgrade -qqq uv

try:
    import numpy
    install_numpy = f"numpy=={numpy.__version__}"
except:
    install_numpy = "numpy"

!uv pip install -qqq \
  "torch>=2.8.0" "triton>=3.4.0" {install_numpy} \
  "unsloth_zoo[base] @ git+https://github.com/unslothai/unsloth-zoo" \
  "unsloth[base] @ git+https://github.com/unslothai/unsloth" \
  torchvision bitsandbytes \
  git+https://github.com/huggingface/[email protected] \
  git+https://github.com/triton-lang/triton.git@05b2c186c1b6c9a08375389d5efe9cb4c401c075#subdirectory=python/triton_kernels

– After completing the installation, load the gpt-oss-20b model from Unsloth:

from unsloth import FastLanguageModel
import torch

max_seq_length = 1024
dtype = None
model_name = "unsloth/gpt-oss-20b"

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = model_name,
    dtype = dtype,                 # None for auto detection
    max_seq_length = max_seq_length,  # Choose any for long context!
    load_in_4bit = True,           # 4 bit quantization to reduce memory
    full_finetuning = False,       # [NEW!] We have full finetuning now!
    # token = "hf_...",            # use one if using gated models
)
Colab install output

– Adding LoRA for Fine-Tuning

model = FastLanguageModel.get_peft_model(
    model,
    r = 8,  # Choose any number > 0! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,              # Optimized fast path
    bias = "none",                 # Optimized fast path
    # "unsloth" uses less VRAM, fits larger batches
    use_gradient_checkpointing = "unsloth",  # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
)
Tip: If you hit out-of-memory (OOM), reduce max_seq_length, set a smaller r, or increase gradient_accumulation_steps.

– Testing the Model Before Fine-Tuning

Now, letโ€™s test how the model responds before fine-tuning:

messages = [
    {"role": "system", "content": "Bแบกn lร  Shark B, mแป™t nhร  ฤ‘แบงu tฦฐ nแป•i tiแบฟng, thแบณng thแบฏn vร  thแปฑc tแบฟ", "thinking": None},
    {"role": "user", "content": "Bแบกn hรฃy giแป›i thiแป‡u bแบฃn thรขn"},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True,
    return_tensors = "pt",
    return_dict = True,
    reasoning_effort = "low",
).to(model.device)

from transformers import TextStreamer
_ = model.generate(**inputs, max_new_tokens = 512, streamer = TextStreamer(tokenizer))
Generation preview

โ€“ Loading the Data for Fine-Tuning

Dataset sample

Dataset preview
def formatting_prompts_func(examples):
    convos = examples["messages"]
    texts = [tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False) for convo in convos]
    return { "text" : texts, }

from datasets import load_dataset
dataset = load_dataset("json", data_files="data.jsonl", split="train")
dataset
from unsloth.chat_templates import standardize_sharegpt
dataset = standardize_sharegpt(dataset)
dataset = dataset.map(formatting_prompts_func, batched = True)
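standardize_sharegpt and formatting_prompts_func above expect chat-style records. Below is a plausible shape for one line of data.jsonl; the exact fields are an assumption based on the ShareGPT-style "messages" key used above, and the assistant reply is an invented example matching the demo persona:

```python
import json

# One training record: a "messages" list, matching examples["messages"] above.
record = {
    "messages": [
        {"role": "system", "content": "Bแบกn lร  Shark B, mแป™t nhร  ฤ‘แบงu tฦฐ nแป•i tiแบฟng, thแบณng thแบฏn vร  thแปฑc tแบฟ"},
        {"role": "user", "content": "Bแบกn hรฃy giแป›i thiแป‡u bแบฃn thรขn"},
        {"role": "assistant", "content": "Tรดi lร  Shark B. Tรดi ฤ‘แบงu tฦฐ vร o con ngฦฐแปi vร  sแป‘ liแป‡u."},  # illustrative
    ]
}
line = json.dumps(record, ensure_ascii=False)  # one JSON object per line in data.jsonl
print(line[:60])
```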

โ€“ Training the Model

The following code snippet defines the configuration and setup for the fine-tuning process.
Here, we use SFTTrainer and SFTConfig from the trl library to perform Supervised Fine-Tuning (SFT) on our model.
The configuration specifies parameters such as batch size, learning rate, optimizer type, and number of training epochs.

from trl import SFTConfig, SFTTrainer

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    args = SFTConfig(
        per_device_train_batch_size = 1,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        num_train_epochs = 1,  # Set this for 1 full training run.
        # max_steps = 30,
        learning_rate = 2e-4,
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none",  # Use this for WandB etc.
    ),
)

trainer_stats = trainer.train()

– After training, try the fine-tuned model

# Example reload (set to True to run)
if False:
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "finetuned_model",  # YOUR MODEL YOU USED FOR TRAINING
        max_seq_length = 1024,
        dtype = None,
        load_in_4bit = True,
    )

    messages = [
        {"role": "system", "content": "Bแบกn lร  Shark B, mแป™t nhร  ฤ‘แบงu tฦฐ nแป•i tiแบฟng, thแบณng thแบฏn vร  thแปฑc tแบฟ", "thinking": None},
        {"role": "user", "content": "Bแบกn hรฃy giแป›i thiแป‡u bแบฃn thรขn"},
    ]

    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt = True,
        return_tensors = "pt",
        return_dict = True,
        reasoning_effort = "low",
    ).to(model.device)

    from transformers import TextStreamer
    _ = model.generate(**inputs, max_new_tokens = 512, streamer = TextStreamer(tokenizer))
Note: Replace finetuned_model with your actual model path (e.g., outputs or the directory you saved/merged adapters to).

Colab notebook: Open your Colab here.


4. Conclusion & Next Steps

By combining Unsloth (for speed and memory efficiency), LoRA (for lightweight adaptation), and Google Colab (for accessible compute), you can fine-tune gpt-oss-20b even on modest hardware. The workflow above helps you:

  • Install a reproducible environment with optimized kernels.
  • Load gpt-oss-20b in 4-bit to reduce VRAM usage.
  • Attach LoRA adapters to train only a small set of parameters.
  • Prepare chat-style datasets and run supervised fine-tuning with TRLโ€™s SFTTrainer.
  • Evaluate before/after to confirm your improvements.
Open the Colab
Clone the notebook, plug in your dataset, and fine-tune your own assistant in minutes.

Context Engineering: Chรฌa khรณa xรขy dแปฑng AI Agent hiแป‡u quแบฃ

Context Engineering lร  chiแบฟn lฦฐแปฃc quแบฃn lรฝ toร n bแป™ thรดng tin (context) cho AI Agent, khรกc vแป›i Prompt Engineering. Tรฌm hiแปƒu 3 kแปน thuแบญt cแป‘t lรตi giรบp Agent thรดng minh hฦกn vร  duy trรฌ sแปฑ tแบญp trung dร i hแบกn.

Trong nhแปฏng nฤƒm ฤ‘แบงu cแปงa kแปท nguyรชn AI tแบกo sinh, Prompt Engineering tแปซng lร  kแปน nฤƒng ฤ‘ฦฐแปฃc sฤƒn ฤ‘รณn, giรบp chรบng ta tรฌm ra nhแปฏng tแปซ ngแปฏ vร  cแบฅu trรบc tแป‘t nhแบฅt ฤ‘แปƒ khai thรกc sแปฉc mแบกnh cแปงa mรด hรฌnh ngรดn ngแปฏ lแป›n (LLM). Tuy nhiรชn, khi chรบng ta chuyแปƒn tแปซ cรกc tรกc vแปฅ mแป™t lแบงn (one-shot task) sang xรขy dแปฑng cรกc AI Agent (Tรกc nhรขn AI) cรณ khแบฃ nฤƒng hoแบกt ฤ‘แป™ng tแปฑ chแปง, thแปฑc hiแป‡n nhiแปu bฦฐแป›c vร  ghi nhแป› thรดng tin trong thแปi gian dร i, mแป™t khรกi niแป‡m mแป›i ฤ‘รฃ nแป•i lรชn vร  trแปŸ nรชn quan trแปng hฦกn: Context Engineering (Kแปน thuแบญt Ngแปฏ cแบฃnh).
Bร i viแบฟt nร y sแบฝ lร m rรต Context Engineering lร  gรฌ, nรณ khรกc biแป‡t nhฦฐ thแบฟ nร o so vแป›i Prompt Engineering vร  nhแปฏng chiแบฟn lฦฐแปฃc cแป‘t lรตi mร  cรกc kแปน sฦฐ AI tแบกi Anthropic ฤ‘ang รกp dแปฅng ฤ‘แปƒ xรขy dแปฑng cรกc Agent thรดng minh vร  ฤ‘รกng tin cแบญy.

1. Context Engineering lร  gรฌ?

Context (Ngแปฏ cแบฃnh) ฤ‘แป cแบญp ฤ‘แบฟn toร n bแป™ tแบญp hแปฃp cรกc tokens ฤ‘ฦฐแปฃc ฤ‘ฦฐa vร o khi lแบฅy mแบซu (sampling) tแปซ mแป™t LLM. Nรณ lร  nguแป“n tร i nguyรชn quan trแปng, nhฦฐng cรณ giแป›i hแบกn, cung cแบฅp cho mรด hรฌnh mแปi thแปฉ nรณ cแบงn ฤ‘แปƒ ฤ‘ฦฐa ra quyแบฟt ฤ‘แป‹nh hoแบทc tแบกo ra ฤ‘แบงu ra mong muแป‘n.

Context Engineering (CE) lร  tแบญp hแปฃp cรกc chiแบฟn lฦฐแปฃc nhแบฑm quแบฃn lรฝ vร  tแป‘i ฦฐu hรณa tiแป‡n รญch cแปงa cรกc tokens ฤ‘รณ, chแป‘ng lแบกi cรกc giแป›i hแบกn cแป‘ hแปฏu cแปงa LLM (nhฦฐ cแปญa sแป• ngแปฏ cแบฃnh giแป›i hแบกn), nhแบฑm mแปฅc ฤ‘รญch:

Tรฌm ra cแบฅu hรฌnh ngแปฏ cแบฃnh nร o cรณ khแบฃ nฤƒng tแบกo ra hร nh vi mong muแป‘n cแปงa mรด hรฌnh nhแบฅt. Nรณi cรกch khรกc, CE khรดng chแป‰ lร  vแป viแป‡c bแบกn viแบฟt gรฌ trong prompt, mร  lร  vแป viแป‡c bแบกn sแบฏp xแบฟp vร  duy trรฌ toร n bแป™ trแบกng thรกi thรดng tin cรณ sแบตn cho LLM tแบกi bแบฅt kแปณ thแปi ฤ‘iแปƒm nร o.

2. Khรกc biแป‡t cแป‘t lรตi giแปฏa Context Engineering vร  Prompt Engineering

Anthropic xem Context Engineering lร  sแปฑ tiแบฟn hรณa tแปฑ nhiรชn cแปงa Prompt Engineering.

| Tiรชu chรญ | Prompt Engineering | Context Engineering |
| ————- | ——————————– | —————————————- |
| **Trแปng tรขm** | Viแบฟt hฦฐแป›ng dแบซn (prompt) hiแป‡u quแบฃ | Quแบฃn lรฝ toร n bแป™ ngแปฏ cแบฃnh cแปงa mรด hรฌnh |
| **Phแบกm vi** | Mแป™t tรกc vแปฅ ฤ‘ฦกn lแบป | Nhiแปu vรฒng tฦฐฦกng tรกc, trแบกng thรกi dร i hแบกn |
| **Cรกch lร m** | Tแป‘i ฦฐu tแปซng cรขu | Tแป‘i ฦฐu toร n bแป™ luแป“ng thรดng tin |
| **Khi dรนng** | Mแป™t cรขu hแปi โ€“ mแป™t cรขu trแบฃ lแปi | Agent tแปฑ hoแบกt ฤ‘แป™ng, tแปฑ hแปc, tแปฑ nhแป› |

Prompt Engineering ฤ‘แป cแบญp ฤ‘แบฟn cรกc phฦฐฦกng phรกp viแบฟt vร  tแป• chแปฉc hฦฐแป›ng dแบซn cho mรด hรฌnh ngรดn ngแปฏ lแป›n (LLM) nhแบฑm ฤ‘แบกt ฤ‘ฦฐแปฃc kแบฟt quแบฃ tแป‘i ฦฐu (bแบกn cรณ thแปƒ tham khแบฃo thรชm trong tร i liแป‡u hฦฐแป›ng dแบซn cแปงa chรบng tรดi vแป cรกc chiแบฟn lฦฐแปฃc Prompt Engineering hiแป‡u quแบฃ).

Trong khi ฤ‘รณ, Context Engineering lร  tแบญp hแปฃp cรกc chiแบฟn lฦฐแปฃc nhแบฑm lแปฑa chแปn vร  duy trรฌ tแบญp hแปฃp token (thรดng tin) tแป‘i ฦฐu trong quรก trรฌnh suy luแบญn (inference) cแปงa LLM โ€” bao gแป“m toร n bแป™ thรดng tin khรกc cรณ thแปƒ ฤ‘ฦฐแปฃc ฤ‘ฦฐa vร o ngแปฏ cแบฃnh, khรดng chแป‰ riรชng phแบงn prompt.

Trong giai ฤ‘oแบกn ฤ‘แบงu cแปงa viแป‡c phรกt triแปƒn แปฉng dแปฅng vแป›i LLM, prompting chiแบฟm phแบงn lแป›n cรดng viแป‡c cแปงa kแปน sฦฐ AI, vรฌ phแบงn lแป›n cรกc trฦฐแปng hแปฃp sแปญ dแปฅng (ngoร i trรฒ chuyแป‡n thรดng thฦฐแปng) yรชu cแบงu prompt ฤ‘ฦฐแปฃc tแป‘i ฦฐu cho cรกc tรกc vแปฅ mแป™t lแบงn nhฦฐ phรขn loแบกi hoแบทc sinh vฤƒn bแบฃn.

ฤรบng nhฦฐ tรชn gแปi, trแปng tรขm chรญnh cแปงa Prompt Engineering lร  cรกch viแบฟt prompt hiแป‡u quแบฃ, ฤ‘แบทc biแป‡t lร  system prompt (hฦฐแป›ng dแบซn hแป‡ thแป‘ng).
Tuy nhiรชn, khi chรบng ta tiแบฟn tแป›i viแป‡c xรขy dแปฑng cรกc tรกc nhรขn AI (AI Agents) cรณ khแบฃ nฤƒng mแบกnh mแบฝ hฦกn โ€” hoแบกt ฤ‘แป™ng qua nhiแปu vรฒng suy luแบญn (multi-turn inference) vร  thแปi gian dร i hฦกn (long-horizon tasks) โ€” chรบng ta cแบงn cรณ cรกc chiแบฟn lฦฐแปฃc ฤ‘แปƒ quแบฃn lรฝ toร n bแป™ trแบกng thรกi ngแปฏ cแบฃnh, bao gแป“m:
– System instructions (hฦฐแป›ng dแบซn hแป‡ thแป‘ng)
– Tools (cรดng cแปฅ mร  agent cรณ thแปƒ gแปi)
– Model Context Protocol (MCP)
– Dแปฏ liแป‡u bรชn ngoร i
– Lแป‹ch sแปญ tin nhแบฏn

Mแป™t Agent hoแบกt ฤ‘แป™ng theo vรฒng lแบทp sแบฝ liรชn tแปฅc tแบกo ra ngร y cร ng nhiแปu dแปฏ liแป‡u cรณ thแปƒ liรชn quan ฤ‘แบฟn cรกc vรฒng suy luแบญn tiแบฟp theo. Nhแปฏng thรดng tin nร y phแบฃi ฤ‘ฦฐแปฃc tinh lแปc mแป™t cรกch tuแบงn hoร n ฤ‘แปƒ giแปฏ lแบกi cรกc phแบงn quan trแปng nhแบฅt.
Context Engineering chรญnh lร  nghแป‡ thuแบญt vร  khoa hแปc cแปงa viแป‡c chแปn lแปc nhแปฏng gรฌ sแบฝ ฤ‘ฦฐแปฃc ฤ‘ฦฐa vร o cแปญa sแป• ngแปฏ cแบฃnh giแป›i hแบกn tแปซ โ€œvลฉ trแปฅ thรดng tinโ€ liรชn tแปฅc mแปŸ rแป™ng ฤ‘รณ.

3. Cรกc yแบฟu tแป‘ cแป‘t lรตi cแบงn chรบ รฝ khi phรกt triแปƒn AI Agent
Nguyรชn tแบฏc vร ng cแปงa Context Engineering lร : Tรฌm tแบญp hแปฃp tokens cรณ tรญn hiแป‡u cao (high-signal tokens) nhแป nhแบฅt ฤ‘แปƒ tแป‘i ฤ‘a hรณa xรกc suแบฅt ฤ‘แบกt ฤ‘ฦฐแปฃc kแบฟt quแบฃ mong muแป‘n.

3.1. Coi Context lร  Tร i nguyรชn Hแปฏu hแบกn
Cรกc nghiรชn cแปฉu cho thแบฅy, giแป‘ng nhฦฐ con ngฦฐแปi cรณ giแป›i hแบกn bแป™ nhแป› lร m viแป‡c (working memory), LLM cลฉng cรณ mแป™t “ngรขn sรกch chรบ รฝ” (Attention Budget) vร  gแบทp hiแป‡n tฦฐแปฃng Context Rot (khแบฃ nฤƒng nhแป› lแบกi thรดng tin giแบฃm khi sแป‘ lฦฐแปฃng tokens tฤƒng lรชn).

Do ฤ‘รณ, cรกc kแปน sฦฐ cแบงn:
– Tแป‘i giแบฃn hรณa: Chแป‰ ฤ‘ฦฐa vร o thรดng tin thแปฑc sแปฑ cแบงn thiแบฟt.
– Tinh gแปn Tools: Thiแบฟt kแบฟ cรกc cรดng cแปฅ (Tools) khรดng bแป‹ chแป“ng chรฉo chแปฉc nฤƒng, rรต rร ng vร  tแบกo thร nh mแป™t bแป™ tแป‘i thiแปƒu ฤ‘แปƒ trรกnh gรขy mฦก hแป“ cho Agent khi ra quyแบฟt ฤ‘แป‹nh.
Sแปญ dแปฅng vรญ dแปฅ (Few-shot) chแปn lแปc: Thay vรฌ nhแป“i nhรฉt mแป™t danh sรกch dร i cรกc trฦฐแปng hแปฃp biรชn, hรฃy chแปn lแปc cรกc vรญ dแปฅ ฤ‘iแปƒn hรฌnh, ฤ‘a dแบกng (canonical examples) ฤ‘แปƒ minh hแปa hร nh vi mong ฤ‘แปฃi.

3.2. Tแป‘i ฦฐu Hฦฐแป›ng dแบซn Hแป‡ thแป‘ng (System Prompts)
Prompt ban ฤ‘แบงu lร  mแป™t phแบงn khรดng thแปƒ thiแบฟu cแปงa ngแปฏ cแบฃnh. Nรณ cแบงn ฤ‘แบกt ฤ‘แบฟn “ฤ‘แป™ cao phรน hแปฃp” (Right Altitude) โ€“ trแบกng thรกi cรขn bแบฑng hoร n hแบฃo:
– Trรกnh quรก cแปฉng nhแบฏc: Khรดng nรชn mรฃ hรณa logic phแปฉc tแบกp, dแป… gรฃy (brittle, hardcoded logic) vร o prompt.
– Trรกnh quรก mฦก hแป“: Khรดng cung cแบฅp hฦฐแป›ng dแบซn quรก chung chung, thiแบฟu tรญn hiแป‡u cแปฅ thแปƒ.
– Tแป‘i ฦฐu: Sแปญ dแปฅng ngรดn ngแปฏ ฤ‘ฦกn giแบฃn, trแปฑc tiแบฟp. Tแป• chแปฉc prompt thร nh cรกc phแบงn riรชng biแป‡t (, ) bแบฑng thแบป XML hoแบทc Markdown ฤ‘แปƒ mรด hรฌnh dแป… dร ng phรขn tรกch thรดng tin.

3.3. Chiแบฟn lฦฐแปฃc quแบฃn lรฝ Ngแปฏ cแบฃnh cho Tรกc vแปฅ dร i hแบกn (Long-Horizon Tasks)
ฤแป‘i vแป›i cรกc Agent cแบงn hoแบกt ฤ‘แป™ng liรชn tแปฅc trong thแปi gian dร i (nhฦฐ di chuyแปƒn codebase lแป›n, nghiรชn cแปฉu chuyรชn sรขu), vฦฐแปฃt quรก giแป›i hแบกn cแปงa cแปญa sแป• ngแปฏ cแบฃnh, Context Engineering cung cแบฅp ba kแปน thuแบญt chรญnh:

Vรฌ sao Context Engineering lแบกi quan trแปng trong viแป‡c xรขy dแปฑng AI Agent mแบกnh mแบฝ

Mแบทc dรน cรกc mรด hรฌnh ngรดn ngแปฏ lแป›n (LLM) cรณ tแป‘c ฤ‘แป™ xแปญ lรฝ cao vร  khแบฃ nฤƒng quแบฃn lรฝ khแป‘i lฦฐแปฃng dแปฏ liแป‡u ngร y cร ng lแป›n, nhฦฐng chรบng โ€“ giแป‘ng nhฦฐ con ngฦฐแปi โ€“ vแบซn cรณ giแป›i hแบกn vแป khแบฃ nฤƒng tแบญp trung vร  dแป… bแป‹ โ€œrแป‘i loแบกn thรดng tinโ€ khi ngแปฏ cแบฃnh trแปŸ nรชn quรก lแป›n. Cรกc nghiรชn cแปฉu dแบกng โ€œneedle-in-a-haystackโ€ (tรฌm kim trong ฤ‘แป‘ng rฦกm) ฤ‘รฃ phรกt hiแป‡n ra mแป™t hiแป‡n tฦฐแปฃng gแปi lร  context rot โ€” tแปฉc lร  khi sแป‘ lฦฐแปฃng token trong cแปญa sแป• ngแปฏ cแบฃnh tฤƒng lรชn, khแบฃ nฤƒng cแปงa mรด hรฌnh trong viแป‡c ghi nhแป› vร  truy xuแบฅt chรญnh xรกc thรดng tin tแปซ ngแปฏ cแบฃnh ฤ‘รณ lแบกi giแบฃm xuแป‘ng.

1. Context lร  tร i nguyรชn cรณ giแป›i hแบกn

Dรน mแป™t sแป‘ mรด hรฌnh cรณ thแปƒ suy giแบฃm chแบญm hฦกn, nhฦฐng hiแป‡n tฦฐแปฃng nร y xแบฃy ra แปŸ tแบฅt cแบฃ cรกc LLM. Vรฌ vแบญy, ngแปฏ cแบฃnh phแบฃi ฤ‘ฦฐแปฃc xem nhฦฐ mแป™t tร i nguyรชn hแปฏu hแบกn, cรณ lแปฃi รญch giแบฃm dแบงn theo tแปซng token thรชm vร o.Giแป‘ng nhฦฐ con ngฦฐแปi chแป‰ cรณ mแป™t dung lฦฐแปฃng bแป™ nhแป› lร m viแป‡c (working memory) nhแบฅt ฤ‘แป‹nh, LLM cลฉng cรณ โ€œngรขn sรกch chรบ รฝโ€ (attention budget) mร  nรณ sแปญ dแปฅng khi xแปญ lรฝ khแป‘i lฦฐแปฃng lแป›n ngแปฏ cแบฃnh. Mแป—i token mแป›i ฤ‘ฦฐแปฃc thรชm vร o ฤ‘แปu โ€œtiรชu tแป‘nโ€ mแป™t phแบงn ngรขn sรกch ฤ‘รณ, khiแบฟn viแป‡c chแปn lแปc thรดng tin ฤ‘ฦฐa vร o mรด hรฌnh trแปŸ nรชn vรด cรนng quan trแปng.

2. Giแป›i hแบกn bแบฏt nguแป“n tแปซ kiแบฟn trรบc Transformer

Nguแป“n gแป‘c cแปงa sแปฑ khan hiแบฟm โ€œchรบ รฝโ€ nร y nแบฑm แปŸ kiแบฟn trรบc Transformer โ€“ nแปn tแบฃng cแปงa cรกc LLM hiแป‡n nay. Trong kiแบฟn trรบc nร y, mแป—i token cรณ thแปƒ โ€œchรบ รฝโ€ ฤ‘แบฟn mแปi token khรกc trong toร n bแป™ ngแปฏ cแบฃnh, tแบกo ra nยฒ mแป‘i quan hแป‡ cแบทp ฤ‘รดi cho n token. Khi ฤ‘แป™ dร i ngแปฏ cแบฃnh tฤƒng lรชn: Khแบฃ nฤƒng cแปงa mรด hรฌnh trong viแป‡c duy trรฌ cรกc mแป‘i quan hแป‡ nร y bแป‹ kรฉo cฤƒng, dแบซn ฤ‘แบฟn sแปฑ ฤ‘รกnh ฤ‘แป•i tแปฑ nhiรชn giแปฏa kรญch thฦฐแป›c ngแปฏ cแบฃnh vร  ฤ‘แป™ tแบญp trung cแปงa sแปฑ chรบ รฝ. Ngoร i ra, LLM ฤ‘ฦฐแปฃc huแบฅn luyแป‡n chแปง yแบฟu trรชn cรกc chuแป—i ngแบฏn, vรฌ vแบญy chรบng cรณ รญt kinh nghiแป‡m vร  รญt tham sแป‘ chuyรชn biแป‡t hฦกn cho cรกc mแป‘i quan hแป‡ phแปฅ thuแป™c dร i hแบกn trรชn toร n ngแปฏ cแบฃnh.

3. Giแบฃi phรกp kแปน thuแบญt giรบp mแปŸ rแป™ng ngแปฏ cแบฃnh (nhฦฐng khรดng hoร n hแบฃo)

Mแป™t sแป‘ kแปน thuแบญt nhฦฐ position encoding interpolation (nแป™i suy mรฃ hรณa vแป‹ trรญ) giรบp mรด hรฌnh xแปญ lรฝ chuแป—i dร i hฦกn bแบฑng cรกch thรญch แปฉng chรบng vแป›i phแบกm vi ngแปฏ cแบฃnh ngแบฏn hฦกn mร  mรด hรฌnh ฤ‘รฃ ฤ‘ฦฐแปฃc huแบฅn luyแป‡n. Tuy nhiรชn, ฤ‘iแปu nร y cรณ thแปƒ lร m giแบฃm ฤ‘แป™ chรญnh xรกc trong viแป‡c hiแปƒu vแป‹ trรญ token, khiแบฟn hiแป‡u nฤƒng giแบฃm dแบงn chแปฉ khรดng sแปฅp ฤ‘แป• hoร n toร n.

Kแบฟt quแบฃ lร : Mรด hรฌnh vแบซn hoแบกt ฤ‘แป™ng tแป‘t vแป›i ngแปฏ cแบฃnh dร i, nhฦฐng cรณ thแปƒ mแบฅt ฤ‘แป™ chรญnh xรกc trong viแป‡c truy xuแบฅt thรดng tin hoแบทc suy luแบญn dร i hแบกn, so vแป›i khi lร m viแป‡c vแป›i ngแปฏ cแบฃnh ngแบฏn hฦกn.

The anatomy of effective context

Because large language models (LLMs) are constrained by a finite attention budget, effective context engineering means finding the smallest set of high-signal tokens (the most condensed, important pieces of information) that maximizes the likelihood of the desired outcome.
Applying this principle in practice, however, is far from simple. Below are concrete guidelines for applying it to the different components of the context:

1. System prompt: be extremely clear and at the right "altitude"

The system prompt should be written in simple, direct language that presents ideas at the right altitude for the agent.
The "right altitude" is the Goldilocks zone: neither too specific nor too vague. Two common mistakes when writing system prompts are:

Too detailed:
Some engineers try to hard-code complex logic into the prompt to control the agent's behavior with absolute precision. This approach is brittle and hard to maintain, because even a small change can make the whole system misbehave.

Too generic:
At the other extreme, some prompts give only vague guidance and no concrete signal about the kind of result expected. The model then makes wrong assumptions about shared context and easily produces off-target responses.

The optimal approach:
Write the prompt at a moderate altitude: specific enough to guide behavior clearly, yet flexible enough for the model to reason and adapt.
In other words, give strong heuristics (guiding principles) rather than rigid scripts.

2. Keep the prompt structure clear and tidy

We recommend organizing the prompt into distinct sections, for example:

## Tool guidance
## Output description

You can use XML tags or Markdown headings to separate the sections clearly. As models get smarter, however, formatting will likely matter less and less; the focus remains on content and clarity, keeping the amount of information at a "minimal but complete" level.

Whatever structure you choose, the main goal is:
"Provide the smallest amount of information that is still enough for the model to understand and perform the desired behavior."
"Minimal" does not mean so terse that information is missing. The agent still needs enough initial facts to behave correctly.

Best practice:

Start with a minimal prompt, test it against the best model available, then add instructions or concrete examples based on the failures that show up in that first round of testing.

3. Designing tools for the agent

Tools let the agent interact with its environment and pull new context in as it works.

Because tools are the "contract" between the agent and the outside world, their design should prioritize efficiency; concretely:
return information token-efficiently, and steer the agent toward effective, sensible behavior.
In "Writing tools for AI agents โ€“ with AI agents", Anthropic recommends that tools should be:
โ€“ easy for the model to understand,
โ€“ minimally overlapping in functionality,
โ€“ like functions in a good codebase: self-contained, clear, and error-tolerant.

Input parameters should be descriptive, unambiguous, and well matched to the model's capabilities.

A common mistake:
A toolset that is too "bloated": it packs in too many functions, or leaves the agent confused about which tool to use.
If a human engineer can't say for sure which tool to use, don't expect an AI agent to do better.

The solution: build a minimal viable toolset. This makes maintenance easier and keeps the context leaner across long-running interactions.
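A toy sketch of checking a toolset for overlap, in the spirit of the "minimal viable toolset" advice. The tool specs and the 0.5 similarity threshold are invented for illustration; a real check would compare semantics, not word overlap:

```python
# Hypothetical tool specs; duplicated or near-duplicate descriptions are a
# crude signal that two tools may confuse the agent.
tools = [
    {"name": "read_file", "description": "Read a file's contents by path."},
    {"name": "search_code", "description": "Search the codebase for a pattern."},
]

def ambiguous_pairs(specs: list[dict]) -> list[tuple[str, str]]:
    """Flag tool pairs whose descriptions share most of their words (Jaccard > 0.5)."""
    flagged = []
    for i, a in enumerate(specs):
        for b in specs[i + 1:]:
            wa = set(a["description"].lower().split())
            wb = set(b["description"].lower().split())
            if len(wa & wb) / max(len(wa | wb), 1) > 0.5:
                flagged.append((a["name"], b["name"]))
    return flagged

print(ambiguous_pairs(tools))  # these two descriptions barely overlap: []
```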

5. Vรญ dแปฅ minh hแปa (Few-shot prompting)

– Cung cแบฅp vรญ dแปฅ โ€” hay cรฒn gแปi lร  few-shot prompting โ€” lร  mแป™t thแปฑc hร nh tแป‘t ฤ‘รฃ ฤ‘ฦฐแปฃc chแปฉng minh qua thแปi gian.
Nhฦฐng: ฤแปซng โ€œnhแป“i nhรฉtโ€ hร ng loแบกt tรฌnh huแป‘ng ngoแบกi lแป‡ (edge cases) vร o prompt ฤ‘แปƒ cแป‘ gแบฏng bao phแปง mแปi quy tแบฏc cรณ thแปƒ xแบฃy ra.
Thay vร o ฤ‘รณ, hรฃy chแปn lแปc mแป™t bแป™ vรญ dแปฅ tiรชu biแปƒu, ฤ‘a dแบกng vร  mang tรญnh chuแบฉn mแปฑc (canonical), thแปƒ hiแป‡n hร nh vi mong muแป‘n cแปงa agent.
Vแป›i mแป™t mรด hรฌnh ngรดn ngแปฏ, โ€œmแป™t vรญ dแปฅ hay ฤ‘รกng giรก hฦกn cแบฃ ngร n dรฒng hฦฐแป›ng dแบซnโ€.
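A minimal sketch of a curated few-shot block. The refund-policy domain and the examples themselves are invented; the point is a small, canonical set rather than an edge-case dump:

```python
# Two diverse, canonical examples instead of dozens of edge cases.
EXAMPLES = [
    {"input": "Refund request, order placed 10 days ago",
     "output": "APPROVE refund; within the 30-day window"},
    {"input": "Refund request, order placed 90 days ago",
     "output": "DENY refund; offer store credit instead"},
]

def render_few_shot(examples: list[dict]) -> str:
    """Render examples as input/output pairs for inclusion in a prompt."""
    return "\n\n".join(
        f"Input: {e['input']}\nOutput: {e['output']}" for e in examples
    )

print(render_few_shot(EXAMPLES))
```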
โ€“ Giแปฏ ngแปฏ cแบฃnh gแปn mร  tinh dรน bแบกn ฤ‘ang lร m viแป‡c vแป›i system prompt, cรดng cแปฅ, vรญ dแปฅ, hay lแป‹ch sแปญ hแป™i thoแบกi, hรฃy nhแป› nguyรชn tแบฏc vร ng: โ€œGiแปฏ cho ngแปฏ cแบฃnh cรณ thรดng tin, nhฦฐng chแบทt chแบฝ.โ€

Mแปฅc tiรชu cแปงa context engineering khรดng phแบฃi lร  nhแป“i nhรฉt dแปฏ liแป‡u,
mร  lร  chแปn lแปc thรดng minh โ€” sao cho mแป—i token ฤ‘แปu cรณ giรก trแป‹ ฤ‘รณng gรณp rรต rร ng.

The evolution of agents and the importance of context. As base models get smarter, the level of agent autonomy rises too.
An agent can navigate complex problem spaces, recover from errors, and learn from its environment on its own, work that previously relied on human engineers.

Along with that evolution, context design thinking has changed as well.
Where many AI applications used to rely on pre-inference retrieval (for example, using embeddings to pull out the important passages before sending them to the model), the new trend is "just-in-time context retrieval".

โ€“ "Just-in-time": provide context at the right moment. Instead of preloading all relevant data, modern agents keep only "lightweight identifiers" such as:
โ€“ file paths,
โ€“ stored queries,
โ€“ web links (URLs), etc.
=> Then, when needed, the agent calls a tool to load the data into context at runtime.
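A minimal sketch of the just-in-time pattern: the agent's context holds only a lightweight identifier (here, a file path), and the full content is loaded by a tool call at runtime. `LazyRef` is a made-up name for illustration:

```python
import pathlib
import tempfile

class LazyRef:
    """Lightweight identifier kept in context; content is loaded just in time."""
    def __init__(self, path: str):
        self.path = path            # cheap: only the path sits in context

    def load(self) -> str:
        """Tool call at runtime: fetch the full content only when needed."""
        return pathlib.Path(self.path).read_text()

with tempfile.TemporaryDirectory() as d:
    f = pathlib.Path(d) / "README.md"
    f.write_text("# Project docs")
    ref = LazyRef(str(f))           # context cost: one short string
    print(ref.load())               # full text enters context only here
```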

Example:
๐Ÿ‘‰ Claude Code, Anthropic's "agentic coding" product, uses this strategy to analyze complex data over large databases. Instead of ingesting the whole dataset, the model writes targeted queries, stores the results, and uses Bash commands like head and tail to inspect only the parts it needs.

This approach mirrors human cognition:
we don't memorize all our data; we use external organization systems (folders, inboxes, bookmarks) to retrieve the right information when needed. Metadata: structure that helps the agent understand context. Beyond saving space, the metadata of files and references also provides important signals for the agent's reasoning.

Example:
A file named test_utils.py inside the tests/ folder means something entirely different from a file with the same name in src/core_logic/.
Folder structure, naming conventions, and timestamps
โ†’ all help the agent understand the purpose and relevance of information.

Progressive disclosure: letting the agent discover context. When we allow the agent to navigate and retrieve data freely, we unlock "gradual context discovery": the agent finds the relevant context itself, through experience.

Each action produces more evidence for the next reasoning cycle:
โ€“ file size โ†’ hints at complexity,
โ€“ file name โ†’ hints at purpose,
โ€“ last-modified time โ†’ indicates freshness and relevance.

The agent builds its picture of the problem layer by layer, keeps only the necessary information in "working memory", and uses a note-taking strategy to store the rest.

The result: the agent focuses on the most relevant context instead of "drowning" in a huge, noisy mass of information.
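A small sketch of the metadata signals above: before opening a file, an agent can read cheap facts (name, size, modification time) and decide whether the content is worth loading. The `describe` helper is hypothetical:

```python
import pathlib
import tempfile

def describe(path: pathlib.Path) -> dict:
    """Cheap metadata an agent can inspect before opening a file:
    size hints at complexity, name at purpose, mtime at freshness."""
    st = path.stat()
    return {"name": path.name, "bytes": st.st_size, "modified": st.st_mtime}

with tempfile.TemporaryDirectory() as tmp:
    f = pathlib.Path(tmp) / "test_utils.py"
    f.write_text("def helper():\n    pass\n")
    info = describe(f)
    print(info["name"], info["bytes"])
```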

Performance vs. autonomy: the trade-off

Naturally, retrieving context at runtime is slower than using precomputed data.
It also takes experienced engineers to design sensible tools and navigation strategies.
Without clear direction, an agent may:
โ€“ use the wrong tool,
โ€“ run into dead ends,
โ€“ or miss important information.
=> In many situations, a hybrid strategy is therefore optimal:
part of the context is preloaded for speed, and the agent retrieves the rest on demand.

Example:
๐Ÿ‘‰ Claude Code preloads CLAUDE.md files into context, but can still use glob or grep to find the right files at the right time, avoiding stale indexes or complex syntax trees. This strategy fits stable domains like legal or finance especially well, where data changes rarely but still demands high accuracy.

Context engineering techniques for long-horizon tasks

"Long-horizon" tasks, such as migrating an entire codebase or a long-term research project, require the agent to:
โ€“ maintain coherence and goal focus throughout, working across thousands of steps, far beyond the model's context window.
โ€“ Waiting for "bigger context windows" is not the only answer, because no matter how long the context gets, it can still suffer context pollution or contain outdated information.

Anthropic proposes three techniques that help agents work more effectively over long stretches of time:
โ€“ Compaction: compress and consolidate old information to save context,
โ€“ Structured note-taking: structured notes that help the agent recall its logic,
โ€“ Multi-agent architectures: split a large task across several coordinating agents.

Compaction: smart context compression

Compaction is the technique of summarizing and compressing content when a conversation or agent task starts to hit the context window limit.
Concretely, instead of making the model "haul" its entire long interaction history, we create a high-fidelity summary, then re-initialize a fresh context with that summary.

The goal: help the agent maintain logical continuity and accuracy over the long run, without losing performance to token limits.

In Claude Code, for example, Anthropic implements compaction by sending the full message history to the model and letting the model summarize and compress the most important information itself. The summary typically preserves:
โ€“ architectural decisions,
โ€“ unresolved bugs,
โ€“ important implementation details,
while dropping redundant material such as tool outputs.
=> The agent then continues with the compacted context plus the five most recently accessed files, giving users a seamless experience with no worries about context limits.
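A naive sketch of the compaction loop: old turns are collapsed into a single summary message while the most recent turns stay verbatim. Real systems would have the model write a high-fidelity summary rather than a placeholder string:

```python
def compact(history: list[dict], keep_recent: int = 2) -> list[dict]:
    """Replace older turns with one summary entry; keep recent turns verbatim."""
    if len(history) <= keep_recent:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    summary = {
        "role": "system",
        "content": f"[Summary of {len(old)} earlier turns: decisions and "
                   "open bugs kept, tool outputs dropped]",
    }
    return [summary] + recent

history = [{"role": "user", "content": f"step {i}"} for i in range(10)]
print(len(compact(history)))  # 10 turns shrink to 1 summary + 2 recent = 3
```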

ฤiแปƒm tinh tแบฟ trong compaction:

Chรญnh lร  chแปn cรกi gรฌ giแปฏ, cรกi gรฌ bแป.Nแบฟu nรฉn quรก tay, agent cรณ thแปƒ mแบฅt nhแปฏng chi tiแบฟt nhแป nhฦฐng quan trแปng vแป sau. Anthropic khuyรชn kแปน sฦฐ nรชn:
Tแป‘i ฤ‘a hรณa recall trong giai ฤ‘oแบกn ฤ‘แบงu (ฤ‘แบฃm bแบฃo mแปi thรดng tin quan trแปng ฤ‘แปu ฤ‘ฦฐแปฃc giแปฏ lแบกi),
Sau ฤ‘รณ tแป‘i ฦฐu precision, loแบกi bแป phแบงn dฦฐ thแปซa ฤ‘แปƒ tinh gแปn hฦกn.

Vรญ dแปฅ dแป… hiแปƒu: Kแบฟt quแบฃ cแปงa mแป™t tool ฤ‘รฃ ฤ‘ฦฐแปฃc gแปi nhiแปu bฦฐแป›c trฦฐแป›c hแบงu nhฦฐ khรดng cแบงn giแปฏ lแบกi.
Anthropic thแบญm chรญ ฤ‘รฃ thรชm โ€œtool result clearingโ€ โ€“ mแป™t dแบกng compaction nhแบน vร  an toร n โ€“ vร o Claude Developer Platform.

Structured note-taking: structured notes (agentic memory)

Structured note-taking, also called agentic memory, is a technique in which the agent regularly writes important information to notes outside the context window.
Those notes are pulled back into context when needed in later steps.

The goal: give the agent a form of "long-term memory" without spending many tokens.
Example: Claude Code can create a TODO.md or NOTES.md file to keep its task list. Custom agents can note progress, state, or key dependencies across complex steps. Anthropic illustrates this with a fun example: Claude playing Pokรฉmon ๐ŸŽฎ. The agent keeps accurate track of thousands of steps of play:
"Over the past 1,234 steps, I trained Pikachu on Route 1, gaining 8 levels; 2 more levels to reach the goal."
With no extra instruction, Claude develops its own maps, remembers explored regions, records the most effective boss strategies, and resumes from where it left off after the context is reset.

The result: Claude stays coherent across hours of operation and executes long-term strategy without holding every piece of information in the context window.
Anthropic has launched a "Memory Tool" (beta) in the Claude Developer Platform that lets agents store and retrieve notes from the file system, meaning an agent can: build a personal knowledge base, keep project state between sessions, and revisit earlier work without carrying all of it in the current context.
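A minimal sketch of file-based agentic memory in the spirit of NOTES.md: notes persist outside the context window and are recalled on demand. `NotesMemory` is an illustrative name, not an Anthropic API:

```python
import pathlib
import tempfile

class NotesMemory:
    """Persist notes outside the context window; recall them on demand."""
    def __init__(self, root: str):
        self.file = pathlib.Path(root) / "NOTES.md"

    def append(self, note: str) -> None:
        with self.file.open("a") as f:
            f.write(f"- {note}\n")

    def recall(self) -> str:
        return self.file.read_text() if self.file.exists() else ""

with tempfile.TemporaryDirectory() as d:
    mem = NotesMemory(d)
    mem.append("Trained Pikachu on Route 1; 2 levels to goal")
    print(mem.recall())   # notes survive even if the context is reset
```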

Sub-agent architectures: specialized multi-agent design

A sub-agent architecture is a strategy that distributes work across several smaller agents, each handling a specific task in its own clean context window. Instead of one agent "shouldering" the entire project, Anthropic splits it into: a lead agent that sets the overall direction and makes the plan, and sub-agents that do the deep technical work or use tools to find relevant information. Each sub-agent may work in great depth (tens of thousands of tokens), but returns only a concise 1,000โ€“2,000-token summary to the lead agent.
Advantages:
โ€“ a clean separation between "detailed context" and "synthesized context",
โ€“ the lead agent can focus on analysis, synthesis, and decision-making.
Anthropic reports that this pattern significantly improved performance on complex research tasks
(for example, the multi-agent research system in "How We Built Our Multi-Agent Research System").
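The orchestration pattern can be sketched in a few lines: sub-agents do deep work in their own context and hand back only condensed summaries, which is all the lead agent ever sees. Both functions below are stand-ins, not a real agent framework:

```python
def sub_agent(task: str) -> str:
    """Stand-in for a sub-agent: works in its own clean context and
    returns only a condensed summary (1,000-2,000 tokens in practice)."""
    _detailed_work = f"...tens of thousands of tokens exploring: {task}..."
    return f"summary({task})"

def lead_agent(plan: list[str]) -> list[str]:
    """The lead agent coordinates sub-agents and sees only their summaries."""
    return [sub_agent(task) for task in plan]

print(lead_agent(["scan repo", "survey papers"]))
```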

Conclusion

Context engineering represents a fundamental shift in how we build applications on large language models (LLMs). As models become ever more capable, the challenge is no longer just crafting the perfect prompt; it is deliberately choosing which information enters the model's limited "attention budget" at each step.
Whether you are implementing compaction for long-horizon tasks, designing token-efficient tools, or helping agents explore their environment just in time, the core principle stays the same:
find the smallest set of high-signal tokens that maximizes the likelihood of the desired outcome.

The techniques presented here will keep evolving as models advance. We are already seeing that smarter models need less "forcing", allowing agents to operate more autonomously. But even as model capabilities keep expanding, treating context as a precious, finite resource will remain central to building reliable and effective agents.
Start exploring context engineering on the Claude Developer Platform today, and see the "Memory and Context Management Cookbook" for tips and best practices.

๐Ÿง  Codex CLI vs Claude Code vs Gemini CLI

1) Codex CLI: capability summary & key upgrades

Codex CLI is an agent that runs right in the terminal, acting as a
"pair programmer" that can plan, use tools, and check its own output
step by step. The 2025 upgrades focus on real-time collaboration,
progress tracking, and safe access control, letting you move from
quick interactive requests to long-running tasks (refactors, new
features, writing tests) without leaving your working environment.

Core capabilities

  • Agentic coding in the terminal: issue commands, receive plans, view
    logs/diffs, and apply changes directly in the working directory;
    suits both short sessions (promptโ€“editโ€“run) and multi-step tasks.
  • Understands and navigates the codebase: reads the relevant files,
    proposes edits or new code, runs build/test commands to validate;
    can sustain longer context thanks to conversation compaction.
  • Uses a coding-optimized model: supports GPT-5-Codex for local tasks
    in the CLI (optional), for better code quality and steerability.
  • Permission-aware integration: works at different privilege levels
    (read-only/manual approval, workspace-scoped automation, or full
    access with network) to balance speed against risk.

Notable upgrades (2025)

  • Attach & share images right in the CLI: send
    screenshots/wireframes/diagrams to build shared UI context, so the
    agent follows design intent more closely.
  • Progress tracking via to-do list: the CLI shows work items and
    completion status, and lets you resume or adjust complex tasks.
  • Better integrated tools: adds web search and
    MCP (Model Context Protocol) to connect external systems, with
    higher tool-use accuracy.
  • New terminal UI: shows tool commands and diffs clearly and
    legibly, helping you review and approve changes quickly.
  • Three simple approval modes: Read-only (manual approval), Auto
    (full access within the workspace, approval required outside it),
    Full access (read any file & run networked commands); plus a
    conversation-compaction mechanism to keep long sessions going.
  • Availability & quick install: the CLI package is released
    open-source; install via npm and use a shared ChatGPT/Codex
    account for a consistent experience across local machine, IDE, and
    cloud.

ร nghฤฉa thแปฑc tiแป…n

  • Cho phiรชn ngแบฏn: phแบฃn hแป“i nhanh, sinh/ghi mรฃ, xem diff
    vร  hแปฃp nhแบฅt tแปซng phแบงn mแป™t โ€” rแบฅt hแปฃp xรขy dแปฑng nguyรชn mแบซu, sแปญa lแป—i, viแบฟt
    test.
  • Cho nhiแป‡m vแปฅ dร i hฦกi: theo dรตi to-do, dรนng cรดng cแปฅ
    ฤ‘รบng lรบc (search/MCP), duy trรฌ ngแปฏ cแบฃnh nhiแปu giแป; giแบฃm tแบฃi viแป‡c lแบทp
    thแปง cรดng vร  rแปงi ro โ€œlแบกc ngแปฏ cแบฃnhโ€.
  • Cho ฤ‘แป™i ngลฉ coi trแปng an toร n: mแบทc ฤ‘แป‹nh sandbox vรด
    hiแป‡u mแบกng; mแปi thao tรกc โ€œnhแบกy cแบฃmโ€ ฤ‘แปu cรณ cฦก chแบฟ xin phรฉp, log minh
    bแบกch, vร  cรณ thแปƒ giแป›i hแบกn miแปn mแบกng tin cแบญy khi cแบงn.

2) Gemini CLI: connectivity & long context

Gemini CLI brings the Gemini models into the terminal, with standout
strengths in gathering large amounts of context and
"pulling in outside knowledge" (web/search, MCP) when needed. It fits
a workflow of writing code while synthesizing documentation,
standards, examples, and snippets from many sources within a single
session.

Key capabilities & experience

  • Multi-source synthesis: reads many
    README/changelog/guide files at once, extracts the key points, and
    merges them into a checklist or starter code.
  • Grounding when context is missing: can look things up and
    "fill in the blanks" (libraries, sample APIs, design conventions)
    to keep the implementation moving.
  • Tool integration via MCP/utilities: extends terminal tasks
    (running commands, processing files, system operations) within the
    same conversation flow.
  • Well suited to the bootstrap phase: project scaffolding, skeleton
    structure, quick setup scripts and linter/test configuration.

ฤiแปƒm mแบกnh

  • Gom vร  โ€œtiรชu hoรกโ€ tร i liแป‡u rแบฅt tแป‘t, hแปฏu รญch khi yรชu cแบงu dรญnh nhiแปu quy
    chuแบฉn/tiรชu chรญ.
  • Tiแป‡n รญch terminal ฤ‘a dแบกng; cรณ thแปƒ chuyแปƒn tแปซ thแบฃo luแบญn sang thแปฑc thi
    lแป‡nh liแปn mแบกch.
  • Phรน hแปฃp cรกc bร i toรกn phแบฃi vแปซa tra cแปฉu vแปซa phรกt triแปƒn (setup,
    tรญch hแปฃp nhiแปu dแป‹ch vแปฅ, tแบกo sample end-to-end).

ฤiแปƒm cแบงn lฦฐu รฝ

  • ฤแบงu ra dแป… dร i; nรชn yรชu cแบงu rรบt gแปn hoแบทc
    chแป‰ ghi thay ฤ‘แป•i tแป‘i thiแปƒu ฤ‘แปƒ trรกnh mรฃ/cแบฅu hรฌnh thแปซa.
  • แปž bร i toรกn nhiแปu rร ng buแป™c (vรญ dแปฅ: vแบญt lรฝ/va chแบกm trong game), logic
    ฤ‘รดi khi thiแบฟu แป•n ฤ‘แป‹nh โ€” nรชn kรจm test nhแป ฤ‘แปƒ โ€œneoโ€ hร nh vi mong muแป‘n.
  • Prompt cร ng dร i cร ng dแป… tฤƒng ฤ‘แป™ trแป…; chia nhแป mแปฅc tiรชu giรบp cแบฃi thiแป‡n
    tแป‘c ฤ‘แป™ vร  ฤ‘แป™ chรญnh xรกc.

When to use / when not to use

  • Use it for: bootstrapping projects, consolidating guidelines,
    setting up CI/CD skeletons, writing install scripts; integrating
    new SDKs/APIs with scattered documentation.
  • Not ideal for: tasks that require timing-sensitive real-time logic
    (gameplay/physics), or micro-level UI/animation tuning that needs
    manual polish.

3) Claude Code: depth & refactoring

Claude Code leans toward understanding the project and
keeping large codebases consistent. It does well at navigating an
entire repo, standardizing architecture, writing modules to
convention, running tests, and even proposing complete PRs with clear
descriptions.

Key capabilities & experience

  • Large-scale refactoring: detects duplication, splits modules,
    standardizes naming/foldering, explains architectural impact.
  • Reviews with reasons: output usually includes the "why" and
    "how to verify", which is convenient for team code review.
  • Keeps state & workflow: can follow a proposal through multiple
    steps (scan, rename, update tests, update docs).
  • Organized UI/animation: in front-end work that needs transitions
    or many states, its logic tends to be tidy, with little "jank".

ฤiแปƒm mแบกnh

  • Rแบฅt phรน hแปฃp vแป›i kแบฟ hoแบกch tรกi cแบฅu trรบc/chuแบฉn hoรก ฤ‘a mรด-ฤ‘un
    hoแบทc khi cแบงn cแปงng cแป‘ ranh giแป›i giแปฏa cรกc layer.
  • ฤแบงu ra dแป… ฤ‘แปc, cรณ chรบ thรญch; thuแบญn lแปฃi cho duy trรฌ lรขu dร i vร 
    onboarding thร nh viรชn mแป›i.
  • Hแป— trแปฃ quy trรฌnh nhรณm: cรณ thแปƒ ฤ‘แป xuแบฅt commit/PR vแป›i mรด tแบฃ chi tiแบฟt,
    checklist kiแปƒm thแปญ vร  hฦฐแป›ng dแบซn rollout.

ฤiแปƒm cแบงn lฦฐu รฝ

  • Tแป‘c ฤ‘แป™ khรดng phแบฃi thแบฟ mแบกnh; cแบงn cรขn nhแบฏc khi deadline gแบฅp hoแบทc chแป‰ sแปญa
    1โ€“2 file nhแป.
  • ฤแปƒ ฤ‘แบกt โ€œฤ‘รบng guโ€ kiแบฟn trรบc, nรชn mรด tแบฃ convention (naming, foldering,
    state, test strategy) ngay tแปซ ฤ‘แบงu.
  • Vแป›i viแป‡c rแบฅt nhแป, chi phรญ thแปi gian cรณ thแปƒ lแป›n hฦกn lแปฃi รญch so vแป›i cรกc
    cรดng cแปฅ hฦฐแป›ng tแป‘c ฤ‘แป™.

When to use / when not to use

  • Use it for: large refactors, framework upgrades, module
    extraction, API standardization, paying down tech debt,
    writing/completing tests.
  • Not ideal for: quick experiments/tiny POCs, or micro-level
    UI/copywriting tweaks that need instant feedback.

4) Head-to-head comparison

| Criterion | Codex CLI | Gemini CLI | Claude Code |
| --- | --- | --- | --- |
| Base model | OpenAI Codex (coding-optimized) | Gemini 2.5 Pro | Claude Sonnet 4 |
| Context window | ~128K tokens | ~1M tokens | ~200K tokens (approx.) |
| FS & shell access | Yes | Yes | Yes |
| Differentiator | Fast responses, short feedback loops | External knowledge, long context | Codebase scanning, PR suggestions, standardization |
| Best for | Prototypes, bug fixes, local tasks | "Code + research" workflows | Multi-module projects, refactoring/maintenance |
| Speed/latency | Fastest | Medium | Slower |
| UI/animation | Function-first | Decent, prompt-dependent | Smooth & organized |
| Error handling | Manual fixes for complex logic | Fine with clear prompts | Detects & fixes well, with explanations |

5) Demo: two concrete tasks

Task 1: a Super Mario-style 2D platformer

Prompt: "Create a basic 2D platformer game in the style of Super
Mario. The game should have a simple tile-based layout with Mario
standing on ground blocks, a sky background with clouds, question-mark
blocks overhead, and a green pipe nearby. Include basic mechanics such
as moving left/right and jumping with the keyboard arrow keys.
Simulate gravity and collision with platforms. Use pixel-art style
graphics with local assets embedded or referenced."

Codex CLI

Gemini CLI

Claude Code

Task 2 โ€” ฤแป“ng hแป“ ฤ‘แป™ng theo chแปง ฤ‘แป thแปi tiแบฟt

Prompt: โ€œThiแบฟt kแบฟ vร  phรกt triแปƒn mแป™t bแบฃng ฤ‘iแปu khiแปƒn ฤ‘แป“ng hแป“ ฤ‘แป™ng theo
chแปง ฤ‘แป thแปi tiแบฟt vแป›i giao diแป‡n trแปฑc quan phong phรบ chแป‰ bแบฑng HTML, CSS vร 
JavaScript. Mแปฅc tiรชu chรญnh lร  tแบกo ra mแป™t giao diแป‡n ฤ‘แป“ng hแป“ thแปi gian
thแปฑc, khรดng chแป‰ hiแปƒn thแป‹ thแปi gian hiแป‡n tแบกi mร  cรฒn tแปฑ ฤ‘แป™ng ฤ‘iแปu chแป‰nh
theo thแปi gian trong ngร y. Triแปƒn khai bแป‘n hiแป‡u แปฉng chuyแปƒn tiแบฟp nแปn ฤ‘แป™ng
thแปƒ hiแป‡n bรฌnh minh, trฦฐa, hoร ng hรดn vร  ฤ‘รชm, mแป—i hiแป‡u แปฉng cรณ mร u sแบฏc vร 
cรกc yแบฟu tแป‘ ฤ‘แป™ng riรชng biแป‡t nhฦฐ mรขy trรดi, sao lแบฅp lรกnh, hoแบทc mแบทt trแปi/mแบทt
trฤƒng mแปc/lแบทn, vร  cung cแบฅp tรนy chแปn chuyแปƒn ฤ‘แป•i giแปฏa ฤ‘แป‹nh dแบกng thแปi gian
12 giแป vร  24 giแป. ฤแปƒ tฤƒng thรชm tรญnh tฦฐฦกng tรกc, hรฃy thรชm mแป™t phแบงn hiแปƒn
thแป‹ cรขu trรญch dแบซn ฤ‘แป™ng lแปฑc hoแบทc nฤƒng suแบฅt theo tแปซng giแป.โ€

Codex CLI

Gemini CLI

Claude Code

6) Practical pros & cons

6.1 Codex CLI

Pros

  • Very fast feedback; suits a "break down โ€“ test โ€“ fix โ€“ repeat"
    loop.
  • Clean terminal experience: view diff โ†’ apply, run tests/formatters
    right in the CLI.
  • Stable on small/medium tasks; keeps the work flowing well when you
    steer with a checklist/to-do.

Cons

  • Complex UI/animation (parallax, canvas, WebGL) usually needs extra
    manual tuning; it leans toward functionality.
  • Multi-layer, multi-module logic: it occasionally misses
    constraints; needs good test coverage to maintain quality.
  • Auto-generated documentation tends to be brief; ask it to add the
    "why/how".

6.2 Gemini CLI

Pros

  • Very large context: reads many files/READMEs/changelogs at once
    and synthesizes sources quickly.
  • Pulls in outside knowledge (web/search) when snippets or standards
    are missing, then merges it into the implementation.
  • Useful when bootstrapping a new project that needs many guidelines
    & reference documents.

Cons

  • Output is often long; trim it to avoid redundant code/CSS or a
    bloated structure.
  • Logic is not yet stable on heavily constrained problems (e.g.
    games with collision/gravity).
  • Medium latency; the longer the prompt, the longer it takes to
    think.

6.3 Claude Code

Pros

  • Understands the project well; stands out at refactoring,
    consolidating duplicate code, purposeful naming, annotated output.
  • Smooth UI/animation with clear state handling; good for front-end
    demos that need refined transitions.
  • Fits team processes: can generate commits/PRs with descriptions
    and thorough documentation.

Cons

  • Slower; not a fit when you need "super fast" turnaround.
  • Depends on detailed prompts to land the "right" architecture.
  • For very small tasks (1โ€“2 files), the time cost can sometimes
    outweigh the benefit compared with Codex.

7) Chแปn cรดng cแปฅ nร o theo nhu cแบงu

Muแป‘n tแป‘c ฤ‘แป™ & vรฒng lแบทp ngแบฏn

Chแปn Codex. Giao tรกc vแปฅ nhแป-vแปซa, kiแปƒm diff theo
bฦฐแป›c; tแบญn dแปฅng test/format tแปฑ ฤ‘แป™ng ฤ‘แปƒ โ€œkhoanh vรนng lแป—iโ€ nhanh.

Muแป‘n kรฉo ngแปฏ cแบฃnh ngoร i & tรฌm kiแบฟm

Chแปn Gemini. Gom README, guideline, link web โ†’ hแปฃp
nhแบฅt checklist & script; hแปฏu รญch khi khแปŸi tแบกo dแปฑ รกn nhiแปu rร ng buแป™c.

Muแป‘n refactor & quแบฃn lรฝ codebase lแป›n

Chแปn Claude. Giao nhiแป‡m vแปฅ tแป• chแปฉc lแบกi cแบฅu trรบc,
sinh PR cรณ mรด tแบฃ; yรชu cแบงu giแบฃi thรญch kiแบฟn trรบc & tรกc ฤ‘แป™ng.

Playwright Agents โ€” ๐ŸŽญ Planner, ๐ŸŽญ Generator, ๐ŸŽญ Healer

What are Playwright Agents?

This article distills the official guidance and demo video into a practical, productionโ€‘ready walkthrough. Playwright ships three agents you can run independently or in a loop: ๐ŸŽญ Planner, ๐ŸŽญ Generator, and ๐ŸŽญ Healer.

๐ŸŽญ Planner

Explores your app and produces a humanโ€‘readable Markdown plan.

  • Input: a clear request (e.g. “Generate a plan for guest checkout”), a seed test, optional PRD.
  • Output: specs/*.md with scenarios, steps, and expected results.

๐ŸŽญ Generator

Converts the Markdown plan into executable Playwright tests and validates selectors/assertions during generation.

  • Input: Markdown from specs/, seed test and fixtures.
  • Output: tests/*.spec.ts aligned to the plan.

๐ŸŽญ Healer

Runs tests, replays failures, proposes patches (locator updates, waits, data fixes) and re-runs until passing or guardrails stop.

  • Input: failing test name.
  • Output: a passing test or a skipped test if functionality is broken.

๐ŸŽญ Planner → ๐ŸŽญ Generator → ๐ŸŽญ Healer Overview

1. Requirements

  • Node.js 18+ and npm
  • Playwright Test (latest version)
  • VS Code 1.105+ (Insiders channel) for full agentic UI experience
  • AI Assistant – Choose one: Claude Code, OpenCode, or VS Code with AI extensions
  • Git for version control
  • Modern web browser (Chrome, Firefox, Safari)

2. Step-by-Step Installation Guide

Step 1: Prerequisites

  • Install Node.js 18+ from nodejs.org
  • Install npm (comes with Node.js)
  • Install VS Code 1.105+ from VS Code Insiders for agentic experience
  • Choose and install an AI Assistant:
    • Claude Code – for Claude integration
    • OpenCode – for OpenAI integration
    • VS Code with AI extensions – for built-in AI features
  • Install Git for version control

Step 2: Navigate to Demo Directory

# Navigate to the demo directory
C:\Users\ADMIN\Documents\AI_QUEST_LTP> cd "playwright Agent Test Example - PhatLT"

Step 3: Install Dependencies

playwright Agent Test Example - PhatLT> npm install
playwright Agent Test Example - PhatLT> npx playwright install

Step 4: Initialize Playwright Agents

# Initialize agent definitions for Claude Code (recommended)
playwright Agent Test Example - PhatLT> npx playwright init-agents --loop=claude

# Or for VS Code
playwright Agent Test Example - PhatLT> npx playwright init-agents --loop=vscode

# Or for OpenCode
playwright Agent Test Example - PhatLT> npx playwright init-agents --loop=opencode

Step 5: Verify Setup

# Test seed file
playwright Agent Test Example - PhatLT> npx playwright test tests/seed-agents.spec.ts

# Check project structure
playwright Agent Test Example - PhatLT> dir .claude\agents
playwright Agent Test Example - PhatLT> dir .github
playwright Agent Test Example - PhatLT> dir specs
# (Reference) Console log from the initial project setup:
playwright Agent Test Example - PhatLT> npm init -y
Wrote to playwright Agent Test Example - PhatLT\package.json:
{
  "name": "phatlt-playwright",
  "version": "1.0.0",
  "main": "index.js",
  "scripts": {
    "test": "playwright test",
    "test:headed": "playwright test --headed",
    "test:ui": "playwright test --ui",
    "test:debug": "playwright test --debug",
    "test:chromium": "playwright test --project=chromium",
    "test:firefox": "playwright test --project=firefox",
    "test:webkit": "playwright test --project=webkit",
    "report": "playwright show-report",
    "codegen": "playwright codegen"
  },
  "keywords": [],
  "author": "",
  "license": "ISC",
  "type": "commonjs",
  "description": "",
  "devDependencies": {
    "@playwright/test": "^1.56.0",
    "@types/node": "^24.7.2"
  }
}

playwright Agent Test Example - PhatLT> npm install -D @playwright/test
added 1 package, and audited 2 packages in 2s
found 0 vulnerabilities

playwright Agent Test Example - PhatLT> npx playwright install
Installing browsers...
✓ Chromium 120.0.6099.109
✓ Firefox 120.0
✓ WebKit 17.4

playwright Agent Test Example - PhatLT> npm init playwright@latest
✓ Created playwright.config.ts
✓ Created tests/
✓ Created tests/example.spec.ts
✓ Created tests/seed.spec.ts

3. Step-by-Step Testing Guide

Step 1: Test Seed File

Run the seed test to verify Playwright Agents setup:

# Test seed file for agents
playwright Agent Test Example - PhatLT> npx playwright test tests/seed-agents.spec.ts

# Run with browser UI visible
playwright Agent Test Example - PhatLT> npx playwright test tests/seed-agents.spec.ts --headed

# Run in debug mode
playwright Agent Test Example - PhatLT> npx playwright test tests/seed-agents.spec.ts --debug

Step 2: Test Generated Tests

Run the example generated tests from the Generator agent:

# Run generated Google search tests
playwright Agent Test Example - PhatLT> npx playwright test tests/google-search-generated.spec.ts

# Run specific test by name
playwright Agent Test Example - PhatLT> npx playwright test --grep "Perform Basic Search"

# Run all tests
playwright Agent Test Example - PhatLT> npx playwright test

Step 3: Test Different Browsers

# Run tests only on Chromium
playwright Agent Test Example - PhatLT> npx playwright test --project=chromium

# Run tests only on Firefox
playwright Agent Test Example - PhatLT> npx playwright test --project=firefox

# Run tests only on WebKit
playwright Agent Test Example - PhatLT> npx playwright test --project=webkit

Step 4: Generate Test Reports

# Generate HTML report
playwright Agent Test Example - PhatLT> npx playwright show-report

# Run tests with UI mode
playwright Agent Test Example - PhatLT> npx playwright test --ui

Step 5: Using Playwright Agents

Now you can use the Playwright Agents workflow with Claude Code:

# In Claude Code, ask the Planner:
"I need test scenarios for Google search functionality. Use the planner agent to explore https://www.google.com"

# Then ask the Generator:
"Use the generator agent to create tests from the test plan in specs/"

# Finally, use the Healer if tests fail:
"The test 'Perform Basic Search' is failing. Use the healer agent to fix it."

4. Project Structure and Files

playwright Agent Test Example - PhatLT/
โ”œโ”€โ”€ .claude/agents/              # Claude Code agent definitions
โ”‚   โ”œโ”€โ”€ playwright-test-planner.md    # ๐ŸŽญ Planner agent
โ”‚   โ”œโ”€โ”€ playwright-test-generator.md  # ๐ŸŽญ Generator agent
โ”‚   โ””โ”€โ”€ playwright-test-healer.md     # ๐ŸŽญ Healer agent
โ”œโ”€โ”€ .github/                     # Official agent definitions
โ”‚   โ”œโ”€โ”€ planner.md               # ๐ŸŽญ Planner instructions
โ”‚   โ”œโ”€โ”€ generator.md             # ๐ŸŽญ Generator instructions
โ”‚   โ””โ”€โ”€ healer.md                # ๐ŸŽญ Healer instructions
โ”œโ”€โ”€ specs/                       # Test plans (Markdown)
โ”‚   โ””โ”€โ”€ google-search-operations.md   # Example test plan
โ”œโ”€โ”€ tests/                       # Generated tests
โ”‚   โ”œโ”€โ”€ seed-agents.spec.ts      # Seed test for agents
โ”‚   โ””โ”€โ”€ google-search-generated.spec.ts  # Generated test example
โ”œโ”€โ”€ .mcp.json                    # MCP server configuration
โ”œโ”€โ”€ playwright.config.ts         # Playwright configuration
โ”œโ”€โ”€ package.json                 # Project dependencies
โ””โ”€โ”€ test-results/               # Test execution results

5. How Playwright Agents Work (End-to-End)

  1. ๐ŸŽญ Planner — explores your app and creates human-readable test plans saved in the specs/ directory.
  2. ๐ŸŽญ Generator — transforms Markdown plans into executable Playwright tests in the tests/ directory.
  3. ๐ŸŽญ Healer — automatically repairs failing tests by updating selectors and waits.
  4. Execution โ€” run generated tests with npx playwright test.
  5. Maintenance โ€” Healer fixes issues automatically, keeping tests stable over time.

Sample console output:

playwright Agent Test Example - PhatLT> npx playwright test tests/seed-agents.spec.ts

Running 1 test using 1 worker

  ✓ [chromium] › tests/seed-agents.spec.ts › seed (2.1s)

  1 passed (2.1s)

playwright Agent Test Example - PhatLT> npx playwright test tests/google-search-generated.spec.ts

Running 5 tests using 1 worker

  ✓ [chromium] › tests/google-search-generated.spec.ts › Google Search - Basic Operations › Perform Basic Search (3.2s)
  ✓ [chromium] › tests/google-search-generated.spec.ts › Google Search - Basic Operations › Verify Search Box Functionality (1.8s)
  ✓ [chromium] › tests/google-search-generated.spec.ts › Google Search - Basic Operations › Search with Empty Query (1.5s)
  ✓ [chromium] › tests/google-search-generated.spec.ts › Google Search - Results Validation › Verify Search Results Display (4.1s)
  ✓ [chromium] › tests/google-search-generated.spec.ts › Google Search - Results Validation › Navigate Through Search Results (5.3s)

  5 passed (16.0s)

6. The Three Agents in Detail

Playwright Agents follow a structured workflow as described in the official documentation. The process involves three main agents working together:

๐ŸŽญ Planner Agent

The Planner explores your application and creates human-readable test plans:

  • Input: Clear request (e.g., "Generate a plan for guest checkout"), seed test, optional PRD
  • Output: Markdown test plan saved as specs/basic-operations.md
  • Process: Runs seed test to understand app structure and creates comprehensive test scenarios

๐ŸŽญ Generator Agent

The Generator transforms Markdown plans into executable Playwright tests:

  • Input: Markdown plan from specs/
  • Output: Test suite under tests/
  • Process: Verifies selectors and assertions live, generates robust test code

๐ŸŽญ Healer Agent

The Healer automatically repairs failing tests:

  • Input: Failing test name
  • Output: Passing test or skipped test if functionality is broken
  • Process: Replays failing steps, inspects UI, suggests patches, re-runs until passing

7. Agent Deep Dives

๐ŸŽญ Planner — author plans that generate great tests

  • Goal: Convert product intent into executable, atomic scenarios.
  • Inputs: business request, seed.spec.ts, optional PRD/acceptance criteria.
  • Output quality tips: prefer user intent over UI steps, keep each scenario focused on one assertion, and name entities consistently.
  • Anti-patterns: mixing setup/teardown into steps; over-specifying selectors in Markdown.

๐ŸŽญ Generator — compile plans into resilient tests

  • Validates selectors live: uses your running app to confirm locators/assertions.
  • Structure: mirrors specs/*.md; adds fixtures from seed.spec.ts; keeps tests idempotent.
  • Resilience: prefer roles/labels; avoid brittle CSS/XPath; centralize waits.

๐ŸŽญ Healer — stabilize and protect correctness

  • Scope: flaky selectors, timing, deterministic data; not businessโ€‘logic rewrites.
  • Review gates: patches proposed as diffs; you accept/reject before merge.
  • Outcomes: test fixed, or skipped with a documented reason when the feature is broken.

8. Project Structure and Artifacts

The generated files follow a simple, auditable structure:

repo/
  .github/                    # agent definitions
    planner.md               # planner agent instructions
    generator.md             # generator agent instructions  
    healer.md                # healer agent instructions
  specs/                     # human-readable test plans
    basic-operations.md      # generated by planner
  tests/                     # generated Playwright tests
    seed.spec.ts             # seed test for environment
    add-valid-todo.spec.ts   # generated by generator
  playwright.config.ts       # Playwright configuration
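
For reference, a minimal playwright.config.ts for this layout might look like the sketch below. This is illustrative only, not the demo repository's actual configuration; the reporter choice and project list are assumptions.

```typescript
// Illustrative playwright.config.ts sketch (assumed values; adjust to your app).
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  testDir: './tests',        // generated tests live here
  reporter: 'html',          // enables `npx playwright show-report`
  use: {
    trace: 'on-first-retry', // traces help when inspecting failures
  },
  projects: [
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
    { name: 'firefox', use: { ...devices['Desktop Firefox'] } },
    { name: 'webkit', use: { ...devices['Desktop Safari'] } },
  ],
});
```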

Agent Definitions (.github/)

Under the hood, agent definitions are collections of instructions and MCP tools provided by Playwright. They should be regenerated whenever Playwright is updated:

# Initialize agent definitions
npx playwright init-agents --loop=vscode
npx playwright init-agents --loop=claude  
npx playwright init-agents --loop=opencode

Specs in specs/

Specs are structured plans describing scenarios in human-readable terms. They include steps, expected outcomes, and data. Specs can start from scratch or extend a seed test.
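
Because specs follow a predictable shape (steps plus expected outcomes), they can be linted before handing them to the generator. The helper below is a hypothetical sketch, not part of Playwright:

```typescript
// Hypothetical spec check: a Planner-style scenario should contain both
// a "**Steps:**" list and an "**Expected Results:**" list.
function isWellFormedScenario(md: string): boolean {
  return md.includes('**Steps:**') && md.includes('**Expected Results:**');
}

const scenario = [
  '#### 1.1 Add Valid Todo',
  '**Steps:**',
  '1. Type "Buy groceries" and press Enter',
  '**Expected Results:**',
  '- Todo appears in the list',
].join('\n');

console.log(isWellFormedScenario(scenario)); // → true
```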

Tests in tests/

Generated Playwright tests, aligned one-to-one with specs wherever feasible. Generated tests may include initial errors that can be healed automatically by the healer agent.

Seed tests (seed.spec.ts)

Seed tests provide a ready-to-use page context to bootstrap execution. The planner runs this test to execute all initialization necessary for your tests including global setup, project dependencies, and fixtures.

// Example: seed.spec.ts
import { test, expect } from './fixtures';

test('seed', async ({ page }) => {
  // This test uses custom fixtures from ./fixtures
  // ๐ŸŽญ Planner will run this test to execute all initialization
  // necessary for your tests including global setup, 
  // project dependencies and all necessary fixtures and hooks
});

9. Examples from Official Documentation

๐ŸŽญ Planner Output Example

The ๐ŸŽญ Planner generates human-readable test plans saved as specs/basic-operations.md:

# TodoMVC Application - Basic Operations Test Plan

## Application Overview

The TodoMVC application is a React-based todo list manager that demonstrates 
standard todo application functionality. Key features include:

- **Task Management**: Add, edit, complete, and delete individual todos
- **Bulk Operations**: Mark all todos as complete/incomplete and clear all completed todos  
- **Filtering System**: View todos by All, Active, or Completed status with URL routing support
- **Real-time Counter**: Display of active (incomplete) todo count
- **Interactive UI**: Hover states, edit-in-place functionality, and responsive design

## Test Scenarios

### 1. Adding New Todos

**Seed:** `tests/seed.spec.ts`

#### 1.1 Add Valid Todo

**Steps:**
1. Click in the "What needs to be done?" input field
2. Type "Buy groceries"
3. Press Enter key

**Expected Results:**
- Todo appears in the list with unchecked checkbox
- Counter shows "1 item left"
- Input field is cleared and ready for next entry
- Todo list controls become visible (Mark all as complete checkbox)

๐ŸŽญ Generator Output Example

The ๐ŸŽญ Generator transforms the Markdown plan into executable Playwright tests:

// Generated test from specs/basic-operations.md
// spec: specs/basic-operations.md
// seed: tests/seed.spec.ts

import { test, expect } from '../fixtures';

test.describe('Adding New Todos', () => {
  test('Add Valid Todo', async ({ page }) => {
    // 1. Click in the "What needs to be done?" input field
    const todoInput = page.getByRole('textbox', { name: 'What needs to be done?' });
    await todoInput.click();

    // 2. Type "Buy groceries"
    await todoInput.fill('Buy groceries');

    // 3. Press Enter key
    await todoInput.press('Enter');

    // Expected Results:
    // - Todo appears in the list with unchecked checkbox
    await expect(page.getByText('Buy groceries')).toBeVisible();
    const todoCheckbox = page.getByRole('checkbox', { name: 'Toggle Todo' });
    await expect(todoCheckbox).toBeVisible();
    await expect(todoCheckbox).not.toBeChecked();

    // - Counter shows "1 item left"
    await expect(page.getByText('1 item left')).toBeVisible();

    // - Input field is cleared and ready for next entry
    await expect(todoInput).toHaveValue('');
    await expect(todoInput).toBeFocused();

    // - Todo list controls become visible
    await expect(page.getByRole('checkbox', { name: '❯Mark all as complete' })).toBeVisible();
  });
});

10. Best Practices

  • Keep plans atomic: Small, focused scenarios help ๐ŸŽญ Generator produce clean tests. Avoid mixing multiple user flows in one scenario.
  • Stabilize with seed: Centralize navigation, authentication, and data seeding in seed.spec.ts to ensure consistent test environment.
  • Prefer semantic selectors: Use getByRole, getByLabel, and getByText for resilient element selection.
  • ๐ŸŽญ Healer guardrails: Review patches carefully; accept locator/wait tweaks, but avoid broad logic changes that might mask real bugs.
  • Version agent definitions: Commit .github/ changes and regenerate them whenever Playwright is updated.
  • Choose the right AI assistant: VS Code, Claude Code, or OpenCode — pick the one that fits your team's workflow and preferences.
  • Maintain traceability: Keep clear 1:1 mapping from specs/*.md to tests/*.spec.ts using comments and headers.
  • Test the agents: Start with simple scenarios to understand how each agent works before tackling complex user flows.
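
The traceability practice above can be automated with a small script. This sketch (a hypothetical helper, not part of Playwright) pulls scenario names out of a Planner-style plan so they can be compared against generated test titles:

```typescript
// Hypothetical helper: extract scenario names from a Planner-style plan,
// where scenarios are numbered headings like "#### 1.1 Add Valid Todo".
function extractScenarios(specMarkdown: string): string[] {
  const names: string[] = [];
  for (const line of specMarkdown.split('\n')) {
    // Matches "### 1. Adding New Todos" and "#### 1.1 Add Valid Todo".
    const m = line.match(/^#{3,4}\s+\d+(?:\.\d+)*\.?\s+(.+)$/);
    if (m) names.push(m[1].trim());
  }
  return names;
}

const plan = [
  '### 1. Adding New Todos',
  '#### 1.1 Add Valid Todo',
  '#### 1.2 Add Empty Todo',
].join('\n');

console.log(extractScenarios(plan));
// → [ 'Adding New Todos', 'Add Valid Todo', 'Add Empty Todo' ]
```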

11. Troubleshooting

๐ŸŽญ Planner can't explore the app

Ensure your app is running locally, seed test works, and the app is accessible. Check that authentication and navigation are properly set up in seed.spec.ts.

๐ŸŽญ Generator can't find elements

Run the app locally, ensure routes are correct, and verify that elements have proper roles, labels, or accessible names. The ๐ŸŽญ Generator validates selectors live against your running app.

๐ŸŽญ Healer loops without fixing

Set explicit timeouts, add deterministic test data, and reduce flakiness in network waits. The ๐ŸŽญ Healer works best with stable, predictable test conditions.
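
One way to get deterministic test data is to derive it from a fixed seed instead of Math.random(). The sketch below uses the well-known mulberry32 PRNG; the todoTitle helper is hypothetical and only illustrates seeding fixtures so every run (and every Healer replay) sees identical data:

```typescript
// Seeded PRNG (mulberry32): the same seed yields the same sequence on every run.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical fixture helper: pick a stable todo title for a given seed.
function todoTitle(seed: number): string {
  const rand = mulberry32(seed);
  const items = ['Buy groceries', 'Walk the dog', 'Pay bills'];
  return items[Math.floor(rand() * items.length)];
}

console.log(todoTitle(42) === todoTitle(42)); // → true
```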

AI assistant doesn't trigger agents

Re-run npx playwright init-agents --loop=[assistant], reload the IDE, and ensure the correct workspace root is open with agent definitions in .github/.

Generated tests fail immediately

Check that your seed test passes first. Ensure the app state matches what the ๐ŸŽญ Planner observed. Verify that test data and authentication are consistent between planning and execution.

Agent definitions are outdated

Regenerate agent definitions after Playwright updates: npx playwright init-agents --loop=[assistant]. This ensures you have the latest tools and instructions.

12. CI/CD Integration

You can run the same agent-generated tests in CI. Keep agent definitions in the repo and refresh them on Playwright upgrades.

# .github/workflows/tests.yml (excerpt)
name: Playwright Tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npx playwright install --with-deps
      - run: npx playwright test --reporter=html

13. FAQ

Do I need Claude Code?

No. Playwright Agents work with VS Code (v1.105+), Claude Code, or OpenCode. Choose the AI assistant that fits your team's workflow and preferences.

Where do test plans live?

In specs/ as Markdown files generated by the ๐ŸŽญ Planner. Generated tests go to tests/.

What if a feature is actually broken?

The ๐ŸŽญ Healer can skip tests with an explanation instead of masking a real bug. It distinguishes between flaky tests and genuinely broken functionality.

Can I run agent-generated tests in CI?

Yes. The agents produce standard Playwright tests that run with npx playwright test in CI. Agent definitions are only needed for test authoring, not execution.

How do I update agent definitions?

Run npx playwright init-agents --loop=[assistant] whenever Playwright is updated to get the latest tools and instructions.

What's the difference between ๐ŸŽญ Planner, ๐ŸŽญ Generator, and ๐ŸŽญ Healer?

๐ŸŽญ Planner: Explores your app and creates human-readable test plans. ๐ŸŽญ Generator: Transforms plans into executable Playwright tests. ๐ŸŽญ Healer: Automatically fixes failing tests by updating selectors and waits.

14. Demo video and Source code

GitHub repository: phatltscuti/playwright_agents