Fine-Tuning GPT-OSS-20B on Google Colab Using Unsloth and LoRA

1. Introduction

AI is advancing rapidly, and running AI models on personal computers has become more common than ever.
However, many models have become increasingly difficult to run locally because they are massive, often with billions of parameters.
This makes it nearly impossible to use them effectively for work or projects on low-end computers.

Therefore, in this article, we will explore Google Colab together with Unsloth’s fine-tuning tool, combined with LoRA, to fine-tune and use gpt-oss-20b according to our own needs.


2. Main Content

a. What is Unsloth?

  • Unsloth is a modern Python library designed to speed up and optimize the fine-tuning of large language models (LLMs) such as LLaMA, Mistral, Mixtral, and others.
    It makes model training and fine-tuning extremely fast, memory-efficient, and easy — even on limited hardware like a single GPU or consumer-grade machines.

b. What is Colab?

  • Colab is a hosted Jupyter Notebook service that requires no setup and provides free access to computing resources, including GPUs and TPUs.
    It is particularly well-suited for machine learning, data science, and education purposes.

c. What is LoRA?

  • Low-Rank Adaptation (LoRA) is a technique for quickly adapting machine learning models to new contexts.
    LoRA helps make large and complex models more suitable for specific tasks. It works by freezing the original weights and adding small trainable low-rank matrices to selected layers, rather than updating every parameter in the model.
    This allows developers to specialize machine learning models for various applications quickly and with far less memory.
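
To make the idea concrete, here is a minimal pure-Python sketch of a LoRA update for a single weight matrix. The helper names (lora_param_counts, lora_forward) and the toy sizes are illustrative assumptions, not part of Unsloth or any library; real implementations work on tensors and apply the same formula per targeted layer.

```python
def lora_param_counts(d_out, d_in, r):
    """Return (full, lora) trainable-parameter counts for one weight matrix."""
    full = d_out * d_in            # updating W directly
    lora = r * (d_in + d_out)      # A is r x d_in, B is d_out x r
    return full, lora


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, alpha, r):
    """Compute y = W x + (alpha / r) * B (A x); W itself is never modified."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


full, lora = lora_param_counts(d_out=4096, d_in=4096, r=8)
print(full, lora)  # 16777216 65536
```

With rank 8, the adapter for this hypothetical 4096 x 4096 layer trains about 0.4% of the parameters that a full update would touch, which is why LoRA fits on small GPUs.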

3. Using Colab to Train gpt-oss-20b

– Installing the Libraries

!pip install --upgrade -qqq uv

try:
    import numpy
    install_numpy = f"numpy=={numpy.__version__}"  # keep the NumPy version already installed
except ImportError:
    install_numpy = "numpy"

!uv pip install -qqq \
  "torch>=2.8.0" "triton>=3.4.0" {install_numpy} \
  "unsloth_zoo[base] @ git+https://github.com/unslothai/unsloth-zoo" \
  "unsloth[base] @ git+https://github.com/unslothai/unsloth" \
  torchvision bitsandbytes \
  git+https://github.com/huggingface/[email protected] \
  git+https://github.com/triton-lang/triton.git@05b2c186c1b6c9a08375389d5efe9cb4c401c075#subdirectory=python/triton_kernels

– After completing the installation, load the gpt-oss-20b model from Unsloth:

from unsloth import FastLanguageModel
import torch

max_seq_length = 1024
dtype = None
model_name = "unsloth/gpt-oss-20b"

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = model_name,
    dtype = dtype,                 # None for auto detection
    max_seq_length = max_seq_length,  # Choose any for long context!
    load_in_4bit = True,           # 4 bit quantization to reduce memory
    full_finetuning = False,       # [NEW!] We have full finetuning now!
    # token = "hf_...",            # use one if using gated models
)
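
To see why load_in_4bit matters, here is a rough back-of-the-envelope estimate of the memory needed for the weights alone. It assumes about 20 billion parameters (as the model name suggests) and ignores activations, LoRA adapters, optimizer state, and any layers a quantized checkpoint keeps in higher precision, so treat the numbers as a lower bound rather than a measurement.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


params = 20e9  # assumption: roughly 20 billion parameters
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{weight_memory_gb(params, bits):.0f} GB")
# 16-bit: ~40 GB
# 8-bit: ~20 GB
# 4-bit: ~10 GB
```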

– Adding LoRA for Fine-Tuning

model = FastLanguageModel.get_peft_model(
    model,
    r = 8,  # Choose any number > 0! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,              # Optimized fast path
    bias = "none",                 # Optimized fast path
    # "unsloth" uses less VRAM, fits larger batches
    use_gradient_checkpointing = "unsloth",  # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
)
Tip: If you hit out-of-memory (OOM) errors, reduce max_seq_length, set a smaller r, or lower per_device_train_batch_size and raise gradient_accumulation_steps to keep the same effective batch size.

– Testing the Model Before Fine-Tuning

Now, let’s test how the model responds before fine-tuning:

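# System prompt (Vietnamese): "You are Shark B, a famous, straightforward, and practical investor"
# User message (Vietnamese): "Please introduce yourself"
messages = [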
    {"role": "system", "content": "Bạn là Shark B, một nhà đầu tư nổi tiếng, thẳng thắn và thực tế", "thinking": None},
    {"role": "user", "content": "Bạn hãy giới thiệu bản thân"},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True,
    return_tensors = "pt",
    return_dict = True,
    reasoning_effort = "low",
).to(model.device)

from transformers import TextStreamer
_ = model.generate(**inputs, max_new_tokens = 512, streamer = TextStreamer(tokenizer))

– Loading the Data for Fine-Tuning

from datasets import load_dataset
from unsloth.chat_templates import standardize_sharegpt

def formatting_prompts_func(examples):
    # Render each conversation into one training string using the model's chat template
    convos = examples["messages"]
    texts = [tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False) for convo in convos]
    return { "text" : texts, }

# Load the local JSONL file; each record holds one conversation under "messages"
dataset = load_dataset("json", data_files = "data.jsonl", split = "train")
dataset = standardize_sharegpt(dataset)  # normalize ShareGPT-style records to role/content messages
dataset = dataset.map(formatting_prompts_func, batched = True)
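
The code above expects data.jsonl to contain one JSON object per line, each with a "messages" list. The sketch below writes and reads back one such record. The conversation text is a made-up example, and the role/content keys are an assumption based on the common chat format, so adapt both to your own data.

```python
import json
import os
import tempfile

# Hypothetical training example; replace with your own conversations.
sample = {
    "messages": [
        {"role": "system", "content": "You are Shark B, a blunt and practical investor."},
        {"role": "user", "content": "Please introduce yourself."},
        {"role": "assistant", "content": "I am Shark B. I invest in numbers, not promises."},
    ]
}


def write_jsonl(path, records):
    """Write one JSON object per line (the JSON Lines layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSON Lines file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


path = os.path.join(tempfile.mkdtemp(), "data.jsonl")
write_jsonl(path, [sample])
records = read_jsonl(path)
print(len(records), [m["role"] for m in records[0]["messages"]])
# 1 ['system', 'user', 'assistant']
```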

– Training the Model

The following code snippet defines the configuration and setup for the fine-tuning process.
Here, we use SFTTrainer and SFTConfig from the trl library to perform Supervised Fine-Tuning (SFT) on our model.
The configuration specifies parameters such as batch size, learning rate, optimizer type, and number of training epochs.

from trl import SFTConfig, SFTTrainer

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    args = SFTConfig(
        per_device_train_batch_size = 1,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        num_train_epochs = 1,  # Set this for 1 full training run.
        # max_steps = 30,
        learning_rate = 2e-4,
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none",  # Disables logging integrations; set to "wandb" etc. to enable tracking
    ),
)

trainer_stats = trainer.train()
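
To get a feel for how per_device_train_batch_size and gradient_accumulation_steps interact, here is a small arithmetic sketch. The dataset size of 1,000 examples is a hypothetical figure, and the exact rounding of the last partial batch can differ between trainer versions, so read the step count as an estimate.

```python
import math


def training_steps(num_examples, per_device_batch_size, grad_accum_steps,
                   epochs=1, num_devices=1):
    """Return (effective_batch_size, approximate optimizer steps)."""
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, steps_per_epoch * epochs


# Settings from the config above: batch size 1, accumulation 4, one epoch.
print(training_steps(1000, 1, 4))  # (4, 250)
```

Each optimizer update therefore sees 4 examples even though only one sequence is held in GPU memory at a time.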

– Trying the Fine-Tuned Model After Training

# Example reload (set to True to run)
if False:
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "finetuned_model",  # Path to the model you saved after training
        max_seq_length = 1024,
        dtype = None,
        load_in_4bit = True,
    )

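    # System prompt (Vietnamese): "You are Shark B, a famous, straightforward, and practical investor"
    # User message (Vietnamese): "Please introduce yourself"
    messages = [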
        {"role": "system", "content": "Bạn là Shark B, một nhà đầu tư nổi tiếng, thẳng thắn và thực tế", "thinking": None},
        {"role": "user", "content": "Bạn hãy giới thiệu bản thân"},
    ]

    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt = True,
        return_tensors = "pt",
        return_dict = True,
        reasoning_effort = "low",
    ).to(model.device)

    from transformers import TextStreamer
    _ = model.generate(**inputs, max_new_tokens = 512, streamer = TextStreamer(tokenizer))
Note: Replace finetuned_model with your actual model path (e.g., outputs or the directory you saved/merged adapters to).
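
The reload example assumes the trained adapters were written to disk first. A minimal sketch of that step is below. The helper name save_finetuned and the directory name are assumptions for illustration; it relies only on the save_pretrained methods that PEFT models and Hugging Face tokenizers provide.

```python
def save_finetuned(model, tokenizer, output_dir="finetuned_model"):
    """Save the LoRA adapters and tokenizer so they can be reloaded by path."""
    model.save_pretrained(output_dir)      # for a PEFT model this writes the adapter weights
    tokenizer.save_pretrained(output_dir)  # keeps the chat template next to the adapters
    return output_dir


# In the notebook, after trainer.train():
# save_finetuned(model, tokenizer)
```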

Colab notebook: Open your Colab here.


4. Conclusion & Next Steps

By combining Unsloth (for speed and memory efficiency), LoRA (for lightweight adaptation), and Google Colab (for accessible compute), you can fine-tune gpt-oss-20b even on modest hardware. The workflow above helps you:

  • Install a reproducible environment with optimized kernels.
  • Load gpt-oss-20b in 4-bit to reduce VRAM usage.
  • Attach LoRA adapters to train only a small set of parameters.
  • Prepare chat-style datasets and run supervised fine-tuning with TRL’s SFTTrainer.
  • Evaluate before/after to confirm your improvements.
Clone the notebook, plug in your dataset, and fine-tune your own assistant in minutes.

Codex CLI vs Gemini CLI vs Claude Code

1. Codex CLI – Capabilities and New Features

According to OpenAI’s official announcement (“Introducing upgrades to Codex”), Codex CLI has been rebuilt on top of GPT-5-Codex, turning it into an agentic programming assistant — a developer AI that can autonomously plan, reason, and execute tasks across coding environments.

🌟 Core Abilities

  • Handles both small and large tasks: From writing a single function to refactoring entire projects.
  • Cross-platform integration: Works seamlessly across terminal (CLI), IDE (extension), and cloud environments.
  • Task reasoning and autonomy: Can track progress, decompose goals, and manage multi-step operations independently.
  • Secure by design: Runs in a sandbox with explicit permission requests for risky operations.

📈 Performance Highlights

  • Uses 93.7% fewer tokens than GPT-5 on the simplest requests, but spends about twice as long reasoning on the most complex ones.
  • Successfully ran over 7 hours autonomously on long software tasks during testing.
  • Produces more precise code reviews than older Codex versions.

🟢 In short: Codex CLI 2025 is not just a code generator — it’s an intelligent coding agent capable of reasoning, multitasking, and working securely across terminal, IDE, and cloud environments.

2. Codex CLI vs Gemini CLI vs Claude Code: The New Era of AI in the Terminal

The command line has quietly become the next frontier for artificial intelligence.
While graphical AI tools dominate headlines, the real evolution is unfolding inside the terminal — where AI coding assistants now operate directly beside you, as part of your shell workflow.

Three major players define this new space: Codex CLI, Gemini CLI, and Claude Code.
Each represents a different philosophy of how AI should collaborate with developers — from speed and connectivity to reasoning depth. Let’s break down what makes each contender unique, and where they shine.


🧩 Codex CLI — OpenAI’s Code-Focused Terminal Companion

Codex CLI acts as a conversational layer over your terminal.
It listens to natural language commands, interprets your intent, and translates it into executable code or shell operations.
Now powered by GPT-5-Codex at the medium reasoning setting (labeled Codex5-Medium in the tests below), it builds on the strengths of the o4-mini generation while adding adaptive reasoning and a larger 256K-token context window.

Once installed, Codex CLI integrates seamlessly with your local filesystem.
You can type:

“Create a Python script that fetches GitHub issues and logs them daily,”
and watch it instantly scaffold the files, import the right modules, and generate functional code.

Codex CLI supports multiple languages — Python, JavaScript, Go, Rust, and more — and is particularly strong at rapid prototyping and bug fixing.
Its defining trait is speed: responses feel immediate, making it perfect for fast iteration cycles.

Best for: developers who want quick, high-quality code generation and real-time debugging without leaving the terminal.


🌤️ Gemini CLI — Google’s Adaptive Terminal Intelligence

Gemini CLI embodies Google’s broader vision for connected AI development — blending reasoning, utility, and live data access.
Built on Gemini 2.5 Pro, this CLI isn’t just a coding bot — it’s a true multitool for developers and power users alike.

Beyond writing code, Gemini CLI can run shell commands, retrieve live web data, or interface with Google Cloud services.
It’s ideal for workflows that merge coding with external context — for example:

  • fetching live API responses,

  • monitoring real-time metrics,

  • or updating deployment configurations on-the-fly.

Tight integration with VS Code, Google Cloud SDK, and Workspace tools turns Gemini CLI into a full-spectrum AI companion rather than a mere code generator.

Best for: developers seeking a versatile assistant that combines coding intelligence with live, connected utility inside the terminal.


🧠 Claude Code — Anthropic’s Deep Code Reasoner

If Codex is about speed, and Gemini is about connectivity, Claude Code represents depth.
Built on Claude Sonnet 4.5, Anthropic’s upgraded reasoning model, Claude Code is designed to operate as a true engineering collaborator.

It excels at understanding, refactoring, and maintaining large-scale codebases.
Claude Code can read entire repositories, preserve logic across files, and even generate complete pull requests with human-like commit messages.
Its upgraded 250K-token context window allows it to track dependencies, explain architectural patterns, and ensure code consistency over time.

Claude’s replies are more analytical — often including explanations, design alternatives, and justifications for each change.
It trades a bit of speed for a lot more insight and reliability.

Best for: professional engineers or teams managing complex, multi-file projects that demand reasoning, consistency, and full-codebase awareness.

3. Codex CLI vs Gemini CLI vs Claude Code: Hands-on With Two Real Projects

While benchmarks and specs are useful, nothing beats actually putting AI coding agents to work.
To see how they perform on real, practical front-end tasks, I tested three leading terminal assistants — Codex CLI (Codex5-Medium), Gemini CLI (Gemini 2.5 Pro), and Claude Code (Sonnet 4.5) — by asking each to build two classic web projects using only HTML, CSS, and JavaScript.

  • 🎮 Project 1: Snake Game — canvas-based, pixel-style, smooth movement, responsive.

  • Project 2: Todo App — CRUD features, inline editing, filters, localStorage, dark theme, accessibility + keyboard support.

🎮 Task 1 — Snake Game

Goal

Create a playable 2D Snake Game using HTML, CSS, and JavaScript.
Display a grid-based canvas with a moving snake that grows when it eats food.
The snake should move continuously and respond to arrow-key inputs.
The game ends when the snake hits the wall or itself.
Include a score counter and a restart button with pixel-style graphics and responsive design.

Prompt

Create a playable 2D Snake Game using HTML, CSS, and JavaScript.
The game should display a grid-based canvas with a moving snake that grows when it eats food.
The snake should move continuously and respond to keyboard arrow keys for direction changes.
The game ends when the snake hits the wall or itself.
Show a score counter and a restart button.
Use smooth movement, pixel-style graphics, and responsive design for different screen sizes.

Observations

Codex CLI — Generated the basic canvas scaffold in seconds. Game loop, input, and scoring worked out of the box, but it required minor tuning for smoother turning and anti-reverse logic.

Gemini CLI — Delivered well-structured, commented code and used requestAnimationFrame properly. Gameplay worked fine, though the UI looked plain — more functional than fun.

Claude Code — Produced modular, production-ready code with solid collision handling, restart logic, and a polished HUD. Slightly slower response but the most complete result overall.
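
The anti-reverse and collision logic mentioned in these observations is easy to get subtly wrong, so here is a language-neutral sketch of it in Python. It is not the output of any of the three tools; the function names and the grid representation (cells as (x, y) tuples, head first) are assumptions made for illustration.

```python
from collections import deque

DIRS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}
OPPOSITE = {"up": "down", "down": "up", "left": "right", "right": "left"}


def next_direction(current, requested):
    """Ignore a key press that is unknown or would reverse the snake onto itself."""
    if requested not in DIRS or requested == OPPOSITE[current]:
        return current
    return requested


def step(snake, direction, food, grid_size):
    """Advance one tick. Returns (snake, ate_food, game_over)."""
    dx, dy = DIRS[direction]
    head_x, head_y = snake[0]
    new_head = (head_x + dx, head_y + dy)
    hit_wall = not (0 <= new_head[0] < grid_size and 0 <= new_head[1] < grid_size)
    ate = new_head == food
    # The tail cell frees up this tick unless the snake grows.
    body = list(snake) if ate else list(snake)[:-1]
    if hit_wall or new_head in body:
        return snake, False, True
    return deque([new_head] + body), ate, False


snake = deque([(2, 2), (1, 2), (0, 2)])
direction = next_direction("right", "left")  # reversal is ignored
snake, ate, over = step(snake, direction, food=(3, 2), grid_size=5)
print(direction, list(snake), ate, over)
# right [(3, 2), (2, 2), (1, 2), (0, 2)] True False
```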

✅ Task 2 — Todo App

Goal

Build a complete, user-friendly Todo List App using only HTML, CSS, and JavaScript (no frameworks).
Features: add/edit/delete tasks, mark complete/incomplete, filter All / Active / Completed, clear completed, persist via localStorage, live counter, dark responsive UI, and full keyboard accessibility (Enter/Space/Delete).
Deliverables: index.html, style.css, app.js — clean, modular, commented, semantic HTML + ARIA.

Prompt

Develop a complete and user-friendly Todo List App using only HTML, CSS, and JavaScript (no frameworks). The app should include the following functionality and design requirements:

    1. Input field and ‘Add’ button to create new tasks.
    2. Ability to mark tasks as complete/incomplete via checkboxes.
    3. Inline editing of tasks by double-clicking — pressing Enter saves changes and Esc cancels.
    4. Delete buttons to remove tasks individually.
    5. Filter controls for All, Active, and Completed tasks.
    6. A ‘Clear Completed’ button to remove all completed tasks at once.
    7. Automatic saving and loading of todos using localStorage.
    8. A live counter showing the number of active (incomplete) tasks.
    9. A modern, responsive dark theme UI using CSS variables, rounded corners, and hover effects.
    10. Keyboard accessibility — Enter to add, Space to toggle, Delete to remove tasks.

Ensure the project is well structured with three separate files:
    • index.html
    • style.css
    • app.js

Code should be clean, modular, and commented, with semantic HTML and appropriate ARIA attributes for accessibility.

Observations

Codex CLI — Created a functional 3-file structure with working CRUD, filters, and persistence. Fast, but accessibility and keyboard flows needed manual reminders.

Gemini CLI — Balanced logic and UI nicely. Used CSS variables for a simple dark theme and implemented localStorage properly.
Performance was impressive — Gemini was the fastest overall, but its default design felt utilitarian, almost as if it “just wanted to get the job done.”
Gemini focuses on correctness and functionality rather than visual finesse.

Claude Code — Implemented inline editing, keyboard shortcuts, ARIA live counters, and semantic roles perfectly. The result was polished, responsive, and highly maintainable.
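
For readers who want to check a generated app against the requirements, the core state logic (add, toggle, filter, clear completed, active counter, persistence) can be modeled in a few lines. This is a sketch in Python rather than the JavaScript the tools produced; the function names are hypothetical, and the JSON string stands in for what the browser version would store in localStorage.

```python
import json


def add_todo(todos, text):
    """Return a new list with the task appended; blank input is ignored."""
    text = text.strip()
    if not text:
        return todos
    next_id = max((t["id"] for t in todos), default=0) + 1
    return todos + [{"id": next_id, "text": text, "done": False}]


def toggle(todos, todo_id):
    """Flip the completed flag of one task."""
    return [{**t, "done": not t["done"]} if t["id"] == todo_id else t for t in todos]


def visible(todos, view="all"):
    """Apply the All / Active / Completed filter."""
    if view == "active":
        return [t for t in todos if not t["done"]]
    if view == "completed":
        return [t for t in todos if t["done"]]
    return list(todos)


def clear_completed(todos):
    """Drop every completed task."""
    return [t for t in todos if not t["done"]]


def active_count(todos):
    """Number of tasks still open (the live counter)."""
    return sum(1 for t in todos if not t["done"])


def save(todos):
    """Serialize to a JSON string for storage."""
    return json.dumps(todos)


def load(raw):
    """Restore from a stored string; a missing value means an empty list."""
    return json.loads(raw) if raw else []


todos = add_todo(add_todo([], "Write report"), "Review PR")
todos = toggle(todos, 1)
print(active_count(todos), [t["text"] for t in visible(todos, "completed")])
# 1 ['Write report']
```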

4. Codex CLI vs Gemini CLI vs Claude Code: Real-World Comparison

When testing AI coding assistants, speed isn’t everything — clarity, structure, and the quality of generated code all matter. To see how today’s top command-line tools compare, I ran the same set of projects across Claude Code, Gemini CLI, and Codex CLI, including a 2D Snake Game and a Todo List App.
Here’s how they performed.


Claude Code: Polished and Reliable

Claude Code consistently produced the most professional and complete results.
Its generated code came with clear structure, organized logic, and well-commented sections.
In the Snake Game test, Claude built the best-looking user interface, with a balanced layout, responsive design, and smooth movement logic.
Errors were handled cleanly, and the overall experience felt refined, something you could hand over to a production team with confidence.
Although it wasn’t the fastest, Claude made up for it with code quality, structure, and ease of prompt engineering.
If your workflow values polish, maintainability, and readability, Claude Code is the most dependable choice.


Gemini CLI: Fastest but Basic

Gemini CLI clearly took the top spot for speed.
It executed quickly, generated files almost instantly, and made iteration cycles shorter.
However, the output itself felt minimal and unrefined — both the UI and the underlying logic were quite basic compared to Claude or Codex.
In the Snake Game task, Gemini produced a playable result but lacked visual polish and consistent structure.
Documentation and comments were also limited.
In short, Gemini is great for rapid prototyping or testing ideas quickly, but not for projects where you need beautiful UI, advanced logic, or long-term maintainability.


Codex CLI: Flexible but Slower

Codex CLI offered good flexibility and handled diverse prompts reasonably well.
It could generate functional UIs with decent styling, somewhere between Gemini’s simplicity and Claude’s refinement.
However, its main drawback was speed — responses were slower, and sometimes additional manual intervention was needed to correct or complete the code.
Codex is still a solid option when you need to tweak results manually or explore multiple implementation approaches, but it doesn’t match Claude’s polish or Gemini’s speed.


Overall Impression

After testing multiple projects, the overall ranking became clear:

  • Gemini CLI is the fastest but produces simple and unpolished code.

  • Claude Code delivers the most reliable, structured, and visually refined results.

  • Codex CLI sits in between — flexible but slower and less cohesive.

Each tool has its strengths. Gemini is ideal for quick builds, Codex for experimentation, and Claude Code for professional, production-ready output.

In short:

Gemini wins on speed. Claude wins on quality. Codex stands in between — flexible but slower.

Automatically Generate Google Slides with an AI Prompt

I came across a very interesting idea from the author まじん (Majin) on note.com:

Majin used Gemini to turn a single prompt into a complete Google Slides presentation, but I tried customizing it to run with ChatGPT (or Google AI Studio), and the results were quite exciting.


1. 🔍 Structure of Majin’s Prompt

On closer analysis, Majin’s prompt has the following main components:

  • Role assignment for AI: The AI is not just a chatbot, but acts as a Data Scientist and Presentation Designer.

  • Absolute mission: From a piece of input text, the AI must output a JavaScript object array (slideData) that defines the entire slide structure.

  • Fixed blueprint: The prompt includes a Google Apps Script (GAS) framework, where you only need to replace slideData to generate Google Slides.

  • SlideData includes:

    • title: Slide title

    • content: Content (in the form of bullets, text, or tables)

    • pattern: Display type (Title, List, TwoColumn, Image, …)

    • speakerNote: Presenter’s notes

👉 The important point: The prompt does not directly create slides, but outputs JSON-like data that the GAS script uses to build the slides.


2. 🛠️ Customizing & Running with ChatGPT

First, download system prompt.md

🔧 Steps inside the Project

  1. Create a new Project → give it a name (e.g., Google Slide Auto Generator).

  2. Add a file → upload system prompt.md.

  3. In the Project Settings / Overview screen, you will find the Instructions (System message) box → paste the line:
    添付した「system prompt.md」に忠実に従ってください。
    (Meaning: Please strictly follow the attached system prompt.md.)
    If you don’t see a dedicated “Instructions” field, ChatGPT may show “Description” or “Setup” instead → paste it there. It works as the System Prompt for the whole Project.

  4. From now on, in that Project, whenever you input raw text, ChatGPT will immediately reference system prompt.md and output both slideData + GAS code.

👉 In other words: the Project’s Instruction acts as a dedicated System Message for that workspace. The UI may look different, but it’s always under “Overview/Setup” when you open the Project.

Next, type a prompt about the content you want to make slides from → and it will generate slideData.

I tested it with content about LLMs and got the following result:

const slideData = [
  {
    type: 'title',
    title: "The Busy Person's Intro to Large Language Models",
    date: '2025.09.21',
    notes: 'Re-recorded overview of LLMs focusing on what they are, how to run them, how they are trained, and why next-word prediction works'
  },
  {
    type: 'content',
    title: 'Agenda',
    points: [
      'What is an LLM',
      'Running a model locally',
      'Training at scale',
      'Why next-word prediction is powerful',
      'Key takeaways'
    ],
    notes: 'Keep it practical and concrete. We start with the simplest mental model, then show inference vs training, then the intuition for why LLMs encode knowledge'
  },
  {
    type: 'section',
    title: '1. What is an LLM',
    notes: 'Define an LLM at the simplest level and frame the talk around artifacts you can hold on disk'
  },
  {
    type: 'content',
    title: 'LLM = Two Files',
    subhead: 'Weights + Runner',
    points: [
      'Parameters file holds learned weights',
      'Runner implements forward pass of the network',
      'Self contained package runs offline'
    ],
    notes: 'Stress that inference can be simple and local. The interesting part lives in the parameters learned during training'
  },
  {
    type: 'content',
    title: 'Example: Llama 2 • 70B',
    subhead: 'Open weights from Meta',
    points: [
      'Sizes include 7B 13B 34B 70B',
      'Open weights and paper enable local study',
      'Closed APIs trade control for convenience'
    ],
    notes: 'Contrast open weights with closed APIs like hosted assistants without exposing internals'
  },
  {
    type: 'section',
    title: '2. Running a Model',
    notes: 'Give a feel for local inference and why the demo used a smaller model for speed'
  },
  {
    type: 'process',
    title: 'Local Inference Flow',
    subhead: 'From files to generations',
    steps: [
      'Get weights file on disk',
      'Compile or install a runner',
      'Point runner to weights',
      'Send a prompt',
      'Stream generated tokens'
    ],
    notes: 'Mention that a 7B model is fast enough on a laptop while 70B is roughly an order of magnitude slower'
  },
  {
    type: 'compare',
    title: 'Open Weights vs Closed API',
    leftTitle: 'Open Weights',
    rightTitle: 'Closed API',
    leftItems: [
      'Full local control',
      'Offline capable',
      'Tunable and inspectable'
    ],
    rightItems: [
      'Best quality access',
      'No infra to manage',
      'Faster to integrate'
    ],
    notes: 'There is no universal winner. Choose based on control privacy latency and capability needs'
  },
  {
    type: 'section',
    title: '3. Training at Scale',
    notes: 'Explain where weights come from and why training is expensive compared to inference'
  },
  {
    type: 'content',
    title: 'Training Ingredients',
    subhead: 'Data Compute Cost',
    points: [
      'Large curated text corpus',
      'Massive GPU clusters for days',
      'Significant budget for runs'
    ],
    notes: 'Use order of magnitude thinking rather than exact numbers. Emphasize engineering and data work beyond raw compute'
  },
  {
    type: 'timeline',
    title: 'From Corpus to Weights',
    subhead: 'Very high level lifecycle',
    milestones: [
      { label: 'Collect and filter data', date: 'Step 1', state: 'done' },
      { label: 'Train on GPU cluster', date: 'Step 2', state: 'done' },
      { label: 'Evaluate and align', date: 'Step 3', state: 'done' },
      { label: 'Release weights and runner', date: 'Step 4', state: 'next' }
    ],
    notes: "Keep the audience at the bird's eye view. The output is a weights artifact you can reuse everywhere for inference"
  },
  {
    type: 'section',
    title: '4. Why It Works',
    notes: 'Bridge from mechanics to intuition about capability'
  },
  {
    type: 'content',
    title: 'Next Word Prediction Objective',
    subhead: 'Simple objective rich consequences',
    points: [
      'Model predicts a distribution over next token',
      'Good prediction forces world knowledge',
      'Patterns styles and relations get compressed'
    ],
    notes: 'Relate prediction to lossy compression intuition. To predict the next token the model must encode structure facts and relationships'
  },
  {
    type: 'content',
    title: 'Common Misconceptions',
    points: [
      'It is not copy paste of the web',
      'Bigger is not always better',
      'Closed is not always better than open'
    ],
    notes: 'Clarify generalization vs memorization and the role of data quality and algorithms along with scale'
  },
  {
    type: 'section',
    title: '5. Takeaways',
    notes: 'Land the plane with pragmatic guidance'
  },
  {
    type: 'content',
    title: 'Practical Guidance',
    points: [
      'Choose model size for latency and cost',
      'Use open weights for control',
      'Use hosted APIs for frontier quality'
    ],
    notes: 'Map choices to product constraints such as privacy cost latency and maintainability'
  },
  {
    type: 'closing',
    notes: 'Thanks for watching. Invite questions and link resources in the description'
  }
];
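
Before pasting generated slideData into Apps Script, it can help to sanity-check its shape. The sketch below does that in Python on a JSON copy of the data. The required keys per slide type are inferred only from the sample output above, not from Majin's specification, so treat the REQUIRED_KEYS table and the function name as assumptions to adjust.

```python
# Keys observed for each slide type in the sample output (an inferred, unofficial list).
REQUIRED_KEYS = {
    "title": ["title", "date"],
    "section": ["title"],
    "content": ["title", "points"],
    "process": ["title", "steps"],
    "compare": ["title", "leftTitle", "rightTitle", "leftItems", "rightItems"],
    "timeline": ["title", "milestones"],
    "closing": [],
}


def validate_slides(slides):
    """Return a list of human-readable problems; an empty list means the data looks consistent."""
    problems = []
    for i, slide in enumerate(slides):
        kind = slide.get("type")
        if kind not in REQUIRED_KEYS:
            problems.append(f"slide {i}: unknown type {kind!r}")
            continue
        for key in REQUIRED_KEYS[kind]:
            if key not in slide:
                problems.append(f"slide {i} ({kind}): missing {key!r}")
    return problems


slides = [
    {"type": "title", "title": "Intro to LLMs", "date": "2025.09.21"},
    {"type": "content", "title": "Agenda"},   # points is missing on purpose
    {"type": "chart", "title": "Unknown"},    # type not seen in the sample
    {"type": "closing"},
]
for problem in validate_slides(slides):
    print(problem)
# slide 1 (content): missing 'points'
# slide 2: unknown type 'chart'
```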

Next steps

  1. Open the .md file on your computer.

  2. Copy the code from the .md file.

  3. Open Google Slides → create a new slide deck.

  4. Click Extensions → Apps Script.

  5. Paste the copied code into the Apps Script editor.

  6. Replace the slideData in the script with the slideData generated for your content.

  7. Press Run.


3. ✅ Experience & Results

 

  • Works well on ChatGPT: No need for Gemini; GPT-5 is enough.

  • Advantage: The prompt standardizes the output into a JSON-like object, making it easy to control.

  • Reference implementation:


📌 Conclusion

  • Majin’s prompt is a great framework to turn AI into an automatic slide design tool.

  • It doesn’t have to be Gemini — ChatGPT (GPT-5) also works well.

  • You just need to customize the input → and you can generate Google Slides for any topic (training, pitching, learning…).

👉 This article was written with reference to blogs by まじん (Majin):

Trying the Realtime Prompting Guide for GPT-Realtime: Experiments with Vietnamese Voice Input

1. Introduction

OpenAI’s Realtime API enables the creation of interactive voice experiences with ultra-low latency. Instead of waiting for a full text input, the model can “listen” to a user while they are still speaking and respond almost instantly. This makes it a powerful foundation for building voice assistants, audio chatbots, automated customer support, or multimodal creative applications.

To get the best results, writing a clear and well-structured prompt is essential. OpenAI published the Realtime Prompting Guide as a playbook for controlling model behavior in spoken conversations.


2. What is GPT-Realtime

GPT-Realtime is a model/API designed to handle continuous audio input and provide rapid responses. Its key features include:

  • Real-time speech-to-text recognition.

  • Robust handling of noisy, cut-off, or unclear audio.

  • Customizable reactions to imperfect audio, such as asking for repetition, clarifying, or continuing in the user’s language.

  • Support for detailed prompting to ensure safe, natural, and reliable responses.


3. Overview of the Prompting Guide

The Realtime Prompting Guide outlines seven best practices for writing system prompts for voice agents:

1. Be precise, avoid conflicts.
Instructions must be specific and consistent. For example, if you say “ask again when unclear,” don’t also instruct the model to “guess when unsure.”

2. Use bullet points instead of paragraphs.
Models handle lists better than long prose.

3. Handle unclear audio.
Explicitly instruct what to do when input is noisy or incomplete: politely ask the user to repeat and only respond when confident.

4. Pin the language when needed.
If you want the entire conversation in one language (e.g., English only), state it clearly. Otherwise, the model may switch to mirror the user.

5. Provide sample phrases.
Include example greetings, clarifications, or closing lines to teach the model your desired style.

6. Avoid robotic repetition.
Encourage varied phrasing for greetings, confirmations, and closings to keep interactions natural.

7. Use capitalization for emphasis.
For example: “IF AUDIO IS UNCLEAR, ASK THE USER TO REPEAT.”
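
Several of these practices (bullet points, sample phrases, capitalized emphasis) can be applied mechanically when assembling a prompt. The helper below is a minimal sketch of that idea; build_system_prompt and its parameters are hypothetical names, not part of any OpenAI SDK, and the output is just a string to paste into your session instructions.

```python
def build_system_prompt(rules, emphasized=None, sample_phrases=None):
    """Assemble a bullet-pointed system prompt from short, non-conflicting rules."""
    lines = [f"- {rule.strip()}" for rule in rules]
    if emphasized:
        # Capitalization marks the one rule the model must not skip.
        lines.append(f"- {emphasized.strip().upper()}")
    if sample_phrases:
        lines.append("- Sample phrases (vary the wording):")
        lines.extend(f'  - "{phrase}"' for phrase in sample_phrases)
    return "\n".join(lines)


prompt = build_system_prompt(
    rules=[
        "Respond in the same language as the user.",
        "Only respond when you are confident you understood the request.",
    ],
    emphasized="If audio is unclear, ask the user to repeat.",
    sample_phrases=["I'm sorry, I didn't catch that. Could you please repeat?"],
)
print(prompt)
```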

4. Prompt Examples

Sample Prompt A – Avoid conflicts, be clear

SYSTEM:
- Always speak clearly and respond in the same language the user is speaking.
- If the user’s audio is unclear, noisy, partial, or unintelligible, politely ask them to repeat.
- Never assume what was said if you did not hear it clearly.
- Use short, varied phrases; avoid repeating the same sentence.
- Provide helpful, concise responses.

USER: “Hello, can you help me with my internet issue?”

ASSISTANT: (responds according to the prompt, asks for clarification if needed)


Sample Prompt B – Handling unclear audio

SYSTEM:
- If the audio is not clear, ask: “I’m sorry, I didn’t catch that. Could you please repeat?”
- Only respond when you are confident you understood the user’s request.
- Continue conversation in the same language as the user.
- Use bullet points for clarity.

USER: “Um… internet…” (noisy, unclear audio)

ASSISTANT: (follows the system instructions)


Sample Prompt C – Keep a natural style, avoid repetition

SYSTEM:
- Do not respond with the same phrase twice.
- Vary greetings, confirmations, closings.
- Aim for a warm, helpful tone.
- If the user ends the conversation, say something like: “If there’s anything else you need, feel free to ask.”

USER: “Thank you, that’s all.”

ASSISTANT: “You’re welcome! Glad I could help. Take care!” (or another variation)

5. Experiments

For my testing, I deliberately used Vietnamese speech to see how the model would react in different situations.

First Test: Speaking Unclear Without a Prompt

To begin, I tested what would happen if I spoke unclearly in Vietnamese without providing any system prompt.

For example, I said:

USER: “Ư… mệnh của mình không tỏa lại” (unclear Vietnamese speech)

The model responded in Indonesian, saying:

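ASSISTANT: “Tentu, aku di sini untuk membantu. Coba ceritakan dulu apa yang lagi kamu rasakan atau alami. Kita bisa cari solusinya bareng-bareng.”
(In English: “Of course, I’m here to help. First, tell me what you are feeling or going through. We can find a solution together.”)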

This shows that when no system prompt is defined, the model may guess or switch languages unpredictably, instead of asking for clarification.

Second Test: Adding a System Prompt

Next, I added a system prompt to guide the model’s behavior when the audio is unclear:

SYSTEM:
- If the audio is not clear, ask: “I’m sorry, I didn’t catch that. Could you please repeat?”
- Only respond when you are confident you understood the user’s request.
- Continue conversation in the same language as the user.
- Use bullet points for clarity.

Then I spoke unclearly in Vietnamese again, for example:

USER: “Um… internet…” (spoken quietly, unclear audio)

This time, the model followed the system instructions and politely asked me to repeat. Sometimes, it even suggested that I try saying a simple test sentence so it could better check whether my voice was coming through clearly.

This shows how a well-written system prompt can prevent the model from making random guesses or switching languages, ensuring a more reliable and natural conversation flow.

Third Test: Singing to the Model

Finally, I experimented by singing in Vietnamese to see how the model would react.

The model was able to understand both the lyrics and the emotional tone of my singing. However, when I asked it to repeat the lyrics back to me, it refused due to copyright restrictions.

This shows that while GPT-Realtime can analyze and comment on songs — such as summarizing their meaning, describing the mood, or suggesting new lines in a similar style — it cannot reproduce lyrics verbatim. In practice, this means you should not expect the model to sing or echo back copyrighted content.

6. Conclusion

GPT-Realtime provides smooth and natural voice interactions with minimal latency. However, its effectiveness depends heavily on the prompt.

Key takeaways:

  • Always write a clear, bullet-pointed system prompt.
  • Define explicit behavior for unclear audio.
  • Control language use and discourage robotic repetition.
  • Respect copyright limitations: the model will not repeat lyrics verbatim but can summarize or create new content.

The Realtime Prompting Guide is a practical resource for building high-quality voice agents that are both natural and safe.

Installing and Using GPT-OSS 20B Locally with Ollama

In this document, we will explore how to install and run GPT-OSS 20B — a powerful open-weight language model released by OpenAI — locally, with detailed instructions for using it on a Tesla P40 GPU.

1. Quick Introduction to GPT-OSS 20B

  • GPT-OSS 20B is an open-weight language model from OpenAI, released in August 2025—the first since GPT-2—under the Apache 2.0 license, allowing free download, execution, and modification.

  • The model has about 21 billion parameters and can run efficiently on consumer machines with at least 16 GB of RAM or GPU VRAM.

  • GPT-OSS 20B uses a Mixture-of-Experts (MoE) architecture, activating only a subset of parameters (~3.6B) at each step, saving resources and energy.

  • The model supports chain-of-thought reasoning, enabling it to understand and explain reasoning processes step by step.
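
To make the Mixture-of-Experts numbers above concrete, here is a small arithmetic sketch using only the two figures quoted in the list (about 21B total parameters, about 3.6B active per step). The variable and function names are illustrative.

```python
TOTAL_PARAMS_B = 21.0   # total parameters in billions (figure quoted above)
ACTIVE_PARAMS_B = 3.6   # parameters active at each step in billions (figure quoted above)


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the model's parameters that take part in a single step."""
    return active_b / total_b


fraction = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"Active per step: {fraction:.1%} of all parameters")
```

Roughly one sixth of the parameters do the work at each step, which lowers compute per token. The full set of weights still has to be loaded, which is why the memory requirement is stated for the whole model.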


2. Hardware & Software Preparation

Hardware requirements:

  • RAM or VRAM: minimum 16 GB (can be system RAM or GPU VRAM).

  • Storage: around 12–20 GB for the model and data.

  • Operating system: macOS 11+, Windows, or Ubuntu are supported.

  • GPU (if available): Nvidia or AMD for acceleration. Without a GPU, the model still runs on CPU but very slowly.

Software options:

  • Ollama: the simplest method; quick installation with a convenient CLI.

  • LM Studio: a graphical interface, suitable for beginners.

  • Transformers + vLLM (Python): flexible for developers, integrates well into open-source pipelines.


3. How to Run GPT-OSS 20B with Ollama (Tesla P40 GPU)

3.1 Goal and Timeline

  • Goal: successfully run GPT-OSS 20B locally using Ollama, leveraging the Tesla P40 GPU (24GB VRAM).

  • Timeline: the first setup takes about 15–20 minutes to download the model. After that, launching the model takes only a few seconds.

3.2 Environment Preparation

  • GPU: Tesla P40 with 24GB VRAM, sufficient for GPT-OSS 20B.

  • NVIDIA Driver: version 525 or higher recommended. In the sample logs, CUDA 12.0 works fine.

  • RAM: minimum 16GB.

  • Storage: at least 20GB free space; the model itself takes ~13GB plus cache.

  • Operating system: Linux (Ubuntu), macOS, or Windows. The following example uses Ubuntu.

3.3 Install Ollama

The fastest way:

curl -fsSL https://ollama.com/install.sh | sh

Or manually (Linux):

curl -LO https://ollama.com/download/ollama-linux-amd64.tgz
sudo tar -C /usr -xzf ollama-linux-amd64.tgz

Start the Ollama service:

OLLAMA_HOST=0.0.0.0:8888 ollama serve

When the log shows listening on [::]:8888, the server is ready.

3.4 Download GPT-OSS 20B

Open a new terminal and run:

OLLAMA_HOST=0.0.0.0:8888 ollama pull gpt-oss:20b

The first download is about 13GB. When the log shows success, the model is ready.

3.5 Run the Model

Start the model and try chatting:

OLLAMA_HOST=0.0.0.0:8888 ollama run gpt-oss:20b

Example:

>>> hi
Hello! 👋 How can I help you today?

3.6 Verify GPU Usage

Run:

nvidia-smi

Result: the Tesla P40 (24GB) shows around 12–13GB of VRAM in use by the process /usr/bin/ollama. The Ollama log also shows “offloading output layer to GPU” and “llama runner started in 8.05 seconds”, which indicates that the model is running on the GPU, not the CPU.
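
If you would rather check this from a script than read the table by eye, the following sketch parses nvidia-smi's CSV query output. It assumes the --query-gpu and --format=csv,noheader,nounits options are available in your driver version; the sample line is made up for illustration and is not taken from a real log.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text: str) -> list[dict]:
    """Turn 'name, used, total' CSV lines (values in MiB) into dictionaries."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, used, total = [field.strip() for field in line.split(",")]
        gpus.append({"name": name, "used_mib": int(used), "total_mib": int(total)})
    return gpus


def read_gpu_memory() -> list[dict]:
    """Run nvidia-smi and parse its output (requires an NVIDIA driver)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_memory(out)


sample = "Tesla P40, 12650, 24576\n"  # illustrative values only
print(parse_gpu_memory(sample))
```

On the GPU machine, calling read_gpu_memory() while the model is loaded should report memory use in the same range as the figure above.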

3.7 Monitor API and Performance

Ollama exposes a REST API at http://127.0.0.1:8888.
Common endpoints include /api/chat and /api/generate.
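
As a minimal sketch of calling the /api/chat endpoint from Python using only the standard library: the host and port match the OLLAMA_HOST value used earlier, and the helper names build_chat_request and chat are illustrative. The request sets "stream" to false so that the server returns a single JSON object.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8888"  # same port as the `ollama serve` command above


def build_chat_request(prompt: str, model: str = "gpt-oss:20b") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/chat endpoint."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON response instead of a stream of chunks
    }
    return urllib.request.Request(
        f"{BASE_URL}/api/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, timeout: float = 300.0) -> str:
    """Send the prompt and return the assistant's reply text (server must be running)."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as resp:
        reply = json.loads(resp.read())
    return reply["message"]["content"]
```

With the server running, chat("hi") should return the reply text. The generous timeout is a guess based on the response times listed in this section.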

Response times:

  • Short prompts: about 2–10 seconds.

  • Long or complex prompts: may take tens of seconds to a few minutes.


4. Conclusion

You have successfully run GPT-OSS 20B on a Tesla P40. The initial model download takes some time, but afterward it launches quickly and runs stably. With 24GB VRAM, the GPU can handle the large model without overload. While long prompts may still be slow, it is fully usable for real-world experiments and local project integration.

Running Latent Diffusion Model on Regular Computers via Google Colab and Ngrok

Introduction

In recent years, diffusion models such as the Latent Diffusion Model (LDM) have become the gold standard for text-to-image generation thanks to their high image quality, fast inference speed, and flexible fine-tuning capabilities. However, the biggest barrier for beginners is often the expensive GPU hardware requirement. This article will guide you on how to run LDM on a regular computer by taking advantage of Google Colab—a cloud environment that provides free/affordable GPU access, allowing you to focus on your ideas instead of hardware setup.

Main Content

What is Colab?

Colab is a hosted Jupyter Notebook service that requires no setup to use and provides free access to computing resources, including GPUs and TPUs. Colab is especially well suited to machine learning, data science, and education.

What You Need to Prepare

  1. A Colab account – Getting one is very easy; just search for it and sign up. Colab can be used for free but with limits on memory, GPU, etc. You can also subscribe to a paid plan depending on your needs.

  2. An Ngrok account – Just like the first step, sign up for an account, then get your authtoken, which will be used later.

  3. An example LDM setup for this tutorial – In this article, we’ll use the Stable Diffusion Pipeline in Python. Hugging Face provides a list of text-to-image models ranging from weaker to stronger ones at: https://huggingface.co/models?pipeline_tag=text-to-image&sort=trending.
    Example repo for this article: https://github.com/thangnch/MiAI_StableDiffusion_Flask_Text2Image


Running on a Personal Computer

First, clone the above GitHub repo to your local machine. Then use pip to install the required libraries and packages, such as PyTorch and diffusers.

Next, run the svr_model.py file.

Depending on whether your personal computer has GPU support (CUDA or MPS), the model can run on GPU; otherwise, it defaults to CPU—which is much slower.
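
The fallback order described here can be sketched as follows. This is not the repo's actual code; the function names are illustrative, and the sketch simply reports "cpu" when PyTorch is not installed.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple's MPS backend, then fall back to the CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Query PyTorch for available backends; report 'cpu' if PyTorch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    mps = getattr(torch.backends, "mps", None)  # absent in older PyTorch builds
    return pick_device(torch.cuda.is_available(), bool(mps and mps.is_available()))


print(detect_device())
```

The returned string is the kind of value you would typically pass to a pipeline's .to(...) call.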

Since my GPU doesn’t support NVIDIA CUDA, I had to run it on CPU.

After starting the server, the demo web page URL appears in the terminal.

Now we can generate an image from a prompt.

 

  • Speed: quite slow at 4.14s/it

  • Consumes a lot of CPU power

  • But still managed to generate an image successfully with the weakest model

This shows that although it is possible to run locally on your own machine, it is very slow and CPU-intensive, even with the smallest model.


Using Colab with Ngrok

  1. Visit Google Colab: https://colab.research.google.com/

  2. Visit Ngrok: https://dashboard.ngrok.com/get-started/setup/windows

In Colab:

Then:

  • Run another cell to install all required libraries (already listed in the repo)

  • Copy the code from text2image_model.py to start running the model

  • Next, copy the code from svr_model.py

Before running svr_model.py, install Ngrok in the Colab environment by running another code cell.

After installation:

  • Go to your Ngrok dashboard, copy your personal authtoken

  • Back in Colab, paste it into the Secrets section on the left sidebar, name it, and save

Now run svr_model.py.


Ngrok will provide a temporary public URL (a tunnel) that connects to your server running on the Colab GPU.
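
For readers who want to see what the tunnel setup might look like in code, here is a hedged sketch. It assumes the pyngrok package is what gets installed in Colab, that the secret was saved under the name NGROK_AUTHTOKEN, and that the Flask server listens on port 5000; none of these details are confirmed by the repo, so adjust them to your setup.

```python
def open_public_tunnel(port: int = 5000, secret_name: str = "NGROK_AUTHTOKEN") -> str:
    """Read the ngrok authtoken from Colab Secrets and open a tunnel to a local port."""
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid port: {port}")
    # Imported lazily: both packages are only available inside a prepared Colab runtime.
    from google.colab import userdata  # Colab's Secrets panel
    from pyngrok import ngrok          # assumed ngrok wrapper package

    ngrok.set_auth_token(userdata.get(secret_name))
    tunnel = ngrok.connect(port)
    return tunnel.public_url
```

In a Colab cell, print(open_public_tunnel()) would show the temporary public URL. Reading the token from Secrets keeps it out of the notebook itself.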

Visit the link, and you’ll get the text-to-image web interface.

Time to generate images!

Example:

  • Prompt: “Cat and dog” – with the smallest model

    • Very fast at 9.9s

    • GPU used effectively

Even with higher-level models, the process still runs smoothly.

  • Prompt: “City landscape” – Model level 6

    • Still stable and responsive


Conclusion

Through these experiments, we have learned how to use Latent Diffusion Models easily on a personal computer and optimize performance with Colab and Ngrok combined. This provides a smooth and fast user experience. Hopefully, this article will be helpful to readers.

Thank you for reading!

Exploring Claude Code Subagents: A Demo Setup for a RAG-Based Website Project

1. Introduction

Recently, Anthropic released a notable new feature for Claude Code: subagents, secondary agents that each handle a specific task within a user’s project.

2. Main Content

a. How to Set It Up:
First, install Claude Code globally using the following command in your Terminal window:

npm install -g @anthropic-ai/claude-code

If Claude Code is already installed but is an older version, it won’t have the subagent feature.

To update Claude Code, run:

claude update

Launch Claude Code in your working directory, then run the command:
/agents

Press Enter, and a management screen for agents will appear, allowing you to start creating agents with specific purposes for your project.

Here, I will set it up following Claude’s recommendation.

After the setup, I have the following subagents:

I will ask Claude to help me build a website using RAG with the following prompt:

The first subagents have started working.

The setup of the RAG project has been completed.

However, I noticed that the subagent ‘production-code-reviewer (Review RAG system code)’ did not run after the coding was completed. It might be an issue with my prompt, so I will ask Claude to review the code for me.

Once the whole process has finished, Claude Code delivers the final product.
Link: https://github.com/mhieupham1/claudecode-subagent

3. Conclusion

Through the entire setup process and practical use in a project, it’s clear how powerful and beneficial the Sub-agents feature introduced by Anthropic for Claude Code is. It enables us to have AI “teammates” with specialized skills and roles that operate independently without interfering with each other — allowing projects to be organized, easy to understand, and efficient.

Gemini CLI vs. Claude Code CLI: A Comprehensive Comparison for Developers

1. Introduction to the Launch of Gemini CLI

Recently, Google launched Gemini CLI – an open-source AI agent that can be directly integrated into the terminal for work. In previous articles about Claude Code CLI, we already saw its powerful features. Now, with the interesting arrival of Gemini CLI, users have even more options when choosing which agent to use. In this article, we’ll explore and compare the different criteria between Claude Code CLI and Gemini CLI to see which agent might best suit your needs.

2. Comparison Criteria Between the Two CLI Agents

a. Platform Support

  • Claude Code CLI: This tool has certain limitations when it comes to operating system support. It works well on macOS and Ubuntu, but Windows users need extra steps such as installing WSL (Windows Subsystem for Linux) with Ubuntu. Even then, there are still some restrictions and a less-than-ideal user experience.

  • Gemini CLI: Google’s new tool supports all operating systems, allowing users on any platform to set up and use it quickly and easily.

b. Open Source

  • Claude Code CLI: This is a closed-source tool, so its development is entirely controlled by Anthropic.

  • Gemini CLI: Google’s tool is open source, licensed under Apache 2.0, which enables the user community to access and collaborate on making the tool more robust and faster.

c. AI Model

  • Claude Code CLI: Utilizes powerful Anthropic models such as Claude Opus 4 and Claude Sonnet 3.7, both highly effective for coding tasks.

  • Gemini CLI: Gives access to Gemini 2.5 Pro and Gemini 2.5 Flash, each useful for different needs.

d. Context Limitations

  • Claude Code CLI: This is a paid tool. Users can access it through their Claude account with various tiers, each offering different token limits (from 250K to 1M tokens per model). Users can also use Claude’s API key to pay based on token usage.

  • Gemini CLI: Google’s tool provides a free version, which allows access to Gemini 2.5 Pro, but can quickly hit the limit and drop down to Gemini 2.5 Flash.

e. Community and Extensibility

  • Claude Code CLI: As a closed-source tool, only the developer (Anthropic) can improve and maintain it.

  • Gemini CLI: Being open source, it has a large and vibrant community contributing to its rapid improvement and greater capabilities.

3. Gemini CLI

  • Link: https://github.com/mhieupham1/Flashcard_GeminiCLI

  • Prompt Example:

    • Please make for me a website about using flashcard for learning English with HTML, CSS, Javascript, do the best for UI/UX

    • A flashcard set can archive many words, user can add more word to a new set or existed set

    • Function for folder that can add existed flashcard sets or remove it

    • Function for flashcard set that can edit transfer user to a web to practice in this flashcard set

    • Dashboard need to have more eye-catching, good layout

    • And many follow-up prompts asking Gemini CLI to fix its own bugs

    • Make the web has layout, functions like an official website with better CSS, JS, HTML

  • Strengths:

    • Can handle large token requests and good at reading context

    • Cost: Free version can access Gemini 2.5 Pro, but may quickly hit limits and fall back to Gemini 2.5 Flash. Sometimes, after logging out and back in, it works normally again with Gemini 2.5 Flash. A pro account offers a one-month free trial, after which users can cancel or continue with the stated price.

  • Weaknesses:

    • Requires a very large number of tokens (1M tokens for Pro, 11M for Flash) to build the website (even when incomplete)

    • Prone to repeated error loops, wasting tokens

    • The generated code is still weak, and the tool doesn’t always fully understand user intentions or basic web concepts, so prompts need to be very detailed

4. Claude Code CLI

  • Link: https://github.com/mhieupham1/Flashcard_ClaudeCodeCLI

  • Prompt Example:

    • Please make for me a website about using flashcard for learning English with HTML, CSS, Javascript, do the best for UI/UX

    • A flashcard set can archive many words, user can add more word to a new set or existed set

    • Function for folder that can add existed flashcard sets or remove it

    • Function for flashcard set that can edit transfer user to a web to practice in this flashcard set

    • Dashboard need to have more eye-catching, good layout

  • Strengths:

    • Understands user ideas very well, outputs high-quality, efficient, and minimal code without missing features

    • Only required 30K tokens for the flashcard web demo

    • Good, user-friendly UI/UX

    • Produced the demo with a single request (using only a pro account, not the max tier)

  • Weaknesses:

    • Requires a paid account or API key (tokens = dollars), but the code quality is worth the price
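
To put the two token figures from these sections side by side, here is a quick arithmetic sketch. It uses only the numbers reported above (1M Pro plus 11M Flash tokens for Gemini CLI, 30K tokens for Claude Code CLI). The two tools may count tokens differently, so treat the ratio as a rough indication only.

```python
GEMINI_TOKENS = 1_000_000 + 11_000_000  # Pro + Flash tokens reported for the Gemini CLI run
CLAUDE_TOKENS = 30_000                  # tokens reported for the Claude Code CLI run

ratio = GEMINI_TOKENS / CLAUDE_TOKENS
print(f"Gemini CLI used about {ratio:.0f}x as many tokens")  # prints "... about 400x ..."
```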

5. Conclusion

Judging by the comparison criteria above (platform support, open source, and cost), Gemini CLI currently looks much stronger than Claude Code CLI on paper. However, a deeper dive into their practical efficiency and benefits for different use cases is still needed.

a. Gemini CLI

  • Strengths:

    • Free to use with high token limits, suitable for large projects needing a large context window

    • Highly compatible across platforms and easy to set up

    • Open source, ensuring rapid improvement through community contributions

    • Fast code reading and generation

  • Weaknesses:

    • Can randomly hit usage limits, dropping from Gemini 2.5 Pro to Gemini 2.5 Flash, reducing effectiveness

    • Prone to repeated errors/loops, which can be difficult to escape

    • Generated code may be less efficient, and it often needs very detailed prompts

b. Claude Code CLI

  • Strengths:

    • High-quality, thoughtful, and efficient codebase generation

    • Highly suitable for commercial projects thanks to token optimization

  • Weaknesses:

    • Requires a paid account, with different tiers for different performance levels; top tier is expensive

    • Limited cross-platform compatibility, making it less accessible or offering a poorer experience for some users

6. Which Should You Use? Summary of Best Use Cases

When is Claude Code CLI most convenient?
Claude Code CLI is the better choice if you prioritize high-quality, efficient, and minimal code output, especially for commercial projects that require clean UI/UX and robust functionality. It is also ideal when you want to achieve your result in a single, well-phrased prompt. However, you need to be willing to pay for a subscription or API access, and set up the tool on a supported platform.

When is Gemini CLI more convenient?
Gemini CLI is perfect if you need a free, open-source tool that works across all major operating systems and is easy to install. It’s best for large projects that require handling a lot of data or context, and for those who want to benefit from fast community-driven improvements. Gemini CLI is especially suitable for personal, experimental, or learning projects, or when you need flexibility and cross-platform compatibility—even though it might sometimes require more detailed prompts or troubleshooting.

Combining tmux and Claude to Build an Automated AI Agent System (for Mac & Linux)

1. Introduction

With the rapid growth of AI, multi-agent systems are attracting more attention due to their ability to coordinate, split tasks, and handle complex automation. An “agent” can be an independent AI responsible for a specific role or task.

In this article, I’ll show you how to combine tmux (a powerful terminal multiplexer) with Claude (Anthropic’s AI model) to build a virtual organization. Here, AI agents can communicate, collaborate, and work together automatically via the terminal.

 

2. What is tmux?

tmux lets you split your terminal into multiple windows or sessions, each running its own process independently. Even if you disconnect, these sessions stay alive. This is super useful when you want to run several agents in parallel, each in their own terminal, without interfering with each other.

 

3. What is Claude?

Claude is an advanced language AI model developed by Anthropic. It can understand and respond to text requests, and it’s easy to integrate into automated systems—acting as a “virtual employee” taking on part of your workflow.

 

4. Why combine tmux and Claude?

Parallel & Distributed: Each agent is an independent Claude instance running in its own tmux session.

Workflow Automation: Easily simulate complex workflows between virtual departments or roles.

Easy Debug & Management: You can observe each agent’s logs in separate panes or sessions.

 

5. System Architecture

Let’s imagine a simple company structure:

PRESIDENT: Project Director (sets direction, gives instructions)

boss1: Team Leader (splits up tasks)

worker1, worker2, worker3: Team members (do the work)

Each agent has its own instruction file so it knows its role when starting up.

Agents communicate using a script:

./agent-send.sh [recipient] "[message]"

Workflow:

PRESIDENT → boss1 → workers → boss1 → PRESIDENT
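
To show the idea behind such a messaging script, here is a hypothetical Python equivalent built on tmux send-keys. It is not the repo's agent-send.sh. The mapping of agent names to tmux targets (one pane per agent inside the multiagent session) is an assumption made for illustration.

```python
import subprocess

# Hypothetical mapping from agent name to tmux target (session:window.pane).
TARGETS = {
    "president": "president",
    "boss1": "multiagent:0.0",
    "worker1": "multiagent:0.1",
    "worker2": "multiagent:0.2",
    "worker3": "multiagent:0.3",
}


def build_send_commands(recipient: str, message: str) -> list[list[str]]:
    """Return the tmux commands that type a message into an agent's pane and submit it."""
    target = TARGETS[recipient.lower()]
    return [
        ["tmux", "send-keys", "-t", target, "-l", message],  # -l: send the text literally
        ["tmux", "send-keys", "-t", target, "Enter"],        # press Enter to submit
    ]


def send(recipient: str, message: str) -> None:
    """Deliver the message; requires tmux and the sessions created by the setup script."""
    for command in build_send_commands(recipient, message):
        subprocess.run(command, check=True)
```

Sending the text and the Enter key as two separate commands keeps a message such as "Enter" from being interpreted as a key name.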

 

6. Installation

Since the code is a bit long, I’ll just share the GitHub link to keep things short.

tmux:
Install guide: tmux Installing Guide

Claude:
Install guide: Claude Setup Guide

Git:
Install guide: Git Download

Clone the project:

git clone https://github.com/mhieupham1/claudecliagent

 

Inside, you’ll find the main folders and files:

CLAUDE.md: Describes the agent architecture, communication, and workflows.

instructions/: Contains guidance for each role.

.claude/: JSON files to manage permissions for bash scripts.

setup.sh: Launches tmux sessions for PRESIDENT, boss1, worker1, worker2, worker3 so agents can talk to each other.

agent-send.sh: Script for sending messages between agents.

 

7. Deployment

Run the setup script:

./setup.sh

This will create tmux sessions for PRESIDENT and the agents (boss1, worker1, worker2, worker3) in the background.

To access the PRESIDENT session:

tmux attach-session -t president


To access the multiagent session:

tmux attach-session -t multiagent


In the PRESIDENT session, run the claude command to set up the Claude CLI.

Do the same for the other agents.

Now, in the PRESIDENT window, try entering a request like:

you are president. create a todo list website now

PRESIDENT will kick off the to-do list project: it sends instructions to boss1, and boss1 assigns tasks to worker1, worker2, and worker3.

You can watch boss1 and the workers do their jobs, approve commands to create code files, and wait for them to finish.

Result:

8. Conclusion

Combining tmux and Claude lets you create a multi-agent AI system that simulates a real company: communicating, collaborating, and automating complex workflows. Having each agent in its own session makes it easy to manage, track progress, and debug.

This system is great for AI research, testing, or even real-world workflow automation, virtual team assistants, or teamwork simulations.

If you’re interested in developing multi-agent AI systems, try deploying this setup, customize the roles and workflows to your needs, and feel free to contribute or suggest improvements to the original repo!

A Step-by-Step Guide to Integrating and Using Claude Code Action on GitHub

This article looks at what makes Claude Code Action useful: create an issue, mention Claude with @claude, and Claude writes the code automatically.

Introduction

In the current era of rapidly evolving technology, artificial intelligence (AI) stands out as one of the most significant and transformative breakthroughs on a global scale. Among the various AI-driven tools, Claude, and Claude Code Action in particular, offers a powerful integration that can be embedded into a user’s GitHub repositories to address raised issues accurately and efficiently. This article explores the capabilities and applications of Claude Code Action in modern software development workflows.

Body content

Claude Code Action is an extension categorized as an “Action” and made available on the GitHub Marketplace by Anthropic. Users can search for it and set it up by following the instructions in the README documentation. Below is a summary of the basic setup steps for integrating Claude Code Action into a GitHub repository:

1. Create a workflow folder:

On GitHub: In your GitHub repository, click “Add file”:

Enter the path “.github/workflows/[file_name].yml” for the new file. For instance:

Next, insert the appropriate workflow configuration for this extension, depending on your intended use:

For example:

name: Claude PR Assistant

on:
  issue_comment:
    types: [created]
  pull_request_review_comment:
    types: [created]
  issues:
    types: [opened, assigned]
  pull_request_review:
    types: [submitted]

jobs:
  claude-code-action:
    if: |
      (github.event_name == 'issue_comment' && contains(github.event.comment.body, '@claude')) ||
      (github.event_name == 'pull_request_review_comment' && contains(github.event.comment.body, '@claude')) ||
      (github.event_name == 'pull_request_review' && contains(github.event.review.body, '@claude')) ||
      (github.event_name == 'issues' && contains(github.event.issue.body, '@claude'))
    runs-on: ubuntu-latest
    permissions:
      contents: write
      pull-requests: read
      issues: read
      id-token: write
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 1

      - name: Run Claude PR Action
        uses: anthropics/claude-code-action@beta
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          timeout_minutes: "60"

Then, click “Commit changes” to successfully add the configuration to your repository.

On the user’s local machine: If a folder in VS Code has already been connected to the GitHub repository, you can manually create the .github/workflows directory and a .yml file to store the Claude configuration. The file can then be pushed to the GitHub repository.

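2. API key:

  • After that, add the API key to the repository’s Secrets under the Settings tab rather than hard-coding it in the workflow file, to prevent unauthorized access.

Under “Secrets and variables”, select “Actions”

Create a new repository secret

Paste your API key into the Secret field

Give the secret the same name that the workflow file references (ANTHROPIC_API_KEY in the example above)

✅ Correct: reference the secret, for example anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}

❌ Never do it: paste the raw API key directly into the workflow file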

3. Using Claude Code Action

Create a new issue in the repository where Claude is intended to be used:

Describe the issue to be resolved (such as feature creation, bug fixing, or code review) in the issue’s description. You can tag “@claude” directly in the description, or in a comment after the issue is created, in order to trigger Claude to process the request.
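
To make the trigger rule easier to reason about, the sketch below mirrors the workflow's if: condition in Python. It is only an illustration of the logic, not code that GitHub runs. The lookup table and function name are hypothetical. The match is made case-insensitive because GitHub's contains() function ignores case.

```python
TRIGGER = "@claude"

# Which payload field holds the text to scan for each event type in the workflow.
BODY_PATH = {
    "issue_comment": ("comment", "body"),
    "pull_request_review_comment": ("comment", "body"),
    "pull_request_review": ("review", "body"),
    "issues": ("issue", "body"),
}


def should_run(event_name: str, event: dict) -> bool:
    """Return True when the event type is handled and its text mentions @claude."""
    path = BODY_PATH.get(event_name)
    if path is None:
        return False  # event type not listed in the workflow's condition
    section, field = path
    body = (event.get(section) or {}).get(field) or ""
    return TRIGGER in body.lower()  # case-insensitive, like contains()


print(should_run("issues", {"issue": {"body": "@claude please add a login page"}}))
```

Under this logic, an issue that does not mention @claude does not start the job, whether the mention is missing from the description or from a later comment.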

Example: Ask Claude to generate complete login and registration pages based on the initial files in the repo.

Claude is invoked via the API to address the issue described, with the response time depending on the complexity of the request. It uses the tokens associated with your API key to read the issue content as well as to create or modify code within the repository.

Claude’s response will appear in the comments section of the issue.

Here, Claude generates additional files, for example register.html and dashboard.html, as part of the requested implementation and shows what changes were made to each file, including which parts were added, modified, or deleted.

At this point, Claude has created a separate branch in the repository containing the proposed changes. The user can then review and consider merging these updates into the main branch via a pull request.

After successfully merging into the main branch

 

Following a successful merge, the issue may be closed. At this point, Claude has been effectively utilized to generate complete, functional demo pages for user login and registration.

 

4. Result:

Registration page

Login screen

Dashboard screen

In summary, Claude Code Action proves to be a highly effective tool for streamlining development tasks, making it easier for both individuals and teams to enhance productivity.