1. Introduction
In today’s rapidly advancing field of AI, running AI models locally on personal computers has become more common than ever.
However, many models are increasingly difficult to run on such machines because they are enormous, often containing billions of parameters.
This makes it nearly impossible for low-end computers to use them effectively for work or projects.
In this article, we will therefore use Google Colab together with Unsloth’s fine-tuning tooling and LoRA to fine-tune gpt-oss-20b for our own needs.
2. Main Content
a. What is Unsloth?
- Unsloth is a modern Python library designed to speed up and optimize the fine-tuning of large language models (LLMs) such as LLaMA, Mistral, Mixtral, and others.
It makes model training and fine-tuning extremely fast, memory-efficient, and easy — even on limited hardware like a single GPU or consumer-grade machines.
b. What is Colab?
- Colab is a hosted Jupyter Notebook service that requires no setup and provides free access to computing resources, including GPUs and TPUs.
It is particularly well-suited for machine learning, data science, and education purposes.
c. What is LoRA?
- Low-Rank Adaptation (LoRA) is a technique for quickly adapting machine learning models to new contexts.
LoRA makes large, complex models more suitable for specific tasks. It works by freezing the original weights and adding small, trainable low-rank matrices to selected layers, rather than modifying the entire architecture.
This lets developers quickly specialize machine learning models for various applications, as the sketch below illustrates.
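To make the idea concrete, here is a minimal, self-contained sketch of the low-rank update. The names (W, A, B, r, alpha) and dimensions are illustrative only and are not part of the Unsloth API:

import torch

# LoRA idea: keep the pretrained weight W frozen and learn a low-rank update
# (alpha / r) * B @ A, so only the two small matrices A and B are trained.
d_out, d_in, r, alpha = 512, 512, 8, 16

W = torch.randn(d_out, d_in)     # frozen pretrained weight
A = torch.randn(r, d_in) * 0.01  # low-rank factor (would be trained)
B = torch.zeros(d_out, r)        # low-rank factor, zero-initialized (would be trained)

x = torch.randn(d_in)
y = W @ x + (alpha / r) * (B @ (A @ x))  # output of the adapted layer

print(A.numel() + B.numel(), "trainable parameters vs", W.numel(), "frozen")

Because B starts at zero, the adapted layer initially behaves exactly like the original one, and training only touches r * (d_in + d_out) parameters instead of d_in * d_out.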
3. Using Colab to Train gpt-oss-20b
– Installing the Libraries
!pip install --upgrade -qqq uv
try:
    import numpy
    install_numpy = f"numpy=={numpy.__version__}"
except:
    install_numpy = "numpy"
!uv pip install -qqq \
"torch>=2.8.0" "triton>=3.4.0" {install_numpy} \
"unsloth_zoo[base] @ git+https://github.com/unslothai/unsloth-zoo" \
"unsloth[base] @ git+https://github.com/unslothai/unsloth" \
torchvision bitsandbytes \
git+https://github.com/huggingface/[email protected] \
git+https://github.com/triton-lang/triton.git@05b2c186c1b6c9a08375389d5efe9cb4c401c075#subdirectory=python/triton_kernels
– After completing the installation, load the gpt-oss-20b model from Unsloth:
from unsloth import FastLanguageModel
import torch
max_seq_length = 1024
dtype = None
model_name = "unsloth/gpt-oss-20b"
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = model_name,
    dtype = dtype, # None for auto detection
    max_seq_length = max_seq_length, # Choose any for long context!
    load_in_4bit = True, # 4-bit quantization to reduce memory
    full_finetuning = False, # [NEW!] We have full finetuning now!
    # token = "hf_...", # use one if using gated models
)

– Adding LoRA for Fine-Tuning
model = FastLanguageModel.get_peft_model(
    model,
    r = 8, # Choose any number > 0! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0, # Optimized fast path
    bias = "none", # Optimized fast path
    # "unsloth" uses less VRAM, fits larger batches
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
)
If you run out of GPU memory, reduce max_seq_length, set a smaller r, or increase gradient_accumulation_steps.
– Testing the Model Before Fine-Tuning
Now, let’s test how the model responds before fine-tuning:
messages = [
    # System prompt (Vietnamese): "You are Shark B, a famous investor, blunt and pragmatic"
    {"role": "system", "content": "Bạn là Shark B, một nhà đầu tư nổi tiếng, thẳng thắn và thực tế", "thinking": None},
    # User prompt (Vietnamese): "Please introduce yourself"
    {"role": "user", "content": "Bạn hãy giới thiệu bản thân"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True,
    return_tensors = "pt",
    return_dict = True,
    reasoning_effort = "low",
).to(model.device)

from transformers import TextStreamer
_ = model.generate(**inputs, max_new_tokens = 512, streamer = TextStreamer(tokenizer))

– Loading the Data for Fine-Tuning
Dataset sample:

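The training data is a local data.jsonl file in chat (ShareGPT-style) format: each line is one JSON object with a messages list, which is what formatting_prompts_func expects. The snippet below builds a single hypothetical record in that format; the conversation content is illustrative and not taken from the original dataset:

import json

# One illustrative chat-style record. A real data.jsonl file contains one such
# JSON object per line; here we just print what a single line would look like.
sample = {
    "messages": [
        {"role": "system", "content": "Bạn là Shark B, một nhà đầu tư nổi tiếng, thẳng thắn và thực tế"},
        {"role": "user", "content": "Bạn hãy giới thiệu bản thân"},
        {"role": "assistant", "content": "Chào bạn, tôi là Shark B. Tôi đầu tư vào những startup có mô hình kinh doanh rõ ràng và thực tế."},
    ]
}
print(json.dumps(sample, ensure_ascii=False))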
def formatting_prompts_func(examples):
    convos = examples["messages"]
    texts = [tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False) for convo in convos]
    return { "text" : texts, }

from datasets import load_dataset
dataset = load_dataset("json", data_files="data.jsonl", split="train")
dataset

from unsloth.chat_templates import standardize_sharegpt
dataset = standardize_sharegpt(dataset)
dataset = dataset.map(formatting_prompts_func, batched = True)
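Before training, it is worth sanity-checking that the chat template was applied correctly; a minimal check (the index 0 is arbitrary) is:

# Print one formatted training example to verify the chat template output.
print(dataset[0]["text"])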
– Training the Model
The following code snippet defines the configuration and setup for the fine-tuning process.
Here, we use SFTTrainer and SFTConfig from the trl library to perform Supervised Fine-Tuning (SFT) on our model.
The configuration specifies parameters such as batch size, learning rate, optimizer type, and number of training epochs.
from trl import SFTConfig, SFTTrainer

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    args = SFTConfig(
        per_device_train_batch_size = 1,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        num_train_epochs = 1, # Set this for 1 full training run.
        # max_steps = 30,
        learning_rate = 2e-4,
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none", # Use this for WandB etc.
    ),
)
trainer_stats = trainer.train()
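Before reloading the model in the next step, save the trained LoRA adapters and tokenizer to disk. A minimal sketch using the standard save_pretrained calls is shown below; the directory name finetuned_model is an assumption chosen to match the reload example that follows:

# Save the LoRA adapters and tokenizer; "finetuned_model" is an illustrative path.
model.save_pretrained("finetuned_model")
tokenizer.save_pretrained("finetuned_model")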
– Testing the Model After Fine-Tuning
# Example reload (set to True to run)
if False:
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "finetuned_model", # YOUR MODEL YOU USED FOR TRAINING
        max_seq_length = 1024,
        dtype = None,
        load_in_4bit = True,
    )

messages = [
    {"role": "system", "content": "Bạn là Shark B, một nhà đầu tư nổi tiếng, thẳng thắn và thực tế", "thinking": None},
    {"role": "user", "content": "Bạn hãy giới thiệu bản thân"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True,
    return_tensors = "pt",
    return_dict = True,
    reasoning_effort = "low",
).to(model.device)

from transformers import TextStreamer
_ = model.generate(**inputs, max_new_tokens = 512, streamer = TextStreamer(tokenizer))
Replace finetuned_model with your actual model path (e.g., outputs or the directory where you saved/merged the adapters).
Colab notebook: open your Colab here.
4. Conclusion & Next Steps
By combining Unsloth (for speed and memory efficiency), LoRA (for lightweight adaptation), and Google Colab (for accessible compute), you can fine-tune gpt-oss-20b even on modest hardware. The workflow above helps you:
- Install a reproducible environment with optimized kernels.
- Load gpt-oss-20b in 4-bit to reduce VRAM usage.
- Attach LoRA adapters to train only a small set of parameters.
- Prepare chat-style datasets and run supervised fine-tuning with TRL’s SFTTrainer.
- Evaluate before/after to confirm your improvements.
Clone the notebook, plug in your dataset, and fine-tune your own assistant in minutes.