In this tutorial, we demonstrate how to efficiently fine-tune a large language model using Unsloth with QLoRA. Our focus is on building a stable, end-to-end fine-tuning pipeline that survives common Colab problems such as GPU detection failures, runtime crashes, and library incompatibilities. We show that by carefully controlling the training loop and model configuration, it is possible to train a well-tuned instruction model on limited resources.
import os, sys, subprocess, gc, locale
locale.getpreferredencoding = lambda: "UTF-8"
def run(cmd):
    print("\n$ " + cmd, flush=True)
    p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
    for line in p.stdout:
        print(line, end="", flush=True)
    rc = p.wait()
    if rc != 0:
        raise RuntimeError(f"Command failed ({rc}): {cmd}")
print("Installing packages (this may take 2–3 minutes)...", flush=True)
run("pip install -U pip")
run("pip uninstall -y torch torchvision torchaudio")
run(
"pip install --no-cache-dir "
"torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 "
"--index-url https://download.pytorch.org/whl/cu121"
)
run(
"pip install -U "
"transformers==4.45.2 "
"accelerate==0.34.2 "
"datasets==2.21.0 "
"trl==0.11.4 "
"sentencepiece safetensors evaluate"
)
run("pip install -U unsloth")
import torch
try:
    import unsloth
    restarted = False
except Exception:
    restarted = True

if restarted:
    print("\nRuntime needs restart. After restart, run this SAME cell again.", flush=True)
    os._exit(0)
Reinstalling PyTorch gives us a controlled, compatible environment: Unsloth and its dependencies are matched to Colab's CUDA runtime. We also handle the runtime-restart logic so the training environment is stable before training begins.
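The restart check above can be factored into a small helper. The sketch below is illustrative only (the `needs_restart` name is ours, not part of Unsloth): if a freshly installed package cannot be imported into the current process, the notebook should restart before training.

```python
import importlib

def needs_restart(module_name: str) -> bool:
    # If the newly installed package cannot be imported into the current
    # process (e.g. the old torch build is still loaded), the Colab runtime
    # must be restarted before training can proceed.
    try:
        importlib.import_module(module_name)
        return False
    except Exception:
        return True
```

In the notebook this would be called as `needs_restart("unsloth")`, followed by `os._exit(0)` when it returns True, mirroring the try/except pattern above.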
import torch, gc
assert torch.cuda.is_available()
print("Torch:", torch.__version__)
print("GPU:", torch.cuda.get_device_name(0))
print("VRAM(GB):", round(torch.cuda.get_device_properties(0).total_memory / 1e9, 2))
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
def clean():
    gc.collect()
    torch.cuda.empty_cache()
import unsloth
from unsloth import FastLanguageModel
from datasets import load_dataset
from transformers import TextStreamer
from trl import SFTConfig, SFTTrainer
After verifying GPU availability, we configure PyTorch for efficient computation. Unsloth is imported before the other training libraries so that its performance patches are applied correctly. We also define a small utility to reclaim GPU memory during training.
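Why does `clean()` call `gc.collect()` before `torch.cuda.empty_cache()`? CUDA tensors trapped in Python reference cycles are only released once the cycle collector runs; only then can the caching allocator return their memory. A CPU-only illustration of the underlying mechanism (the `Holder` class is a hypothetical stand-in for an object holding a large tensor):

```python
import gc

class Holder:
    """Stand-in for an object that keeps a large tensor alive."""
    def __init__(self):
        self.partner = None

def leak_cycle():
    a, b = Holder(), Holder()
    a.partner, b.partner = b, a  # reference cycle: refcounts never reach zero

leak_cycle()
# The pair is now unreachable but still allocated; the cycle collector
# is what actually reclaims it, which is why clean() runs gc.collect()
# before asking CUDA to release its cache.
collected = gc.collect()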
max_seq_length = 768
model_name = "unsloth/Qwen2.5-1.5B-Instruct-bnb-4bit"
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    dtype=None,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=8,
    target_modules=["q_proj", "k_proj"],
    lora_alpha=16,
    lora_dropout=0.0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=42,
    max_seq_length=max_seq_length,
)
Unsloth's fast-loading utilities let us quickly load an instruction-tuned, 4-bit-quantized model. We then attach LoRA adapters for parameter-efficient fine-tuning, with a LoRA configuration chosen to balance memory footprint against learning capacity.
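Some back-of-the-envelope arithmetic makes the memory/capacity trade-off concrete. The hidden size below is an assumption for a Qwen2.5-1.5B-style projection, not a measured value; the rank and alpha match the configuration above.

```python
# Illustrative LoRA parameter count (d is a hypothetical projection width).
d = 1536          # assumed input/output width of one attention projection
r, alpha = 8, 16  # LoRA rank and scaling from the configuration above

dense_params = d * d      # a full d x d projection weight (frozen in QLoRA)
lora_params = 2 * d * r   # A is (r, d), B is (d, r); only these train
reduction = dense_params / lora_params
scale = alpha / r         # effective update: W + (alpha / r) * B @ A
```

At rank 8 this trains roughly two orders of magnitude fewer parameters per projection than full fine-tuning, which is why the whole run fits in Colab-class VRAM.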
ds = load_dataset("trl-lib/Capybara", split="train").shuffle(seed=42).select(range(1200))

def to_text(example):
    example["text"] = tokenizer.apply_chat_template(
        example["messages"],
        tokenize=False,
        add_generation_prompt=False,
    )
    return example

ds = ds.map(to_text, remove_columns=[c for c in ds.column_names if c != "messages"])
ds = ds.remove_columns(["messages"])
split = ds.train_test_split(test_size=0.02, seed=42)
train_ds, eval_ds = split["train"], split["test"]
cfg = SFTConfig(
    output_dir="unsloth_sft_out",
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    packing=False,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    max_steps=150,
    learning_rate=2e-4,
    warmup_ratio=0.03,
    lr_scheduler_type="cosine",
    logging_steps=10,
    eval_strategy="no",
    save_steps=0,
    fp16=True,
    optim="adamw_8bit",
    report_to="none",
    seed=42,
)
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    args=cfg,
)
The training dataset is prepared by converting multi-turn conversations into a plain-text format suitable for supervised fine-tuning, and we hold out a small evaluation split to keep the training signal honest. The training configuration controls the batch size, learning rate, and training duration.
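To see what `apply_chat_template` produces for each conversation, here is a simplified, hypothetical re-implementation of the ChatML-style format that Qwen tokenizers emit (the real template also handles system prompts and other cases, so treat this as a sketch, not the exact template):

```python
def to_chatml(messages):
    # Simplified ChatML-style rendering, roughly what apply_chat_template
    # produces for Qwen-family models (illustrative, not the exact template).
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )

sample = [
    {"role": "user", "content": "What is QLoRA?"},
    {"role": "assistant", "content": "LoRA fine-tuning on a 4-bit base model."},
]
rendered = to_chatml(sample)
```

Each turn becomes a delimited block, so the trainer sees one flat string per conversation in the `text` field.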
clean()
trainer.train()
FastLanguageModel.for_inference(model)
def chat(prompt, max_new_tokens=160):
    messages = [{"role": "user", "content": prompt}]
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer([text], return_tensors="pt").to("cuda")
    streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    with torch.inference_mode():
        model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.7,
            top_p=0.9,
            do_sample=True,
            streamer=streamer,
        )
chat("Give a concise checklist for validating a machine learning model before deployment.")
save_dir = "unsloth_lora_adapters"
model.save_pretrained(save_dir)
tokenizer.save_pretrained(save_dir)
We execute the training loop and monitor the fine-tuning run on the GPU. Switching the model into inference mode, we validate its behavior with a test prompt. Finally, we save the trained LoRA adapters so they can be reused or deployed later.
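Before reusing the adapters elsewhere, it is worth checking that the expected artifacts actually landed in `save_dir`. PEFT-style saves typically produce an `adapter_config.json` and an `adapter_model.safetensors`; the exact file names are the usual defaults and should be treated as an assumption, and the helper below is our own illustration, not a library API.

```python
import os, tempfile

EXPECTED = {"adapter_config.json", "adapter_model.safetensors"}

def adapter_files_present(save_dir):
    # True when the usual PEFT adapter artifacts exist in save_dir.
    return EXPECTED <= set(os.listdir(save_dir))

# Demo against a throwaway directory with empty dummy files standing in
# for a real save; in the notebook you would pass "unsloth_lora_adapters".
demo_dir = tempfile.mkdtemp()
for name in EXPECTED:
    open(os.path.join(demo_dir, name), "w").close()
ok = adapter_files_present(demo_dir)
```

A check like this catches silent save failures before the adapter directory is shipped or mounted for inference.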
In conclusion, we fine-tuned an instruction-following language model using Unsloth's optimized training stack and a lightweight QLoRA setup. By constraining the sequence length, dataset size, and number of training steps, we achieved stable GPU training without interruptions. The resulting LoRA adapters can be deployed directly or used to extend this workflow.

