AI-trends.today

How to build a stable and efficient QLoRA fine-tuning pipeline using Unsloth with large language models

Tech · By Gavin Wallace · 04/03/2026 · 4 Mins Read

In this tutorial, we demonstrate how to fine-tune a large language model efficiently with Unsloth and QLoRA. Our focus is on building a stable end-to-end fine-tuning pipeline that survives common Colab problems such as GPU detection failures, runtime crashes, and library incompatibilities. We show that, by configuring the model and controlling the training loop with care, it is possible to train a well-tuned instruction model on limited resources.

import os, sys, subprocess, gc, locale


locale.getpreferredencoding = lambda: "UTF-8"


def run(cmd):
    print("\n$ " + cmd, flush=True)
    p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
    for line in p.stdout:
        print(line, end="", flush=True)
    rc = p.wait()
    if rc != 0:
        raise RuntimeError(f"Command failed ({rc}): {cmd}")


print("Installing packages (this may take 2–3 minutes)...", flush=True)


run("pip install -U pip")
run("pip uninstall -y torch torchvision torchaudio")
run(
   "pip install --no-cache-dir "
   "torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 "
   "--index-url https://download.pytorch.org/whl/cu121"
)
run(
   "pip install -U "
   "transformers==4.45.2 "
   "accelerate==0.34.2 "
   "datasets==2.21.0 "
   "trl==0.11.4 "
   "sentencepiece safetensors evaluate"
)
run("pip install -U unsloth")


# Check that the freshly installed torch/unsloth stack imports cleanly;
# if not, the Colab runtime must be restarted before continuing.
try:
    import unsloth
    restarted = False
except Exception:
    restarted = True


if restarted:
    print("\nRuntime needs restart. After restart, run this SAME cell again.", flush=True)
    os._exit(0)

By reinstalling PyTorch, we create a controlled, compatible environment: Unsloth and its dependencies are matched to Google Colab's CUDA runtime. We also handle the runtime-restart logic, ensuring the training environment is stable before training begins.
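Because the whole setup hinges on matching wheel versions, it can help to fail fast when the installed torch does not match the pin. The helper below is an illustrative sketch (not part of the original tutorial) that checks an installed version string against a "major.minor.*" pin:

```python
def version_matches(installed: str, pin: str) -> bool:
    """Return True if an installed version string (e.g. '2.4.1+cu121')
    satisfies a loose 'major.minor.*' pin such as '2.4.*'."""
    parts = installed.split("+")[0].split(".")  # drop the local '+cu121' tag
    wanted = pin.split(".")
    return all(w == "*" or w == p for p, w in zip(parts, wanted))
```

One could then guard the rest of the notebook with `assert version_matches(torch.__version__, "2.4.*")` right after the install cell.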

import torch, gc


assert torch.cuda.is_available()
print("Torch:", torch.__version__)
print("GPU:", torch.cuda.get_device_name(0))
print("VRAM(GB):", round(torch.cuda.get_device_properties(0).total_memory / 1e9, 2))


torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True


def clean():
    gc.collect()
    torch.cuda.empty_cache()


import unsloth
from unsloth import FastLanguageModel
from datasets import load_dataset
from transformers import TextStreamer
from trl import SFTConfig, SFTTrainer

After verifying GPU availability, we configure PyTorch for efficient computation. Unsloth is imported before the other training libraries so that its performance optimizations are applied correctly, and we define a small utility for reclaiming GPU memory during training.
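The same `gc.collect()` / `empty_cache()` pattern can be wrapped in a context manager so cleanup runs even when a step raises. This is an illustrative helper, not part of the tutorial's code; on a GPU runtime the `empty_cache` argument would be `torch.cuda.empty_cache`:

```python
import gc
from contextlib import contextmanager


@contextmanager
def memory_guard(empty_cache=None):
    """Run the wrapped block, then force garbage collection and,
    if a callback is given (e.g. torch.cuda.empty_cache), release
    cached GPU memory -- even if the block raised."""
    try:
        yield
    finally:
        gc.collect()
        if empty_cache is not None:
            empty_cache()
```

Usage would look like `with memory_guard(torch.cuda.empty_cache): trainer.train()`, guaranteeing the cache is flushed after the run regardless of errors.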

max_seq_length = 768
model_name = "unsloth/Qwen2.5-1.5B-Instruct-bnb-4bit"


model, tokenizer = FastLanguageModel.from_pretrained(
   model_name=model_name,
   max_seq_length=max_seq_length,
   dtype=None,
   load_in_4bit=True,
)


model = FastLanguageModel.get_peft_model(
   model,
   r=8,
   target_modules=["q_proj", "k_proj"],
   lora_alpha=16,
   lora_dropout=0.0,
   bias="none",
   use_gradient_checkpointing="unsloth",
   random_state=42,
   max_seq_length=max_seq_length,
)

Unsloth’s fast-loading utilities let us quickly load a 4-bit-quantized, instruction-tuned model, onto which we attach LoRA adapters for parameter-efficient fine-tuning. The LoRA configuration balances memory footprint against learning capacity.
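The memory/capacity trade-off of the rank `r` is easy to quantify: each adapted weight W of shape (out_dim, in_dim) gains two low-rank factors A (r × in_dim) and B (out_dim × r), i.e. r · (in_dim + out_dim) trainable parameters. A back-of-the-envelope helper (the shapes below are illustrative, not read from the actual model):

```python
def lora_param_count(shapes, r):
    """Trainable parameters added by LoRA adapters of rank r,
    given a list of (out_dim, in_dim) weight shapes."""
    return sum(r * (out_dim + in_dim) for out_dim, in_dim in shapes)


# A single square 1536x1536 projection at r=8 adds 8 * (1536 + 1536) params,
# a tiny fraction of the ~2.4M frozen params in that one matrix.
```

This is why raising `r` from 8 to 16 roughly doubles adapter size while leaving the frozen 4-bit base model untouched.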

ds = load_dataset("trl-lib/Capybara", split="train").shuffle(seed=42).select(range(1200))


def to_text(example):
    example["text"] = tokenizer.apply_chat_template(
        example["messages"],
        tokenize=False,
        add_generation_prompt=False,
    )
    return example


ds = ds.map(to_text, remove_columns=[c for c in ds.column_names if c != "messages"])
ds = ds.remove_columns(["messages"])
split = ds.train_test_split(test_size=0.02, seed=42)
train_ds, eval_ds = split["train"], split["test"]


cfg = SFTConfig(
   output_dir="unsloth_sft_out",
   dataset_text_field="text",
   max_seq_length=max_seq_length,
   packing=False,
   per_device_train_batch_size=1,
   gradient_accumulation_steps=8,
   max_steps=150,
   learning_rate=2e-4,
   warmup_ratio=0.03,
   lr_scheduler_type="cosine",
   logging_steps=10,
   eval_strategy="no",
   save_steps=0,
   fp16=True,
   optim="adamw_8bit",
   report_to="none",
   seed=42,
)


trainer = SFTTrainer(
   model=model,
   tokenizer=tokenizer,
   train_dataset=train_ds,
   eval_dataset=eval_ds,
   args=cfg,
)

To prepare the training dataset, we convert multi-turn conversations into a plain-text format suitable for supervised fine-tuning, and we hold out a small evaluation split to preserve the integrity of training. The training configuration controls the batch size, learning rate, and training duration.
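Two numbers in this config are worth making explicit. The effective batch size is per_device_train_batch_size × gradient_accumulation_steps = 1 × 8 = 8 sequences per optimizer step, and the learning rate follows linear warmup into cosine decay. The sketch below approximates that schedule; the exact Hugging Face implementation may differ in off-by-one details:

```python
import math


def approx_lr(step, max_steps=150, base_lr=2e-4, warmup_ratio=0.03):
    """Linear warmup for the first warmup_ratio fraction of steps,
    then cosine decay from base_lr down to zero."""
    warmup_steps = max(1, int(max_steps * warmup_ratio))
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps  # ramp up linearly
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

With max_steps=150 and warmup_ratio=0.03 this gives 4 warmup steps, a peak of 2e-4, and a smooth decay toward zero by the final step.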

clean()
trainer.train()


FastLanguageModel.for_inference(model)


def chat(prompt, max_new_tokens=160):
    messages = [{"role": "user", "content": prompt}]
   text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
   inputs = tokenizer([text], return_tensors="pt").to("cuda")
   streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    with torch.inference_mode():
       model.generate(
           **inputs,
           max_new_tokens=max_new_tokens,
           temperature=0.7,
           top_p=0.9,
           do_sample=True,
           streamer=streamer,
       )


chat("Give a concise checklist for validating a machine learning model before deployment.")


save_dir = "unsloth_lora_adapters"
model.save_pretrained(save_dir)
tokenizer.save_pretrained(save_dir)

We then run the training loop and monitor the fine-tuning process on the GPU. Switching the model into inference mode, we validate it against a test prompt, and we save the trained LoRA adapters so they can be reused or deployed later.

In conclusion, we fine-tuned an instruction-following language model using Unsloth’s optimized training stack and a lightweight QLoRA setup. By constraining the sequence length, dataset size, and number of training steps, we achieved stable GPU training without interruptions. The resulting LoRA adapters can be deployed directly or used to extend this workflow.
