
AutoResearch Framework: Hyperparameter Discovery, Experiment Tracking, and Building an Autonomous Machine Learning Research Loop on Google Colab

Tech · By Gavin Wallace · 13/03/2026 · 5 Mins Read

In this tutorial we implement a Colab-ready version of the AutoResearch framework originally proposed by Andrej Karpathy. We create an experimentation pipeline that clones AutoResearch, sets up a training environment, and runs a baseline experiment to establish initial performance metrics. We then build an automated loop that programmatically edits hyperparameters in train.py, runs training iterations, and evaluates each model on the validation bits-per-byte (val_bpb) metric. By running the workflow on Google Colab, we replicate the core idea behind autonomous machine-learning research: iteratively modify training configurations, evaluate performance, and preserve the best configurations, all without specialized hardware.
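Before diving in, it helps to understand the metric the loop optimizes. Bits-per-byte normalizes cross-entropy loss by the number of raw bytes rather than tokens, which makes models with different tokenizers directly comparable. A minimal sketch of the conversion, assuming the loss is reported in nats per token (the function name and figures here are illustrative, not taken from the repository):

```python
import math

def bits_per_byte(loss_nats_per_token: float, tokens: int, num_bytes: int) -> float:
    """Convert a mean cross-entropy loss (nats/token) to bits per byte."""
    total_nats = loss_nats_per_token * tokens
    total_bits = total_nats / math.log(2)  # nats -> bits
    return total_bits / num_bytes

# e.g. a loss of 1.0 nat/token on text averaging 4 bytes per token
print(round(bits_per_byte(1.0, tokens=1000, num_bytes=4000), 4))
```

Lower val_bpb means the model compresses the validation text better, which is why the loop below treats smaller values as improvements.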

import os, sys, subprocess, json, re, random, shutil, time
from pathlib import Path


def pip_install(pkg):
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", pkg])


for pkg in [
    "numpy", "pandas", "pyarrow", "requests",
    "rustbpe", "tiktoken", "openai"
]:
    try:
        __import__(pkg)
    except ImportError:
        pip_install(pkg)


import pandas as pd


if not Path("autoresearch").exists():
    subprocess.run(["git", "clone", "https://github.com/karpathy/autoresearch.git"])


os.chdir("autoresearch")


OPENAI_API_KEY = None
try:
    from google.colab import userdata
    OPENAI_API_KEY = userdata.get("OPENAI_API_KEY")
except Exception:
    OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")


if OPENAI_API_KEY:
    os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

We begin by importing the core Python libraries required for the automated research workflow, installing any missing dependencies, and cloning the autoresearch repository from GitHub so that the training framework is available in the environment. If an OpenAI API key is available, we configure it so the system can support LLM-assisted experimentation at a later stage.

prepare_path=Path("prepare.py")
train_path=Path("train.py")
program_path=Path("program.md")


prepare_text=prepare_path.read_text()
train_text=train_path.read_text()


prepare_text = re.sub(r"MAX_SEQ_LEN = \d+", "MAX_SEQ_LEN = 512", prepare_text)
prepare_text = re.sub(r"TIME_BUDGET = \d+", "TIME_BUDGET = 120", prepare_text)
prepare_text = re.sub(r"EVAL_TOKENS = .*", "EVAL_TOKENS = 4 * 65536", prepare_text)


train_text = re.sub(r"DEPTH = \d+", "DEPTH = 4", train_text)
train_text = re.sub(r"DEVICE_BATCH_SIZE = \d+", "DEVICE_BATCH_SIZE = 16", train_text)
train_text = re.sub(r"TOTAL_BATCH_SIZE = .*", "TOTAL_BATCH_SIZE = 2**17", train_text)
train_text = re.sub(r'WINDOW_PATTERN = "SSSL"', 'WINDOW_PATTERN = "L"', train_text)


prepare_path.write_text(prepare_text)
train_path.write_text(train_text)


program_path.write_text("""
Goal:
Run autonomous research loop on Google Colab.


Rules:
Only hyperparameters in train.py may be modified.


Metric:
Lower val_bpb is better.
""")


subprocess.run(["python","prepare.py","--num-shards","4","--download-workers","2"])

We patch the repository with key parameters that make it compatible with Google Colab hardware: the context length, training budget, and evaluation token count are all reduced so the experiments fit on limited GPU resources. After applying these patches, we prepare the dataset shards so that training experiments can start immediately.

subprocess.run("python train.py > baseline.log 2>&1",shell=True)


def parse_run_log(log_path):
    text = Path(log_path).read_text(errors="ignore")
    def find(p):
        m = re.search(p, text, re.MULTILINE)
        return float(m.group(1)) if m else None
    return {
        "val_bpb": find(r"^val_bpb:\s*([0-9.]+)"),
        "training_seconds": find(r"^training_seconds:\s*([0-9.]+)"),
        "peak_vram_mb": find(r"^peak_vram_mb:\s*([0-9.]+)"),
        "num_steps": find(r"^num_steps:\s*([0-9.]+)")
    }


baseline = parse_run_log("baseline.log")


results_path = Path("results.tsv")


rows = [{
    "commit": "baseline",
    "val_bpb": baseline["val_bpb"] or 0,
    "memory_gb": round((baseline["peak_vram_mb"] or 0) / 1024, 1),
    "status": "keep",
    "description": "baseline"
}]


pd.DataFrame(rows).to_csv(results_path, sep="\t", index=False)


print("Baseline:",baseline)

We execute the baseline run to establish a reference performance for the model, and implement a log-parsing helper that retrieves key metrics such as validation bits-per-byte, training time, and peak GPU memory. These baseline results are stored in an experiment table so that all subsequent experiments can be compared against this initial configuration.
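The log parser can be exercised without running training at all, by feeding it a synthetic log string. This sketch assumes the training log emits `key: value` lines, as the regexes above expect (the sample log contents are made up for illustration):

```python
import re

log = """step 100 done
val_bpb: 1.2345
training_seconds: 118.7
peak_vram_mb: 9120
"""

def find_metric(text: str, name: str):
    # MULTILINE makes ^ anchor at each line start, not just the string start
    m = re.search(rf"^{name}:\s*([0-9.]+)", text, re.MULTILINE)
    return float(m.group(1)) if m else None

print(find_metric(log, "val_bpb"))    # 1.2345
print(find_metric(log, "num_steps"))  # None (missing metrics stay None)
```

Returning `None` for missing metrics (rather than raising) lets a crashed run flow through the loop as a failed experiment instead of aborting the whole search.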

TRAIN_FILE=Path("train.py")
BACKUP_FILE=Path("train.base.py")


if not BACKUP_FILE.exists():
   shutil.copy2(TRAIN_FILE,BACKUP_FILE)


HP_KEYS=[
"WINDOW_PATTERN",
"TOTAL_BATCH_SIZE",
"EMBEDDING_LR",
"UNEMBEDDING_LR",
"MATRIX_LR",
"SCALAR_LR",
"WEIGHT_DECAY",
"ADAM_BETAS",
"WARMUP_RATIO",
"WARMDOWN_RATIO",
"FINAL_LR_FRAC",
"DEPTH",
"DEVICE_BATCH_SIZE"
]


def read_text(path):
    return Path(path).read_text()


def write_text(path, text):
    Path(path).write_text(text)


def extract_hparams(text):
    vals = {}
    for k in HP_KEYS:
        m = re.search(rf"^{k}\s*=\s*(.+?)$", text, re.MULTILINE)
        if m:
            vals[k] = m.group(1).strip()
    return vals


def set_hparam(text, key, value):
    return re.sub(rf"^{key}\s*=.*$", f"{key} = {value}", text, flags=re.MULTILINE)


base_text=read_text(BACKUP_FILE)
base_hparams=extract_hparams(base_text)


SEARCH_SPACE={
"WINDOW_PATTERN":['"L"','"SSSL"'],
"TOTAL_BATCH_SIZE":["2**16","2**17","2**18"],
"EMBEDDING_LR":["0.2","0.4","0.6"],
"MATRIX_LR":["0.01","0.02","0.04"],
"SCALAR_LR":["0.3","0.5","0.7"],
"WEIGHT_DECAY":["0.05","0.1","0.2"],
"ADAM_BETAS":["(0.8,0.95)","(0.9,0.95)"],
"WARMUP_RATIO":["0.0","0.05","0.1"],
"WARMDOWN_RATIO":["0.3","0.5","0.7"],
"FINAL_LR_FRAC":["0.0","0.05"],
"DEPTH":["3","4","5","6"],
"DEVICE_BATCH_SIZE":["8","12","16","24"]
}


def sample_candidate():
    keys = random.sample(list(SEARCH_SPACE.keys()), random.choice([2, 3, 4]))
    cand = dict(base_hparams)
    changes = {}
    for k in keys:
        cand[k] = random.choice(SEARCH_SPACE[k])
        changes[k] = cand[k]
    return cand, changes


def apply_hparams(candidate):
    text = read_text(BACKUP_FILE)
    for k, v in candidate.items():
        text = set_hparam(text, k, v)
    write_text(TRAIN_FILE, text)


def run_experiment(tag):
    log = f"{tag}.log"
    subprocess.run(f"python train.py > {log} 2>&1", shell=True)
    metrics = parse_run_log(log)
    metrics["log"] = log
    return metrics

Next we build the core utilities that make automated hyperparameter experiments possible: we extract the current hyperparameters from train.py, define a searchable parameter space, and implement functions that edit the script programmatically. We also add mechanisms for generating candidate configurations, applying them to the training script, and running experiments with recorded outputs.
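The sampling and patching steps compose naturally. This standalone sketch, using a pared-down search space and an in-memory "train.py" string rather than the real files, shows a seeded candidate draw applied via the same regex-substitution technique:

```python
import random, re

SEARCH_SPACE = {"DEPTH": ["3", "4", "5"], "WEIGHT_DECAY": ["0.05", "0.1"]}
base = {"DEPTH": "4", "WEIGHT_DECAY": "0.1"}

random.seed(0)  # seed so the candidate draw is reproducible
keys = random.sample(list(SEARCH_SPACE), 2)
cand = dict(base)
for k in keys:
    cand[k] = random.choice(SEARCH_SPACE[k])

# Apply each sampled value to a train.py-style source string
src = "DEPTH = 4\nWEIGHT_DECAY = 0.1\n"
for k, v in cand.items():
    src = re.sub(rf"^{k}\s*=.*$", f"{k} = {v}", src, flags=re.MULTILINE)
print(src)
```

Mutating a copy of the base configuration (rather than the original dict) is what lets every experiment start from the pristine backed-up script instead of compounding edits.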

N_EXPERIMENTS=3


df = pd.read_csv(results_path, sep="\t")
best = df["val_bpb"].replace(0, 999).min()


for i in range(N_EXPERIMENTS):
    tag = f"exp_{i+1}"

    candidate, changes = sample_candidate()
    apply_hparams(candidate)
    metrics = run_experiment(tag)

    # Keep the configuration only if it improves on the best val_bpb so far
    improved = metrics["val_bpb"] is not None and metrics["val_bpb"] < best
    if improved:
        best = metrics["val_bpb"]
        shutil.copy2(TRAIN_FILE, f"best_{tag}.py")

    df.loc[len(df)] = {
        "commit": tag,
        "val_bpb": metrics["val_bpb"] or 0,
        "memory_gb": round((metrics["peak_vram_mb"] or 0) / 1024, 1),
        "status": "keep" if improved else "discard",
        "description": json.dumps(changes),
    }
    df.to_csv(results_path, sep="\t", index=False)

We then run an automated loop that proposes and evaluates new hyperparameter configurations. Each experiment modifies the training script, runs training, and compares the resulting validation score against the current best configuration. All experiment results are logged, improved configurations are preserved, and the best script is exported along with the full experiment history for analysis.
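The keep-or-discard logic of the loop can be seen in isolation with simulated scores (the three val_bpb values below are invented for illustration; lower is better):

```python
results = []
best = float("inf")

# Simulated val_bpb scores from three trials
for tag, score in [("exp_1", 1.31), ("exp_2", 1.27), ("exp_3", 1.29)]:
    improved = score < best
    if improved:
        best = score  # a new best configuration would be preserved here
    results.append({"commit": tag, "val_bpb": score,
                    "status": "keep" if improved else "discard"})

print(best)                            # 1.27
print([r["status"] for r in results])  # ['keep', 'keep', 'discard']
```

Note that exp_3 is discarded even though it beats exp_1: each candidate competes against the best score seen so far, not against the baseline, so the search only ever moves downhill on val_bpb.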

In conclusion, we created a fully automated research workflow demonstrating that machines can iteratively explore model configurations and improve training performance with minimal manual intervention. We prepared the data, established a baseline, and implemented a loop that proposes new hyperparameter settings, runs experiments, and tracks results over multiple trials. By maintaining experiment logs and automatically preserving improvements, we built an extensible, reproducible research process that closely resembles modern machine-learning experimentation workflows. The approach shows how automation, experiment tracking, and lightweight infrastructure can be combined to enable rapid, scalable research inside a cloud notebook environment.

