In this tutorial, we implement a Colab-ready version of the AutoResearch framework originally proposed by Andrej Karpathy. We build an experimentation pipeline that clones AutoResearch, sets up a training environment, and runs a baseline experiment to establish initial performance metrics. We then create an automated loop that programmatically edits hyperparameters in train.py, runs training iterations, and evaluates each model with the validation bits-per-byte (val_bpb) metric. By running the workflow on Google Colab, we replicate the fundamental idea behind autonomous machine-learning research: iteratively modify training configurations, evaluate performance, and preserve the best configurations, all without specialized hardware.
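The selection metric throughout is validation bits-per-byte. The exact computation inside the repository may differ, but the standard definition rescales mean cross-entropy loss (in nats per token) to bits, then normalizes by the average number of bytes per token; a minimal sketch:

```python
import math

def val_bpb(ce_nats_per_token: float, bytes_per_token: float) -> float:
    """Convert mean cross-entropy (nats/token) to bits-per-byte.

    bits/token = nats/token / ln(2); dividing by bytes/token gives bits/byte.
    """
    return ce_nats_per_token / math.log(2) / bytes_per_token

# e.g. a CE of ~1.386 nats (~2 bits) per token at 2 bytes/token -> ~1.0 bpb
print(round(val_bpb(1.386, 2.0), 2))
```

Lower is better: a model at 1.0 bpb compresses its validation bytes twice as well as one at 2.0 bpb.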
import os, sys, subprocess, json, re, random, shutil, time
from pathlib import Path

def pip_install(pkg):
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", pkg])

for pkg in [
    "numpy", "pandas", "pyarrow", "requests",
    "rustbpe", "tiktoken", "openai"
]:
    try:
        __import__(pkg)
    except ImportError:
        pip_install(pkg)

import pandas as pd

if not Path("autoresearch").exists():
    subprocess.run(["git", "clone", "https://github.com/karpathy/autoresearch.git"])
os.chdir("autoresearch")

OPENAI_API_KEY = None
try:
    from google.colab import userdata
    OPENAI_API_KEY = userdata.get("OPENAI_API_KEY")
except Exception:
    OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
if OPENAI_API_KEY:
    os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
We first import the core Python libraries required for the automated research workflow, install all missing dependencies, and clone the autoresearch repository from GitHub so that the training framework is available in the environment. If an OpenAI key is available, we configure it so that the system can support LLM-assisted experimentation at a later stage.
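The install-on-demand pattern used above generalizes to any package; a minimal sketch (the `ensure` helper name is ours, not part of the repository):

```python
import importlib, subprocess, sys

def ensure(pkg, import_name=None):
    """Import a package, pip-installing it first if the import fails.

    `import_name` covers packages whose import name differs from the
    pip name (e.g. pip `Pillow` imports as `PIL`).
    """
    name = import_name or pkg
    try:
        return importlib.import_module(name)
    except ImportError:
        subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", pkg])
        return importlib.import_module(name)

json_mod = ensure("json")  # stdlib module: import succeeds, no install triggered
```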
prepare_path = Path("prepare.py")
train_path = Path("train.py")
program_path = Path("program.md")
prepare_text = prepare_path.read_text()
train_text = train_path.read_text()
prepare_text = re.sub(r"MAX_SEQ_LEN = \d+", "MAX_SEQ_LEN = 512", prepare_text)
prepare_text = re.sub(r"TIME_BUDGET = \d+", "TIME_BUDGET = 120", prepare_text)
prepare_text = re.sub(r"EVAL_TOKENS = .*", "EVAL_TOKENS = 4 * 65536", prepare_text)
train_text = re.sub(r"DEPTH = \d+", "DEPTH = 4", train_text)
train_text = re.sub(r"DEVICE_BATCH_SIZE = \d+", "DEVICE_BATCH_SIZE = 16", train_text)
train_text = re.sub(r"TOTAL_BATCH_SIZE = .*", "TOTAL_BATCH_SIZE = 2**17", train_text)
train_text = re.sub(r'WINDOW_PATTERN = "SSSL"', 'WINDOW_PATTERN = "L"', train_text)
prepare_path.write_text(prepare_text)
train_path.write_text(train_text)
program_path.write_text("""
Goal:
Run an autonomous research loop on Google Colab.
Rules:
Only hyperparameters in train.py may be modified.
Metric:
Lower val_bpb is better.
""")
subprocess.run(["python", "prepare.py", "--num-shards", "4", "--download-workers", "2"])
We then patch key parameters in the repository to make it compatible with Google Colab hardware: the context length, training budget, and evaluation token count are all reduced so the experiments can run on limited GPU resources. After applying these patches, we prepare the dataset shards for training, which allows the experiments to start immediately.
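The patching step relies on anchored regex substitutions over the script text; a standalone sketch of the same technique on a toy config shows why the raw-string `\d+` pattern matters (written without the backslash, the pattern silently fails to match):

```python
import re

# Toy config text standing in for train.py
config = "DEPTH = 8\nDEVICE_BATCH_SIZE = 32\n"

# Anchored, MULTILINE substitution: only the line starting with DEPTH changes
patched = re.sub(r"^DEPTH = \d+", "DEPTH = 4", config, flags=re.MULTILINE)
print(patched.splitlines()[0])  # -> DEPTH = 4
print(patched.splitlines()[1])  # -> DEVICE_BATCH_SIZE = 32 (untouched)
```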
subprocess.run("python train.py > baseline.log 2>&1", shell=True)

def parse_run_log(log_path):
    text = Path(log_path).read_text(errors="ignore")
    def find(p):
        m = re.search(p, text, re.MULTILINE)
        return float(m.group(1)) if m else None
    return {
        "val_bpb": find(r"^val_bpb:\s*([0-9.]+)"),
        "training_seconds": find(r"^training_seconds:\s*([0-9.]+)"),
        "peak_vram_mb": find(r"^peak_vram_mb:\s*([0-9.]+)"),
        "num_steps": find(r"^num_steps:\s*([0-9.]+)")
    }

baseline = parse_run_log("baseline.log")
results_path = Path("results.tsv")
rows = [{
    "commit": "baseline",
    "val_bpb": baseline["val_bpb"] if baseline["val_bpb"] else 0,
    "memory_gb": round((baseline["peak_vram_mb"] or 0) / 1024, 1),
    "status": "keep",
    "description": "baseline"
}]
pd.DataFrame(rows).to_csv(results_path, sep="\t", index=False)
print("Baseline:", baseline)
We execute the baseline run to establish an initial reference performance for the model, and implement a log-parsing helper that retrieves key metrics such as training time, GPU memory usage, and validation bits-per-byte. These baseline results are stored in a structured experiment table so that all subsequent experiments can be compared against this initial configuration.
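The parsing logic can be exercised without a real training run by feeding it a synthetic log string in the same `key: value` format (values below are illustrative, not real results):

```python
import re

def find_metric(text, key):
    """Pull `key: value` from a run log; returns None when the line is absent."""
    m = re.search(rf"^{key}:\s*([0-9.]+)", text, re.MULTILINE)
    return float(m.group(1)) if m else None

# Synthetic log in the format parse_run_log expects
log = "step 100\nval_bpb: 1.234\ntraining_seconds: 98.5\n"
assert find_metric(log, "val_bpb") == 1.234
assert find_metric(log, "peak_vram_mb") is None  # missing metric -> None
```

Returning None for missing metrics (rather than raising) lets the loop treat crashed or incomplete runs as failed candidates instead of aborting.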
TRAIN_FILE = Path("train.py")
BACKUP_FILE = Path("train.base.py")
if not BACKUP_FILE.exists():
    shutil.copy2(TRAIN_FILE, BACKUP_FILE)

HP_KEYS = [
    "WINDOW_PATTERN",
    "TOTAL_BATCH_SIZE",
    "EMBEDDING_LR",
    "UNEMBEDDING_LR",
    "MATRIX_LR",
    "SCALAR_LR",
    "WEIGHT_DECAY",
    "ADAM_BETAS",
    "WARMUP_RATIO",
    "WARMDOWN_RATIO",
    "FINAL_LR_FRAC",
    "DEPTH",
    "DEVICE_BATCH_SIZE"
]

def read_text(path):
    return Path(path).read_text()

def write_text(path, text):
    Path(path).write_text(text)

def extract_hparams(text):
    vals = {}
    for k in HP_KEYS:
        m = re.search(rf"^{k}\s*=\s*(.+?)$", text, re.MULTILINE)
        if m:
            vals[k] = m.group(1).strip()
    return vals

def set_hparam(text, key, value):
    return re.sub(rf"^{key}\s*=.*$", f"{key} = {value}", text, flags=re.MULTILINE)

base_text = read_text(BACKUP_FILE)
base_hparams = extract_hparams(base_text)

SEARCH_SPACE = {
    "WINDOW_PATTERN": ['"L"', '"SSSL"'],
    "TOTAL_BATCH_SIZE": ["2**16", "2**17", "2**18"],
    "EMBEDDING_LR": ["0.2", "0.4", "0.6"],
    "MATRIX_LR": ["0.01", "0.02", "0.04"],
    "SCALAR_LR": ["0.3", "0.5", "0.7"],
    "WEIGHT_DECAY": ["0.05", "0.1", "0.2"],
    "ADAM_BETAS": ["(0.8,0.95)", "(0.9,0.95)"],
    "WARMUP_RATIO": ["0.0", "0.05", "0.1"],
    "WARMDOWN_RATIO": ["0.3", "0.5", "0.7"],
    "FINAL_LR_FRAC": ["0.0", "0.05"],
    "DEPTH": ["3", "4", "5", "6"],
    "DEVICE_BATCH_SIZE": ["8", "12", "16", "24"]
}

def sample_candidate():
    keys = random.sample(list(SEARCH_SPACE.keys()), random.choice([2, 3, 4]))
    cand = dict(base_hparams)
    changes = {}
    for k in keys:
        cand[k] = random.choice(SEARCH_SPACE[k])
        changes[k] = cand[k]
    return cand, changes

def apply_hparams(candidate):
    text = read_text(BACKUP_FILE)
    for k, v in candidate.items():
        text = set_hparam(text, k, v)
    write_text(TRAIN_FILE, text)

def run_experiment(tag):
    log = f"{tag}.log"
    subprocess.run(f"python train.py > {log} 2>&1", shell=True)
    metrics = parse_run_log(log)
    metrics["log"] = log
    return metrics
We then build the core utilities that enable automated hyperparameter experiments: we extract the hyperparameters from train.py, define a searchable parameter space, and implement functions for editing the script programmatically. We also add mechanisms for generating candidate configurations, applying them to the training script, and running experiments with recorded outputs.
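The extract/set pair can be sanity-checked as a round trip on toy script text: setting a value and extracting it back should return exactly what was written (a sketch with a shortened key list):

```python
import re

HP_KEYS = ["DEPTH", "MATRIX_LR"]  # shortened for the sketch

def extract(text):
    """Read `KEY = value` assignments for every key in HP_KEYS."""
    vals = {}
    for k in HP_KEYS:
        m = re.search(rf"^{k}\s*=\s*(.+?)$", text, re.MULTILINE)
        if m:
            vals[k] = m.group(1).strip()
    return vals

def set_hp(text, key, value):
    """Rewrite the `KEY = ...` line in place, leaving other lines untouched."""
    return re.sub(rf"^{key}\s*=.*$", f"{key} = {value}", text, flags=re.MULTILINE)

script = "DEPTH = 4\nMATRIX_LR = 0.02\n"
script = set_hp(script, "DEPTH", "6")
assert extract(script) == {"DEPTH": "6", "MATRIX_LR": "0.02"}
```

Because values are kept as strings (e.g. `"2**17"`, `'(0.9,0.95)'`), they can be substituted back into the script verbatim without worrying about Python repr round-tripping.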
N_EXPERIMENTS = 3
df = pd.read_csv(results_path, sep="\t")
best = df["val_bpb"].replace(0, 999).min()
for i in range(N_EXPERIMENTS):
    tag = f"exp_{i+1}"
    candidate, changes = sample_candidate()
    apply_hparams(candidate)
    metrics = run_experiment(tag)
    val = metrics["val_bpb"] or 999
    status = "keep" if val < best else "revert"
    if status == "keep":
        best = val
        shutil.copy2(TRAIN_FILE, Path("train.best.py"))
    row = {"commit": tag, "val_bpb": val, "memory_gb": round((metrics["peak_vram_mb"] or 0) / 1024, 1), "status": status, "description": json.dumps(changes)}
    df = pd.concat([df, pd.DataFrame([row])], ignore_index=True)
    df.to_csv(results_path, sep="\t", index=False)
We then run an automated loop that proposes and evaluates new hyperparameter configurations. Each experiment modifies the training script; after training, the resulting validation score is compared against the current best configuration. All experiment results are logged, improved configurations are preserved, and the best script is exported along with the full experiment history for analysis.
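The keep/revert decision at the heart of the loop can be isolated and tested without any GPU time by simulating the per-trial metric (the values below are random stand-ins, not real training results):

```python
import random

random.seed(0)
best = 1.30  # pretend baseline val_bpb (illustrative)
history = []
for i in range(5):
    # Stand-in for run_experiment(): a fake val_bpb per trial
    val = round(random.uniform(1.20, 1.40), 3)
    status = "keep" if val < best else "revert"
    if status == "keep":
        best = val  # an improvement becomes the new bar to beat
    history.append((f"exp_{i+1}", val, status))

# `best` can only stay the same or decrease across trials
assert best <= 1.30
```

Because every candidate is applied on top of the pristine backup rather than the previous candidate, a "revert" costs nothing: the next trial simply re-patches from train.base.py.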
In conclusion, we created a fully automated research workflow that demonstrates how machines can iteratively explore different model configurations and improve training performance with minimal manual intervention. We prepared the data, established a baseline, and implemented a loop that proposes new hyperparameter settings, runs experiments, and tracks results over multiple trials. By maintaining experiment logs and automatically preserving improvements, we created an extensible and reproducible research process that closely resembles modern machine-learning experimentation workflows. The approach shows how automation, experiment tracking, and lightweight infrastructure can be combined to enable rapid, scalable research in a cloud notebook environment.

