In this tutorial, we build a multi-agent workflow for biological systems modeling and explore how different computational components work together within one unified systems biology pipeline. We generate synthetic biological data, analyze gene regulatory structure, predict protein-protein interactions, optimize metabolic pathway activity, and simulate a dynamic cell signaling cascade, all within a Colab environment that remains practical and reproducible. We also use an OpenAI model to act as a principal investigator, synthesizing the outputs of all specialized agents into a single expert-style biological interpretation that connects regulation, interaction networks, metabolism, and signaling into a broader scientific story.
import sys, subprocess, pkgutil

def _install_if_missing(packages):
    # Collect pip names for every package whose import name cannot be located.
    missing = []
    for p in packages:
        import_name = p["import"]
        if pkgutil.find_loader(import_name) is None:
            missing.append(p["pip"])
    if missing:
        print("Installing:", ", ".join(missing))
        subprocess.check_call([sys.executable, "-m", "pip", "install", "-q"] + missing)

_install_if_missing([
    {"pip": "openai", "import": "openai"},
    {"pip": "numpy", "import": "numpy"},
    {"pip": "pandas", "import": "pandas"},
    {"pip": "matplotlib", "import": "matplotlib"},
    {"pip": "networkx", "import": "networkx"},
    {"pip": "scikit-learn", "import": "sklearn"},
])

import os
import json
import math
import textwrap
import random
import getpass
from dataclasses import dataclass
from typing import Dict, List, Tuple, Any
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import networkx as nx
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, average_precision_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from openai import OpenAI

# Fix both random number generators so the synthetic data is reproducible.
np.random.seed(42)
random.seed(42)

# Prefer Colab Secrets; fall back to a hidden prompt, then to a plain prompt.
OPENAI_API_KEY = None
try:
    from google.colab import userdata
    OPENAI_API_KEY = userdata.get("OPENAI_API_KEY")
    if OPENAI_API_KEY:
        print("Loaded OPENAI_API_KEY from Colab Secrets.")
except Exception:
    pass

if not OPENAI_API_KEY:
    try:
        OPENAI_API_KEY = getpass.getpass("Enter OPENAI_API_KEY (hidden input): ").strip()
    except Exception:
        OPENAI_API_KEY = input("Enter OPENAI_API_KEY: ").strip()

os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
client = OpenAI(api_key=OPENAI_API_KEY)
OPENAI_MODEL = "gpt-4o-mini"
We prepare the Colab environment and make sure all required libraries are available before the workflow starts. We import the scientific computing, machine learning, graph analysis, plotting, and OpenAI libraries that support the full biological systems pipeline from start to finish. We also securely load the OpenAI API key either from Colab Secrets or hidden input, initialize the client, and define the model so the notebook is ready for later LLM-based synthesis.
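The availability check above relies on pkgutil.find_loader, which, to our knowledge, has been deprecated since Python 3.12. As a minimal sketch of an alternative, the helper below uses importlib.util.find_spec instead. The function name is_installed and the short package list are illustrative assumptions, not part of the original notebook.

```python
import importlib.util

def is_installed(import_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(import_name) is not None

# Hypothetical package list, using the same pip-name / import-name split as the listing.
packages = [
    {"pip": "scikit-learn", "import": "sklearn"},
    {"pip": "networkx", "import": "networkx"},
]

# Pip names of everything that could not be found in the current environment.
missing = [p["pip"] for p in packages if not is_installed(p["import"])]
print("Missing packages:", missing)
```

The resulting list can be passed to the same pip subprocess call used in the installer, so the rest of the setup cell would stay unchanged.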
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def pretty(title: str, body: str, width: int = 100):
    # Print a titled block framed by separator lines.
    print("\n" + "=" * width)
    print(title)
    print("=" * width)
    print(body)

def safe_float(x):
    try:
        return float(x)
    except Exception:
        return None

def generate_gene_regulatory_network(n_genes: int = 14, edge_prob: float = 0.18):
    # W[i, j] is the signed regulatory weight of gene i on gene j.
    genes = [f"G{i+1}" for i in range(n_genes)]
    W = np.zeros((n_genes, n_genes))
    for i in range(n_genes):
        for j in range(n_genes):
            if i != j and np.random.rand() < edge_prob:
                W[i, j] = np.random.uniform(-1.5, 1.5)
    return genes, W

def simulate_gene_expression(W: np.ndarray, n_steps: int = 70, noise: float = 0.10):
    # Discrete-time dynamics: each step squashes the regulatory input plus noise.
    n = W.shape[0]
    X = np.zeros((n_steps, n))
    X[0] = np.random.uniform(0.2, 0.8, size=n)
    for t in range(1, n_steps):
        signal = X[t-1] @ W
        X[t] = sigmoid(signal + np.random.normal(0, noise, size=n))
    return X

def generate_protein_features(n_proteins: int = 40, feature_dim: int = 10):
    proteins = [f"P{i+1}" for i in range(n_proteins)]
    features = np.random.normal(size=(n_proteins, feature_dim))
    families = np.random.randint(0, 5, size=n_proteins)
    localization = np.random.randint(0, 4, size=n_proteins)
    return proteins, features, families, localization

def generate_ppi_dataset(proteins, features, families, localization):
    # Build one row per unordered protein pair: (name_a, name_b, pair_features, label).
    rows = []
    n = len(proteins)
    hidden_w = np.random.normal(size=features.shape[1])
    for i in range(n):
        for j in range(i + 1, n):
            fi, fj = features[i], features[j]
            sim = np.dot(fi, fj) / (np.linalg.norm(fi) * np.linalg.norm(fj) + 1e-8)
            fam_same = 1 if families[i] == families[j] else 0
            loc_same = 1 if localization[i] == localization[j] else 0
            feat = np.concatenate([
                np.abs(fi - fj),
                fi * fj,
                [sim, fam_same, loc_same]
            ])
            score = 1.4 * sim + 1.0 * fam_same + 0.8 * loc_same + 0.15 * np.dot((fi + fj) / 2, hidden_w)
            prob = sigmoid(score)
            y = 1 if np.random.rand() < prob else 0
            rows.append((proteins[i], proteins[j], feat, y))
    return rows

def generate_metabolic_network():
    metabolites = ["Glucose", "Pyruvate", "AcetylCoA", "ATP", "Biomass", "Lactate", "Ethanol"]
    reactions = [
        {"name": "R1_Glucose_Uptake", "yield_biomass": 0.0, "yield_atp": 0.3, "substrate_cost": 1.0, "oxygen_need": 0.0},
        {"name": "R2_Glycolysis", "yield_biomass": 0.2, "yield_atp": 1.6, "substrate_cost": 0.7, "oxygen_need": 0.0},
        {"name": "R3_TCA", "yield_biomass": 1.0, "yield_atp": 2.4, "substrate_cost": 0.8, "oxygen_need": 1.4},
        {"name": "R4_Fermentation", "yield_biomass": 0.1, "yield_atp": 0.9, "substrate_cost": 0.4, "oxygen_need": 0.0},
        {"name": "R5_Ethanol_Path", "yield_biomass": 0.15, "yield_atp": 0.8, "substrate_cost": 0.5, "oxygen_need": 0.0},
        {"name": "R6_Biomass_Assembly", "yield_biomass": 1.3, "yield_atp": -0.9, "substrate_cost": 0.6, "oxygen_need": 0.2},
    ]
    return metabolites, reactions

def simulate_cell_signaling(T=200, dt=0.05, ligand_level=1.2):
    # Forward-Euler integration of a ligand -> receptor -> kinase -> TF cascade
    # with phosphatase-mediated negative feedback on the kinase.
    t = np.arange(0, T * dt, dt)
    ligand = np.ones_like(t) * ligand_level
    receptor = np.zeros_like(t)
    kinase = np.zeros_like(t)
    tf = np.zeros_like(t)
    phosphatase = np.zeros_like(t)
    receptor[0] = 0.05
    kinase[0] = 0.02
    tf[0] = 0.01
    phosphatase[0] = 0.30
    for i in range(1, len(t)):
        dR = 1.6 * ligand[i-1] * (1 - receptor[i-1]) - 0.9 * receptor[i-1]
        dK = 1.8 * receptor[i-1] * (1 - kinase[i-1]) - 1.1 * phosphatase[i-1] * kinase[i-1]
        dTF = 1.4 * kinase[i-1] * (1 - tf[i-1]) - 0.55 * tf[i-1]
        dP = 0.2 + 0.5 * tf[i-1] - 0.4 * phosphatase[i-1]
        receptor[i] = np.clip(receptor[i-1] + dt * dR, 0, 1)
        kinase[i] = np.clip(kinase[i-1] + dt * dK, 0, 1)
        tf[i] = np.clip(tf[i-1] + dt * dTF, 0, 1)
        phosphatase[i] = np.clip(phosphatase[i-1] + dt * dP, 0, 1.5)
    return pd.DataFrame({
        "time": t,
        "ligand": ligand,
        "receptor_active": receptor,
        "kinase_active": kinase,
        "tf_active": tf,
        "phosphatase": phosphatase,
    })
We define the core helper utilities and all synthetic data generation functions that power the notebook's biological tasks. We create functions for gene regulatory network construction, gene expression simulation, protein feature generation, protein interaction dataset creation, metabolic network setup, and cell signaling dynamics, which together provide four distinct biological perspectives for analysis. This snippet forms the computational backbone of the tutorial by creating the structured inputs that each specialized agent will later process and interpret.
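To make the expression update rule concrete, here is a minimal sketch of the same recurrence on a hand-written 3-gene network. The weight matrix, step count, noise level, and the use of a local random generator are all illustrative assumptions rather than values from the notebook.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Local generator for this sketch; the tutorial itself seeds the global np.random state.
rng = np.random.default_rng(0)

# Hypothetical 3-gene network: W[i, j] is the effect of gene i on gene j.
W = np.array([
    [0.0, 1.2, 0.0],   # G1 activates G2
    [0.0, 0.0, -1.0],  # G2 represses G3
    [0.0, 0.0, 0.0],   # G3 regulates nothing
])

n_steps, noise = 50, 0.05
X = np.zeros((n_steps, 3))
X[0] = rng.uniform(0.2, 0.8, size=3)
for t in range(1, n_steps):
    # Row vector times W sums each gene's incoming regulatory input.
    signal = X[t - 1] @ W
    X[t] = sigmoid(signal + rng.normal(0, noise, size=3))

print(X[-5:].round(3))
```

Because G1 receives no regulatory input in this toy matrix, its expression hovers around sigmoid(0) = 0.5, while G2 and G3 settle at levels set by their upstream regulators.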
@dataclass
class AgentResult:
    name: str
    summary: Dict[str, Any]

class GeneRegulatoryNetworkAgent:
    def run(self, genes, W, X) -> AgentResult:
        # Gene-by-gene correlation of the simulated expression trajectories.
        corr = np.corrcoef(X.T)
        inferred_edges = []
        true_edges = []
        n = len(genes)
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                if abs(corr[i, j]) > 0.35:
                    inferred_edges.append((genes[i], genes[j], float(corr[i, j])))
                if abs(W[i, j]) > 1e-8:
                    true_edges.append((genes[i], genes[j], float(W[i, j])))

        # Directed graph of the ground-truth regulatory edges.
        centrality_graph = nx.DiGraph()
        for gi in genes:
            centrality_graph.add_node(gi)
        for i in range(n):
            for j in range(n):
                if abs(W[i, j]) > 1e-8:
                    centrality_graph.add_edge(genes[i], genes[j], weight=float(W[i, j]))

        out_deg = dict(centrality_graph.out_degree())
        in_deg = dict(centrality_graph.in_degree())
        hubs = sorted(out_deg.items(), key=lambda x: x[1], reverse=True)[:5]
        sinks = sorted(in_deg.items(), key=lambda x: x[1], reverse=True)[:5]
        dynamic_var = X.var(axis=0)
        most_dynamic = sorted(zip(genes, dynamic_var), key=lambda x: x[1], reverse=True)[:5]

        summary = {
            "num_genes": n,
            "num_true_regulatory_edges": len(true_edges),
            "num_inferred_associations": len(inferred_edges),
            "top_hub_genes": [{"gene": g, "out_degree": int(d)} for g, d in hubs],
            "top_sink_genes": [{"gene": g, "in_degree": int(d)} for g, d in sinks],
            "most_dynamic_genes": [{"gene": g, "variance": round(float(v), 4)} for g, v in most_dynamic],
            "sample_inferred_edges": [
                {"source": a, "target": b, "association": round(c, 3)}
                for a, b, c in inferred_edges[:10]
            ],
            "expression_tail_mean": round(float(X[-10:].mean()), 4),
        }
        return AgentResult(name="GeneRegulatoryNetworkAgent", summary=summary)
We define the shared result container and the gene regulatory network agent that analyzes regulatory structure and expression behavior. We use correlation-based association inference, true-edge extraction, degree-based graph analysis, and variance-based ranking to identify hub, sink, and highly dynamic genes across simulated expression trajectories. This gives us a network-level picture of how regulatory influence may be distributed in the system and helps us identify important candidate regulators for downstream interpretation.
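The correlation-thresholding step can be illustrated on a tiny example. The sketch below builds a hypothetical 3-gene expression matrix in which G2 tracks G1 and G3 is independent noise, then applies the same 0.35 cutoff the agent uses. The data and variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical expression matrix: 200 time points x 3 genes.
# G2 is constructed to track G1; G3 is independent noise.
g1 = rng.normal(size=200)
g2 = 0.9 * g1 + 0.1 * rng.normal(size=200)
g3 = rng.normal(size=200)
X = np.column_stack([g1, g2, g3])
genes = ["G1", "G2", "G3"]

corr = np.corrcoef(X.T)  # rows of X.T are genes, so corr is gene x gene
threshold = 0.35         # same cutoff as the agent

# Keep every ordered pair whose absolute correlation exceeds the cutoff.
edges = [
    (genes[i], genes[j], round(float(corr[i, j]), 3))
    for i in range(len(genes))
    for j in range(len(genes))
    if i != j and abs(corr[i, j]) > threshold
]
print(edges)
```

Since the correlation matrix is symmetric, every association appears in both directions. That is why the agent reports these as inferred associations and takes edge direction only from the ground-truth weight matrix W.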
class ProteinInteractionPredictionAgent:
    def run(self, ppi_rows) -> AgentResult:
        X = np.vstack([r[2] for r in ppi_rows])
        y = np.array([r[3] for r in ppi_rows])

        # Hold out a test split. The split call is missing from the listing, so
        # test_size, random_state, and stratify below are assumed values.
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.25, random_state=42, stratify=y
        )

        # Fit the scaler on training data only, then apply it to the test data.
        scaler = StandardScaler()
        X_train_s = scaler.fit_transform(X_train)
        X_test_s = scaler.transform(X_test)

        clf = LogisticRegression(max_iter=1000)
        clf.fit(X_train_s, y_train)
        probs = clf.predict_proba(X_test_s)[:, 1]
        auc = roc_auc_score(y_test, probs)
        ap = average_precision_score(y_test, probs)

        # Score every pair (train and test) to rank candidate interactions.
        scored_pairs = []
        Xt_full = scaler.transform(X)
        full_probs = clf.predict_proba(Xt_full)[:, 1]
        for (p1, p2, _, label), pr in zip(ppi_rows, full_probs):
            scored_pairs.append((p1, p2, float(pr), int(label)))
        top_candidates = sorted(scored_pairs, key=lambda x: x[2], reverse=True)[:10]
        positive_rate = float(y.mean())

        summary = {
            "num_pairs": int(len(ppi_rows)),
            "positive_interaction_rate": round(positive_rate, 4),
            "test_roc_auc": round(float(auc), 4),
            "test_average_precision": round(float(ap), 4),
            "top_predicted_interactions": [
                {"protein_a": a, "protein_b": b, "pred_prob": round(pr, 4), "label": lab}
                for a, b, pr, lab in top_candidates
            ],
        }
        return AgentResult(name="ProteinInteractionPredictionAgent", summary=summary)

class MetabolicOptimizationAgent:
    def run(self, reactions, oxygen_budget=3.5, substrate_budget=4.0):
        best_score = -1e9
        best_flux = None
        trace = []
        for _ in range(8000):
            # Random flux split across reactions, scaled to a random total flux.
            flux = np.random.dirichlet(np.ones(len(reactions))) * np.random.uniform(1.5, 5.0)
            oxygen = sum(f["oxygen_need"] * v for f, v in zip(reactions, flux))
            substrate = sum(f["substrate_cost"] * v for f, v in zip(reactions, flux))
            atp = sum(f["yield_atp"] * v for f, v in zip(reactions, flux))
            biomass = sum(f["yield_biomass"] * v for f, v in zip(reactions, flux))

            # Soft constraints: penalize any use beyond the budgets.
            penalty = 0.0
            if oxygen > oxygen_budget:
                penalty += 6.0 * (oxygen - oxygen_budget)
            if substrate > substrate_budget:
                penalty += 6.0 * (substrate - substrate_budget)

            score = 2.2 * biomass + 0.6 * atp - penalty
            trace.append(score)
            if score > best_score:
                best_score = score
                best_flux = {
                    "oxygen": oxygen,
                    "substrate": substrate,
                    "atp": atp,
                    "biomass": biomass,
                    "fluxes": {reactions[i]["name"]: float(flux[i]) for i in range(len(reactions))}
                }

        ranked_fluxes = sorted(best_flux["fluxes"].items(), key=lambda x: x[1], reverse=True)
        summary = {
            "oxygen_budget": oxygen_budget,
            "substrate_budget": substrate_budget,
            "best_objective_score": round(float(best_score), 4),
            "best_biomass": round(float(best_flux["biomass"]), 4),
            "best_atp": round(float(best_flux["atp"]), 4),
            "oxygen_used": round(float(best_flux["oxygen"]), 4),
            "substrate_used": round(float(best_flux["substrate"]), 4),
            "dominant_reactions": [
                {"reaction": name, "flux": round(val, 4)} for name, val in ranked_fluxes[:6]
            ],
        }
        return AgentResult(name="MetabolicOptimizationAgent", summary=summary), trace
We define the protein interaction prediction agent and the metabolic optimization agent, which together extend the analysis beyond regulation into interaction biology and pathway allocation. We train a logistic regression classifier on synthetic pairwise protein features to estimate interaction probabilities, evaluate predictive performance, and rank the strongest candidate protein pairs. We also run a randomized flux search under oxygen and substrate constraints to identify metabolically favorable reaction allocations, allowing us to study how the system balances biomass growth, ATP production, and resource limitations.
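The randomized flux search can be read as a penalized objective evaluated on random samples. The sketch below isolates that objective in a hypothetical helper, penalized_score, and runs it on two made-up reactions. The reaction values, budgets, and sample count are illustrative assumptions; only the objective weights (2.2, 0.6) and the penalty weight (6.0) are taken from the agent.

```python
import numpy as np

def penalized_score(flux, reactions, oxygen_budget, substrate_budget, penalty_weight=6.0):
    """Objective used by the search: reward biomass and ATP, penalize budget overruns."""
    oxygen = sum(r["oxygen_need"] * v for r, v in zip(reactions, flux))
    substrate = sum(r["substrate_cost"] * v for r, v in zip(reactions, flux))
    atp = sum(r["yield_atp"] * v for r, v in zip(reactions, flux))
    biomass = sum(r["yield_biomass"] * v for r, v in zip(reactions, flux))
    overrun = max(0.0, oxygen - oxygen_budget) + max(0.0, substrate - substrate_budget)
    return 2.2 * biomass + 0.6 * atp - penalty_weight * overrun

# Two hypothetical reactions, not taken from the tutorial's network.
reactions = [
    {"name": "A", "yield_biomass": 1.0, "yield_atp": 2.0, "substrate_cost": 1.0, "oxygen_need": 1.0},
    {"name": "B", "yield_biomass": 0.1, "yield_atp": 1.0, "substrate_cost": 0.5, "oxygen_need": 0.0},
]

rng = np.random.default_rng(0)
best_score, best_flux = -np.inf, None
for _ in range(2000):
    # Dirichlet gives a random split across reactions; the uniform factor sets total flux.
    flux = rng.dirichlet(np.ones(len(reactions))) * rng.uniform(1.5, 5.0)
    s = penalized_score(flux, reactions, oxygen_budget=2.0, substrate_budget=3.0)
    if s > best_score:
        best_score, best_flux = s, flux

print("best score:", round(float(best_score), 3), "best flux:", best_flux.round(3))
```

Because the budgets enter as penalties rather than hard constraints, a sampled flux can slightly exceed a budget and still win if the extra yield outweighs the penalty. A linear programming solver would be the more standard choice for an objective of this form; random search is simply what the agent uses.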
class CellSignalingSimulationAgent:
    def run(self, df_signal: pd.DataFrame) -> AgentResult:
        peak_receptor = float(df_signal["receptor_active"].max())
        peak_kinase = float(df_signal["kinase_active"].max())
        peak_tf = float(df_signal["tf_active"].max())
        # Time at which each species first reaches its maximum activity.
        t_receptor = float(df_signal.loc[df_signal["receptor_active"].idxmax(), "time"])
        t_kinase = float(df_signal.loc[df_signal["kinase_active"].idxmax(), "time"])
        t_tf = float(df_signal.loc[df_signal["tf_active"].idxmax(), "time"])
        final_state = df_signal.iloc[-1].to_dict()
        summary = {
            "peak_receptor_activity": round(peak_receptor, 4),
            "peak_kinase_activity": round(peak_kinase, 4),
            "peak_tf_activity": round(peak_tf, 4),
            "time_to_peak_receptor": round(t_receptor, 4),
            "time_to_peak_kinase": round(t_kinase, 4),
            "time_to_peak_tf": round(t_tf, 4),
            "final_state": {k: round(float(v), 4) for k, v in final_state.items()},
        }
        return AgentResult(name="CellSignalingSimulationAgent", summary=summary)

class PrincipalInvestigatorAgent:
    def __init__(self, client, model=OPENAI_MODEL):
        self.client = client
        self.model = model

    def synthesize(self, results: List[AgentResult]) -> str:
        # Pass every agent's summary to the LLM as one JSON payload.
        payload = {r.name: r.summary for r in results}
        prompt = f"""
You are a principal investigator in computational systems biology.
Given the outputs of four specialized AI agents:
1. gene regulatory network analysis
2. protein interaction prediction
3. metabolic pathway optimization
4. cell signaling simulation
Write a rigorous but readable report with these sections:
- Executive Summary
- Key Findings by Agent
- Cross-System Biological Interpretation
- Hypotheses Worth Testing in Wet Lab
- Model Limitations
- Next Computational Extensions
Use concise scientific language.
Do not fabricate datasets beyond what is shown.
When useful, connect regulation, signaling, metabolism, and protein interactions into a single systems biology story.
Agent outputs:
{json.dumps(payload, indent=2)}
"""
        try:
            resp = self.client.chat.completions.create(
                model=self.model,
                messages=[
                    {"role": "user", "content": prompt},
                ],
                temperature=0.4,
            )
            return resp.choices[0].message.content
        except Exception as e:
            return f"OpenAI synthesis failed: {e}"

# Run the four biological modules end to end.
genes, W = generate_gene_regulatory_network(n_genes=14, edge_prob=0.20)
X_expr = simulate_gene_expression(W, n_steps=80, noise=0.08)
grn_agent = GeneRegulatoryNetworkAgent()
grn_result = grn_agent.run(genes, W, X_expr)

proteins, prot_features, prot_families, prot_localization = generate_protein_features(n_proteins=40, feature_dim=10)
ppi_rows = generate_ppi_dataset(proteins, prot_features, prot_families, prot_localization)
ppi_agent = ProteinInteractionPredictionAgent()
ppi_result = ppi_agent.run(ppi_rows)

metabolites, reactions = generate_metabolic_network()
met_agent = MetabolicOptimizationAgent()
met_result, met_trace = met_agent.run(reactions, oxygen_budget=3.5, substrate_budget=4.2)

df_signal = simulate_cell_signaling(T=220, dt=0.05, ligand_level=1.2)
sig_agent = CellSignalingSimulationAgent()
sig_result = sig_agent.run(df_signal)

all_results = [grn_result, ppi_result, met_result, sig_result]
for r in all_results:
    pretty(r.name, json.dumps(r.summary, indent=2))

# Four-panel overview: weight matrix, expression, signaling, metabolic search.
fig = plt.figure(figsize=(18, 14))
ax1 = plt.subplot(2, 2, 1)
im = ax1.imshow(W, cmap="coolwarm", aspect="auto")
ax1.set_title("Gene Regulatory Weight Matrix")
ax1.set_xticks(range(len(genes)))
ax1.set_yticks(range(len(genes)))
ax1.set_xticklabels(genes, rotation=90)
ax1.set_yticklabels(genes)
plt.colorbar(im, ax=ax1, fraction=0.046, pad=0.04)

ax2 = plt.subplot(2, 2, 2)
for i in range(min(6, X_expr.shape[1])):
    ax2.plot(X_expr[:, i], label=genes[i])
ax2.set_title("Sample Gene Expression Dynamics")
ax2.set_xlabel("Time step")
ax2.set_ylabel("Expression")
ax2.legend(loc="upper right", fontsize=8)

ax3 = plt.subplot(2, 2, 3)
ax3.plot(df_signal["time"], df_signal["receptor_active"], label="Receptor")
ax3.plot(df_signal["time"], df_signal["kinase_active"], label="Kinase")
ax3.plot(df_signal["time"], df_signal["tf_active"], label="Transcription Factor")
ax3.plot(df_signal["time"], df_signal["phosphatase"], label="Phosphatase")
ax3.set_title("Cell Signaling Simulation")
ax3.set_xlabel("Time")
ax3.set_ylabel("Activity")
ax3.legend()

ax4 = plt.subplot(2, 2, 4)
ax4.plot(met_trace)
ax4.set_title("Metabolic Search Objective Trace")
ax4.set_xlabel("Iteration")
ax4.set_ylabel("Objective score")
plt.tight_layout()
plt.show()

# Gene regulatory graph, keeping only edges with |weight| > 0.4.
G_grn = nx.DiGraph()
for g in genes:
    G_grn.add_node(g)
for i in range(len(genes)):
    for j in range(len(genes)):
        if abs(W[i, j]) > 0.4:
            G_grn.add_edge(genes[i], genes[j], weight=W[i, j])

plt.figure(figsize=(10, 8))
pos = nx.spring_layout(G_grn, seed=42)
edge_colors = ["green" if G_grn[u][v]["weight"] > 0 else "red" for u, v in G_grn.edges()]
nx.draw_networkx(G_grn, pos, with_labels=True, node_size=900, font_size=9, arrows=True, edge_color=edge_colors)
plt.title("Gene Regulatory Network Graph (green=activation, red=repression)")
plt.axis("off")
plt.show()

# Subnetwork of the highest-probability predicted protein interactions.
top_ppi = ppi_result.summary["top_predicted_interactions"][:12]
G_ppi = nx.Graph()
for row in top_ppi:
    a, b, p = row["protein_a"], row["protein_b"], row["pred_prob"]
    G_ppi.add_edge(a, b, weight=p)

plt.figure(figsize=(10, 8))
pos = nx.spring_layout(G_ppi, seed=7)
widths = [2 + 4 * G_ppi[u][v]["weight"] for u, v in G_ppi.edges()]
nx.draw_networkx(G_ppi, pos, with_labels=True, node_size=1000, font_size=9, width=widths)
plt.title("Top Predicted Protein Interaction Subnetwork")
plt.axis("off")
plt.show()

# Tidy summary tables for the printed report.
grn_table = pd.DataFrame(grn_result.summary["most_dynamic_genes"])
ppi_table = pd.DataFrame(ppi_result.summary["top_predicted_interactions"])
met_table = pd.DataFrame(met_result.summary["dominant_reactions"])
sig_table = pd.DataFrame([sig_result.summary])
pretty("Most Dynamic Genes", grn_table.to_string(index=False))
pretty("Top Predicted PPIs", ppi_table.to_string(index=False))
pretty("Dominant Metabolic Reactions", met_table.to_string(index=False))

# LLM synthesis across all four agents.
pi_agent = PrincipalInvestigatorAgent(client=client, model=OPENAI_MODEL)
final_report = pi_agent.synthesize(all_results)
pretty("OPENAI SYSTEMS BIOLOGY REPORT", final_report)

# Persist every summary and the report as one JSON artifact.
artifact = {
    "grn": grn_result.summary,
    "ppi": ppi_result.summary,
    "metabolic": met_result.summary,
    "signaling": sig_result.summary,
    "llm_report": final_report,
}
with open("bio_agents_tutorial_results.json", "w") as f:
    json.dump(artifact, f, indent=2)
print("\nSaved results to: bio_agents_tutorial_results.json")
We define the cell signaling simulation agent and the principal investigator agent, and then execute the complete end-to-end workflow. We run all four biological modules, print structured outputs, generate plots and network visualizations, build tidy summary tables, and finally use the OpenAI model to write an expert-style report that integrates the findings across all subsystems. We bring everything together into a complete pipeline for biological systems modeling. It shows how multi-agent AI can support scientific interpretation, visualization, and hypothesis generation.
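The hand-off from the specialized agents to the principal investigator agent amounts to serializing each agent's summary into one JSON payload keyed by agent name. The sketch below shows that step in isolation, without calling the OpenAI API. The helper build_payload and the placeholder summary values are assumptions made for illustration.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List

@dataclass
class AgentResult:
    name: str
    summary: Dict[str, Any]

def build_payload(results: List[AgentResult]) -> str:
    """Serialize agent summaries into one JSON string keyed by agent name."""
    payload = {r.name: r.summary for r in results}
    return json.dumps(payload, indent=2)

# Hypothetical summaries with placeholder numbers, not real pipeline output.
results = [
    AgentResult("GeneRegulatoryNetworkAgent", {"num_genes": 3}),
    AgentResult("CellSignalingSimulationAgent", {"peak_tf_activity": 0.5}),
]
text = build_payload(results)
print(text)
```

Keeping the summaries as plain dictionaries of built-in types is what makes this step work: json.dumps would fail on raw NumPy scalars or arrays, which may be why the agents wrap their values in float, int, and round before returning them.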
In conclusion, we created a complete computational biology workflow that demonstrates how agent-based AI can be used to study multiple layers of biological organization in a structured and interpretable way. We moved from data generation to modeling, optimization, simulation, visualization, and final scientific synthesis, which helps us see how specialized agents can collaborate to provide richer biological insight than any single isolated analysis. In the end, we have a strong foundation for extending this notebook toward more realistic omics datasets, experimental priors, mechanistic constraints, and deeper biological hypothesis generation for advanced systems biology research.
The post Build a Multi-Agent AI Workflow for Biological Network Modeling, Protein Interactions, Metabolism, and Cell Signaling Simulation appeared first on MarkTechPost.

