This tutorial builds a compact framework that demonstrates how tool documentation can be converted into standardized, callable interfaces. We register these tools in a central registry and run them as part of an automated pipeline. Step by step, we build a simple converter, define mock bioinformatics tools, organize them into a registry, and benchmark both individual tool calls and multi-step pipeline executions. Along the way, we explore how structured tool interfaces and automation can streamline data workflows.
import re, json, time, random
from dataclasses import dataclass
from typing import Callable, Dict, Any, List, Tuple

@dataclass
class ToolSpec:
    name: str
    description: str
    inputs: Dict[str, str]
    outputs: Dict[str, str]

def parse_doc_to_spec(name: str, doc: str) -> ToolSpec:
    # The first non-empty line is the description; fall back to the tool name.
    desc = doc.strip().splitlines()[0].strip() if doc.strip() else name
    # Keep only lines that look like argument declarations.
    arg_block = "\n".join([l for l in doc.splitlines() if "--" in l or ":" in l])
    inputs = {}
    for line in arg_block.splitlines():
        # Each match is a (name, type) pair; the type may be empty.
        m = re.findall(r"(--?\w[\w-]*|\b\w+\b)\s*[:=]?\s*(\w+)?", line)
        for key, typ in m:
            k = key.lstrip("-")
            if k and k not in inputs and k not in ["Returns", "Output", "Outputs"]:
                inputs[k] = (typ or "str")
    if not inputs: inputs = {"in": "str"}
    return ToolSpec(name=name, description=desc, inputs=inputs, outputs={"out": "json"})
To begin, we define the structure of our tools and write a parser that converts plain documentation into a standardized tool specification. The parser extracts input parameter names and types automatically from the text, while the output is recorded as a fixed JSON placeholder.
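To make the extraction step concrete, here is a minimal sketch of pulling `--flag: type` pairs out of a doc string. The helper name `extract_flags` and the toy doc are hypothetical and not part of the tutorial's code. The regular expression is also a simplified variant that only looks at `--`-prefixed flags.

```python
import re
from typing import Dict

def extract_flags(doc: str) -> Dict[str, str]:
    """Return {flag_name: type_name} for every `--flag: type` pair in doc.

    Hypothetical helper for illustration; flags with no type default to "str".
    """
    flags: Dict[str, str] = {}
    pattern = r"--([A-Za-z_][\w-]*)\s*[:=]?\s*(\w+)?"
    for name, typ in re.findall(pattern, doc):
        # setdefault keeps the first declaration if a flag is repeated.
        flags.setdefault(name, typ or "str")
    return flags

toy_doc = """Toy aligner
--ref: str --reads: str --threads: int --verbose"""

flags = extract_flags(toy_doc)
print(flags)
```

Because the type group is optional, a bare flag such as `--verbose` still yields an entry. A stricter parser might instead reject flags that carry no type.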
def tool_fastqc(seq_fasta: str, min_len: int = 30) -> Dict[str, Any]:
    # Split on FASTA header lines; the chunk before the first header is dropped.
    seqs = [s for s in re.split(r">[^\n]*\n", seq_fasta)[1:]]
    lens = [len(re.sub(r"\s+", "", s)) for s in seqs]
    # Despite its name, "pct_q30" is the fraction of sequences with length >= min_len.
    q30 = sum(l >= min_len for l in lens) / max(1, len(lens))
    gc = sum(c in "GCgc" for s in seqs for c in s) / max(1, sum(lens))
    return {"n_seqs": len(lens), "len_mean": (sum(lens) / max(1, len(lens))), "pct_q30": q30, "gc": gc}

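def tool_bowtie2_like(ref: str, reads: str, mode: str = "end-to-end") -> Dict[str, Any]:
    def revcomp(s):
        t = str.maketrans("ACGTacgt", "TGCAtgca"); return s.translate(t)[::-1]
    reads_list = [r for r in re.split(r">[^\n]*\n", reads)[1:]]
    ref_seq = "".join(ref.splitlines()[1:])
    hits = []
    for i, r in enumerate(reads_list):
        rseq = "".join(r.split())
        # A read "aligns" if it, or its reverse complement, is an exact substring of the reference.
        aligned = (rseq in ref_seq) or (revcomp(rseq) in ref_seq)
        # pos is -1 when only the reverse complement matched or the read did not align.
        hits.append({"read_id": i, "aligned": bool(aligned), "pos": ref_seq.find(rseq)})
    # mode is echoed in the result but does not change the matching logic.
    return {"n": len(hits), "aligned": sum(h["aligned"] for h in hits), "mode": mode, "hits": hits}
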
def tool_bcftools_like(ref: str, alt: str, win: int = 15) -> Dict[str, Any]:
    ref_seq = "".join(ref.splitlines()[1:]); alt_seq = "".join(alt.splitlines()[1:])
    n = min(len(ref_seq), len(alt_seq)); variants = []
    for i in range(n):
        # Report every position where the two sequences differ.
        if ref_seq[i] != alt_seq[i]: variants.append({"pos": i, "ref": ref_seq[i], "alt": alt_seq[i]})
    # Only the first `win` variants are returned; n_var counts all of them.
    return {"n_sites": n, "n_var": len(variants), "variants": variants[:win]}

FASTQC_DOC = """FastQC-like quality control for FASTA
--seq_fasta: str --min_len: int Outputs: json"""
BOWTIE_DOC = """Bowtie2-like aligner
--ref: str --reads: str --mode: str Outputs: json"""
BCF_DOC = """bcftools-like variant caller
--ref: str --alt: str --win: int Outputs: json"""
We define mock implementations of bioinformatics tools modeled on FastQC, Bowtie2, and Bcftools. Each one declares its inputs, outputs, and expected behavior so that all of them can be called consistently through a common interface.
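As a small illustration of what such a mock tool computes, the sketch below splits FASTA text into sequences and derives a few summary statistics. The names `split_fasta` and `toy_stats` and the two-record input are assumptions made for this example. It mirrors the style of the mock tools and does not reproduce any real FastQC behavior.

```python
import re
from typing import Any, Dict, List

def split_fasta(fasta: str) -> List[str]:
    """Split FASTA text into sequences, dropping headers and whitespace."""
    records = re.split(r">[^\n]*\n", fasta)[1:]
    return [re.sub(r"\s+", "", r) for r in records]

def toy_stats(fasta: str, min_len: int = 30) -> Dict[str, Any]:
    """Hypothetical mock QC tool: counts, length filter, and GC fraction."""
    seqs = split_fasta(fasta)
    total = sum(len(s) for s in seqs)
    gc = sum(c in "GCgc" for s in seqs for c in s)
    return {
        "n_seqs": len(seqs),
        # Fraction of sequences at least min_len long.
        "frac_long": sum(len(s) >= min_len for s in seqs) / max(1, len(seqs)),
        "gc": gc / max(1, total),
    }

# Two records: "GGCC" and a sequence wrapped over two lines.
toy_fasta = ">a\nGGCC\n>b\nATAT\nATAT\n"
stats = toy_stats(toy_fasta, min_len=5)
print(stats)
```

Returning a plain dictionary with a predictable set of keys is what lets a registry treat every tool the same way, whatever the tool does internally.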
@dataclass
class MCPTool:
    spec: ToolSpec
    fn: Callable[..., Dict[str, Any]]

class MCPServer:
    def __init__(self): self.tools: Dict[str, MCPTool] = {}
    def register(self, name: str, doc: str, fn: Callable[..., Dict[str, Any]]):
        spec = parse_doc_to_spec(name, doc); self.tools[name] = MCPTool(spec, fn)
    def list_tools(self) -> List[Dict[str, Any]]:
        return [dict(name=t.spec.name, description=t.spec.description, inputs=t.spec.inputs, outputs=t.spec.outputs) for t in self.tools.values()]
    def call_tool(self, name: str, args: Dict[str, Any]) -> Dict[str, Any]:
        if name not in self.tools: raise KeyError(f"tool {name} not found")
        spec = self.tools[name].spec
        # Forward only declared inputs that were supplied, so function defaults still apply.
        kwargs = {k: args[k] for k in spec.inputs if k in args}
        return self.tools[name].fn(**kwargs)

server=MCPServer()
server.register("fastqc", FASTQC_DOC, tool_fastqc)
server.register("bowtie2", BOWTIE_DOC, tool_bowtie2_like)
server.register("bcftools", BCF_DOC, tool_bcftools_like)

Task = Tuple[str, Dict[str, Any]]
PIPELINES = {
    "rnaseq_qc_align_call": [
        ("fastqc", {"seq_fasta": "{reads}", "min_len": 30}),
        ("bowtie2", {"ref": "{ref}", "reads": "{reads}", "mode": "end-to-end"}),
        ("bcftools", {"ref": "{ref}", "alt": "{alt}", "win": 15}),
    ]
}

def compile_pipeline(nl_request: str) -> List[Task]:
    # Both branches select the only pipeline defined here; the keyword match marks where real routing would go.
    key = "rnaseq_qc_align_call" if re.search(r"rna|qc|align|variant|call", nl_request, re.I) else "rnaseq_qc_align_call"
    return PIPELINES[key]
We build a lightweight server that registers the tools, lists their specifications, and lets us call them programmatically. We also define a pipeline, a structure that specifies the order in which the tools should run.
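The registry idea can be reduced to a dictionary plus a dispatch function, as in the sketch below. `REGISTRY`, `register`, `call`, and `count_bases` are hypothetical names used only for this example. One design choice here is an assumption rather than something the tutorial prescribes: only inputs that are both declared and supplied get forwarded, so undeclared keys are dropped and function defaults apply.

```python
from typing import Any, Callable, Dict

# Hypothetical in-memory registry: tool name -> declared inputs and callable.
REGISTRY: Dict[str, Dict[str, Any]] = {}

def register(name: str, inputs: Dict[str, str], fn: Callable[..., Dict[str, Any]]) -> None:
    REGISTRY[name] = {"inputs": inputs, "fn": fn}

def call(name: str, args: Dict[str, Any]) -> Dict[str, Any]:
    if name not in REGISTRY:
        raise KeyError(f"tool {name} not found")
    entry = REGISTRY[name]
    # Drop undeclared keys; omit missing ones so defaults are used.
    kwargs = {k: args[k] for k in entry["inputs"] if k in args}
    return entry["fn"](**kwargs)

def count_bases(seq: str, base: str = "A") -> Dict[str, Any]:
    return {"base": base, "count": seq.count(base)}

register("count_bases", {"seq": "str", "base": "str"}, count_bases)

# "extra" is not a declared input, and "base" falls back to its default.
result = call("count_bases", {"seq": "GATTACA", "extra": 1})
print(result)
```

Filtering arguments against the declared inputs keeps the callable's signature and the published specification in sync. This matters once callers build their arguments from templates rather than by hand.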
def mk_fasta(header: str, seq: str) -> str: return f">{header}\n{seq}\n"

random.seed(0)
REF_SEQ = "".join(random.choice("ACGT") for _ in range(300))
REF = mk_fasta("ref", REF_SEQ)
READS = mk_fasta("r1", REF_SEQ[50:130]) + mk_fasta("r2", "ACGT" * 15) + mk_fasta("r3", REF_SEQ[180:240])
# Position 150 is overwritten with "T"; this is a variant only if the reference base there differs.
ALT = mk_fasta("alt", REF_SEQ[:150] + "T" + REF_SEQ[151:])

def run_pipeline(nl: str, ctx: Dict[str, str]) -> Dict[str, Any]:
    plan = compile_pipeline(nl); results = []; t0 = time.time()
    for name, arg_tpl in plan:
        # Fill string templates such as "{reads}" from ctx; other values pass through unchanged.
        args = {k: (v.format(**ctx) if isinstance(v, str) else v) for k, v in arg_tpl.items()}
        out = server.call_tool(name, args)
        results.append({"tool": name, "args": args, "output": out})
    return {"request": nl, "elapsed_s": round(time.time() - t0, 4), "results": results}
For testing, we prepare small synthetic FASTA data and implement a function that runs the pipeline end to end. The function fills each step's argument template from a shared context and executes the steps in sequence.
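The template-filling step is the least obvious part of the runner, so here it is in isolation. The helper name `fill_args` and the tiny context are hypothetical. The logic relies on `str.format`: string values are formatted against the context and non-string values are passed through unchanged.

```python
from typing import Any, Dict

def fill_args(arg_tpl: Dict[str, Any], ctx: Dict[str, str]) -> Dict[str, Any]:
    """Substitute {placeholders} in string values; leave other values as-is."""
    return {
        k: (v.format(**ctx) if isinstance(v, str) else v)
        for k, v in arg_tpl.items()
    }

# Hypothetical context holding FASTA text under short keys.
ctx = {"ref": ">ref\nACGT\n", "reads": ">r1\nCG\n"}
tpl = {"ref": "{ref}", "reads": "{reads}", "mode": "end-to-end", "win": 15}

args = fill_args(tpl, ctx)
print(args)
```

Two caveats apply to this approach. A placeholder with no matching key in the context raises `KeyError`, and a literal brace inside a template string would need to be doubled (`{{` or `}}`) to survive formatting.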
def bench_individual() -> List[Dict[str, Any]]:
    cases = [
        ("fastqc", {"seq_fasta": READS, "min_len": 25}),
        ("bowtie2", {"ref": REF, "reads": READS, "mode": "end-to-end"}),
        ("bcftools", {"ref": REF, "alt": ALT, "win": 10}),
    ]
    rows = []
    for name, args in cases:
        t0 = time.time(); ok = True; err = None; out = None
        try: out = server.call_tool(name, args)
        except Exception as e: ok = False; err = str(e)
        rows.append({"tool": name, "ok": ok, "ms": int((time.time() - t0) * 1000), "out_keys": list(out.keys()) if ok else [], "err": err})
    return rows

def bench_pipeline() -> Dict[str, Any]:
    t0 = time.time()
    res = run_pipeline("Run RNA-seq QC, align, and variant call.", {"ref": REF, "reads": READS, "alt": ALT})
    # The run counts as ok when every step returned a non-empty output.
    ok = all(step["output"] for step in res["results"])
    return {"pipeline": "rnaseq_qc_align_call", "ok": ok, "ms": int((time.time() - t0) * 1000), "n_steps": len(res["results"])}

print("== TOOLS =="); print(json.dumps(server.list_tools(), indent=2))
print("\n== INDIVIDUAL BENCH =="); print(json.dumps(bench_individual(), indent=2))
print("\n== PIPELINE BENCH =="); print(json.dumps(bench_pipeline(), indent=2))
print("\n== PIPELINE RUN =="); print(json.dumps(run_pipeline("Run RNA-seq QC, align, and variant call.", {"ref": REF, "reads": READS, "alt": ALT}), indent=2))
We benchmark both the individual tools and the full pipeline, capturing their outputs and timings. Finally, we print the results to confirm that the whole workflow runs successfully.
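The benchmarking pattern (time a call, capture any exception, record a row) can be factored into one small helper, sketched below. The name `timed_call` is hypothetical. The sketch also uses `time.perf_counter` rather than `time.time`, a substitution made here because `perf_counter` is a monotonic clock intended for measuring short durations.

```python
import time
from typing import Any, Callable, Dict

def timed_call(fn: Callable[..., Any], **kwargs: Any) -> Dict[str, Any]:
    """Run fn(**kwargs), recording elapsed milliseconds and any error."""
    t0 = time.perf_counter()
    try:
        out, ok, err = fn(**kwargs), True, None
    except Exception as e:
        # Record the failure instead of aborting the whole benchmark.
        out, ok, err = None, False, str(e)
    ms = (time.perf_counter() - t0) * 1000
    return {"ok": ok, "ms": round(ms, 3), "out": out, "err": err}

good = timed_call(lambda x: {"double": 2 * x}, x=21)
bad = timed_call(lambda x: 1 / x, x=0)

print(good["ok"], good["out"])
print(bad["ok"], bad["err"])
```

Catching the exception inside the helper means one failing tool shows up as a row with `ok` set to `False`. The remaining measurements still run.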
In conclusion, we gain a clearer understanding of how lightweight tool conversion, registration, and orchestration fit together in a single environment. We see that a unified interface lets us connect multiple tools, run them in sequence, and measure their performance. This hands-on exercise shows how simple design principles such as standardization and automation can improve the reproducibility and efficiency of computational workflows.
Asif Razzaq, CEO of Marktechpost Media Inc. is a visionary engineer and entrepreneur who is dedicated to using Artificial Intelligence (AI) for the greater good. Marktechpost was his most recent venture. This platform, which specializes in covering machine learning and deep-learning news, is well known for being both technically correct and understandable to a broad audience. This platform has over 2,000,000 monthly views which shows its popularity.

