A Coding Guide to Building a Unified Tool Orchestration Framework, from Documentation to Automated Pipelines

This tutorial builds a compact framework that shows how tool documentation can be converted into standardized, callable interfaces. We register these tools in a central registry and run them as part of an automated pipeline. Along the way we write a simple documentation parser, define mock bioinformatics tools, organize them behind a small server, and benchmark individual tool calls against a multi-step pipeline run. The goal is to explore how structured tool interfaces and automation can streamline data workflows.

import re, json, time, random
from dataclasses import dataclass
from typing import Callable, Dict, Any, List, Tuple


@dataclass
class ToolSpec:
    name: str
    description: str
    inputs: Dict[str, str]
    outputs: Dict[str, str]


def parse_doc_to_spec(name: str, doc: str) -> ToolSpec:
    # The first non-empty line is the description; fall back to the tool name.
    desc = doc.strip().splitlines()[0].strip() if doc.strip() else name
    # Only lines that look like argument declarations are scanned.
    arg_block = "\n".join([l for l in doc.splitlines() if "--" in l or ":" in l])
    inputs = {}
    for line in arg_block.splitlines():
        m = re.findall(r"(--?\w[\w-]*|\b\w+\b)\s*[:=]?\s*(\w+)?", line)
        for key, typ in m:
            k = key.lstrip("-")
            if k not in inputs and k not in ["Returns","Output","Outputs"]:
                inputs[k] = (typ or "str")
    if not inputs: inputs = {"in": "str"}
    return ToolSpec(name=name, description=desc, inputs=inputs, outputs={"out":"json"})

We begin by defining the structure of a tool and writing a parser that converts plain-text documentation into a standardized tool specification. The parser takes the first line as the description and extracts parameter names and types from the argument lines; every tool is given the same generic JSON output.
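To see what the argument regex actually captures, the extraction step can be isolated in a few lines. This is a minimal sketch: `extract_inputs` and `ARG_PATTERN` are hypothetical names introduced here for illustration, and the pattern is assumed to be the same one used in `parse_doc_to_spec`.

```python
import re

# Assumed to match the pattern used by parse_doc_to_spec in this tutorial.
ARG_PATTERN = re.compile(r"(--?\w[\w-]*|\b\w+\b)\s*[:=]?\s*(\w+)?")

def extract_inputs(doc_line: str) -> dict:
    """Hypothetical helper: map parameter names to type names for one doc line."""
    skip = {"Returns", "Output", "Outputs"}
    inputs = {}
    for key, typ in ARG_PATTERN.findall(doc_line):
        k = key.lstrip("-")
        if k not in inputs and k not in skip:
            inputs[k] = typ or "str"   # default to "str" when no type is given
    return inputs

# A well-formed argument line yields one entry per flag.
example = extract_inputs("--ref: str  --reads: str  --mode: str  Outputs: json")

# Caveat: any line containing ":" is scanned, so ordinary prose is picked up too.
noisy = extract_inputs("Note: use carefully")
```

Here `example` contains only `ref`, `reads`, and `mode`, while `noisy` ends up with the spurious inputs `Note` and `carefully`. The pattern therefore seems best suited to terse, flag-style documentation; free-form descriptions would probably need a stricter grammar.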

def tool_fastqc(seq_fasta: str, min_len:int=30) -> Dict[str,Any]:
    # Split on FASTA header lines; the first element is the text before the first header.
    seqs = [s for s in re.split(r">[^\n]*\n", seq_fasta)[1:]]
    lens = [len(re.sub(r"\s+","",s)) for s in seqs]
    # "pct_q30" is a stand-in metric here: the fraction of reads with length >= min_len.
    q30 = sum(l>=min_len for l in lens)/max(1,len(lens))
    gc = sum(c in "GCgc" for s in seqs for c in s)/max(1,sum(lens))
    return {"n_seqs":len(lens),"len_mean":(sum(lens)/max(1,len(lens))),"pct_q30":q30,"gc":gc}


def tool_bowtie2_like(ref:str, reads:str, mode:str="end-to-end") -> Dict[str,Any]:
    def revcomp(s):
        t=str.maketrans("ACGTacgt","TGCAtgca"); return s.translate(t)[::-1]
    reads_list=[r for r in re.split(r">[^\n]*\n", reads)[1:]]
    ref_seq="".join(ref.splitlines()[1:])
    hits=[]
    for i,r in enumerate(reads_list):
        rseq="".join(r.split())
        # A read "aligns" if it, or its reverse complement, is an exact substring of the reference.
        aligned = (rseq in ref_seq) or (revcomp(rseq) in ref_seq)
        hits.append({"read_id":i,"aligned":bool(aligned),"pos":ref_seq.find(rseq)})
    return {"n":len(hits),"aligned":sum(h["aligned"] for h in hits),"mode":mode,"hits":hits}


def tool_bcftools_like(ref:str, alt:str, win:int=15) -> Dict[str,Any]:
    ref_seq="".join(ref.splitlines()[1:]); alt_seq="".join(alt.splitlines()[1:])
    n=min(len(ref_seq),len(alt_seq)); variants=[]
    # Position-by-position comparison; only substitutions are detected.
    for i in range(n):
        if ref_seq[i]!=alt_seq[i]: variants.append({"pos":i,"ref":ref_seq[i],"alt":alt_seq[i]})
    return {"n_sites":n,"n_var":len(variants),"variants":variants[:win]}


FASTQC_DOC = """FastQC-like quality control for FASTA
--seq_fasta: str  --min_len: int   Outputs: json"""
BOWTIE_DOC = """Bowtie2-like aligner
--ref: str  --reads: str  --mode: str  Outputs: json"""
BCF_DOC = """bcftools-like variant caller
--ref: str  --alt: str  --win: int  Outputs: json"""

We create mock implementations of three bioinformatics tools modeled on FastQC, Bowtie2, and bcftools. Each one declares its inputs and outputs in a short documentation string and returns a dictionary, so all of them can be called consistently through a common interface.
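All three mock tools rely on the same trick for reading FASTA text: split on header lines, then strip whitespace from each record. The sketch below isolates that step; `fasta_records` and `gc_fraction` are hypothetical helper names used only for this illustration.

```python
import re

def fasta_records(fasta: str) -> list:
    """Return the sequences of a FASTA string, one per record, with whitespace removed."""
    # re.split leaves the text before the first header at index 0, so it is dropped.
    bodies = re.split(r">[^\n]*\n", fasta)[1:]
    return ["".join(body.split()) for body in bodies]

def gc_fraction(seqs: list) -> float:
    """Fraction of G/C bases across all sequences (0.0 for empty input)."""
    total = sum(len(s) for s in seqs)
    return sum(c in "GCgc" for s in seqs for c in s) / max(1, total)

demo = ">r1\nACGT\nGGCC\n>r2\nATAT\n"
records = fasta_records(demo)
gc = gc_fraction(records)
```

For the two-record input above, `records` holds the joined sequences `ACGTGGCC` and `ATAT`, and six of the twelve bases are G or C. Real FASTA parsers also handle comments, empty records, and very large files, which this sketch deliberately ignores.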

@dataclass
class MCPTool:
    spec: ToolSpec
    fn: Callable[..., Dict[str,Any]]


class MCPServer:
    def __init__(self): self.tools: Dict[str,MCPTool] = {}
    def register(self, name:str, doc:str, fn:Callable[...,Dict[str,Any]]):
        spec = parse_doc_to_spec(name, doc); self.tools[name]=MCPTool(spec, fn)
    def list_tools(self) -> List[Dict[str,Any]]:
        return [dict(name=t.spec.name, description=t.spec.description, inputs=t.spec.inputs, outputs=t.spec.outputs) for t in self.tools.values()]
    def call_tool(self, name:str, args:Dict[str,Any]) -> Dict[str,Any]:
        if name not in self.tools: raise KeyError(f"tool {name} not found")
        spec = self.tools[name].spec
        # Only arguments declared in the parsed spec are forwarded to the function.
        kwargs={k:args.get(k) for k in spec.inputs.keys()}
        return self.tools[name].fn(**kwargs)


server=MCPServer()
server.register("fastqc", FASTQC_DOC, tool_fastqc)
server.register("bowtie2", BOWTIE_DOC, tool_bowtie2_like)
server.register("bcftools", BCF_DOC, tool_bcftools_like)


Task = Tuple[str, Dict[str,Any]]
PIPELINES = {
   "rnaseq_qc_align_call":[
       ("fastqc", {"seq_fasta":"{reads}", "min_len":30}),
       ("bowtie2", {"ref":"{ref}", "reads":"{reads}", "mode":"end-to-end"}),
       ("bcftools", {"ref":"{ref}", "alt":"{alt}", "win":15}),
   ]
}


def compile_pipeline(nl_request:str) -> List[Task]:
    # Placeholder router: both branches currently select the only registered pipeline.
    key = "rnaseq_qc_align_call" if re.search(r"rna|qc|align|variant|call", nl_request, re.I) else "rnaseq_qc_align_call"
    return PIPELINES[key]

We build a lightweight server that registers the tools, lists their specifications, and lets us call them programmatically. We also define a pipeline as an ordered list of (tool, arguments) tasks, together with a simple compiler that maps a natural-language request to that list.
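One detail of the dispatch step is worth a closer look. Because `call_tool` builds its keyword arguments with `args.get(k)`, a parameter the caller leaves out is passed as `None` rather than falling back to the function's default. The sketch below shows one possible alternative that forwards only the keys actually supplied; `MiniRegistry` and `describe_seq` are hypothetical names, not part of the tutorial's server.

```python
from typing import Any, Callable, Dict, List, Tuple

class MiniRegistry:
    """Hypothetical, simplified registry: name -> (declared inputs, function)."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tuple[List[str], Callable[..., Dict[str, Any]]]] = {}

    def register(self, name: str, inputs: List[str], fn: Callable[..., Dict[str, Any]]) -> None:
        self._tools[name] = (inputs, fn)

    def call(self, name: str, args: Dict[str, Any]) -> Dict[str, Any]:
        if name not in self._tools:
            raise KeyError(f"tool {name} not found")
        inputs, fn = self._tools[name]
        # Forward declared inputs only, and only when supplied,
        # so that omitted parameters keep their Python defaults.
        kwargs = {k: args[k] for k in inputs if k in args}
        return fn(**kwargs)

def describe_seq(seq: str, upper: bool = True) -> Dict[str, Any]:
    s = seq.upper() if upper else seq
    return {"length": len(s), "seq": s}

reg = MiniRegistry()
reg.register("describe", ["seq", "upper"], describe_seq)
# "upper" is omitted (default applies) and "extra" is undeclared (silently dropped).
result = reg.call("describe", {"seq": "acgt", "extra": 1})
```

Whether to drop undeclared arguments silently, as here, or reject them with an error is a design choice; stricter validation tends to surface mistakes in pipeline definitions earlier.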

def mk_fasta(header:str, seq:str)->str: return f">{header}\n{seq}\n"
random.seed(0)
REF_SEQ="".join(random.choice("ACGT") for _ in range(300))
REF = mk_fasta("ref",REF_SEQ)
READS = mk_fasta("r1", REF_SEQ[50:130]) + mk_fasta("r2","ACGT"*15) + mk_fasta("r3", REF_SEQ[180:240])
ALT = mk_fasta("alt", REF_SEQ[:150] + "T" + REF_SEQ[151:])


def run_pipeline(nl:str, ctx:Dict[str,str]) -> Dict[str,Any]:
    plan=compile_pipeline(nl); results=[]; t0=time.time()
    for name, arg_tpl in plan:
        # String arguments are templates such as "{ref}" that are filled from ctx.
        args={k:(v.format(**ctx) if isinstance(v,str) else v) for k,v in arg_tpl.items()}
        out=server.call_tool(name, args)
        results.append({"tool":name,"args":args,"output":out})
    return {"request":nl,"elapsed_s":round(time.time()-t0,4),"results":results}

We prepare small synthetic FASTA inputs for testing and implement a function that runs the whole pipeline. At each step it fills the argument templates from a context dictionary, calls the tool through the server, and records the result.
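The template-filling step can be tried on its own. In this minimal sketch, `fill_args` is a hypothetical helper that mirrors the dictionary comprehension inside `run_pipeline`: string values are treated as `str.format` templates and everything else passes through unchanged.

```python
from typing import Any, Dict

def fill_args(arg_tpl: Dict[str, Any], ctx: Dict[str, str]) -> Dict[str, Any]:
    """Substitute {placeholders} in string values using ctx; leave other values as-is."""
    return {
        k: (v.format(**ctx) if isinstance(v, str) else v)
        for k, v in arg_tpl.items()
    }

ctx = {"ref": ">ref\nACGT\n", "reads": ">r1\nAC\n"}
args = fill_args({"ref": "{ref}", "mode": "end-to-end", "win": 15}, ctx)
```

After the call, `args["ref"]` holds the FASTA text from `ctx`, while `mode` and `win` are unchanged. Because every string goes through `str.format`, a literal value containing braces would need them doubled (`{{` and `}}`), and a placeholder missing from `ctx` raises `KeyError`.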

def bench_individual() -> List[Dict[str,Any]]:
    cases=[
        ("fastqc", {"seq_fasta":READS,"min_len":25}),
        ("bowtie2", {"ref":REF,"reads":READS,"mode":"end-to-end"}),
        ("bcftools", {"ref":REF,"alt":ALT,"win":10}),
    ]
    rows=[]
    for name,args in cases:
        t0=time.time(); ok=True; err=None; out=None
        try: out=server.call_tool(name,args)
        except Exception as e: ok=False; err=str(e)
        rows.append({"tool":name,"ok":ok,"ms":int((time.time()-t0)*1000),"out_keys":list(out.keys()) if ok else [],"err":err})
    return rows


def bench_pipeline() -> Dict[str,Any]:
    t0=time.time()
    res=run_pipeline("Run RNA-seq QC, align, and variant call.", {"ref":REF,"reads":READS,"alt":ALT})
    # The run counts as successful if every step produced a non-empty output.
    ok = all(step["output"] for step in res["results"])
    return {"pipeline":"rnaseq_qc_align_call","ok":ok,"ms":int((time.time()-t0)*1000),"n_steps":len(res["results"])}


print("== TOOLS =="); print(json.dumps(server.list_tools(), indent=2))
print("\n== INDIVIDUAL BENCH =="); print(json.dumps(bench_individual(), indent=2))
print("\n== PIPELINE BENCH =="); print(json.dumps(bench_pipeline(), indent=2))
print("\n== PIPELINE RUN =="); print(json.dumps(run_pipeline("Run RNA-seq QC, align, and variant call.", {"ref":REF,"reads":READS,"alt":ALT}), indent=2))

We benchmark both the individual tools and the full pipeline, recording whether each call succeeded, how long it took, and which keys it returned. Finally, we print the results to confirm that the whole workflow runs end to end.
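The timing-and-error-capture pattern used in the benchmarks can be factored into a small reusable helper. The sketch below is one way to do it and is not part of the tutorial's code: `timed_call` is a hypothetical name, and it uses `time.perf_counter`, which is generally a better fit for measuring short intervals than `time.time`.

```python
import time
from typing import Any, Callable, Dict

def timed_call(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Dict[str, Any]:
    """Run fn, capturing its result or error message along with elapsed milliseconds."""
    t0 = time.perf_counter()
    try:
        out, ok, err = fn(*args, **kwargs), True, None
    except Exception as e:  # record the failure instead of aborting the benchmark
        out, ok, err = None, False, str(e)
    return {"ok": ok, "ms": (time.perf_counter() - t0) * 1000, "out": out, "err": err}

good = timed_call(lambda x: x + 1, 1)
bad = timed_call(lambda: 1 / 0)
```

With mock tools this small, a single call often finishes in well under a millisecond, so truncating to whole milliseconds tends to report 0. Keeping the value as a float, or repeating the call many times and averaging, gives more informative numbers.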

In conclusion, this exercise shows how documentation-to-tool conversion, registration, and orchestration can work together in a single lightweight framework. A unified interface lets us connect several tools, run them in sequence, and measure their performance. It also illustrates how the simple design principles of standardization and automation can improve reproducibility and efficiency in computational workflows.


Asif Razzaq is the CEO of Marktechpost Media Inc. A visionary engineer and entrepreneur, he is dedicated to using Artificial Intelligence (AI) for the greater good. His most recent venture is Marktechpost, a platform that specializes in machine learning and deep learning news and is known for coverage that is both technically sound and understandable to a broad audience. The platform has over 2,000,000 monthly views.
