In this tutorial, we explore how to use the PyBEL ecosystem in Google Colab to build and analyze rich biological knowledge graphs. We begin by installing all the required packages: PyBEL, NetworkX, Matplotlib, Seaborn, and Pandas. We then show how to define proteins, protein modifications, and biological processes with the PyBEL DSL. From there, we guide you through the creation of an Alzheimer’s disease-related pathway, showing how to encode causal relationships, protein–protein interactions, and phosphorylation events. Beyond graph construction, we introduce network analyses, including centrality measures, node classification, subgraph extraction, and techniques for extracting citation and evidence data. By the end, you will have a fully annotated BEL graph ready for visualization and enrichment analyses, a solid foundation for interactive exploration of biological knowledge.
!pip install pybel pybel-tools networkx matplotlib seaborn pandas -q
import pybel
import pybel.dsl as dsl
from pybel import BELGraph
from pybel.io import to_pickle, from_pickle
import networkx as nx
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from collections import Counter
import warnings
warnings.filterwarnings('ignore')
print("PyBEL Advanced Tutorial: Biological Expression Language Ecosystem")
print("=" * 65)
We start by installing PyBEL and its dependencies directly in Colab so that all the libraries needed for the analysis are available: NetworkX, Matplotlib, Seaborn, and Pandas. Once they are installed, we import the core modules and suppress warnings to keep the notebook output clean.
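Because library APIs can change between releases, it may be worth recording which versions were actually installed before going further. The following is a minimal sketch that uses only the standard library; the helper name `installed_version` is our own and is not part of PyBEL.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the version of each package used in this tutorial.
for name in ["pybel", "networkx", "matplotlib", "seaborn", "pandas"]:
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Running this right after the install cell gives a quick record to consult if a later call behaves differently from what is described here.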
print("\n1. Building a Biological Knowledge Graph")
print("-" * 40)
graph = BELGraph(
    name="Alzheimer's Disease Pathway",
    version="1.0.0",
    description="Example pathway showing protein interactions in AD",
    authors="PyBEL Tutorial"
)
app = dsl.Protein(name="APP", namespace="HGNC")
abeta = dsl.Protein(name="Abeta", namespace="CHEBI")
tau = dsl.Protein(name="MAPT", namespace="HGNC")
gsk3b = dsl.Protein(name="GSK3B", namespace="HGNC")
inflammation = dsl.BiologicalProcess(name="inflammatory response", namespace="GO")
apoptosis = dsl.BiologicalProcess(name="apoptotic process", namespace="GO")
graph.add_increases(app, abeta, citation="PMID:12345678", evidence="APP cleavage produces Abeta")
graph.add_increases(abeta, inflammation, citation="PMID:87654321", evidence="Abeta triggers neuroinflammation")
tau_phosphorylated = dsl.Protein(name="MAPT", namespace="HGNC",
                                 variants=[dsl.ProteinModification("Ph")])
graph.add_increases(gsk3b, tau_phosphorylated, citation="PMID:11111111", evidence="GSK3B phosphorylates tau")
graph.add_increases(tau_phosphorylated, apoptosis, citation="PMID:22222222", evidence="Hyperphosphorylated tau causes cell death")
graph.add_increases(inflammation, apoptosis, citation="PMID:33333333", evidence="Inflammation promotes apoptosis")
graph.add_association(abeta, tau, citation="PMID:44444444", evidence="Abeta and tau interact synergistically")
print(f"Created BEL graph with {graph.number_of_nodes()} nodes and {graph.number_of_edges()} edges")
We initialize a BELGraph with metadata for an Alzheimer’s disease pathway and use the PyBEL DSL to define its proteins and biological processes. By adding causal relationships, a protein modification, and an association, we build a small network that captures the key molecular interactions.
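To make explicit what each edge in such a graph carries, here is a small stand-alone sketch of the information behind one statement: a subject, a relation, an object, and the citation and evidence that support it. The `Statement` class is a hypothetical illustration and not PyBEL's internal representation; the terms are written as BEL-style strings, and the PMIDs are the placeholder identifiers used in this tutorial.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """One annotated edge: subject, relation, object, plus its provenance."""
    subject: str
    relation: str
    obj: str
    citation: str
    evidence: str


# Two statements from the pathway, written out by hand.
statements = [
    Statement("p(HGNC:APP)", "increases", "p(CHEBI:Abeta)",
              "PMID:12345678", "APP cleavage produces Abeta"),
    Statement("p(HGNC:GSK3B)", "increases", "p(HGNC:MAPT, pmod(Ph))",
              "PMID:11111111", "GSK3B phosphorylates tau"),
]

for s in statements:
    print(f"{s.subject} {s.relation} {s.obj}  [{s.citation}]")
```

Keeping provenance on the edge rather than on the nodes is what lets the later sections count citations and evidence per relationship.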
print("\n2. Advanced Network Analysis")
print("-" * 30)
degree_centrality = nx.degree_centrality(graph)
betweenness_centrality = nx.betweenness_centrality(graph)
closeness_centrality = nx.closeness_centrality(graph)
most_central = max(degree_centrality, key=degree_centrality.get)
print(f"Most connected node: {most_central}")
print(f"Degree centrality: {degree_centrality[most_central]:.3f}")
We compute degree, betweenness, and closeness centralities to quantify the importance of each node in the graph. The node with the highest degree centrality is the most connected one, which makes it a candidate hub that may drive disease mechanisms.
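As a worked illustration of what degree centrality measures, the sketch below computes it by hand as a node's total degree (incoming plus outgoing edges) divided by the number of other nodes. It uses a simplified, hypothetical copy of the pathway's edges with short string labels, and it stores the association as a single edge, which may differ from how PyBEL stores it internally, so the numbers need not match the notebook output exactly.

```python
from collections import Counter

# Simplified stand-in for the pathway: (source, target) pairs.
edges = [
    ("APP", "Abeta"),
    ("Abeta", "inflammation"),
    ("GSK3B", "pMAPT"),
    ("pMAPT", "apoptosis"),
    ("inflammation", "apoptosis"),
    ("Abeta", "MAPT"),  # association, kept as one edge in this sketch
]


def degree_centrality(edge_list):
    """Return degree / (n - 1) for every node that appears in the edge list."""
    degree = Counter()
    for source, target in edge_list:
        degree[source] += 1
        degree[target] += 1
    n = len(degree)
    if n < 2:
        return {node: 0.0 for node in degree}
    return {node: d / (n - 1) for node, d in degree.items()}


centrality = degree_centrality(edges)
hub = max(centrality, key=centrality.get)
print(f"Most connected node: {hub} ({centrality[hub]:.3f})")
```

In this toy copy, Abeta touches three of the six edges among seven nodes, so its centrality is 3 / 6 = 0.5, the highest in the set.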
print("\n3. Biological Entity Classification")
print("-" * 35)
# Tally nodes by their BEL function (e.g. Protein, BiologicalProcess)
node_types = Counter()
for node in graph.nodes():
    node_types[node.function] += 1
print("Node distribution:")
for func, count in node_types.items():
    print(f" {func}: {count}")
We classify each node by its function, such as Protein or BiologicalProcess, and tally the counts. This breakdown shows the composition of the network at a glance.
print("\n4. Pathway Analysis")
print("-" * 20)
proteins = [node for node in graph.nodes() if node.function == 'Protein']
processes = [node for node in graph.nodes() if node.function == 'BiologicalProcess']
print(f"Proteins in pathway: {len(proteins)}")
print(f"Biological processes: {len(processes)}")
# Count how often each relation type appears on an edge
edge_types = Counter()
for u, v, data in graph.edges(data=True):
    edge_types[data.get('relation')] += 1
print("\nRelationship types:")
for rel, count in edge_types.items():
    print(f" {rel}: {count}")
We separate the proteins from the biological processes to gauge the scope of the pathway, then count the relationship types to see which kinds of interaction dominate the model.
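Node types and relation types can also be combined into one view: counting (source type, relation, target type) patterns shows, for example, how many edges run from a protein to a process. The sketch below does this on hypothetical stand-in data that mirrors the tutorial's entities with plain strings; it does not read from the BEL graph itself.

```python
from collections import Counter

# Hypothetical stand-in data: node label -> function, mirroring the pathway.
node_function = {
    "APP": "Protein", "Abeta": "Protein", "MAPT": "Protein",
    "pMAPT": "Protein", "GSK3B": "Protein",
    "inflammation": "BiologicalProcess", "apoptosis": "BiologicalProcess",
}
edges = [
    ("APP", "increases", "Abeta"),
    ("Abeta", "increases", "inflammation"),
    ("GSK3B", "increases", "pMAPT"),
    ("pMAPT", "increases", "apoptosis"),
    ("inflammation", "increases", "apoptosis"),
    ("Abeta", "association", "MAPT"),
]

# Count each (source type, relation, target type) pattern.
pattern_counts = Counter(
    (node_function[u], rel, node_function[v]) for u, rel, v in edges
)

for (src_type, rel, dst_type), count in pattern_counts.most_common():
    print(f"{src_type} -{rel}-> {dst_type}: {count}")
```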
print("\n5. Literature Evidence Analysis")
print("-" * 32)
citations = []
evidences = []
# Collect the citation and evidence attached to each edge, when present
for _, _, data in graph.edges(data=True):
    if 'citation' in data:
        citations.append(data['citation'])
    if 'evidence' in data:
        evidences.append(data['evidence'])
print(f"Total citations: {len(citations)}")
print(f"Unique citations: {len(set(citations))}")
print(f"Evidence statements: {len(evidences)}")
We extract the citation identifiers and evidence strings from each edge to assess how well the graph is grounded in published research. Comparing the total number of citations with the number of unique ones indicates the breadth of literature support.
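A natural follow-up is to ask how many edges each citation supports and which edges lack a citation or an evidence string. The sketch below shows the idea on hypothetical edge annotations held in plain dictionaries. It assumes citations are plain strings; if your PyBEL version stores them as dictionary-like objects, convert each one to a hashable key (for example, an identifier string) first.

```python
from collections import Counter

# Hypothetical edge annotations: two edges share a citation,
# one has no evidence, and one has no annotations at all.
edge_data = [
    {"citation": "PMID:12345678", "evidence": "APP cleavage produces Abeta"},
    {"citation": "PMID:87654321", "evidence": "Abeta triggers neuroinflammation"},
    {"citation": "PMID:87654321"},
    {},
]

# Number of edges supported by each citation.
citation_counts = Counter(d["citation"] for d in edge_data if "citation" in d)

# Positions of edges missing either a citation or an evidence string.
unsupported = [
    i for i, d in enumerate(edge_data)
    if "citation" not in d or "evidence" not in d
]

for citation, count in citation_counts.most_common():
    print(f"{citation}: {count} edge(s)")
print(f"Edges with incomplete provenance: {unsupported}")
```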
print("\n6. Subgraph Analysis")
print("-" * 22)
inflammation_nodes = [inflammation]
inflammation_neighbors = list(graph.predecessors(inflammation)) + list(graph.successors(inflammation))
inflammation_subgraph = graph.subgraph(inflammation_nodes + inflammation_neighbors)
print(f"Inflammation subgraph: {inflammation_subgraph.number_of_nodes()} nodes, {inflammation_subgraph.number_of_edges()} edges")
We isolate the inflammation-related subgraph by collecting the inflammation node and its immediate predecessors and successors. The resulting subnetwork gives a focused view of how inflammation connects to the other disease processes.
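The neighborhood extraction described above can be sketched without any graph library: collect the center node plus every node one step away in either direction, then keep only the edges whose endpoints both fall inside that set. The edge list and the helper `neighborhood_edges` are hypothetical stand-ins for the tutorial's graph, with the association stored as a single edge.

```python
# Simplified stand-in for the pathway: (source, target) pairs.
edges = [
    ("APP", "Abeta"),
    ("Abeta", "inflammation"),
    ("GSK3B", "pMAPT"),
    ("pMAPT", "apoptosis"),
    ("inflammation", "apoptosis"),
    ("Abeta", "MAPT"),
]


def neighborhood_edges(edge_list, center):
    """Return the center's one-step neighborhood and the edges induced on it."""
    nodes = {center}
    for u, v in edge_list:
        if u == center:
            nodes.add(v)   # successor of the center
        elif v == center:
            nodes.add(u)   # predecessor of the center
    induced = [(u, v) for u, v in edge_list if u in nodes and v in nodes]
    return nodes, induced


nodes, sub_edges = neighborhood_edges(edges, "inflammation")
print(f"Nodes: {sorted(nodes)}")
print(f"Edges: {sub_edges}")
```

Note that an induced subgraph also keeps any edge between two neighbors, even if that edge does not touch the center node; in this toy copy there is none.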
print("\n7. Advanced Graph Querying")
print("-" * 28)
try:
    paths = list(nx.all_simple_paths(graph, app, apoptosis, cutoff=3))
    print(f"Paths from APP to apoptosis: {len(paths)}")
    if paths:
        # Paths are not returned in order of length, so take the minimum explicitly
        print(f"Shortest path length: {min(len(p) for p in paths) - 1}")
except nx.NetworkXNoPath:
    print("No paths found between APP and apoptosis")
apoptosis_inducers = list(graph.predecessors(apoptosis))
print(f"Factors that increase apoptosis: {len(apoptosis_inducers)}")
We enumerate the simple paths from APP to apoptosis to explore mechanistic routes and identify key intermediates. Listing all predecessors of apoptosis also shows which factors may trigger cell death.
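To show what simple-path enumeration with a cutoff involves, here is a hand-rolled depth-first search over a simplified copy of the pathway. It is an illustrative sketch, not NetworkX's implementation, and the function name `simple_paths` and the string node labels are our own assumptions. A path is "simple" when it visits no node twice, and the cutoff bounds the number of edges.

```python
from collections import defaultdict

# Simplified stand-in for the pathway: (source, target) pairs.
edges = [
    ("APP", "Abeta"),
    ("Abeta", "inflammation"),
    ("GSK3B", "pMAPT"),
    ("pMAPT", "apoptosis"),
    ("inflammation", "apoptosis"),
    ("Abeta", "MAPT"),
]


def simple_paths(edge_list, source, target, cutoff):
    """Yield every loop-free path from source to target with at most `cutoff` edges."""
    successors = defaultdict(list)
    for u, v in edge_list:
        successors[u].append(v)

    def walk(path):
        node = path[-1]
        if node == target:
            yield list(path)
            return
        if len(path) - 1 >= cutoff:
            return  # edge budget used up
        for nxt in successors[node]:
            if nxt not in path:  # never revisit a node
                yield from walk(path + [nxt])

    yield from walk([source])


paths = list(simple_paths(edges, "APP", "apoptosis", cutoff=3))
shortest = min(len(p) - 1 for p in paths) if paths else None
print(f"Paths found: {paths}")
print(f"Shortest path length: {shortest}")
```

In this toy copy, the only route runs through Abeta and inflammation, so lowering the cutoff to 2 yields no paths at all.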
print("\n8. Data Export and Visualization")
print("-" * 35)
adj_matrix = nx.adjacency_matrix(graph)
node_labels = [str(node) for node in graph.nodes()]
plt.figure(figsize=(12, 8))
plt.subplot(2, 2, 1)
pos = nx.spring_layout(graph, k=2, iterations=50)
nx.draw(graph, pos, with_labels=False, node_color="lightblue",
        node_size=1000, font_size=8, font_weight="bold")
plt.title("BEL Network Graph")
plt.subplot(2, 2, 2)
centralities = list(degree_centrality.values())
plt.hist(centralities, bins=10, alpha=0.7, color="green")
plt.title("Degree Centrality Distribution")
plt.xlabel("Centrality")
plt.ylabel("Frequency")
plt.subplot(2, 2, 3)
functions = list(node_types.keys())
counts = list(node_types.values())
plt.pie(counts, labels=functions, autopct="%1.1f%%", startangle=90)
plt.title("Node Type Distribution")
plt.subplot(2, 2, 4)
relations = list(edge_types.keys())
rel_counts = list(edge_types.values())
plt.bar(relations, rel_counts, color="orange", alpha=0.7)
plt.title("Relationship Types")
plt.xlabel("Relation")
plt.ylabel("Count")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
Finally, we prepare the adjacency matrix for downstream use and build a multi-panel figure showing the network structure, the degree centrality distribution, the node type proportions, and the edge type counts. These visualizations bring the BEL graph to life and support a more detailed biological interpretation.
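For readers who want to see what the adjacency matrix encodes, the sketch below builds one by hand from a simplified, hypothetical copy of the pathway: row i, column j holds the number of edges from node i to node j. The node labels and the sorted ordering are our own choices for illustration; the matrix NetworkX returns follows the graph's own node order.

```python
# Simplified stand-in for the pathway: (source, target) pairs.
edges = [
    ("APP", "Abeta"),
    ("Abeta", "inflammation"),
    ("GSK3B", "pMAPT"),
    ("pMAPT", "apoptosis"),
    ("inflammation", "apoptosis"),
    ("Abeta", "MAPT"),
]

# Fix a node order so that rows and columns can be labeled.
labels = sorted({node for edge in edges for node in edge})
index = {name: i for i, name in enumerate(labels)}

# matrix[i][j] counts the edges from labels[i] to labels[j].
matrix = [[0] * len(labels) for _ in labels]
for u, v in edges:
    matrix[index[u]][index[v]] += 1

for name, row in zip(labels, matrix):
    print(f"{name:>12}: {row}")
```

Because the edges are directed, the matrix is not symmetric: the entry from APP to Abeta is 1, while the reverse entry is 0.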
In this tutorial, we have demonstrated the power and flexibility of PyBEL for modeling complex biological systems. We showed how to construct a transparent, white-box graph of Alzheimer’s disease interactions, how to run network-level analyses to identify key hub nodes, and how to extract biologically meaningful subgraphs. We also covered the essentials of literature evidence mining and prepared data structures for visualization. As a next step, you can extend this framework to other pathways, integrate additional omics data, run enrichment tests, or couple the graph with machine-learning workflows.
Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges.


