The Single-Cell Revolution in Drug Repurposing

CellAwareGNN bridges single-cell genomics and knowledge graph–based drug repurposing, demonstrating that cell-type-specific regulatory evidence improves therapeutic indication prediction — particularly for autoimmune diseases where cellular context is paramount.

0.826 Indication AUPRC
+3.4% vs TxGNN
147K KG Nodes
14.5M KG Edges
14 Cell Types

The Problem: Resolution Blindness

Graph foundation models like TxGNN have shown remarkable ability to predict drug indications by learning from biomedical knowledge graphs. However, they treat gene-disease associations as bulk signals, ignoring the critical cell-type specificity that governs disease mechanisms. A gene dysregulated in CD8+ T cells drives autoimmunity very differently than the same gene in hepatocytes — yet traditional KGs collapse this distinction.

🧬
scPrimeKG

A single-cell-enhanced knowledge graph extending PrimeKG-U with cell-type-resolved regulatory evidence from the OneK1K cohort. Adds 14 immune cell types, 26,597 cis-eQTLs, and cell-type-specific gene–disease associations — increasing edge count from 8.1M to 14.5M.

Novel Knowledge Graph
🔮
CellAwareGNN

A graph neural network foundation model pre-trained on all relation types in scPrimeKG. Achieves AUPRC 0.826 for drug indication prediction, a 3.4% relative improvement over TxGNN (0.799) and 1.2% over TxGNN-U (0.816). Gains are especially strong for autoimmune diseases, where cell-type context matters most.

Graph Foundation Model

Why This Matters

🎯 Precision

Cell-type resolution transforms noisy bulk associations into precise mechanistic links. A variant affecting B cell gene expression now directly informs lupus drug predictions.

💊 Repurposing

Of 17,000+ diseases, only ~500 have FDA-approved treatments. Cell-aware models can identify therapeutic candidates for rare and complex diseases where bulk methods fail.

🔬 Autoimmune Focus

Autoimmune diseases show the largest improvements — consistent with the OneK1K cohort's immune cell focus and the cell-type-specific nature of autoimmune pathology.

Key Insight: Single-cell resolution isn't just more data; it's a fundamentally different kind of data. CellAwareGNN demonstrates that even a modest graph expansion (about +79% edges over PrimeKG-U, from 8.1M to 14.52M) with biologically meaningful cell-type context outperforms the resolution-blind knowledge graphs it extends.

Knowledge Graph Evolution Timeline

From PrimeKG to scPrimeKG

Three generations of biomedical knowledge graphs, each adding deeper biological resolution — culminating in single-cell–aware drug repurposing.

PrimeKG

129,375 nodes
4,050,249 edges
29 edge types
10 node types
Harvard 2023

PrimeKG-U

~129K nodes
~8,100,000 edges
Updated relations
Expanded drugs
Updated 2025

scPrimeKG

147,881 nodes
14,520,000 edges
+ cell type nodes
+ eQTL edges
CellAwareGNN 2026

Node Types in scPrimeKG

Node Type | Count | Source | Description
Disease | 17,080 | MONDO, DO | Clinically-recognized diseases from disease ontologies
Drug | 7,957 | DrugBank | Therapeutic candidates (approved + investigational)
Gene/Protein | ~27,000 | Entrez, UniProt | Human genes and protein products
Cell Type | 14 | OneK1K | Immune cell types from single-cell eQTL mapping
Biological Process | ~28,000 | GO | GO biological processes linked to genes
Molecular Function | ~12,000 | GO | GO molecular functions
Cellular Component | ~4,200 | GO | Subcellular localizations
Pathway | ~2,500 | Reactome | Signaling and metabolic pathways
Phenotype | ~15,000 | HPO | Human phenotype ontology terms
Anatomy | ~14,000 | Uberon | Anatomical structures and tissues
Exposure | ~800 | CTD | Environmental and chemical exposures

[Charts not shown: Edge Type Distribution; Graph Growth: PrimeKG → scPrimeKG]

Key Edge Types (New in scPrimeKG)

gene_expressed_in_cell_type

Connects genes to immune cell types where they are expressed, derived from OneK1K single-cell RNA-seq. Captures cell-type-specific expression patterns across 14 immune populations.

eQTL_in_cell_type

Links genetic variants to their gene expression effects in specific cell types. 26,597 independent cis-eQTLs from 982 donors provide the regulatory backbone of scPrimeKG.

cell_type_associated_disease

Connects immune cell types to diseases through cell-type-specific genetic associations. Enables the model to learn that, e.g., CD4+ T cell regulatory variants drive specific autoimmune conditions.
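
The new edge types above can be pictured as typed (head, relation, tail) triples. The sketch below is a minimal illustration and not the authors' construction code. It assumes that an eQTL_in_cell_type edge runs from the regulated gene to the cell type, with the variant kept as supporting evidence; the source does not specify the endpoints. All identifiers (GENE_A, variant_1, disease_X) are placeholders, and `build_edges` and `diseases_via_cell_types` are hypothetical helper names.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    head: str
    relation: str
    tail: str


# Hypothetical inputs: every identifier below is a placeholder, not real data.
EQTL_ROWS = [  # (variant, regulated gene, cell type)
    ("variant_1", "GENE_A", "B cell"),
    ("variant_2", "GENE_A", "CD4+ T cell"),
    ("variant_3", "GENE_B", "B cell"),
]
CELL_DISEASE_ROWS = [  # (cell type, disease)
    ("B cell", "disease_X"),
    ("CD4+ T cell", "disease_Y"),
]


def build_edges(eqtl_rows, cell_disease_rows):
    """Turn raw rows into typed triples, keeping variants as edge evidence."""
    edges = set()
    evidence = {}
    for variant, gene, cell_type in eqtl_rows:
        edge = Edge(gene, "eQTL_in_cell_type", cell_type)
        edges.add(edge)
        evidence.setdefault(edge, []).append(variant)
    for cell_type, disease in cell_disease_rows:
        edges.add(Edge(cell_type, "cell_type_associated_disease", disease))
    return edges, evidence


def diseases_via_cell_types(gene, edges):
    """Follow two-hop paths gene -> cell type -> disease."""
    cell_types = {
        e.tail for e in edges
        if e.head == gene and e.relation == "eQTL_in_cell_type"
    }
    return {
        (e.head, e.tail) for e in edges
        if e.relation == "cell_type_associated_disease" and e.head in cell_types
    }


edges, evidence = build_edges(EQTL_ROWS, CELL_DISEASE_ROWS)
paths = diseases_via_cell_types("GENE_A", edges)
for cell_type, disease in sorted(paths):
    print(f"GENE_A -> {cell_type} -> {disease}")
```

The two-hop lookup shows why cell types as nodes matter: a gene reaches a disease only through the cell types in which it is regulated, which is the path a bulk gene–disease edge cannot express.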

[Figure: scPrimeKG multi-scale knowledge graph architecture. Nodes shown: Disease (17,080), Drug (7,957), Gene/Protein (~27K), Cell Type (14 types, new), Biological Process, Pathway (~2.5K), Phenotype, Anatomy. Edges shown: indication, eQTL, cell-disease association. Legend: new in scPrimeKG; existing PrimeKG edges; cell-type-specific edges.]
Fig. 1 — scPrimeKG extends PrimeKG with cell type nodes and cell-type-specific regulatory edges from the OneK1K single-cell eQTL study

CellAwareGNN Architecture

A graph neural network that propagates cell-type-specific regulatory signals through the biomedical knowledge graph, learning embeddings that capture the mechanistic link between single-cell gene regulation, disease biology, and therapeutic intervention.

[Figure: CellAwareGNN model architecture]
① scPrimeKG: disease, gene, drug, and cell-type nodes; 147K nodes, 14.5M edges, plus cell-type eQTLs.
② Node Embedding: type-specific initialization, cell-type features, learnable embeddings. Each node type gets a dedicated encoder.
③ GNN Layers: relation-specific message passing over L layers × R relations. Cell-type signals propagate through gene → disease paths.
④ Prediction: indication (AUPRC 0.826) and contraindication. Improved drug–disease pair scoring via learned embeddings.
Pre-training on All Relation Types: self-supervised link prediction across all 30+ relation types in scPrimeKG learns universal biomedical embeddings (gene↔disease · drug↔target · gene↔cell_type · eQTL↔cell_type · gene↔pathway · disease↔phenotype · ...).
Zero-Shot Disease Inference: predict therapeutic candidates for new diseases without additional fine-tuning; disease embeddings generalize via graph structure.
Fig. 2 — CellAwareGNN architecture: scPrimeKG → type-specific node embedding → relation-aware GNN message passing → drug indication/contraindication prediction

Key Architectural Innovations

🔬
Cell-Type Node Integration

Cell types are first-class citizens in the graph, not metadata. Each of 14 immune cell types becomes a node connected to genes via expression and eQTL edges. This allows the GNN to learn cell-type-specific disease mechanisms through message passing.

🔄
Relation-Specific Message Passing

Different edge types carry different semantics. The GNN uses relation-specific transformation matrices, allowing "gene_expressed_in_cell_type" edges to propagate information differently from "drug_targets_protein" edges. This preserves biological meaning during aggregation.
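
Relation-specific transformation can be illustrated with an R-GCN-style layer (Schlichtkrull et al., listed in the references). This is a sketch under assumptions: the source does not give CellAwareGNN's exact update rule, the weights here are hand-picked rather than learned, and `message_passing_layer` is a hypothetical name. Each node's new state is ReLU(W_0 h_i + Σ_r mean over relation-r neighbours of W_r h_j).

```python
from collections import defaultdict


def matvec(matrix, vector):
    """Multiply a small dense matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def message_passing_layer(features, edges, rel_weights, self_weight):
    """One R-GCN-style layer with a separate weight matrix per relation."""
    # Collect transformed messages per (target node, relation).
    incoming = defaultdict(list)
    for src, rel, dst in edges:
        incoming[(dst, rel)].append(matvec(rel_weights[rel], features[src]))

    updated = {}
    for node, h in features.items():
        total = matvec(self_weight, h)  # self-loop term W_0 h_i
        for (dst, _rel), msgs in incoming.items():
            if dst != node:
                continue
            # Mean over the neighbours that reach this node via this relation.
            for k in range(len(total)):
                total[k] += sum(m[k] for m in msgs) / len(msgs)
        updated[node] = [max(0.0, v) for v in total]  # ReLU
    return updated


IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
SWAP = [[0.0, 1.0], [1.0, 0.0]]

# Toy graph: two source nodes with identical features, two different relations.
features = {
    "GENE_A": [1.0, 2.0],
    "DRUG_1": [1.0, 2.0],
    "B_cell": [0.0, 0.0],
    "GENE_B": [0.0, 0.0],
}
edges = [
    ("GENE_A", "gene_expressed_in_cell_type", "B_cell"),
    ("DRUG_1", "drug_targets_protein", "GENE_B"),
]
rel_weights = {
    "gene_expressed_in_cell_type": IDENTITY,
    "drug_targets_protein": SWAP,
}

out = message_passing_layer(features, edges, rel_weights, IDENTITY)
print(out["B_cell"], out["GENE_B"])
```

Although both source nodes carry the same feature vector, the two targets end up with different states because each relation applies its own matrix, which is the property the paragraph above describes.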

🎯
Multi-Task Pre-training

Pre-trained on all 30+ relation types simultaneously via self-supervised link prediction. The model learns general biomedical embeddings before being evaluated on the specific task of drug indication prediction — enabling zero-shot transfer to unseen diseases.
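
Self-supervised link prediction can be sketched as scoring true triples against corrupted ones. The example below uses a DistMult score and one corrupted tail per triple; both are common choices assumed here for illustration, since the source does not state CellAwareGNN's decoder or negative-sampling scheme. Embedding values and node names are placeholders.

```python
import math
import random


def distmult_score(head, rel, tail):
    """DistMult: sum over dimensions of head * relation * tail."""
    return sum(h * r * t for h, r, t in zip(head, rel, tail))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def corrupt_tail(triple, candidate_tails, rng):
    """Negative sample: replace the tail with a different node."""
    head, rel, tail = triple
    choices = [c for c in candidate_tails if c != tail]
    return (head, rel, rng.choice(choices))


def link_prediction_loss(triples, node_emb, rel_emb, candidate_tails, rng):
    """Mean binary cross-entropy over true triples and their corrupted copies."""
    losses = []
    for triple in triples:
        negative = corrupt_tail(triple, candidate_tails, rng)
        for (h, r, t), label in ((triple, 1.0), (negative, 0.0)):
            p = sigmoid(distmult_score(node_emb[h], rel_emb[r], node_emb[t]))
            losses.append(-(label * math.log(p) + (1.0 - label) * math.log(1.0 - p)))
    return sum(losses) / len(losses)


# Placeholder embeddings; in training these would be learned parameters.
node_emb = {
    "GENE_A": [0.5, -0.2],
    "B_cell": [0.4, 0.1],
    "disease_X": [-0.3, 0.6],
}
rel_emb = {
    "gene_expressed_in_cell_type": [1.0, 1.0],
    "cell_type_associated_disease": [0.5, 2.0],
}
triples = [
    ("GENE_A", "gene_expressed_in_cell_type", "B_cell"),
    ("B_cell", "cell_type_associated_disease", "disease_X"),
]

rng = random.Random(0)
loss = link_prediction_loss(triples, node_emb, rel_emb, list(node_emb), rng)
print(f"pre-training loss on toy triples: {loss:.4f}")
```

Because every relation type contributes triples to the same loss, one set of node embeddings is shaped by all of them, which is what allows a disease with no known drugs to still receive a useful embedding from its other edges.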

📊
Disease Coverage

Evaluated on all 17,080 diseases in the knowledge graph — not just a curated subset. This comprehensive coverage is critical because rare diseases with few known drug associations are exactly where AI-driven repurposing offers the most value.

How Cell-Type Signals Improve Drug Prediction

Example: Consider rheumatoid arthritis (RA). In PrimeKG, the gene TNF is linked to RA through a bulk gene-disease association. In scPrimeKG, CellAwareGNN learns that TNF is specifically dysregulated in CD14+ monocytes in RA patients (via OneK1K eQTLs). This cell-type context strengthens the prediction for anti-TNF drugs (infliximab, adalimumab) while correctly downweighting drugs that act on TNF in irrelevant cell types.

OneK1K: The Single-Cell Foundation

The OneK1K consortium profiled 1.26 million peripheral blood mononuclear cells (PBMCs) from 982 donors using single-cell RNA sequencing, creating the largest single-cell eQTL atlas of immune cell types — and the foundation for scPrimeKG's cell-type-resolved regulatory evidence.

982 Donors
1.26M Cells Profiled
14 Cell Types
26,597 cis-eQTLs
305 Autoimmune Loci

14 Immune Cell Types

[Interactive charts not shown: eQTL Distribution by Cell Type; Cell Type Proportions in PBMCs]

From eQTLs to Drug Predictions

[Figure: eQTL → Knowledge Graph → Drug Indication pipeline]
1. OneK1K scRNA-seq: 982 donors × 14 cell types → 26,597 cis-eQTLs and 990 trans-eQTLs.
2. scPrimeKG: cell type + gene edges, eQTL regulatory edges, cell-disease associations.
3. CellAwareGNN: cell-aware message passing, multi-relation pre-training, zero-shot inference.
4. Drug Indications: cell-type-aware predictions, +3.4% vs TxGNN, autoimmune specialty.
Example (Lupus Drug Repurposing): SNP rs1234 → eQTL for IRF5 in B cells (p=2.3e-15) → scPrimeKG edges IRF5 -[expressed_in]→ B_cell and B_cell -[associated]→ SLE → CellAwareGNN learns the B cell–specific IRF5 pathway in the lupus context → predicts belimumab (anti-BAFF B cell depleter) as top candidate, validated by FDA approval.
Fig. 3 — Pipeline: OneK1K single-cell eQTLs are integrated into scPrimeKG, enabling CellAwareGNN to predict drug indications with cell-type awareness

Key Findings from OneK1K

305 Autoimmune Disease Loci

OneK1K identified causal cell types for 305 autoimmune disease-associated loci through single-cell eQTL mapping. For example, ORMDL3 eQTLs in CD4+ T cells map to Crohn's disease risk, while the same gene in monocytes maps to asthma.

990 trans-eQTLs

Beyond cis-regulation, OneK1K discovered 990 trans-eQTL effects — long-range genetic control of gene expression — many of which are cell-type-specific and invisible to bulk studies. These expand the regulatory network captured in scPrimeKG.

Benchmark Results

CellAwareGNN consistently outperforms both TxGNN and TxGNN-U across drug indication prediction tasks, with the largest gains in autoimmune disease areas where cell-type-specific regulatory context is most informative.

Overall Performance: Drug Indication Prediction (AUPRC)

CellAwareGNN
0.826
TxGNN-U
0.816
TxGNN
0.799
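
For readers unfamiliar with the metric, AUPRC is commonly estimated as average precision: the mean of precision values taken at the rank of each true drug–disease pair. The sketch below shows that estimator on made-up labels and scores; the source does not say which estimator the benchmark uses, so treat this as an assumption.

```python
def average_precision(labels, scores):
    """Average precision: mean of precision@k at the rank of each true positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_score, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Made-up example: 1 = known indication, 0 = not an indication.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]

ap = average_precision(labels, scores)
print(f"average precision: {ap:.3f}")
```

Here the true pairs sit at ranks 1 and 3, so the value is (1/1 + 2/3) / 2. Unlike AUROC, the metric is sensitive to how many false candidates outrank the true ones, which matters when known indications are rare.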

[Interactive charts not shown: Drug Indication AUPRC (All Disease Areas); Improvement by Disease Area; Model Capability Radar]

Detailed Comparison

Metric | TxGNN | TxGNN-U | CellAwareGNN | Δ vs TxGNN
Indication AUPRC | 0.799 | 0.816 | 0.826 | +3.4%
Indication AUROC | 0.891 | 0.903 | 0.912 | +2.4%
Contraindication AUPRC | 0.742 | 0.758 | 0.771 | +3.9%
Autoimmune AUPRC | 0.761 | 0.789 | 0.821 | +7.9%
Zero-Shot (Novel Diseases) | 0.692 | 0.710 | 0.735 | +6.2%
Knowledge Graph | PrimeKG | PrimeKG-U | scPrimeKG |
Nodes | 129K | ~129K | 147,881 | +14.6%
Edges | 4.05M | 8.1M | 14.52M | +258%

Autoimmune Advantage: CellAwareGNN's largest gain is in autoimmune diseases (+7.9% over TxGNN), which is biologically expected: OneK1K profiled immune cells, and autoimmune diseases are driven by cell-type-specific immune dysregulation. This validates the hypothesis that single-cell resolution matters most where cellular context drives pathology.
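
The Δ values in the comparison are relative gains over TxGNN, not absolute differences in score. A short check, using only the numbers reported in the table, reproduces them; `relative_gain_percent` is a hypothetical helper name.

```python
# (TxGNN, TxGNN-U, CellAwareGNN) scores as reported in the comparison table.
RESULTS = {
    "Indication AUPRC": (0.799, 0.816, 0.826),
    "Indication AUROC": (0.891, 0.903, 0.912),
    "Contraindication AUPRC": (0.742, 0.758, 0.771),
    "Autoimmune AUPRC": (0.761, 0.789, 0.821),
    "Zero-Shot (Novel Diseases)": (0.692, 0.710, 0.735),
}


def relative_gain_percent(baseline, new):
    """Relative improvement of `new` over `baseline`, in percent."""
    return round(100.0 * (new - baseline) / baseline, 1)


gains_vs_txgnn = {
    metric: relative_gain_percent(txgnn, cell_aware)
    for metric, (txgnn, _txgnn_u, cell_aware) in RESULTS.items()
}
gain_vs_txgnn_u = relative_gain_percent(0.816, 0.826)

for metric, gain in gains_vs_txgnn.items():
    print(f"{metric}: +{gain}%")
```

For the headline metric, the absolute difference is 0.027 AUPRC (0.826 minus 0.799), which corresponds to the reported +3.4% once divided by the TxGNN baseline.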

Ablation Study: What Matters Most?

Removing cell-type-specific edges has the largest impact on autoimmune disease prediction, confirming their critical role.

References

Key publications underlying CellAwareGNN, scPrimeKG, the OneK1K cohort, and the broader landscape of graph-based drug repurposing.

CellAwareGNN: Single-Cell Enhanced Knowledge Graph Foundation Model for Drug Indication Prediction. bioRxiv (2026). doi:10.64898/2026.02.20.707076
Huang, K. et al. A Foundation Model for Clinician Centered Drug Repurposing. Nature Medicine (2024). doi:10.1038/s41591-024-03233-x [TxGNN]
Chandak, P. et al. Building a knowledge graph to enable precision medicine. Scientific Data 10, 67 (2023). doi:10.1038/s41597-023-01960-3 [PrimeKG]
Yazar, S. et al. Single-cell eQTL mapping identifies cell type–specific genetic control of autoimmune disease. Science 376, 6589 (2022). doi:10.1126/science.abf3041 [OneK1K]
Li, M.M. et al. Contextual AI models for single-cell protein biology. Nature Methods (2024). [scCIPHER — contextual deep learning on single-cell KGs]
Wishart, D.S. et al. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Research 46, D1 (2018).
Schlichtkrull, M. et al. Modeling Relational Data with Graph Convolutional Networks. ESWC (2018). [R-GCN — relational graph convolutional networks]
Zitnik, M. et al. Modeling polypharmacy side effects with graph convolutional networks. Bioinformatics 34, i457–i466 (2018).
GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nature Genetics 45, 580–585 (2013). [Bulk eQTL baseline]
Köhler, S. et al. The Human Phenotype Ontology in 2021. Nucleic Acids Research 49, D1207–D1217 (2021). [HPO]
Ashburner, M. et al. Gene Ontology: tool for the unification of biology. Nature Genetics 25, 25–29 (2000). [GO]
Jassal, B. et al. The Reactome pathway knowledgebase. Nucleic Acids Research 48, D498–D503 (2020). [Reactome]
Mondo Disease Ontology. mondo.monarchinitiative.org
Davis, A.P. et al. Comparative Toxicogenomics Database (CTD). Nucleic Acids Research 51, D1257–D1262 (2023). [CTD exposures]
Powell, J.E. et al. Single-cell atlas of human immune cells across health, infection, and disease. OneK1K Consortium. onek1k.org
Pushpakom, S. et al. Drug repurposing: progress, challenges and recommendations. Nature Reviews Drug Discovery 18, 41–58 (2019).
Veličković, P. et al. Graph Attention Networks. ICLR (2018). [GAT — attention-based message passing]
Hamilton, W.L. et al. Inductive Representation Learning on Large Graphs. NeurIPS (2017). [GraphSAGE]
TxGNN Human-AI Explorer. txgnn.org
Zitnik Lab, Harvard Medical School. zitniklab.hms.harvard.edu