The Single-Cell Revolution in Drug Repurposing
CellAwareGNN bridges single-cell genomics and knowledge graph–based drug repurposing, demonstrating that cell-type-specific regulatory evidence improves therapeutic indication prediction — particularly for autoimmune diseases where cellular context is paramount.
The Problem: Resolution Blindness
Graph foundation models like TxGNN have shown remarkable ability to predict drug indications by learning from biomedical knowledge graphs. However, they treat gene-disease associations as bulk signals, ignoring the critical cell-type specificity that governs disease mechanisms. A gene dysregulated in CD8+ T cells drives autoimmunity very differently than the same gene in hepatocytes — yet traditional KGs collapse this distinction.
scPrimeKG is a single-cell-enhanced knowledge graph that extends PrimeKG-U with cell-type-resolved regulatory evidence from the OneK1K cohort. It adds 14 immune cell types, 26,597 cis-eQTLs, and cell-type-specific gene–disease associations, increasing the edge count from 8.1M to 14.5M.
CellAwareGNN is a graph neural network foundation model pre-trained on all relation types in scPrimeKG. It achieves an AUPRC of 0.826 for drug indication prediction, a 3.4% relative improvement over TxGNN (0.799) and a 1.2% relative improvement over TxGNN-U (0.816). Gains are especially strong for autoimmune diseases, where cell-type context matters most.
Why This Matters
🎯 Precision
Cell-type resolution transforms noisy bulk associations into precise mechanistic links. A variant affecting B cell gene expression now directly informs lupus drug predictions.
💊 Repurposing
Of 17,000+ diseases, only ~500 have FDA-approved treatments. Cell-aware models can identify therapeutic candidates for rare and complex diseases where bulk methods fail.
🔬 Autoimmune Focus
Autoimmune diseases show the largest improvements — consistent with the OneK1K cohort's immune cell focus and the cell-type-specific nature of autoimmune pathology.
Key Insight: Single-cell resolution isn't just more data — it's a fundamentally different kind of data. CellAwareGNN demonstrates that even modest graph expansion (+78% edges) with biologically meaningful cell-type context outperforms larger but resolution-blind knowledge graphs.
Knowledge Graph Evolution Timeline
From PrimeKG to scPrimeKG
Three generations of biomedical knowledge graphs, each adding deeper biological resolution — culminating in single-cell–aware drug repurposing.
| Knowledge Graph | Edges | Details | Origin |
|---|---|---|---|
| PrimeKG | 4,050,249 | 29 edge types; 10 node types | Harvard 2023 |
| PrimeKG-U | ~8,100,000 | Updated relations; expanded drugs | Updated 2025 |
| scPrimeKG | 14,520,000 | + cell type nodes; + eQTL edges | CellAwareGNN 2026 |
Node Types in scPrimeKG
| Node Type | Count | Source | Description |
|---|---|---|---|
| Disease | 17,080 | MONDO, DO | Clinically-recognized diseases from disease ontologies |
| Drug | 7,957 | DrugBank | Therapeutic candidates (approved + investigational) |
| Gene/Protein | ~27,000 | Entrez, UniProt | Human genes and protein products |
| Cell Type | 14 | OneK1K | Immune cell types from single-cell eQTL mapping |
| Biological Process | ~28,000 | GO | GO biological processes linked to genes |
| Molecular Function | ~12,000 | GO | GO molecular functions |
| Cellular Component | ~4,200 | GO | Subcellular localizations |
| Pathway | ~2,500 | Reactome | Signaling and metabolic pathways |
| Phenotype | ~15,000 | HPO | Human phenotype ontology terms |
| Anatomy | ~14,000 | Uberon | Anatomical structures and tissues |
| Exposure | ~800 | CTD | Environmental and chemical exposures |
[Figures: Edge Type Distribution; Graph Growth: PrimeKG → scPrimeKG]
Key Edge Types (New in scPrimeKG)
gene_expressed_in_cell_type
Connects genes to immune cell types where they are expressed, derived from OneK1K single-cell RNA-seq. Captures cell-type-specific expression patterns across 14 immune populations.
eQTL_in_cell_type
Links genetic variants to their gene expression effects in specific cell types. 26,597 independent cis-eQTLs from 982 donors provide the regulatory backbone of scPrimeKG.
cell_type_associated_disease
Connects immune cell types to diseases through cell-type-specific genetic associations. Enables the model to learn that, e.g., CD4+ T cell regulatory variants drive specific autoimmune conditions.
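The three new edge types can be pictured as typed triples stored alongside the existing relations. The following is a minimal sketch only: the `Edge` tuple, the node identifiers, and the example rows are hypothetical, and the gene-to-cell-type direction used for `eQTL_in_cell_type` is an assumption, since the page does not specify how these edges are encoded.

```python
from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):
    head: str      # source node id
    relation: str  # edge type name
    tail: str      # target node id


# Hypothetical example rows; real scPrimeKG identifiers may differ.
edges = [
    Edge("gene:TNF", "gene_expressed_in_cell_type", "cell:CD14_monocyte"),
    # Assumption: eQTL edges link the regulated gene to the cell type.
    Edge("gene:ORMDL3", "eQTL_in_cell_type", "cell:CD4_T"),
    Edge("cell:CD4_T", "cell_type_associated_disease", "disease:crohn_disease"),
    Edge("drug:infliximab", "drug_targets_protein", "gene:TNF"),
]

NEW_RELATIONS = {
    "gene_expressed_in_cell_type",
    "eQTL_in_cell_type",
    "cell_type_associated_disease",
}


def index_by_relation(edge_list):
    """Group (head, tail) pairs by relation so each edge type can be handled separately."""
    index = defaultdict(list)
    for edge in edge_list:
        index[edge.relation].append((edge.head, edge.tail))
    return dict(index)


by_relation = index_by_relation(edges)
n_new = sum(len(pairs) for rel, pairs in by_relation.items() if rel in NEW_RELATIONS)
print(n_new)  # 3
```

Keeping the relation name on every edge is what lets a downstream model treat cell-type edges differently from, say, drug-target edges.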
CellAwareGNN Architecture
A graph neural network that propagates cell-type-specific regulatory signals through the biomedical knowledge graph, learning embeddings that capture the mechanistic link between single-cell gene regulation, disease biology, and therapeutic intervention.
Key Architectural Innovations
Cell types are first-class citizens in the graph, not metadata. Each of 14 immune cell types becomes a node connected to genes via expression and eQTL edges. This allows the GNN to learn cell-type-specific disease mechanisms through message passing.
Different edge types carry different semantics. The GNN uses relation-specific transformation matrices, allowing "gene_expressed_in_cell_type" edges to propagate information differently from "drug_targets_protein" edges. This preserves biological meaning during aggregation.
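One common way to realise relation-specific transformation matrices is a relational-GCN-style update, in which each edge type has its own weight matrix and messages are averaged per relation. The sketch below assumes that formulation; the page does not state the exact layer CellAwareGNN uses, and the toy features, weights, and node names are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def relational_layer(features, edges, rel_weights, self_weight):
    """One update: h_v' = ReLU(W_0 h_v + sum over relations r of mean_u W_r h_u).

    Messages flow from the head of each (head, relation, tail) edge to its tail.
    """
    # Collect transformed messages per (target node, relation).
    incoming = {}
    for head, relation, tail in edges:
        message = matvec(rel_weights[relation], features[head])
        incoming.setdefault((tail, relation), []).append(message)

    updated = {}
    for node, h in features.items():
        total = matvec(self_weight, h)  # self-connection term
        for (target, _relation), messages in incoming.items():
            if target != node:
                continue
            for i in range(len(total)):
                total[i] += sum(m[i] for m in messages) / len(messages)
        updated[node] = [max(0.0, x) for x in total]  # ReLU
    return updated


# Hypothetical 2-dimensional toy example.
features = {
    "gene:TNF": [1.0, 0.0],
    "cell:CD14_monocyte": [0.0, 1.0],
    "drug:infliximab": [1.0, 0.5],
}
edges = [
    ("gene:TNF", "gene_expressed_in_cell_type", "cell:CD14_monocyte"),
    ("drug:infliximab", "drug_targets_protein", "gene:TNF"),
]
rel_weights = {
    "gene_expressed_in_cell_type": [[2.0, 0.0], [0.0, 2.0]],  # scales the message
    "drug_targets_protein": [[0.0, 1.0], [1.0, 0.0]],         # swaps the two dimensions
}
identity = [[1.0, 0.0], [0.0, 1.0]]

out = relational_layer(features, edges, rel_weights, identity)
print(out["gene:TNF"])            # [1.5, 1.0]
print(out["cell:CD14_monocyte"])  # [2.0, 1.0]
```

Because the two relations use different matrices, the same source embedding would produce a different message depending on which edge type carries it.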
CellAwareGNN is pre-trained on all 30+ relation types simultaneously via self-supervised link prediction. The model learns general biomedical embeddings before being evaluated on the specific task of drug indication prediction, which enables zero-shot transfer to unseen diseases.
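Self-supervised link prediction is typically trained by scoring known triples against corrupted ones. The sketch below assumes a DistMult scoring function and one corrupted-tail negative per positive; both are illustrative choices rather than details confirmed by the page, and all embeddings and identifiers are hypothetical.

```python
import math
import random


def distmult_score(head_vec, relation_vec, tail_vec):
    """DistMult: sum of element-wise products of head, relation, and tail embeddings."""
    return sum(h * r * t for h, r, t in zip(head_vec, relation_vec, tail_vec))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def link_prediction_loss(triples, node_emb, rel_emb, rng):
    """Mean binary cross-entropy over each true triple and one corrupted-tail negative."""
    known = set(triples)
    nodes = sorted(node_emb)
    total = 0.0
    for head, rel, tail in triples:
        # Candidate negatives: any tail that does not form a known positive.
        candidates = [n for n in nodes if (head, rel, n) not in known]
        neg_tail = rng.choice(candidates)
        pos = sigmoid(distmult_score(node_emb[head], rel_emb[rel], node_emb[tail]))
        neg = sigmoid(distmult_score(node_emb[head], rel_emb[rel], node_emb[neg_tail]))
        total += -math.log(pos) - math.log(1.0 - neg)
    return total / len(triples)


# Hypothetical toy embeddings.
node_emb = {
    "drug:infliximab": [1.0, 0.5],
    "gene:TNF": [1.0, 1.0],
    "disease:rheumatoid_arthritis": [-1.0, 0.5],
}
rel_emb = {"drug_targets_protein": [1.0, 1.0]}
triples = [("drug:infliximab", "drug_targets_protein", "gene:TNF")]

positive_score = distmult_score(
    node_emb["drug:infliximab"], rel_emb["drug_targets_protein"], node_emb["gene:TNF"]
)
loss = link_prediction_loss(triples, node_emb, rel_emb, random.Random(0))
print(positive_score)  # 1.5
```

Minimising this loss over every relation type at once is what would push the embeddings to encode general graph structure before any indication-specific evaluation.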
The model is evaluated on all 17,080 diseases in the knowledge graph, not just a curated subset. This comprehensive coverage is critical because rare diseases with few known drug associations are exactly where AI-driven repurposing offers the most value.
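To make the evaluation metric concrete, here is a minimal sketch of AUPRC estimated as average precision for each disease and then averaged across diseases. The page does not say which AUPRC estimator is used or whether scores are pooled or averaged per disease, so both choices are assumptions, as are the toy labels and scores. Tied scores are not given special treatment.

```python
def average_precision(labels, scores):
    """Average precision: mean of precision@k taken at each rank k that holds a positive."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positive labels")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_score, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / n_pos


def macro_average_precision(per_disease):
    """Average the per-disease average precision so every disease counts equally."""
    values = [average_precision(labels, scores) for labels, scores in per_disease.values()]
    return sum(values) / len(values)


# Hypothetical predictions: 1 marks a known indication, scores are model outputs.
per_disease = {
    "disease:A": ([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]),
    "disease:B": ([0, 1], [0.6, 0.4]),
}

ap_a = average_precision(*per_disease["disease:A"])
macro = macro_average_precision(per_disease)
print(round(ap_a, 4), round(macro, 4))  # 0.8333 0.6667
```

Averaging per disease rather than pooling all drug-disease pairs keeps well-studied diseases with many known indications from dominating the score.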
How Cell-Type Signals Improve Drug Prediction
Example: Consider rheumatoid arthritis (RA). In PrimeKG, the gene TNF is linked to RA through a bulk gene-disease association. In scPrimeKG, CellAwareGNN learns that TNF is specifically dysregulated in CD14+ monocytes in RA patients (via OneK1K eQTLs). This cell-type context strengthens the prediction for anti-TNF drugs (infliximab, adalimumab) while correctly downweighting drugs that act on TNF in irrelevant cell types.
OneK1K: The Single-Cell Foundation
The OneK1K consortium profiled 1.27 million peripheral blood mononuclear cells (PBMCs) from 982 donors using single-cell RNA sequencing, creating the largest single-cell eQTL atlas of immune cell types and the foundation for scPrimeKG's cell-type-resolved regulatory evidence.
14 Immune Cell Types
[Figures: eQTL Distribution by Cell Type; Cell Type Proportions in PBMCs; From eQTLs to Drug Predictions]
Key Findings from OneK1K
305 Autoimmune Disease Loci
OneK1K identified causal cell types for 305 autoimmune disease-associated loci through single-cell eQTL mapping. For example, ORMDL3 eQTLs in CD4+ T cells map to Crohn's disease risk, while the same gene in monocytes maps to asthma.
990 trans-eQTLs
Beyond cis-regulation, OneK1K discovered 990 trans-eQTL effects — long-range genetic control of gene expression — many of which are cell-type-specific and invisible to bulk studies. These expand the regulatory network captured in scPrimeKG.
Benchmark Results
CellAwareGNN consistently outperforms both TxGNN and TxGNN-U across drug indication prediction tasks, with the largest gains in autoimmune disease areas where cell-type-specific regulatory context is most informative.
[Figures: Overall Performance: Drug Indication Prediction (AUPRC); Improvement by Disease Area; Model Capability Radar]
Detailed Comparison
| Metric | TxGNN | TxGNN-U | CellAwareGNN | Δ vs TxGNN (relative) |
|---|---|---|---|---|
| Indication AUPRC | 0.799 | 0.816 | 0.826 | +3.4% |
| Indication AUROC | 0.891 | 0.903 | 0.912 | +2.4% |
| Contraindication AUPRC | 0.742 | 0.758 | 0.771 | +3.9% |
| Autoimmune AUPRC | 0.761 | 0.789 | 0.821 | +7.9% |
| Zero-Shot (Novel Diseases) | 0.692 | 0.710 | 0.735 | +6.2% |
| Knowledge Graph | PrimeKG | PrimeKG-U | scPrimeKG | — |
| Nodes | 129K | ~129K | 147,881 | +14.6% |
| Edges | 4.05M | 8.1M | 14.52M | +258% |
Autoimmune Advantage: CellAwareGNN's largest gain is in autoimmune diseases (+7.9% over TxGNN), which is biologically expected: OneK1K profiled immune cells, and autoimmune diseases are driven by cell-type-specific immune dysregulation. This validates the hypothesis that single-cell resolution matters most where cellular context drives pathology.
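The percentages reported for CellAwareGNN are relative gains over the TxGNN baseline, not absolute differences in the metric. The short check below recomputes them from the table values; the function and variable names are illustrative.

```python
def relative_gain(baseline, new):
    """Relative improvement of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


# (TxGNN, CellAwareGNN) values copied from the comparison table.
metrics = {
    "Indication AUPRC": (0.799, 0.826),
    "Indication AUROC": (0.891, 0.912),
    "Contraindication AUPRC": (0.742, 0.771),
    "Autoimmune AUPRC": (0.761, 0.821),
    "Zero-Shot (Novel Diseases)": (0.692, 0.735),
}

gains = {name: round(relative_gain(base, new), 1) for name, (base, new) in metrics.items()}
for name, gain in gains.items():
    print(f"{name}: {gain:+.1f}%")
```

For indication AUPRC, the absolute difference is 0.826 − 0.799 = 0.027, which is 3.4% of the 0.799 baseline.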
Ablation Study: What Matters Most?
Removing cell-type-specific edges has the largest impact on autoimmune disease prediction, confirming their critical role.
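An edge-type ablation of this kind can be set up by filtering the relevant relations out of the graph before retraining. The sketch below shows only that filtering step, under the assumption that edges are stored as (head, relation, tail) triples; the example triples are hypothetical, and the page does not describe the full ablation protocol.

```python
# Hypothetical (head, relation, tail) triples; identifiers are illustrative only.
edges = [
    ("gene:TNF", "gene_expressed_in_cell_type", "cell:CD14_monocyte"),
    ("gene:ORMDL3", "eQTL_in_cell_type", "cell:CD4_T"),
    ("cell:CD4_T", "cell_type_associated_disease", "disease:crohn_disease"),
    ("drug:infliximab", "drug_targets_protein", "gene:TNF"),
]

CELL_TYPE_RELATIONS = {
    "gene_expressed_in_cell_type",
    "eQTL_in_cell_type",
    "cell_type_associated_disease",
}


def ablate(edge_list, removed_relations):
    """Return a new edge list without the given relation types."""
    removed = set(removed_relations)
    return [edge for edge in edge_list if edge[1] not in removed]


ablated = ablate(edges, CELL_TYPE_RELATIONS)
print(len(edges), "->", len(ablated))  # 4 -> 1
```

Comparing a model trained on `ablated` with one trained on the full edge list would isolate the contribution of the removed relations.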
Interactive Drug Indication Explorer
Explore simulated drug indication predictions across disease categories. Select a disease to see how CellAwareGNN's cell-type-aware predictions differ from baseline TxGNN — highlighting cases where single-cell context reshapes drug rankings.
[Figures: Cell-Type Contribution to Predictions, showing which cell types contribute most to CellAwareGNN's predictions for the selected disease; Model Comparison for Selected Disease]
References
Key publications underlying CellAwareGNN, scPrimeKG, the OneK1K cohort, and the broader landscape of graph-based drug repurposing.