⚛️ Human Digital Twin Knowledge Graph
Multilevel, multi‑omics graph connecting genomics, transcriptomics, proteomics, metabolomics, phenotypes, and clinical data. Built by Karute for predictive, personalized medicine.
📦 Core entities (13)
🔗 11 relationship types
Example (breast cancer): Alice (P1) HAS_DISEASE Breast Cancer and HAS_VARIANT BRCA1; the variant AFFECTS transcript BRCA1-001; the transcript TRANSLATES_TO protein BRCA1; the protein is ANNOTATED_WITH GO:DNA repair; biomarker CA-15-3 INDICATES Breast Cancer.
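The example chain above can be written as subject, predicate, object triples. The following is a minimal in-memory sketch in Python, not Karute's implementation. The relationship names come from the example; the node identifier strings (such as `Variant:BRCA1`) and the helpers `objects_of` and `follow` are assumptions made for illustration.

```python
# Hypothetical in-memory triples for the breast cancer example.
# Only the relationship names come from the text; node IDs are illustrative.
TRIPLES = [
    ("Patient:P1", "HAS_DISEASE", "Disease:BreastCancer"),
    ("Patient:P1", "HAS_VARIANT", "Variant:BRCA1"),
    ("Variant:BRCA1", "AFFECTS", "Transcript:BRCA1-001"),
    ("Transcript:BRCA1-001", "TRANSLATES_TO", "Protein:BRCA1"),
    ("Protein:BRCA1", "ANNOTATED_WITH", "GO:DNA repair"),
    ("Biomarker:CA-15-3", "INDICATES", "Disease:BreastCancer"),
]


def objects_of(subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def follow(start, predicates):
    """Walk a fixed chain of predicates from `start` and return the nodes reached."""
    frontier = [start]
    for predicate in predicates:
        frontier = [o for node in frontier for o in objects_of(node, predicate)]
    return frontier


# From the patient, through variant, transcript and protein, to the GO annotation.
chain = ["HAS_VARIANT", "AFFECTS", "TRANSLATES_TO", "ANNOTATED_WITH"]
print(follow("Patient:P1", chain))
```

Each hop only follows edges in their stored direction, which mirrors how a directed property graph would be traversed.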
🗂️ Graph schema (label & property highlights)
Node examples & properties
- GenomicVariant: variant_id, chromosome, position, ref/alt
- Protein: uniprot_id, abundance, activity_state
- Metabolite: metabolite_id, chemical_formula, concentration
- HPOTerm: hpo_id, name, definition
- SignalingPathway: reactome_id, description
Relationship properties
- ASSOCIATED_WITH {odds_ratio, risk_allele, pmid}
- BINDS_TO {affinity, kd}
- HAS_GLUCOSE_STATE {fasting, timestamp}
- INDICATES {sensitivity, specificity}
- HAS_PHENOTYPE {onset_age, severity}
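As a rough illustration of how these labels and properties might be typed, the sketch below models two node types and one relationship as Python dataclasses. The property names come from the lists above; the field types, the endpoint fields `biomarker_id` and `disease_id`, the validation, and the `positive_predictive_value` helper are assumptions. The helper applies Bayes' rule to show why INDICATES carries both sensitivity and specificity. All values in the usage lines are placeholders, not measured biomarker performance.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GenomicVariant:
    variant_id: str
    chromosome: str
    position: int
    ref: str
    alt: str


@dataclass(frozen=True)
class Protein:
    uniprot_id: str
    abundance: Optional[float] = None       # units are not specified in the source
    activity_state: Optional[str] = None


@dataclass(frozen=True)
class Indicates:
    """INDICATES edge from a biomarker to a disease."""
    biomarker_id: str   # assumed endpoint field
    disease_id: str     # assumed endpoint field
    sensitivity: float
    specificity: float

    def __post_init__(self):
        for name in ("sensitivity", "specificity"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")

    def positive_predictive_value(self, prevalence: float) -> float:
        """Bayes' rule: P(disease | positive test) at the given prevalence."""
        if not 0.0 < prevalence < 1.0:
            raise ValueError("prevalence must be strictly between 0 and 1")
        true_pos = self.sensitivity * prevalence
        false_pos = (1.0 - self.specificity) * (1.0 - prevalence)
        if true_pos + false_pos == 0.0:
            raise ValueError("the test never returns a positive result")
        return true_pos / (true_pos + false_pos)


# Placeholder values for illustration only.
variant = GenomicVariant("VAR_EXAMPLE", "1", 12345, "A", "G")
edge = Indicates("Biomarker:EXAMPLE", "Disease:EXAMPLE", sensitivity=0.8, specificity=0.9)
print(round(edge.positive_predictive_value(0.1), 3))  # 0.471
```

Freezing the dataclasses keeps each record immutable, so an update has to produce a new object rather than overwrite an old one.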
🧬 Predictive applications
Early cancer (breast) prediction
Alice's twin
- Genomic: BRCA1 pathogenic variant
- Proteomic: CA‑15‑3 25% above baseline
- Metabolomic: dysregulated X pathway
- KG traversal: integrated risk estimate rises from 12% to 48%
Type 2 diabetes prediction
TCF7L2 risk variant + HbA1c 6.0% + BCAAs↑
5‑year risk: HIGH (40%)
Modifiable factors: BCAAs, TNF‑alpha
Subtype: severe insulin‑deficient (based on postprandial spikes)
“GLP‑1 agonist may reduce glucose variability by 25%.”
⏱️ Dynamic updates & LLM query
Each new lab result creates a new graph state. An LLM is fine‑tuned to translate natural‑language questions into Cypher.
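To make the question-to-Cypher step concrete, the sketch below shows the kind of parameterized query such a model might be expected to emit for the question "Which of patient P1's variants are associated with Type 2 diabetes?". It is an illustration, not output from the actual HDT‑LLM. The relationship names and the `GenomicVariant` properties come from the schema; the labels `Patient` and `Disease`, the properties `patient_id` and `name`, and the helper `build_query` are assumptions.

```python
# Hypothetical target Cypher for:
# "Which of patient P1's variants are associated with Type 2 diabetes?"
# Labels Patient/Disease and properties patient_id/name are assumed, not from the source.
VARIANT_DISEASE_QUERY = """
MATCH (p:Patient {patient_id: $patient_id})-[:HAS_VARIANT]->(v:GenomicVariant)
      -[a:ASSOCIATED_WITH]->(d:Disease {name: $disease})
RETURN v.variant_id AS variant, a.odds_ratio AS odds_ratio, a.pmid AS pmid
"""


def build_query(patient_id, disease):
    """Pair the Cypher text with its parameters.

    Values travel as parameters and are never spliced into the query string,
    which avoids injection and lets the database reuse the query plan.
    """
    params = {"patient_id": patient_id, "disease": disease}
    return VARIANT_DISEASE_QUERY.strip(), params


query, params = build_query("P1", "Type 2 diabetes")
print(query)
print(params)
```

With the official Neo4j Python driver, a pair like this could be passed to `session.run(query, params)`; no database connection is made in this sketch.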
🧅 Multi‑omics data layers
🔄 Cross‑layer integration example (diabetes)
⚙️ Technology framework
Graph database
Neo4j, Amazon Neptune, or TigerGraph. Property graph model with versioned states.
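One way to picture versioned states is an append-only timeline in which each lab result adds a new state and earlier ones are never overwritten. The sketch below is an assumption about how that could work, not a description of the actual storage model. `fasting` and `timestamp` come from the HAS_GLUCOSE_STATE properties; the field `hba1c_percent`, the class names, and the dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class GlucoseState:
    timestamp: datetime
    fasting: bool
    hba1c_percent: float  # assumed property, not listed in the schema


class PatientTimeline:
    """Append-only list of states; old states are kept, never mutated."""

    def __init__(self, patient_id: str):
        self.patient_id = patient_id
        self._states: List[GlucoseState] = []

    def add_state(self, state: GlucoseState) -> None:
        self._states.append(state)
        self._states.sort(key=lambda s: s.timestamp)  # keep chronological order

    def as_of(self, when: datetime) -> Optional[GlucoseState]:
        """Return the most recent state recorded at or before `when`."""
        candidates = [s for s in self._states if s.timestamp <= when]
        return candidates[-1] if candidates else None


timeline = PatientTimeline("P1")
# Dates are illustrative placeholders.
timeline.add_state(GlucoseState(datetime(2024, 1, 10), True, 6.5))
timeline.add_state(GlucoseState(datetime(2023, 1, 10), True, 6.0))

print(timeline.as_of(datetime(2023, 6, 1)))
print(timeline.as_of(datetime(2022, 1, 1)))
```

Querying "as of" a date is what makes a trajectory reconstructible: the same question can be asked against any past state of the twin.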
ETL & integration
Public ontologies: HPO, GO, Reactome, ClinVar, DisGeNET, HMDB, GTEx, ENCODE.
HDT‑LLM (specialized medical AI)
Base model: LLaMA or Mistral, fine‑tuned on biomedical text and Cypher generation. RAG pipeline: natural language → graph query → subgraph → answer.
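The four pipeline stages can be sketched as plain functions. This is a toy stand-in under stated assumptions: the keyword rule in `question_to_query` replaces the fine-tuned LLM, a Python list replaces the graph database, and a string template replaces answer generation. All function names and node identifiers are hypothetical.

```python
# Toy graph; relationship names follow the document, node IDs are illustrative.
TRIPLES = [
    ("Patient:P1", "HAS_VARIANT", "Variant:rs7903146"),
    ("Variant:rs7903146", "ASSOCIATED_WITH", "Disease:Type2Diabetes"),
    ("Patient:P1", "HAS_PHENOTYPE", "Phenotype:Hyperglycemia"),
]


def question_to_query(question):
    """Stage 1 to 2: natural language to graph query (keyword stub, not an LLM)."""
    if "variant" in question.lower():
        return {"subject": "Patient:P1", "predicate": "HAS_VARIANT"}
    return {"subject": "Patient:P1", "predicate": "HAS_PHENOTYPE"}


def retrieve_subgraph(query):
    """Stage 2 to 3: run the query and return the matching triples."""
    return [
        t for t in TRIPLES
        if t[0] == query["subject"] and t[1] == query["predicate"]
    ]


def compose_answer(question, subgraph):
    """Stage 3 to 4: ground the answer in retrieved facts only."""
    if not subgraph:
        return "No matching facts in the graph."
    facts = "; ".join(f"{s} {p} {o}" for s, p, o in subgraph)
    return f"Q: {question} | Facts: {facts}"


def answer(question):
    return compose_answer(question, retrieve_subgraph(question_to_query(question)))


print(answer("Which variants does the patient carry?"))
print(answer("What phenotypes are recorded?"))
```

Keeping retrieval separate from answer composition means every statement in the answer can be traced back to specific edges in the subgraph.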
📁 Data sources layer
📐 Simplified graph snippet
├─ HAS_VARIANT → TCF7L2 (rs7903146) → LOCATED_IN → Gene(TCF7L2)
│ └─ ASSOCIATED_WITH → Type2Diabetes
├─ HAS_GLUCOSE_STATE → HbA1c 6.5%
├─ HAS_PHENOTYPE → Hyperglycemia
└─ HAS_DISEASE_HISTORY → Type2Diabetes
Gene(TCF7L2) — TRANSCRIBES_TO → Transcript(TCF7L2 RNA) — TRANSLATES_TO → Protein(TCF7L2)
Protein(TCF7L2) — PARTICIPATES_IN → Wnt signaling pathway
Hormone(Insulin) — ACTIVATES → Insulin signaling pathway — OCCURS_IN → Beta cell
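The snippet above can be explored programmatically. The sketch below loads some of its edges into an adjacency map and uses breadth-first search to find a directed path from the patient to a pathway. The root node name `Patient` is an assumption (the root line is missing from the snippet), and `shortest_path` is a hypothetical helper, not part of any graph database API.

```python
from collections import deque

# Edges transcribed from the snippet; "Patient" as the root is an assumption.
EDGES = [
    ("Patient", "HAS_VARIANT", "Variant(rs7903146)"),
    ("Variant(rs7903146)", "LOCATED_IN", "Gene(TCF7L2)"),
    ("Variant(rs7903146)", "ASSOCIATED_WITH", "Type2Diabetes"),
    ("Gene(TCF7L2)", "TRANSCRIBES_TO", "Transcript(TCF7L2 RNA)"),
    ("Transcript(TCF7L2 RNA)", "TRANSLATES_TO", "Protein(TCF7L2)"),
    ("Protein(TCF7L2)", "PARTICIPATES_IN", "Wnt signaling pathway"),
]


def shortest_path(start, goal):
    """Breadth-first search over directed edges.

    Returns an alternating list [node, relationship, node, ...] or None.
    """
    adjacency = {}
    for source, rel, target in EDGES:
        adjacency.setdefault(source, []).append((rel, target))

    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [rel, nxt]))
    return None


path = shortest_path("Patient", "Wnt signaling pathway")
print(" -> ".join(path))
```

Because the edges are directed, searching from the pathway back to the patient returns `None`; a real deployment would decide per relationship type whether reverse traversal is meaningful.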
🕒 Temporal trajectory (diabetes risk)
🧠 Vision & challenges
Vision: a shift from reactive to predictive medicine. Challenges: separating causality from association, data volume (WGS ~100 GB), interpretability, and ethics.
Karute's HDT-KG enables digital twin trajectories, early disease warning, and personalized intervention simulation.
✨ Application 2 in detail: the diabetes scenarios (5‑year risk, subtyping, and treatment what‑if comparisons such as semaglutide vs metformin) are all encoded as KG paths.