The Dark Matter of the Genome
Recoding the non-expressed genome into a new generation of molecules
For nearly fifty years, drug discovery focused almost entirely on the ~2% of the genome that codes for proteins. The remaining ~98% — non-coding, non-expressing, and “retired” sequences — were dismissed as junk. Our work is built on a different premise, now supported by more than fifteen years of evidence: this dark matter of the genome is a vast, untapped reservoir of functional molecules that can be computationally decoded, synthetically expressed, and engineered into first-in-class therapeutics, enzymes, and vaccines.
A discovery, eighteen years in the making
The foundation was laid in 2009, when our group provided the first experimental proof that naturally silent intergenic DNA from Escherichia coli can be synthetically expressed into functional proteins. Six non-expressing intergenic sequences were cloned and induced; all six produced protein, one (Eka1) showed clear biological activity, and computational modelling predicted stable globular folds. What began as a single experiment has since become a reproducible, sustained discovery platform.
The untapped reservoir
We organise the dark genome into three functional classes, each a distinct source of new biomolecules:
- Non-expressed DNA: intergenic regions, antisense strands, reverse open reading frames, and repetitive elements.
- Non-translating RNA: introns, ribosomal RNA, transfer RNA, microRNA, and long non-coding RNA.
- Retired genes: pseudogenes, the evolutionary relics scattered across every genome.
Conventional biology treats all three as silent. We have shown that their redesigned versions encode functional peptides, proteins, and pathways.
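To make the idea of "reading" a non-expressed strand concrete, here is a minimal Python sketch of how antisense and reverse readings of a DNA sequence can be translated in silico. It is an illustration only, not the group's actual pipeline: the function names (`reverse_complement`, `longest_orf_peptide`, `dark_readings`), the rule that a candidate must start at ATG and end at a stop codon, the interpretation of a "reverse" reading as the same strand read backwards, and the toy input sequence are all assumptions made for this example. The codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, written in the usual compact TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the antisense strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_orf_peptide(seq: str) -> str:
    """Longest ATG-to-stop peptide over the three frames of seq.

    Assumption for this sketch: open reading frames that run off the
    end of the sequence without a stop codon are discarded.
    """
    seq = seq.upper()
    best = ""
    for frame in range(3):
        peptide = None  # None means we are currently outside an ORF
        for i in range(frame, len(seq) - 2, 3):
            aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X = unknown codon
            if peptide is None:
                if aa == "M":
                    peptide = "M"
            elif aa == "*":
                if len(peptide) > len(best):
                    best = peptide
                peptide = None
            else:
                peptide += aa
    return best


def dark_readings(seq: str) -> dict:
    """Candidate peptides from two alternative readings of one sequence."""
    return {
        # Antisense: the complementary strand, read 5' to 3'.
        "antisense": longest_orf_peptide(reverse_complement(seq)),
        # Reverse: the same strand read backwards (hypothetical definition).
        "reverse": longest_orf_peptide(seq.upper()[::-1]),
    }


if __name__ == "__main__":
    toy = "TTACAGTTTCAT"  # made-up sequence, not real genomic data
    print(dark_readings(toy))
```

A real screen would add length thresholds, structure and function prediction, and experimental validation; this sketch only covers the translation step.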
The platform: The Dark Genome
The platform is not a concept; it has delivered functional molecules against major disease classes:
Anti-malarial
peptides from yeast intergenic sequences blocked more than 60% of Plasmodium falciparum parasites from invading red blood cells.
Anti-Alzheimer's
candidate peptides derived from the E. coli dark genome (Verma et al., 2023).
Antimicrobials
peptides from E. coli intergenic sequences showed strong activity against both Gram-positive and Gram-negative bacteria.
Vaccines
Anti-leishmanial (tREPs)
tREP-18, a peptide encoded by transfer RNA, showed potent activity at nanomolar concentrations (IC₅₀ ≈ 22 nM) while remaining safe to human cells — the first evidence that tRNA can be repurposed into a therapeutic molecule, defining an entirely new class we call tRNA-encoded peptides.
Why it matters
For academia
This is Functional Genomics 2.0 — a shift from studying individual genes to treating the entire genome as a design canvas. It reframes how we read evolution, expands the known proteome, and opens new questions about why nature transcribes only a fraction of its own code.
For industry
Traditional pipelines are stalling on derivatives of known drugs, rising resistance, and ballooning R&D costs. The dark genome supplies entirely new molecular starting points — unconstrained by homology or historical annotation — for diseases where current therapies fall short, from drug-resistant infections to neurodegeneration. Because a single genome can be mined for thousands of candidates with AI and quantum tools, the approach also democratises discovery: a fast, data-rich, adaptive pipeline that does not depend on large compound libraries.
Our vision
To build a deep genome foundry that turns the dark matter of the genome into novel medicines, enzymes, and pathways, all originating from the unread instructions already within our own DNA.
References
- Chakrabarti, A., Kaushik, M., Khan, J., et al. (2022). tREPs – a new class of functional tRNA encoded peptides. ACS Omega, 7(22), 18361–18373. https://doi.org/10.1021/acsomega.2c01234
- Dhar, P. K., Nanduri, B., et al. (2009). Synthesizing non-natural parts from natural genomic template. Journal of Biological Engineering, 3, 2. https://doi.org/10.1186/1754-1611-3-2
- Garg, M., & Dhar, P. K. (2023a). Repurposing the Dark Genome I: Antisense Proteins. bioRxiv. https://doi.org/10.1101/2023.03.15.532699
- Garg, M., & Dhar, P. K. (2023b). Repurposing The Dark Genome. III - Intronic Proteins. bioRxiv. https://doi.org/10.1101/2023.06.10.544447
- Joshi, M., Kundapura, S. V., Poovaiah, T., Ingle, K., & Dhar, P. K. (2013). Discovering novel anti-malarial peptides from the not-coding genome—A working hypothesis. Current Synthetic and Systems Biology, 1(1).
- Krishnan, R., Kumar, V., Ananth, V., et al. (2015). Computational identification of novel microRNAs and their targets in the malarial vector Anopheles stephensi. Systems and Synthetic Biology Journal, 9, 11–17.
- Krishnan, K., Chugh, A., Niranjan, V., & Dhar, P. K. (2025). Recoding Genomic Elements with AI and Quantum Computation to Build the Next Generation Drug Discovery Platform. Preprints, 2025051422. https://doi.org/10.20944/preprints202505.1422.v1
- Nayak, S., & Dhar, P. K. (2023a). Repurposing the Dark Genome II – Reverse Proteins. bioRxiv. https://doi.org/10.1101/2023.03.20.533367
- Nayak, S., & Dhar, P. K. (2023b). Repurposing the Dark Genome IV – Noncoding Proteins. bioRxiv. https://doi.org/10.1101/2023.06.29.547021
- Raj, N., Helen, A., Manoj, N., et al. (2015). In silico study of peptide inhibitors against BACE. Systems and Synthetic Biology Journal, 9, 67–72.
- Shidhi, P. R., Suravajhala, P., Nayeema, A., et al. (2015). Making novel proteins from pseudogenes. Bioinformatics, 31(1), 33–39. https://doi.org/10.1093/bioinformatics/btu585
- Varughese, D., Nair, A. S., & Dhar, P. K. (2017). Function annotation of novel peptides generated from the non-expressing genome of Drosophila melanogaster. Bioinformation, 13(1), 17–20.
- Verma, N., Manvati, S., & Dhar, P. K. (2023). Harnessing Escherichia coli’s Dark Genome to Produce Anti-Alzheimer Peptides. bioRxiv. https://doi.org/10.1101/2023.06.23.546343