Yiwei (David) Liang

Sophomore @ MIT, Biology, Chemistry, & AI
Email: liangyw@mit.edu · LinkedIn · CV · Google Scholar

Hi, I'm Yiwei (David) Liang, an undergraduate at MIT double majoring in Biochemistry and AI. I am currently an undergraduate researcher in the Sellers Lab at the Broad Institute, where I work on regulated CRISPR–Cas9 systems and in vivo tumor microenvironment screens. My interests include ML for medicine, drug discovery, and cancer therapy.

Research

Cancer Genomics Research
Sellers Lab, Broad Institute of MIT and Harvard · Sep 2024 – Present
Mentor: Matthew J. Emmett, M.D., Ph.D.
Studying in vivo cancer dependencies in the tumor microenvironment using a Cas9/Anti-CRISPR switch. Poster to be presented at the 2025 Broad Institute Cancer Program Retreat.
Cancer Immunotherapy Research
Gabr Lab, Weill Cornell Medicine · May 2022 – Aug 2023
Identified and validated a small-molecule TIGIT inhibitor for immune checkpoint therapy: performed virtual screening and confirmed binding/activity via cell-free and cell-based assays.

Projects

Nov 2025 – Dec 2025 · GitHub
Developed a novel protein function annotation framework that combines 3D structural information with sequence-based methods, improving Meta-BLEU-2 by up to 114%. Designed multi-modal fusion algorithms (weighted similarity fusion and Reciprocal Rank Fusion) that integrate structure-aware ProstT5 embeddings with evolutionary ESM-2 embeddings for robust protein retrieval. Engineered hierarchical prompt templates with task-specific guidance on biological terminology, showing that retrieval quality and prompt design are equally important for LLM performance. Built scalable Python infrastructure using FAISS indexing, transformer models, and automated benchmarking across multiple tasks. Validated the approach on the challenging Prot-Inst-OOD dataset, where it performed especially well on structure-dependent tasks such as catalytic activity and domain/motif prediction.
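The two fusion strategies named above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's code: the protein IDs and similarity values are hypothetical, a candidate missing from one embedding space scores 0.0 there, the equal weighting (`weight_a=0.5`) is arbitrary, and `k=60` is simply a commonly used RRF default. In a real pipeline the per-space similarities would presumably come from a nearest-neighbor index such as FAISS over ProstT5 and ESM-2 embeddings; plain dicts keep the sketch self-contained.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rank_by(sims: Dict[str, float]) -> List[str]:
    """Order candidate IDs by descending score (ties broken by ID)."""
    return sorted(sims, key=lambda pid: (-sims[pid], pid))


def weighted_similarity_fusion(
    sims_a: Dict[str, float], sims_b: Dict[str, float], weight_a: float = 0.5
) -> List[str]:
    """Rank candidates by a weighted sum of two similarity scores.

    Assumption: a candidate missing from one embedding space contributes 0.0 for it.
    """
    fused = {
        pid: weight_a * sims_a.get(pid, 0.0) + (1.0 - weight_a) * sims_b.get(pid, 0.0)
        for pid in set(sims_a) | set(sims_b)
    }
    return rank_by(fused)


def reciprocal_rank_fusion(rankings: Sequence[List[str]], k: int = 60) -> List[str]:
    """RRF: score(d) = sum over rankings of 1 / (k + rank of d), ranks starting at 1."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, pid in enumerate(ranking, start=1):
            scores[pid] += 1.0 / (k + rank)
    return rank_by(scores)


# Hypothetical similarities of three database proteins to one query,
# one dict per embedding space (structure-aware vs. sequence/evolutionary).
structure_sims = {"P1": 0.9, "P2": 0.7, "P3": 0.2}
sequence_sims = {"P1": 0.3, "P2": 0.8, "P3": 0.6}

print(weighted_similarity_fusion(structure_sims, sequence_sims))  # ['P2', 'P1', 'P3']
print(reciprocal_rank_fusion([rank_by(structure_sims), rank_by(sequence_sims)]))  # ['P2', 'P1', 'P3']
```

Weighted fusion assumes the two similarity scales are comparable, so raw scores from different embedding spaces may need normalizing first. RRF uses only ranks, which sidesteps that issue at the cost of discarding score magnitudes.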
Oct 2025 – Dec 2025 · GitHub
Investigated how the choice of representation affects the reasoning efficiency of Transformers using multilingual mathematical benchmarks. Built an evaluation framework that tests SOTA models and 8B-parameter open-source models on GSM8K and MMATH across multiple languages. Found that well-trained models reason in a representation-invariant way, and that token-efficiency gains of 510% can be achieved when the tokenizer aligns with a denser representation. Demonstrated that representation bottlenecks stem from the training distribution rather than the architecture by narrowing the English–Chinese performance gap through LoRA fine-tuning. Developed modular Python infrastructure featuring a multi-tokenizer analysis pipeline and automated evaluation across API-based and local models.
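A multi-tokenizer token-efficiency comparison might look like the sketch below. Everything here is an assumption for illustration: the gain is defined as `(baseline_tokens / candidate_tokens - 1) * 100` (the definition behind the 510% figure is not stated), the helper names are hypothetical, and the whitespace and character-level tokenizers are toy stand-ins for real model tokenizers, such as a Hugging Face tokenizer's `tokenize` method. The printed numbers come from toy inputs and are not project results.

```python
from typing import Callable, Dict, List

# A tokenizer is any function from text to a list of tokens.
Tokenizer = Callable[[str], List[str]]


def whitespace_tokenizer(text: str) -> List[str]:
    # Toy stand-in: split on runs of whitespace.
    return text.split()


def char_tokenizer(text: str) -> List[str]:
    # Toy stand-in: one token per non-whitespace character.
    return [ch for ch in text if not ch.isspace()]


def efficiency_gain_pct(baseline_tokens: int, candidate_tokens: int) -> float:
    """Assumed metric: percent gain when the candidate needs fewer tokens than the baseline."""
    if candidate_tokens <= 0:
        raise ValueError("candidate_tokens must be positive")
    return (baseline_tokens / candidate_tokens - 1.0) * 100.0


def token_efficiency_report(
    texts: Dict[str, str], tokenizers: Dict[str, Tokenizer], baseline: str
) -> Dict[str, Dict[str, float]]:
    """For each tokenizer, the gain of every representation relative to `baseline`."""
    report: Dict[str, Dict[str, float]] = {}
    for tok_name, tokenize in tokenizers.items():
        counts = {rep: len(tokenize(text)) for rep, text in texts.items()}
        report[tok_name] = {
            rep: efficiency_gain_pct(counts[baseline], n) for rep, n in counts.items()
        }
    return report


# The same toy arithmetic statement in two representations.
texts = {"en": "two plus three is five", "zh": "二加三等于五"}
tokenizers = {"whitespace": whitespace_tokenizer, "char": char_tokenizer}

report = token_efficiency_report(texts, tokenizers, baseline="en")
print(report)  # {'whitespace': {'en': 0.0, 'zh': 400.0}, 'char': {'en': 0.0, 'zh': 200.0}}
```

Even in this toy case the measured gain for the same pair of texts depends heavily on the tokenizer, which is why comparing representations under several tokenizers is useful.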