PhD @ Max Planck Institute for Informatics · Advised by Yiting Xia
I build datacenter systems — from optical datacenter networks and nanosecond-precision time synchronization to fault-tolerant distributed ML training.
Research · Google Scholar
- 📄 Two first-author papers at NSDI'26 — OpenOptics (optical DCN framework) and SyncWise (time synchronization)
- 📄 HotNets'22 — parallelism-aware flow scheduling for distributed training
- 📄 Phoenix — checkpoint-less failure recovery for auto-parallelism in JAX/XLA (under submission)
Open Source
- 🔧 OpenOptics — design, test, and deploy optical DCN architectures in ~10 lines of Python
- 🔧 SyncWise — error-aware time synchronization for reconfigurable DCNs
- 🔧 JAX — contributed to JAX's fault-tolerance API
Industry
- 🏢 Applied Scientist Intern @ AWS AI (2024–2025) — resilient distributed training with JAX/XLA

