2025

FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving
Zihao Ye, Lequn Chen, Ruihang Lai, Wuwei Lin, Yineng Zhang, Stephanie Wang, Tianqi Chen, Baris Kasikci, Vinod Grover, Arvind Krishnamurthy, Luis Ceze
MLSys 2025

XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models
Yixin Dong, Charlie F. Ruan, Yaxing Cai, Ruihang Lai, Ziyi Xu, Yilong Zhao, Tianqi Chen
MLSys 2025

MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding
Ranajoy Sadhukhan*, Jian Chen*, Zhuoming Chen, Vashisth Tiwari, Ruihang Lai, Jinyuan Shi, Ian En-Hsu Yen, Avner May, Tianqi Chen, Beidi Chen
ICLR 2025

Relax: Composable Abstractions for End-to-End Dynamic Machine Learning
Ruihang Lai*, Junru Shao*, Siyuan Feng*, Steven Lyubomirsky*, Bohan Hou, Wuwei Lin, Zihao Ye, Hongyi Jin, Yuchen Jin, Jiawei Liu, Lesheng Jin, Yaxing Cai, Ziheng Jiang, Yong Wu, Sunghyun Park, Prakalp Srivastava, Jared Roesch, Todd C. Mowry, Tianqi Chen
ASPLOS 2025

2022

SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning
Zihao Ye, Ruihang Lai, Junru Shao, Tianqi Chen, Luis Ceze
ASPLOS 2023

TensorIR: An Abstraction for Automatic Tensorized Program Optimization
Siyuan Feng*, Bohan Hou*, Hongyi Jin, Wuwei Lin, Junru Shao, Ruihang Lai, Zihao Ye, Lianmin Zheng, Cody Hao Yu, Yong Yu, Tianqi Chen
ASPLOS 2023

Tensor Program Optimization with Probabilistic Programs
Junru Shao, Xiyou Zhou, Siyuan Feng, Bohan Hou, Ruihang Lai, Hongyi Jin, Wuwei Lin, Masahiro Masuda, Cody Hao Yu, Tianqi Chen
NeurIPS 2022