Publications

2025

  1. FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving
    Zihao Ye, Lequn Chen, Ruihang Lai, Wuwei Lin, Yineng Zhang, Stephanie Wang, Tianqi Chen, Baris Kasikci, Vinod Grover, Arvind Krishnamurthy, Luis Ceze
    MLSys 2025
  2. XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models
    Yixin Dong, Charlie F. Ruan, Yaxing Cai, Ruihang Lai, Ziyi Xu, Yilong Zhao, Tianqi Chen
    MLSys 2025
  3. MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding
    Ranajoy Sadhukhan*, Jian Chen*, Zhuoming Chen, Vashisth Tiwari, Ruihang Lai, Jinyuan Shi, Ian En-Hsu Yen, Avner May, Tianqi Chen, Beidi Chen
    ICLR 2025
  4. Relax: Composable Abstractions for End-to-End Dynamic Machine Learning
    Ruihang Lai*, Junru Shao*, Siyuan Feng*, Steven Lyubomirsky*, Bohan Hou, Wuwei Lin, Zihao Ye, Hongyi Jin, Yuchen Jin, Jiawei Liu, Lesheng Jin, Yaxing Cai, Ziheng Jiang, Yong Wu, Sunghyun Park, Prakalp Srivastava, Jared Roesch, Todd C. Mowry, Tianqi Chen
    ASPLOS 2025

2022

  1. SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning
    Zihao Ye, Ruihang Lai, Junru Shao, Tianqi Chen, Luis Ceze
    ASPLOS 2023
  2. TensorIR: An Abstraction for Automatic Tensorized Program Optimization
    Siyuan Feng*, Bohan Hou*, Hongyi Jin, Wuwei Lin, Junru Shao, Ruihang Lai, Zihao Ye, Lianmin Zheng, Cody Hao Yu, Yong Yu, Tianqi Chen
    ASPLOS 2023
  3. Tensor Program Optimization with Probabilistic Programs
    Junru Shao, Xiyou Zhou, Siyuan Feng, Bohan Hou, Ruihang Lai, Hongyi Jin, Wuwei Lin, Masahiro Masuda, Cody Hao Yu, Tianqi Chen
    NeurIPS 2022