Publications
2025
- FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving. MLSys 2025 (Outstanding Paper Award)
- XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models. MLSys 2025
- MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding. ICLR 2025