Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding

Publication
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2024