Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding

Publication
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2024