LazyLLM: Dynamic token pruning for efficient LLM inference in long contexts

By Technical Terrence Team
08/02/2024

This article was accepted at the Workshop on Efficient Systems for Foundation Models at ICML 2024.

Inference of large transformer-based ...