LazyLLM: Dynamic token pruning for efficient LLM inference in long contexts
This article was accepted at the Workshop on Efficient Systems for Foundation Models at ICML 2024.

Inference of large transformer-based ...