KVSharer: A plug-and-play machine learning method that shares the KV cache between layers to achieve layer-wise compression
In recent times, large language models (LLMs) built on the Transformer architecture have demonstrated remarkable capabilities in a wide range ...