Snowflake and CMU researchers present SuffixDecoding: a new model-free approach to accelerate large language model (LLM) inference using speculative decoding
Large Language Models (LLM) have quickly become a critical component of today's consumer and enterprise applications. However, the need for ...