Combining large and small LLMs to improve inference speed and quality | by Richa Gadgil | December 2024
Implementation of speculative and contrastive decoding

Large language models are composed of billions of parameters (weights). For each word a model generates, ...