This AI research introduces Flash-Decoding: a new FlashAttention-based approach that makes long-context LLM inference up to 8x faster (10/18/2023)