Meet Marlin: an FP16xINT4 LLM inference kernel that achieves near-ideal (~4x) speedups up to medium batch sizes of 16-32 tokens (01/22/2024)