Towards Low-Bit Communication for Tensor Parallel LLM Inference
This paper was accepted into the Efficient Natural Speech and Language Processing (ENLSP) Workshop at NeurIPS 2024. Tensor parallelism provides ...
The domain of large language model (LLM) quantization has attracted attention due to its potential to make powerful AI technologies ...