Late last year and throughout 2023, it has been a great time for AI practitioners to build AI applications, thanks to a series of advances contributed by non-profit researchers. Here is a list of them:
ALiBi is a method that efficiently addresses the problem of text extrapolation in Transformers: at inference time, the model can handle text sequences longer than those it was trained on. ALiBi is easy to implement, does not slow down runtime or require additional parameters, and lets models extrapolate by changing only a few lines of existing transformer code.
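As a rough illustration, here is a minimal PyTorch sketch (assuming the power-of-two slope schedule described in the ALiBi paper) of the per-head linear bias that gets added to the attention scores before the softmax:

```python
import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    # Head-specific slopes: a geometric sequence 2^(-8/n), 2^(-16/n), ..., 2^(-8),
    # the choice used in the paper when num_heads is a power of two.
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])
    pos = torch.arange(seq_len)
    # distance[i, j] = i - j: how far key j lies behind query i.
    distance = (pos.view(-1, 1) - pos.view(1, -1)).clamp(min=0)
    # Penalise distant positions linearly; shape (num_heads, seq_len, seq_len).
    return -slopes.view(-1, 1, 1) * distance

# Usage: add the bias to the raw attention logits before the softmax, e.g.
# scores = scores + alibi_bias(num_heads, seq_len)  # scores: (batch, heads, L, L)
```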
RoPE-based extrapolation scaling laws
This method is a framework that improves the extrapolation capabilities of transformers. Researchers found that fine-tuning a Rotary Position Embedding (RoPE)-based LLM with a smaller or larger rotary base, while keeping the pre-training context length, can lead to better extrapolation performance.
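A minimal PyTorch sketch of the idea follows, with the rotary `base` exposed as the tunable knob; the pairing convention for the rotated dimensions varies between implementations, so treat this as illustrative rather than a reference implementation:

```python
import torch

def rope_cache(seq_len: int, head_dim: int, base: float = 10000.0):
    # `base` is the knob the scaling-law work tunes: changing it from the
    # default 10000 changes how quickly rotation frequencies decay, which
    # affects behaviour beyond the pre-training context length.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    t = torch.arange(seq_len).float()
    freqs = torch.outer(t, inv_freq)          # (seq_len, head_dim // 2)
    return freqs.cos(), freqs.sin()

def apply_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    # x: (..., seq_len, head_dim); rotate feature pairs by the cached angles.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```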
Transformers are powerful models capable of processing textual information. However, their standard attention requires a large amount of memory when working with long text sequences. FlashAttention is an IO-aware exact attention algorithm that trains transformers faster than existing baselines while using far less memory.
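FlashAttention-style kernels are now exposed through high-level APIs; for example, PyTorch 2.x's fused scaled-dot-product attention can dispatch to such a kernel on supported GPUs (the snippet below assumes a CUDA device and half precision):

```python
import torch
import torch.nn.functional as F

# Query/key/value tensors: (batch, heads, seq_len, head_dim).
q = torch.randn(2, 8, 4096, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 8, 4096, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 8, 4096, 64, device="cuda", dtype=torch.float16)

# On supported hardware this call can use a fused FlashAttention-style kernel,
# avoiding materialising the full seq_len x seq_len attention matrix in memory.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)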
Conformers (a variant of transformers) are very effective in speech processing. They apply convolution and self-attention layers sequentially, which makes their architecture difficult to interpret. Branchformer is an alternative encoder that is flexible and interpretable and uses parallel branches to model local and global dependencies in end-to-end speech processing tasks.
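A heavily simplified PyTorch sketch of the parallel-branch idea is below; the actual Branchformer uses a convolutional gating MLP (cgMLP) branch and more elaborate merging, which are replaced here by a plain depthwise convolution and a concatenation-based merge:

```python
import torch
import torch.nn as nn

class ParallelBranchLayer(nn.Module):
    """Two parallel branches: self-attention for global context, a depthwise
    convolution for local patterns; outputs are concatenated and projected,
    so each branch's contribution stays easy to inspect."""

    def __init__(self, d_model: int = 256, n_heads: int = 4, kernel_size: int = 31):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                              padding=kernel_size // 2, groups=d_model)
        self.merge = nn.Linear(2 * d_model, d_model)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, time, d_model)
        global_branch, _ = self.attn(x, x, x)                        # global dependencies
        local_branch = self.conv(x.transpose(1, 2)).transpose(1, 2)  # local dependencies
        merged = self.merge(torch.cat([global_branch, local_branch], dim=-1))
        return self.norm(x + merged)
```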
Although diffusion models achieve state-of-the-art performance on numerous image processing tasks, they are computationally very expensive and often consume hundreds of GPU-days to train. Latent diffusion models are a variation of diffusion models that can achieve high performance on various image-based tasks while requiring far fewer resources.
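For example, Stable Diffusion is a widely used latent diffusion model that can run on a single consumer GPU through the `diffusers` library (the checkpoint name below is just one commonly used example):

```python
import torch
from diffusers import StableDiffusionPipeline

# The denoising runs in a compressed latent space rather than pixel space,
# which is what keeps the memory and compute requirements modest.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe("a watercolor painting of a lighthouse at dusk").images[0]
image.save("lighthouse.png")
```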
CLIP-Guidance is a new method for text-to-3D generation that does not require large-scale labeled datasets. It works by leveraging (or taking guidance from) a pre-trained vision-language model like CLIP, which learns to associate text descriptions with images; researchers use it to guide the generation of 3D objects from text descriptions.
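A rough sketch of what such a guidance loop can look like is given below, assuming a hypothetical differentiable renderer `render_scene` that turns 3D scene parameters into an image tensor of shape (1, 3, 224, 224); CLIP's input normalisation is omitted for brevity:

```python
import torch
from transformers import CLIPModel, CLIPTokenizer

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

text_inputs = tokenizer(["a red chair"], return_tensors="pt")
text_feat = clip.get_text_features(**text_inputs)
text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

def clip_guidance_loss(scene_params) -> torch.Tensor:
    # render_scene is a stand-in for whatever differentiable 3D representation
    # is being optimised (e.g. a NeRF or a mesh renderer).
    image = render_scene(scene_params)
    image_feat = clip.get_image_features(pixel_values=image)
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
    # Maximise text-image similarity, i.e. minimise its negative;
    # gradients flow back through the renderer into the 3D parameters.
    return -(image_feat * text_feat).sum()
```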
GPT-NeoX is an autoregressive language model consisting of 20 billion parameters. It performs reasonably well on a variety of mathematical and knowledge-based tasks, and its model weights have been made publicly available to promote research in a wide range of areas.
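The released weights can be loaded from the Hugging Face Hub; note that the 20B model needs tens of gigabytes of memory even in half precision, so smaller setups will need offloading or quantisation:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neox-20b", device_map="auto"  # spreads layers across available devices
)

inputs = tokenizer("The square root of 144 is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=10)[0]))
```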
QLoRA is a fine-tuning approach that drastically reduces memory usage, allowing you to fine-tune a 65-billion-parameter model on a single 48 GB GPU while preserving the task performance of full 16-bit fine-tuning. Through QLoRA fine-tuning, models can achieve state-of-the-art results, outperforming previous SoTA models even with a smaller model architecture.
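A minimal sketch of the usual QLoRA recipe with the `transformers`, `bitsandbytes`, and `peft` libraries is shown below; the base model is an example LLaMA-style checkpoint, loaded in 4-bit NF4 precision with small LoRA adapters attached as the only trainable parameters:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantisation with double quantisation, as introduced by QLoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b", quantization_config=bnb_config, device_map="auto"
)

# Attach LoRA adapters to the attention projections; only these are trained.
lora_config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
                         lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```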
The Receptance Weighted Key Value (RWKV) model is a novel architecture that combines the strengths of transformers and recurrent neural networks (RNNs) while avoiding their key drawbacks. RWKV offers performance comparable to similarly sized transformers, paving the way for more efficient models in the future.
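Below is a naive, numerically unstabilised sketch of the WKV recurrence at the heart of RWKV's time mixing, written as an explicit loop to show that the per-token state update has constant cost like an RNN (the real implementation uses a stabilised, parallelisable form):

```python
import torch

def naive_wkv(k: torch.Tensor, v: torch.Tensor, w: torch.Tensor, u: torch.Tensor):
    # k, v: (seq_len, dim) keys and values; w: per-channel decay (>= 0);
    # u: per-channel bonus applied only to the current token.
    seq_len, dim = k.shape
    a = torch.zeros(dim)   # running weighted sum of past values
    b = torch.zeros(dim)   # corresponding normaliser
    outputs = []
    for t in range(seq_len):
        boost = torch.exp(u + k[t])
        # Mix the decayed past state with the boosted current token.
        outputs.append((a + boost * v[t]) / (b + boost))
        # Update the state: decay the past, fold in the current token.
        a = torch.exp(-w) * a + torch.exp(k[t]) * v[t]
        b = torch.exp(-w) * b + torch.exp(k[t])
    return torch.stack(outputs)  # (seq_len, dim)
```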
All credit for this research goes to the researchers of these individual projects. This article is inspired by this tweet.