Stability AI has recently launched Stable Code 3B, a new next-generation model designed for code completion across multiple programming languages, with several additional capabilities. The model is a continuation of Stable Code Alpha 3B. It is trained on 1.3 trillion tokens of natural language data and code spanning 18 programming languages. Compared to the existing CodeLLaMA 7B model, stable-code-3b is 60% smaller while maintaining a comparably high level of performance.
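For readers who want to try the model for code completion locally, the sketch below loads it with Hugging Face transformers and completes a short Python snippet. The repository ID "stabilityai/stable-code-3b" and the generation settings are assumptions for illustration, not details taken from the announcement.

```python
# A minimal sketch of code completion with stable-code-3b via Hugging Face
# transformers. The model ID and generation settings are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/stable-code-3b"  # assumed Hugging Face repository name
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # the article notes bfloat16 training precision
    device_map="auto",
    trust_remote_code=True,
)

# Plain left-to-right completion of a Python snippet.
prompt = "def fibonacci(n):\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```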
Stable Code 3B is an autoregressive language model based on the transformer decoder architecture. Beyond standard completion, it supports Fill in the Middle (FIM) and is trained on sequences of up to 16,384 tokens, enabling long contexts. Two of its key features are Rotary Position Embeddings and a tokenizer extended with special FIM tokens, among other additions. Training was performed on several large-scale open-source datasets, using a robust infrastructure of 256 NVIDIA A100 40GB GPUs and optimized with AdamW in bfloat16 precision. The model was trained with 2D parallelism combined with ZeRO-1, incorporating the flash attention and rotary embedding kernels from FlashAttention-2. Experiments comparing six existing models across programming languages show the model's efficiency, with stable-code-3b reaching roughly 30% accuracy in C++, Rust, Python, Java, PHP, and JavaScript. The models that scored slightly higher did so in only one of these languages, or were more than 2.5 times larger than stable-code-3b.
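The Fill in the Middle capability mentioned above lets the model complete a gap between a known prefix and suffix, which is how editor-style completions work. The sketch below shows one way such a prompt could be built; the sentinel tokens <fim_prefix>, <fim_suffix>, and <fim_middle> follow the StarCoder-style convention and are an assumption here, so the exact special tokens should be confirmed against the model's tokenizer configuration.

```python
# A minimal FIM prompting sketch. Model ID and FIM sentinel token names are
# assumptions for illustration; check the released tokenizer for the real ones.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/stable-code-3b"  # assumed repository name, as above
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

# Build a FIM prompt: the model is asked to fill the gap between prefix and suffix.
prefix = "def average(values):\n    total = sum(values)\n    return "
suffix = "\n\nprint(average([1, 2, 3]))\n"
fim_prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

inputs = tokenizer(fim_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=16, do_sample=False)

# Decode only the newly generated tokens: the model's proposed "middle".
middle = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(prefix + middle + suffix)
```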
In conclusion, the stable-code-3b model represents a powerful tool for developers seeking a strong foundation model for code and natural language applications. However, it is essential to note that the model has limitations and possible biases. As a base model, it requires careful evaluation and fine-tuning to achieve safe and reliable performance in specific downstream applications. Developers should be aware of potential undesirable behavior, and it is recommended that these aspects be thoroughly evaluated and addressed before deployment to ensure that the model aligns with ethical and safety standards.
Pragati Jhunjhunwala is a Consulting Intern at MarktechPost. She is currently pursuing a B.Tech from the Indian Institute of Technology (IIT), Kharagpur. She is a technology enthusiast with a keen interest in data science software and applications, and is always reading about advancements in different fields of AI and ML.