Scaling up language models has produced unprecedented success. Trained on immense amounts of text, these large language models acquire emergent capabilities and demonstrate clear superiority over previous paradigms across many tasks. Although powerful and evolving rapidly, such models at scale may still not be ideal or sufficient for every real-world application. The open source community has worked hard to provide strong, openly accessible LLMs covering a variety of data sources, architectures, language modeling objectives, training pipelines, model scales, and languages, such as BLOOM, LLaMA, Flan-T5, and AlexaTM.
Chinese-LLaMA, MOSS, Huatuo, Luotuo, and Phoenix are among the many strong language models the open source community has made available, either by pre-training from scratch or by further fine-tuning existing multilingual models. These publicly accessible LLMs give researchers and developers robust general language models and several decoder-only variants. Still, the encoder-decoder framework, which has proven broadly effective across tasks such as language understanding, commonsense reasoning, question answering, information retrieval, and multi-turn dialogue, remains relatively unexplored in this open bilingual setting.
To fill this gap, researchers from Soochow University contribute OpenBA, an open-source 15B bilingual asymmetric seq2seq model pre-trained from scratch. They release not only model checkpoints but also the details of the data collection and processing used to build the pre-training corpus and the Bilingual-Flan data from freely available sources (such as Common Crawl, the Pile corpus, and C-Book), along with the motivations and empirical observations behind the model architecture design and key techniques borrowed from other strong models. To aid Chinese language modeling, they specifically collected pre-training data balanced between English and Chinese tokens. They also include additional English data from the Flan collection in their Bilingual-Flan corpus, since it is challenging to build a Flan-like Chinese collection covering a wide range of tasks and settings from available resources alone.
They adopt a distinct asymmetric model structure, a shallow encoder paired with a deep decoder, to strengthen generation capacity. This differs from the vanilla Flan-T5, which uses a balanced encoder-decoder structure, and from AlexaTM's asymmetric deep-encoder, shallow-decoder design. Their training procedure has three stages: UL2 pre-training, length adaptation, and Flan training. They also apply enhancements to the model architecture and training process to improve capability, stability, and efficiency. The model's effectiveness is demonstrated on a variety of benchmarks (MMLU, CMMLU, C-Eval, SuperGLUE, BELEBELE, and BBH) and tasks (such as understanding, reasoning, and generation), under zero-shot, few-shot, held-in, and held-out settings.
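To make the shallow-encoder/deep-decoder idea concrete, here is a minimal sketch using the Hugging Face `transformers` T5Config, which lets the encoder and decoder depths be set independently. The layer counts and dimensions below are placeholder assumptions for illustration, not OpenBA's published hyperparameters.

```python
# Minimal sketch of a shallow-encoder / deep-decoder seq2seq configuration.
# Layer counts and dimensions are illustrative assumptions, not OpenBA's
# actual published hyperparameters.
from transformers import T5Config, T5ForConditionalGeneration

config = T5Config(
    vocab_size=32128,        # placeholder vocabulary size
    d_model=1024,            # hidden size (illustrative)
    num_layers=6,            # shallow encoder
    num_decoder_layers=24,   # deep decoder to favor generation capacity
    num_heads=16,
    d_ff=4096,
)

model = T5ForConditionalGeneration(config)
print(f"Encoder layers: {model.config.num_layers}, "
      f"Decoder layers: {model.config.num_decoder_layers}")
```

Allocating more layers to the decoder than the encoder mirrors the paper's motivation of putting extra capacity into generation, though OpenBA's actual tokenizer, layer counts, and training objectives differ from this toy configuration.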
Despite being trained on only 380B tokens so far, their model outperforms many strong baselines, such as LLaMA-70B on BELEBELE, BLOOM-176B on MMLU, and ChatGLM-6B on CMMLU and C-Eval. Whereas training LLaMA-7B emits about 14 tCO2eq, OpenBA-15B uses approximately 6.5 tCO2eq in total. All implementation details, including data collection and processing, code, model checkpoints, and evaluations, are publicly available. The authors welcome comments and suggestions as they continue improving and deploying the OpenBA model, and look forward to further collaboration with the open source community.
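Because the checkpoints are released openly, loading them should follow a standard seq2seq workflow in `transformers`. The repository identifier below is a guess, not confirmed by the source; check the project's GitHub or model hub page for the exact name.

```python
# Hedged sketch of loading a released OpenBA checkpoint with transformers.
# "OpenBA/OpenBA-LM" is a hypothetical hub ID used only for illustration.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "OpenBA/OpenBA-LM"  # assumption: replace with the actual repo name
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("Explain the encoder-decoder architecture in one sentence.",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```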
Check out the Paper and GitHub. All credit for this research goes to the researchers of this project. Also, don't forget to join our 30k+ ML SubReddit, 40k+ Facebook community, Discord channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.
If you like our work, you’ll love our newsletter.
Aneesh Tickoo is a consulting intern at MarktechPost. She is currently pursuing her bachelor's degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. She spends most of her time working on projects aimed at harnessing the power of machine learning. Her research interest is image processing, and she is passionate about building solutions around it. She loves connecting with people and collaborating on interesting projects.