The famous BERT model has recently been one of the leading language models for natural language processing. It is suited to a wide range of NLP tasks, namely those that transform an input sequence into an output sequence. BERT (Bidirectional Encoder Representations from Transformers) uses a transformer attention mechanism, which learns contextual relationships between the words or subwords in a text corpus. The BERT language model is one of the most prominent examples of recent NLP advances and is trained with self-supervised learning techniques.
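As an illustration of the mechanism, here is a minimal sketch of generic scaled dot-product self-attention in PyTorch, not BERT's exact implementation (real transformer layers add learned query/key/value projections and multiple heads): each token's representation becomes a weighted mix of every other token's representation.

```python
import torch
import torch.nn.functional as F

def self_attention(x):
    """Generic scaled dot-product self-attention over a batch of token vectors."""
    scores = x @ x.transpose(-2, -1) / (x.size(-1) ** 0.5)  # pairwise token similarities
    weights = F.softmax(scores, dim=-1)                      # attention weights per token
    return weights @ x                                       # context-mixed representations

# Toy example: one "sentence" with 4 tokens, each an 8-dimensional vector.
tokens = torch.randn(1, 4, 8)
contextualized = self_attention(tokens)
print(contextualized.shape)  # torch.Size([1, 4, 8])
```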
Before BERT, a language model analyzed a text sequence during training either from left to right, or with left-to-right and right-to-left passes combined. This one-directional approach worked well for generating sentences: predict the next word, append it to the sequence, and predict again until a complete, meaningful sentence is obtained. BERT introduced bidirectional training, giving the model a deeper sense of language context and flow than previous language models.
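This bidirectionality is easiest to see in masked-word prediction, where BERT fills in a hidden token using context from both sides. A minimal sketch with the Hugging Face transformers fill-mask pipeline and the public bert-base-uncased checkpoint (chosen here only for illustration):

```python
from transformers import pipeline

# Masked-language-model head: predicts the [MASK] token from left and right context.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

for prediction in unmasker("The doctor wrote a [MASK] for the patient."):
    print(prediction["token_str"], round(prediction["score"], 3))
```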
The original BERT model was released for English. Other monolingual models followed, such as CamemBERT for French and GilBERTo for Italian. Recently, a team of researchers from the University of Zurich developed a multilingual language model for Switzerland. Called SwissBERT, the model has been trained on more than 21 million Swiss news articles in Swiss Standard German, French, Italian, and Romansh Grischun, totaling around 12 billion tokens.
SwissBERT was introduced to address a challenge faced by researchers in Switzerland: the difficulty of performing multilingual tasks with existing models. Switzerland has four official languages, German, French, Italian, and Romansh, and separate monolingual models for each language are hard to combine for multilingual work. Furthermore, there was no dedicated neural language model at all for the fourth national language, Romansh. Because multilingual modeling remains difficult in NLP, no unified model covered the Swiss national languages before SwissBERT. SwissBERT overcomes this by combining news articles written in these languages and building multilingual representations that implicitly exploit the entities and events shared across the news.
SwissBERT is adapted from a multilingual modular transformer (X-MOD) that was pre-trained on 81 languages. The researchers adapted the pretrained X-MOD transformer to their corpus by training custom language adapters, and they created a Swiss-specific subword vocabulary for SwissBERT; the resulting model has 153 million parameters.
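A minimal sketch of how such an adapter-based model can be used with the Hugging Face transformers X-MOD classes. The Hub identifier ZurichNLP/swissbert and the adapter code de_CH are assumptions for illustration, not details confirmed by the article:

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Hypothetical Hub identifier for the released SwissBERT checkpoint (an assumption).
model_name = "ZurichNLP/swissbert"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)  # loads the underlying X-MOD encoder

# Activate a language adapter before encoding text; the adapter code "de_CH" for
# Swiss Standard German is an assumption about how the adapters are named.
model.set_default_language("de_CH")

inputs = tokenizer("Die Schweiz hat vier Landessprachen.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the contextual token embeddings into a single sentence representation.
sentence_embedding = outputs.last_hidden_state.mean(dim=1)
print(sentence_embedding.shape)
```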
The team evaluated SwissBERT’s performance on tasks including named entity recognition on contemporary news (SwissNER) and stance detection in user-generated comments on Swiss politics. SwissBERT beats common baselines and improves on XLM-R at stance detection. When the model’s capabilities in Romansh were evaluated, SwissBERT far outperformed models that had not been trained on the language, both in zero-shot cross-lingual transfer and in aligning German and Romansh words and sentences. However, the model did not perform as well at recognizing named entities in OCR-processed historical news.
The researchers have published SwissBERT together with examples for fine-tuning on downstream tasks. The model looks promising for future research and for non-commercial use. With further adaptation, downstream tasks can benefit from the model’s multilingualism.
Check out the Paper, Blog, and Model. All credit for this research goes to the researchers of this project.
Tanya Malhotra is a final-year student at the University of Petroleum and Energy Studies, Dehradun, studying for a BTech in Computer Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a data science enthusiast with strong analytical and critical-thinking skills, along with a keen interest in acquiring new skills, leading groups, and managing work in an organized manner.