Building large language models for European languages that have less training data than English is a challenge for artificial intelligence. Technology companies have been working on this and, recently, a startup from Helsinki, Finland, introduced a new solution to this problem.
Before this, some language models were available, but they were often language-specific and did not always work well for languages with less data. The difficulty is that such models need to capture the unique characteristics, culture and value base of each European language. Existing solutions were limited, and something more inclusive was needed.
Now, a Finnish AI startup has developed an open source solution called Poro. It is a large language model initiative that aims to cover all 24 official languages of the European Union. The idea is to create a family of models that understand and represent the diversity of European languages. The startup believes this is important for digital sovereignty, ensuring that the value created by these models remains within Europe.
Poro is designed to address the challenge of training language models for languages with less data available, such as Finnish. It uses a multilingual training approach, meaning it learns from data in higher-resource languages, such as English, to improve its performance in lower-resource languages.
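To make the idea of mixing high-resource and low-resource data more concrete, here is a minimal sketch of one common way to balance a multilingual training mix: smoothing the sampling probabilities with an exponent so that a smaller language is seen more often than its raw share of the data. This is an illustration only, not a description of how Poro's data was actually mixed; the function names, the `alpha` value and the token counts are all hypothetical.

```python
import random


def smoothed_sampling_weights(token_counts, alpha=0.5):
    """Return per-language sampling probabilities proportional to count ** alpha.

    alpha = 1.0 keeps the raw proportions; smaller values shift probability
    toward languages with less data. (Hypothetical helper, for illustration.)
    """
    scaled = {lang: count ** alpha for lang, count in token_counts.items()}
    total = sum(scaled.values())
    return {lang: value / total for lang, value in scaled.items()}


def sample_languages(weights, num_samples, seed=0):
    """Draw the language of each training example according to the weights."""
    rng = random.Random(seed)
    languages = list(weights)
    probabilities = [weights[lang] for lang in languages]
    return rng.choices(languages, weights=probabilities, k=num_samples)


# Made-up token counts: English has nine times as much data as Finnish.
counts = {"en": 900, "fi": 100}
weights = smoothed_sampling_weights(counts, alpha=0.5)
batch_languages = sample_languages(weights, num_samples=8)
```

With these made-up counts, Finnish makes up 10% of the raw data but is sampled with probability 25% after smoothing, so the model sees the lower-resource language more often while still learning from the larger English corpus.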
The Poro 34B model has 34.2 billion parameters and uses a BLOOM transformer architecture with ALiBi embeddings. It is trained on a massive multilingual dataset that covers natural languages as well as programming languages such as Python and Java. Training is carried out on one of the fastest supercomputers in Europe, which provides enormous computing power.
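For readers unfamiliar with ALiBi (Attention with Linear Biases), the sketch below shows the general idea: instead of adding position embeddings to the input, each attention head subtracts a penalty from its attention scores that grows linearly with the distance between the query and the key. This is a simplified, pure-Python illustration of the general technique, not Poro's actual code; the function names are hypothetical, and the slope formula assumes the number of heads is a power of two.

```python
def alibi_slopes(num_heads):
    """Per-head slopes forming a geometric sequence that starts at 2 ** (-8 / num_heads).

    Assumes num_heads is a power of two (a simplification for this sketch).
    """
    start = 2 ** (-8 / num_heads)
    return [start ** (head + 1) for head in range(num_heads)]


def alibi_bias(seq_len, slope):
    """Causal bias matrix for one head.

    Entry [i][j] is -slope * (i - j) when key j is at or before query i,
    and -inf for future positions, which are masked out.
    """
    return [
        [-slope * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]


slopes = alibi_slopes(8)              # one slope per attention head
bias = alibi_bias(4, slopes[0])       # added to the attention scores before softmax
```

Because the penalty depends only on relative distance, heads with large slopes focus on nearby tokens while heads with small slopes can attend further back.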
The startup publishes checkpoints throughout the model training process, showing its progress. Even at 30% of the way through training, Poro is showing cutting-edge results. In tests, it outperforms existing models for Finnish and is on track to match or surpass their performance in English.
In conclusion, Poro represents a step forward in AI, specifically for European languages. It is not just about creating a powerful language model, but about doing so in an open and transparent way that respects the diversity of languages and cultures in Europe. If successful, Poro could be a game-changer and offer a local alternative to the language models of major tech companies.
Niharika is a Technical Consulting Intern at Marktechpost. She is a third-year student currently pursuing her B.Tech degree at the Indian Institute of Technology (IIT), Kharagpur. She is a very enthusiastic person with a keen interest in machine learning, data science and artificial intelligence, and an avid reader of the latest developments in these fields.