Google AI Introduces WeatherBench 2: A Machine Learning Framework for Evaluating and Comparing Various Weather Forecasting Models

Machine learning (ML) has been used increasingly in weather forecasting in recent years. Now that ML models can compete with operational physics-based models in terms of accuracy, there is hope that this progress may soon make it possible to enhance the precision of weather forecasts around the world. Open and reproducible evaluations of novel methods using objective and established metrics are crucial to achieving this goal.

Recent research by Google, DeepMind, and the European Centre for Medium-Range Weather Forecasts presents WeatherBench 2, a benchmarking and comparison framework for weather prediction models. In addition to a comprehensive copy of the ERA5 dataset used for training most ML models, WeatherBench 2 features open-source evaluation code and publicly available, cloud-optimized ground-truth and baseline datasets.

Currently, WeatherBench 2 is optimized for global, medium-range (1-15 day) forecasting. The researchers plan to look at adding evaluation and baselines for more tasks, such as nowcasting, short-term (0-24 hour) prediction, and long-term (15+ day) prediction, in the near future.

The accuracy of weather predictions is difficult to capture in a single score. The average temperature may matter more to one user than the frequency and severity of wind gusts. Because of this, WeatherBench 2 includes numerous metrics. Several key “headline” metrics were defined to summarize the results in a way consistent with the standard assessment performed by meteorological agencies and the World Meteorological Organization.
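To make the idea of a single evaluation metric concrete, here is a minimal sketch of a latitude-weighted root-mean-square error, a score commonly used in global forecast verification because grid cells near the poles cover less area than cells near the equator. This is an illustration only, not WeatherBench 2's own code: the function names `latitude_weights` and `weighted_rmse`, the tiny 3x2 grid, and the temperature values are all made up for the example.

```python
import math


def latitude_weights(lats_deg):
    """Return cos(latitude) weights, normalized so their mean is 1."""
    raw = [math.cos(math.radians(lat)) for lat in lats_deg]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def weighted_rmse(forecast, truth, lats_deg):
    """Latitude-weighted RMSE over a grid stored as one row per latitude.

    Hypothetical helper for illustration; each row must have the same
    number of longitude points.
    """
    weights = latitude_weights(lats_deg)
    total = 0.0
    count = 0
    for w, f_row, t_row in zip(weights, forecast, truth):
        for f, t in zip(f_row, t_row):
            total += w * (f - t) ** 2
            count += 1
    return math.sqrt(total / count)


# Toy example: three latitudes, two longitudes, temperatures in kelvin.
lats = [-60.0, 0.0, 60.0]
truth = [[280.0, 281.0], [300.0, 301.0], [275.0, 276.0]]
forecast = [[281.0, 282.0], [300.0, 301.0], [275.0, 276.0]]

score = weighted_rmse(forecast, truth, lats)
print(round(score, 6))  # 0.5
```

In this toy case the forecast is off by 1 K only along the 60°S row, so the weighted score (0.5 K) comes out lower than the unweighted RMSE (about 0.58 K), since high-latitude cells count for less.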

WeatherBench 2 (WB2) is designed as a benchmark for data-driven, global weather forecasting. It was motivated by the many new AI methods that have emerged since the first WeatherBench benchmark was released. WB2 is built to closely mimic the operational forecast evaluation used by many weather centers. It also provides a solid foundation for comparing experimental methods to these operational standards.

The goal is to support efficient machine learning workflows and ensure reproducible results by making the evaluation code and data publicly available. The researchers believe WB2 can be expanded with additional metrics and baselines based on the community’s needs. The paper already points to several potential extensions, including more attention to assessing extremes and impact variables at fine scales, possibly through station observations.


Dhanshree Shenwai is a computer science engineer with experience at FinTech companies across the financial, cards and payments, and banking domains, and a keen interest in applications of AI. She is enthusiastic about exploring new technologies and advancements that make everyone’s life easier.