The return of spring in the northern hemisphere begins tornado season. A tornado's twisting funnel of dust and debris seems like an unmistakable sight. But that vision can be hidden from radar, the tool of meteorologists. It is difficult to know exactly when a tornado has formed, or even why.
A new data set could hold answers. It contains radar data from thousands of tornadoes that have hit the United States over the past 10 years. The storms that spawned tornadoes are flanked by other severe storms, some with nearly identical conditions, that never did. The researchers at MIT Lincoln Laboratory who curated the data set, called TorNet, have now released it open source. They hope to enable advances in the detection of one of the most mysterious and violent phenomena in nature.
“Much of the progress is due to readily available reference data sets. We hope that TorNet lays the foundation for machine learning algorithms to detect and predict tornadoes,” says Mark Veillette, co-principal investigator of the project with James Kurdzo. Both researchers work in the Air Traffic Control Systems Group.
Along with the data set, the team is releasing models trained on it. The models show promise for machine learning's ability to detect a tornado. Building on this work could open new frontiers for forecasters, helping them provide more accurate warnings that could save lives.
Swirling uncertainty
About 1,200 tornadoes occur in the United States each year, causing millions to billions of dollars in economic damage and claiming 71 lives on average. Last year, one unusually long-lasting tornado killed 17 people and injured at least 165 others along a 59-mile path in Mississippi.
However, tornadoes are notoriously difficult to forecast because scientists don't have a clear picture of why they form. “We can see two storms that look identical, and one will produce a tornado and the other won't. We don't fully understand it,” says Kurdzo.
The basic ingredients of a tornado are thunderstorms with instability caused by rapidly rising warm air and wind shear that causes rotation. Weather radar is the primary tool used to monitor these conditions. But tornadoes lie too low to be detected, even when they are moderately close to the radar. As the radar beam at a given tilt angle travels away from the antenna, it rises higher above the ground and sees mostly the reflections of rain and hail carried in the “mesocyclone,” the storm's broad, rotating updraft. A mesocyclone does not always produce a tornado.
With this limited view, forecasters must decide whether or not to issue a tornado warning. They often err on the side of caution. As a result, the false alarm rate for tornado warnings is more than 70 percent. “That can lead to the boy who cried wolf syndrome,” Kurdzo says.
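The arithmetic behind a figure like this can be sketched in a few lines of Python. This is only an illustration: it assumes the quoted rate means the share of issued warnings that were not followed by a tornado, and the function name and the counts are hypothetical, not taken from the researchers or from any warning archive.

```python
def false_alarm_ratio(hits: int, false_alarms: int) -> float:
    """Share of issued warnings that were not followed by a tornado.

    Assumption: 'hits' are warnings that verified, 'false_alarms' are
    warnings that did not. Both counts are hypothetical inputs.
    """
    issued = hits + false_alarms
    if issued == 0:
        raise ValueError("no warnings were issued")
    return false_alarms / issued


# Made-up counts, chosen only to illustrate the arithmetic.
ratio = false_alarm_ratio(hits=28, false_alarms=72)
print(f"{ratio:.0%} of warnings were false alarms")
```

With these made-up counts, 72 of 100 warnings would be false alarms, which is in the range the article describes.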
In recent years, researchers have turned to machine learning to better detect and predict tornadoes. However, raw data sets and models have not always been accessible to the broader community, which has stifled progress. TorNet is filling this gap.
The data set contains more than 200,000 radar images, 13,587 of which depict tornadoes. The rest of the images are non-tornadic, taken from storms in one of two categories: randomly selected severe storms or false alarm storms (those that led a forecaster to issue a warning but did not produce a tornado).
Each storm or tornado sample comprises two sets of six radar images. The two sets correspond to different radar scanning angles. The six images represent different radar data products, such as reflectivity (which shows the intensity of precipitation) or radial velocity (which indicates whether winds are moving toward or away from the radar).
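One way to picture a single sample is as a small nested structure: two scanning angles, each with six data products, each product being one image. The Python sketch below is an assumption-laden illustration only. The class name, field names, and image size are hypothetical, and the released TorNet files may be organized differently.

```python
from dataclasses import dataclass

N_TILTS = 2     # two radar scanning angles per sample (from the article)
N_PRODUCTS = 6  # six radar data products per angle (from the article)


@dataclass
class StormSample:
    # images[tilt][product] is one 2D radar image stored as a list of rows.
    # This layout is a hypothetical choice made for illustration.
    images: list
    is_tornado: bool


def make_empty_sample(height: int, width: int, is_tornado: bool) -> StormSample:
    """Build a placeholder sample filled with zeros."""
    images = [
        [[[0.0] * width for _ in range(height)] for _ in range(N_PRODUCTS)]
        for _ in range(N_TILTS)
    ]
    return StormSample(images=images, is_tornado=is_tornado)


def count_images(sample: StormSample) -> int:
    """Total number of images across all tilts and products."""
    return sum(len(tilt) for tilt in sample.images)


sample = make_empty_sample(height=4, width=4, is_tornado=False)
print(count_images(sample))  # 2 tilts x 6 products = 12
```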
A challenge in curating the data set was first finding tornadoes. Within the corpus of weather radar data, tornadoes are extremely rare events. The team then had to balance those tornado samples with difficult non-tornado samples. If the data set were too easy, for example by comparing tornadoes to blizzards, an algorithm trained on the data would likely over-classify storms as tornadic.
“The nice thing about a true benchmark data set is that we are all working with the same data, with the same level of difficulty, and we can compare results,” says Veillette. “It also makes meteorology more accessible to data scientists, and vice versa. It becomes easier for these two parties to work on a common problem.”
Both researchers represent the progress that can come from cross-collaboration. Veillette is a mathematician and algorithm developer who has long been fascinated by tornadoes. Kurdzo is a meteorologist by training and an expert in signal processing. In graduate school, he chased tornadoes with customized mobile radars, collecting data to analyze in new ways.
“This data set also means that a graduate student does not have to spend a year or two creating a data set. They can jump right into their research,” says Kurdzo.
This project was funded by Lincoln Laboratory's Climate Change Initiative, which aims to leverage the laboratory's diverse technical strengths to help address climate issues that threaten human health and global security.
Seeking answers with deep learning
Using the data set, the researchers developed baseline artificial intelligence (AI) models. They were especially interested in applying deep learning, a form of machine learning that excels at processing visual data. On its own, deep learning can extract features (key observations that an algorithm uses to make a decision) from images in a data set. Other machine learning approaches require humans to first manually label features.
“We wanted to see if deep learning could rediscover what people typically look for in tornadoes and even identify new things that forecasters don't typically look for,” Veillette says.
The results are promising. Their deep learning model performed similar to or better than all known tornado detection algorithms in the literature. The trained algorithm correctly classified 50 percent of weaker EF-1 tornadoes and more than 85 percent of tornadoes rated EF-2 or higher, which account for the most devastating and costly occurrences of these storms.
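Detection rates broken out by tornado strength, like the ones quoted above, can be computed by grouping true tornado cases by rating and counting how many the model flagged. The following Python sketch assumes a simple list of (rating, detected) pairs. The function names and the records are hypothetical and do not reproduce the researchers' evaluation code or results.

```python
from collections import defaultdict


def bucket(ef_rating: int) -> str:
    """Group ratings the way the article does: EF-2 and above together."""
    return "EF-2+" if ef_rating >= 2 else f"EF-{ef_rating}"


def detection_rate_by_bucket(records):
    """records: iterable of (ef_rating, detected) pairs for true tornadoes."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for ef_rating, detected in records:
        key = bucket(ef_rating)
        totals[key] += 1
        if detected:
            hits[key] += 1
    return {key: hits[key] / totals[key] for key in totals}


# Made-up outcomes, for illustration only.
records = [(1, True), (1, False), (2, True), (2, True), (3, True), (2, False)]
for key, rate in sorted(detection_rate_by_bucket(records).items()):
    print(f"{key}: {rate:.0%} detected")
```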
They also evaluated two other types of machine learning models and a traditional model for comparison. The source code and parameters of all these models are freely available. The models and data set are also described in a paper submitted to a journal of the American Meteorological Society (AMS). Veillette presented this work at the AMS annual meeting in January.
“The main reason for publishing our models is for the community to improve them and do other great things,” Kurdzo says. “The best solution might be a deep learning model, or someone might discover that a non-deep learning model is actually better.”
TorNet could also be useful in the meteorological community for other uses, such as conducting large-scale case studies on storms. It could also be expanded with other data sources, such as satellite images or lightning maps. Merging multiple data types could improve the accuracy of machine learning models.
Taking steps towards operations
In addition to detecting tornadoes, Kurdzo hopes the models can help unravel the science of why they form.
“As scientists, we see all of these precursors to tornadoes: an increase in low-level rotation, a hook echo in the reflectivity data, a specific differential phase (KDP) foot, and differential reflectivity (ZDR) arcs. But how do they all go together? And are there physical manifestations that we don't know about?” he asks.
Unraveling those answers could be possible with explainable AI. Explainable AI refers to methods that allow a model to provide its reasoning, in a human-understandable format, for why it made a certain decision. In this case, these explanations could reveal physical processes that occur before tornadoes. This knowledge could help train forecasters and models to recognize signals sooner.
“None of this technology is intended to replace a forecaster. But maybe one day it could guide forecasters' eyes in complex situations and give a visual warning about an area where tornado activity is expected,” Kurdzo says.
This assistance could be especially useful as radar technology improves and future networks potentially become denser. Data update rates on a next-generation radar network are expected to increase from every five minutes to about a minute, perhaps faster than forecasters can interpret the new information. Since deep learning can process huge amounts of data quickly, it could be well suited for monitoring radar returns in real time, alongside humans. Tornadoes can form and disappear in minutes.
But the path to a working algorithm is a long one, especially in safety-critical situations, Veillette says. “I think the forecasting community is still understandably skeptical of machine learning. One way to build trust and transparency is to have public reference data sets like this. It is a first step.”
The team hopes the next steps will be taken by researchers around the world who are inspired by the data set and motivated to build their own algorithms. Those algorithms, in turn, will move into testbeds, where they will eventually be shown to forecasters, to begin a process of transitioning to operations.
In the end, the path could return to trust.
“We may never get more than a 10- to 15-minute tornado warning using these tools. But if we could reduce the rate of false alarms, we could begin to improve public perception,” says Kurdzo. “People will use those warnings to take the necessary steps to save their lives.”