Researchers often use simulations when designing new algorithms, since testing ideas in the real world can be expensive and risky. But since it’s impossible to capture all the details of a complex system in a simulation, they typically collect a small amount of real data that they reproduce while simulating the components they want to study.
Known as trace-based simulation (the small pieces of real data are called traces), this method can produce biased results. That means researchers may unknowingly choose an algorithm that is not the best one they tested, and that will perform worse on real data than the simulation predicted.
MIT researchers have developed a new method that removes this source of bias in trace-based simulation. By enabling unbiased trace-based simulations, the new technique could help researchers design better algorithms for a variety of applications, including improving the quality of Internet video and increasing the performance of data processing systems.
The researchers’ machine learning algorithm relies on the principles of causality to learn how the data traces were affected by the behavior of the system. In this way, it can reproduce the correct, unbiased version of the trace during the simulation.
Compared to a previously developed trace-based simulator, the researchers’ simulation method correctly predicted which newly designed algorithm would be best for streaming video—that is, the one that would lead to less buffering and higher visual quality. Existing simulators that do not account for bias would have pointed researchers to a worse performing algorithm.
“Data is not the only thing that matters. The story behind how the data is generated and collected is also important. If you want to answer a hypothetical question, you need to know the underlying story of data generation so you only intervene on the things you really want to simulate,” says Arash Nasr-Esfahany, a graduate student in electrical engineering and computer science (EECS) and co-lead author of a paper about this new technique.
He is joined on the paper by co-lead authors and fellow EECS graduate students Abdullah Alomar and Pouya Hamadanian; recent graduate student Anish Agarwal PhD ’21; and senior authors Mohammad Alizadeh, associate professor of electrical engineering and computer science, and Devavrat Shah, the Andrew and Erna Viterbi Professor in EECS and a member of the Institute for Data, Systems, and Society and the Laboratory for Information and Decision Systems. The research was recently presented at the USENIX Symposium on Networked Systems Design and Implementation.
Misleading simulations
MIT researchers studied trace-based simulation in the context of streaming video applications.
In video streaming, an adaptive bitrate algorithm continually decides the video quality, or bitrate, to transfer to a device based on real-time data from the user’s bandwidth. To test how different adaptive bitrate algorithms affect network performance, researchers can collect real data from users during a video stream for a trace-based simulation.
They use these traces to simulate what would have happened to network performance if the platform had used a different adaptive bitrate algorithm under the same underlying conditions.
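To make this concrete, below is a minimal sketch of how a conventional trace-based replay might look. The bitrate ladder, chunk length, buffer model, and the simple rate-picking policy are all illustrative assumptions rather than the setup used in the research; the key point is that the recorded throughput trace is treated as fixed no matter which algorithm is being evaluated.

```python
# A minimal sketch of naive trace-based simulation for adaptive bitrate (ABR)
# streaming. All constants and the policy below are illustrative assumptions.

BITRATES_MBPS = [0.3, 0.75, 1.2, 2.85, 4.3]  # hypothetical quality ladder
CHUNK_SECONDS = 4.0                          # hypothetical chunk duration


def naive_abr_policy(throughput_estimate_mbps):
    """Pick the highest bitrate that the estimated throughput can sustain."""
    feasible = [b for b in BITRATES_MBPS if b <= throughput_estimate_mbps]
    return max(feasible) if feasible else BITRATES_MBPS[0]


def replay_trace(throughput_trace_mbps, policy):
    """Replay a recorded throughput trace against a candidate ABR policy,
    treating the trace as fixed (the 'exogenous' assumption)."""
    buffer_s, stall_s, chosen = 0.0, 0.0, []
    for throughput in throughput_trace_mbps:
        bitrate = policy(throughput)
        download_s = bitrate * CHUNK_SECONDS / throughput  # time to fetch one chunk
        stall_s += max(0.0, download_s - buffer_s)         # buffer ran dry, so playback stalls
        buffer_s = max(0.0, buffer_s - download_s) + CHUNK_SECONDS
        chosen.append(bitrate)
    return stall_s, sum(chosen) / len(chosen)


# Example: replay one recorded trace (in Mbps) under the candidate policy.
stalls, avg_bitrate = replay_trace([1.1, 0.8, 2.4, 3.0, 0.5], naive_abr_policy)
print(f"stall time: {stalls:.1f} s, average bitrate: {avg_bitrate:.2f} Mbps")
```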
Researchers have traditionally assumed that trace data are exogenous, meaning they are unaffected by factors that change during the simulation. They would assume that, during the period in which they collected the network performance data, the choices made by the bitrate adaptation algorithm did not affect that data.
But this is often a false assumption that results in biased estimates of how a new algorithm will behave, rendering the simulation invalid, Alizadeh explains.
“We recognized, and others have recognized, that this way of doing simulation can induce errors. But I don’t think people necessarily knew how significant those errors could be,” he says.
To develop a solution, Alizadeh and his collaborators framed the problem as a causal inference problem. To collect an unbiased trace, one must understand the different causes that affect the observed data. Some causes are intrinsic to a system, while others are affected by the actions that are taken.
In the streaming video example, network throughput is affected by the choices made by the bitrate adaptation algorithm, but it is also affected by intrinsic factors, such as network capacity.
“Our task is to unravel these two effects, to try to understand which aspects of the behavior that we are seeing are intrinsic to the system, and how much of what we are observing is based on the actions that were taken. If we can tease out these two effects, then we can do unbiased simulations,” he says.
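A toy example may help illustrate why the two effects need to be unraveled. In the sketch below, the unobserved intrinsic factor is a made-up link capacity, and the action is the chosen bitrate, which feeds back into the throughput the trace records. The feedback model is an arbitrary assumption for illustration, not a model from the paper.

```python
# A toy illustration (not CausalSim itself) of separating an intrinsic factor
# from the effect of an action. The "intrinsic" quantity is an assumed latent
# link capacity; the action is the chosen bitrate, which feeds back into the
# throughput that the trace records.

def observed_throughput(latent_capacity_mbps, chosen_bitrate_mbps):
    """Toy model: the higher the bitrate we push, the more self-congestion
    eats into the throughput we actually measure."""
    congestion_penalty = 0.2 * chosen_bitrate_mbps  # assumed feedback effect
    return max(0.1, latent_capacity_mbps - congestion_penalty)


# The trace only records the left-hand side. A biased simulator replays that
# recorded throughput unchanged; an unbiased one would first recover the
# latent capacity, then recompute the throughput under the new action.
capacity = 3.0                       # unobserved intrinsic condition
old_action, new_action = 2.85, 0.75  # bitrate used during collection vs. in simulation

recorded = observed_throughput(capacity, old_action)        # what the trace holds
counterfactual = observed_throughput(capacity, new_action)  # what should be replayed
print(recorded, counterfactual)  # the naive replay would use the lower, biased value
```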
Learning from data
But researchers often cannot directly observe intrinsic properties. This is where the new tool, called CausalSim, comes in. The algorithm can learn the underlying characteristics of a system using only the trace data.
CausalSim takes trace data that were collected through a randomized controlled trial and estimates the underlying functions that produced those data. The model tells researchers, under exactly the same underlying conditions that a user experienced, how a new algorithm would change the outcome.
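The general flavor of such an estimate can be sketched in a few lines, under the strong and purely illustrative assumption that each observed outcome is simply an unobserved intrinsic factor plus a per-action effect. This is not the CausalSim algorithm, only a toy example of how randomization makes the effect of an action identifiable.

```python
# A highly simplified sketch of the counterfactual-estimation idea (not the
# actual CausalSim algorithm). Under a randomized trial, each observation is
# (action taken, outcome observed). If the outcome factors into a per-step
# latent plus a per-action effect, both can be estimated and recombined to
# predict what a different action would have produced at the same step.

import numpy as np

rng = np.random.default_rng(0)

n_steps, n_actions = 200, 3
latent = rng.uniform(1.0, 4.0, size=n_steps)         # unobserved intrinsic factor
action_effect = np.array([0.0, -0.3, -0.8])          # unobserved per-action effect
actions = rng.integers(0, n_actions, size=n_steps)   # randomized assignment (RCT)
outcomes = latent + action_effect[actions]            # what the trace records

# Because actions are assigned independently of the latent factor, group means
# differ only through the action effect (up to finite-sample noise).
baseline = outcomes[actions == 0].mean()
estimated_effect = np.array([outcomes[actions == a].mean() - baseline
                             for a in range(n_actions)])

# Counterfactual replay: strip the effect of the logged action, add back the
# effect of the action a new algorithm would have taken at the same step.
new_actions = np.full(n_steps, 2)
counterfactual = outcomes - estimated_effect[actions] + estimated_effect[new_actions]
print(np.abs(counterfactual - (latent + action_effect[2])).mean())  # small error
```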
With a typical trace-based simulator, bias can lead a researcher to select a worse-performing algorithm, even though the simulation indicates it should be better. CausalSim helps researchers select the best algorithm among those they tested.
The MIT researchers observed this in practice. When they used CausalSim to design an improved bitrate adaptation algorithm, it led them to select a new variant that had a stall rate nearly 1.4 times lower than a well-accepted competing algorithm, while achieving the same video quality. The stall rate is the amount of time a user spends rebuffering the video.
By contrast, an expertly designed trace-based simulator predicted the opposite. It indicated that this new variant should cause a stall rate nearly 1.3 times higher. The researchers tested the algorithm in real-world video streaming and confirmed that CausalSim was correct.
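As a point of reference for the metric above, one common way to compute such a stall rate is as the share of session time spent rebuffering; a minimal computation over a hypothetical session log might look like this.

```python
# Minimal illustration of a stall-rate metric: the share of viewing time spent
# rebuffering. The session log below is made-up example data.
sessions = [
    {"stall_seconds": 3.2, "total_seconds": 180.0},
    {"stall_seconds": 0.0, "total_seconds": 240.0},
]
stall_rate = (sum(s["stall_seconds"] for s in sessions)
              / sum(s["total_seconds"] for s in sessions))
print(f"stall rate: {stall_rate:.2%}")
```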
“The gains we were making on the new variant were very close to CausalSim’s prediction, while the expert simulator was very far off. This is really exciting because this expertly designed simulator has been used in research for the last decade. If CausalSim can be so clearly better than this, who knows what we can make of it?” Hamadanian says.
Over a 10-month experiment, CausalSim steadily improved simulation accuracy, resulting in algorithms that made about half as many errors as those designed with baseline methods.
In the future, the researchers want to apply CausalSim in situations where data from randomized controlled trials are not available or where it is particularly difficult to recover the causal dynamics of the system. They also want to explore how to design and monitor systems to make them more amenable to causal analysis.