CrowdStrike has blamed faulty testing software for a buggy update that bricked 8.5 million Windows machines worldwide, it wrote in a post-incident review (PIR). “Due to an error in the Content Validator, one of the two (updates) passed validation despite containing problematic data,” the company said. It promised a series of new measures to prevent the problem from happening again.
The massive Blue Screen of Death (BSOD) outage affected companies around the world, including airlines, broadcasters, the London Stock Exchange, and many others. The problem forced Windows machines into a boot loop, and technicians needed local access to the machines to recover them (Apple and Linux machines were not affected). Many companies, such as Delta Air Lines, are still recovering.
To prevent DDoS and other types of attacks, CrowdStrike has a tool called Falcon Sensor. It ships with content that works at the kernel level (called Sensor Content), which uses a “Template Type” to define how it defends against threats. When a new threat appears, CrowdStrike sends “Rapid Response Content” in the form of “Template Instances.”
On March 5, 2024, a Template Type for a new sensor was released and worked as expected. However, on July 19, two new Template Instances were released, and one (just 40KB in size) passed validation despite having “problematic data,” CrowdStrike said. “When the sensor received it and loaded it into the content interpreter, (this) resulted in an out-of-bounds memory read that triggered an exception. This unexpected exception could not be handled properly, resulting in a Windows operating system crash (BSOD).”
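To illustrate the class of failure described here, the following is a minimal Python sketch, not CrowdStrike's actual code. The `TemplateInstance` class, its `field_indexes` attribute, and the `validate` and `interpret` functions are all hypothetical names invented for this example. It shows how a validator that checks every index before the interpreter reads it can reject content that would otherwise read past the end of its inputs.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TemplateInstance:
    """Hypothetical stand-in for one piece of Rapid Response Content."""
    name: str
    field_indexes: Tuple[int, ...]  # input positions the interpreter will read


def validate(instance: TemplateInstance, input_count: int) -> bool:
    """Return True only if every index the instance reads is in range."""
    return all(0 <= i < input_count for i in instance.field_indexes)


def interpret(instance: TemplateInstance, inputs: List[str]) -> List[str]:
    """Read the referenced inputs, refusing content that fails validation."""
    if not validate(instance, len(inputs)):
        raise ValueError(f"rejected {instance.name}: index out of range")
    return [inputs[i] for i in instance.field_indexes]


# Hypothetical data: three inputs are available to the interpreter.
inputs = ["pid", "path", "cmdline"]
good = TemplateInstance("good", (0, 2))
bad = TemplateInstance("bad", (0, 3))  # index 3 is one past the end
```

In a memory-safe language such as Python, a bad index raises an exception that the caller can catch. In kernel-level code, the same mistake is an out-of-bounds memory read, and an unhandled exception there brings down the whole operating system.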
To prevent a repeat of the incident, CrowdStrike has promised to take several steps. First, it will test Rapid Response Content more thoroughly, including local developer testing, content update and rollback testing, stress testing, stability testing, and more. It will also add validation checks and improve error handling.
Additionally, the company will begin using a phased rollout strategy for Rapid Response Content to prevent a repeat of the global disruption. It will also give customers greater control over the delivery of such content and provide release notes for updates.
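A phased rollout can be sketched in a few lines of Python. This is an illustration of the general technique only, not CrowdStrike's planned implementation. The ring sizes in `RINGS`, the function names, and the `is_healthy` callback are all assumptions made for the example. Each host is hashed into a stable bucket, the update goes to a small fraction of hosts first, and the rollout halts as soon as a stage reports too many failures.

```python
import hashlib
from typing import Callable, List, Optional, Set, Tuple

# Hypothetical cumulative fractions of the fleet reached at each stage.
RINGS = (0.01, 0.10, 0.50, 1.00)


def bucket(host_id: str) -> float:
    """Map a host ID to a stable value in [0, 1) using SHA-256."""
    digest = hashlib.sha256(host_id.encode("utf-8")).digest()
    # Keep 53 bits so the division is exact and always below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def rollout(
    hosts: List[str],
    is_healthy: Callable[[str], bool],
    max_failure_rate: float = 0.0,
) -> Tuple[Optional[int], Set[str]]:
    """Update hosts stage by stage; halt when a stage fails too often.

    Returns (index of the stage that halted the rollout, or None if it
    completed, and the set of hosts that received the update).
    """
    updated: Set[str] = set()
    for stage, limit in enumerate(RINGS):
        batch = [h for h in hosts if bucket(h) < limit and h not in updated]
        updated.update(batch)
        failures = sum(1 for h in batch if not is_healthy(h))
        if batch and failures / len(batch) > max_failure_rate:
            return stage, updated
    return None, updated
```

With a scheme like this, a faulty update would be caught after reaching only the first ring of hosts instead of the whole fleet at once.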
However, some analysts and engineers believe the company should have implemented these measures from the start. “CrowdStrike must have been aware that these updates are interpreted by drivers and could cause problems,” said engineer Florian Roth in a post on X (twitter.com/cyb3rops/status/1815981491949645959). “They should have implemented a phased rollout strategy for rapid response content from the beginning.”