Airport delays, banks not working correctly, hospitals unable to provide services, and many other daily activities were disrupted by the CrowdStrike Falcon EDR issue. But what caused it? And what can we learn from this incident? Let’s explore the issue report provided by CrowdStrike itself and analyze it.
What was the real cause?
On August 8th CrowdStrike released the root cause analysis of the issue related to a problematic update to its Falcon Sensor agent on Windows.
In brief, the report[1] states that the crash was caused by an out-of-bounds read. The content defined 21 input parameter fields, but only 20 values were actually passed to it, and because the Template Instance used for testing never exercised the 21st field, even rigorous testing of the update did not reveal the mismatch.
On July 19th, a new update delivered content that actually made use of the 21st parameter, while the sensor was still supplying only 20 values. Since that parameter was used in a comparison, the attempt to access the missing value caused the out-of-bounds read.
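To make the failure mode a bit more concrete, here is a minimal sketch in C of this class of bug. It is purely illustrative and is not CrowdStrike’s actual code: a hypothetical interpreter receives 20 input values while newly delivered content compares against a 21st field, so the lookup reads past the end of the input array.

```c
/*
 * Illustrative sketch only -- NOT CrowdStrike's actual code.
 * A hypothetical content interpreter is handed 20 input values,
 * but newly shipped content references a 21st field (index 20),
 * so the comparison reads past the end of the array.
 */
#include <stdio.h>
#include <string.h>

#define SUPPLIED_INPUTS 20              /* values the caller actually passes */

/* A rule delivered as content: "compare input field <index> to <expected>". */
struct rule {
    size_t index;
    const char *expected;
};

static int rule_matches(const char *inputs[], const struct rule *r)
{
    /* No bounds check: if r->index >= SUPPLIED_INPUTS, this is an
     * out-of-bounds read (undefined behavior, a crash in kernel code). */
    return strcmp(inputs[r->index], r->expected) == 0;
}

int main(void)
{
    const char *inputs[SUPPLIED_INPUTS];
    for (size_t i = 0; i < SUPPLIED_INPUTS; i++)
        inputs[i] = "benign";           /* the 20 values that were tested */

    /* Older content only referenced fields 0..19, so tests passed.
     * The new content references field 20 -- the 21st parameter.        */
    struct rule new_content = { .index = 20, .expected = "anything" };
    printf("%d\n", rule_matches(inputs, &new_content));  /* OOB read      */
    return 0;
}
```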
Although the cause seems very simple, it had a huge impact on end users and on many different industries.
Oh my Windows!! [2]
How could it be?!
Software development has never been an easy task. Big, complex systems depend on automated processes to test software quality and security, and sometimes small details like this one slip past both the development and test teams. In other cases, releasing software updates directly to production without properly testing them on staging systems is the reason for many problems.
Automation can be complicated too![3]
Software dedicated to OS security can easily have this kind of catastrophic effect on a system. It runs with high permission levels and can read, write, and modify OS files and processes. Those same privileges can also open opportunities for privilege escalation or remote code execution. Unfortunately, there is still no clear way to avoid such high privileges in these programs.
As mentioned by Johannes Ullrich in the SANS NewsBites post[4]:
“Complex security software requiring frequent updates requires high levels of runtime protection and extensive pre-release testing of updates.”
So it is important to maintain good testing and release strategies to avoid problems like this as much as possible.
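As a rough illustration of what such guards could look like (again a hypothetical sketch, not the vendor’s actual fix), the interpreter below treats an out-of-range field index as a non-match instead of dereferencing past the array, and a small pre-release check rejects content that references more fields than the sensor will supply:

```c
/*
 * Hypothetical hardening sketch -- not the vendor's real code.
 * Two layers: a runtime bounds check inside the interpreter, and a
 * pre-release validation pass over the content before it ships.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct rule {
    size_t index;                   /* input field the rule examines      */
    const char *expected;
};

/* Runtime guard: an out-of-range index is treated as "no match"
 * rather than reading past the end of the inputs array.                 */
static bool rule_matches_safe(const char *inputs[], size_t count,
                              const struct rule *r)
{
    if (r->index >= count || inputs[r->index] == NULL)
        return false;
    return strcmp(inputs[r->index], r->expected) == 0;
}

/* Pre-release validation: reject any content that references more
 * fields than the sensor is known to supply.                            */
static bool content_is_valid(const struct rule *rules, size_t n_rules,
                             size_t supplied_fields)
{
    for (size_t i = 0; i < n_rules; i++)
        if (rules[i].index >= supplied_fields)
            return false;
    return true;
}

int main(void)
{
    const char *inputs[2] = { "a", "b" };
    struct rule out_of_range = { .index = 5, .expected = "x" };

    assert(!rule_matches_safe(inputs, 2, &out_of_range)); /* no crash     */
    assert(!content_is_valid(&out_of_range, 1, 2));       /* caught early */
    return 0;
}
```

Neither check replaces staged rollouts, but either one would turn this class of content mismatch into a rejected update rather than a system-wide crash.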
What was the impact on CrowdStrike’s reputation?
CrowdStrike’s public image took a hard hit. Right after the incident, the company’s market value dropped significantly.
CrowdStrike’s market value showing the impact after the July 19th incident[5]
People stuck in airports, delayed flights, and financial losses did not help to ease the impact on the company’s image. Even the Canadian Centre for Cyber Security issued a public alert about the incident[6] and suggested steps for mitigating it as soon as possible, treating it with the urgency of a cyber threat. Misinformation also helped spread fear among users and stakeholders.
CrowdStrike has since corrected the issue with the sensors. They acknowledged it publicly and published a report on the incident and how it was mitigated[1]. This helps renew confidence in them, as mentioned by Lee Neely in the SANS blog[4]:
“CrowdStrike has been extremely forthcoming in acknowledging and subsequently releasing technical details of the flaw in their application development and update process. […] Publishing root cause analysis and hiring not one but two outside security review teams are each calculated steps by CrowdStrike at damage control. It appears to be working.”
But only time will tell whether the damage to the company’s image is permanent.
What can we learn from this?
As information security experts, we need to stay aware of as many threats as possible. Even if an issue is not directly related to our own systems or programs, those systems could be sharing information with affected assets. It is important to have a broad view of our assets and of how they communicate with each other. Remember: even if you try to lock every door and backdoor, attackers only need one “open” door to do damage. That is why staying up to date and following current best practices will help avoid potential problems.
References:
[1] CrowdStrike, “Channel File 291 Incident: Root Cause Analysis”, August 2024.
[2] https://www.deviantart.com/salmanarif/art/Windows-Error-Reporting-120299488
[3] https://open.substack.com/pub/workchronicles/p/comic-automation?utm_campaign=post&utm_medium=web
[4] https://www.sans.org/newsletters/newsbites/xxvi-61/
[5] NASDAQ: CRWD
[6] Canadian Centre for Cyber Security, alert on the CrowdStrike Falcon outage (July 2024).
It is amazing and concerning how a single flaw can affect so many aspects of business activity worldwide, including market value. The trust that organizations and individuals place in cybersecurity firms was shaken by how suddenly the CrowdStrike incident hit and how badly it disrupted their daily activities. Though CrowdStrike seems to be recovering, it will take some time for them to regain their clients’ trust. As discussed in this article: https://www.cybersecuritydive.com/news/insured-losses-crowdstrike-1-billion/723315/
I totally agree, Michael! One mistake and the whole system goes down. It will surely have an impact on the cybersecurity software industry. But at least we can learn from this incident and try to avoid mistakes like this as much as possible. Thanks for the comment!
It is even more interesting to read about Microsoft’s recovery method. The error presents the notorious Windows Blue Screen of Death, which becomes a real challenge if the computer’s disk is encrypted. In Microsoft’s mitigation guidance, step (5) of the recovery procedure tells the user to log on to a Microsoft portal and retrieve the disk encryption key. But what if the disk is encrypted with software that is not Microsoft’s, or what if the encryption management system itself is running on an impacted Windows machine?
From a different angle, many “zero-day attacks” can demand rapid deployment, leaving no time for rigorous testing and staging phases.
Finally, even though the CrowdStrike agent flaw was not a security attack, it surfaced a vulnerability very similar to the Heartbleed bug in OpenSSL, waiting to be exploited either by an attacker or by a misconfiguration. In the OWASP Top Ten 2021, A04:2021 notes that “Insecure Design is a new category for 2021, with a focus on risks related to design flaws”.
[1] https://support.microsoft.com/en-au/topic/kb5042421-crowdstrike-issue-impacting-windows-endpoints-causing-an-0x50-or-0x7e-error-message-on-a-blue-screen-b1c700e0-7317-4e95-aeee-5d67dd35b92f
[2] https://owasp.org/www-project-top-ten/
Great post, Oscar!
The effects on airports, banks, and hospitals demonstrate the interconnectivity of our systems and our dependence on cybersecurity solutions such as CrowdStrike. As Michael pointed out, it’s concerning how quickly trust in these firms can be shaken when a single flaw disrupts so much.
Being transparent about the improvements could aid in regaining trust from clients and the public. This incident underscores the importance for all cybersecurity firms to prioritize security measures and ongoing employee training to reduce the risk of future breaches.
Ultimately, it’s crucial for the industry to learn from this event and strengthen their defenses to ensure a more secure digital environment for everyone.
Thanks Ankita! You are right to point out the dependence on specific companies. On the other hand, this kind of software is very specialized, so there are not many options out there. But as you rightly mentioned, it is very important for us to learn from these events and stay aware of these issues. Thanks for your comment!
Hi Oscar,
I really enjoyed the information you provided in your discussion. Before reading your post, I did not fully understand what technical functionality failed during this summer’s CrowdStrike issue. Your post helped me grasp that the main problem came from a missing input parameter in a process that expected 21 fields. Reading about the aftermath and its impact on the company’s reputation was shocking, as its market value dropped significantly after the incident. Lastly, I share your opinion that only time will reveal whether the company’s actions will fully restore its public image. After causing such a significant outage, it will be challenging for regular users to have full faith in these systems.
Glad to hear you enjoyed it, Harshad! I agree about the company’s possible future; it will be hard to change public opinion of CrowdStrike. But don’t forget that this is not the only company that has made such a disastrous mistake; all companies make mistakes, so let’s hope they can put things back in order and cope with the damage. Thanks for sharing your thoughts!
Thank you Oscar!! I certainly enjoyed reading your analysis of this incident, and I have a question if you wouldn’t mind. In your opinion, what tactics can businesses use to strike a better balance between these conflicting priorities? And how does this incident highlight the tension between the need for rapid security updates and the potential consequences of broad disruption in enterprise environments?
Hey David! Thanks for the comment! About your question: in my opinion, more release controls are necessary. To me, the most shocking point is that the faulty update was not caught in a development environment; it went straight to production! In my experience, testing in different environments is an important step in software development, and pushing changes directly to production is bad practice. Even when updates need to ship quickly, these checks are still necessary. Thanks for pointing this out!
It’s worrying that they moved forward with the rollout despite one test failure, even though 20 tests passed. For a security product like CrowdStrike Falcon EDR, achieving a 100% pass rate is essential before production. Even a single failed test could leave room for vulnerabilities or security weaknesses. I wasn’t aware of the specific test details until now.
A simple implementation flaw made such an impact on business and daily life! It is a real challenge for all of us to make sure our products and services comply with quality standards. Thanks for the comment, Smruti!
Great post Oscar. This made headlines for weeks and caused headaches for organizations worldwide, and as everyone has rightly noted, it crippled the lives of people on the move, looking for services, withdrawing money, getting paycheques, or seeking healthcare; pretty much everyone around the globe was touched. You have very clearly identified the mistakes on the developer side and the testing that was overlooked before the application was rolled out to clients. What struck me is how it impacted all those clients, and Microsoft, even though it was not directly Microsoft’s fault. Microsoft was compelled by a 2009 agreement with European regulators to give security companies access to its kernel, allowing third-party applications to provide protection. Ref: https://cepa.org/article/crowdstrike-crash-regulators-please-dont-interfere/
I am also baffled that so many CrowdStrike clients, major businesses across the world, allowed this update to be rolled out directly to every endpoint in their environment without testing it first. As a best practice, organizations test all apps, whether third-party or in-house, in QA/Dev environments before rolling them out; this is the norm across IT infrastructure. If we dig further into this news, we will come across organizations that were waiting to test the new update on a group of test endpoints on that fateful day before rolling it out across the board. This basic step in application deployment would have saved hundreds of organizations around the world from such catastrophic breakdowns in services.
Thanks Kaushik! The impact of this issue was very noticeable worldwide. I agree that quality testing and proper test environments are very important for such crucial software tools. Let’s take this as a lesson for all of us and do our best to ensure the quality of our products and services.
Great Post! It discusses the CrowdStrike Falcon EDR incident, in which a minor coding error led to significant disruptions. This incident highlights the importance of rigorous testing and swift responses in cybersecurity.
Well done, Oscar. I think that to prevent situations like this, it is partly our joint responsibility as students and future professionals in the cybersecurity ecosystem to come up with lasting and trustworthy answers to problems of this kind. Although doctors save lives, as we were taught growing up, the paradigm has shifted, partly because of our heavy dependence on technology. We are too far gone and there is no way out. Because of this, I believe the only way forward is to enforce a more secure and safe digital environment, free from costly mistakes like these.
I totally agree, Mohammed! We have a dependency on technology; whether that is good or bad depends on one’s view of the matter. But as you rightly mentioned, just as doctors save lives, IT experts now provide services that can improve the lives of many people, and the security of that technology will play an important role in it. Thanks for sharing your thoughts!
Great post, Oscar!
This incident serves as a crucial reminder to conduct thorough testing and validation in software development, especially for security applications. It is an important lesson for all organizations to prioritize careful testing of their systems and software to avoid attacks or disasters in the future.
I think the CrowdStrike Falcon EDR event also highlights the inherent dangers of security software that operates at elevated privilege levels. Because of its architecture, this kind of program needs broad access to system resources, which can unintentionally create potential exploitation vectors. When software runs with elevated privileges, a small bug can have a big impact.
Thank you Oscar for the insight on the CrowdStrike incident! It shows how technology connects the whole world and how much disruption a single flawed update can cause. The impact of threats and vulnerabilities is not limited to information technology professionals; everyone has a role to play in keeping the technology space safe, leaving as little room as possible for threats and vulnerabilities (they never stop showing up... lol) and keeping the world moving safely.
Absolutely, Ukamaka! While IT professionals implement multi-layer security measures, adversaries deploy new approaches. I believe that training users about different types of social engineering attacks will help combat this problem.