It’s been a couple of months since the CrowdStrike incident, and the world is starting to return to some sort of normality after a massive technology outage caused by a flawed software update being sent to millions of computers. The update, a “sensor configuration update”, rendered affected Windows systems unusable. These updates are released regularly, sometimes several times a day, as an ongoing part of the protection mechanisms of the company’s Falcon platform.
The precise reasons this flawed update was allowed to be pushed out are still being investigated. But three decades of working in the IT industry tell me the root cause will come down to a combination of human error in writing software code and a failure of test procedures. But that’s not the big lesson. What we have learned is that a single piece of seemingly innocuous software that can be updated many times a week, without any interaction from end-users, can bring a computer system to its knees.
While CrowdStrike and thousands of IT support teams all over the world scramble to remediate systems (CrowdStrike has done a good job of issuing fixes and providing guidance), I wonder how many people really understand how deeply embedded some software products are, and the impact a bad update can have.
Apple, of course, is not immune from such incidents, even though Mac (and Linux) systems were not affected by the CrowdStrike incident.
In November 2020, the release of macOS Big Sur led to extensive slowdowns and failed downloads of the new software, and the disruption also affected access to services including iMessage and Apple Pay. In that respect, it resembles the CrowdStrike incident. A similar incident in April 2024 led to widespread outages across many Apple services.
What we are talking about here is risk management and, more specifically, the ability to understand risks. How many risk registers around the world include “Catastrophic systems failure caused by a flawed software update”? In my experience, very few. In many cases, the risk of a software update causing problems might be recognised. In most cases, though, it will be accepted, because the risk of not applying a security update, such as the one CrowdStrike pushed out on 19 July 2024, is judged to outweigh the risk of the update itself causing problems.
The big takeaway from this incident is that organisations will need to better understand what software is installed on their systems, how deeply each product is embedded in the operating system (my understanding is that the CrowdStrike Falcon software operates at a very deep level of the operating system), how often it is updated, and what procedures are in place to check updates before they are pushed out across entire corporate networks.
If I were back in the chair as an IT leader, I’d make sure I had some ‘canary in the coal mine’ systems that receive updates automatically and are monitored before I allowed those updates to propagate across the rest of the network. There was a time when this was standard operating procedure. But, judging by the impact of this update, it seems trust has taken the place of verification.
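For readers who want to see the ‘canary’ idea in concrete terms, here is a minimal, hypothetical sketch of a staged rollout gate. It assumes an update can be applied to a small canary group first and that each machine reports a simple healthy/unhealthy status; the `Host` class, `apply_update` and `staged_rollout` are illustrative names invented for this example, not part of CrowdStrike’s or any other vendor’s tooling. In practice the health check would follow a monitoring or “soak” period rather than happen instantly.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """A managed machine (hypothetical model for illustration only)."""
    name: str
    healthy: bool = True
    version: str = "1.0"


def apply_update(host: Host, version: str, is_bad: bool) -> None:
    """Simulated update: a bad update leaves the host unhealthy."""
    host.version = version
    if is_bad:
        host.healthy = False


def staged_rollout(canaries: list[Host], fleet: list[Host], version: str,
                   is_bad: bool, max_failures: int = 0) -> tuple[bool, list[str]]:
    """Update canaries first; only promote to the fleet if they stay healthy."""
    # Stage 1: update only the canary machines.
    for host in canaries:
        apply_update(host, version, is_bad)

    # Stage 2: verify the canaries before touching anything else.
    failures = [h.name for h in canaries if not h.healthy]
    if len(failures) > max_failures:
        # Halt the rollout; the wider fleet stays on its current version.
        return False, failures

    # Stage 3: canaries look fine, so release to the rest of the fleet.
    for host in fleet:
        apply_update(host, version, is_bad)
    return True, []
```

With a flawed update, `staged_rollout` stops after the canaries fail and returns their names, leaving every fleet machine on its previous version; with a good update, it promotes the new version everywhere. The design choice is simply that verification on a few machines comes before trust on many.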
In the world of information security, we often talk about Zero Trust, where all activity is verified to ensure it is not malicious, whether intentionally or accidentally harmful. Perhaps it’s time to take a Zero Trust approach to software updates and to all third-party vendors and suppliers.
Anthony is the founder of Australian Apple News. He is a long-time Apple user and former editor of Australian Macworld. He has contributed to many technology magazines and newspapers as well as appearing regularly on radio and occasionally on TV.