Supply Chain
Friday, August 23, 2024
By Davis DeRodes
The CrowdStrike outage that began shortly after 04:00 UTC on July 19, 2024 (late in the evening of July 18 in much of the U.S.) affected thousands of organizations worldwide, serving as a stark reminder of how interconnected our world is today. This incident, widely described as the largest IT outage on record, forced many organizations to resort to manual processes or halt services altogether, highlighting the vulnerabilities that can arise from reliance on a single software provider or third-party resource.
The widespread impact across various sectors, including major airlines, healthcare organizations, emergency response teams, and financial services, showcases the interlinked nature of our modern business ecosystem – and the potential for cascading failures.
As organizations navigate an increasingly complex risk environment, it’s clear that traditional approaches to resilience planning have fallen short. The CrowdStrike outage created widespread disruption across multiple industries, demonstrating the need for organizations to be prepared for severe yet plausible scenarios that may have previously been considered unlikely or outside of their control.
To truly enhance their organizational resilience, firms must implement solutions that offer comprehensive scenario simulation capabilities. This means having the ability to run thousands of permutations across various scenarios, allowing organizations to understand the potential impact of disruptive events and identify vulnerabilities that may not be apparent through traditional planning methods.
By automating this process, teams can gain insights that would be impossible to achieve through manual analysis alone. This type of understanding will allow organizations to know what could occur, how they will be impacted, and how to be better prepared for when a disruption happens – even in the case of the most severe disruptions.
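As a rough illustration of what automated scenario permutation can look like, the sketch below enumerates every combination of up to two simultaneous provider failures and records which services each one takes down. The service names, provider names, and the `SERVICE_DEPENDENCIES` mapping are hypothetical, and real tooling would draw on far richer data. This is only a minimal sketch of the idea, not a description of any particular product.

```python
from itertools import combinations

# Hypothetical mapping: each important service -> providers it cannot run without.
SERVICE_DEPENDENCIES = {
    "payments": {"endpoint_security", "cloud_host"},
    "customer_portal": {"cloud_host", "identity_provider"},
    "call_center": {"telephony"},
}


def impacted_services(failed_providers, dependencies=SERVICE_DEPENDENCIES):
    """Return the services that lose at least one required provider."""
    failed = set(failed_providers)
    return sorted(s for s, needs in dependencies.items() if needs & failed)


def run_scenarios(dependencies=SERVICE_DEPENDENCIES, max_simultaneous_failures=2):
    """Enumerate every combination of up to N simultaneous provider failures."""
    providers = sorted(set().union(*dependencies.values()))
    results = {}
    for k in range(1, max_simultaneous_failures + 1):
        for scenario in combinations(providers, k):
            results[scenario] = impacted_services(scenario, dependencies)
    return results


if __name__ == "__main__":
    results = run_scenarios()
    # Pick the scenario that disrupts the largest number of services.
    worst = max(results, key=lambda s: len(results[s]))
    print(f"{len(results)} scenarios tested; worst case: {worst} -> {results[worst]}")
```

With four providers this produces only ten scenarios, but the count grows quickly as providers and simultaneous failures are added, which is why manual analysis stops being practical.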
Davis DeRodes: What is unlikely “can quickly become reality.”
These solutions should leverage data-driven insights, utilizing connected data from across the organization to generate a more accurate picture of the risk landscape and the interdependencies between various systems and processes. This holistic view is crucial in today’s evolving business landscape, where a disruption in one area of the business can have far-reaching consequences across the entire organization.
Organizations need to be able to answer the question: “If this plausible scenario were to happen, would I be able to provide my important services?”
Effective scenario testing should involve stakeholders from multiple departments to ensure a coordinated and effective response to disruption. This cross-functional approach helps to break down silos and foster a culture of shared responsibility for resilience. It can also ensure that response plans are practical and can be implemented effectively across different areas of the business.
The CrowdStrike outage particularly highlights the importance of supply chain scrutiny. Organizations must understand which services rely on IT systems and third-party vendors, including those that business users may be unaware of but whose failure could have devastating consequences. This includes not just direct suppliers but also the suppliers of those suppliers, which together form a complex web of dependencies that needs to be mapped and understood. It also includes supplier concentration: how many important services ultimately depend on the same provider.
Ask yourself: “Is my organization too reliant on any individual cyber provider?”
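One way to approach that question is to map suppliers of suppliers as a graph and count how many services ultimately depend on each provider. The sketch below assumes a simple dictionary of direct dependencies. All names in `DIRECT_SUPPLIERS` are invented for illustration, and the functions are a hypothetical starting point rather than an established method.

```python
from collections import Counter

# Hypothetical direct dependencies: service or supplier -> its direct suppliers.
DIRECT_SUPPLIERS = {
    "online_banking": ["core_platform_vendor", "endpoint_security_vendor"],
    "claims_processing": ["document_vendor", "endpoint_security_vendor"],
    "core_platform_vendor": ["cloud_host"],
    "document_vendor": ["cloud_host"],
}


def all_suppliers(node, graph=DIRECT_SUPPLIERS):
    """Collect direct and indirect suppliers of a node (iterative depth-first walk)."""
    seen = set()
    stack = list(graph.get(node, []))
    while stack:
        supplier = stack.pop()
        if supplier not in seen:  # the seen set also guards against cycles
            seen.add(supplier)
            stack.extend(graph.get(supplier, []))
    return seen


def concentration(services, graph=DIRECT_SUPPLIERS):
    """Count how many of the given services depend on each supplier, directly or not."""
    counts = Counter()
    for service in services:
        counts.update(all_suppliers(service, graph))
    return counts


if __name__ == "__main__":
    for supplier, n in concentration(["online_banking", "claims_processing"]).most_common():
        print(f"{supplier}: relied on by {n} service(s)")
```

In this toy data, `cloud_host` never appears as a direct supplier of either service, yet both depend on it. That is the kind of hidden concentration the mapping exercise is meant to surface.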
Regular and diverse testing is essential. Organizations should conduct scenario testing more frequently, at all levels, and against a wide range of potential disruptions. After all, the events of the past few years have shown us that what once seemed plausible but unlikely can quickly become reality.
It’s no longer about if the next major incident will occur, but when. By expanding the scope of scenario testing, organizations can be better prepared for a wider range of potential disruptions.
Crucially, this type of scenario testing provides teams with the ability to identify and prioritize vulnerabilities based on material impact, rather than just size. A non-critical, low-concentration supplier could take a day to recover and have minimal impact on your organization, while a highly critical application being disrupted for an hour could be disastrous for your customers.
This nuanced approach to risk assessment allows organizations to focus their resources on addressing the most critical vulnerabilities first, maximizing the effectiveness of their resilience efforts.
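The supplier-versus-application comparison above can be made concrete with a simple score. The sketch below assumes material impact can be approximated as outage duration multiplied by an impact rate per hour. Both the formula and the numbers are illustrative assumptions, not a standard measure.

```python
from dataclasses import dataclass


@dataclass
class Disruption:
    name: str
    outage_hours: float
    impact_per_hour: float  # hypothetical units, e.g. affected customers per hour

    @property
    def material_impact(self) -> float:
        # Assumed scoring rule: duration multiplied by hourly impact.
        return self.outage_hours * self.impact_per_hour


def prioritize(disruptions):
    """Order disruptions from highest to lowest material impact."""
    return sorted(disruptions, key=lambda d: d.material_impact, reverse=True)


if __name__ == "__main__":
    scenarios = [
        Disruption("non-critical supplier", outage_hours=24, impact_per_hour=1),
        Disruption("critical application", outage_hours=1, impact_per_hour=500),
    ]
    for d in prioritize(scenarios):
        print(f"{d.name}: {d.material_impact}")
```

Under these assumed figures, the one-hour application outage ranks well above the day-long supplier outage, even though it is the shorter event.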
While it’s impossible to predict every potential disruption, organizations that invest in robust scenario testing capabilities will be better positioned to test more frequently and more effectively, and, as a result, respond quickly when a crisis occurs. This proactive approach helps to minimize downtime, reduce financial losses and protect the organization’s reputation in the face of unexpected challenges.
Comprehensive scenario testing can also reveal opportunities for improving business processes and identifying redundancies or alternative solutions that can be implemented to enhance overall resilience. This can lead to a more agile and adaptive organization that is better equipped to navigate an uncertain future.
The CrowdStrike outage demonstrates the importance of having contingency plans in place, even for systems and services that are considered highly reliable. Organizations should not simply rely on their IT departments or third-party providers to resolve issues, but, rather, should have clear plans for how to continue critical operations in the face of extended outages.
As we move forward, it’s evident that the ability to anticipate and prepare for a wide range of scenarios will be a key differentiator for successful organizations. Those that can effectively simulate and prepare for various disruptions will be better positioned to maintain continuity of critical operations, protect their reputation, and even gain a competitive advantage during times of crisis.
The CrowdStrike outage was a wake-up call for organizations to reassess their resilience strategies and implement more comprehensive scenario testing and risk management programs now. This is a critical step that organizations must take to bolster their resilience postures and be better equipped to face the challenges of an increasingly unpredictable business environment.
As the complexity and interconnectedness of business operations continue to grow, so must our approach to risk management and resilience planning. We’ve now seen that even the most reliable systems can fail, and the consequences can be far-reaching. Now is the time to invest in powerful, intelligent scenario-testing tools that help organizations anticipate, prepare for, and respond to disruption, and remain resilient in the face of whatever challenges the future may bring.
As the lead data scientist at Fusion Risk Management, Davis DeRodes leads the product research and development of applications of data science and artificial intelligence to solve business continuity and operational resilience problems, such as scenario testing. He resides in Durham, North Carolina, and is an alumnus of Columbia’s Data Science Institute.
© 2024 Global Association of Risk Professionals