How Smarty® sidestepped the global IT outage with 100% uptime
A recent defective update from CrowdStrike, a leading cybersecurity provider, caused a global outage affecting millions of Windows systems. Oops. The faulty update to Falcon, CrowdStrike's endpoint detection and response (EDR) software, triggered the infamous "blue screen of death" (BSOD) and left many systems unable to boot. The disruption primarily hit large organizations that rely on CrowdStrike's services, while home PCs and systems running macOS and Linux were spared.
Like the rest of the world, Smarty employees felt the effects. Our hassles ranged from delayed Amazon shipments and failed pizza orders to flight delays of more than 24 hours.
Smarty's resilience and uptime
Despite the disruption felt around the globe and by our own employees, we're pleased to report that our operations, including our suite of address APIs, experienced no downtime or performance issues related to the outage.
How did Smarty® manage to escape the carnage? Here's what our team has to say:
"We have zero dependencies on any Microsoft products at all," said Jonathan Duncan, Smarty's Operations Engineer.
Smarty continues to deliver robust services to its customers with a stellar uptime of 99.98+%. As an industry leader in address validation, address autocomplete, and geocoding for US and international addresses, Smarty's unaffected status underscores its reliable infrastructure and commitment to customer satisfaction.
The key to Smarty's success is system redundancy. Commenting on 10-hour flight delays and other fallout from the crash, Jayden Backus, Smarty's IT Security Specialist, said: "This is what happens when so much of the world relies on a single company for their systems."
Most important is Smarty's infrastructure strategy.
"We don't put all of our eggs in one basket. We don't rely on any single provider for our infrastructure to operate properly," continued Duncan. "We are provider-agnostic. If any provider goes down, we don't notice because we're already using another."
Setting up such an infrastructure is challenging, but when dependability is important to your business, Smarty has you covered.
The nature of the outage
According to CrowdStrike CEO George Kurtz, the issue stemmed from a faulty content update for Windows machines. "This is not a security incident or cyberattack," Kurtz stated. "The issue has been identified, isolated, and a fix has been deployed." The problem originated from a bug in the update that disrupted the low-level operating system layer, which is crucial for booting Windows machines. Approximately 8.5 million devices were affected globally, causing significant disruptions across sectors including banking, healthcare, and air travel.
Lukasz Olejnik, an independent cybersecurity researcher and consultant, emphasized the severity of the situation: "The CrowdStrike software works at the low-level operating system layer. Issues at this level make the OS not bootable."
Manual intervention required
While CrowdStrike quickly rolled out an automated fix, many systems required manual intervention to resolve the issue. IT teams worldwide faced the daunting task of repairing affected systems, especially those that were turned off during the rollout and never received the automated fix.
Andrew Dwyer of the Department of Information Security at Royal Holloway, University of London, pointed out the potential for more severe consequences: "We’ve been quite lucky that this is an outage and not an exploitation by a criminal gang or another state. It also shows how easy it is to inflict quite significant global damage if you get into the right part of the IT supply chain."
CrowdStrike's response and resolution
CrowdStrike has been actively working with affected customers, providing continuous updates via its support portal and website. The company has emphasized that while the outage caused substantial inconvenience, it didn't cause any long-term damage to affected systems.
Conclusion
The CrowdStrike outage exposed vulnerabilities in critical IT infrastructure and showed how heavily such disruptions can weigh on global operations. Companies like Smarty, however, demonstrate how resilient systems and proactive management keep services running. As businesses rely ever more on technology, robust and dependable services become essential for mitigating risk and sustaining operations.