We hope you’re not caught up in remediation after the Microsoft and CrowdStrike outage. Even if you escaped this latest global infrastructure crisis, you are likely asking, “Should we patch less?” Not at all. In fact, we need to get much, much better at updates and changes.

To keep up with rapid innovation in the industry, IT infrastructure teams face the dual challenge of maintaining stability while embracing adaptability. The need for reliable, uninterrupted service clashes with the push to innovate, adopt new technologies, and respond to vendor changes. Striking the right balance is key to the long-term success and resilience of IT infrastructure.

Want more details? Check out our CrowdStrike discussion on the Cloud2030 Podcast.

The systems stability vs. adaptability dilemma

For too long, operations teams have been trapped in an either/or conundrum.


More stability, please!
Stability avoids costly downtime and keeps operations resilient and cost-effective. Beware, though, of a risk-averse approach that relies only on freezing technologies and processes.

But we need adaptability!
Change and innovation keep us secure and competitive. Beware, though, of jumping to new technology without an integrated infrastructure management strategy.


Achieving a balance between stability and adaptability requires a strategic approach:

  1. Standardized processes

Implementing standardized processes can provide a reliable and repeatable foundation for managing infrastructure changes. Infrastructure as Code (IaC) techniques like version control and tracking of changes ensure that updates are managed in a controlled manner, enhancing stability.

  2. Cross-organization compatibility

Standardized processes allow for building infrastructure management systems that are compatible across different companies, vendors and technologies for greater flexibility and resilience. This ensures that solutions can be tested, validated and shared between diverse environments, contributing to both robustness and reliability.

  3. Continuous improvement

Cross-organization compatibility enables rapid feedback loops within organizations and across the industry, which in turn drive continuous improvement of processes. This approach ensures that the latest advancements and best practices are consistently integrated into your infrastructure management system, even before you are ready to adopt them.

  4. Resilience through redundancy

Continuous improvement can then provide multiple ways to accomplish critical tasks, which is essential for maintaining resilience. Alternative paths for managing infrastructure tasks ensure that if one control mechanism fails, backup options are available.
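To illustrate what "alternative paths" can mean in practice, here is a minimal Python sketch. It assumes each control path can be wrapped as a function that returns True on success. The path names `vendor_console` and `pxe_reimage` are hypothetical stand-ins, not real RackN or vendor APIs.

```python
from __future__ import annotations

from typing import Callable, Sequence

# A control path is a (name, action) pair; the action returns True on success.
ControlPath = tuple[str, Callable[[str], bool]]


def run_with_fallback(task: str, paths: Sequence[ControlPath]) -> str:
    """Try each control path in order and return the name of the first that succeeds."""
    errors = []
    for name, action in paths:
        try:
            if action(task):
                return name
            errors.append(f"{name}: reported failure")
        except Exception as exc:  # one broken path must not block the others
            errors.append(f"{name}: {exc}")
    raise RuntimeError(f"all paths failed for {task!r}: " + "; ".join(errors))


# Hypothetical paths for illustration only.
def vendor_console(task: str) -> bool:
    raise ConnectionError("management plane unreachable")


def pxe_reimage(task: str) -> bool:
    return True


if __name__ == "__main__":
    print(run_with_fallback("restore web-01",
                            [("vendor_console", vendor_console),
                             ("pxe_reimage", pxe_reimage)]))  # prints pxe_reimage
```

The key design choice is that a failing path is recorded and skipped instead of aborting the task. An error is raised only when every path has been exhausted.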


Real-world challenges and adaptations

Global outages, like the faulty CrowdStrike update that left Windows machines stuck on the Blue Screen of Death, demonstrate the critical need for continuous updates and tuning of infrastructure. However, these updates can cause serious disruptions if they are not managed correctly. A controlled, standardized process for updates can reduce the risk of widespread outages.
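One common form of a controlled update process is a staged (canary) rollout. The update goes to a small wave of hosts first, their health is checked, and the rollout widens only if that wave stays healthy. The Python below is a minimal sketch under assumptions. The `apply_update` and `is_healthy` hooks are hypothetical and would be wired to your own tooling, and the wave sizes are arbitrary.

```python
from typing import Callable, List, Sequence, Tuple


def plan_waves(hosts: Sequence[str], wave_sizes: Sequence[int]) -> List[List[str]]:
    """Split hosts into waves of the given sizes; any leftovers form a final wave."""
    waves, start = [], 0
    for size in wave_sizes:
        if start >= len(hosts):
            break
        waves.append(list(hosts[start:start + size]))
        start += size
    if start < len(hosts):
        waves.append(list(hosts[start:]))
    return waves


def staged_rollout(
    hosts: Sequence[str],
    apply_update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    wave_sizes: Sequence[int] = (1, 10),
) -> Tuple[List[str], List[str]]:
    """Update hosts wave by wave, halting if any host in a wave is unhealthy.

    Returns (updated_hosts, unhealthy_hosts).
    """
    updated: List[str] = []
    for wave in plan_waves(hosts, wave_sizes):
        for host in wave:
            apply_update(host)
            updated.append(host)
        unhealthy = [h for h in wave if not is_healthy(h)]
        if unhealthy:
            # Halt here: later waves never receive the bad update.
            return updated, unhealthy
    return updated, []


if __name__ == "__main__":
    fleet = [f"host-{i:02d}" for i in range(12)]
    broken = set()  # simulate an update that breaks every host it touches
    print(*staged_rollout(fleet, broken.add, lambda h: h not in broken))
    # prints ['host-00'] ['host-00']
```

In this simulation, a first wave of one host means a bad update stops at that single host instead of reaching all twelve.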

Other market factors also demonstrate the need for adaptability. Shortages of GPUs, memory, and processors show why flexible vendor choices are needed to maintain supply chain resilience. Even long-term environmental needs are likely to drive adoption of emerging infrastructure technologies, such as ARM processors, that offer power and cooling advantages for data centers. These examples show that infrastructure must be able to respond quickly to sudden changes in the landscape without sacrificing stability.

IT operations teams must build evolving technical needs into their strategy. Over-reliance on a single vendor or platform is risky, so teams need multiple ways to accomplish their objectives. For instance, in the CrowdStrike failure, being able to re-image a server and reinstall critical software outside of Microsoft’s process would let operations resume even when the primary control mechanisms fail. This kind of redundancy is strategic across software, appliance, and utility vendors.


Beyond the CrowdStrike outage

This most recent event makes the need for a consistent, standardized strategy even more obvious. Managing infrastructure effectively requires navigating the paradox of maintaining stability while embracing change. This involves adopting standardized processes to ensure controlled updates, fostering cross-vendor compatibility to enhance flexibility, and building resilience through redundancy to handle disruptions.

By focusing on these strategies, IT operations can create an infrastructure that is both stable and adaptable. This balanced approach ensures that systems maintain stability while embracing the necessary innovations to stay ahead.

To keep up with news like this and more, subscribe to the RackN newsletter.



Date

July 19, 2024
